Calculating the real size of a Python string
First of all, my computer specs:
memory - https://gist.github.com/vyscond/6425304
cpu - https://gist.github.com/vyscond/6425322
This morning I tested the following two code snippets:
Code A:
    a = 'a' * 1000000000
and Code B:
    a = 'a' * 10000000000
Code A works fine. Code B gives me this error message:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    MemoryError
So I started researching methods for measuring the size of data in Python.
The first thing I found was the classic built-in function len().
For Code A, len() returned the value 1000000000; for Code B, the same memory error was returned.
After that I decided I wanted more precision in my tests, so I found a function in the sys module called getsizeof(). With this function I ran the same test on Code A:
    sys.getsizeof( 'a' * 1000000000 )
The result returned was 1000000037 (in bytes).
Question 01: does this mean 0.9313226090744 gigabytes?
So I checked the number of bytes in a string holding the single character 'a':
    sys.getsizeof( 'a' )
The result returned was 38 (in bytes).
Question 02: does this mean that if I need a string composed of 1000000000 'a' characters, it will take 38 * 1000000000 = 38.000.000.000 bytes?
Question 03: does this mean I need 35.390257835388 gigabytes to hold a string like this?
I know there is an error in my reasoning, because this doesn't make sense to me '-'
Python objects have a minimal size: the overhead of keeping several pieces of bookkeeping data attached to the object.
A Python str object is no exception. Take a look at the difference between a string with no, one, two, and three characters:
    >>> import sys
    >>> sys.getsizeof('')
    37
    >>> sys.getsizeof('a')
    38
    >>> sys.getsizeof('aa')
    39
    >>> sys.getsizeof('aaa')
    40
The Python str object overhead is 37 bytes on my machine, and each character in the string takes 1 byte on top of that fixed overhead.
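If you want to check those two numbers on your own machine rather than trusting mine, something like the following should do it. This is only a sketch: the helper name `str_overhead_and_per_char` is made up for illustration, and it assumes CPython and plain ASCII characters. The overhead it reports depends on the Python version and build, so don't expect exactly 37.

```python
import sys

def str_overhead_and_per_char(sample_char="a", n=1000):
    """Return (fixed overhead in bytes, bytes per extra character).

    Hypothetical helper, not part of any library. Assumes CPython,
    where sys.getsizeof() reports the size of the str object itself.
    """
    empty = sys.getsizeof("")                # fixed bookkeeping overhead
    full = sys.getsizeof(sample_char * n)    # overhead plus n characters
    return empty, (full - empty) / float(n)

overhead, per_char = str_overhead_and_per_char()
print("overhead: %d bytes, per character: %.1f bytes" % (overhead, per_char))
```

The fixed overhead is paid once per string, not once per character, which is the point the numbers above are making.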
Thus, a str value of 1000 million characters requires 1000 million bytes plus 37 bytes of overhead in memory. That is indeed about 0.931 gigabytes.
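To make that arithmetic concrete, here is a rough sketch of the estimate and the bytes-to-gigabytes conversion. The names `estimated_str_size` and `to_gib` are my own, and the 1 byte per character default is an assumption that holds for ASCII text like your 'a' strings on CPython; it is not a general rule for every string.

```python
import sys

GIB = 1024.0 ** 3  # bytes in one gigabyte, as used in the question's figures

def estimated_str_size(n_chars, bytes_per_char=1):
    """Fixed overhead paid once, plus a per-character cost.

    Hypothetical helper; bytes_per_char=1 is an assumption for ASCII text.
    """
    return sys.getsizeof("") + n_chars * bytes_per_char

def to_gib(n_bytes):
    """Convert a byte count to gigabytes (1024 ** 3 bytes each)."""
    return n_bytes / GIB

# The measured size from the question, converted to gigabytes.
print(to_gib(1000000037))

# The mistaken estimate: charging the 38 bytes for every character.
print(to_gib(38 * 1000000000))
```

The second figure is where the 35 gigabytes came from: it multiplies the whole object size, overhead included, by the number of characters.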
Your sample Code B created ten times more characters, so you needed about 10 gigabytes of memory to hold that one string, not counting the rest of Python, the OS, and whatever else might be running on that machine.
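If you would rather find that out before Python raises MemoryError, you can compare the estimate against a budget first. A minimal sketch, assuming CPython and ASCII characters: `fits_in_budget` is a hypothetical helper, and the 4 GiB budget is a number I picked for illustration, not something read from your machine.

```python
import sys

def fits_in_budget(n_chars, budget_bytes, bytes_per_char=1):
    """Check an estimated str size against a budget without allocating it.

    Hypothetical helper; bytes_per_char=1 is an assumption for ASCII text.
    """
    needed = sys.getsizeof("") + n_chars * bytes_per_char
    return needed <= budget_bytes

budget = 4 * 1024 ** 3  # assumed 4 GiB budget, purely illustrative

print(fits_in_budget(10 ** 9, budget))   # size of Code A
print(fits_in_budget(10 ** 10, budget))  # size of Code B
```

This only does the arithmetic; it says nothing about how much memory is actually free at the moment you build the string.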