Calculating the real size of a Python string
First of all, my computer specs:
Memory - https://gist.github.com/vyscond/6425304
CPU - https://gist.github.com/vyscond/6425322
So this morning I tested the following two code snippets:
Code A
a = 'a' * 1000000000
and Code B
a = 'a' * 10000000000
Code A works fine. Code B gives me this error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
So I started researching ways to measure the size of data in Python.
The first thing I found was the classic built-in function len().
For Code A, len() returned the value 1000000000; for Code B, the same memory error was returned.
After that I decided to be more precise in my tests. I found a function in the sys module called getsizeof(). With this function I ran the same test on Code A:
sys.getsizeof( 'a' * 1000000000 )
The result returned was 1000000037 (in bytes).
Question 1 - Does this mean 0.9313226090744 gigabytes?
So I checked the number of bytes of a string with the single character 'a':
sys.getsizeof( 'a' )
The result returned was 38 (in bytes).
Question 2 - Does this mean that if I need a string composed of 1000000000 characters 'a', it will result in 38 * 1000000000 = 38.000.000.000 bytes?
Question 3 - Does this mean I would need 35.390257835388 gigabytes to hold a string like this?
I know there is an error in my reasoning, because this doesn't make sense to me '-'
Python objects have a minimum size: the overhead of keeping several pieces of bookkeeping data attached to the object.
A Python str object is no exception. Take a look at the difference between strings with no, one, two and three characters:
>>> import sys
>>> sys.getsizeof('')
37
>>> sys.getsizeof('a')
38
>>> sys.getsizeof('aa')
39
>>> sys.getsizeof('aaa')
40
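The Python str object overhead is 37 bytes on my machine, and each character in the string takes 1 byte on top of that fixed overhead. So the 38 bytes reported for 'a' are 37 bytes of overhead plus 1 byte for the character, not a per-character cost.
Thus, a str value with 1000 million characters requires 1000 million bytes + 37 bytes of overhead. That is indeed about 0.931 gigabytes (counting a gigabyte as 1024 * 1024 * 1024 bytes).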
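The fixed overhead and the per-character cost can be measured directly instead of being read off individual results. This is only a minimal sketch, assuming CPython and ASCII-only strings; the helper name measure_str_layout is made up for illustration, and the overhead it reports differs between Python versions and builds (37 bytes is simply what the machine above shows).

```python
import sys

def measure_str_layout(sample_len=1000):
    """Return (fixed_overhead, bytes_per_char) for ASCII str objects.

    Hypothetical helper, not part of any library. Assumes CPython,
    where sys.getsizeof() reports the object's own memory footprint.
    """
    overhead = sys.getsizeof('')               # size of an empty string
    total = sys.getsizeof('a' * sample_len)    # size of a longer ASCII string
    per_char = (total - overhead) / sample_len
    return overhead, per_char

overhead, per_char = measure_str_layout()
print("fixed overhead:", overhead, "bytes")
print("bytes per character:", per_char)
```

If the second number comes out as 1, the total size is the overhead plus the number of characters, which is why multiplying 38 by the length overestimates so badly.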
Your sample code B created ten times more characters, so you needed nearly 10 gigabytes of memory to hold that one string, not counting the rest of Python, the OS, and whatever else might be running on that machine.
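Since building the string is exactly the step that fails, it can help to estimate its size before allocating it. The following is a rough sketch, assuming CPython and ASCII characters at 1 byte each as in the getsizeof() results above; estimate_str_bytes is an illustrative name, not a standard function.

```python
import sys

GIB = 1024 ** 3  # bytes per gigabyte, counted in powers of two

def estimate_str_bytes(n_chars):
    """Estimate the memory needed for an ASCII str of n_chars characters.

    Illustrative helper: assumes 1 byte per character plus the fixed
    overhead that sys.getsizeof('') reports on this interpreter.
    """
    return sys.getsizeof('') + n_chars

# Estimate both snippets without ever creating the strings.
for label, n in (("code A", 10 ** 9), ("code B", 10 ** 10)):
    size = estimate_str_bytes(n)
    print("%s: %d bytes, about %.3f GB" % (label, size, size / GIB))
```

Comparing such an estimate with the memory actually free on the machine shows in advance whether an allocation like code B is likely to end in a MemoryError.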