Progress indicator during pandas operations (python)


I regularly perform pandas operations on data frames in excess of 15 million or so rows, and I'd love to have access to a progress indicator for particular operations.

Does a text-based progress indicator for pandas split-apply-combine operations exist?

For example, in something like:

    df_users.groupby(['userid', 'requestdate']).apply(feature_rollup)

where feature_rollup is a somewhat involved function that takes many DataFrame columns and creates new user columns through various methods. These operations can take a while for large data frames, so I'd like to know if it is possible to have text-based output in an IPython notebook that updates me on the progress.

So far, I've tried the canonical loop progress indicators for Python, but they don't interact with pandas in any meaningful way.

I'm hoping there's something I've overlooked in the pandas library/documentation that allows one to know the progress of a split-apply-combine. A simple implementation would maybe look at the total number of data frame subsets upon which the apply function is working and report progress as the completed fraction of those subsets.

Is this perhaps something that needs to be added to the library?
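The simple implementation described in the question (count the subsets, then report the completed fraction) can be sketched without touching pandas internals. This is only a rough illustration: apply_with_progress and its stream parameter are hypothetical names, not part of pandas, and the function only assumes that its input supports len() and yields (key, subset) pairs when iterated.

```python
import sys


def apply_with_progress(grouped, func, stream=sys.stderr):
    """Call func on each (key, subset) pair and report the completed fraction.

    `grouped` is anything that supports len() and yields (key, subset) pairs.
    Hypothetical helper for illustration; not a pandas API.
    """
    total = len(grouped)
    results = {}
    for done, (key, subset) in enumerate(grouped, start=1):
        results[key] = func(subset)
        # '\r' returns to the start of the line so the percentage is overwritten.
        stream.write('\rapply progress: {:3.0f}%'.format(100.0 * done / total))
        stream.flush()
    stream.write('\n')
    return results


# Stand-in data: a plain list of (key, subset) pairs instead of a real groupby.
groups = [('a', [1, 2, 3]), ('b', [4, 5])]
totals = apply_with_progress(groups, sum)
```

A pandas groupby object also supports len() and iteration over (name, group) pairs, so something like apply_with_progress(df_users.groupby(['userid', 'requestdate']), feature_rollup) should work the same way. The results come back as a plain dict keyed by group, so recombining them (for example with pd.concat) is left to the caller.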

To tweak Jeff's answer (and have this as a reusable function):

    import sys

    def logged_apply(g, func, *args, **kwargs):
        # Each group accounts for an equal share of the total progress.
        step_percentage = 100. / len(g)
        sys.stdout.write('apply progress:   0%')
        sys.stdout.flush()

        def logging_decorator(func):
            def wrapper(*args, **kwargs):
                progress = wrapper.count * step_percentage
                # '\033[D' moves the cursor one column left; erase the four
                # characters of the old percentage, then write the new one.
                sys.stdout.write('\033[D \033[D' * 4 + format(progress, '3.0f') + '%')
                sys.stdout.flush()
                wrapper.count += 1
                return func(*args, **kwargs)
            wrapper.count = 0  # number of groups processed so far
            return wrapper

        logged_func = logging_decorator(func)
        res = g.apply(logged_func, *args, **kwargs)
        # Finish at exactly 100% and end the line.
        sys.stdout.write('\033[D \033[D' * 4 + format(100., '3.0f') + '%' + '\n')
        sys.stdout.flush()
        return res

Note: the apply progress percentage updates inline. If your function writes to stdout, then this won't work.

    In [11]: g = df_users.groupby(['userid', 'requestdate'])

    In [12]: f = feature_rollup

    In [13]: logged_apply(g, f)
    apply progress: 100%
    Out[13]:
    ...

As usual, you can add this to your groupby objects as a method:

    from pandas.core.groupby import DataFrameGroupBy
    DataFrameGroupBy.logged_apply = logged_apply

    In [21]: g.logged_apply(f)
    apply progress: 100%
    Out[21]:
    ...

As mentioned in the comments, this isn't a feature that core pandas would be interested in implementing. But Python allows you to create these for many pandas objects/methods (doing so would be quite a bit of work... although you should be able to generalise this approach).
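One way the approach might be generalised is to pull the counting wrapper out of logged_apply so it can be handed to any method that calls a function repeatedly. The sketch below is an assumption-laden illustration, not part of pandas: with_progress, label and stream are hypothetical names, it needs Python 3 (for nonlocal), and it writes to stderr by default so that a function printing to stdout does not clash with the progress line.

```python
import functools
import sys


def with_progress(func, total, label='apply progress', stream=sys.stderr):
    """Wrap func so that every call reports calls-so-far / total on `stream`.

    Hypothetical helper for illustration; `total` is the expected call count.
    """
    calls = 0

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal calls
        result = func(*args, **kwargs)
        calls += 1
        # Clamp at 100% in case func is called more often than expected.
        percent = min(100.0, 100.0 * calls / total)
        stream.write('\r{}: {:3.0f}%'.format(label, percent))
        stream.flush()
        return result

    return wrapper


# Stand-in usage with the built-in map instead of a pandas method.
squares = list(map(with_progress(lambda x: x * x, total=4), [1, 2, 3, 4]))
sys.stderr.write('\n')
```

With pandas this would be used as, for example, g.apply(with_progress(feature_rollup, total=len(g))) for a groupby object g, or df.apply(with_progress(f, total=len(df)), axis=1) for a row-wise apply. The clamp is there because the wrapper cannot know how many times the library will actually call the function.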

