Progress indicator during pandas operations (Python)
I regularly perform pandas operations on data frames in excess of 15 million or so rows, and I'd love to have access to a progress indicator for particular operations.
Does a text-based progress indicator for pandas split-apply-combine operations exist?
For example, in something like:
df_users.groupby(['userid', 'requestdate']).apply(feature_rollup)
where feature_rollup is a somewhat involved function that takes many DataFrame columns and creates new user columns through various methods. These operations can take a while for large data frames, so I'd like to know if it is possible to have text-based output in an IPython notebook that updates me on the progress.
So far, I've tried the canonical loop progress indicators for Python, but they don't interact with pandas in any meaningful way.
I'm hoping there's something I've overlooked in the pandas library/documentation that allows one to know the progress of a split-apply-combine. A simple implementation would maybe look at the total number of data frame subsets upon which the apply function is working and report progress as the completed fraction of those subsets.
Is this perhaps something that needs to be added to the library?
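The "count the subsets and report the completed fraction" idea can be sketched roughly as below. This is only an illustration, not anything pandas provides: apply_with_progress is a hypothetical helper that takes any iterable of (key, subset) pairs plus the expected total, and it returns a plain dict of results rather than doing the combine step that apply performs.

```python
import sys


def apply_with_progress(pairs, func, total, out=sys.stdout):
    """Call func on each subset and report the completed fraction.

    pairs: iterable of (key, subset) tuples (hypothetical input shape).
    total: number of subsets expected, used as the denominator.
    Returns a dict mapping each key to func(subset); nothing is combined.
    """
    results = {}
    for done, (key, subset) in enumerate(pairs, start=1):
        results[key] = func(subset)
        # '\r' returns to the start of the line so the percentage is overwritten.
        out.write('\rapply progress: {:3.0f}%'.format(100.0 * done / total))
        out.flush()
    out.write('\n')
    return results


# Plain-Python demonstration with two "groups".
groups = {'a': [1, 2, 3], 'b': [4, 5]}
totals = apply_with_progress(groups.items(), sum, len(groups))
```

Iterating over a pandas groupby object yields (name, group) pairs and len() of it gives the number of groups, so something like apply_with_progress(g, feature_rollup, len(g)) should work the same way, with the results left for you to combine.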
To tweak Jeff's answer (and have this as a reusable function):
import sys

def logged_apply(g, func, *args, **kwargs):
    # Each group contributes an equal share of the total progress.
    step_percentage = 100. / len(g)
    sys.stdout.write('apply progress:   0%')
    sys.stdout.flush()

    def logging_decorator(func):
        def wrapper(*args, **kwargs):
            progress = wrapper.count * step_percentage
            # '\033[D' moves the cursor one column left; repeating the
            # left-space-left sequence 4 times blanks the old percentage.
            sys.stdout.write('\033[D \033[D' * 4 + format(progress, '3.0f') + '%')
            sys.stdout.flush()
            wrapper.count += 1
            return func(*args, **kwargs)
        wrapper.count = 0
        return wrapper

    logged_func = logging_decorator(func)
    res = g.apply(logged_func, *args, **kwargs)
    sys.stdout.write('\033[D \033[D' * 4 + format(100., '3.0f') + '%' + '\n')
    sys.stdout.flush()
    return res
Note: the apply progress percentage updates inline. If your function writes to stdout, this won't work.
In [11]: g = df_users.groupby(['userid', 'requestdate'])

In [12]: f = feature_rollup

In [13]: logged_apply(g, f)
apply progress: 100%
Out[13]:
...
As usual, you can add this to your groupby objects as a method:
from pandas.core.groupby import DataFrameGroupBy
DataFrameGroupBy.logged_apply = logged_apply

In [21]: g.logged_apply(f)
apply progress: 100%
Out[21]:
...
As mentioned in the comments, this isn't a feature that core pandas would be interested in implementing. But Python allows you to create these for many pandas objects/methods (doing so would be quite a bit of work... although you should be able to generalise this approach).
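One way the approach might be generalised is to pull the counting wrapper out into its own helper, so it can be handed to any method that calls a function once per item. The sketch below is an assumption about how that could look, not part of pandas: with_progress, its label parameter, and its out parameter are hypothetical names.

```python
import functools
import sys


def with_progress(func, total, label='progress', out=sys.stdout):
    """Wrap func so each completed call reports progress out of `total` calls."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        wrapper.count += 1
        # Clamp at 100% in case func is called more often than expected.
        pct = 100.0 * min(wrapper.count, total) / total
        # '\r' returns to the start of the line so the text is overwritten.
        out.write('\r{}: {:3.0f}%'.format(label, pct))
        out.flush()
        return result
    wrapper.count = 0
    return wrapper


# Plain-Python demonstration: four calls, each one updates the percentage.
double = with_progress(lambda x: x * 2, total=4)
doubled = [double(x) for x in range(4)]
sys.stdout.write('\n')
```

With a groupby this would be used as g.apply(with_progress(f, total=len(g))). The same caveat applies as before: if the wrapped function itself writes to stdout, the inline percentage will be garbled.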