python - pandas: apply multiple functions of multiple columns to a groupby object
I want to apply multiple functions of multiple columns to a groupby object, so that the result is a new pandas.DataFrame.
I know how to do it in separate steps:
    by_user = lasts.groupby('user')
    elapsed_days = by_user.apply(lambda x: (x.elapsed_time * x.num_cores).sum() / 86400)
    running_days = by_user.apply(lambda x: (x.running_time * x.num_cores).sum() / 86400)
    user_df = elapsed_days.to_frame('elapsed_days').join(running_days.to_frame('running_days'))
which results in user_df being a DataFrame indexed by user, with one elapsed_days column and one running_days column.
However, I suspect there is a better way, like:
    by_user.agg({'elapsed_days': lambda x: (x.elapsed_time * x.num_cores).sum() / 86400,
                 'running_days': lambda x: (x.running_time * x.num_cores).sum() / 86400})
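However, that doesn't work because, AFAIK, agg() with a dict works on one column at a time: each function receives a single pandas.Series, so it cannot combine several columns.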
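To illustrate the limitation described above, here is a minimal sketch. The sample data is made up for the example and is not my real data. The dict keys passed to agg() have to be existing column names, and each function only sees that one column. The exact exception raised for an unknown key seems to depend on the pandas version, so the sketch catches a generic Exception.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
lasts = pd.DataFrame({
    'user': ['a', 's', 'd', 'd'],
    'elapsed_time': [40000, 50000, 60000, 90000],
    'num_cores': [7, 8, 9, 4],
})
by_user = lasts.groupby('user')

# Works: the key is an existing column, and the lambda receives that
# column's values for one group as a Series.
totals = by_user.agg({'elapsed_time': lambda s: s.sum() / 86400})

# Fails: 'elapsed_days' is not a column, so there is nothing to pass
# to the function (exception type varies between pandas versions).
try:
    by_user.agg({'elapsed_days': lambda s: s.sum() / 86400})
    failed = False
except Exception:
    failed = True

print(totals)
print('agg with a non-column key failed:', failed)
```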
I did find this question and answer, but the solutions look rather ugly to me, and considering that the answer is 4 years old, there might be a better way by now.
I think you can avoid agg or apply. Instead, first multiply with mul, then divide with div, and last use groupby on the index, aggregating with sum:
    import pandas as pd

    # sample data
    lasts = pd.DataFrame({'user': ['a', 's', 'd', 'd'],
                          'elapsed_time': [40000, 50000, 60000, 90000],
                          'running_time': [30000, 20000, 30000, 15000],
                          'num_cores': [7, 8, 9, 4]})
    print(lasts)
       elapsed_time  num_cores  running_time user
    0         40000          7         30000    a
    1         50000          8         20000    s
    2         60000          9         30000    d
    3         90000          4         15000    d
    # the approach from the question, for comparison
    by_user = lasts.groupby('user')
    elapsed_days = by_user.apply(lambda x: (x.elapsed_time * x.num_cores).sum() / 86400)
    print(elapsed_days)
    running_days = by_user.apply(lambda x: (x.running_time * x.num_cores).sum() / 86400)
    user_df = elapsed_days.to_frame('elapsed_days').join(running_days.to_frame('running_days'))
    print(user_df)
          elapsed_days  running_days
    user
    a         3.240741      2.430556
    d        10.416667      3.819444
    s         4.629630      1.851852
    # multiply both time columns by num_cores row-wise, convert seconds to days,
    # then sum per user (the index level)
    lasts = lasts.set_index('user')
    print(lasts[['elapsed_time', 'running_time']].mul(lasts['num_cores'], axis=0)
                                                 .div(86400)
                                                 .groupby(level=0)
                                                 .sum())
          elapsed_time  running_time
    user
    a         3.240741      2.430556
    d        10.416667      3.819444
    s         4.629630      1.851852
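Note that the result above keeps the original column names (elapsed_time, running_time). If you want the columns to be called elapsed_days and running_days as in the question, one possible variant of the same idea is to create the per-row values with assign first and then sum them per user. This is only a sketch on the same sample data; SECONDS_PER_DAY is a name I introduce here for readability.

```python
import pandas as pd

# Same sample data as above.
lasts = pd.DataFrame({
    'user': ['a', 's', 'd', 'd'],
    'elapsed_time': [40000, 50000, 60000, 90000],
    'running_time': [30000, 20000, 30000, 15000],
    'num_cores': [7, 8, 9, 4],
})

SECONDS_PER_DAY = 86400  # helper constant, not part of the original code

user_df = (
    lasts.assign(
        # per-row core-days; vectorised, no apply or lambda needed
        elapsed_days=lasts['elapsed_time'] * lasts['num_cores'] / SECONDS_PER_DAY,
        running_days=lasts['running_time'] * lasts['num_cores'] / SECONDS_PER_DAY,
    )
    .groupby('user')[['elapsed_days', 'running_days']]
    .sum()
)
print(user_df)
```

Because the arithmetic happens on whole columns before grouping, this avoids calling a Python function once per group.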