python - pandas, apply multiple functions of multiple columns to groupby object -
i want apply multiple functions of multiple columns groupby object results in new pandas.dataframe.
i know how in seperate steps:
by_user = lasts.groupby('user') elapsed_days = by_user.apply(lambda x: (x.elapsed_time * x.num_cores).sum() / 86400) running_days = by_user.apply(lambda x: (x.running_time * x.num_cores).sum() / 86400) user_df = elapsed_days.to_frame('elapsed_days').join(running_days.to_frame('running_days')) which results in user_df being: 
however suspect there better way, like:
by_user.agg({'elapsed_days': lambda x: (x.elapsed_time * x.num_cores).sum() / 86400, 'running_days': lambda x: (x.running_time * x.num_cores).sum() / 86400}) however, doesn't work, because afaik agg() works on pandas.series.
i did find this question , answer, solutions rather ugly me, , considering answer 4 years old, there might better way now.
i think can avoid agg or apply , rather first multiple mul, div , last use groupby index aggregating sum:
lasts = pd.dataframe({'user':['a','s','d','d'], 'elapsed_time':[40000,50000,60000,90000], 'running_time':[30000,20000,30000,15000], 'num_cores':[7,8,9,4]}) print (lasts) elapsed_time num_cores running_time user 0 40000 7 30000 1 50000 8 20000 s 2 60000 9 30000 d 3 90000 4 15000 d by_user = lasts.groupby('user') elapsed_days = by_user.apply(lambda x: (x.elapsed_time * x.num_cores).sum() / 86400) print (elapsed_days) running_days = by_user.apply(lambda x: (x.running_time * x.num_cores).sum() / 86400) user_df = elapsed_days.to_frame('elapsed_days').join(running_days.to_frame('running_days')) print (user_df) elapsed_days running_days user 3.240741 2.430556 d 10.416667 3.819444 s 4.629630 1.851852 lasts = lasts.set_index('user') print (lasts[['elapsed_time','running_time']].mul(lasts['num_cores'], axis=0) .div(86400) .groupby(level=0) .sum()) elapsed_time running_time user 3.240741 2.430556 d 10.416667 3.819444 s 4.629630 1.851852
Comments
Post a Comment