python - pandas: apply multiple functions of multiple columns to a groupby object


I want to apply multiple functions of multiple columns to a groupby object, with the results collected in a new pandas.DataFrame.

I know how to do it in separate steps:

    by_user = lasts.groupby('user')
    elapsed_days = by_user.apply(lambda x: (x.elapsed_time * x.num_cores).sum() / 86400)
    running_days = by_user.apply(lambda x: (x.running_time * x.num_cores).sum() / 86400)
    user_df = elapsed_days.to_frame('elapsed_days').join(running_days.to_frame('running_days'))

which results in user_df being a DataFrame indexed by user, with the columns elapsed_days and running_days.

However, I suspect there is a better way, like:

    by_user.agg({'elapsed_days': lambda x: (x.elapsed_time * x.num_cores).sum() / 86400,
                 'running_days': lambda x: (x.running_time * x.num_cores).sum() / 86400})

However, that doesn't work because, AFAIK, agg() passes each function a single column as a pandas.Series, so the lambdas cannot see the other columns.
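To illustrate the difference: apply() hands the function the whole group as a DataFrame, so one function can read several columns and return several results at once. The following is only a sketch of that idea, not a claim about the best approach. The helper name core_days and the constant SECONDS_PER_DAY are made up for illustration, and the sample values are borrowed from the answer's example data.

```python
import pandas as pd

# Sample data (same values as the answer's example).
lasts = pd.DataFrame({
    'user': ['a', 's', 'd', 'd'],
    'elapsed_time': [40000, 50000, 60000, 90000],
    'running_time': [30000, 20000, 30000, 15000],
    'num_cores': [7, 8, 9, 4],
})

SECONDS_PER_DAY = 86400  # hypothetical name for the 86400 used in the question


def core_days(group):
    # `group` is the sub-DataFrame for one user, so all selected columns are visible.
    # Returning a Series makes apply() build one output column per Series label.
    return pd.Series({
        'elapsed_days': (group['elapsed_time'] * group['num_cores']).sum() / SECONDS_PER_DAY,
        'running_days': (group['running_time'] * group['num_cores']).sum() / SECONDS_PER_DAY,
    })


# Select the value columns explicitly so the grouping column is not passed to core_days.
cols = ['elapsed_time', 'running_time', 'num_cores']
user_df = lasts.groupby('user')[cols].apply(core_days)
print(user_df)
```

This gives both columns in a single pass over the groups, although it still calls a Python function once per group, which can be slow when there are many users.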

I did find this question and its answer, but the solutions look rather ugly to me, and considering that the answer is over 4 years old, there might be a better way now.

I think you can avoid agg or apply entirely: first multiply with mul, then divide with div, and finally group by the index and aggregate with sum:

    import pandas as pd

    lasts = pd.DataFrame({'user': ['a', 's', 'd', 'd'],
                          'elapsed_time': [40000, 50000, 60000, 90000],
                          'running_time': [30000, 20000, 30000, 15000],
                          'num_cores': [7, 8, 9, 4]})

    print (lasts)
       elapsed_time  num_cores  running_time user
    0         40000          7         30000    a
    1         50000          8         20000    s
    2         60000          9         30000    d
    3         90000          4         15000    d

    by_user = lasts.groupby('user')
    elapsed_days = by_user.apply(lambda x: (x.elapsed_time * x.num_cores).sum() / 86400)
    print (elapsed_days)
    running_days = by_user.apply(lambda x: (x.running_time * x.num_cores).sum() / 86400)
    user_df = elapsed_days.to_frame('elapsed_days').join(running_days.to_frame('running_days'))
    print (user_df)
          elapsed_days  running_days
    user
    a         3.240741      2.430556
    d        10.416667      3.819444
    s         4.629630      1.851852

    lasts = lasts.set_index('user')
    print (lasts[['elapsed_time', 'running_time']].mul(lasts['num_cores'], axis=0)
                                                  .div(86400)
                                                  .groupby(level=0)
                                                  .sum())
          elapsed_time  running_time
    user
    a         3.240741      2.430556
    d        10.416667      3.819444
    s         4.629630      1.851852
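Note that the vectorized version keeps the original column names (elapsed_time, running_time) rather than the elapsed_days and running_days names the question asked for. One possible variation, sketched here under the assumption that the user column is still a regular column (i.e. before set_index), is to create the derived columns with assign() and then sum them per user. The constant name SECONDS_PER_DAY is made up for illustration.

```python
import pandas as pd

# Sample data (same values as above), with 'user' as a regular column.
lasts = pd.DataFrame({
    'user': ['a', 's', 'd', 'd'],
    'elapsed_time': [40000, 50000, 60000, 90000],
    'running_time': [30000, 20000, 30000, 15000],
    'num_cores': [7, 8, 9, 4],
})

SECONDS_PER_DAY = 86400  # hypothetical name for the 86400 used above

user_df = (
    lasts.assign(
        # Row-wise core-days, computed with vectorized column arithmetic.
        elapsed_days=lasts['elapsed_time'] * lasts['num_cores'] / SECONDS_PER_DAY,
        running_days=lasts['running_time'] * lasts['num_cores'] / SECONDS_PER_DAY,
    )
    .groupby('user')[['elapsed_days', 'running_days']]
    .sum()  # total core-days per user
)
print(user_df)
```

Because every step is a built-in column operation, no Python function is called per group, and the result already carries the desired column names.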
