python - Add column with numbering of elements with respect to a groupby operation without loops
I managed to add a column to a pandas DataFrame that holds an internal numbering with respect to groups.
This is the input DataFrame:
    import pandas as pd

    df = pd.DataFrame({
        'name': ['name1', 'name2', 'name3', 'name4', 'name5', 'name6', 'name7', 'name8'],
        'group': ['groupb', 'groupb', 'groupb', 'groupa', 'groupa', 'groupa', 'groupc', 'groupc'],
        'revenue': [1, 2, 3, 4, 5, 6, 11, 22],
    })
It looks like this:
        group   name  revenue
    0  groupb  name1        1
    1  groupb  name2        2
    2  groupb  name3        3
    3  groupa  name4        4
    4  groupa  name5        5
    5  groupa  name6        6
    6  groupc  name7       11
    7  groupc  name8       22
I want this output:
        group   name  revenue  group_internal_id
    0  groupa  name4        4                  0
    1  groupa  name5        5                  1
    2  groupa  name6        6                  2
    3  groupb  name1        1                  0
    4  groupb  name2        2                  1
    5  groupb  name3        3                  2
    6  groupc  name7       11                  0
    7  groupc  name8       22                  1
I managed to get the output I wanted in the DataFrame outdf with the following code:
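    # Number the rows of one group as 0, 1, 2, ...
    numbering_function = lambda x: range(len(x.index))

    outdf = pd.DataFrame()
    for ik, idf in df.groupby('group'):
        tempdf = idf.copy()
        tempdf['group_internal_id'] = numbering_function(tempdf)
        # Note: DataFrame.append was removed in pandas 2.0; pd.concat is the replacement there
        outdf = outdf.append(tempdf, ignore_index=True)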
Then outdf looks as follows:
        group   name  revenue  group_internal_id
    0  groupa  name4        4                  0
    1  groupa  name5        5                  1
    2  groupa  name6        6                  2
    3  groupb  name1        1                  0
    4  groupb  name2        2                  1
    5  groupb  name3        3                  2
    6  groupc  name7       11                  0
    7  groupc  name8       22                  1
I would like to find a way to obtain the same output DataFrame without using a loop.
Thanks!
You need cumcount with sort_values:
    df['new'] = df.groupby('group').cumcount()
    df = df.sort_values('group')
    print(df)

        group   name  revenue  new
    3  groupa  name4        4    0
    4  groupa  name5        5    1
    5  groupa  name6        6    2
    0  groupb  name1        1    0
    1  groupb  name2        2    1
    2  groupb  name3        3    2
    6  groupc  name7       11    0
    7  groupc  name8       22    1
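If you also want the fresh 0..7 index and the group_internal_id column name from the question, something along these lines should work. This is only a sketch: the kind='mergesort' argument and the reset_index call are my additions, not part of the answer above. A stable sort is assumed here so that rows keep their original relative order inside each group.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['name1', 'name2', 'name3', 'name4', 'name5', 'name6', 'name7', 'name8'],
    'group': ['groupb', 'groupb', 'groupb', 'groupa', 'groupa', 'groupa', 'groupc', 'groupc'],
    'revenue': [1, 2, 3, 4, 5, 6, 11, 22],
})

# Sort by group with a stable algorithm (assumption: original order
# within each group should be preserved), then rebuild the index as 0..n-1.
outdf = df.sort_values('group', kind='mergesort').reset_index(drop=True)

# cumcount numbers the rows of each group starting at 0, with no explicit loop.
outdf['group_internal_id'] = outdf.groupby('group').cumcount()

print(outdf)
```

Because cumcount counts rows in the order they appear within each group, it does not matter whether you sort before or after calling it; sorting first just keeps the steps in the same order as the desired output.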