Python 3.x - Better way to replace values in DataFrame from large dictionary
I have written code that replaces values in a DataFrame with values from another frame using a dictionary, and it works. However, I am using it on large files, where the dictionary can get very long: a few thousand pairs. When I use this code it runs very slowly, and it has also run out of memory on a few occasions.
I am fairly convinced that my method of doing this is far from optimal and that there must be faster ways to do it. I have created a simple example that does what I want, but it is slow for large amounts of data. I hope someone has a simpler way to do this.
    import pandas as pd

    # Frame with the data where I want to replace the 'id' with the name from df2
    df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 3, 5, 9],
                        'values': [12, 32, 42, 51, 23, 14, 111, 134]})

    # Frame containing the names linked to the ids
    df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                        'name': ['id1', 'id2', 'id3', 'id4', 'id5',
                                 'id6', 'id7', 'id8', 'id9', 'id10']})

    # My current "slow" way of doing this.
    # It starts by creating a dictionary from df2
    # (dictionaries are needed for the domain and banners tables to link ids)
    df2_dict = dict(zip(df2['id'], df2['name']))

    # And then it uses the dict to replace the ids with names in df1
    df1.replace({'id': df2_dict}, inplace=True)
I think you can use map with a Series converted by to_dict. You get NaN if a value does not exist in df2:
    df1['id'] = df1.id.map(df2.set_index('id')['name'].to_dict())
    print (df1)
        id  values
    0  id1      12
    1  id2      32
    2  id3      42
    3  id4      51
    4  id5      23
    5  id3      14
    6  id5     111
    7  id9     134
Or use replace. If a value does not exist in df2, the original value from df1 is kept:
    df1['id'] = df1.id.replace(df2.set_index('id')['name'])
    print (df1)
        id  values
    0  id1      12
    1  id2      32
    2  id3      42
    3  id4      51
    4  id5      23
    5  id3      14
    6  id5     111
    7  id9     134
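To see how map and replace behave on something closer to the question's "few thousand pairs", a rough timing sketch might look like the following. The frame names (`lookup`, `data`), the sizes, and the random data are assumptions made up for illustration; actual timings depend on the pandas version and the machine, so no numbers are claimed here.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical sizes, chosen only for illustration.
n_ids = 1_000
n_rows = 10_000

rng = np.random.default_rng(0)

# Synthetic lookup table (id -> name) and data frame, shaped like df2 and df1.
lookup = pd.DataFrame({'id': np.arange(n_ids),
                       'name': ['id%d' % i for i in range(n_ids)]})
data = pd.DataFrame({'id': rng.integers(0, n_ids, size=n_rows),
                     'values': rng.integers(0, 1_000, size=n_rows)})

names = lookup.set_index('id')['name'].to_dict()

# Time the map approach.
start = time.perf_counter()
mapped = data['id'].map(names)
map_seconds = time.perf_counter() - start

# Time the replace approach with the same dictionary.
start = time.perf_counter()
replaced = data['id'].replace(names)
replace_seconds = time.perf_counter() - start

print('map:     %.4f s' % map_seconds)
print('replace: %.4f s' % replace_seconds)

# Every id exists in the lookup here, so both results should agree.
same = mapped.tolist() == replaced.tolist()
print('same result:', same)
```

Because every id in this synthetic data has a name, both approaches give the same result. They differ only for ids that are missing from the lookup.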
Sample:
    import pandas as pd

    # Frame with the data where the 'id' should be replaced with the name from df2
    df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 3, 5, 9],
                        'values': [12, 32, 42, 51, 23, 14, 111, 134]})
    print (df1)

    # Frame containing the names linked to the ids (id 5 is missing on purpose)
    df2 = pd.DataFrame({'id': [1, 2, 3, 4, 6, 7, 8, 9, 10],
                        'name': ['id1', 'id2', 'id3', 'id4', 'id6',
                                 'id7', 'id8', 'id9', 'id10']})
    print (df2)

    df1['new_map'] = df1.id.map(df2.set_index('id')['name'].to_dict())
    df1['new_replace'] = df1.id.replace(df2.set_index('id')['name'])
    print (df1)
       id  values new_map new_replace
    0   1      12     id1         id1
    1   2      32     id2         id2
    2   3      42     id3         id3
    3   4      51     id4         id4
    4   5      23     NaN           5
    5   3      14     id3         id3
    6   5     111     NaN           5
    7   9     134     id9         id9
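If you want the behaviour of replace (missing ids keep their original value) while still going through map, one possible sketch is to map with a function that falls back to the input value. This is not part of the original answer: the `lookup` and `new_id` names are made up for illustration, and `dict.get` with a default is plain Python rather than a pandas feature.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 3, 5, 9],
                    'values': [12, 32, 42, 51, 23, 14, 111, 134]})

# id 5 is deliberately missing from the lookup frame, as in the sample above.
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 6, 7, 8, 9, 10],
                    'name': ['id1', 'id2', 'id3', 'id4', 'id6',
                             'id7', 'id8', 'id9', 'id10']})

# Build the dictionary once (hypothetical name: lookup).
lookup = df2.set_index('id')['name'].to_dict()

# dict.get(key, default) returns the name when the id is known and the
# original id otherwise, so no NaN is introduced for missing ids.
df1['new_id'] = df1['id'].map(lambda value: lookup.get(value, value))
print(df1)
```

The resulting column mixes strings and integers (the unmatched 5 stays as it is), so it has object dtype. Whether that is acceptable depends on what you do with the column afterwards.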