python 3.x - Better way to replace values in DataFrame from large dictionary


I have written code that replaces values in a DataFrame with values from another frame using a dictionary, and it is working, but I am using it on some large files where the dictionary can get very long: a few thousand pairs. When I use this code it runs very slowly, and it has also run out of memory on a few occasions.

I am fairly convinced that my method of doing this is far from optimal, and that there must be faster ways to do it. I have created a simple example that does what I want, but it is slow for large amounts of data. I hope someone has a simpler way to do this.

    import pandas as pd

    # frame with the data where I want to replace 'id' with the name from df2
    df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 3, 5, 9],
                        'values': [12, 32, 42, 51, 23, 14, 111, 134]})

    # frame containing the names linked to the ids
    df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                        'name': ['id1', 'id2', 'id3', 'id4', 'id5',
                                 'id6', 'id7', 'id8', 'id9', 'id10']})

    # my current "slow" way of doing this
    # it starts by creating a dictionary from df2
    # (I need to create such dictionaries from the domain and banners tables to link the ids)
    df2_dict = dict(zip(df2['id'], df2['name']))

    # and then uses the dict to replace the ids with names in df1
    df1.replace({'id': df2_dict}, inplace=True)

I think you can use map with a Series converted by to_dict. You get NaN if a value does not exist in df2:

    df1['id'] = df1.id.map(df2.set_index('id')['name'].to_dict())
    print (df1)
        id  values
    0  id1      12
    1  id2      32
    2  id3      42
    3  id4      51
    4  id5      23
    5  id3      14
    6  id5     111
    7  id9     134

Or use replace; if a value does not exist in df2, the original value from df1 is kept:

    df1['id'] = df1.id.replace(df2.set_index('id')['name'])
    print (df1)
        id  values
    0  id1      12
    1  id2      32
    2  id3      42
    3  id4      51
    4  id5      23
    5  id3      14
    6  id5     111
    7  id9     134
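To see how map and replace compare on larger input, a rough timing sketch like the following can help. The sizes (2,000 pairs, 20,000 rows) and the names mapping and ids are made up for illustration and are not from the question; actual timings depend on the pandas version and the machine, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data; the sizes are assumptions chosen so the script finishes quickly.
n_keys, n_rows = 2_000, 20_000
mapping = {i: f"id{i}" for i in range(n_keys)}
rng = np.random.default_rng(0)
ids = pd.Series(rng.integers(0, n_keys, size=n_rows), name="id")

# Time Series.replace with the dictionary.
start = time.perf_counter()
replaced = ids.replace(mapping)
replace_seconds = time.perf_counter() - start

# Time Series.map with the same dictionary.
start = time.perf_counter()
mapped = ids.map(mapping)
map_seconds = time.perf_counter() - start

print(f"replace: {replace_seconds:.4f}s, map: {map_seconds:.4f}s")
```

Every id in this synthetic Series has an entry in the dictionary, so both calls produce the same values and only the run time differs.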

Sample:

    # frame with the data where I want to replace 'id' with the name from df2
    df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 3, 5, 9],
                        'values': [12, 32, 42, 51, 23, 14, 111, 134]})
    print (df1)

    # frame containing the names linked to the ids (id 5 is missing here)
    df2 = pd.DataFrame({'id': [1, 2, 3, 4, 6, 7, 8, 9, 10],
                        'name': ['id1', 'id2', 'id3', 'id4', 'id6',
                                 'id7', 'id8', 'id9', 'id10']})
    print (df2)

    df1['new_map'] = df1.id.map(df2.set_index('id')['name'].to_dict())
    df1['new_replace'] = df1.id.replace(df2.set_index('id')['name'])
    print (df1)
       id  values new_map new_replace
    0   1      12     id1         id1
    1   2      32     id2         id2
    2   3      42     id3         id3
    3   4      51     id4         id4
    4   5      23     NaN           5
    5   3      14     id3         id3
    6   5     111     NaN           5
    7   9     134     id9         id9
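If unmatched ids should keep their original value (the replace behaviour) while still going through map, one option is to look each value up with dict.get and fall back to the value itself. This is only a sketch under that assumption; the column name new_id and the variable name mapping are made up for the example.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 3, 5, 9],
                    'values': [12, 32, 42, 51, 23, 14, 111, 134]})
# id 5 has no name in df2, as in the sample above
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 6, 7, 8, 9, 10],
                    'name': ['id1', 'id2', 'id3', 'id4', 'id6',
                             'id7', 'id8', 'id9', 'id10']})

# Plain dict of id -> name built from df2
mapping = df2.set_index('id')['name'].to_dict()

# Look up each id; when it is missing from the dict, return the id unchanged
df1['new_id'] = df1['id'].map(lambda key: mapping.get(key, key))
print(df1)
```

Passing a function to map is evaluated once per row, so it may be slower than passing the dictionary directly; it is worth timing on real data before relying on it.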
