Most efficient way to replace NAs in a data frame based on a subset of other row factors (using median as an estimate) in R


I want to estimate values of a numeric variable in a data frame based on the median of the same variable given other factors, and then replace the NAs in that numeric variable with these estimates.

I have a data frame like this:

fac1   fac2   var1
a      a      20
a      b      30
b      a      5
b      b      10
.      .      .

I have used the aggregate function to find these medians for each combination of factors:

a a = 22
a b = 28
b a = 12
b b = 8
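A minimal sketch of what such an aggregate call might look like is given here. The data frame and its values are made up for illustration, since the original data was not posted; only the column names fac1, fac2 and var1 come from the question.

```r
# Hypothetical sample data; the values are illustrative, not from the question
df <- data.frame(
  fac1 = c("a", "a", "a", "b", "b", "b"),
  fac2 = c("a", "a", "b", "a", "a", "b"),
  var1 = c(20, 24, 30, 5, NA, 10)
)

# Median of var1 for each fac1/fac2 combination.
# The formula interface drops rows with NA by default (na.action = na.omit).
medians <- aggregate(var1 ~ fac1 + fac2, data = df, FUN = median)
print(medians)
```

This gives one row per combination of factors, which would then still have to be matched back to the rows with missing values.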

So the NAs in var1 would be replaced with the corresponding median based on the combination of factors.

I understand this may be done by replacing the missing values in each subset of the data individually, but that would become tedious given more than 2 factors. I was wondering if there are more efficient ways to get this result.

You haven't provided sample data, but based on your question, I think this should work.

As @Roland mentioned, there is no need to calculate the median separately.

Assuming your data frame is df: for every group (here fac1 and fac2), we calculate the median after removing the NA values. We then select the indices that have NA values and replace them with their group's median value.

# FUN must be passed by name; the trailing index keeps only the medians for the NA rows
df$var1[is.na(df$var1)] <- ave(df$var1, df$fac1, df$fac2, FUN = function(x) median(x, na.rm = TRUE))[is.na(df$var1)]
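To make the idea concrete, here is a small self-contained sketch of the same approach, split into steps. The sample values and the helper names missing and group_median are assumptions made for illustration; only fac1, fac2 and var1 come from the question.

```r
# Hypothetical sample data; the values are illustrative, not from the question
df <- data.frame(
  fac1 = c("a", "a", "a", "a", "b", "b", "b"),
  fac2 = c("a", "a", "a", "b", "a", "a", "b"),
  var1 = c(20, 24, NA, 30, 5, NA, 10)
)

# Remember which rows are missing before anything is overwritten
missing <- is.na(df$var1)

# ave() returns a vector as long as var1, holding each row's group median
group_median <- ave(df$var1, df$fac1, df$fac2,
                    FUN = function(x) median(x, na.rm = TRUE))

# Fill only the missing positions with their group's median
df$var1[missing] <- group_median[missing]
print(df)
```

Note that if every value in a group is NA, the median for that group is itself NA, so those rows would stay missing.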

Update

At the OP's request, I am adding some information on the ave function.

The first parameter in ave is the one on which you want to perform the operation. Here the first parameter is var1, for which we want to find the median. The parameters that follow are the grouping variables, and there can be any number of them. Here the grouping variables are fac1 and fac2. Then comes the function that we want to apply to the first parameter (var1) for every group (fac1 and fac2) defined by the grouping variables. So here, for every unique group, we are finding the median of that group.
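A tiny illustration of that behaviour, using made-up vectors x, g1 and g2 (these names and values are hypothetical, not from the question):

```r
# Hypothetical values for illustration
x  <- c(1, 3, 10, 20, 30, 7)
g1 <- c("p", "p", "q", "q", "q", "q")
g2 <- c("u", "u", "u", "u", "v", "v")

# FUN has to be passed by name: it comes after `...` in ave()'s signature,
# so an unnamed function would be treated as one more grouping variable.
res <- ave(x, g1, g2, FUN = median)
print(res)
# → 2.0 2.0 15.0 15.0 18.5 18.5
```

The result has the same length as x, with each element replaced by the median of its group, which is why it can be indexed with is.na(...) directly. If FUN is left out, ave uses mean.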

