Most efficient way to replace NAs in a data frame based on a subset of other row factors (using median as an estimate) in R
I want to estimate the values of a numeric variable in a data frame based on the median of the same variable given other factors, and then replace the NAs in the numeric variable with these estimates.
I have a data frame like this:
fac1 fac2 var1
a    a    20
a    b    30
b    a    5
b    b    10
. . .
I have used the aggregate function to find the medians for each combination of factors:
a a = 22
a b = 28
b a = 12
b b = 8
So the NAs in var1 would be replaced by the corresponding median based on the combination of factors.
I understand this may be done by replacing the missing values for each subset of the data individually, but that would become tedious given more than 2 factors. I was wondering if there are more efficient ways to get this result.
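For reference, the aggregate step described above might look roughly like the sketch below. The toy data frame and the name medians are made up for illustration and are not the real data set.

```r
# Hypothetical toy data; the values are invented for illustration.
df <- data.frame(
  fac1 = c("a", "a", "a", "b", "b", "b"),
  fac2 = c("a", "a", "b", "a", "a", "a"),
  var1 = c(20, 24, 30, NA, 5, 19)
)

# Median of var1 for each fac1/fac2 combination.
# The formula interface drops rows with NA in var1 by default
# (na.action = na.omit), so the NA does not poison the group median.
medians <- aggregate(var1 ~ fac1 + fac2, data = df, FUN = median)
print(medians)
```

This gives one row per factor combination that actually occurs in the data, but the medians still have to be matched back to the rows with missing values, which is the tedious part.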
You haven't provided sample data, but based on the question, I think this should work.
As @Roland mentioned, there is no need to calculate the median separately.
Assuming your data frame is called df: for every group (here fac1, fac2), we calculate the median after removing the NA values. We then select the indices that have NA values and replace them with the group's median value.
    df$var1[is.na(df$var1)] <- ave(df$var1, df$fac1, df$fac2, FUN = function(x) median(x, na.rm = TRUE))[is.na(df$var1)]
Update
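At the request of the OP, I am adding some information about the ave function.
The first parameter in ave is the one on which you want to perform the operation. Here the first parameter is var1, for which we want to find the median. The parameters that follow are the grouping variables; there can be any number of them. Here the grouping variables are fac1 and fac2. Then comes the function that we want to apply to the first parameter (var1) for every group (fac1, fac2) defined by the grouping variables; it has to be passed by name as FUN, because everything unnamed after the first parameter is treated as a grouping variable. So here, for every unique group, we are finding the median of that group.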
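To make this concrete, here is a small sketch of what ave returns and how the result is used to fill the gaps. The toy data frame and the names group_median and missing are assumptions for illustration only; the values are chosen so that the group medians come out as 22 and 8.

```r
# Hypothetical toy data with two groups and one NA in each.
df <- data.frame(
  fac1 = c("a", "a", "a", "b", "b", "b"),
  fac2 = c("a", "a", "a", "b", "b", "b"),
  var1 = c(20, NA, 24, 10, 6, NA)
)

# ave() returns a vector as long as var1 in which every position
# holds the median of the group that row belongs to.
group_median <- ave(df$var1, df$fac1, df$fac2,
                    FUN = function(x) median(x, na.rm = TRUE))

# Overwrite only the missing entries with their group's median.
missing <- is.na(df$var1)
df$var1[missing] <- group_median[missing]
print(df)
```

Because ave returns one value per row rather than one per group, no merge or lookup step is needed. Note that a group consisting entirely of NAs has no median to borrow, so its values would stay NA.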