Most efficient way to replace NAs in a data frame based on a subset of other row factors (using median as an estimate) in R
I want to estimate values of a numeric variable in a data frame based on the median of the same variable given other factors, and then replace the NAs in that numeric variable with these estimates.
I have a data frame like this:
fac1 fac2 var1 20 b 30 b 5 b b 10 . . .
I have used the aggregate function to find these medians for each combination of factors:
a = 22 b = 28 b = 12 b b = 8
So the NAs in var1 should be replaced by the corresponding median based on the combination of factors.
I understand this may be done by replacing the missing values for each subset of the data individually, but that would become tedious given more than 2 factors. I was wondering if there is a more efficient way to get this result.
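For reference, the aggregate step described in the question might look roughly like the sketch below. The sample data frame is hypothetical (the factor levels "A", "B", "X", "Y" and the values are made up for illustration); only the column names fac1, fac2 and var1 come from the question.

```r
# Hypothetical sample data; levels and values are invented for illustration.
df <- data.frame(
  fac1 = c("A", "A", "A", "B", "B", "B"),
  fac2 = c("X", "X", "X", "Y", "Y", "Y"),
  var1 = c(20, NA, 30, 5, 10, NA)
)

# Median of var1 for each fac1/fac2 combination.
# The formula method drops rows with NA by default (na.action = na.omit),
# so the medians are computed from the observed values only.
medians <- aggregate(var1 ~ fac1 + fac2, data = df, FUN = median)
print(medians)
```

This gives one row per combination of factors, which then still has to be matched back to the rows with NAs.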
You haven't provided sample data, but based on your question, I think this should work.
As @roland mentioned, there is no need to calculate the median separately.
Assuming your dataframe is df: for every group (here fac1, fac2) we calculate the median, removing NA values. We then select the indices which have NA values and replace them with the group's median value.
df$var1[is.na(df$var1)] <- ave(df$var1, df$fac1, df$fac2, FUN = function(x) median(x, na.rm = TRUE))[is.na(df$var1)]
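Since no sample data was posted, here is a small worked sketch of the same idea, split into steps. The data frame is hypothetical (factor levels and values are made up), and the helper names missing and group_median are only for illustration.

```r
# Hypothetical sample data; levels and values are invented for illustration.
df <- data.frame(
  fac1 = c("A", "A", "A", "B", "B", "B"),
  fac2 = c("X", "X", "X", "Y", "Y", "Y"),
  var1 = c(20, NA, 30, 5, 10, NA)
)

# Rows where var1 is missing.
missing <- is.na(df$var1)

# For every row, the median of var1 within that row's fac1/fac2 group,
# computed with the NAs removed.
group_median <- ave(df$var1, df$fac1, df$fac2,
                    FUN = function(x) median(x, na.rm = TRUE))

# Overwrite only the missing entries with their group's median.
df$var1[missing] <- group_median[missing]
print(df)
```

Adding more factors only means passing more grouping vectors to ave, so nothing has to be done subset by subset.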
Update
On request of the OP, adding some information on the ave function.
The first parameter in ave is the one on which we want to do the operation. Here the first parameter is var1, for which we want to find the median. All the other parameters following it are grouping variables; there can be any number of them. Here the grouping variables we have are fac1 and fac2. Then comes the function which we want to apply on our first parameter (var1) for every group (fac1, fac2) that we have defined in the grouping variables. So here, for every unique group, we are finding the median for that group.
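To make that concrete, here is a minimal sketch with a single grouping variable. The vectors x and g are made-up examples. Note that the function has to be passed by name as FUN, because the unnamed arguments after the first one are all treated as grouping variables.

```r
# Made-up example: five values in two groups, one value missing.
x <- c(1, 3, 10, NA, 30)
g <- c("a", "a", "b", "b", "b")

# ave returns a vector as long as x, where each position holds the
# result of FUN for the group that position belongs to.
per_row <- ave(x, g, FUN = function(v) median(v, na.rm = TRUE))
print(per_row)
```

Because the result lines up with the input row by row, it can be indexed with is.na(x) to pick out only the values needed to fill the gaps.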