loops - R: how to take ten unique samples and break them into training/test sets?
So my task is to break a dataframe of 506 observations into ten different samples of training and test sets (with replacement). I'm doing this so I can put them through a model and see the average MSE on the ten samples. So far, I've got the following idiotically complicated loop:
temp_train <- setNames(lapply(1:10, function(x) {
    x <- homeprices[sample(1:nrow(homeprices), .8*n, replace = FALSE), ]
    x
}), paste0("tr_sample.", 1:10))
for (i in 1:length(temp_train)) {
    assign(paste0("df_train_", i), as.data.frame(temp_train[i]))
    name <- assign(paste('df_train_', i, sep=''), x[i])
    temp_test <- setNames(homeprices[-name], paste0("te_sample.", 1:10))
    alpha <- assign(paste0("df_test_", i), as.data.frame(temp_test[i]))
}
This loop produces df_test_2, a data frame of 506 observations of 1 variable. It should be a data frame of 102 observations of 13 variables, namely the 102 observations not in df_train_2. My question, therefore, is: what's a better way that works? I'd prefer not to install packages if possible, since I want a grasp of base R.
A common (and efficient) strategy for handling this type of task in base R is not to create each individual data frame, but to create a set of indices that define the partition.
For example,
x <- replicate(n = 10, expr = {sample(506, 404)})
creates a matrix whose ten columns are each filled with the row indices of a random selection of 404 rows (80%, or so, of 506). You'd loop through your model fitting and use the columns of x to select the training subset of the data to pass to the model. Negative indexing with the same indices yields the corresponding 20% for testing.
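As a minimal sketch of that indexing for a single sample, the snippet below uses a hypothetical stand-in for `homeprices` (506 rows, 13 columns of simulated numbers), since the real data isn't shown. Within one column of x the 404 indices are drawn without replacement, so each training set has no duplicate rows; the ten columns are drawn independently, so the same row can appear in several training sets.

```r
# Hypothetical stand-in for `homeprices`: 506 rows, 13 columns of simulated data.
set.seed(1)
homeprices <- as.data.frame(matrix(rnorm(506 * 13), nrow = 506, ncol = 13))

# Each column of x holds 404 distinct row indices; columns are drawn independently.
x <- replicate(n = 10, expr = {sample(506, 404)})

# Split for the second sample: positive indices select the training rows,
# negative indices select every remaining row for testing.
train_2 <- homeprices[x[, 2], ]
test_2  <- homeprices[-x[, 2], ]

dim(train_2)  # 404 13
dim(test_2)   # 102 13
```

Because `test_2` is built from the complement of the same index column, the two subsets never share a row.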
This way you don't have tons of copies of data frames lying about.
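Putting it together, here is a sketch of the loop that computes the average test MSE over the ten samples. The data, the response column name `price`, and the choice of a linear model via `lm` are all assumptions for illustration; substitute your own data frame, response, and model.

```r
# Hypothetical data: 12 simulated predictors plus a response column `price`
# (column names and the linear model below are illustrative assumptions).
set.seed(42)
homeprices <- as.data.frame(matrix(rnorm(506 * 12), nrow = 506, ncol = 12))
homeprices$price <- rowSums(homeprices) + rnorm(506)

# One column of training-row indices per sample (404 of 506 rows each).
x <- replicate(n = 10, expr = {sample(nrow(homeprices), 404)})

# Fit on each training subset and record the MSE on the held-out rows.
mse <- numeric(ncol(x))
for (j in seq_len(ncol(x))) {
  train  <- homeprices[x[, j], ]
  test   <- homeprices[-x[, j], ]
  fit    <- lm(price ~ ., data = train)
  pred   <- predict(fit, newdata = test)
  mse[j] <- mean((test$price - pred)^2)
}

mean(mse)  # average test MSE over the ten samples
```

Only the index matrix x and one train/test pair at a time exist in memory, rather than twenty separately named data frames created with `assign`.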