loops - R take ten unique samples and break into training/test sets?

February 15, 2012

So my task is to break a data frame of 506 observations into ten different samples of training and test sets (with replacement). I'm doing this so I can put them through a model and see the average MSE on the ten samples. So far, I've got the following idiotically complicated loop:

temp_train <- setNames(lapply(1:10, function(x) {
  x <- homeprices[sample(1:nrow(homeprices), .8*n, replace = FALSE), ]
  x
}), paste0("tr_sample.", 1:10))
for (i in 1:length(temp_train)) {
  assign(paste0("df_train_", i), as.data.frame(temp_train[i]))
  name <- assign(paste('df_train_', i, sep=''), x[i])
  temp_test <- setNames(homeprices[-name], paste0("te_sample.", 1:10))
  alpha <- assign(paste0("df_test_", i), as.data.frame(temp_test[i]))
}

This loop produces df_test_2, a data frame of 506 observations of 1 variable. It should be a data frame of 102 observations of 13 variables, namely the 102 observations not in df_train_2. My question is therefore: what's a better way that works? I'd prefer not to install packages if possible, since I want to get a grasp of base R.

A common (and efficient) strategy for handling this type of task in base R is not to create each individual data frame, but to create a set of indices that define the partition.

For example,

x <- replicate(n = 10,expr = {sample(506,404)})

creates a matrix in which each of the ten columns is filled with the row indices of a random selection of 404 rows (80% or so of 506). You'd then loop through the model fitting and use the columns of x to select the training subset of the data to pass to the model. Negative indexing with the same indices will yield the corresponding 20% for testing.

This way you don't have tons of copies of data frames lying about.
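As a rough sketch of how that loop might look in base R: the question doesn't show the data or the model, so the simulated `homeprices` stand-in, the `price` response column, and the use of `lm()` below are all assumptions made just to get something runnable. Only the index-matrix idea comes from the answer itself.

```r
# Simulated stand-in for the asker's data (an assumption): 506 rows, 13 columns.
set.seed(1)
n <- 506
homeprices <- as.data.frame(matrix(rnorm(n * 12), nrow = n))  # columns V1..V12
homeprices$price <- 3 + 2 * homeprices$V1 - homeprices$V2 + rnorm(n)  # hypothetical response

# Each column of idx holds the row indices of one training sample (80% of the rows).
n_train <- floor(0.8 * n)  # 404
idx <- replicate(n = 10, expr = sample(n, n_train))

# Fit on the training rows, score on the held-out rows: one test MSE per sample.
mse <- sapply(seq_len(ncol(idx)), function(j) {
  train <- homeprices[idx[, j], ]
  test  <- homeprices[-idx[, j], ]  # negative indexing gives the other 102 rows
  fit   <- lm(price ~ ., data = train)  # lm() is a placeholder for the actual model
  mean((test$price - predict(fit, newdata = test))^2)
})

mean(mse)  # average test MSE across the ten samples
```

Nothing here is stored as a separate df_train_i or df_test_i object; each subset only exists for the duration of one iteration, which is the point of working with indices.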
