R mlr - Wrapper feature selection + hyperparameter tuning without nested-nested cross validation?
In mlr it is possible to do filter feature selection together with hyperparameter tuning using nested cross-validation, e.g. with the following code.
lrn = makeFilterWrapper(learner = "regr.kknn", fw.method = "chi.squared")
ps = makeParamSet(makeDiscreteParam("fw.abs", values = 10:13),
                  makeDiscreteParam("k", values = c(2, 3, 4)))
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("CV", iters = 2)
outer = makeResampleDesc("Subsample", iters = 3)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl,
                      show.info = FALSE)
res = resample(lrn, bh.task, outer, mse, extract = getTuneResult)
But as far as I know, it is not possible to do something like this with wrapper feature selection, e.g.:
lrn = makeFeatSelWrapper(learner = "regr.kknn", ww.method = "random") # imaginary code
ps = makeParamSet(makeDiscreteParam("maxit", values = 15),
                  makeDiscreteParam("k", values = c(2, 3, 4))) # imaginary code, no method parameter & no resampling provided
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("CV", iters = 2)
outer = makeResampleDesc("Subsample", iters = 3)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl,
                      show.info = FALSE)
res = resample(lrn, bh.task, outer, mse, extract = getTuneResult)
Is there a way to achieve this, especially in order to avoid nested-nested cross-validation? Is there a methodological reason why this would not be appropriate? Because actually, using filter feature selection with a tuning parameter (number of features) looks quite similar to the wrapper approach: in both cases there is an additional hyperparameter that is effectively a set of features, either derived from a filter (e.g. "chi.squared") plus a threshold (top 90%, 80%, 70%) or produced by a wrapper algorithm (random, GA, exhaustive, sequential), and the best set of features is picked based on inner-CV performance.
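To make the filter analogy concrete, here is a minimal sketch in which the fraction of retained features is tuned like any other hyperparameter via the fw.perc parameter of makeFilterWrapper; the threshold values are just illustrative:

library(mlr)
# Sketch: tune the fraction of retained features (fw.perc) alongside k.
lrn = makeFilterWrapper(learner = "regr.kknn", fw.method = "chi.squared")
ps = makeParamSet(makeDiscreteParam("fw.perc", values = c(0.7, 0.8, 0.9)),
                  makeDiscreteParam("k", values = c(2, 3, 4)))
inner = makeResampleDesc("CV", iters = 2)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps,
                      control = makeTuneControlGrid(), show.info = FALSE)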
I believe both approaches (nesting with additional parameters for the filtering vs. nested-nested resampling) are similar with regard to computational complexity, but you might not want to shrink your training dataset further with nested-nested CV, and that would be avoidable with the first approach.
Am I making a methodological error, or is this just a (probably not very popular) missing feature?
If I understood you correctly, you are asking how to tune a FeatSelWrapper? This is a bit complex, because feature selection (in mlr) depends on resampling, because it basically is tuning: you don't tune learner parameters, you tune the selection of features to optimize a performance measure. To calculate that measure you need resampling.
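As a minimal sketch of that dependency (learner and control settings are placeholders), a FeatSelWrapper always carries its own resampling description, which is used to score candidate feature sets:

library(mlr)
# Sketch: candidate feature sets are evaluated via this inner resampling.
inner = makeResampleDesc("CV", iters = 2)
ctrl = makeFeatSelControlRandom(maxit = 15)
lrn = makeFeatSelWrapper(learner = "regr.kknn", resampling = inner, control = ctrl)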
So what you propose is, in other words, to tune the "feature tuning" by choosing the best parameter for the feature tuning algorithm. This naturally brings another layer of nested resampling.
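Just to illustrate what that extra layer means in code, here is a sketch under the assumption that a TuneWrapper can be put around a FeatSelWrapper (parameter values are illustrative); evaluating such a construct with resample() already involves three resampling levels:

library(mlr)
# Sketch: three resampling levels once the construct is evaluated with resample().
innermost = makeResampleDesc("CV", iters = 2)    # evaluates candidate feature sets
inner = makeResampleDesc("CV", iters = 2)        # evaluates hyperparameter values
outer = makeResampleDesc("Subsample", iters = 3) # outer performance estimate
lrn = makeFeatSelWrapper("regr.kknn", resampling = innermost,
                         control = makeFeatSelControlRandom(maxit = 15))
lrn = makeTuneWrapper(lrn, resampling = inner,
                      par.set = makeParamSet(makeDiscreteParam("k", values = c(2, 3, 4))),
                      control = makeTuneControlGrid(), show.info = FALSE)
res = resample(lrn, bh.task, outer, mse)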
But it is debatable whether this is necessary, as the choice of the feature selection method usually depends on your available resources and other circumstances.
What you can do is benchmark different feature selection methods:
inner = makeResampleDesc("CV", iters = 2)
outer = makeResampleDesc("Subsample", iters = 3)
settings = list(random1 = makeFeatSelControlRandom(maxit = 15),
                random2 = makeFeatSelControlRandom(maxit = 20))
lrns = Map(function(x, xn) {
  lrn = makeFeatSelWrapper(learner = "regr.lm", control = x, resampling = inner)
  lrn$id = paste0(lrn$id, ".", xn)
  lrn
}, x = settings, xn = names(settings))
benchmark(lrns, bh.task, outer, list(mse, timeboth))
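As a possible follow-up (assuming the benchmark result is stored in a variable, here called bmr), the aggregated performances of the two settings can then be compared:

bmr = benchmark(lrns, bh.task, outer, list(mse, timeboth))
getBMRAggrPerformances(bmr, as.df = TRUE)  # compare mse and runtime per setting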