--- title: "Stability selection - Using stabs" author: "Benjamin Hofner" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Stability selection - Using stabs} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) required <- c("lars", "mboost") if (!all(sapply(required, function(pkg) requireNamespace(pkg, quietly = TRUE)))) knitr::opts_chunk$set(eval = FALSE) ``` **stabs** implements resampling procedures to assess the stability of selected variables with additional finite sample error control for high-dimensional variable selection procedures such as Lasso or boosting. Both, standard stability selection (Meinshausen & Bühlmann, 2010, [doi:10.1111/j.1467-9868.2010.00740.x](http://dx.doi.org/10.1111/j.1467-9868.2010.00740.x)) and complementarty pairs stability selection with improved error bounds (Shah & Samworth, 2013, [doi:10.1111/j.1467-9868.2011.01034.x](http://dx.doi.org/10.1111/j.1467-9868.2011.01034.x)) are implemented. The package can be combined with arbitrary user specified variable selection approaches. ## Installation - Current version (from CRAN): ```{r, eval = FALSE} install.packages("stabs") ``` - Latest development version from [GitHub](https://github.com/hofnerb/stabs): ```{r, eval = FALSE} library("devtools") install_github("hofnerb/stabs") ``` To be able to use the `install_github()` command, one needs to install **devtools** first: ```{r, eval = FALSE} install.packages("devtools") ``` ## Using stabs A simple example of how to use **stabs** with package **lars**: ```{r} library("stabs") library("lars") ## make data set available data("bodyfat", package = "TH.data") ## set seed set.seed(1234) ## lasso (stab.lasso <- stabsel(x = bodyfat[, -2], y = bodyfat[,2], fitfun = lars.lasso, cutoff = 0.75, PFER = 1)) ## stepwise selection (stab.stepwise <- stabsel(x = bodyfat[, -2], y = bodyfat[,2], fitfun = lars.stepwise, cutoff = 0.75, PFER = 1)) ``` Now plot the results ```{r plot1, fig.height=7, fig.width=14, out.width="90%"} ## plot results par(mfrow = c(1, 2)) plot(stab.lasso, main = "Lasso") plot(stab.stepwise, main = "Stepwise Selection") ``` We can see that stepwise selection seems to be quite unstable, even in this low dimensional example! ### User-specified variable selection approaches To use **stabs** with user specified functions, one can specify an own `fitfun`. These need to take arguments `x` (the predictors), `y` (the outcome) and `q` the number of selected variables as defined for stability selection. Additional arguments to the variable selection method can be handled by `...`. In the function `stabsel()` these can then be specified as a named list which is given to `args.fitfun`. The `fitfun` function then needs to return a named list with two elements `selected` and `path`: * `selected` is a vector that indicates which variable was selected. * `path` is a matrix that indicates which variable was selected in which step. Each row represents one variable, the columns represent the steps. The latter is optional and only needed to draw the complete selection paths. The following example shows how `lars.lasso` is implemented: ```{r} lars.lasso ``` To see more examples simply print, e.g., `lars.stepwise`, `glmnet.lasso`, or `glmnet.lasso_maxCoef`. Please contact me if you need help to integrate your method of choice. ### Using boosting with stability selection Instead of specifying a fitting function, one can also use `stabsel` directly on computed boosting models from [mboost](https://cran.r-project.org/package=mboost). ```{r} library("stabs") library("mboost") ### low-dimensional example mod <- glmboost(DEXfat ~ ., data = bodyfat) ## compute cutoff ahead of running stabsel to see if it is a sensible ## parameter choice. ## p = ncol(bodyfat) - 1 (= Outcome) + 1 ( = Intercept) stabsel_parameters(q = 3, PFER = 1, p = ncol(bodyfat) - 1 + 1, sampling.type = "MB") ## the same: stabsel(mod, q = 3, PFER = 1, sampling.type = "MB", eval = FALSE) ## now run stability selection (sbody <- stabsel(mod, q = 3, PFER = 1, sampling.type = "MB")) ``` Now let us plot the results, as paths and as maximum selection frequencies: ```{r plot2, fig.height=7, fig.width=14, out.width="90%"} opar <- par(mai = par("mai") * c(1, 1, 1, 2.7), mfrow = c(1, 2)) plot(sbody, type = "paths") plot(sbody, type = "maxsel", ymargin = 6) par(opar) ``` ## Citation To cite the package in publications please use ```{r, results='hide'} citation("stabs") ``` which will currently give you ```{r, echo = FALSE} citation("stabs") ``` To obtain BibTeX references use ```{r, eval = FALSE} toBibtex(citation("stabs")) ```