Package 'stabs' reference manual

Package 'stabs'

Title:	Stability Selection with Error Control
Description:	Resampling procedures to assess the stability of selected variables with additional finite sample error control for high-dimensional variable selection procedures such as Lasso or boosting. Both, standard stability selection (Meinshausen & Buhlmann, 2010, <doi:10.1111/j.1467-9868.2010.00740.x>) and complementary pairs stability selection with improved error bounds (Shah & Samworth, 2013, <doi:10.1111/j.1467-9868.2011.01034.x>) are implemented. The package can be combined with arbitrary user specified variable selection approaches.
Authors:	Benjamin Hofner [aut, cre], Torsten Hothorn [aut]
Maintainer:	Benjamin Hofner <[email protected]>
License:	GPL-2
Version:	0.6-4
Built:	2025-03-13 03:30:51 UTC
Source:	https://github.com/hofnerb/stabs

Title:

Stability Selection with Error Control

Description:

Resampling procedures to assess the stability of selected variables with additional finite sample error control for high-dimensional variable selection procedures such as Lasso or boosting. Both, standard stability selection (Meinshausen & Buhlmann, 2010, <doi:10.1111/j.1467-9868.2010.00740.x>) and complementary pairs stability selection with improved error bounds (Shah & Samworth, 2013, <doi:10.1111/j.1467-9868.2011.01034.x>) are implemented. The package can be combined with arbitrary user specified variable selection approaches.

Authors:

Benjamin Hofner [aut, cre], Torsten Hothorn [aut]

Maintainer:

Benjamin Hofner <[email protected]>

License:

GPL-2

Version:

0.6-4

Built:

2025-03-13 03:30:51 UTC

Source:

https://github.com/hofnerb/stabs

Help Index

Check if folds result from subsampling with p = 0.5.

Description

(Internal) function that checks if folds result from subsampling with p = 0.5 and adds complementary pairs if needed.

Usage

check_folds(folds, B, n, sampling.type)
check_folds(folds, B, n, sampling.type)

Arguments

`folds`	a weight matrix that represents the subsamples.
`B`	number of subsampling replicates.
`n`	the number of observations; needed for internal checks.
`sampling.type`	sampling type to be used.

Details

This is an internal function used to check if folds are specified correctly. For details (e.g. on arguments) see stabsel.

Value

A matrix containing the folds, possibly after adding the complementary pairs.

References

B. Hofner, L. Boccuto and M. Goeker (2015), Controlling false discoveries in high-dimensional situations: Boosting with stability selection. BMC Bioinformatics, 16:144.
doi:10.1186/s12859-015-0575-3.

Fit Functions for Stability Selection

Description

Functions that fit a model until $q$ variables are selected and that returns the indices (and names) of the selected variables.

Usage

## package lars:
lars.lasso(x, y, q, ...)
lars.stepwise(x, y, q, ...)

## package glmnet:
glmnet.lasso(x, y, q, type = c("conservative", "anticonservative"), ...)
glmnet.lasso_maxCoef(x, y, q, ...)
## package lars:
lars.lasso(x, y, q, ...)
lars.stepwise(x, y, q, ...)

## package glmnet:
glmnet.lasso(x, y, q, type = c("conservative", "anticonservative"), ...)
glmnet.lasso_maxCoef(x, y, q, ...)

Arguments

`x`	a matrix containing the predictors or an object of class `"mboost"`.
`y`	a vector or matrix containing the outcome.
`q`	number of (unique) selected variables (or groups of variables depending on the model) that are selected on each subsample.
`type`	a charachter vector specifying if the number of selected variables per subsample is $\leq q$ (`type = "conservative"`) or $\geq q$ (`type = "anticonservative"`). The conservative version ensures that the PFER is controlled.
`...`	additional arguments passed to the underlying fitting function. See the example on `glmnet.lasso_maxCoef` in `stabsel` for the specification of additional arguments via `stabsel`.

Details

All fitting functions are named after the package and the type of model that is fitted: package_name.model, e.g., glmnet.lasso stands for a lasso model that is fitted using the package glmnet.

glmnet.lasso_maxCoef fits a lasso model with a given penalty parameter and returns the q largest coefficients. If one wants to use glmnet.lasso_maxCoef, one must specify the penalty parameter lambda (via the ... argument) or in stabsel via args.fitfun(lambda = ). Note that usually, the penalty parameter cannot be specified but is chosen such that q variables are selected. For an example on how to use glmnet.lasso_maxCoef see stabsel.

Value

A named list with elements

`selected`	logical. A vector that indicates which variable was selected.
`path`	logical. A matrix that indicates which variable was selected in which step. Each row represents one variable, the columns represent the steps.

Examples

  if (require("TH.data")) {
      ## make data set available
      data("bodyfat", package = "TH.data")
  } else {
      ## simulate some data if TH.data not available. 
      ## Note that results are non-sense with this data.
      bodyfat <- matrix(rnorm(720), nrow = 72, ncol = 10)
  }
  
  if (require("lars")) {
      ## selected variables
      lars.lasso(bodyfat[, -2], bodyfat[,2], q = 3)$selected
      lars.stepwise(bodyfat[, -2], bodyfat[,2], q = 3)$selected
  }
  
  if (require("glmnet")) {
      glmnet.lasso(bodyfat[, -2], bodyfat[,2], q = 3)$selected
      ## selection path
      glmnet.lasso(bodyfat[, -2], bodyfat[,2], q = 3)$path
  
      ## Using the anticonservative glmnet.lasso (see args.fitfun):
      stab.glmnet <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                             fitfun = glmnet.lasso, 
                             args.fitfun = list(type = "anticonservative"), 
                             cutoff = 0.75, PFER = 1)
  }
if (require("TH.data")) {
      ## make data set available
      data("bodyfat", package = "TH.data")
  } else {
      ## simulate some data if TH.data not available. 
      ## Note that results are non-sense with this data.
      bodyfat <- matrix(rnorm(720), nrow = 72, ncol = 10)
  }
  
  if (require("lars")) {
      ## selected variables
      lars.lasso(bodyfat[, -2], bodyfat[,2], q = 3)$selected
      lars.stepwise(bodyfat[, -2], bodyfat[,2], q = 3)$selected
  }
  
  if (require("glmnet")) {
      glmnet.lasso(bodyfat[, -2], bodyfat[,2], q = 3)$selected
      ## selection path
      glmnet.lasso(bodyfat[, -2], bodyfat[,2], q = 3)$path
  
      ## Using the anticonservative glmnet.lasso (see args.fitfun):
      stab.glmnet <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                             fitfun = glmnet.lasso, 
                             args.fitfun = list(type = "anticonservative"), 
                             cutoff = 0.75, PFER = 1)
  }

Method to Extract Parameters

Description

Extract stability selection parameters, i.e., tuning parameters, from a stabsel object.

Usage

parameters(object)

## extract parameters from a stabsel model
## (same as parameters(p) )
## S3 method for class 'stabsel'
stabsel_parameters(p, ...)
parameters(object)

## extract parameters from a stabsel model
## (same as parameters(p) )
## S3 method for class 'stabsel'
stabsel_parameters(p, ...)

Arguments

`object`	an object of class `"stabsel"` or `"stabsel_parameters"`.
`p`	an object of class `"stabsel"`.
`...`	additional arguments, currently not used.

Value

An object of class stabsel_parameters with a special print method. See there for details.

Plot and Print Methods for Stability Selection

Description

Display results of stability selection.

Usage

## S3 method for class 'stabsel'
plot(x, main = deparse(x$call), type = c("maxsel", "paths"),
     xlab = NULL, ylab = NULL, col = NULL, ymargin = 10, np = sum(x$max > 0),
     labels = NULL, ...)
## S3 method for class 'stabsel'
print(x, decreasing = FALSE, print.all = TRUE, ...)
## S3 method for class 'stabsel'
plot(x, main = deparse(x$call), type = c("maxsel", "paths"),
     xlab = NULL, ylab = NULL, col = NULL, ymargin = 10, np = sum(x$max > 0),
     labels = NULL, ...)
## S3 method for class 'stabsel'
print(x, decreasing = FALSE, print.all = TRUE, ...)

Arguments

`x`	object of class `stabsel`.
`main`	main title for the plot.
`type`	plot type; either stability paths (`"paths"`) or a plot of the maximum selection frequency (`"maxsel"`).
`xlab`, `ylab`	labels for the x- and y-axis of the plot. Per default, sensible labels are used depending on the `type` of the plot.
`col`	a vector of colors; Typically, one can specify a single color or one color for each variable. Per default, colors depend on the maximal selection frequency of the variable and range from grey to red.
`ymargin`	(temporarily) specifies the y margin of of the plot in lines (see argument `"mar"` of function `par`). This only affects the right margin for `type = "paths"` and the left margin for `type = "maxsel"`. Explicit user specified margins are kept and are not overwritten.
`np`	number of variables to plot for the maximum selection frequency plot (`type = "maxsel"`); the first `np` variables with highest selection frequency are plotted.
`labels`	variable labels for the plot; one label per variable / effect must be specified. Per default, the names of `x$max` are used.
`decreasing`	logical. Should the selection frequencies be printed in descending order (`TRUE`) or in ascending order (`FALSE`)?
`print.all`	logical. Should all selection frequencies be displayed or only those that are greater than zero?
`...`	additional arguments to `plot` and `print` functions.

Details

This function implements the stability selection procedure by Meinshausen and Buehlmann (2010) and the improved error bounds by Shah and Samworth (2013).

Two of the three arguments cutoff, q and PFER must be specified. The per-family error rate (PFER), i.e., the expected number of false positives $E(V)$ , where $V$ is the number of false positives, is bounded by the argument PFER.

As controlling the PFER is more conservative as controlling the family-wise error rate (FWER), the procedure also controlls the FWER, i.e., the probability of selecting at least one non-influential variable (or model component) is less than PFER.

Value

An object of class stabsel with a special print method. The object has the following elements:

`phat`	selection probabilities.
`selected`	elements with maximal selection probability greater `cutoff`.
`max`	maximum of selection probabilities.
`cutoff`	cutoff used.
`q`	average number of selected variables used.
`PFER`	per-family error rate.
`sampling.type`	the sampling type used for stability selection.
`assumption`	the assumptions made on the selection probabilities.
`call`	the call.

References

N. Meinshausen and P. Buehlmann (2010), Stability selection. Journal of the Royal Statistical Society, Series B, 72, 417–473.

R.D. Shah and R.J. Samworth (2013), Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society, Series B, 75, 55–80.

Examples

  if (require("TH.data")) {
      ## make data set available
      data("bodyfat", package = "TH.data")
  } else {
      ## simulate some data if TH.data not available. 
      ## Note that results are non-sense with this data.
      bodyfat <- matrix(rnorm(720), nrow = 72, ncol = 10)
  }
  
  ## set seed
  set.seed(1234)

  ####################################################################
  ### using stability selection with Lasso methods:

  if (require("lars")) {
      (stab.lasso <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                             fitfun = lars.lasso, cutoff = 0.75,
                             PFER = 1))
      par(mfrow = c(2, 1))
      plot(stab.lasso, ymargin = 6)
      opar <- par(mai = par("mai") * c(1, 1, 1, 2.7))
      plot(stab.lasso, type = "paths")
  }
if (require("TH.data")) {
      ## make data set available
      data("bodyfat", package = "TH.data")
  } else {
      ## simulate some data if TH.data not available. 
      ## Note that results are non-sense with this data.
      bodyfat <- matrix(rnorm(720), nrow = 72, ncol = 10)
  }
  
  ## set seed
  set.seed(1234)

  ####################################################################
  ### using stability selection with Lasso methods:

  if (require("lars")) {
      (stab.lasso <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                             fitfun = lars.lasso, cutoff = 0.75,
                             PFER = 1))
      par(mfrow = c(2, 1))
      plot(stab.lasso, ymargin = 6)
      opar <- par(mai = par("mai") * c(1, 1, 1, 2.7))
      plot(stab.lasso, type = "paths")
  }

Run Stability Selection

Description

(Internal) function that is used to run stability selection (i.e. to apply the fit-function to the subsamples. This function is not intended to be directly called.

Usage

run_stabsel(fitter, args.fitter, n, p, cutoff, q, PFER, folds, B, assumption,
            sampling.type, papply, verbose, FWER, eval, names,
            mc.preschedule = FALSE, ...)
run_stabsel(fitter, args.fitter, n, p, cutoff, q, PFER, folds, B, assumption,
            sampling.type, papply, verbose, FWER, eval, names,
            mc.preschedule = FALSE, ...)

Arguments

`fitter`	a function to fit the model on subsamples. See argument `fitfun` of `stabsel` for details.
`args.fitter`	a named list containing additional arguments that are passed to `fitter`. See argument `args.fitfun` `stabsel` for details.
`n`	the number of observations; needed for internal checks.
`p`	number of possible predictors (including intercept if applicable).
`cutoff`	cutoff between 0.5 and 1.
`q`	number of (unique) selected variables (or groups of variables depending on the model) that are selected on each subsample.
`PFER`	upper bound for the per-family error rate.
`folds`	a weight matrix that represents the subsamples.
`B`	number of subsampling replicates.
`assumption`	distributional assumption.
`sampling.type`	sampling type to be used.
`papply`	(parallel) apply function.
`verbose`	logical (default: `TRUE`) that determines wether `warnings` should be issued.
`FWER`	deprecated. Only for compatibility with older versions, use PFER instead.
`eval`	logical. Determines whether stability selection is evaluated.
`names`	variable names that are used to label the results.
`mc.preschedule`	preschedule tasks?
`...`	additional arguments to be passed to next function.

Details

This is an internal function that fits the actual models to the subsamples, i.e., this is the work horse that runs stability selection. Usually, one should use stabsel, which internally calls run_stabsel.

run_stabsel can be used by expert users to implement stability selection methods for new model types.

For details (e.g. on arguments) see stabsel.

Value

An object of class stabsel with the following elements:

`phat`	selection probabilities.
`selected`	elements with maximal selection probability greater `cutoff`.
`max`	maximum of selection probabilities.
`cutoff`	cutoff used.
`q`	average number of selected variables used.
`PFER`	per-family error rate.
`p`	the number of effects subject to selection.
`sampling.type`	the sampling type used for stability selection.
`assumption`	the assumptions made on the selection probabilities.

References

Method to Extract Selected Variables

Description

Extract selected variables from a stabsel object.

Usage

selected(object, ...)
## S3 method for class 'stabsel'
selected(object, ...)
selected(object, ...)
## S3 method for class 'stabsel'
selected(object, ...)

Arguments

`object`	an object of class `"stabsel"`.
`...`	additional arguments passed to specific `selected` methods.

Details

The ids of variables selected during the stability selection process can be extracted using selected().

Stability Selection

Description

Selection of influential variables or model components with error control.

Usage

## generic stability selection funcion
stabsel(x, ...)

## a method to fit models with stability selection
## S3 method for class 'matrix'
stabsel(x, y, fitfun = lars.lasso,
        args.fitfun = list(), cutoff, q, PFER,
        folds = subsample(rep(1, nrow(x)), B = B),
        B = ifelse(sampling.type == "MB", 100, 50),
        assumption = c("unimodal", "r-concave", "none"),
        sampling.type = c("SS", "MB"),
        papply = mclapply, mc.preschedule = FALSE,
        verbose = TRUE, FWER, eval = TRUE, ...)

## essentially a wrapper for data.frames (see details)
## S3 method for class 'data.frame'
stabsel(x,  y, intercept = FALSE, ...)
## generic stability selection funcion
stabsel(x, ...)

## a method to fit models with stability selection
## S3 method for class 'matrix'
stabsel(x, y, fitfun = lars.lasso,
        args.fitfun = list(), cutoff, q, PFER,
        folds = subsample(rep(1, nrow(x)), B = B),
        B = ifelse(sampling.type == "MB", 100, 50),
        assumption = c("unimodal", "r-concave", "none"),
        sampling.type = c("SS", "MB"),
        papply = mclapply, mc.preschedule = FALSE,
        verbose = TRUE, FWER, eval = TRUE, ...)

## essentially a wrapper for data.frames (see details)
## S3 method for class 'data.frame'
stabsel(x,  y, intercept = FALSE, ...)

Arguments

`x`	a `matrix` or a `data.frame` containing the predictors.
`y`	a vector or matrix containing the outcome.
`intercept`	logical. If `x` is a `data.frame`, this argument determines if the resulting model matrix should contain a separate intercept or not.
`fitfun`	a function that takes the arguments `x`, `y` as above, and additionally the number of variables to include in each model `q`. The function then needs to fit the model and to return a logical vector that indicates which variable was selected (among the `q` selected variables).
`args.fitfun`	a named list containing additional arguments that are passed to the fitting function; see also argument `args` in `do.call`.
`cutoff`	cutoff between 0.5 and 1. Preferably a value between 0.6 and 0.9 should be used.
`q`	number of (unique) selected variables (or groups of variables depending on the model) that are selected on each subsample.
`PFER`	upper bound for the per-family error rate. This specifies the amount of falsely selected base-learners, which is tolerated. See details.
`folds`	a weight matrix with number of rows equal to the number of observations, see `subsample`. Usually one should not change the default here as subsampling with a fraction of $1/2$ is needed for the error bounds to hold. One usage scenario where specifying the folds by hand might be the case when one has dependent data (e.g. clusters) and thus wants to draw clusters (i.e., multiple rows together) not individuals.
`assumption`	Defines the type of assumptions on the distributions of the selection probabilities and simultaneous selection probabilities. Only applicable for `sampling.type = "SS"`. Per default, `"unimodality"` is assumed. For `sampling.type = "MB"` we always use `"none"`.
`sampling.type`	use sampling scheme of of Shah & Samworth (2013), i.e., with complementarty pairs (`sampling.type = "SS"`), or the original sampling scheme of Meinshausen & Buehlmann (2010).
`B`	number of subsampling replicates. Per default, we use 50 complementary pairs for the error bounds of Shah & Samworth (2013) and 100 for the error bound derived in Meinshausen & Buehlmann (2010). As we use $B$ complementray pairs in the former case this leads to $2B$ subsamples.
`papply`	(parallel) apply function, defaults to `mclapply`. Alternatively, `parLapply` can be used. In the latter case, usually more setup is needed (see example of `cvrisk` for some details).
`mc.preschedule`	preschedule tasks if `papply = mclapply` (default: `mc.preschedule = FALSE`)? For details see `mclapply`.
`verbose`	logical (default: `TRUE`) that determines wether `warnings` should be issued.
`FWER`	deprecated. Only for compatibility with older versions, use PFER instead.
`eval`	logical. Determines whether stability selection is evaluated (`eval = TRUE`; default) or if only the parameter combination is returned.
`...`	additional arguments to parallel apply methods such as `mclapply`.

Details

This function implements the stability selection procedure by Meinshausen and Buehlmann (2010) and the improved error bounds by Shah and Samworth (2013). For details see also Hofner et al. (2014). The error bounds are implemented in the function stabsel_parameters. Two of the three arguments cutoff, q and PFER must be specified. The per-family error rate (PFER), i.e., the expected number of false positives $E(V)$ , where $V$ is the number of false positives, is bounded by the argument PFER.

Predefined fitfuns functions exist but more can be easily implemented. Note that stepwise regression methods are usually not advised as they tend to be relatively unstable. See example below.

The function stabsel for data.frames is essentially just a wrapper to the matrix function with the same argments. The only difference is that in a pre-processing step, the data set is converted to a model matrix using the function model.matrix. The additional argument intercept determines if an explicit intercept should be added to the model matrix. This is often not neccessary but depends on the fitfun.

Value

An object of class stabsel with a special print method. The object has the following elements:

`phat`	selection probabilities.
`selected`	elements with maximal selection probability greater `cutoff`.
`max`	maximum of selection probabilities.
`cutoff`	cutoff used.
`q`	average number of selected variables used.
`PFER`	(realized) upper bound for the per-family error rate.
`specifiedPFER`	specified upper bound for the per-family error rate.
`p`	the number of effects subject to selection.
`B`	the number of subsamples.
`sampling.type`	the sampling type used for stability selection.
`assumption`	the assumptions made on the selection probabilities.
`call`	the call.

References

N. Meinshausen and P. Buehlmann (2010), Stability selection. Journal of the Royal Statistical Society, Series B, 72, 417–473.

R.D. Shah and R.J. Samworth (2013), Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society, Series B, 75, 55–80.

Examples

  
  if (require("TH.data")) {
      ## make data set available
      data("bodyfat", package = "TH.data")
  } else {
      ## simulate some data if TH.data not available. 
      ## Note that results are non-sense with this data.
      bodyfat <- matrix(rnorm(720), nrow = 72, ncol = 10)
  }
  
  ## set seed
  set.seed(1234)
  
  ####################################################################
  ### using stability selection with Lasso methods:

  if (require("lars")) {
      (stab.lasso <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                             fitfun = lars.lasso, cutoff = 0.75,
                             PFER = 1))
      (stab.stepwise <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                                fitfun = lars.stepwise, cutoff = 0.75,
                                PFER = 1))
      par(mfrow = c(2, 1))
      plot(stab.lasso, main = "Lasso")
      plot(stab.stepwise, main = "Stepwise Selection")
      ## --> stepwise selection seems to be quite unstable even in this low
      ##     dimensional example!
  }

  ## set seed (again to make results comparable)
  set.seed(1234)
  if (require("glmnet")) {
      (stab.glmnet <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                              fitfun = glmnet.lasso, cutoff = 0.75,
                              PFER = 1))
      par(mfrow = c(2, 1))
      plot(stab.glmnet, main = "Lasso (glmnet)")
      if (exists("stab.lasso"))
          plot(stab.lasso, main = "Lasso (lars)")    
  }
  
  
  ## Select variables with maximum coefficients based on lasso estimate
  
  set.seed(1234) # reset seed
  if (require("glmnet")) {
      ## use cross-validated lambda 
      lambda.min <- cv.glmnet(x = as.matrix(bodyfat[, -2]), y = bodyfat[,2])$lambda.min
      (stab.maxCoef <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                               fitfun = glmnet.lasso_maxCoef, 
                               # specify additional parameters to fitfun
                               args.fitfun = list(lambda = lambda.min),
                               cutoff = 0.75, PFER = 1))
                               
      ## WARNING: Using a fixed penalty (lambda) is usually not permitted and 
      ##          not sensible. See ?fitfun for details.
      
      ## now compare standard lasso with "maximal parameter estimates" from lasso
      par(mfrow = c(2, 1))
      plot(stab.maxCoef, main = "Lasso (glmnet; Maximum Coefficients)")
      plot(stab.glmnet, main = "Lasso (glmnet)")
      ## --> very different results.
  }

  ####################################################################
  ### using stability selection directly on computed boosting models
  ### from mboost


  if (require("mboost")) {
      ### low-dimensional example
      mod <- glmboost(DEXfat ~ ., data = bodyfat)

      ## compute cutoff ahead of running stabsel to see if it is a sensible
      ## parameter choice.
      ##   p = ncol(bodyfat) - 1 (= Outcome) + 1 ( = Intercept)
      stabsel_parameters(q = 3, PFER = 1, p = ncol(bodyfat) - 1 + 1,
                         sampling.type = "MB")
      ## the same:
      stabsel(mod, q = 3, PFER = 1, sampling.type = "MB", eval = FALSE)

      ### Do not test the following code per default on CRAN as it takes some time to run:
      ## now run stability selection
      (sbody <- stabsel(mod, q = 3, PFER = 1, sampling.type = "MB"))
      opar <- par(mai = par("mai") * c(1, 1, 1, 2.7))
      plot(sbody)
      par(opar)
      plot(sbody, type = "maxsel", ymargin = 6)
      
  }
if (require("TH.data")) {
      ## make data set available
      data("bodyfat", package = "TH.data")
  } else {
      ## simulate some data if TH.data not available. 
      ## Note that results are non-sense with this data.
      bodyfat <- matrix(rnorm(720), nrow = 72, ncol = 10)
  }
  
  ## set seed
  set.seed(1234)
  
  ####################################################################
  ### using stability selection with Lasso methods:

  if (require("lars")) {
      (stab.lasso <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                             fitfun = lars.lasso, cutoff = 0.75,
                             PFER = 1))
      (stab.stepwise <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                                fitfun = lars.stepwise, cutoff = 0.75,
                                PFER = 1))
      par(mfrow = c(2, 1))
      plot(stab.lasso, main = "Lasso")
      plot(stab.stepwise, main = "Stepwise Selection")
      ## --> stepwise selection seems to be quite unstable even in this low
      ##     dimensional example!
  }

  ## set seed (again to make results comparable)
  set.seed(1234)
  if (require("glmnet")) {
      (stab.glmnet <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                              fitfun = glmnet.lasso, cutoff = 0.75,
                              PFER = 1))
      par(mfrow = c(2, 1))
      plot(stab.glmnet, main = "Lasso (glmnet)")
      if (exists("stab.lasso"))
          plot(stab.lasso, main = "Lasso (lars)")    
  }
  
  
  ## Select variables with maximum coefficients based on lasso estimate
  
  set.seed(1234) # reset seed
  if (require("glmnet")) {
      ## use cross-validated lambda 
      lambda.min <- cv.glmnet(x = as.matrix(bodyfat[, -2]), y = bodyfat[,2])$lambda.min
      (stab.maxCoef <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                               fitfun = glmnet.lasso_maxCoef, 
                               # specify additional parameters to fitfun
                               args.fitfun = list(lambda = lambda.min),
                               cutoff = 0.75, PFER = 1))
                               
      ## WARNING: Using a fixed penalty (lambda) is usually not permitted and 
      ##          not sensible. See ?fitfun for details.
      
      ## now compare standard lasso with "maximal parameter estimates" from lasso
      par(mfrow = c(2, 1))
      plot(stab.maxCoef, main = "Lasso (glmnet; Maximum Coefficients)")
      plot(stab.glmnet, main = "Lasso (glmnet)")
      ## --> very different results.
  }

  ####################################################################
  ### using stability selection directly on computed boosting models
  ### from mboost


  if (require("mboost")) {
      ### low-dimensional example
      mod <- glmboost(DEXfat ~ ., data = bodyfat)

      ## compute cutoff ahead of running stabsel to see if it is a sensible
      ## parameter choice.
      ##   p = ncol(bodyfat) - 1 (= Outcome) + 1 ( = Intercept)
      stabsel_parameters(q = 3, PFER = 1, p = ncol(bodyfat) - 1 + 1,
                         sampling.type = "MB")
      ## the same:
      stabsel(mod, q = 3, PFER = 1, sampling.type = "MB", eval = FALSE)

      ### Do not test the following code per default on CRAN as it takes some time to run:
      ## now run stability selection
      (sbody <- stabsel(mod, q = 3, PFER = 1, sampling.type = "MB"))
      opar <- par(mai = par("mai") * c(1, 1, 1, 2.7))
      plot(sbody)
      par(opar)
      plot(sbody, type = "maxsel", ymargin = 6)
      
  }

Compute Error Bounds for Stability Selection

Description

Compute the missing parameter from the two given parameters in order to assess suitability of the parameter constellation

Usage

stabsel_parameters(p, ...)

## Default S3 method:
stabsel_parameters(p, cutoff, q, PFER,
                   B = ifelse(sampling.type == "MB", 100, 50),
                   assumption = c("unimodal", "r-concave", "none"),
                   sampling.type = c("SS", "MB"),
                   verbose = FALSE, FWER, ...)

## S3 method for class 'stabsel_parameters'
print(x, heading = TRUE, ...)
stabsel_parameters(p, ...)

## Default S3 method:
stabsel_parameters(p, cutoff, q, PFER,
                   B = ifelse(sampling.type == "MB", 100, 50),
                   assumption = c("unimodal", "r-concave", "none"),
                   sampling.type = c("SS", "MB"),
                   verbose = FALSE, FWER, ...)

## S3 method for class 'stabsel_parameters'
print(x, heading = TRUE, ...)

Arguments

`p`	number of possible predictors (including intercept if applicable).
`cutoff`	cutoff between 0.5 and 1. Preferably a value between 0.6 and 0.9 should be used.
`q`	number of (unique) selected variables (or groups of variables depending on the model) that are selected on each subsample.
`PFER`	upper bound for the per-family error rate. This specifies the amount of falsely selected base-learners, which is tolerated. See details.
`B`	number of subsampling replicates. Per default, we use 50 complementary pairs for the error bounds of Shah & Samworth (2013) and 100 for the error bound derived in Meinshausen & Buehlmann (2010). As we use $B$ complementray pairs in the former case this leads to $2B$ subsamples.
`assumption`	Defines the type of assumptions on the distributions of the selection probabilities and simultaneous selection probabilities. Only applicable for `sampling.type = "SS"`. For `sampling.type = "MB"` we always use code"none".
`sampling.type`	use sampling scheme of of Shah & Samworth (2013), i.e., with complementarty pairs (`sampling.type = "SS"`), or the original sampling scheme of Meinshausen & Buehlmann (2010).
`verbose`	logical (default: `TRUE`) that determines wether `warnings` should be issued.
`FWER`	deprecated. Only for compatibility with older versions, use PFER instead.
`x`	an object of class `"stabsel_parameters"`.
`heading`	logical. Specifies if a heading line should be printed.
`...`	additional arguments to be passed to next function.

Details

This function implements the error bounds for stability selection by Meinshausen and Buehlmann (2010) and the improved error bounds by Shah and Samworth (2013). For details see also Hofner et al. (2014).

For more details see also stabsel.

Value

An object of class stabsel_parameters with a special print method. The object has the following elements:

`cutoff`	cutoff used.
`q`	average number of selected variables used.
`PFER`	(realized) upper bound for the per-family error rate.
`specifiedPFER`	specified upper bound for the per-family error rate.
`p`	the number of effects subject to selection.
`B`	the number of subsamples.
`sampling.type`	the sampling type used for stability selection.
`assumption`	the assumptions made on the selection probabilities.

References

N. Meinshausen and P. Buehlmann (2010), Stability selection. Journal of the Royal Statistical Society, Series B, 72, 417–473.

R.D. Shah and R.J. Samworth (2013), Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society, Series B, 75, 55–80.

Change Parameters of Stability Selection

Description

Method to change the parameters cutoff, PFER and assumption of stability selection that can be altered without the need to re-run the subsampling process.

Usage

## S3 method for class 'stabsel'
stabsel(x, cutoff, PFER, assumption = x$assumption, ...)
## S3 method for class 'stabsel'
stabsel(x, cutoff, PFER, assumption = x$assumption, ...)

Arguments

`x`	an object that results from a call to `stabsel`.
`cutoff`	cutoff between 0.5 and 1. Preferably a value between 0.6 and 0.9 should be used.
`PFER`	upper bound for the per-family error rate. This specifies the amount of falsely selected base-learners, which is tolerated. See details.
`assumption`	Defines the type of assumptions on the distributions of the selection probabilities and simultaneous selection probabilities. Only applicable for `sampling.type = "SS"`. For `sampling.type = "MB"` we always use code"none".
`...`	additional arguments that are currently ignored.

Details

This function allows to alter the parameters cutoff, PFER and assumption of a fitted stability selection result. All other parameters are re-used from the original stability selection results. The missing paramter is computed and the selected variables are updated accordingly.

Value

An object of class stabsel. For details see there.

Examples

  if (require("TH.data")) {
      ## make data set available
      data("bodyfat", package = "TH.data")
  } else {
      ## simulate some data if TH.data not available. 
      ## Note that results are non-sense with this data.
      bodyfat <- matrix(rnorm(720), nrow = 72, ncol = 10)
  }
  
  ## set seed
  set.seed(1234)

  ####################################################################
  ### using stability selection with Lasso methods:

  if (require("lars")) {
      (stab.lasso <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                             fitfun = lars.lasso, cutoff = 0.75,
                             PFER = 1))

      par(mfrow = c(2, 1))
      plot(stab.lasso)

      ## now change the PFER and the assumption:
      (stab.lasso_cf0.93_rconc <- stabsel(stab.lasso, cutoff = 0.93,
                                          assumption = "r-concave"))
      plot(stab.lasso_cf0.93_rconc)
      ## the cutoff did change and hence the PFER and the selected
      ## variables
  }
if (require("TH.data")) {
      ## make data set available
      data("bodyfat", package = "TH.data")
  } else {
      ## simulate some data if TH.data not available. 
      ## Note that results are non-sense with this data.
      bodyfat <- matrix(rnorm(720), nrow = 72, ncol = 10)
  }
  
  ## set seed
  set.seed(1234)

  ####################################################################
  ### using stability selection with Lasso methods:

  if (require("lars")) {
      (stab.lasso <- stabsel(x = bodyfat[, -2], y = bodyfat[,2],
                             fitfun = lars.lasso, cutoff = 0.75,
                             PFER = 1))

      par(mfrow = c(2, 1))
      plot(stab.lasso)

      ## now change the PFER and the assumption:
      (stab.lasso_cf0.93_rconc <- stabsel(stab.lasso, cutoff = 0.93,
                                          assumption = "r-concave"))
      plot(stab.lasso_cf0.93_rconc)
      ## the cutoff did change and hence the PFER and the selected
      ## variables
  }

Draw Random Subsamples

Description

Set up weight matrix for subsampling with sample proportion $1/2$ to be used with stabsel.

Usage

subsample(weights, B = 100, strata = NULL)
subsample(weights, B = 100, strata = NULL)

Arguments

`weights`	a numeric vector of weights for the model to be cross-validated.
`B`	number of folds, per default 25 for `bootstrap` and `subsampling` and 10 for `kfold`.
`strata`	a factor of the same length as `weights` for stratification.

Details

The function subsample can be used to build an appropriate weight matrix to be used with stabsel. See there for more details.

If strata is defined sampling is performed in each stratum separately thus preserving the distribution of the strata variable in each fold.

Examples

  ## just a low-dimensional example
  subsample(weights = rep(1, 10), B = 50)
## just a low-dimensional example
  subsample(weights = rep(1, 10), B = 50)

Package 'stabs'

Help Index

Check if folds result from subsampling with p = 0.5.

Description

Usage

Arguments

Details

Value

References

See Also

Fit Functions for Stability Selection

Description

Usage

Arguments

Details

Value

See Also

Examples

Method to Extract Parameters

Description

Usage

Arguments

Value

See Also

Plot and Print Methods for Stability Selection

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Run Stability Selection

Description

Usage

Arguments

Details

Value

References

See Also

Method to Extract Selected Variables

Description

Usage

Arguments

Details

Stability Selection

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Compute Error Bounds for Stability Selection

Description

Usage

Arguments

Details

Value

References

See Also

Change Parameters of Stability Selection

Description

Usage

Arguments

Details

Value

See Also

Examples

Draw Random Subsamples

Description

Usage

Arguments

Details

See Also

Examples