| Title: | Measures of Uncertainty for Model Selection |
|---|---|
| Description: | Following the common types of measures of uncertainty for parameter estimation, two measures of uncertainty were proposed for model selection, see Liu, Li and Jiang (2020) <doi:10.1007/s11749-020-00737-9>. The first measure, called Mac, is a kind of model confidence set that relates to the variation of model selection. The second measure, called LogP, focuses on the error of model selection. Both are computed via bootstrapping. This package provides functions to compute these two measures. Furthermore, a similar model confidence set adapted from Bayesian Model Averaging can also be computed using this package. |
| Authors: | Yuanyuan Li [aut, cre], Jiming Jiang [ths] |
| Maintainer: | Yuanyuan Li <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.1 |
| Built: | 2026-05-20 06:53:40 UTC |
| Source: | https://github.com/yuanyuanli96/maclogp |
This function computes a Bayesian model confidence set based on approximate posterior model probabilities.
    bms(data, alpha, eps = 1e-06)
| data | a list including |
|---|---|
| alpha | a vector of significance levels. The confidence levels are 1 - alpha. |
| eps | tolerance level in choosing models with total posteriors at least |
Returns a list containing:
| models | A list with one entry for each model. Each entry is an integer vector that specifies the columns of matrix |
|---|---|
| con_sets | a list with one entry for a |
| length_con | lengths of confidence sets. |
| probs_inorder | Model posteriors in decreasing order. |
| beta_ls | a list with one entry for each model. Each entry is a vector of estimated coefficients for that model. |
Liu, X., Li, Y. & Jiang, J. (2020). Simple measures of uncertainty for model selection. TEST, 1-20.
Raftery, Adrian E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196.
    # Simulate n observations of p predictors; only the first three have nonzero coefficients
    n <- 50
    B <- 100
    p <- 5
    x <- matrix(rnorm(n * p, mean = 0, sd = 1), n, p)
    true_b <- c(1:3, rep(0, p - 3))
    y <- x %*% true_b + rnorm(n)
    alpha <- c(0.1, 0.05, 0.01)
    data <- list(x = x, y = y)
    # Bayesian model confidence sets at the 90%, 95% and 99% levels
    result <- bms(data, alpha)
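The description above mentions approximate posterior model probabilities and cites Raftery (1995). A common approximation from that line of work weights each model by exp(-BIC/2) under equal prior model probabilities. The following is a minimal sketch of that idea in base R, not the package's implementation: the helpers `bic_of` and `credible_set`, the restriction to non-empty subsets, and the way `eps` enters the cutoff are all assumptions made for illustration.

```r
set.seed(1)
n <- 50; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- as.vector(x %*% c(1:3, rep(0, p - 3))) + rnorm(n)

# All non-empty subsets of the p columns (assumed candidate set).
models <- unlist(lapply(seq_len(p), function(k) combn(p, k, simplify = FALSE)),
                 recursive = FALSE)

# BIC of a Gaussian linear model with intercept, up to an additive constant.
# bic_of is a hypothetical helper, not a package function.
bic_of <- function(cols) {
  fit <- lm.fit(cbind(1, x[, cols, drop = FALSE]), y)
  n * log(sum(fit$residuals^2) / n) + (length(cols) + 1) * log(n)
}
bic <- vapply(models, bic_of, numeric(1))

# Approximate posterior probabilities under equal prior model weights.
post <- exp(-(bic - min(bic)) / 2)
post <- post / sum(post)

# Smallest set of top-ranked models whose total probability reaches
# 1 - alpha, up to the tolerance eps (assumed role of eps).
credible_set <- function(post, alpha, eps = 1e-6) {
  ord <- order(post, decreasing = TRUE)
  k <- which(cumsum(post[ord]) >= 1 - alpha - eps)[1]
  ord[seq_len(k)]
}
set_90 <- credible_set(post, 0.10)
set_99 <- credible_set(post, 0.01)
```

Subtracting `min(bic)` before exponentiating leaves the normalized weights unchanged and avoids numerical underflow when BIC values are large.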
These data consist of observations on 442 patients, with the response of interest being a quantitative measure of disease progression one year after baseline. The ten baseline variables have been normalized to have mean 0 and Euclidean norm 1. The response variable has been centered (mean 0).
    diabetes
A data frame with 442 rows and 11 variables:
age
sex
body-mass index
average blood pressure
blood serum measurement 1
blood serum measurement 2
blood serum measurement 3
blood serum measurement 4
blood serum measurement 5
blood serum measurement 6
disease progression
https://web.stanford.edu/~hastie/Papers/LARS/diabetes.sdata.txt
Efron, Hastie, Johnstone and Tibshirani (2003), Least Angle Regression. Annals of Statistics.
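The normalization described for the baseline variables (mean 0 and Euclidean norm 1 per column) can be reproduced on any numeric matrix. The sketch below uses simulated data rather than the `diabetes` data set; the object names `raw`, `centered` and `normalized` are illustrative only.

```r
set.seed(1)
raw <- matrix(rnorm(20 * 3, mean = 5, sd = 2), 20, 3)

# Center each column at zero.
centered <- sweep(raw, 2, colMeans(raw))

# Scale each centered column to unit Euclidean norm.
normalized <- sweep(centered, 2, sqrt(colSums(centered^2)), "/")

colMeans(normalized)     # numerically zero
colSums(normalized^2)    # numerically one
```

Note that this differs from `scale()`, which divides by the sample standard deviation rather than by the Euclidean norm.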
This function computes a model confidence set using the Mac procedure, together with the LogP uncertainty measure, for a selection method based on an information criterion.
    MAC(models, data, B, alpha, method = "bic", delta = 1e-04, eps = 1e-06)
| models | A list with one entry for each model. Each entry is an integer vector that specifies the columns of matrix |
|---|---|
| data | a list including |
| B | number of bootstrap replicates to perform; Default value is 200. |
| alpha | a vector of significance levels. The confidence levels of the model confidence sets are 1 - alpha. |
| method | Information criterion. Users can choose from |
| delta | A small positive number added inside of LogP when the bootstrap probability of the selected model is 1. Default value is 1e-04. |
| eps | tolerance level in choosing models with total bootstrap probabilities at least |
Returns an object of class “MAC”. An object of class “MAC” is a list containing at least the following components:
| hat_M | numeric index of the selected model. |
|---|---|
| con_sets | a list with one entry for a |
| length_con | lengths of confidence sets. |
| order | Model indexes with increasing information scores based on original data. |
| probs_inorder | Bootstrap probabilities for the models in |
| beta_ls | a list with one entry for each model. Each entry is a vector of estimated coefficients based on original data for that model. |
| hat_prob | the bootstrap probability of the single selected model. |
| hat_logp | the LogP measure. |
Liu, X., Li, Y. & Jiang, J. (2020). Simple measures of uncertainty for model selection. TEST, 1-20.
    set.seed(0)
    # Simulate n observations of p predictors; only the first three have nonzero coefficients
    n <- 50
    B <- 100
    p <- 5
    x <- matrix(rnorm(n * p, mean = 0, sd = 1), n, p)
    true_b <- c(1:3, rep(0, p - 3))
    y <- x %*% true_b + rnorm(n)
    alpha <- c(0.1, 0.05, 0.01)
    data <- list(x = x, y = y)
    # Candidate models: all subsets of the p predictors
    models <- Models_gen(1:p)
    result <- MAC(models, data, B, alpha)
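To illustrate what a bootstrap probability of a selected model means, the sketch below repeats a BIC-based selection on resampled data and records how often each candidate wins. It is an illustration under stated assumptions, not the Mac procedure itself: the pairs (row) bootstrap, the helper `select_bic`, and the use of non-empty subsets as candidates are choices made here and may differ from what `MAC` does internally.

```r
set.seed(1)
n <- 50; p <- 3; B <- 100
x <- matrix(rnorm(n * p), n, p)
y <- as.vector(x %*% c(1, 2, 0)) + rnorm(n)

# Candidate models: all non-empty subsets of the p columns (assumed).
models <- unlist(lapply(seq_len(p), function(k) combn(p, k, simplify = FALSE)),
                 recursive = FALSE)

# Index of the BIC-minimizing model for a given data set.
# select_bic is a hypothetical helper, not a package function.
select_bic <- function(x, y) {
  n <- length(y)
  bic <- vapply(models, function(cols) {
    fit <- lm.fit(cbind(1, x[, cols, drop = FALSE]), y)
    n * log(sum(fit$residuals^2) / n) + (length(cols) + 1) * log(n)
  }, numeric(1))
  which.min(bic)
}

# Model selected on the original data.
hat_M <- select_bic(x, y)

# Pairs bootstrap: resample rows with replacement and redo the selection.
picks <- replicate(B, {
  idx <- sample.int(n, n, replace = TRUE)
  select_bic(x[idx, , drop = FALSE], y[idx])
})

# Share of bootstrap replicates in which each candidate was selected,
# and the share for the model selected on the original data.
boot_prob <- tabulate(picks, nbins = length(models)) / B
hat_prob <- boot_prob[hat_M]
```

A value of `hat_prob` close to 1 indicates that the selection is stable under resampling; smaller values indicate that other candidates are frequently chosen instead.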
This function generates a list including all subset models given a vector of candidate predictors.
    Models_gen(predictors)
| predictors | a vector including the indexes of all predictors, such as |
|---|---|
Returns a list with one entry for each model. Each entry is an integer
vector that specifies the columns of matrix x to be used as a regressor in that model.
    Models_gen(1:5)
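For readers who want to see what enumerating all subset models involves, here is a minimal base R sketch built on `combn`. The function `all_subsets` is hypothetical and is not the package's `Models_gen`; in particular, whether `Models_gen` includes the empty (intercept-only) model or orders the subsets this way is not stated in the source, so the sketch simply lists non-empty subsets by increasing size.

```r
# Enumerate every non-empty subset of a vector of predictor indexes.
all_subsets <- function(predictors) {
  # combn() treats a single positive number as seq_len(), so handle it here.
  if (length(predictors) == 1) return(list(predictors))
  unlist(
    lapply(seq_along(predictors),
           function(k) combn(predictors, k, simplify = FALSE)),
    recursive = FALSE
  )
}

subs <- all_subsets(1:5)
length(subs)   # 2^5 - 1 = 31 non-empty subsets
```

The number of candidates doubles with each additional predictor, so exhaustive enumeration is only practical for a modest number of variables.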
This function generates a heat map for a given model confidence set. Each row represents a model in the confidence set, and the colored cells mark the variables in that model.
    plot_MAC(models, alpha, con_sets, p, xnames = NULL, color = "lightblue")
| models | A list with one entry for each model. Each entry is an integer vector that specifies the columns of matrix X without intercept to be used as a regressor in that model. Intercept will be fitted automatically for every model. such as |
|---|---|
| alpha | Significance levels. The confidence levels for confidence sets are |
| con_sets | a list with one entry for a |
| p | the number of candidate variables. |
| xnames | variable names of all candidate variables. Default is NULL. |
| color | the color that indicates a variable is selected. Default is "lightblue". |
Returns a logical matrix per confidence set with one row per model and one column per variable indicating whether that variable is in the model.
Generates a corresponding heat map per confidence set with one row per model and one column per variable indicating whether that variable is in the model. A cell in white means the variable is not in that model; a cell in user-specified color means the variable is in that model.
    # Simulate n observations of p predictors; only the first three have nonzero coefficients
    n <- 50
    B <- 100
    p <- 5
    x <- matrix(rnorm(n * p, mean = 0, sd = 1), n, p)
    true_b <- c(1:3, rep(0, p - 3))
    y <- x %*% true_b + rnorm(n)
    alpha <- c(0.1, 0.05, 0.01)
    data <- list(x = x, y = y)
    models <- Models_gen(1:p)
    # Heat maps for the Mac confidence sets
    result <- MAC(models, data, B, alpha)
    plot_MAC(models, alpha, result$con_sets, p)
    # Heat maps for the Bayesian confidence sets
    result2 <- bms(data, alpha)
    plot_MAC(result2$models, alpha, result2$con_sets, p)
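The logical matrix described in the return value (one row per model, one column per variable) can be built by hand from a list of models. The sketch below shows one way to do that in base R; `membership_matrix` and its default column names are hypothetical and are not part of the package.

```r
# Build a logical matrix: rows are models, columns are candidate variables,
# TRUE where the variable belongs to the model.
membership_matrix <- function(models, p, xnames = NULL) {
  # Placeholder names, assumed for illustration only.
  if (is.null(xnames)) xnames <- paste0("x", seq_len(p))
  flags <- unlist(lapply(models, function(cols) seq_len(p) %in% cols))
  matrix(flags, nrow = length(models), ncol = p, byrow = TRUE,
         dimnames = list(NULL, xnames))
}

# Three illustrative models over five candidate variables.
example_models <- list(c(1, 2, 3), c(1, 2, 3, 4), c(1, 2, 3, 5))
m <- membership_matrix(example_models, p = 5)
m
```

A matrix of this shape could then be passed to a base graphics function such as `image()` with a two-color palette to draw a heat map similar in spirit to the one described above.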