Fits sparse regression, classification, and survival models built on a linear predictor, using a family-specific loss whose gradient at the null is (asymptotically) pivotal, combined with a sparsity-inducing penalty. Supported families are Gaussian, binomial, Poisson, exponential, Gumbel, and Cox proportional hazards.
Usage
pic(
X,
y,
family = c("gaussian", "binomial", "poisson", "exponential", "gumbel", "cox"),
penalty = "lasso",
standardize = TRUE,
intercept = TRUE,
lambda_method = c("mc_exact", "mc_gaussian", "analytical"),
relax = FALSE,
mcp_gamma = 3,
scad_a = 3.7,
lambda = NULL,
lambda_alpha = 0.05,
lambda_n_simu = 2000,
tol = 1e-05,
path_length = 10L,
maxit = 1000
)Arguments
- X
Numeric design matrix of shape
(n, p), where rows are observations and columns are candidate predictors.Xmay be a matrix or a data frame coercible to a numeric matrix. It must contain at least two columns and noNA,NaN, or infinite values.- y
Response variable with length
n. For Gaussian and Gumbel models,ymust be numeric. For Binomial models,ymust contain only0and1. For Poisson and Exponential models,ymust be non-negative. Forfamily = "cox",ymust be a two-column numeric matrix(time, event), wheretimeis non-negative andeventis coded as0or1.- family
A character name determining the response distribution and the associated loss function used during optimization. See Details on family-specific losses below for the exact objective functions associated with each family. Default
"gaussian".- penalty
Penalty name: one of
"lasso","scad", or"mcp"(lowercase). Default"lasso". See pic_penalties for the precise form of each penalty and the role ofscad_a/mcp_gamma.- standardize
Whether to standardize columns of
Xto zero mean and unit variance prior to computing the PDB regularization parameter and fitting the model. Strongly recommended for proper PDB calibration. WhenTRUE, the optimization is performed on the standardized design matrix and the stored fitted coefficients therefore correspond to the standardized scale. Usecoef.pic()to recover coefficients on the original scale ofX. Default isTRUE. Standardization centersX; for non-Cox families the resulting column-mean shift is absorbed by the intercept, so combiningstandardize = TRUEwithintercept = FALSEassumes the data are already centered (seeintercept).- intercept
Whether to fit an unpenalized intercept. Forced to
FALSEfor the Cox family. We assume the data has been centered if intercept = FALSE.- lambda_method
One of
"mc_exact","mc_gaussian","analytical". Seelambda_pdb()for details. The default"mc_exact"is a safe choice for small to moderate designs; for very large \(n\) or \(p\),"mc_gaussian"matches"mc_exact"closely at a fraction of the cost (it amortises only one BLAS \(\tt gemm\), no family-specific residual draws), and"analytical"is closed-form.- relax
If
TRUE, run an unpenalized refit on the selected support after the regularization path (debiasing).- mcp_gamma
MCP concavity parameter when
penalty = "mcp". Default 3.0.- scad_a
SCAD concavity parameter when
penalty = "scad". Default 3.7.- lambda
Optional user-supplied regularization parameter. When
NULL,lambda_pdb()is used to select it automatically. Providinglambdaavoids recomputing the PDB calibration, which is useful for example when fitting several penalties ("lasso","scad","mcp") on the same dataset, since the PDB choice of \(\lambda\) does not depend on the penalty itself.- lambda_alpha
Nominal level for PDB. See
lambda_pdb().- lambda_n_simu
Number of Monte Carlo draws for PDB. See
lambda_pdb().- tol
FISTA convergence tolerance at the final path step.
- path_length
Number of points in the warm-start regularization path running from \(\lambda_{\max}\) down to \(\lambda^{\rm PDB}\). Default 10.
- maxit
Maximum number of iterations for FISTA if convergence is not yet reached.
Value
An object of class c("pic.<family>", "pic").
- beta
Fitted coefficients (length
p).- intercept
Fitted intercept or
NULL.- df
The number of nonzero coefficients.
- selected
Indices of the nonzero coefficients.
- lambda
\(\lambda\) used during training (PDB or user-supplied).
- family
The family object used.
- penalty
The penalty object used.
- lambda_pdb
Result from
lambda_pdb()if \(\lambda\) is not user-supplied.- n_samples
The number of observations in the training set.
- n_features
The number of variables in the training set.
For family = "cox", the fit additionally carries:
- baseline_cumulative_hazard
Estimated Breslow baseline cumulative hazard function.
- baseline_survival
Estimated baseline survival function.
- unique_times
Sorted unique event times used in the baseline estimation.
- censoring_rate
Percentage of censored observations in the training set.
Details
Minimises the objective function
$$\hat\beta = \arg\min_\beta\; L(\beta)
\;+\; \lambda_\alpha^{\rm PDB}\,\mathrm{pen}(\beta)
\quad\text{with}\quad
L(\beta) = \phi\!\left(\ell_n\!\left(g(\beta)\right)\right),$$
where the regularization parameter is selected automatically using
the Pivotal Detection Boundary (PDB) principle implemented in
lambda_pdb(). The PDB choice is not just a substitute for
cross-validation: by calibrating \(\lambda\) against a pivotal
null-distribution statistic rather than out-of-sample prediction
error, it yields sharper support recovery - fewer false positives
on noise variables and more accurate identification of the true
non-zero coefficients. Here, \(\phi\) and \(g\) are
family-specific transformations chosen so that the gradient at the
null has a distribution free of the nuisance parameter, which is
precisely what makes the use of \(\lambda^{\rm PDB}_\alpha\)
valid. Optimization is performed using a warm-started FISTA
regularization path.
Details on family-specific losses
"gaussian"Uses the mean squared error for \(\ell_n\), with \(\phi(\cdot) = \sqrt{\cdot}\) and \(g(\cdot) = \mathrm{Id}\), giving the square-root lasso loss $$L(\beta) = \sqrt{\frac{1}{n}\sum_{i=1}^n \left(y_i - (\beta_0 + \mathbf{x}_i^\top\beta)\right)^2}.$$ This makes the gradient at \(\beta = 0\) scale-free, so the PDB threshold needs no estimate of the noise standard deviation.
"binomial"A variance-stabilized transformation of the Bernoulli likelihood. With \(\phi(\cdot) = \mathrm{Id}\) and the logistic link \(\theta_i = g(\eta_i) = (1 + e^{-\eta_i})^{-1}\), the loss is $$L(\beta) = \frac{1}{n}\sum_{i=1}^n \left(2 y_i \sqrt{\frac{1 - \theta_i}{\theta_i}} + 2 (1 - y_i) \sqrt{\frac{\theta_i}{1 - \theta_i}}\right).$$ The classical logistic link is preserved; only the loss itself is modified to obtain a pivotal gradient at the null.
"poisson"The same pivotalization applied to the Poisson likelihood. With \(\phi(\cdot) = \mathrm{Id}\) and the canonical log link \(\theta_i = g(\eta_i) = e^{\eta_i}\), the loss is $$L(\beta) = \frac{1}{n}\sum_{i=1}^n \left(\frac{2 y_i}{\sqrt{\theta_i}} + 2 \sqrt{\theta_i}\right).$$ The canonical log link is kept unchanged; pivotality is obtained through the loss rather than through the link.
"exponential"Uses the standard Exponential negative log-likelihood directly (no transformation). With \(\phi(\cdot) = \mathrm{Id}\) and \(\theta_i = g(\eta_i) = e^{\eta_i}\), the loss is $$L(\beta) = \frac{1}{n}\sum_{i=1}^n \left(\log\theta_i + \frac{y_i}{\theta_i}\right).$$
"gumbel"A location-scale model with extreme-value noise. The base log-likelihood is $$\ell_n(\theta, \sigma) = \log(\sigma) + \frac{1}{n}\sum_{i=1}^n \left(z_i + e^{-z_i}\right), \qquad z_i = \frac{y_i - \theta_i}{\sigma}.$$ With \(\phi(\cdot) = \exp(\cdot)\) and the identity link \(\theta_i = g(\eta_i) = \eta_i\), the objective is \(L(\beta, \sigma) = \exp(\ell_n(\theta, \sigma))\). The scale \(\sigma\) is re-estimated internally by maximum likelihood at every iteration; the user does not supply it.
"cox"A square-root-transformed Cox partial log-likelihood. With \(\phi(\cdot) = \sqrt{\cdot}\) and the identity link \(\theta_i = g(\eta_i) = \eta_i\), the objective is $$L(\beta) = \sqrt{-\frac{1}{n}\left( \sum_{i=1}^n \delta_i \eta_i - \sum_{i=1}^n \delta_i \log\left(\sum_{j \in R_i} e^{\eta_j}\right)\right)},$$ where \(\delta_i\) is the event indicator and \(R_i\) the risk set at event time \(t_i\). The Breslow approximation is used for tied event times, and the model is always fitted without an intercept.
Examples
data(QuickStartExample)
X <- QuickStartExample$X
y <- QuickStartExample$y
fit <- pic(X, y, family = "gaussian", penalty = "scad")
fit$beta
#> [,1]
#> [1,] -0.6371864
#> [2,] 0.0000000
#> [3,] 0.0000000
#> [4,] 0.0000000
#> [5,] -0.2308360
#> [6,] 0.0000000
#> [7,] 0.0000000
#> [8,] 0.0000000
#> [9,] 0.0000000
#> [10,] -1.1585369
#> [11,] 0.0000000
#> [12,] 0.0000000
#> [13,] 0.0000000
#> [14,] 0.0000000
#> [15,] 0.0000000
#> [16,] 0.0000000
#> [17,] -0.9759369
#> [18,] 0.0000000
#> [19,] 0.0000000
#> [20,] 0.0000000
#> [21,] 0.0000000
#> [22,] 0.0000000
#> [23,] 0.0000000
#> [24,] 0.0000000
#> [25,] 0.2894805
#> [26,] 0.0000000
#> [27,] 0.0000000
#> [28,] 0.0000000
#> [29,] 0.0000000
#> [30,] 0.0000000
fit$selected
#> [1] "gene_1" "gene_2" "gene_3" "gene_4" "gene_5"