Sparse regression

June 23, 2016 — January 24, 2025

estimator distribution

functional analysis

linear algebra

model selection

probability

signal processing

sparser than thou

statistics

Suspiciously similar content

Penalised regression where the penalties are sparsifying. This is implicitly a sparse model selection procedure, where we are trying to estimate a model with fewer predictors than the original. The prediction losses could be anything — likelihood, least-squares, robust Huberized losses, absolute deviation, etc. Modern overviews of the whole field are, e.g. Desboulets (2018) and J. Fan and Lv (2010).

Why might be want to be sparse? Many reasons! We might want to be able to interpret the model more easily, we might want to compute it efficiently with constrained resources, we might want to compress it, we might think the real work truly is parismonious…

Mostly we implicitly assume a predictive loss minimisation framework in this notebook; Things look surprisingly different in Bayesian sparsity and even in frequentist Maximum Likelihood estimation.

Sparse regression in its most popular flavours is closely coupled to the clever optimisation methods which are used to make it go.

Interesting other concepts which connect: compressed sensing, matrix factorisations, multiple testing, concentration inequalities, …

1 LASSO

Linear or generalised linear regression under quadratic predictive loss penalty and absolute coefficient penalty.

In the classic form we estimate the regression coefficients \(\beta\) by solving

\[\begin{aligned} \hat{\beta} = \underset{\beta \in \mathbb{R}^p}{\text{argmin}} \: \frac{1}{2} \| y - {\bf X} \beta \|_2^2 + \lambda \| \beta \|_1, \end{aligned}\]

The penalty coefficient \(\lambda\) is left for you to choose, but one of the magical properties of the lasso is that it is easy to test many possible values of \(\lambda\) at low marginal cost.

Popular because, amongst other reasons, it turns out to be fast and convenient, and amenable to various performance accelerations e.g. aggressive approximate variable selection.

2 In Model Interpretation

Sparse regression as a universal classifier explainer?

Local Interpretable Model-agnostic Explanations (Ribeiro, Singh, and Guestrin 2016) is a sparse regression model that explains the predictions of a black-box classifier; see model interpretation.

3 Adaptive LASSO

🏗 This is the one with famous oracle properties if you choose \(\lambda\) correctly. Hsi Zou’s explanation (Zou 2006) is readable. I am having trouble digesting van de Geer’s paper (S. A. van de Geer 2008) on the Generalised Lasso, but it seems to offer me guarantees for something very similar to the Adaptive Lasso, with more general assumptions on the model and loss functions, and some finite sample guarantees.

4 LARS

A confusing one; LASSO and LARS are not the same thing but you can use one to calculate the other? Something like that? I need to work this one through with pencil and paper.

5 Graph LASSO

As used in graphical models. 🏗

6 Elastic net

Combination of \(L_1\) and \(L_2\) penalties. 🏗

7 Grouped LASSO

AFAICT this is the usual LASSO but with grouped factors. See (Yuan and Lin 2006).

8 Relaxed Lasso

🏗

9 Dantzig Selector

🏗

10 Garotte

🏗

11 Degrees-of-freedom penalties

See degrees of freedom.

12 FOCI

FOCI, a sparse model selection procedure (Azadkia and Chatterjee 2019) based on Chatterjee’s ξ statistic as an independence test. (Chatterjee 2020). Looks interesting.

13 FRISO

FRISO (Tucker, Wu, and Müller 2023) is a sparse model selection procedure for functional data.

14 Stability selection

🏗

For now see mplot for an introduction.

15 Debiased LASSO

There exist a few versions, but the one I have needed is (S. A. van de Geer 2008), section 2.1. See also (S. van de Geer 2014b). (🏗 relation to (S. A. van de Geer 2008)?)

16 Sparse basis expansions

Wavelets etc; mostly handled under sparse dictionary bases.

17 Sparse neural nets

That is, sparse regressions as the layers in a neural network? Sure thing. (Wisdom et al. 2016)

18 Other coefficient penalties

Put a weird penalty on the coefficients! E.g. “Smoothly Clipped Absolute Deviation” (SCAD). 🏗

19 Other prediction losses

Put a weird penalty on the error! MAD prediction penalty, lasso-coefficient penalty, etc.

See (H. Wang, Li, and Jiang 2007; Portnoy and Koenker 1997) for some implementations using e.g. maximum absolute prediction error.

20 Implementations

Hastie, Friedman et al.’s glmnet for R is fast and well-regarded, and has a MATLAB version. Here’s how to use it for adaptive lasso. Kenneth Tay has implemented elasticnet penalty for any GLM in glmnet.

SPAMS (C++, MATLAB, R, python) by Mairal looks interesting. It’s an optimisation library for many, many sparse problems.

liblinear also includes lasso-type solvers, as well as support-vector regression.

21 References

Abramovich, Benjamini, Donoho, et al. 2006. “Adapting to Unknown Sparsity by Controlling the False Discovery Rate.” The Annals of Statistics.

Aghasi, Nguyen, and Romberg. 2016. “Net-Trim: A Layer-Wise Convex Pruning of Deep Neural Networks.” arXiv:1611.05162 [Cs, Stat].

Aragam, Amini, and Zhou. 2015. “Learning Directed Acyclic Graphs with Penalized Neighbourhood Regression.” arXiv:1511.08963 [Cs, Math, Stat].

Azadkia, and Chatterjee. 2019. “A Simple Measure of Conditional Dependence.” arXiv:1910.12327 [Cs, Math, Stat].

Azizyan, Krishnamurthy, and Singh. 2015. “Extreme Compressive Sampling for Covariance Estimation.” arXiv:1506.00898 [Cs, Math, Stat].

Bach. 2009. “Model-Consistent Sparse Estimation Through the Bootstrap.” arXiv:0901.3202 [Cs, Stat].

Bach, Jenatton, and Mairal. 2011. Optimization With Sparsity-Inducing Penalties. Foundations and Trends(r) in Machine Learning 1.0.

Bahmani, and Romberg. 2014. “Lifting for Blind Deconvolution in Random Mask Imaging: Identifiability and Convex Relaxation.” arXiv:1501.00046 [Cs, Math, Stat].

Banerjee, Arindam, Chen, Fazayeli, et al. 2014. “Estimation with Norm Regularization.” In Advances in Neural Information Processing Systems 27.

Banerjee, Onureena, Ghaoui, and d’Aspremont. 2008. “Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data.” Journal of Machine Learning Research.

Barber, and Candès. 2015. “Controlling the False Discovery Rate via Knockoffs.” The Annals of Statistics.

Barbier. 2015. “Statistical Physics and Approximate Message-Passing Algorithms for Sparse Linear Estimation Problems in Signal Processing and Coding Theory.” arXiv:1511.01650 [Cs, Math].

Baron, Sarvotham, and Baraniuk. 2010. “Bayesian Compressive Sensing via Belief Propagation.” IEEE Transactions on Signal Processing.

Barron, Cohen, Dahmen, et al. 2008. “Approximation and Learning by Greedy Algorithms.” The Annals of Statistics.

Barron, Huang, Li, et al. 2008. “MDL, Penalized Likelihood, and Statistical Risk.” In Information Theory Workshop, 2008. ITW’08. IEEE.

Battiti. 1992. “First-and Second-Order Methods for Learning: Between Steepest Descent and Newton’s Method.” Neural Computation.

Bayati, and Montanari. 2012. “The LASSO Risk for Gaussian Matrices.” IEEE Transactions on Information Theory.

Bellec, and Tsybakov. 2016. “Bounds on the Prediction Error of Penalized Least Squares Estimators with Convex Penalty.” arXiv:1609.06675 [Math, Stat].

Belloni, Chernozhukov, and Wang. 2011. “Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming.” Biometrika.

Berk, Brown, Buja, et al. 2013. “Valid Post-Selection Inference.” The Annals of Statistics.

Bertin, Pennec, and Rivoirard. 2011. “Adaptive Dantzig Density Estimation.” Annales de l’Institut Henri Poincaré, Probabilités Et Statistiques.

Bertsimas, King, and Mazumder. 2016. “Best Subset Selection via a Modern Optimization Lens.” The Annals of Statistics.

Bertsimas, Pauphilet, and Van Parys. 2020. “Rejoinder: Sparse Regression: Scalable Algorithms and Empirical Performance.”

Bian, Chen, and Ye. 2014. “Complexity Analysis of Interior Point Algorithms for Non-Lipschitz and Nonconvex Minimization.” Mathematical Programming.

Bien, Gaynanova, Lederer, et al. 2018. “Non-Convex Global Minimization and False Discovery Rate Control for the TREX.” Journal of Computational and Graphical Statistics.

Bloniarz, Liu, Zhang, et al. 2015. “Lasso Adjustments of Treatment Effect Estimates in Randomized Experiments.” arXiv:1507.03652 [Math, Stat].

Bondell, Krishna, and Ghosh. 2010. “Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models.” Biometrics.

Borgs, Chayes, Cohn, et al. 2014. “An \(L^p\) Theory of Sparse Graph Convergence I: Limits, Sparse Random Graph Models, and Power Law Distributions.” arXiv:1401.2906 [Math].

Bottou, Curtis, and Nocedal. 2016. “Optimization Methods for Large-Scale Machine Learning.” arXiv:1606.04838 [Cs, Math, Stat].

Breiman. 1995. “Better Subset Regression Using the Nonnegative Garrote.” Technometrics.

Bruckstein, Elad, and Zibulevsky. 2008. “On the Uniqueness of Nonnegative Sparse Solutions to Underdetermined Systems of Equations.” IEEE Transactions on Information Theory.

Brunton, Proctor, and Kutz. 2016. “Discovering Governing Equations from Data by Sparse Identification of Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences.

Bühlmann, and van de Geer. 2011. “Additive Models and Many Smooth Univariate Functions.” In Statistics for High-Dimensional Data. Springer Series in Statistics.

———. 2015. “High-Dimensional Inference in Misspecified Linear Models.” arXiv:1503.06426 [Stat].

Bu, and Lederer. 2017. “Integrating Additional Knowledge Into Estimation of Graphical Models.” arXiv:1704.02739 [Stat].

Bunea, Tsybakov, and Wegkamp. 2007a. “Sparsity Oracle Inequalities for the Lasso.” Electronic Journal of Statistics.

Bunea, Tsybakov, and Wegkamp. 2007b. “Sparse Density Estimation with ℓ1 Penalties.” In Learning Theory. Lecture Notes in Computer Science.

Candès, and Davenport. 2011. “How Well Can We Estimate a Sparse Vector?” arXiv:1104.5246 [Cs, Math, Stat].

Candès, Fan, Janson, et al. 2016. “Panning for Gold: Model-Free Knockoffs for High-Dimensional Controlled Variable Selection.” arXiv Preprint arXiv:1610.02351.

Candès, and Fernandez-Granda. 2013. “Super-Resolution from Noisy Data.” Journal of Fourier Analysis and Applications.

Candès, and Plan. 2010. “Matrix Completion With Noise.” Proceedings of the IEEE.

Candès, Romberg, and Tao. 2006. “Stable Signal Recovery from Incomplete and Inaccurate Measurements.” Communications on Pure and Applied Mathematics.

Candès, Wakin, and Boyd. 2008. “Enhancing Sparsity by Reweighted ℓ 1 Minimization.” Journal of Fourier Analysis and Applications.

Carmi. 2013. “Compressive System Identification: Sequential Methods and Entropy Bounds.” Digital Signal Processing.

———. 2014. “Compressive System Identification.” In Compressed Sensing & Sparse Filtering. Signals and Communication Technology.

Cevher, Duarte, Hegde, et al. 2009. “Sparse Signal Recovery Using Markov Random Fields.” In Advances in Neural Information Processing Systems.

Chartrand, and Yin. 2008. “Iteratively Reweighted Algorithms for Compressive Sensing.” In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008.

Chatterjee. 2020. “A New Coefficient of Correlation.” arXiv:1909.10140 [Math, Stat].

Chen, Xiaojun. 2012. “Smoothing Methods for Nonsmooth, Nonconvex Minimization.” Mathematical Programming.

Chen, Y., and Hero. 2012. “Recursive ℓ1,∞ Group Lasso.” IEEE Transactions on Signal Processing.

Chen, Minhua, Silva, Paisley, et al. 2010. “Compressive Sensing on Manifolds Using a Nonparametric Mixture of Factor Analyzers: Algorithm and Performance Bounds.” IEEE Transactions on Signal Processing.

Chen, Yen-Chi, and Wang. n.d. “Discussion on ‘Confidence Intervals and Hypothesis Testing for High-Dimensional Regression’.”

Chernozhukov, Chetverikov, Demirer, et al. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal.

Chernozhukov, Hansen, Liao, et al. 2018. “Inference For Heterogeneous Effects Using Low-Rank Estimations.” arXiv:1812.08089 [Math, Stat].

Chernozhukov, Newey, and Singh. 2018. “Learning L2 Continuous Regression Functionals via Regularized Riesz Representers.” arXiv:1809.05224 [Econ, Math, Stat].

Chetverikov, Liao, and Chernozhukov. 2016. “On Cross-Validated Lasso.” arXiv:1605.02214 [Math, Stat].

Chichignoud, Lederer, and Wainwright. 2014. “A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees.” arXiv:1410.0247 [Math, Stat].

Dai, and Barber. 2016. “The Knockoff Filter for FDR Control in Group-Sparse and Multitask Regression.” arXiv Preprint arXiv:1602.03589.

Daneshmand, Gomez-Rodriguez, Song, et al. 2014. “Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-Thresholding Algorithm.” In ICML.

Desboulets. 2018. “A Review on Variable Selection in Regression Analysis.” Econometrics.

Descloux, and Sardy. 2018. “Model Selection with Lasso-Zero: Adding Straw to the Haystack to Better Find Needles.” arXiv:1805.05133 [Stat].

Diaconis, and Freedman. 1984. “Asymptotics of Graphical Projection Pursuit.” The Annals of Statistics.

Dossal, Kachour, Fadili, et al. 2011. “The Degrees of Freedom of the Lasso for General Design Matrix.” arXiv:1111.1162 [Cs, Math, Stat].

Efron, Hastie, Johnstone, et al. 2004. “Least Angle Regression.” The Annals of Statistics.

El Karoui. 2008. “Operator Norm Consistent Estimation of Large Dimensional Sparse Covariance Matrices.” University of California, Berkeley.

Elhamifar, and Vidal. 2013. “Sparse Subspace Clustering: Algorithm, Theory, and Applications.” IEEE Transactions on Pattern Analysis and Machine Intelligence.

Engebretsen, and Bohlin. 2019. “Statistical Predictions with Glmnet.” Clinical Epigenetics.

Ewald, and Schneider. 2015. “Confidence Sets Based on the Lasso Estimator.” arXiv:1507.05315 [Math, Stat].

Fan, Rong-En, Chang, Hsieh, et al. 2008. “LIBLINEAR: A Library for Large Linear Classification.” Journal of Machine Learning Research.

Fan, Jianqing, and Li. 2001. “Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association.

Fan, Jianqing, and Lv. 2010. “A Selective Overview of Variable Selection in High Dimensional Feature Space.” Statistica Sinica.

Flynn, Hurvich, and Simonoff. 2013. “Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models.” arXiv:1302.2068 [Stat].

Foygel, and Srebro. 2011. “Fast-Rate and Optimistic-Rate Error Bounds for L1-Regularized Regression.” arXiv:1108.0373 [Math, Stat].

Freijeiro-González, Febrero-Bande, and González-Manteiga. 2022. “A Critical Review of LASSO and Its Derivatives for Variable Selection Under Dependence Among Covariates.” International Statistical Review.

Friedman, Hastie, Höfling, et al. 2007. “Pathwise Coordinate Optimization.” The Annals of Applied Statistics.

Friedman, Hastie, and Tibshirani. 2008. “Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics.

Fu, and Zhou. 2013. “Learning Sparse Causal Gaussian Networks With Experimental Intervention: Regularization and Coordinate Descent.” Journal of the American Statistical Association.

Gasso, Rakotomamonjy, and Canu. 2009. “Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming.” IEEE Transactions on Signal Processing.

Ghadimi, and Lan. 2013a. “Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming.” SIAM Journal on Optimization.

———. 2013b. “Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming.” arXiv:1310.3787 [Math].

Girolami. 2001. “A Variational Method for Learning Sparse and Overcomplete Representations.” Neural Computation.

Giryes, Sapiro, and Bronstein. 2014. “On the Stability of Deep Networks.” arXiv:1412.5896 [Cs, Math, Stat].

Greenhill, Isaev, Kwan, et al. 2016. “The Average Number of Spanning Trees in Sparse Graphs with Given Degrees.” arXiv:1606.01586 [Math].

Gu, Fu, and Zhou. 2014. “Adaptive Penalized Estimation of Directed Acyclic Graphs From Categorical Data.” arXiv:1403.2310 [Stat].

Gui, and Li. 2005. “Penalized Cox Regression Analysis in the High-Dimensional and Low-Sample Size Settings, with Applications to Microarray Gene Expression Data.” Bioinformatics.

Gupta, and Pensky. 2016. “Solution of Linear Ill-Posed Problems Using Random Dictionaries.” arXiv:1605.07913 [Math, Stat].

Hallac, Leskovec, and Boyd. 2015. “Network Lasso: Clustering and Optimization in Large Graphs.” arXiv:1507.00280 [Cs, Math, Stat].

Hall, Jin, and Miller. 2014. “Feature Selection When There Are Many Influential Features.” Bernoulli.

Hall, and Xue. 2014. “On Selecting Interacting Features from High-Dimensional Data.” Computational Statistics & Data Analysis.

Hansen, Reynaud-Bouret, and Rivoirard. 2015. “Lasso and Probabilistic Inequalities for Multivariate Point Processes.” Bernoulli.

Hastie, Trevor J., Tibshirani, Rob, and Wainwright. 2015. Statistical Learning with Sparsity: The Lasso and Generalizations.

Hastie, Trevor, Tibshirani, and Tibshirani. 2017. “Extended Comparisons of Best Subset Selection, Forward Stepwise Selection, and the Lasso.”

Hawe, Kleinsteuber, and Diepold. 2013. “Analysis Operator Learning and Its Application to Image Reconstruction.” IEEE Transactions on Image Processing.

Hebiri, and van de Geer. 2011. “The Smooth-Lasso and Other ℓ1+ℓ2-Penalized Methods.” Electronic Journal of Statistics.

Hegde, and Baraniuk. 2012. “Signal Recovery on Incoherent Manifolds.” IEEE Transactions on Information Theory.

Hegde, Indyk, and Schmidt. 2015. “A Nearly-Linear Time Framework for Graph-Structured Sparsity.” In Proceedings of the 32nd International Conference on Machine Learning (ICML-15).

He, Rish, and Parida. 2014. “Transductive HSIC Lasso.” In Proceedings of the 2014 SIAM International Conference on Data Mining. Proceedings.

Hesterberg, Choi, Meier, et al. 2008. “Least Angle and ℓ1 Penalized Regression: A Review.” Statistics Surveys.

Hirose, Tateishi, and Konishi. 2011. “Efficient Algorithm to Select Tuning Parameters in Sparse Regression Modeling with Regularization.” arXiv:1109.2411 [Stat].

Hormati, Roy, Lu, et al. 2010. “Distributed Sampling of Signals Linked by Sparse Filtering: Theory and Applications.” IEEE Transactions on Signal Processing.

Hsieh, Sustik, Dhillon, et al. 2014. “QUIC: Quadratic Approximation for Sparse Inverse Covariance Estimation.” Journal of Machine Learning Research.

Huang, Cheang, and Barron. 2008. “Risk of Penalized Least Squares, Greedy Selection and L1 Penalization for Flexible Function Libraries.”

Hu, Pehlevan, and Chklovskii. 2014. “A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization.” In 2014 48th Asilomar Conference on Signals, Systems and Computers.

Ishwaran, and Rao. 2005. “Spike and Slab Variable Selection: Frequentist and Bayesian Strategies.” The Annals of Statistics.

Janková, and van de Geer. 2016. “Confidence Regions for High-Dimensional Generalized Linear Models Under Sparsity.” arXiv:1610.01353 [Math, Stat].

Janson, Fithian, and Hastie. 2015. “Effective Degrees of Freedom: A Flawed Metaphor.” Biometrika.

Javanmard, and Montanari. 2014. “Confidence Intervals and Hypothesis Testing for High-Dimensional Regression.” Journal of Machine Learning Research.

Jung. 2013. “An RKHS Approach to Estimation with Sparsity Constraints.” In Advances in Neural Information Processing Systems 29.

Kabán. 2014. “New Bounds on Compressive Linear Least Squares Regression.” In Journal of Machine Learning Research.

Kato. 2009. “On the Degrees of Freedom in Shrinkage Estimation.” Journal of Multivariate Analysis.

Kim, Kwon, and Choi. 2012. “Consistent Model Selection Criteria on High Dimensions.” Journal of Machine Learning Research.

Koltchinskii. 2011. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Mathematics École d’Été de Probabilités de Saint-Flour 2033.

Koppel, Warnell, Stump, et al. 2016. “Parsimonious Online Learning with Kernels via Sparse Projections in Function Space.” arXiv:1612.04111 [Cs, Stat].

Kowalski, and Torrésani. 2009. “Structured Sparsity: From Mixed Norms to Structured Shrinkage.” In SPARS’09-Signal Processing with Adaptive Sparse Structured Representations.

Krämer, Schäfer, and Boulesteix. 2009. “Regularized Estimation of Large-Scale Gene Association Networks Using Graphical Gaussian Models.” BMC Bioinformatics.

Lambert-Lacroix, and Zwald. 2011. “Robust Regression Through the Huber’s Criterion and Adaptive Lasso Penalty.” Electronic Journal of Statistics.

Lam, and Fan. 2009. “Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.” Annals of Statistics.

Langford, Li, and Zhang. 2009. “Sparse Online Learning via Truncated Gradient.” In Advances in Neural Information Processing Systems 21.

Lederer, and Vogt. 2020. “Estimating the Lasso’s Effective Noise.” arXiv:2004.11554 [Stat].

Lee, Sun, Sun, et al. 2013. “Exact Post-Selection Inference, with Application to the Lasso.” arXiv:1311.6238 [Math, Stat].

Lemhadri, Ruan, Abraham, et al. 2021. “LassoNet: A Neural Network with Feature Sparsity.” Journal of Machine Learning Research.

Li, and Lederer. 2019. “Tuning Parameter Calibration for ℓ1-Regularized Logistic Regression.” Journal of Statistical Planning and Inference.

Lim, and Lederer. 2016. “Efficient Feature Selection With Large and High-Dimensional Data.” arXiv:1609.07195 [Stat].

Lockhart, Taylor, Tibshirani, et al. 2014. “A Significance Test for the Lasso.” The Annals of Statistics.

Lu, Goldberg, and Fine. 2012. “On the Robustness of the Adaptive Lasso to Model Misspecification.” Biometrika.

Lundberg, and Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Advances in Neural Information Processing Systems.

Mahoney. 2016. “Lecture Notes on Spectral Graph Methods.” arXiv Preprint arXiv:1608.04845.

Mairal. 2015. “Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning.” SIAM Journal on Optimization.

Matabuena. 2025. “Variable Selection Methods for Multivariate, Functional, and Complex Biomedical Data in the AI Age.”

Mazumder, Friedman, and Hastie. 2009. “SparseNet: Coordinate Descent with Non-Convex Penalties.”

Meier, van de Geer, and Bühlmann. 2008. “The Group Lasso for Logistic Regression.” Group.

Meinshausen, and Bühlmann. 2006. “High-Dimensional Graphs and Variable Selection with the Lasso.” The Annals of Statistics.

Meinshausen, and Yu. 2009. “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data.” The Annals of Statistics.

Molchanov, Ashukha, and Vetrov. 2017. “Variational Dropout Sparsifies Deep Neural Networks.” In Proceedings of ICML.

Montanari. 2012. “Graphical Models Concepts in Compressed Sensing.” Compressed Sensing: Theory and Applications.

Mousavi, and Baraniuk. 2017. “Learning to Invert: Signal Recovery via Deep Convolutional Networks.” In ICASSP.

Müller, and van de Geer. 2015. “Censored Linear Model in High Dimensions: Penalised Linear Regression on High-Dimensional Data with Left-Censored Response Variable.” TEST.

Naik, and Tsai. 2001. “Single‐index Model Selections.” Biometrika.

Nam, and Gribonval. 2012. “Physics-Driven Structured Cosparse Modeling for Source Localization.” In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Needell, and Tropp. 2008. “CoSaMP: Iterative Signal Recovery from Incomplete and Inaccurate Samples.” arXiv:0803.2392 [Cs, Math].

Nesterov. 2012. “Gradient Methods for Minimizing Composite Functions.” Mathematical Programming.

Neville, Ormerod, and Wand. 2014. “Mean Field Variational Bayes for Continuous Sparse Signal Shrinkage: Pitfalls and Remedies.” Electronic Journal of Statistics.

Ngiam, Chen, Bhaskar, et al. 2011. “Sparse Filtering.” In Advances in Neural Information Processing Systems 24.

Nickl, and van de Geer. 2013. “Confidence Sets in Sparse Regression.” The Annals of Statistics.

Oymak, Jalali, Fazel, et al. 2013. “Noisy Estimation of Simultaneously Structured Models: Limitations of Convex Relaxation.” In 2013 IEEE 52nd Annual Conference on Decision and Control (CDC).

Peleg, Eldar, and Elad. 2010. “Exploiting Statistical Dependencies in Sparse Representations for Signal Recovery.” IEEE Transactions on Signal Processing.

Portnoy, and Koenker. 1997. “The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators.” Statistical Science.

Pouget-Abadie, and Horel. 2015. “Inferring Graphs from Cascades: A Sparse Recovery Framework.” In Proceedings of The 32nd International Conference on Machine Learning.

Pourahmadi. 2011. “Covariance Estimation: The GLM and Regularization Perspectives.” Statistical Science.

Qian, and Yang. 2012. “Model Selection via Standard Error Adjusted Adaptive Lasso.” Annals of the Institute of Statistical Mathematics.

Qin, Scheinberg, and Goldfarb. 2013. “Efficient Block-Coordinate Descent Algorithms for the Group Lasso.” Mathematical Programming Computation.

Rahimi, and Recht. 2009. “Weighted Sums of Random Kitchen Sinks: Replacing Minimization with Randomization in Learning.” In Advances in Neural Information Processing Systems.

Ravikumar, Wainwright, Raskutti, et al. 2011. “High-Dimensional Covariance Estimation by Minimizing ℓ1-Penalized Log-Determinant Divergence.” Electronic Journal of Statistics.

Ravishankar, Saiprasad, and Bresler. 2015. “Efficient Blind Compressed Sensing Using Sparsifying Transforms with Convergence Guarantees and Application to MRI.” arXiv:1501.02923 [Cs, Stat].

Ravishankar, S., and Bresler. 2015. “Sparsifying Transform Learning With Efficient Optimal Updates and Convergence Guarantees.” IEEE Transactions on Signal Processing.

Reynaud-Bouret. 2003. “Adaptive Estimation of the Intensity of Inhomogeneous Poisson Processes via Concentration Inequalities.” Probability Theory and Related Fields.

Reynaud-Bouret, and Schbath. 2010. “Adaptive Estimation for Hawkes Processes; Application to Genome Analysis.” The Annals of Statistics.

Ribeiro, Singh, and Guestrin. 2016. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” In Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’16.

Rish, and Grabarnik. 2014. “Sparse Signal Recovery with Exponential-Family Noise.” In Compressed Sensing & Sparse Filtering. Signals and Communication Technology.

Rish, and Grabarnik. 2015. Sparse Modeling: Theory, Algorithms, and Applications. Chapman & Hall/CRC Machine Learning & Pattern Recognition Series.

Ročková, and George. 2018. “The Spike-and-Slab LASSO.” Journal of the American Statistical Association.

Sashank J. Reddi, Suvrit Sra, Barnabás Póczós, et al. 1995. “Stochastic Frank-Wolfe Methods for Nonconvex Optimization.”

Schelldorfer, Bühlmann, and van de Geer. 2011. “Estimation for High-Dimensional Linear Mixed-Effects Models Using ℓ1-Penalization.” Scandinavian Journal of Statistics.

Semenova, Rudin, and Parr. 2021. “A Study in Rashomon Curves and Volumes: A New Perspective on Generalization and Model Simplicity in Machine Learning.” arXiv:1908.01755 [Cs, Stat].

Shen, and Huang. 2006. “Optimal Model Assessment, Selection, and Combination.” Journal of the American Statistical Association.

Shen, Huang, and Ye. 2004. “Adaptive Model Selection and Assessment for Exponential Family Distributions.” Technometrics.

Shen, and Ye. 2002. “Adaptive Model Selection.” Journal of the American Statistical Association.

She, and Owen. 2010. “Outlier Detection Using Nonconvex Penalized Regression.”

Simon, Friedman, Hastie, et al. 2011. “Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent.” Journal of Statistical Software.

Smith, Forte, Jordan, et al. 2015. “L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework.” arXiv:1512.04011 [Cs].

Soh, and Chandrasekaran. 2017. “A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers.” arXiv:1701.01207 [Cs, Math, Stat].

Soltani, and Hegde. 2016. “Demixing Sparse Signals from Nonlinear Observations.” Statistics.

Starck, Elad, and Donoho. 2005. “Image Decomposition via the Combination of Sparse Representations and a Variational Approach.” IEEE Transactions on Image Processing.

Stine. 2004. “Discussion of ‘Least Angle Regression’ by Efron Et Al.” The Annals of Statistics.

Su, Bogdan, and Candès. 2015. “False Discoveries Occur Early on the Lasso Path.” arXiv:1511.01957 [Cs, Math, Stat].

Taddy. 2013. “One-Step Estimator Paths for Concave Regularization.” arXiv:1308.5623 [Stat].

Tarr, Müller, and Welsh. 2018. “Mplot: An R Package for Graphical Model Stability and Variable Selection Procedures.” Journal of Statistical Software.

Thisted. 1997. “[The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error Versus Absolute-Error Estimators]: Comment.” Statistical Science.

Thrampoulidis, Abbasi, and Hassibi. 2015. “LASSO with Non-Linear Measurements Is Equivalent to One With Linear Measurements.” In Advances in Neural Information Processing Systems 28.

Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological).

———. 2011. “Regression Shrinkage and Selection via the Lasso: A Retrospective.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Tibshirani, Ryan J. 2014. “A General Framework for Fast Stagewise Algorithms.” arXiv:1408.5801 [Stat].

Trofimov, and Genkin. 2015. “Distributed Coordinate Descent for L1-Regularized Logistic Regression.” In Analysis of Images, Social Networks and Texts. Communications in Computer and Information Science 542.

———. 2016. “Distributed Coordinate Descent for Generalized Linear Models with Regularization.” arXiv:1611.02101 [Cs, Stat].

Tropp, and Wright. 2010. “Computational Methods for Sparse Solution of Linear Inverse Problems.” Proceedings of the IEEE.

Tschannen, and Bölcskei. 2016. “Noisy Subspace Clustering via Matching Pursuits.” arXiv:1612.03450 [Cs, Math, Stat].

Tucker, Wu, and Müller. 2023. “Variable Selection for Global Fréchet Regression.” Journal of the American Statistical Association.

Uematsu. 2015. “Penalized Likelihood Estimation in High-Dimensional Time Series Models and Its Application.” arXiv:1504.06706 [Math, Stat].

Unser, Michael A., and Tafti. 2014. An Introduction to Sparse Stochastic Processes.

Unser, M., Tafti, Amini, et al. 2014. “A Unified Formulation of Gaussian Vs Sparse Stochastic Processes - Part II: Discrete-Domain Theory.” IEEE Transactions on Information Theory.

Unser, M., Tafti, and Sun. 2014. “A Unified Formulation of Gaussian Vs Sparse Stochastic Processes—Part I: Continuous-Domain Theory.” IEEE Transactions on Information Theory.

Geer, Sara van de. 2007. “The Deterministic Lasso.”

Geer, Sara A. van de. 2008. “High-Dimensional Generalized Linear Models and the Lasso.” The Annals of Statistics.

Geer, Sara van de. 2014a. “Weakly Decomposable Regularization Penalties and Structured Sparsity.” Scandinavian Journal of Statistics.

———. 2014b. “Worst Possible Sub-Directions in High-Dimensional Models.” In arXiv:1403.7023 [Math, Stat].

———. 2014c. “Statistical Theory for High-Dimensional Models.” arXiv:1409.8557 [Math, Stat].

———. 2016. Estimation and Testing Under Sparsity. Lecture Notes in Mathematics.

Geer, Sara van de, Bühlmann, Ritov, et al. 2014. “On Asymptotically Optimal Confidence Regions and Tests for High-Dimensional Models.” The Annals of Statistics.

Geer, Sara A. van de, Bühlmann, and Zhou. 2011. “The Adaptive and the Thresholded Lasso for Potentially Misspecified Models (and a Lower Bound for the Lasso).” Electronic Journal of Statistics.

Veitch, and Roy. 2015. “The Class of Random Graphs Arising from Exchangeable Random Measures.” arXiv:1512.03099 [Cs, Math, Stat].

Wahba. 1990. Spline Models for Observational Data.

Wang, Zhangyang, Chang, Ling, et al. 2016. “Stacked Approximated Regression Machine: A Simple Deep Learning Approach.” In.

Wang, L., Gordon, and Zhu. 2006. “Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning.” In Sixth International Conference on Data Mining (ICDM’06).

Wang, Hansheng, Li, and Jiang. 2007. “Robust Regression Shrinkage and Consistent Variable Selection Through the LAD-Lasso.” Journal of Business & Economic Statistics.

Wang, Gao, Sarkar, Carbonetto, et al. 2020. “A Simple New Approach to Variable Selection in Regression, with Application to Genetic Fine Mapping.” Journal of the Royal Statistical Society Series B: Statistical Methodology.

Wasserman, and Roeder. 2009. “High-Dimensional Variable Selection.” Annals of Statistics.

Wisdom, Powers, Pitton, et al. 2016. “Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery.” In Advances in Neural Information Processing Systems 29.

Woodworth, and Chartrand. 2015. “Compressed Sensing Recovery via Nonconvex Shrinkage Penalties.” arXiv:1504.02923 [Cs, Math].

Wright, Nowak, and Figueiredo. 2009. “Sparse Reconstruction by Separable Approximation.” IEEE Transactions on Signal Processing.

Wu, Yichao. 2021. “Can’t Ridge Regression Perform Variable Selection?” Technometrics.

Wu, Tong Tong, and Lange. 2008. “Coordinate Descent Algorithms for Lasso Penalized Regression.” The Annals of Applied Statistics.

Xu, Caramanis, and Mannor. 2010. “Robust Regression and Lasso.” IEEE Transactions on Information Theory.

———. 2012. “Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem.” IEEE Transactions on Pattern Analysis and Machine Intelligence.

Yaghoobi, Nam, Gribonval, et al. 2012. “Noise Aware Analysis Operator Learning for Approximately Cosparse Signals.” In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Yang, and Xu. 2013. “A Unified Robust Regression Model for Lasso-Like Algorithms.” In ICML (3).

Yoshida, and West. 2010. “Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing.” Journal of Machine Learning Research.

Yuan, and Lin. 2006. “Model Selection and Estimation in Regression with Grouped Variables.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

———. 2007. “Model Selection and Estimation in the Gaussian Graphical Model.” Biometrika.

Yun, and Toh. 2009. “A Coordinate Gradient Descent Method for ℓ 1-Regularized Convex Minimization.” Computational Optimization and Applications.

Zhang, Cun-Hui. 2010. “Nearly Unbiased Variable Selection Under Minimax Concave Penalty.” The Annals of Statistics.

Zhang, Yiyun, Li, and Tsai. 2010. “Regularization Parameter Selections via Generalized Information Criterion.” Journal of the American Statistical Association.

Zhang, Lijun, Yang, Jin, et al. 2015. “Sparse Learning for Large-Scale and High-Dimensional Data: A Randomized Convex-Concave Optimization Approach.” arXiv:1511.03766 [Cs].

Zhang, Cun-Hui, and Zhang. 2014. “Confidence Intervals for Low Dimensional Parameters in High Dimensional Linear Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Zhao, Tuo, Liu, and Zhang. 2018. “Pathwise Coordinate Optimization for Sparse Learning: Algorithm and Theory.” The Annals of Statistics.

Zhao, Peng, Rocha, and Yu. 2006. “Grouped and Hierarchical Model Selection Through Composite Absolute Penalties.”

———. 2009. “The Composite Absolute Penalties Family for Grouped and Hierarchical Variable Selection.” The Annals of Statistics.

Zhao, Peng, and Yu. 2006. “On Model Selection Consistency of Lasso.” Journal of Machine Learning Research.

Zhou, Tao, and Wu. 2011. “Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction.” Data Mining and Knowledge Discovery.

Zou. 2006. “The Adaptive Lasso and Its Oracle Properties.” Journal of the American Statistical Association.

Zou, and Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Zou, Hastie, and Tibshirani. 2007. “On the ‘Degrees of Freedom’ of the Lasso.” The Annals of Statistics.

Zou, and Li. 2008. “One-Step Sparse Estimates in Nonconcave Penalized Likelihood Models.” The Annals of Statistics.