--- title: "LAVA: Recovering Sums of Sparse and Dense Signals" subtitle: "The lava() and group_lava() estimators in quadrupen" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{lava} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 ) ``` ## Motivation Classical sparse estimators such as the Lasso assume that the true coefficient vector $\theta$ is itself sparse: only a few entries are nonzero. This assumption fails when the signal has a **mixed structure** — a small number of large, identifiable effects superimposed on a broad background of many small effects. In this case neither the Lasso (too sparse) nor ridge regression (too dense) is well-suited. The **LAVA** estimator (Chernozhukov, Hansen & Liao, *Annals of Statistics*, 2017) explicitly decomposes the coefficient vector as $$\theta = \beta + \delta,$$ where $\beta$ is the **sparse** component (penalized by $\lambda_1 \|\beta\|_1$) and $\delta$ is the **dense** component (penalized by $\frac{\lambda_2}{2} \delta^\top S\, \delta$). The joint criterion is $$\min_{\theta = \beta + \delta} \frac{1}{2n} \|y - X(\beta + \delta)\|^2 + \lambda_1 \|\beta\|_1 + \frac{\lambda_2}{2}\, \delta^\top S\, \delta,$$ where $S$ is a positive semidefinite structuring matrix (identity by default). By varying $\lambda_1$ along a regularization path (with $\lambda_2$ fixed), **quadrupen** fits this model via `lava()` and returns an object of class `LavaFit` that gives access to both components separately. ## Setup ```{r setup, message = FALSE} library(quadrupen) library(ggplot2) ``` ### Simulation: a mixed sparse + dense signal We simulate a linear model where the true signal is the sum of a sparse part $\beta$ (five isolated large effects) and a dense part $\delta$ (small Gaussian perturbations on every predictor). Neither component alone characterizes the signal. ```{r simulation} set.seed(42) n <- 100 p <- 50 ## Sparse component: 5 large effects, the rest zero beta <- numeric(p) beta[c(5, 15, 25, 35, 45)] <- c(3, -3, 3, -3, 3) ## Dense component: small effects on all predictors delta <- rnorm(p, mean = 0, sd = 0.2) ## Total signal theta <- beta + delta ## Predictors with moderate AR(1) correlation rho <- 0.5 Sigma <- toeplitz(rho^(0:(p - 1))) X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma) y <- X %*% theta + rnorm(n, sd = 2) ``` ```{r plot-truth, fig.height = 5} df_truth <- data.frame( index = rep(1:p, 3), value = c(beta, delta, theta), component = factor( rep(c("Sparse (β)", "Dense (δ)", "Total (θ = β + δ)"), each = p), levels = c("Sparse (β)", "Dense (δ)", "Total (θ = β + δ)") ) ) ggplot(df_truth, aes(x = index, y = value, fill = component)) + geom_col() + facet_wrap(~ component, ncol = 1, scales = "free_y") + scale_fill_manual(values = c("#56B4E9", "#E69F00", "#009E73"), guide = "none") + labs(x = "Predictor index", y = "Value", title = "True signal decomposition") + theme_bw() ``` ## Fitting with `lava()` `lava()` takes the same interface as `sparse_lm()`. The `lambda2` argument controls the ridge-like penalty on the dense component. ```{r lava-fit} fit_lava <- lava(X, y, lambda2 = 1) fit_lava ``` The returned object is an R6 instance of class `LavaFit`, inheriting from `QuadrupenFit`. In addition to all the standard fields and methods (`$coefficients`, `$criteria()`, `$cross_validate()`, etc.), it exposes two component-specific fields: - `$sparse_coef`: the sparse part $\hat\beta$ — a $p \times |\lambda_1|$ matrix along the path. - `$dense_coef`: the dense part $\hat\delta = \hat\theta - \hat\beta$ — same dimensions. ## Regularization paths by component `$plot_path()` accepts a `component` argument (`"both"`, `"sparse"`, `"dense"`) to display each part of the decomposition independently. ```{r paths, fig.show='hold', out.width='33%'} fit_lava$plot_path(component = "both", labels = NULL) fit_lava$plot_path(component = "sparse", labels = NULL) fit_lava$plot_path(component = "dense", labels = NULL) ``` As $\lambda_1$ increases (right to left on the path): - The **sparse** path shows the usual Lasso-like variable selection: large effects enter early and small ones remain zero. - The **dense** path is smooth and non-zero everywhere: the ridge penalty ensures that $\hat\delta$ is distributed across all predictors. - The **total** path combines both: it never reaches exactly zero because the dense component picks up residual signal. ## Model Selection Let us plot the information criteria along the path. The BIC and eBIC select a model with a couple of large effects in the sparse part, while AIC selects a more complex model with more non-zero entries in the sparse component. mBIC is more conservative, as expected. ```{r lava-criteria} fit_lava$information_criteria$plot(c("AIC", "BIC", "eBIC", "mBIC")) ``` ## Model extraction and decomposition `$get_model()` extracts the total estimated coefficient $\hat\theta$ at a selected $\lambda_1$. The corresponding sparse and dense components can be recovered from `$sparse_coef` and `$dense_coef` at the same index. ```{r extraction} idx <- fit_lava$get_model("BIC", type = "index") theta_hat <- fit_lava$get_model("BIC")[-1] # drop intercept beta_hat <- as.numeric(fit_lava$sparse_coef[, idx]) delta_hat <- as.numeric(fit_lava$dense_coef[, idx]) cat("Non-zero entries in sparse component (BIC):", sum(beta_hat != 0), "\n") cat("Non-zero entries in dense component (BIC): ", sum(delta_hat != 0), "\n") cat("Positions with |beta_hat| > 1:", which(abs(beta_hat) > 1), "\n") ``` The sparse component correctly locates the five large effects — those with `|beta_hat| > 1` match the true positions exactly. The dense component is non-zero everywhere, absorbing the background of small signals. ## Comparing estimators We compare LAVA against the Lasso and Elastic-net at their BIC-selected models. In the LAVA panel, bars are split into two layers drawn with `geom_col`: the **total** estimate $\hat\theta = \hat\beta + \hat\delta$ is drawn first (pink, dense component), and the **sparse** component $\hat\beta$ is overlaid on top (green). The remaining pink portion at the tips of active bars represents the dense contribution $\hat\delta$; the small pink bars at inactive positions are the dense signal alone. ```{r comparison-fit} fit_lasso <- lasso(X, y) fit_enet <- elastic_net(X, y, lambda2 = 1) ``` ```{r comparison-plot, fig.height = 6} get_coef <- function(fit) { b <- fit$get_model("BIC") if ("Intercept" %in% names(b)) b <- b[-1] as.numeric(b) } method_levels <- c("True signal", "Lasso (BIC)", "Elastic-net (BIC)", "LAVA (BIC)") ## Non-LAVA panels: one bar per predictor df_base <- rbind( data.frame(method = "True signal", index = 1:p, value = theta, fill_grp = "True signal"), data.frame(method = "Lasso (BIC)", index = 1:p, value = get_coef(fit_lasso), fill_grp = "Lasso"), data.frame(method = "Elastic-net (BIC)", index = 1:p, value = get_coef(fit_enet), fill_grp = "Elastic-net") ) df_base$method <- factor(df_base$method, levels = method_levels) ## LAVA panel: dense total drawn first, sparse component overlaid on top df_lava_dense <- data.frame( method = factor("LAVA (BIC)", levels = method_levels), index = 1:p, value = as.numeric(theta_hat), fill_grp = "LAVA dense (δ̂)" ) df_lava_sparse <- data.frame( method = factor("LAVA (BIC)", levels = method_levels), index = 1:p, value = beta_hat, fill_grp = "LAVA sparse (β̂)" ) fill_palette <- c( "True signal" = "grey20", "Lasso" = "#56B4E9", "Elastic-net" = "#E69F00", "LAVA dense (δ̂)" = "#CC79A7", "LAVA sparse (β̂)" = "#009E73" ) ggplot(mapping = aes(x = index, y = value, fill = fill_grp)) + geom_col(data = df_base) + geom_col(data = df_lava_dense) + geom_col(data = df_lava_sparse) + facet_wrap(~ method, ncol = 2) + scale_fill_manual( values = fill_palette, breaks = c("LAVA sparse (β̂)", "LAVA dense (δ̂)"), name = "LAVA decomposition" ) + labs(x = "Predictor index", y = "Estimated coefficient", title = "Coefficient recovery: Lasso vs. Elastic-net vs. LAVA") + theme_bw() + theme(legend.position = "bottom") ``` - The **Lasso** selects a handful of predictors but cannot represent the dense background. - The **Elastic-net** spreads a small weight across more predictors, but conflates the two structural components. - **LAVA** explicitly decomposes the signal: the green bars identify the large sparse effects precisely; the pink portions (small rims at active positions, full bars at inactive positions) capture the diffuse dense background. ## Cross-validation `$cross_validate()` works identically to the other `QuadrupenFit` subclasses. ```{r cv, message = FALSE} set.seed(42) fit_lava$cross_validate(K = 10, verbose = FALSE) fit_lava$plot(type = "crossval") ``` ```{r cv-model} set.seed(42) fit_lasso$cross_validate(K = 10, verbose = FALSE) fit_enet$cross_validate(K = 10, verbose = FALSE) idx_cv <- fit_lava$get_model("CV_min", type = "index") beta_cv <- as.numeric(fit_lava$sparse_coef[, idx_cv]) cat("Non-zero entries in sparse component (CV_min):", sum(beta_cv != 0), "\n") cat("Positions with |beta_cv| > 1:", which(abs(beta_cv) > 1), "\n") ``` --- ## Group LAVA: group-sparse + dense decomposition `group_lava()` extends LAVA by replacing the element-wise $\ell_1$ penalty on $\beta$ with a group-wise mixed norm $\Omega_g(\beta)$: $$\min_{\theta = \beta + \delta} \frac{1}{2n} \|y - X(\beta + \delta)\|^2 + \lambda_1\, \Omega_g(\beta) + \frac{\lambda_2}{2}\, \delta^\top S\, \delta.$$ This is appropriate when the sparse component has **group structure**: entire groups of predictors are either active or silent. The `type` argument selects the group norm ($\ell_1/\ell_2$ by default, or $\ell_1/\ell_\infty$). ### Simulation with a group-sparse signal ```{r group-simulation} ## 5 groups of 10 predictors; groups 2 and 4 are active in the sparse part group <- rep(1:5, each = 10) beta_g <- rep(c(0, 2, 0, -2, 0), each = 10) delta_g <- rnorm(p, mean = 0, sd = 0.2) theta_g <- beta_g + delta_g y2 <- X %*% theta_g + rnorm(n, sd = 2) group_names <- paste0("grp", 1:5) var_labels <- group_names[group] ``` ### Fitting ```{r group-lava-fit} fit_gl <- group_lava(X, y2, group = group, lambda2 = 1) fit_gl ``` ### Paths of the sparse and dense components ```{r group-paths, fig.show='hold', out.width='50%'} fit_gl$plot_path(component = "sparse", labels = var_labels) fit_gl$plot_path(component = "dense", labels = var_labels) ``` The sparse path shows entire groups entering the model simultaneously (Group Lasso behaviour), while the dense path remains diffuse and non-zero everywhere. ### Model selection For group selection problems, eBIC (which carries a stronger penalty of $\log(n) + 2\log(p)$) is generally more conservative than BIC and better adapted to identifying which groups are truly active. ### Group identification ```{r group-extraction} idx_ebic <- fit_gl$get_model("eBIC", type = "index") beta_g_hat <- as.numeric(fit_gl$sparse_coef[, idx_ebic]) active_groups <- unique(group[beta_g_hat != 0]) cat("Active groups in sparse component (eBIC):", group_names[active_groups], "\n") ``` --- ## Reference Chernozhukov, V., Hansen, C., & Liao, Y. (2017). A lava attack on the recovery of sums of dense and sparse signals. *The Annals of Statistics*, 45(1), 39–76.