Regression

\[ Y = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]

\[ \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & X_{12} & \dots & X_{1p} \\ 1 & X_{21} & X_{22} & \dots & X_{2p} \\ 1 & X_{31} & X_{32} & \dots & X_{3p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & X_{n2} & \dots & X_{np} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \vdots \\ \epsilon_n \end{bmatrix} \]

Beta

\[ \boldsymbol{\hat{\beta}} = (\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{Y} \]
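The estimate can be computed directly from these normal equations. A minimal sketch in R, with a simulated design matrix standing in for real data:

```r
set.seed(42)

# simulate a small regression problem
n <- 100; p <- 3
X <- cbind(1, matrix(rnorm(n * p), n, p))  # design matrix with intercept column
beta <- c(2, 0.5, -1, 3)                   # true coefficients
y <- X %*% beta + rnorm(n)

# beta-hat = (X'X)^{-1} X'y, solved as a linear system rather than
# by explicitly inverting X'X
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() gives the same answer via a more numerically stable QR decomposition
coef(lm(y ~ X[, -1]))
```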

Curse of Dimensionality

Solutions

  • Weakly Informative Priors
  • Penalized Regression

Weakly Informative Priors

Bayes' Theorem

\[ \text{P}(A|B) = \frac{\text{P}(B|A)\text{P}(A)}{\text{P}(B)} \]

This follows from writing the joint probability both ways and dividing through by \(\text{P}(B)\):

\[ \text{P}(AB) = \text{P}(A)\text{P}(B|A) = \text{P}(B)\text{P}(A|B) \]
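A quick numeric check in R, with made-up numbers for a diagnostic-test example:

```r
# Bayes' theorem with made-up numbers: a 99% sensitive, 95% specific
# test for a condition with 1% prevalence
p_A <- 0.01              # P(A): prevalence of the condition
p_B_given_A <- 0.99      # P(B|A): positive test given the condition
p_B_given_notA <- 0.05   # P(B|~A): false positive rate

# P(B) via the law of total probability
p_B <- p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

p_B_given_A * p_A / p_B  # P(A|B), about 0.167
```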

In statistical notation, with parameter \(\theta\) and data \(x\):

\[ \pi(\theta | x) = \frac{f(x|\theta)\pi(\theta)}{m(x)} \]

where the marginal likelihood \(m(x)\) integrates the numerator over the parameter space:

\[ \pi(\theta|x) = \frac{f(x|\theta)\pi(\theta)}{\int_\theta f(x|\theta)\pi(\theta)d\theta} \]

Each piece has a name:

\[ \color{red}{\pi(\theta|x)} = \frac{\color{blue}{f(x|\theta)}\color{green}{\pi(\theta)}}{\color{gray}{\int_\theta f(x|\theta)\pi(\theta)d\theta}} \]

\[ \color{red}{\text{Posterior}} = \frac{\color{blue}{\text{Likelihood}} * \color{green}{\text{Prior}}}{\color{gray}{\text{Normalizing Constant}}} \]

Since the normalizing constant does not depend on \(\theta\):

\[ \color{red}{\text{Posterior}} \propto \color{blue}{\text{Likelihood}} * \color{green}{\text{Prior}} \]

Bayesian Regression

\[ Y \sim{} \text{N}(\boldsymbol{X}\boldsymbol{\beta}, \sigma) \]

\[ \beta \sim{} \text{cauchy}(l, s) \]
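A minimal sketch of fitting such a model with rstanarm (the formula and `housing` data are placeholders; a Cauchy scale of 2.5 is a common weakly informative default):

```r
library(rstanarm)

# Bayesian regression with weakly informative Cauchy priors on the
# coefficients; `price ~ sqft + units` and `housing` are placeholders
fit <- stan_glm(
    price ~ sqft + units, data = housing,
    family = gaussian(),
    prior = cauchy(location = 0, scale = 2.5),
    prior_intercept = cauchy(location = 0, scale = 10)
)

posterior_interval(fit, prob = 0.95)  # credible intervals for the coefficients
plot(fit)                             # coefficient plot built from the posterior
```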

Priors

Posterior

\[ \pi(\theta | x) \propto \frac{1}{\sigma}e^{\frac{-(x-\theta)^2}{2\sigma^2}} \dfrac{s}{\big(s^2 + (\theta - l)^2\big)} \]
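The normalizing constant has no closed form here, but the posterior's shape is easy to see with a grid approximation. A sketch in R, with arbitrary values for the data point and the prior's location and scale:

```r
# grid approximation of the posterior for a single observation x
x <- 3; sigma <- 1    # datum and known likelihood scale (arbitrary)
l <- 0; s <- 2.5      # Cauchy prior location and scale (arbitrary)

theta <- seq(-10, 10, length.out = 2000)   # grid over the parameter
likelihood <- dnorm(x, mean = theta, sd = sigma)
prior <- dcauchy(theta, location = l, scale = s)

kernel <- likelihood * prior                        # unnormalized posterior
posterior <- kernel / sum(kernel * diff(theta)[1])  # normalize numerically

plot(theta, posterior, type = "l", xlab = expression(theta), ylab = "density")
```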

Coefficient Posterior Density

Coefplot

Secret Weapon
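The "secret weapon" (Andrew Gelman's term) fits the same model to each subset of the data, one model per year for example, and plots all the coefficients side by side. A sketch using coefplot's multiplot, assuming a hypothetical `year` column in the placeholder `housing` data:

```r
library(coefplot)

# fit an identical model separately to each year's slice of the data
models <- lapply(split(housing, housing$year), function(d) {
    lm(price ~ sqft + units, data = d)
})

# draw every model's coefficients in one plot for comparison
do.call(multiplot, models)
```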

Monte Carlo
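These posteriors are explored by simulation rather than analytic integration. The core idea in miniature, estimating \(\pi\) from random points in the unit square:

```r
set.seed(1)

# draw uniform points in the unit square; the fraction landing inside
# the quarter circle approximates pi / 4
n <- 1e6
x <- runif(n); y <- runif(n)
4 * mean(x^2 + y^2 <= 1)   # roughly 3.14
```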

Stan

Stan, the probabilistic programming language, is named after Stanislaw Ulam, a co-inventor of the Monte Carlo method.

Penalized Regression

OLS

\[ \min_{\beta_{0},\beta \in \mathbb{R}^{p+1}} \left[\sum_{i=1}^N \left( y_i - \beta_0 -x_i^T\beta \right)^2 \right] \]

Penalized Regression

\[ \min_{\beta_{0},\beta \in \mathbb{R}^{p+1}} \left[\sum_{i=1}^N \left( y_i - \beta_0 -x_i^T\beta \right)^2 + \lambda \sum_{j=1}^p |\beta_j|^q \right] \]

Penalty Shapes

Most Common Penalties

  • Lasso (\(q = 1\)): \(|\beta_j|\)
  • Ridge (\(q = 2\)): \(\beta_j^2\)

Elastic Net

\[ \min_{\beta_{0},\beta \in \mathbb{R}^{p+1}} \left[ \frac{1}{2N} \sum_{i=1}^N \left( y_i - \beta_0 -x_i^T\beta \right)^2 + \lambda P_{\alpha} \left(\beta \right) \right] \] where \[ P_{\alpha} \left(\beta \right) = \left(1 - \alpha \right) \frac{1}{2}||\Gamma\beta||_{\mathit{l}_2}^2 + \alpha ||\Gamma\beta||_{\mathit{l}_1} \]
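This is the objective that glmnet fits: \(\alpha\) mixes the ridge (\(\alpha = 0\)) and lasso (\(\alpha = 1\)) penalties, and \(\lambda\) is typically chosen by cross-validation. A minimal sketch with simulated data standing in for a real design matrix:

```r
library(glmnet)
set.seed(42)

# simulated data: many predictors, few of them truly relevant
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1:5] %*% rnorm(5)) + rnorm(n)

# elastic net halfway between ridge and lasso, lambda by cross-validation
cv_fit <- cv.glmnet(X, y, alpha = 0.5, nfolds = 10)

plot(cv_fit)                      # CV error across the lambda sequence
coef(cv_fit, s = "lambda.1se")    # sparse coefficients at the chosen lambda
```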

Coefficient Path
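Continuing the sketch above, the path traces every coefficient as \(\lambda\) shrinks:

```r
# refit across the full lambda sequence and trace each coefficient's path
fit <- glmnet(X, y, alpha = 0.5)
plot(fit, xvar = "lambda", label = TRUE)
```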

Coefplot

Bayesian Interpretation of Penalized Regression

Bayesian Interpretation

\[ \text{Posterior Mode: } \min_{\beta_{0},\beta \in \mathbb{R}^{p+1}} \left[\sum_{i=1}^N \left( y_i - \beta_0 -x_i^T\beta \right)^2 + \lambda \sum_{j=1}^p |\beta_j|^q \right] \]

\[ \text{Negative log-prior: } \lambda \sum_{j=1}^p |\beta_j|^q \]
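To see why, take the negative log of the posterior kernel; maximizing the posterior then amounts to minimizing the penalized sum of squares, with constants absorbed into \(\lambda\):

\[ -\log \pi(\beta|y) = \underbrace{\frac{1}{2\sigma^2}\sum_{i=1}^N \left( y_i - \beta_0 - x_i^T\beta \right)^2}_{-\log \text{Likelihood}} + \underbrace{\lambda \sum_{j=1}^p |\beta_j|^q}_{-\log \text{Prior}} + \text{const} \]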

Priors

  • Ridge: penalty \(\beta_j^2\), prior \(\text{N}(\boldsymbol{0}, \frac{1}{2p} \boldsymbol{I}_p)\)
  • Lasso: penalty \(|\beta_j|\), prior \(\frac{\lambda}{2}e^{-\lambda |\beta|}\) (Laplace)

Priors

Benefits

  • Tighter Confidence (or Credible) Intervals
  • Stronger Predictive Power
  • Greater Interpretability

Further Reading

Jared P. Lander

The Tools