Linear Models

This section describes multivariate regression analysis using a variety of methods. When the data are fat-tailed or cannot be transformed to approximate normality, the assumptions of ordinary least squares (OLS) regression are violated; in such cases we can use robust regression or OLS with t-distributed errors (OLS-t). When the problem is instead multicollinearity or variable proliferation, Lasso and Ridge regression are available.


OLS Regression

Linear Model OLS

Description

Suppose we have the following system: $$ Y = X \beta $$

where

$$ X=\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} , \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} , Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} $$

We solve the quadratic minimization problem given by: $$ \hat{\beta} = \underset{\beta}{\arg\min}\ F(\beta) $$

where the objective function $ F $ is given by: $$ F(\beta) = \sum_{i=1}^n \biggl( y_i - \sum_{j=1}^p X_{ij}\beta_j \biggr)^2 = \lVert y - X \beta \rVert^2 $$

We obtain: $$ \hat{\beta} = \left( X^{T} X \right)^{-1} X^{T} y $$

Explicitly forming and inverting $X^{T} X$ is numerically unstable (it squares the condition number of $X$), so we instead solve the least-squares problem via a QR decomposition of $X$.
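
As an illustration, here is a minimal NumPy sketch of the QR approach; the simulated data and variable names are ours, not part of the implementation.

```python
import numpy as np

# Simulated design matrix with an intercept column and two predictors.
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Thin QR factorization: X = Q R, with Q orthonormal and R upper triangular.
Q, R = np.linalg.qr(X)
# The normal equations reduce to the triangular system R beta = Q^T y,
# which avoids forming or inverting X^T X.
beta_hat = np.linalg.solve(R, Q.T @ y)
```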

Returns

Main regression table and model metrics:

[Figure: Linear Model OLS output 1]

Goodness of fit measures and diagnostics:

[Figure: Linear Model OLS output 2]

  • coef: coefficients
  • serr: standard errors
  • tstat: t-statistic
  • pval: p-value
  • rse: residual standard error
  • dof: degrees of freedom
  • rsq: r-squared
  • rsqAdj: adjusted r-squared
  • fStat: F-statistic
  • fProb: p-value for model
  • Resid: model residuals
  • StResid: model standardized residuals
  • HatDiag: hat diagonal
  • DFFITS: studentized influence on predicted values
  • DFBETAS: studentized influence on coefficients

OLS with t-distributed errors

Linear Model OLST

Description

We carry out a multivariate regression analysis, but assume that the model errors follow a Student's t distribution:

$$ y = X \beta + \epsilon, \qquad \epsilon \sim t(\mu, \sigma, \nu) $$

An iterative method known as iteratively reweighted least squares (IRLS) is carried out until the estimates converge to an acceptable tolerance.

$$ \hat{\beta}^{(t+1)} = \left( X^{T} (W^{(t)})^{-1} X \right)^{-1} X^{T} (W^{(t)})^{-1} y $$
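
For concreteness, here is a minimal IRLS sketch for the t-error model, assuming the degrees of freedom $\nu$ are held fixed and the scale is re-estimated from the residuals at each step; a full implementation may also update $\nu$ and $\sigma$.

```python
import numpy as np

def olst_irls(X, y, nu=4.0, tol=1e-8, max_iter=100):
    """IRLS for regression with t-distributed errors (nu held fixed).

    Uses the EM-style weights w_i = (nu + 1) / (nu + (r_i / sigma)^2),
    which downweight observations with large residuals.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting values
    for _ in range(max_iter):
        r = y - X @ beta
        sigma2 = np.mean(r ** 2)                 # crude scale estimate
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)
        sw = np.sqrt(w)                          # weighted least-squares step
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```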

Returns
  • coef: coefficients
  • serr: standard errors
  • tstat: t-statistic
  • pval: p-value
  • rse: residual standard error
  • dof: degrees of freedom
  • rsq: r-squared
  • rsqAdj: adjusted r-squared
  • fStat: F-statistic
  • fProb: p-value for model
  • Resid: model residuals
  • StResid: model standardized residuals

Robust Regression

Description

Robust regression methods dampen the influence of outliers in the data by specifying a weight function. Many such weight functions have been proposed; we implement four of them:

  • Huber
  • Andrews
  • Ramsay
  • Tukey

An iterative method known as iteratively reweighted least squares (IRLS) is carried out until the estimates converge to an acceptable tolerance.

$$ \hat{\beta}^{(t+1)} = \left( X^{T} (W^{(t)})^{-1} X \right)^{-1} X^{T} (W^{(t)})^{-1} y $$

where

$$ w_{i}^{(t)} = \begin{cases} \dfrac{\psi\left( (y_{i} - x_{i}^{T}\beta^{(t)}) / \hat{\tau}^{(t)} \right)}{(y_{i} - x_{i}^{T}\beta^{(t)}) / \hat{\tau}^{(t)}} & \text{if } y_{i} \neq x_{i}^{T}\beta^{(t)} \\ 1 & \text{if } y_{i} = x_{i}^{T}\beta^{(t)} \end{cases} $$
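
As a sketch, the Huber variant of this scheme looks as follows; the tuning constant $c = 1.345$ and the MAD-based estimate of $\hat{\tau}$ are conventional choices we assume here, not necessarily those of the implementation.

```python
import numpy as np

def huber_weights(u, c=1.345):
    """Huber weight psi(u)/u: 1 for |u| <= c, c/|u| otherwise."""
    w = np.ones_like(u)
    big = np.abs(u) > c
    w[big] = c / np.abs(u[big])
    return w

def robust_irls(X, y, c=1.345, tol=1e-8, max_iter=100):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting values
    for _ in range(max_iter):
        r = y - X @ beta
        # Robust scale estimate: median absolute deviation, scaled for
        # consistency with the normal distribution.
        tau = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        w = huber_weights(r / tau, c)
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```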

Returns
  • coef: coefficients
  • serr: standard errors
  • tstat: t-statistic
  • pval: p-value
  • rse: residual standard error
  • dof: degrees of freedom
  • rsq: r-squared
  • rsqAdj: adjusted r-squared
  • fStat: F-statistic
  • fProb: p-value for model
  • Resid: model residuals
  • StResid: model standardized residuals

Ridge Regression and CV

Description

Ridge regression is a form of penalized regression. The penalty is controlled by the parameter $\lambda \geq 0$: $\lambda = 0$ corresponds to ordinary least squares (no penalty), and larger values shrink the coefficients more strongly. The penalty prevents any single variable from having an outsized coefficient compared to the others.

We solve for coefficients that minimize:

$$ \sum_{i=1}^n \biggl( y_i - \sum_{j=1}^p x_{ij}\beta_j \biggr)^2 + \lambda \sum_{j=1}^p \beta_j^2 $$

which has the closed-form solution:

$$ \hat{\beta}_{ridge} = (X^T X + \lambda I_p)^{-1} X^T Y $$
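
A minimal sketch of the closed-form solution, assuming the predictors have been standardized and $y$ centered so the (unpenalized) intercept can be handled separately:

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge coefficients (X^T X + lam I)^{-1} X^T y via a linear solve."""
    p = X.shape[1]
    # Solve the regularized normal equations directly; the ridge term
    # lam * I keeps the system well-conditioned even when X^T X is not.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```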

Returns
  • coefficients
  • MSE
  • predicted values
Cross Validation Returns
  • best $\lambda$
  • best model MSE
  • coefficients
  • predicted values
  • $\lambda$ path
  • regularization path
  • MSE grid

Lasso Regression and CV

Elastic Net combines $L_{1}$ and $L_{2}$ penalties. The two parameters that control this penalty are $\lambda \in [0, \infty)$ and $\alpha \in [0, 1]$.

$\alpha$ balances Ridge and LASSO penalties with $\alpha = 1$ being LASSO. We use coordinate descent for the updates.

$$ \min_{(\beta_0, \beta) \in \mathbb{R}^{p+1}}\frac{1}{2N} \sum_{i=1}^N (y_i -\beta_0-x_i^T \beta)^2+\lambda \left[ (1-\alpha)||\beta||_2^2/2 + \alpha||\beta||_1\right], $$

where $ \lambda \geq 0 $ is known as the complexity parameter and $ \alpha \in [0, 1] $ is the elastic-net mixing parameter ($\alpha = 0$ gives Ridge, $\alpha = 1$ gives LASSO).

If the cross-validation option is checked, a 10-fold cross validation is performed. The $\lambda$ path is generated using the method described by Friedman, Hastie, and Tibshirani [1].
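
To illustrate the coordinate-descent update, here is a naive sketch for a single $(\lambda, \alpha)$ pair; it assumes standardized predictors and a centered response, and omits the active-set and warm-start strategies described in [1].

```python
import numpy as np

def soft_threshold(z, g):
    """S(z, g) = sign(z) * max(|z| - g, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def enet_coord_descent(X, y, lam, alpha=1.0, n_iter=200):
    """Naive coordinate descent for the elastic net objective above.

    Assumes each column of X has mean 0 and variance 1 and y has mean 0,
    so no intercept term is needed.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that excludes feature j's current contribution.
            r_j = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r_j / n
            # Soft-threshold for the L1 part, shrink for the L2 part.
            beta[j] = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
    return beta
```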

Returns
  • $\alpha$
  • $\lambda$
  • MSE
  • coefficients
  • predicted values
Cross Validation Returns
  • $\alpha$ (fixed at 1 for LASSO)
  • best model $\lambda$
  • best model MSE
  • coefficients
  • predicted values
  • $\lambda$ path
  • regularization path
  • MSE grid

[1] Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22.


Least Absolute Deviation Regression

Description

Where OLS regression minimizes the $L_2$ norm of the residuals, least absolute deviation (LAD) regression minimizes their $L_1$ norm, $ S = \sum_{i=1}^n |y_i - f(x_i)| $. We use iteratively reweighted least squares (IRLS) to solve this problem. LAD is more resistant to outliers than OLS and is therefore typically considered a form of robust regression.
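
A minimal IRLS sketch for LAD, using weights $w_i = 1 / \max(|r_i|, \epsilon)$, where $\epsilon$ guards against division by zero; the tolerances and starting values are our assumptions.

```python
import numpy as np

def lad_irls(X, y, tol=1e-8, max_iter=100, eps=1e-6):
    """IRLS for least absolute deviation regression.

    Minimizing sum |r_i| is equivalent to weighted least squares with
    weights 1/|r_i|, re-estimated at each iteration.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting values
    for _ in range(max_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```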

Returns
  • coef: coefficients
  • serr: standard errors
  • tstat: t-statistic
  • pval: p-value
  • rse: residual standard error
  • dof: degrees of freedom
  • rsq: r-squared
  • rsqAdj: adjusted r-squared
  • fStat: F-statistic
  • fProb: p-value for model
  • Resid: model residuals
  • StResid: model standardized residuals