# Linear Models

This section describes multivariate regression analysis using a variety of methods. When the data are fat-tailed and cannot be transformed to approximate normality, the assumptions of ordinary least squares regression are violated; in such cases we can use robust regression or OLS with t-distributed errors. When the problem is instead multicollinearity or a proliferation of variables, Lasso and Ridge regression are available.

## OLS Regression

##### Description

Suppose we have the following system: $$ Y = X \beta $$

where

$$ X=\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} , \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} , Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} $$

We solve the quadratic minimization problem given by: $$ \hat{\beta} = \underset{\beta}{\operatorname{arg\,min}}\ F(\beta) $$

where the objective function $ F $ is given by: $$ F(\beta) = \sum_{i=1}^n \biggl| y_i - \sum_{j=1}^p X_{ij}\beta_j\biggr|^2 = \lVert y - X \beta \rVert^2 $$

We obtain: $$ \hat{\beta}= \left( X^{T} X \right)^{-1} X^{T} y $$

Since explicitly inverting $X^{T}X$ is both costly and numerically unstable, we use a QR decomposition to solve the system instead.
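A minimal numpy sketch of the QR route on synthetic data (the data, sizes, and coefficient values below are illustrative, not part of the package):

```python
import numpy as np

# Synthetic data: n = 100 observations, p = 3 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)

# QR route: X = QR reduces the normal equations X'X b = X'y
# to the triangular system R b = Q'y -- no explicit inverse needed.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# Normal-equations form, shown for comparison only (inverts X'X directly).
beta_ne = np.linalg.inv(X.T @ X) @ X.T @ y
```

Both routes give the same estimate; the QR solve avoids forming and inverting $X^{T}X$, which roughly squares the condition number of the problem.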

##### Returns

Main regression table, goodness-of-fit measures, and diagnostics:

- coef: coefficients
- serr: standard errors
- tstat: t-statistic
- pval: p-value
- rse: residual standard error
- dof: degrees of freedom
- rsq: r-squared
- rsqAdj: adjusted r-squared
- fStat: F-statistic
- fProb: p-value for model
- Resid: model residuals
- StResid: model standardized residuals
- HatDiag: hat diagonal
- DFFITS: studentized influence on predicted values
- DFBETAS: studentized influence on coefficients

## OLS with t-distributed errors

##### Description

We carry out a multivariate regression analysis, but assume that the model errors are t-distributed:

$$ y = \mathrm{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim t(\mu, \sigma, \nu) $$

An iterative method known as iteratively reweighted least squares (IRLS) is carried out until the estimates converge to an acceptable tolerance.

$$ \hat{\beta}^{(t+1)}=\bigl(X^{T}W^{(t)}X\bigr)^{-1}X^{T}W^{(t)}y $$

where $W^{(t)}$ is the diagonal matrix of observation weights at iteration $t$.
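A sketch of this IRLS scheme with the standard EM-style t weights $w_i = (\nu+1)/(\nu + u_i^2)$ for a scaled residual $u_i$; the degrees of freedom $\nu$, the crude scale estimate, and the tolerance below are illustrative choices, not the package's exact settings:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 3.0]) + rng.standard_t(df=3, size=200)

nu = 3.0                                       # assumed degrees of freedom
beta = np.linalg.lstsq(X, y, rcond=None)[0]    # OLS starting values
for _ in range(50):
    r = y - X @ beta
    sigma = np.sqrt(np.mean(r ** 2))           # crude scale estimate
    w = (nu + 1.0) / (nu + (r / sigma) ** 2)   # EM-style t weights
    W = np.diag(w)
    beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    if np.max(np.abs(beta_new - beta)) < 1e-8: # convergence tolerance
        beta = beta_new
        break
    beta = beta_new
```

Large residuals get weights near zero, so heavy-tailed observations are automatically downweighted relative to OLS.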

##### Returns

- coef: coefficients
- serr: standard errors
- tstat: t-statistic
- pval: p-value
- rse: residual standard error
- dof: degrees of freedom
- rsq: r-squared
- rsqAdj: adjusted r-squared
- fStat: F-statistic
- fProb: p-value for model
- Resid: model residuals
- StResid: model standardized residuals

## Robust Regression

##### Description

The purpose of robust regression methods is to dampen the influence of outliers in data by specifying a weight function. A number of these weight functions have been proposed. We implement four of these:

- Huber
- Andrews
- Ramsay
- Tukey

An iterative method known as iteratively reweighted least squares (IRLS) is carried out until the estimates converge to an acceptable tolerance.

$$ \hat{\beta}^{(t+1)}=\bigl(X^{T}W^{(t)}X\bigr)^{-1}X^{T}W^{(t)}y, \qquad W^{(t)} = \operatorname{diag}\bigl(w_{1}^{(t)}, \ldots, w_{n}^{(t)}\bigr) $$

where

$$ w_{i}^{(t)}= \begin{cases}\dfrac{\psi\bigl((y_{i}-x_{i}^{T}\beta^{(t)})/\hat{\tau}^{(t)}\bigr)}{(y_{i} - x_{i}^{T}\beta^{(t)})/\hat{\tau}^{(t)}} & \text{if } y_{i} \neq x_{i}^{T}\beta^{(t)} \\ 1 & \text{if } y_{i}=x_{i}^{T}\beta^{(t)} \end{cases} $$
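As a sketch with the Huber weight function $w(u) = \min(1, c/|u|)$ (which is $\psi(u)/u$ for the Huber $\psi$), on synthetic data with planted outliers; the tuning constant $c = 1.345$, the MAD scale estimate for $\hat{\tau}$, and the tolerance are conventional but illustrative choices:

```python
import numpy as np

def huber_weight(u, c=1.345):
    """Huber weight psi(u)/u: 1 inside the threshold, c/|u| outside."""
    au = np.abs(u)
    return np.where(au <= c, 1.0, c / np.maximum(au, 1e-12))

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([0.0, 2.0]) + rng.normal(size=100)
y[:5] += 15.0                                  # plant a few gross outliers

beta = np.linalg.lstsq(X, y, rcond=None)[0]    # OLS starting values
for _ in range(50):
    r = y - X @ beta
    tau = np.median(np.abs(r - np.median(r))) / 0.6745  # MAD scale estimate
    w = huber_weight(r / tau)
    # Weighted least squares step: (X'WX)^{-1} X'Wy with W = diag(w).
    beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    if np.max(np.abs(beta_new - beta)) < 1e-8:
        beta = beta_new
        break
    beta = beta_new
```

The planted outliers end up with small weights, so the fitted line stays close to the bulk of the data where an OLS fit would be pulled toward the outliers.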

##### Returns

- coef: coefficients
- serr: standard errors
- tstat: t-statistic
- pval: p-value
- rse: residual standard error
- dof: degrees of freedom
- rsq: r-squared
- rsqAdj: adjusted r-squared
- fStat: F-statistic
- fProb: p-value for model
- Resid: model residuals
- StResid: model standardized residuals

## Ridge Regression and CV

##### Description

Ridge regression is a form of penalized regression. The penalty is controlled by the parameter $\alpha \in [0, \infty)$ (written $\lambda$ in the equations below), with $\alpha = 0$ corresponding to no penalty, i.e. ordinary least squares. The penalty shrinks the coefficients and guards against any one variable having an outsized coefficient relative to the others.

$$ \hat{\beta}_{ridge} = (X^T X + \lambda I_p)^{-1} X^T Y $$

We solve for coefficients that minimize:

$$ \sum_{i=1}^n (y_i - \sum_{j=1}^p x_{ij}\beta_j)^2 + \lambda \sum_{j=1}^p \beta_j^2 $$
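A numpy sketch of the closed-form estimate on synthetic data (the data and the choice $\lambda = 1$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, 0.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

lam = 1.0
p = X.shape[1]
# Closed-form ridge estimate: (X'X + lambda * I)^{-1} X'y.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Unpenalized OLS fit, for comparison.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

For any $\lambda > 0$ the ridge solution has a strictly smaller $L_2$ norm than the OLS solution, and as $\lambda \to 0$ it recovers OLS.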

##### Returns

- coefficients
- MSE
- predicted values

##### Cross Validation Returns

- best $\alpha$
- best model MSE
- coefficients
- predicted values
- $\alpha$ path
- regularization path
- MSE grid

## Lasso Regression and CV

##### Description

Elastic Net combines $L_{1}$ and $L_{2}$ penalties. The two parameters that control this penalty are $\lambda \in [0, \infty)$ and $\alpha \in [0, 1]$.

$\alpha$ balances the Ridge and LASSO penalties, with $\alpha = 1$ giving pure LASSO and $\alpha = 0$ pure Ridge. We use coordinate descent for the updates.

$$ \min_{(\beta_0, \beta) \in \mathbb{R}^{p+1}}\frac{1}{2N} \sum_{i=1}^N (y_i -\beta_0-x_i^T \beta)^2+\lambda \left[ (1-\alpha)||\beta||_2^2/2 + \alpha||\beta||_1\right], $$

where $ \lambda \geq 0 $ is known as the complexity parameter and $ \alpha \in [0, 1] $ is the mixing parameter that balances the ridge and LASSO penalties.

If the cross validation option is checked, a 10-fold cross validation is performed. The $\lambda$ path used is generated using the method described by Friedman, Hastie, and Tibshirani [1].
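A naive coordinate-descent sketch of the objective above for a single fixed $\lambda$ (not the cross-validated path), using the soft-thresholding update $\beta_j = S(z_j, \lambda\alpha) / (\tfrac{1}{N}\sum_i x_{ij}^2 + \lambda(1-\alpha))$; the helper names, centered no-intercept setup, and fixed iteration count are illustrative choices, not the package's implementation:

```python
import numpy as np

def soft_threshold(z, g):
    """Soft-thresholding operator S(z, g) = sign(z) * max(|z| - g, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent on the elastic-net objective.

    Assumes y and the columns of X are centered (no intercept term)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).mean(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            z = (X[:, j] @ r) / n
            beta[j] = soft_threshold(z, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
    return beta

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=200)
y -= y.mean()

beta_lasso = elastic_net_cd(X, y, lam=0.5, alpha=1.0)  # alpha = 1: pure LASSO
```

With $\alpha = 1$ the soft threshold sets the irrelevant coefficients exactly to zero, while the active coefficients are shrunk toward zero by roughly $\lambda$.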

##### Returns

- $\alpha$
- $\lambda$
- MSE
- coefficients
- predicted values

##### Cross Validation Returns

- $\alpha = 1$
- best model $\lambda$
- best model MSE
- coefficients
- predicted values
- $\lambda$ path
- regularization path
- MSE grid

[1] Friedman, J., Hastie, T., and Tibshirani, R., "Regularization Paths for Generalized Linear Models via Coordinate Descent," *Journal of Statistical Software*, 33(1), January 2010.

## Least Absolute Deviation Regression

##### Description

Where OLS regression minimizes the squared $L_2$ norm of the residuals, least absolute deviation (LAD) regression minimizes the $L_1$ norm $ S = \sum_{i=1}^n |y_i - f(x_i)| $. We use iteratively reweighted least squares (IRLS) to solve this problem. Because LAD is more resistant to outliers than OLS, it is typically considered a form of robust regression.
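A sketch of the IRLS route to LAD, where each step is a weighted least squares fit with weights $w_i = 1/|r_i|$ (clamped away from zero for stability); the data, clamp value, and tolerance are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.laplace(size=100)
y[:3] += 50.0                                    # gross outliers

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS fit, for comparison
beta = beta_ols.copy()
for _ in range(100):
    r = y - X @ beta
    w = 1.0 / np.maximum(np.abs(r), 1e-8)        # IRLS weights for the L1 norm
    beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new
```

The three planted outliers shift the OLS intercept by roughly $3 \times 50 / 100 = 1.5$, while the LAD fit, like a median, is barely affected.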

##### Returns

- coef: coefficients
- serr: standard errors
- tstat: t-statistic
- pval: p-value
- rse: residual standard error
- dof: degrees of freedom
- rsq: r-squared
- rsqAdj: adjusted r-squared
- fStat: F-statistic
- fProb: p-value for model
- Resid: model residuals
- StResid: model standardized residuals