Imputing Missing Data
This page describes routines that deal with missing data.
When parsing a selection in Excel, any value that cannot be coerced to a number is interpreted as NaN and is therefore treated as a missing value.
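The coercion rule can be illustrated with a small Python sketch. It assumes, purely for illustration, that anything Python's float() accepts counts as numeric; the add-in's actual parser may differ. The helper name to_numeric_matrix is hypothetical.

```python
import numpy as np

def to_numeric_matrix(cells):
    """Coerce a 2-D list of cell values to a float matrix; non-numeric cells become NaN."""
    def coerce(v):
        try:
            return float(v)
        except (TypeError, ValueError):
            # Blank cells (None) and text such as "n/a" cannot be coerced.
            return np.nan
    return np.array([[coerce(v) for v in row] for row in cells], dtype=float)

X = to_numeric_matrix([[1, "2.5", None], ["n/a", 4, 5]])
print(X)
```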
Method: Value
Description
The user supplies a value, which is filled into every missing element.
Returns
- Matrix with the missing values filled.
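A minimal NumPy sketch of value filling follows. The function name impute_value is hypothetical, and the sketch returns a new matrix rather than modifying the input.

```python
import numpy as np

def impute_value(X, value):
    """Return a copy of X with every NaN replaced by `value`."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isnan(X), value, X)

X = np.array([[1.0, np.nan],
              [np.nan, 4.0]])
filled = impute_value(X, 0.0)
```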
Method: Column Mean
Description
Fills each missing element with the mean of its column, calculated from the column's non-missing elements.
Returns
- Matrix with the missing values filled.
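A short NumPy sketch of column-mean imputation is shown below. It assumes each column's mean is taken over that column's non-missing (non-NaN) entries; the function name impute_column_mean is hypothetical.

```python
import numpy as np

def impute_column_mean(X):
    """Replace each NaN with the mean of the non-missing values in its column."""
    X = np.asarray(X, dtype=float)
    # NaNs are ignored when averaging. A column that is entirely NaN has no
    # mean, so it stays NaN (NumPy also emits a RuntimeWarning).
    col_means = np.nanmean(X, axis=0)
    # col_means has one entry per column and broadcasts across the rows.
    return np.where(np.isnan(X), col_means, X)

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
filled = impute_column_mean(X)
```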
Method: LOCF
Description
Returns a matrix in which each missing value is replaced by the last valid observation preceding it (last observation carried forward).
Returns
- Matrix with the missing values filled.
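The sketch below shows one way to carry observations forward with NumPy. It assumes rows are observations in order and that values are carried down each column; it also assumes that NaNs appearing before a column's first valid value are left as NaN, since there is nothing to carry forward. The function name impute_locf is hypothetical.

```python
import numpy as np

def impute_locf(X):
    """Carry the last valid (non-NaN) value forward down each column."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    # Row index of each valid cell; 0 as a placeholder for missing cells.
    idx = np.where(~np.isnan(X), np.arange(n_rows)[:, None], 0)
    # Running maximum gives, for every cell, the row of the latest valid value at or above it.
    idx = np.maximum.accumulate(idx, axis=0)
    # NaNs before a column's first valid value stay NaN (row 0 is itself NaN there).
    return X[idx, np.arange(n_cols)]

X = np.array([[1.0, np.nan],
              [np.nan, 2.0],
              [3.0, np.nan]])
result = impute_locf(X)
```

Using a running maximum of row indices avoids an explicit Python loop over rows.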
Method: Regress
Description
Returns a matrix in which the NaN values have been imputed by iterative regression.
We use deterministic regression imputation rather than stochastic regression imputation: each missing value is replaced by its predicted value, with no random residual added.
There are three methods available:
- OLS
- Ridge ($0 \leq \alpha \leq 1$)
- Lasso ($0 \leq \alpha \leq 1$, $\lambda > 0$)
Lasso standardizes the data before fitting.
Generally speaking, we do not recommend including an intercept in your data when using Ridge or Lasso regression, since the penalty would also shrink the intercept's coefficient. If you do include one, make sure it suits your use case.
Returns
- Matrix with the missing values imputed.
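To illustrate iterative deterministic regression imputation, here is a sketch of the OLS case only. It is not the add-in's implementation. Several choices are assumptions: missing entries start at column means, each column with missing values is regressed on all other columns in turn, and iteration continues until the imputed values stop changing. The function name impute_regress_ols and the max_iter and tol parameters are hypothetical. Consistent with the text, no intercept is added automatically; one is present only if the data contain a constant column.

```python
import numpy as np

def impute_regress_ols(X, max_iter=20, tol=1e-8):
    """Deterministic iterative regression imputation with OLS (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: column means, so every predictor column is fully populated.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        previous = filled.copy()
        for j in range(X.shape[1]):
            rows_missing = missing[:, j]
            if not rows_missing.any():
                continue  # nothing to impute in this column
            # Predictors are all other columns, using their current imputed values.
            # No intercept is added here: one is present only if the data
            # itself contains a constant column.
            A = np.delete(filled, j, axis=1)
            observed = ~rows_missing
            coef, *_ = np.linalg.lstsq(A[observed], X[observed, j], rcond=None)
            # Deterministic imputation: use the fitted prediction, no random noise.
            filled[rows_missing, j] = A[rows_missing] @ coef
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled

# Columns: constant (intercept), x, y, where y = 2x + 1.
X = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 3.0],
    [1.0, 2.0, np.nan],
    [1.0, 3.0, 7.0],
    [1.0, np.nan, 9.0],
])
result = impute_regress_ols(X)
```

Ridge or Lasso variants would replace the np.linalg.lstsq step with a penalized fit. That step is not shown here because the exact roles of $\alpha$ and $\lambda$ in this tool are not spelled out above.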
Method: Remove Rows with NaN values
Description
Returns a matrix from which every row containing at least one NaN value has been removed.
Returns
- Matrix with no missing values.
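A one-step NumPy sketch of dropping incomplete rows follows. The function name remove_nan_rows is hypothetical.

```python
import numpy as np

def remove_nan_rows(X):
    """Keep only the rows of X that contain no NaN values."""
    X = np.asarray(X, dtype=float)
    keep = ~np.isnan(X).any(axis=1)  # True for rows with no missing entries
    return X[keep]

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, 5.0]])
result = remove_nan_rows(X)
```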