Encoding

This page describes routines for encoding categorical data so that it can be fed to an algorithm for analysis.


Method: One-Hot


Description

One-Hot encoder for categorical variables. The ‘Numerical (optional)’ textbox is provided for convenience: any variables entered there (which should be numerical in type) are returned unchanged alongside the encoded matrix. This saves the user from having to copy and paste the numerical variables next to the encoded matrix manually.
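As an illustration of what the encoder produces, here is a minimal sketch in Python. It is not the routine's actual implementation: the function name `one_hot_encode`, the dict-of-columns input, and the hypothetical `column=value` naming scheme for new variables are assumptions. Each distinct value of each categorical variable becomes one new column, only the 1s are stored (mirroring a sparse result), and the numerical variables are passed through untouched.

```python
def one_hot_encode(categorical, numerical=None):
    """Hypothetical sketch of one-hot encoding with a numerical pass-through.

    categorical: dict of column name -> list of category values (equal lengths)
    numerical:   optional dict of column name -> list of numbers
    Returns (names, entries, numerical) where entries is a sparse
    {(row, col): 1} mapping and numerical is returned unchanged.
    """
    names = []    # one new variable name per distinct (column, value) pair
    index = {}    # (column, value) -> column position in the encoded matrix
    entries = {}  # sparse storage: only the 1s are recorded
    for col, values in categorical.items():
        for row, value in enumerate(values):
            key = (col, value)
            if key not in index:
                index[key] = len(names)
                # Assumed naming scheme "column=value" for the new variable.
                names.append(f"{col}={value}")
            entries[(row, index[key])] = 1
    # Numerical variables are not encoded, just handed back as-is.
    return names, entries, numerical


names, entries, num = one_hot_encode(
    {"color": ["red", "blue", "red"], "size": ["S", "S", "L"]},
    {"age": [30, 41, 25]},
)
print(names)  # ['color=red', 'color=blue', 'size=S', 'size=L']
```

Every row ends up with exactly one 1 per categorical variable, which is why storing only the nonzero entries keeps the result compact.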

Max Cols:

If the number of columns in the returned sparse matrix exceeds this number, the routine terminates and returns an error. This is a safeguard against numerical variables being passed in as categorical, which could create an extremely large result (every distinct number would become its own column).
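The safeguard can be sketched as a check performed before the matrix is built. This is an assumed illustration, not the routine's own code: the helper names `count_encoded_columns` and `check_max_cols` are hypothetical, and raising `ValueError` stands in for "return with an error". The column count is simply the number of distinct values summed over all categorical variables.

```python
def count_encoded_columns(categorical):
    # One-hot encoding creates one column per distinct value of each variable.
    return sum(len(set(values)) for values in categorical.values())


def check_max_cols(categorical, max_cols):
    # Hypothetical pre-check: stop before building an oversized matrix.
    n_cols = count_encoded_columns(categorical)
    if n_cols > max_cols:
        raise ValueError(
            f"encoding would create {n_cols} columns, exceeding Max Cols = {max_cols}"
        )
    return n_cols


# A genuinely categorical variable passes easily...
print(check_max_cols({"color": ["red", "blue", "red"]}, max_cols=100))  # 2
# ...but a numerical variable sent in by mistake, e.g. 1000 distinct prices,
# would trip the limit:
# check_max_cols({"price": [str(p) for p in range(1000)]}, max_cols=100)  -> ValueError
```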

Missing Identifier:

Any data entries matching this string will be treated as missing values. In addition, empty cells are treated as missing by default.
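The missing-value rule can be expressed as a small predicate. This sketch rests on two assumptions not stated above: that an empty cell arrives as `None` or an empty string, and that missing entries simply do not get an encoded column of their own (so a missing row would be all zeros for that variable). The names `is_missing` and `distinct_categories` are hypothetical.

```python
def is_missing(value, missing_id=None):
    # Empty cells are always missing; here an empty cell is assumed to
    # arrive as None or as an empty string.
    if value is None or value == "":
        return True
    # Any entry exactly matching the Missing Identifier string is also missing.
    return missing_id is not None and value == missing_id


def distinct_categories(values, missing_id=None):
    # Assumption: missing entries produce no column; dict.fromkeys keeps
    # first-seen order while removing duplicates.
    return list(dict.fromkeys(v for v in values if not is_missing(v, missing_id)))


print(distinct_categories(["a", "NA", "", "b", None, "a"], missing_id="NA"))  # ['a', 'b']
```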

To Lowercase:

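All entries will be transformed to lowercase before being encoded as new variable names, so entries that differ only in case (for example "Red" and "red") are encoded as the same variable.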
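The effect of this option on the resulting variable names can be sketched as below. The function `new_variable_names` and the `column=value` naming scheme are hypothetical, used only to show how lowercasing merges entries that differ only in case into a single encoded column.

```python
def new_variable_names(column, values, to_lowercase=True):
    # Optionally lowercase each entry before it becomes part of a variable name.
    cleaned = (v.lower() if to_lowercase else v for v in values)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [f"{column}={v}" for v in dict.fromkeys(cleaned)]


print(new_variable_names("color", ["Red", "red", "BLUE"]))
# ['color=red', 'color=blue']
print(new_variable_names("color", ["Red", "red", "BLUE"], to_lowercase=False))
# ['color=Red', 'color=red', 'color=BLUE']
```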

Returns
  • One-Hot encoded matrix
  • Numerical matrix