This page describes routines for preparing categorical data so that it can be passed to an algorithm for analysis.
One-Hot encoder for categorical variables. The ‘Numerical (optional)’ textbox is provided for convenience. Any variables included here (which should be numerical in type) will be returned unchanged along with the encoded matrix, so that the user does not have to manually copy and paste numerical variables alongside the encoded matrix.
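As an illustration of what the encoder produces, here is a minimal pure-Python sketch. It is not the routine's actual implementation: the function name `one_hot_encode`, the `variable_level` column naming, and the use of dense lists in place of a sparse matrix are all assumptions made for the example.

```python
def one_hot_encode(rows, categorical, numerical=()):
    """Hypothetical helper: returns (column_names, encoded_rows, numerical_rows)."""
    # Collect the distinct levels of each categorical variable, sorted for a stable order.
    levels = {var: sorted({row[var] for row in rows}) for var in categorical}
    # One output column per (variable, level) pair; the naming scheme is an assumption.
    column_names = [f"{var}_{level}" for var in categorical for level in levels[var]]
    # Each row gets a 1 in the column matching its level and 0 elsewhere.
    encoded = [
        [1 if row[var] == level else 0 for var in categorical for level in levels[var]]
        for row in rows
    ]
    # Numerical variables are passed through without modification.
    passthrough = [[row[var] for var in numerical] for row in rows]
    return column_names, encoded, passthrough


rows = [
    {"color": "red", "size": "s", "price": 1.5},
    {"color": "blue", "size": "m", "price": 2.0},
    {"color": "red", "size": "m", "price": 3.25},
]
names, encoded, numeric = one_hot_encode(rows, ["color", "size"], ["price"])
print(names)    # ['color_blue', 'color_red', 'size_m', 'size_s']
print(encoded)  # [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 1, 0]]
print(numeric)  # [[1.5], [2.0], [3.25]]
```

The two return values after the column names correspond to the encoded matrix and the numerical matrix that the routine outputs.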
If the number of columns in the returned sparse matrix exceeds this number, the routine will terminate and return an error. This is included as a safeguard against numerical variables being sent in as categorical, which could create an extremely large result.
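The safeguard can be pictured as a check on the number of distinct levels before any encoding is done. The sketch below is an assumption about how such a check might look, not the routine's own code; `count_encoded_columns`, `check_column_limit`, and `max_columns` are hypothetical names.

```python
def count_encoded_columns(rows, categorical):
    """Number of one-hot columns: one per distinct level of each variable."""
    return sum(len({row[var] for row in rows}) for var in categorical)


def check_column_limit(rows, categorical, max_columns):
    """Raise ValueError if encoding would produce more than max_columns columns."""
    n_columns = count_encoded_columns(rows, categorical)
    if n_columns > max_columns:
        raise ValueError(
            f"encoding would create {n_columns} columns, limit is {max_columns}"
        )
    return n_columns


# An identifier column with 1000 distinct values, mistakenly treated as categorical.
rows = [{"id": str(i), "color": "red" if i % 2 else "blue"} for i in range(1000)]

print(check_column_limit(rows, ["color"], max_columns=100))  # 2

try:
    check_column_limit(rows, ["id", "color"], max_columns=100)
except ValueError as err:
    message = str(err)
    print(message)  # encoding would create 1002 columns, limit is 100
```

Checking the level counts up front means the oversized matrix is never built.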
Any data entries matching this string will be treated as missing values. In addition, any empty cells will be treated as missing by default.
All entries will be transformed to lowercase before being encoded as new variable names.
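The missing-value and lowercase rules can be sketched as a single normalization step applied to each entry before encoding. This is only an illustration under stated assumptions: the name `normalize_entry`, the marker `"?"`, the use of `None` to represent a missing value, and matching the marker exactly before lowercasing are all choices made for the example, not documented behaviour.

```python
def normalize_entry(value, missing_marker="?"):
    """Hypothetical helper: return a lowercase level name, or None if missing."""
    # Empty cells and entries equal to the marker count as missing (assumed exact match).
    if value is None or value == "" or value == missing_marker:
        return None
    # Lowercasing makes "Red" and "RED" map to the same encoded variable.
    return str(value).lower()


raw = ["Red", "RED", "", "?", "Blue"]
cleaned = [normalize_entry(v) for v in raw]
levels = sorted({v for v in cleaned if v is not None})

print(cleaned)  # ['red', 'red', None, None, 'blue']
print(levels)   # ['blue', 'red']
```

Because of the lowercasing, entries that differ only in case end up in the same encoded column.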
- One-Hot encoded matrix
- Numerical matrix