First, an ordinal feature is a special case of a categorical feature whose values are sorted in some meaningful order.
- e.g. 1st class, 2nd class on railways.
Second, label encoding replaces the unique values of a categorical feature with numbers.
- the codes are assigned either by sorting the values alphabetically or in order of appearance.
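The two ways of assigning codes can be sketched in plain Python. This is only an illustration: the toy column `values` is made up, and the variable names are assumptions, not part of the original notes. As far as I know, scikit-learn's `LabelEncoder` follows the sorted variant and `pandas.factorize` follows order of appearance by default.

```python
# Toy categorical column (made-up values, for illustration only).
values = ["S", "C", "Q", "S", "C"]

# 1) Alphabetical: sort the unique values, then number them.
alphabetical_codes = {v: i for i, v in enumerate(sorted(set(values)))}
encoded_alphabetical = [alphabetical_codes[v] for v in values]

# 2) Order of appearance: number each value the first time it is seen.
appearance_codes = {}
for v in values:
    appearance_codes.setdefault(v, len(appearance_codes))
encoded_appearance = [appearance_codes[v] for v in values]

print(encoded_alphabetical)  # [2, 0, 1, 2, 0]
print(encoded_appearance)    # [0, 1, 2, 0, 1]
```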
Third, frequency encoding maps each unique value to its frequency in the data.
- e.g. how many times 1st class occurs.
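A minimal sketch of frequency encoding, assuming a made-up `pclass` column. It shows both the raw count and the count divided by the number of rows, since either can serve as the "frequency".

```python
from collections import Counter

# Toy passenger-class column (made-up values, for illustration only).
pclass = [3, 1, 3, 3, 2, 1, 3, 2]

# How many times each class occurs.
counts = Counter(pclass)

# Share of rows per class: count divided by the number of rows.
n_rows = len(pclass)
freq = {value: count / n_rows for value, count in counts.items()}

# Replace every value by its frequency.
encoded = [freq[v] for v in pclass]

print(counts[3], counts[1], counts[2])  # 4 2 2
print(encoded)  # [0.5, 0.25, 0.5, 0.5, 0.25, 0.25, 0.5, 0.25]
```

Note that classes 1 and 2 occur equally often in this toy column, so they get the same encoded value and can no longer be told apart by this feature alone.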
Fourth, label encoding and frequency encoding are often used with tree-based methods.
Fifth, one-hot encoding is often used with non-tree-based methods.
And finally, applying one-hot encoding to combinations of categorical features allows non-tree-based models to take interactions between features into account, which can improve them.
- e.g. in the Titanic dataset you could create a new categorical feature by combining sex and pclass.
If pclass = 1, 2, 3 and sex = M, F,
then the combined feature could take the values
1M, 1F, 2M, 2F, 3M, 3F, and we could apply one-hot encoding to it.
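The pclass and sex combination can be sketched as follows. The four rows are made up, not taken from the real Titanic data, and the names `combined`, `categories` and `one_hot` are illustrative assumptions.

```python
# Toy rows (made-up, not real Titanic records).
pclass = [1, 3, 2, 3]
sex = ["F", "M", "M", "F"]

# Build the interaction feature by concatenating the two values.
combined = [f"{p}{s}" for p, s in zip(pclass, sex)]

# All possible combinations, in the order 1M, 1F, 2M, 2F, 3M, 3F.
categories = [f"{p}{s}" for p in (1, 2, 3) for s in ("M", "F")]

# One-hot encode: one column per combination, a single 1 per row.
one_hot = [[1 if c == cat else 0 for cat in categories] for c in combined]

print(combined)  # ['1F', '3M', '2M', '3F']
for row in one_hot:
    print(row)
```

Each row now has its own column for, say, "3rd class and male", so a linear model can give that combination a separate weight instead of only adding a class effect and a sex effect.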
One-hot encodings can be stored as sparse matrices, which use storage efficiently when the number of non-zero values is less than half of the total number of values.
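To see why sparse storage helps here, the sketch below keeps only the (row, column) positions of the non-zero entries of a one-hot matrix. It is a simplified coordinate-style illustration in plain Python on made-up rows, not how any particular library stores its matrices. In practice one would typically reach for `scipy.sparse`.

```python
# Combined categories and a few made-up rows, for illustration only.
categories = ["1M", "1F", "2M", "2F", "3M", "3F"]
rows = ["1F", "3M", "2M", "3F", "3M"]
col_index = {cat: j for j, cat in enumerate(categories)}

# Dense one-hot matrix: every cell is stored, mostly zeros.
dense = [[1 if cat == r else 0 for cat in categories] for r in rows]

# Sparse form: store only the positions of the ones.
sparse_coords = [(i, col_index[r]) for i, r in enumerate(rows)]

dense_cells = len(rows) * len(categories)
stored_entries = len(sparse_coords)
print(dense_cells, stored_entries)  # 30 5

# The dense matrix can be rebuilt from the stored positions.
rebuilt = [[0] * len(categories) for _ in rows]
for i, j in sparse_coords:
    rebuilt[i][j] = 1
print(rebuilt == dense)  # True
```

A one-hot row has exactly one non-zero value, so the more categories there are, the larger the share of zeros that the sparse form avoids storing.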