Preprocessing:
1. Tree-based models - scaling numeric features doesn't matter. Trees only look for split points on a feature, and a monotonic rescaling leaves the resulting splits unchanged.
2. Non-tree-based models - numeric feature preprocessing matters. E.g. in kNN, rescaling a single feature changes the distances between points and hence the predictions. The same goes for NNs and linear models.
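A minimal sketch of the two points above, using only the standard library. The rows (age, income) and the helper names are made up for illustration; a 1-nearest-neighbor lookup stands in for kNN and a single threshold test stands in for a tree split.

```python
import math

def nearest_neighbor(points, query):
    """Index of the point closest to `query` by Euclidean distance (1-NN)."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))

def rescale_income(row, factor=1000.0):
    """Divide the second feature (income) by `factor`; age is left alone."""
    return (row[0], row[1] / factor)

# Made-up rows: (age in years, income in dollars)
points = [(25.0, 50_000.0), (60.0, 50_500.0)]
query = (27.0, 51_000.0)

# kNN: income dominates the raw distance, so the 60-year-old is "nearest".
raw_nn = nearest_neighbor(points, query)

# After expressing income in thousands, age dominates and the answer flips.
scaled_nn = nearest_neighbor([rescale_income(p) for p in points],
                             rescale_income(query))

# Tree-style split: "income <= threshold" partitions the rows the same way
# once the threshold is rescaled along with the feature.
raw_split = [p[1] <= 50_250.0 for p in points]
scaled_split = [rescale_income(p)[1] <= 50.25 for p in points]

print(raw_nn, scaled_nn)          # neighbor index before / after rescaling
print(raw_split == scaled_split)  # whether the split is unchanged
```

The nearest neighbor changes with the unit of one feature, while the threshold split groups the rows identically in both versions.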
Most often used preprocessings:
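MinMaxScaler => (value - min) / (max - min), maps values to [0, 1]
StandardScaler => (value - mean) / std, gives zero mean and unit variance
Rank => sets spaces between sorted values to be equal (handles outliers well)
log(1+x), sqrt(1+x) => compress large values, pulling them closer to the rest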
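The transforms listed above can be sketched in plain Python as follows. This is an illustration, not the scikit-learn implementation: the function names are hypothetical, the rank transform breaks ties by order of appearance (an assumption), and the inputs are assumed to be non-constant, with x > -1 for log and x >= -1 for sqrt.

```python
import math
import statistics

def min_max_scale(values):
    """(value - min) / (max - min); assumes max > min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """(value - mean) / std, using the population std; assumes std > 0."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def rank_transform(values):
    """Replace each value by its 0-based position in sorted order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks

def log_transform(values):
    """log(1 + x) for each value."""
    return [math.log1p(v) for v in values]

def sqrt_transform(values):
    """sqrt(1 + x) for each value."""
    return [math.sqrt(1.0 + v) for v in values]

# Made-up feature with one outlier
values = [3.0, 1.0, 1000.0, 2.0]
print(min_max_scale(values))   # outlier squeezes the rest toward 0
print(rank_transform(values))  # outlier is just "the largest", one step away
print(log_transform(values))
```

With the outlier present, min-max scaling pushes the three small values close to 0, while the rank transform keeps them evenly spaced.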
Feature generation:
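Fraction of price => for product pricing, the fractional part of the price can be a feature of its own, e.g. .49 in 2.49. It captures what impact the fractional price has.
Social media post interval => humans won't post at regular intervals of exactly 1 second, so the time between posts is a useful feature, e.g. for telling bots from humans.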
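Both generated features can be sketched in a few lines. The function names are hypothetical, prices are assumed to have at most two decimals, and timestamps are assumed to be in seconds with at least two posts per user. Using the spread of the intervals as a regularity signal is one possible choice, not the only one.

```python
import statistics

def fractional_price(price):
    """Fractional part of a price, e.g. 2.49 -> 0.49 (works in whole cents)."""
    cents = round(price * 100)
    return (cents % 100) / 100

def post_intervals(timestamps):
    """Gaps in seconds between consecutive posts, in time order."""
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:])]

def interval_spread(timestamps):
    """Population std of the gaps; 0 means perfectly regular posting."""
    return statistics.pstdev(post_intervals(timestamps))

# Made-up posting times (seconds since first post)
regular_poster = [0, 1, 2, 3]
irregular_poster = [0, 40, 300, 310]

print(fractional_price(2.49))
print(interval_spread(regular_poster))
print(interval_spread(irregular_poster))
```

Working in cents avoids the floating-point noise that `2.49 % 1` would produce.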