Sunday, January 6, 2019

Numeric features - Competitive DS course

Preprocessing:
1. Tree-based models - preprocessing mostly doesn't matter for numeric features, since a tree only looks for split thresholds and those don't depend on the feature's scale.

2. Non-tree-based models - numeric feature preprocessing matters. E.g. in kNN, rescaling a single feature changes the distances between points and hence the predictions. The same goes for NNs and linear models.
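To make the kNN point concrete, here is a minimal sketch in plain Python. The data (age, income) and the helper names `nearest` and `income_in_thousands` are made up for illustration and are not from the course. It shows that rescaling only one feature changes which point is the nearest neighbour.

```python
import math

def nearest(query, points):
    """Return the index of the point closest to query (Euclidean distance)."""
    return min(range(len(points)), key=lambda i: math.dist(query, points[i]))

def income_in_thousands(p):
    """Rescale only the second feature (income); age is left untouched."""
    return (p[0], p[1] / 1000)

# Hypothetical data: (age in years, income in dollars)
points = [(25, 50_000), (60, 50_500)]
query = (27, 50_400)

# On the raw scale, income dominates the distance.
raw_idx = nearest(query, points)

# After rescaling income, age dominates instead.
scaled_idx = nearest(income_in_thousands(query),
                     [income_in_thousands(p) for p in points])

print(raw_idx, scaled_idx)
```

On the raw data the neighbour is picked almost entirely by income; after the rescaling it is picked by age, so a 1-nearest-neighbour prediction would flip.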

Most often used preprocessings:
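MinMaxScaler => (value - min) / (max - min), maps the feature to [0, 1]
StandardScaler => (value - mean) / std, gives mean 0 and std 1
Rank => replaces each value with its rank, so the spaces between sorted values become equal (handles outliers well)
log(1+x), sqrt(1+x) => compress large values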
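The transforms listed above can be sketched in plain Python for a single feature column. This is only an illustration of the formulas, not scikit-learn's implementation; the function names are hypothetical, the rank transform ignores ties, and a constant column would cause a division by zero in the first two functions.

```python
import math
import statistics

def min_max_scale(xs):
    """(value - min) / (max - min): maps the column to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standard_scale(xs):
    """(value - mean) / std, using the population standard deviation."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]

def rank_transform(xs):
    """Replace each value with its 0-based rank (ties are not averaged)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks

def log1p_transform(xs):
    """log(1 + x): pulls very large values closer to the rest."""
    return [math.log1p(x) for x in xs]

# Made-up column with one outlier.
values = [1.0, 2.0, 3.0, 1000.0]

print(min_max_scale(values))   # the outlier squeezes the other values near 0
print(rank_transform(values))  # the outlier is just "the next rank"
print(log1p_transform(values))
```

With the outlier present, min-max scaling pushes the three small values very close to 0, while the rank transform keeps them evenly spaced, which is the sense in which rank "handles outliers well".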

Feature generation:
Fractional part of price => for product pricing, e.g. what's the impact of the fractional part of the price? So the fractional part becomes a new feature: .49 in 2.49.
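A minimal sketch of that feature, assuming non-negative prices with at most two decimal places (the function name `fractional_part` is hypothetical). It goes through whole cents to avoid floating-point noise such as 2.49 - 2 not being exactly 0.49.

```python
def fractional_part(price):
    """Return the part of the price after the decimal point, e.g. 0.49 for 2.49."""
    cents = round(price * 100)   # work in whole cents to avoid float noise
    return (cents % 100) / 100

# Made-up prices for illustration.
prices = [2.49, 3.0, 10.99]
features = [fractional_part(p) for p in prices]
print(features)
```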

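Social media post interval => the time between consecutive posts as a feature. Humans won't post at regular intervals of 1 second, so very regular intervals hint at an automated account.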
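One possible way to turn this into features, sketched under assumptions: timestamps are given in seconds, there are at least two posts per account, and the feature names `mean_gap` and `std_gap` are made up here. A spread of gaps near zero means the posts are evenly spaced.

```python
import statistics

def interval_features(timestamps):
    """Summarise the gaps between consecutive posts (timestamps in seconds)."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return {
        "mean_gap": statistics.fmean(gaps),
        "std_gap": statistics.pstdev(gaps),  # ~0 means perfectly regular posting
    }

# Made-up accounts: one posts exactly every second, one posts irregularly.
regular_account = [0, 1, 2, 3, 4]
irregular_account = [0, 40, 45, 300, 1200]

print(interval_features(regular_account))
print(interval_features(irregular_account))
```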

