Thursday, May 9, 2019

grep exclude file extensions

grep SendBuildStatusUpdate -r --exclude=\*.{ini,cs} .

Thursday, May 2, 2019

Advanced features

https://www.coursera.org/learn/competitive-data-science/lecture/mpCps/statistics-and-distance-based-features

Statistics and distance based features: groupby and nearest neighbor methods
Neighbors: e.g. to predict rental prices, features could be the number of schools/hospitals within a given radius.
CTR example (ad price, ad position, user_id, page_id): group by user or page to add new features, such as per-group price statistics. The user's previous history can also be used.
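
A minimal sketch of the group-by idea, in plain Python so it runs without dependencies. The rows and the helper `group_stats` are made up for illustration; with pandas the same thing is usually a `groupby` followed by a merge back onto the rows.

```python
from collections import defaultdict

# Hypothetical CTR log rows: (user_id, page_id, ad_price, ad_position)
rows = [
    ("u1", "p1", 10.0, 1),
    ("u1", "p2", 30.0, 2),
    ("u2", "p1", 20.0, 1),
    ("u2", "p1", 40.0, 3),
]

def group_stats(rows, key_idx, val_idx):
    """Return {key: (min, max, mean)} of column val_idx grouped by column key_idx."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key_idx]].append(r[val_idx])
    return {k: (min(v), max(v), sum(v) / len(v)) for k, v in groups.items()}

user_price = group_stats(rows, key_idx=0, val_idx=2)  # price stats per user
page_price = group_stats(rows, key_idx=1, val_idx=2)  # price stats per page

# Append the group statistics to every row as six new feature columns.
features = [r + user_price[r[0]] + page_price[r[1]] for r in rows]
```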

Bray-Curtis metric.
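
For reference, the usual definition of the Bray-Curtis dissimilarity is sum(|u_i - v_i|) / sum(|u_i + v_i|). A small sketch; the function name is my own, and SciPy provides the same formula as `scipy.spatial.distance.braycurtis`.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(|u_i + v_i|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    if den == 0:
        raise ValueError("undefined when every u_i + v_i is zero")
    return num / den

same = bray_curtis([1, 2, 3], [1, 2, 3])      # identical vectors
disjoint = bray_curtis([1, 0, 0], [0, 1, 0])  # no overlap, non-negative values
```

For non-negative vectors the value lies between 0 (identical) and 1 (no overlap).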
-------------------------------------
Matrix factorizations: documents/words - dimensionality reduction.

mean encoding

A very popular/important video:
https://www.coursera.org/learn/competitive-data-science/lecture/LGYQ2/regularization

Mean encoding regularization
CV loop
LOO (leave-one-out): using the target variable to generate the new feature makes the encoding biased.
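
A rough sketch of the CV-loop idea: each row is encoded with the target mean of its category computed on the other folds only, so a row never sees its own target. The fold assignment (`i % n_folds`) and the fallback to the global mean for unseen categories are assumptions made for this example. Setting `n_folds` to the number of rows gives leave-one-out.

```python
def kfold_mean_encode(cats, targets, n_folds=2):
    """Encode each row with its category's target mean from the other folds."""
    n = len(cats)
    global_mean = sum(targets) / n
    encoded = [global_mean] * n  # fallback for categories unseen out-of-fold
    for fold in range(n_folds):
        sums, counts = {}, {}
        # Accumulate statistics from rows outside the current fold only.
        for i in range(n):
            if i % n_folds != fold:
                sums[cats[i]] = sums.get(cats[i], 0.0) + targets[i]
                counts[cats[i]] = counts.get(cats[i], 0) + 1
        # Encode the rows that belong to the current fold.
        for i in range(n):
            if i % n_folds == fold and cats[i] in counts:
                encoded[i] = sums[cats[i]] / counts[cats[i]]
    return encoded

cats = ["a", "a", "a", "a", "b", "b"]
targets = [1, 0, 1, 1, 0, 1]
enc = kfold_mean_encode(cats, targets, n_folds=2)
```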

Smoothing.
Noise.
Expanding mean.
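
Sketches of two of these, using the formulas as I understand them. Smoothing blends the category mean with the global mean, with `alpha` acting like a prior row count. Expanding mean encodes each row from the earlier rows of the same category only. Function names and the global-mean fallback are my own choices; the noise variant is not shown.

```python
def smoothed_mean(cat_sum, cat_count, global_mean, alpha=10.0):
    """Pull the category mean towards the global mean; alpha=0 means no smoothing."""
    return (cat_sum + global_mean * alpha) / (cat_count + alpha)

def expanding_mean_encode(cats, targets):
    """Encode each row with the mean target of earlier rows of the same category."""
    # Assumption: rows with no history get the global mean of all targets.
    global_mean = sum(targets) / len(targets)
    sums, counts, encoded = {}, {}, []
    for c, t in zip(cats, targets):
        n = counts.get(c, 0)
        encoded.append(sums[c] / n if n else global_mean)
        # Update the running statistics only after the row has been encoded.
        sums[c] = sums.get(c, 0.0) + t
        counts[c] = n + 1
    return encoded

enc = expanding_mean_encode(["a", "a", "a", "b"], [1, 0, 1, 1])
```

The expanding mean depends on row order, so the data is usually shuffled or sorted by time first.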
------------------
Generalizations and extensions of mean encodings: for regression/multiclass.
Many-to-many relations: e.g. classification of users based on the apps installed on their phones. Each user can have multiple apps, and each app can be installed by many users, hence a many-to-many relation.

In this case, convert the data to a long representation, so that each row is <user_id, app_id, target>, like <uid1, app_id1, target (0 or 1)>. Now you can take the mean of the targets for every app. But how to map it back to users?
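
One common answer to the mapping question, stated here as an assumption rather than something these notes record: each user ends up with a vector of app means, and statistics of that vector (min, max, mean) become the user-level features. A sketch on made-up data, without any regularization:

```python
from collections import defaultdict

# Hypothetical long-format rows: (user_id, app_id, target)
long_rows = [
    ("u1", "a1", 1), ("u1", "a2", 1),
    ("u2", "a1", 0), ("u2", "a3", 0),
    ("u3", "a1", 1),
]

# Step 1: mean target per app.
app_targets = defaultdict(list)
for _, app, target in long_rows:
    app_targets[app].append(target)
app_mean = {app: sum(t) / len(t) for app, t in app_targets.items()}

# Step 2: collect the app means for each user.
user_vec = defaultdict(list)
for user, app, _ in long_rows:
    user_vec[user].append(app_mean[app])

# Step 3: aggregate each user's vector into fixed-size features.
user_features = {
    u: {"min": min(v), "max": max(v), "mean": sum(v) / len(v)}
    for u, v in user_vec.items()
}
```

In practice the app means would be computed with one of the regularization schemes (CV loop, expanding mean) to limit target leakage.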

Interactions and numerical features -?

