Software Troubles and Troubleshooting: mean encoding

Thursday, May 2, 2019

mean encoding

A very popular/important video:
https://www.coursera.org/learn/competitive-data-science/lecture/LGYQ2/regularization

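Mean encoding regularization
CV loop: compute the encoding for each fold from the targets of the other folds only.
LOO (leave-one-out) is the extreme case of the CV loop. Using the target variable to generate the new feature makes the encoding biased, which is why regularization is needed.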
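
A minimal sketch of the CV loop in plain Python. The function name kfold_mean_encode, the interleaved fold split, and the toy data are my own assumptions for illustration, not taken from the lecture. Each row is encoded from the targets of the other folds, and setting the number of folds to the number of rows gives leave-one-out.

```python
from collections import defaultdict


def kfold_mean_encode(categories, targets, n_folds=5):
    """Out-of-fold mean encoding: each row is encoded using targets
    from the other folds only, never its own."""
    n = len(categories)
    global_mean = sum(targets) / n
    encoded = [global_mean] * n

    for fold in range(n_folds):
        # Rows with index % n_folds == fold are held out (a simple
        # interleaved split; shuffle beforehand if row order matters).
        sums = defaultdict(float)
        counts = defaultdict(int)
        for i in range(n):
            if i % n_folds != fold:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
        for i in range(n):
            if i % n_folds == fold:
                cat = categories[i]
                # Categories unseen in the training folds keep the
                # global mean as a fallback.
                if counts[cat] > 0:
                    encoded[i] = sums[cat] / counts[cat]
    return encoded


# Hypothetical toy data.
cats = ["a", "a", "a", "b", "b", "b"]
ys = [1, 0, 1, 0, 0, 1]

# With n_folds == len(cats) every row is its own fold: leave-one-out.
loo = kfold_mean_encode(cats, ys, n_folds=len(cats))
print(loo)  # [0.5, 1.0, 0.5, 0.5, 0.5, 0.0]
```

Within each category, the row with the smaller target gets the larger LOO value, because its own target is subtracted from the category sum. A model can pick up on that ordering, so LOO still leaks the target.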

Smoothing.
Noise.
Expanding mean.
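
A rough sketch of these three ideas in plain Python. The function names, the alpha and scale parameters, and their default values are assumptions for illustration. Smoothing blends the category mean with the global mean, noise perturbs the training encodings, and the expanding mean encodes each row from the earlier rows of the same category only.

```python
import random
from collections import defaultdict


def smoothed_mean_encode(categories, targets, alpha=10.0):
    """Blend each category mean with the global mean. alpha behaves
    like the category size at which both are weighted equally."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # (mean * n + global_mean * alpha) / (n + alpha), with mean * n == sum
    return [
        (sums[c] + alpha * global_mean) / (counts[c] + alpha)
        for c in categories
    ]


def add_noise(encoded, scale=0.01, seed=0):
    """Perturb training-set encodings with Gaussian noise."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale) for v in encoded]


def expanding_mean_encode(categories, targets):
    """Encode each row with the mean target of the earlier rows of the
    same category, so a row never sees its own target."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    encoded = []
    for cat, y in zip(categories, targets):
        # The first occurrence has no history: use the global mean.
        encoded.append(sums[cat] / counts[cat] if counts[cat] else global_mean)
        sums[cat] += y
        counts[cat] += 1
    return encoded
```

Smoothing on its own still uses each row's own target, so it is usually combined with one of the other schemes, e.g. computed inside a CV loop. The expanding mean depends on row order, so the result changes if the data is shuffled.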
------------------
Generalizations and extensions of mean encodings: for regression/multiclass.
Many-to-many relations: e.g. classification of users based on the apps installed on their phones. Each user can have multiple apps, and each app can be installed by many users, hence a many-to-many relation.

In this case, convert the data to a long representation, so that each row has the form <user_id, app_id, target>, e.g. <uid1, app_id1, target (0 or 1)>. Now you can take the mean of the targets for every app. But how to map it back to users?
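
One common answer, as far as I understand it, is that each user ends up with a vector of per-app means, and that vector is collapsed into a few statistics (mean, min, max and so on). A minimal sketch in plain Python follows. The toy users, apps, and targets are made up for illustration, and no regularization is applied here, so in practice the per-app means would come from one of the schemes above.

```python
from collections import defaultdict

# Hypothetical toy data: user -> installed apps, and user -> binary target.
user_apps = {"u1": ["maps", "chess"], "u2": ["maps"], "u3": ["chess", "mail"]}
user_target = {"u1": 1, "u2": 0, "u3": 1}

# Long representation: one (user_id, app_id, target) row per installation.
long_rows = [
    (user, app, user_target[user])
    for user, apps in user_apps.items()
    for app in apps
]

# Mean target per app, computed over the long rows.
sums, counts = defaultdict(float), defaultdict(int)
for _, app, y in long_rows:
    sums[app] += y
    counts[app] += 1
app_mean = {app: sums[app] / counts[app] for app in sums}

# Map back to users: each user has a vector of app means, which is
# collapsed into fixed-size statistics.
user_features = {}
for user, apps in user_apps.items():
    values = [app_mean[a] for a in apps]
    user_features[user] = {
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }

print(user_features["u1"])  # {'mean': 0.75, 'min': 0.5, 'max': 1.0}
```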

Interactions and numerical features -?
