Anomaly detection vs supervised learning - when negative (anomalous) examples are too few to learn from, go for anomaly detection.

Anomaly detection - choosing features - features should have a Normal distribution. Plot a histogram and check. If not, try transforms like log(x), log(x+c), x^0.5, x^0.2 etc. Also try combinations of features: CPU/Network traffic, CPU^2/Network traffic etc.
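A minimal sketch of the histogram/transform check above (the skewed feature here is synthetic; skewness near 0 is used as a stand-in for "the histogram looks Gaussian"):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical right-skewed feature (e.g. response times).
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# Candidate transforms to make the distribution more Gaussian.
transforms = {
    "x": x,
    "log(x)": np.log(x),
    "log(x + 1)": np.log(x + 1),
    "x^0.5": np.sqrt(x),
    "x^0.2": x ** 0.2,
}

# Skewness near 0 suggests a more symmetric, Gaussian-like shape;
# in practice you would also eyeball the histogram of each transform.
for name, t in transforms.items():
    z = (t - t.mean()) / t.std()
    print(f"{name:>10}: skewness = {np.mean(z ** 3):+.2f}")
```

For this lognormal example, log(x) is exactly Gaussian, so its skewness comes out near zero while the raw feature's is large.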

Multivariate Normal distribution - say memory usage is unusually high for a given CPU load, but each value individually has a good enough probability of occurring; they just sit on opposite sides of their respective bell curves, so the per-feature model would not flag the example. For that case we go for the multivariate Normal distribution.
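A sketch of that scenario with made-up numbers: two correlated features (call them CPU load and memory), a fitted multivariate Gaussian, and a point that is fine on each axis individually but unusual jointly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training data: CPU load and memory, positively correlated.
mean_true = np.array([0.5, 0.5])
cov_true = np.array([[0.04, 0.03],
                     [0.03, 0.04]])
X = rng.multivariate_normal(mean_true, cov_true, size=5000)

# Fit: sample mean and full covariance matrix.
mu = X.mean(axis=0)
sigma = np.cov(X, rowvar=False)
sigma_inv = np.linalg.inv(sigma)
log_det = np.linalg.slogdet(sigma)[1]

def log_p(x):
    """Log density under the fitted multivariate Gaussian."""
    d = x - mu
    return -0.5 * (d @ sigma_inv @ d + log_det + len(mu) * np.log(2 * np.pi))

# Both coordinates of odd_point are only ~1 std from their means,
# but memory is high *for that CPU load*, against the correlation.
normal_point = np.array([0.7, 0.7])
odd_point = np.array([0.3, 0.7])
print(log_p(normal_point), log_p(odd_point))
```

The joint density is much lower for the second point even though each marginal value is unremarkable, which is exactly what the per-feature product model misses.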

Each feature modelled independently as a Gaussian and multiplied together is the same as a multivariate Gaussian whose contours are axis-aligned, i.e. all off-diagonal components of the covariance matrix are zero.
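A quick numerical check of that equivalence (means, variances, and the test point are arbitrary):

```python
import numpy as np

mu = np.array([1.0, -2.0])
var = np.array([0.5, 2.0])  # per-feature variances

def gauss_1d(x, m, v):
    """Univariate Gaussian density."""
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

def gauss_mv(x, mu, cov):
    """Multivariate Gaussian density."""
    d = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** len(mu) * np.linalg.det(cov))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)

x = np.array([0.7, -1.1])
p_product = gauss_1d(x[0], mu[0], var[0]) * gauss_1d(x[1], mu[1], var[1])
p_mv = gauss_mv(x, mu, np.diag(var))  # diagonal covariance

print(p_product, p_mv)  # equal up to floating-point error
```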

The multivariate model captures correlations between features automatically. Otherwise you have to create those "unusual combination" features (like CPU/Network traffic) manually.

But the original (per-feature) model is computationally cheaper and scales to a large number of features. In the multivariate model you have to do large matrix operations, including inverting the n x n covariance matrix.

In the multivariate model, m > n: the number of examples must exceed the number of features, since otherwise you can't invert the covariance matrix. The original model has no such constraint.

In the multivariate model, the covariance matrix (Sigma) must be invertible. It will not be invertible if there are redundant features, i.e. duplicated or linearly dependent features like x2 = x1 or x3 = x4 + x5.
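A small demo of the redundant-feature failure (synthetic data; the third feature duplicates the first, as in the x2 = x1 case):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two genuine features plus a redundant duplicate of the first.
x1 = rng.normal(size=1000)
x2 = rng.normal(size=1000)
X = np.column_stack([x1, x2, x1])  # third column duplicates x1

sigma = np.cov(X, rowvar=False)

# Rank 2 with 3 features: Sigma is singular, so it has no inverse
# and the multivariate Gaussian density is undefined.
print("rank:", np.linalg.matrix_rank(sigma), "features:", X.shape[1])
print("det:", np.linalg.det(sigma))  # ~0
```

Dropping the redundant column (or replacing duplicated features with a single one) restores full rank and makes Sigma invertible again.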