Anomaly detection vs supervised learning: when anomalous examples are too few to train a supervised classifier, go for anomaly detection.
Anomaly detection, choosing features: features should be roughly normally distributed. Plot a histogram and check. If a feature is not, try transforms like log(x), log(x + c), x^0.5, x^0.2, etc. Also try combinations of features: CPU load / network traffic, CPU^2 / network traffic, etc.
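A minimal sketch of that histogram check (assuming Python with numpy and matplotlib; the skewed feature here is invented for illustration):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.exponential(scale=50.0, size=10_000)  # hypothetical skewed feature

fig, axes = plt.subplots(1, 2)
axes[0].hist(x, bins=50)
axes[0].set_title("raw x (skewed)")

c = 1.0  # shift away from zero; try a few values and eyeball the result
axes[1].hist(np.log(x + c), bins=50)
axes[1].set_title("log(x + c), closer to Gaussian")
plt.show()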
Multivariate normal distribution: say memory usage is unusually high for a given CPU load. Each value individually has a good enough probability of occurring, but they sit on opposite sides of their respective bell curves, so the combination is unusual. That is when we would go for the multivariate normal distribution.
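A sketch of exactly that situation (numbers invented; assuming scipy is available). The point is only 1.5 standard deviations out on each axis, yet its joint density under the correlated Gaussian is orders of magnitude below the naive per-feature product:

import numpy as np
from scipy.stats import norm, multivariate_normal

mu = np.array([0.0, 0.0])      # standardized CPU load and memory use
Sigma = np.array([[1.0, 0.9],  # strongly correlated features
                  [0.9, 1.0]])

point = np.array([1.5, -1.5])  # high CPU, low memory: unusual together

p_independent = norm.pdf(point[0]) * norm.pdf(point[1])
p_joint = multivariate_normal(mu, Sigma).pdf(point)

print(p_independent)  # ~1.7e-2: each feature alone looks fine
print(p_joint)        # ~6e-11: the combination is flagged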
Modelling each feature independently as a Gaussian and multiplying the densities is the same as a multivariate Gaussian whose axes are aligned with the coordinate axes, i.e. all off-diagonal entries of the covariance matrix are zero.
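A quick numerical check of that equivalence (a sketch assuming numpy/scipy; the means and standard deviations are arbitrary):

import numpy as np
from scipy.stats import norm, multivariate_normal

mu = np.array([2.0, -1.0, 0.5])
sigma = np.array([1.5, 0.7, 2.0])  # per-feature standard deviations
x = np.array([1.0, -0.5, 3.0])

# Original model: product of independent univariate Gaussians
p_product = np.prod(norm.pdf(x, loc=mu, scale=sigma))

# Multivariate Gaussian with axis-aligned (diagonal) covariance
p_mv = multivariate_normal(mu, np.diag(sigma ** 2)).pdf(x)

print(np.allclose(p_product, p_mv))  # True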
The multivariate model captures correlations between features automatically; otherwise you have to create those unusual-combination features manually. But the original (independent-features) model is computationally cheaper and scales to a large number of features, whereas the multivariate model requires large matrix operations.
The multivariate model requires m > n, i.e. the number of examples must exceed the number of features, since otherwise the covariance matrix cannot be inverted. The original model has no such constraint. The covariance matrix (Sigma) must be invertible; it will not be if there are redundant features, e.g. duplicates like x2 = x1 or linear combinations like x3 = x4 + x5.
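A toy demonstration of that failure mode (assuming numpy; the data is random):

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1.copy()  # redundant feature: x2 = x1
X = np.column_stack([x1, x2])

Sigma = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(Sigma))  # 1, not 2: Sigma is singular

try:
    np.linalg.inv(Sigma)
except np.linalg.LinAlgError as err:
    print("not invertible:", err)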