## Monday, February 27, 2017

### Andrew Ng course

Online learning doesn't require learning rate configuration?

ceiling analysis - machine learning pipeline etc

### Anomaly detection - Andrew Ng

Anomaly detection vs Supervised learning - when positive (anomalous)
examples are too few to learn from, go for anomaly detection

Anomaly detection - choosing features - features should look roughly
Normally distributed. Plot a histogram and see. If not, try log(x),
log(x+c), x^0.5, x^0.2 etc. Try combinations of features: CPU/Network
traffic, CPU^2/Network traffic etc
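
A minimal sketch of the "check the shape, then try a transform" step. It assumes a synthetic right-skewed feature and uses a hand-written `skewness` helper as a numeric stand-in for eyeballing the histogram; both are illustrative and not from the course.

```python
import math
import random


def skewness(values):
    """Sample skewness: close to 0 for a symmetric, bell-shaped distribution."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / (n * var ** 1.5)


random.seed(0)
# Hypothetical right-skewed feature (log-normal samples).
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Candidate transforms from the notes; keep whichever looks most symmetric.
candidates = {
    "x": raw,
    "log(x)": [math.log(v) for v in raw],
    "x^0.5": [v ** 0.5 for v in raw],
    "x^0.2": [v ** 0.2 for v in raw],
}
skews = {name: skewness(vals) for name, vals in candidates.items()}
for name, s in skews.items():
    print(f"{name:7s} skewness = {s:+.2f}")
```

For this particular sample log(x) should come out closest to zero, because the data was generated as log-normal; on real features a different transform may win.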

Multivariate Normal distribution - let's say memory is unusually high
for a given cpu load. But both of them individually have good enough
probability of occurring. But they are at different sides of their
respective bell curves. So we would go for multivariate Normal
distribution.

Each feature modelled independently as gaussian and multiplied is same
as multivariate Gaussian when axes are aligned, i.e. all off diagonal
components are zero.

Multivariate captures correlations between features automatically.
Otherwise you have to create those unusual features manually.
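
A rough sketch of the CPU/memory situation above, assuming two standardized features with a made-up correlation of 0.9. `gaussian_density` is a helper written for this note, not a library call. It also checks that multiplying per-feature Gaussians matches the multivariate density when the off-diagonal entries are zero.

```python
import numpy as np


def gaussian_density(x, mu, sigma):
    """Density of a multivariate normal N(mu, sigma) at point x."""
    n = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm)


mu = np.zeros(2)
full_sigma = np.array([[1.0, 0.9], [0.9, 1.0]])   # assumed correlation of 0.9
diag_sigma = np.diag(np.diag(full_sigma))          # same variances, no correlation

# High "cpu", low "memory": each value alone is only 1.5 std devs from its mean.
x = np.array([1.5, -1.5])

p_independent = gaussian_density(x, mu, diag_sigma)
p_multivariate = gaussian_density(x, mu, full_sigma)

# Product of the two univariate Gaussians (the original model).
p_product = float(np.prod(np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)))

print("independent features :", p_independent)
print("product of univariate:", p_product)
print("multivariate         :", p_multivariate)
```

The independent model gives this point an unremarkable density, while the multivariate model gives it a far smaller one because it contradicts the correlation.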

But the original model is computationally cheaper and scales with
large number of features. In MV, you have to do large matrix
operations.

In MV you need m > n, i.e. the number of examples should be more than
the number of features, otherwise the covariance matrix can't be
inverted. Not so in the original model.

In MV, the covariance matrix (Sigma) should be invertible. It will not
be invertible if there are redundant (linearly dependent) features,
e.g. x2 = x1 or x3 = x4 + x5 etc.
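
A small sketch of both failure cases, assuming random synthetic data. `covariance_rank` is a helper made up for this note; a rank below the number of features means the covariance matrix is singular.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)


def covariance_rank(X):
    """Return (number of features, rank of the sample covariance matrix)."""
    sigma = np.cov(X, rowvar=False)
    return sigma.shape[0], int(np.linalg.matrix_rank(sigma))


independent = np.column_stack([x1, x2])
redundant = np.column_stack([x1, x2, x1 + x2])   # third feature is x1 + x2
too_few = rng.normal(size=(3, 5))                # m = 3 examples, n = 5 features

print(covariance_rank(independent))  # full rank: invertible
print(covariance_rank(redundant))    # rank below 3: singular
print(covariance_rank(too_few))      # rank below 5: singular
```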

## Wednesday, February 15, 2017

### neural network notes

Study back propagation and implement gradient descent.
Implement dropout.
Cross entropy is an alternative to quadratic cost function for faster learning.
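
A minimal sketch of why cross entropy can learn faster, assuming a single sigmoid neuron with one input and a badly wrong starting point (the values w = 2, b = 2, x = 1, y = 0 are chosen for illustration). With quadratic cost the gradient carries an extra sigmoid'(z) factor, which is tiny when the neuron is saturated; with cross entropy that factor cancels.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_w_quadratic(w, b, x, y):
    """dC/dw for C = (a - y)^2 / 2 with a = sigmoid(w*x + b)."""
    a = sigmoid(w * x + b)
    return (a - y) * a * (1 - a) * x   # a * (1 - a) is sigmoid'(z)


def grad_w_cross_entropy(w, b, x, y):
    """dC/dw for C = -[y ln a + (1 - y) ln(1 - a)]; sigmoid'(z) cancels out."""
    a = sigmoid(w * x + b)
    return (a - y) * x


# Saturated neuron: output is near 1 but the target is 0.
w, b, x, y = 2.0, 2.0, 1.0, 0.0
g_quad = grad_w_quadratic(w, b, x, y)
g_ce = grad_w_cross_entropy(w, b, x, y)
print("quadratic gradient    :", g_quad)
print("cross-entropy gradient:", g_ce)
```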

Softmax is a different activation(output) function. An alternative to Sigmoid. Sum of outputs is always 1. Hence can be thought of as a probability distribution. In a Sigmoid layer, output activations won't always sum to 1.
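
A quick sketch comparing the two on the same weighted inputs (the values in `z` are arbitrary). Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Softmax over a list of weighted inputs."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


z = [2.0, 1.0, 0.1]
softmax_out = softmax(z)
sigmoid_out = [sigmoid(v) for v in z]

print("softmax:", softmax_out, "sum =", sum(softmax_out))
print("sigmoid:", sigmoid_out, "sum =", sum(sigmoid_out))
```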

2 good combinations in NN are : Softmax + Log likelihood cost & Sigmoid + Quadratic cost
Usually Softmax + Log Likelihood is good for multi class classification problems.

Validation_data vs. test_data

Validation_data for tuning hyper parameters like learning rate
test_data for evaluation

Avoiding overfitting
Best way to avoid over fitting is to have larger training sets.

Regularization is another way to prevent over fitting since it pushes towards smaller weights. It means small changes in inputs will yield small changes in output. If the weights are large, small changes in input may result in large changes in output. So it's helping the model avoid the effects of noise.

L2 Regularization - add weight^2 to cost
L1 Regularization - add |weight| to cost
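
A small sketch of the two penalty terms, using the usual names: L2 for squared weights and L1 for absolute weights. The lambda/(2n) and lambda/n scaling is one common convention and is an assumption here; `lam` and `n` (training set size) are illustrative parameters.

```python
def l2_penalty(weights, lam, n):
    """L2 term: (lambda / 2n) * sum of squared weights."""
    return lam / (2 * n) * sum(w * w for w in weights)


def l1_penalty(weights, lam, n):
    """L1 term: (lambda / n) * sum of absolute weights."""
    return lam / n * sum(abs(w) for w in weights)


weights = [3.0, -0.5, 0.0]
lam, n = 1.0, 10

# The total cost would be the unregularized cost plus one of these terms.
print("L2 penalty:", l2_penalty(weights, lam, n))
print("L1 penalty:", l1_penalty(weights, lam, n))
```

L2 punishes the large weight (3.0) much more heavily than the small one, while L1 grows in proportion to each weight's size.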

You could train multiple Neural networks and do a voting on their results.
Similarly, there is Dropout, in which you randomly remove half the hidden neurons on each training pass. The effect is like averaging over many different networks.
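
A minimal sketch of dropout on one layer's activations. It uses the "inverted dropout" variant, which rescales the surviving activations during training instead of halving the weights afterwards. That variant, the `dropout` helper and the `keep_prob` name are assumptions for this note.

```python
import numpy as np


def dropout(activations, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob, rescale the rest."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
hidden = np.ones(10000)                  # stand-in for hidden-layer activations
dropped = dropout(hidden, keep_prob=0.5, rng=rng)

print("fraction kept:", np.count_nonzero(dropped) / dropped.size)
print("mean after dropout:", dropped.mean())
```

The rescaling keeps the expected activation the same, so nothing needs to change at test time, when all neurons are used.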

Expand the data set - for images add rotations/scaling/elastic distortions, for speech - vary the speed up/down, add noise

Weight Initialization
Explore Gaussian.

In a multi layer NN, the gradients in the initial layers can explode or vanish, so those layers may learn much faster or much slower than the later layers.
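
A rough sketch of where this comes from, assuming a chain with one sigmoid neuron per layer. The gradient reaching the first layer is then a product of w * sigmoid'(z) factors, one per layer, and sigmoid'(z) is at most 0.25. `gradient_scale` and the weight values are made up for illustration.

```python
import math


def sigmoid_prime(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1 - s)


def gradient_scale(weights, zs):
    """Product of w * sigmoid'(z) over the layers between output and first layer."""
    scale = 1.0
    for w, z in zip(weights, zs):
        scale *= w * sigmoid_prime(z)
    return scale


layers = 10
zs = [0.0] * layers                                # sigmoid'(0) = 0.25, its maximum

vanishing = gradient_scale([1.0] * layers, zs)     # each factor is 0.25
exploding = gradient_scale([100.0] * layers, zs)   # each factor is 25

print("modest weights:", vanishing)
print("large weights :", exploding)
```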

Convolutional networks

1. Local receptive fields, stride length
2. Shared weights and biases - all neurons in a hidden layer will have same weights and biases. So that all of them can detect the same feature at different locations. They are protected against translational changes. An image shifted slightly to right or left of something is still the image of the same thing.
3. Map from input layers to hidden layers is called feature map. Shared weights and biases constitute a kernel/filter.
4. One input layer can be mapped to multiple hidden layers. That enables detection of multiple features.
5. Later layers could be pooling layers - map 2x2 inputs to one neuron/pixel.

Recurrent Neural Networks (RNNs)

1. Output of a neuron might be determined by its earlier value. Time based. Might fit Speech and Natural language problems.

Deep Belief Networks (DBNs)

1. Generative - not only recognize digits, but able to produce them as well.
2. Able to do unsupervised learning too.
3. Restricted Boltzmann machines are a key component of DBNs.

What's going on with NNs
1. Playing video games
2. NLP

## Tuesday, February 14, 2017

### Neural network notes - 2

http://neuralnetworksanddeeplearning.com/chap1.html#eqtn3

Feed forward vs Recurrent nets
Feed forward nets simply give each layer's output as input to the next layer.
Recurrent nets can feed output back to a previous layer or to the same layer, where it comes back as input after some time.
But as of now learning algorithms for Recurrent nets are not as powerful as those for feed forward nets.

### Neural networks notes 1

http://neuralnetworksanddeeplearning.com/chap1.html

Perceptrons can compute anything since they can implement NAND gates,
and NAND gates are universal. NN is a network of perceptrons which can
adjust weights and biases, hence better than a conventional laid out
circuit. Their inputs and outputs are 0/1. Output = 1 if w.x + b > 0,
else 0, where w.x is dot product of weights and inputs and b is bias.
Bias is -threshold.
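
A minimal sketch of a perceptron acting as a NAND gate. The weights (-2, -2) and bias 3 are one choice that works; `perceptron` and `nand` are names made up for this note.

```python
def perceptron(inputs, weights, bias):
    """Output 1 if w.x + b > 0, else 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


def nand(a, b):
    # w.x + b is 3, 1, 1, -1 for inputs (0,0), (0,1), (1,0), (1,1).
    return perceptron((a, b), (-2, -2), 3)


truth_table = [(a, b, nand(a, b)) for a in (0, 1) for b in (0, 1)]
for a, b, out in truth_table:
    print(a, b, "->", out)
```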

But inputs to/outputs from Sigmoid neurons can be any value between 0
and 1, e.g. 0.683. Output activation function = 1/(1 + e^(-z)) where
z = w.x + b. If we plot it, it's a smoothed version of the step
function used by the Perceptron. This gives it the property that small
changes in weights, biases or inputs result in small changes in output,
unlike the Perceptron, whose output can flip between 0 and 1. This
property is helpful in tuning a NN, otherwise small changes would
result in significant changes down the line.

Still, Sigmoids and Perceptrons are similar in the sense that for
large positive z, output is close to 1 and for large negative z,
output is close to 0.

Essentially, for small changes, Δoutput is approximately a linear
function of the changes Δwj and Δb in the weights and bias.
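
A small numeric check of that approximation for one sigmoid neuron, with arbitrary illustrative values. The linear estimate uses the partial derivatives sigmoid'(z) * xj for each weight and sigmoid'(z) for the bias.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


x = [0.5, -1.0]
w, b = [0.8, 0.2], 0.1
dw, db = [0.01, -0.02], 0.005          # small changes to weights and bias

# Actual change in output after applying the changes.
new_w = [wj + dwj for wj, dwj in zip(w, dw)]
actual = neuron(new_w, b + db, x) - neuron(w, b, x)

# Linear estimate: sum_j (d out / d wj) * dwj + (d out / d b) * db.
a = neuron(w, b, x)
slope = a * (1 - a)                    # sigmoid'(z)
estimate = sum(slope * xj * dwj for xj, dwj in zip(x, dw)) + slope * db

print("actual change  :", actual)
print("linear estimate:", estimate)
```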

We can use other activation functions too, but σ(z) ≡ 1/(1+e^-z) is
popular since exponential has nice differential properties.

    rm -rf /root/.local/share/letsencrypt
    wget https://raw.githubusercontent.com/letsencrypt/letsencrypt/master/letsencrypt-auto
    chmod +x letsencrypt-auto
    ./letsencrypt-auto --debug renew

### exponential function and e

d/dx (e^x) = e^x, i.e. rate of change at any x = e^x
integration(1/x) = ln(x), i.e. area under the curve 1/x from x = 1 to
x = k is ln(k), for k = e, area is 1.
slope at x=0 of e^x is 1
compound interest, (1 + 1/n)^n -> e as n approaches infinity
e = sigma(1/fact(n)) for n = 0, 1, 2, ...
e^x = sigma(x^n/fact(n)) for n = 0, 1, 2, ...
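
A quick numeric check of these facts; `compound_limit` and `exp_series` are helper names made up for this note.

```python
import math


def compound_limit(n):
    """(1 + 1/n)^n, which approaches e as n grows."""
    return (1 + 1 / n) ** n


def exp_series(x, terms=20):
    """Partial sum of x^k / k! for k = 0 .. terms - 1."""
    return sum(x ** k / math.factorial(k) for k in range(terms))


h = 1e-6
slope_at_zero = (math.exp(h) - math.exp(-h)) / (2 * h)   # central difference

print("math.e              :", math.e)
print("(1 + 1/n)^n, n=10^6 :", compound_limit(10 ** 6))
print("series for e        :", exp_series(1.0))
print("series for e^2      :", exp_series(2.0), "vs", math.exp(2.0))
print("slope of e^x at 0   :", slope_at_zero)
```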

## Sunday, February 12, 2017

### Numpy vs Tensorflow Matrix multiplication

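Numpy is much faster for this test (Note: used the CPU version of Tensorflow, not the GPU version)

    import time

    import numpy
    import tensorflow as tf  # uses the TensorFlow 1.x API (tf.Session)


    def getTestData():
        # Returns the loop bound and a 4x4 test matrix.
        A = [[1., 2., 3., 4.], [3., 4., 5., 6.], [7., 8., 9., 10.], [11., 12., 13., 14.]]
        return 6, A


    def tfMatMul():
        n, A = getTestData()
        A = tf.constant(A)
        sess = tf.Session()
        # Squares the matrix n - 1 times. Each pass adds a new op to the graph
        # and runs the session, so that overhead is part of the measured time.
        for num in range(1, n):
            A = tf.matmul(A, A)
            output = sess.run(A)
            A = tf.convert_to_tensor(output)
        sess.close()
        return output


    def numPyMatMul():
        n, A = getTestData()
        # Same repeated squaring, done directly in numpy.
        for num in range(1, n):
            A = numpy.matmul(A, A)
        return A


    def timedRun(methodToRun):
        # Wall-clock time for a single call, followed by the result.
        start = time.time()
        result = methodToRun()
        end = time.time()
        diff = end - start
        print("Time Taken :" + str(diff))
        print(result)


    timedRun(numPyMatMul)
    timedRun(tfMatMul)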

## Saturday, February 11, 2017

### conda commands

    conda create --name test35 python=3.5
    conda info --envs                    # list environments
    activate test35
    deactivate test35
    conda remove --name test35 --all
    conda install -y scipy

### PyCharm with Anaconda - Using a specific environment

1. Search for "Project Interpreter"
2. Click on Settings Icon (Wheel)
3. Choose python.exe in your env, path is like Anaconda3\envs\<env_name>\python.exe

## Thursday, February 9, 2017

### scala notes

https://www.safaribooksonline.com/library/view/scala-for-the/9780134510613/

for yield => map
for yield with guard clause => filter
reduceLeft

closures - how are they implemented? In scala, they are objects which capture the method and bindings of free variables.

Expression evaluation (Recursive data structures)

    abstract class Expr
    case class Num(value: Int) extends Expr
    case class Sum(left: Expr, right: Expr) extends Expr
    case class Product(left: Expr, right: Expr) extends Expr

    val e = Product(Num(3), Sum(Num(4), Num(5)))
    def eval(e: Expr): Int = e match {
      case Num(v) => v
      case Sum(l, r) => eval(l) + eval(r)
      case Product(l, r) => eval(l) * eval(r)
    }
    eval(e)


Expression evaluation (Recursive data structures) - OOP version

    abstract class Expr {
      def eval: Int
    }

    class Num(val data: Int) extends Expr {
      def eval: Int = data
    }
    class Product(val left: Expr, val right: Expr) extends Expr {
      def eval: Int = left.eval * right.eval
    }
    class Sum(val left: Expr, val right: Expr) extends Expr {
      def eval: Int = left.eval + right.eval
    }

    val e = new Product(new Num(3), new Sum(new Num(4), new Num(5)))
    e.eval

So what to use, Polymorphism version or case classes?
Use case classes when your cases are bound. Like here. There is a finite set of expressions.
Use Polymorphism when the set of cases is open ended, i.e. new subclasses may be added later.



## Wednesday, February 8, 2017

### scala notes

https://www.safaribooksonline.com/library/view/scala-for-the/9780134510613/

packages => nesting can be all at one place, no need to have similar source directories
imports are flexible, can import specific classes, can hide a class, can alias, can import anywhere (lexical scoping)

Traits (like Java interfaces)
But much more powerful

Traits cannot have construction parameters, otherwise they are same as classes.
traits can be mixed in with objects, rather than class declaration
traits can invoke others in a prior layer (consolelogger, timestamplogger, shortlogger)

## Thursday, February 2, 2017

### Changing the port for react app(create-react-app)

Built with create react app

Edit node_modules/react-scripts/scripts/start.js

Search for DEFAULT_PORT and modify as follows:

    var DEFAULT_PORT = process.env.PORT || 80;