Wednesday, February 15, 2017

neural network notes

Study back propagation and implement gradient descent.
Implement dropout.
Cross entropy is an alternative to quadratic cost function for faster learning.
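A rough sketch of why cross entropy learns faster with a sigmoid output, assuming a single neuron and one training example. The function names are made up for illustration. With the quadratic cost the error signal carries an extra sigmoid'(z) factor, which is tiny when the neuron is saturated; with cross entropy that factor cancels.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def quadratic_delta(z, y):
    """dC/dz for C = (a - y)^2 / 2 with a = sigmoid(z)."""
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)   # includes sigmoid'(z) = a * (1 - a)

def cross_entropy_delta(z, y):
    """dC/dz for C = -[y ln a + (1 - y) ln(1 - a)] with a = sigmoid(z)."""
    return sigmoid(z) - y            # the sigmoid'(z) factor has cancelled

# A badly wrong, saturated neuron: target is 0 but z is large and positive.
z, y = 4.0, 0.0
print(quadratic_delta(z, y))      # small, so learning is slow
print(cross_entropy_delta(z, y))  # close to 1, so learning is fast
```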

Softmax is a different activation (output) function, an alternative to Sigmoid. Its outputs always sum to 1, hence they can be thought of as a probability distribution. In a Sigmoid layer, output activations won't always sum to 1.
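A small check of the sum-to-1 property, using only the standard library. The input values are arbitrary; subtracting the maximum before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import math

def sigmoid(z):
    """Logistic sigmoid of a single number."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Softmax over a list of weighted inputs."""
    m = max(zs)                               # shift for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

z = [2.0, 1.0, 0.1]            # hypothetical weighted inputs of an output layer
soft = softmax(z)
sig = [sigmoid(v) for v in z]
print(sum(soft))               # 1 up to floating-point rounding
print(sum(sig))                # generally not 1
```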

2 good combinations in NN are: Softmax + Log likelihood cost & Sigmoid + Cross entropy cost
Usually Softmax + Log Likelihood is good for multi class classification problems.

Validation_data vs. test_data

Validation_data for tuning hyper parameters like learning rate
test_data for evaluation

Avoiding overfitting
Best way to avoid over fitting is to have larger training sets.

Regularization is another way to prevent over fitting since it pushes towards smaller weights. It means small changes in inputs will yield small changes in output. If the weights are large, small changes in input may result in large changes in output. So it's helping the model avoid the effects of noise.

L2 Regularization - add weight^2 to cost
L1 Regularization - add |weight| to cost
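A minimal sketch of the two penalty terms. The scaling by lam (regularization strength) and n (training set size) is one common convention and is an assumption here, as are the function names.

```python
def l2_penalty(weights, lam, n):
    """(lam / 2n) * sum of squared weights."""
    return lam / (2.0 * n) * sum(w * w for w in weights)

def l1_penalty(weights, lam, n):
    """(lam / n) * sum of absolute weights."""
    return lam / n * sum(abs(w) for w in weights)

weights = [1.0, -2.0]
base_cost = 0.3                                        # hypothetical unregularized cost
print(base_cost + l2_penalty(weights, lam=0.5, n=10))  # cost with L2 term
print(base_cost + l1_penalty(weights, lam=0.5, n=10))  # cost with L1 term
```

Biases are usually left out of the penalty; only the weights are passed in.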

You could train multiple Neural networks and do a voting on their results.
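Similarly, there is Dropout, in which you randomly and temporarily remove half the hidden neurons for each training batch. It is like training many different networks and averaging their results, so the weights get adjusted in an average way.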
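A rough sketch of dropout applied to one layer's activations. This is the "inverted dropout" variant (survivors are scaled up during training so nothing needs rescaling at test time), which is an assumption and not necessarily the variant these notes were taken from.

```python
import random

def dropout(activations, p_drop=0.5, rng=random):
    """Zero each activation with probability p_drop and rescale the survivors."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)                 # fixed seed so the run is repeatable
hidden = [1.0] * 10                    # hypothetical hidden-layer activations
print(dropout(hidden, 0.5, rng))       # roughly half become 0.0, the rest 2.0
```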

Expand the data set - for images add rotations/scaling/elastic distortions, for speech - vary the speed up/down, add noise
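A toy sketch of expanding a data set. The 90 degree rotation stands in for the small rotations used in practice, and the Gaussian noise scale is an arbitrary choice; both helper names are hypothetical.

```python
import random

def rotate90(image):
    """Rotate a 2D list (rows of pixels) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def add_noise(samples, scale, rng):
    """Add zero-mean Gaussian noise to a list of audio samples."""
    return [s + rng.gauss(0.0, scale) for s in samples]

image = [[1, 2],
         [3, 4]]
print(rotate90(image))                        # [[3, 1], [4, 2]]

rng = random.Random(0)
print(add_noise([0.0, 0.5, 1.0], 0.01, rng))  # slightly perturbed copies
```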

Weight Initialization
Explore Gaussian.
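One thing to explore here, as a sketch: draw weights from a Gaussian with mean 0 and standard deviation 1/sqrt(n_in), so the weighted sum into a neuron does not grow with the number of inputs. The 1/sqrt(n_in) scaling is one common choice and an assumption on my part.

```python
import math
import random

def init_weights(n_in, n_out, rng):
    """n_out rows of n_in weights, each drawn from N(0, 1/n_in)."""
    std = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]

rng = random.Random(1)
W = init_weights(n_in=100, n_out=50, rng=rng)
flat = [w for row in W for w in row]
mean = sum(flat) / len(flat)
std = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
print(mean, std)   # mean near 0, std near 0.1
```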

Vanishing Gradient
In a multi layer NN, the gradients in the initial layers can explode or vanish - so the initial layers may learn much faster or much slower than the later layers.
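A sketch of where the shrinking comes from, assuming a chain of single-neuron sigmoid layers. The gradient reaching the first layer contains a product of w * sigmoid'(z) terms, and sigmoid'(z) is at most 0.25, so with modest weights the product shrinks quickly; with very large weights it can blow up instead.

```python
import math

def sigmoid_prime(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def gradient_scale(weights, zs):
    """Product of w * sigmoid'(z) along a chain of single-neuron layers."""
    scale = 1.0
    for w, z in zip(weights, zs):
        scale *= w * sigmoid_prime(z)
    return scale

print(gradient_scale([1.0] * 5, [0.0] * 5))     # 0.25^5, vanishing
print(gradient_scale([100.0] * 5, [0.0] * 5))   # 25^5, exploding
```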

Convolutional networks

  1. Local receptive fields, stride length
  2. Shared weights and biases - all neurons in one feature map have the same weights and bias, so that all of them detect the same feature at different locations. This makes the network robust to translation: an image of something shifted slightly to the right or left is still an image of the same thing.
  3. Map from input layers to hidden layers is called feature map. Shared weights and biases constitute a kernel/filter.
  4. One input layer can be mapped to multiple hidden layers. That enables detection of multiple features.
  5. Later layers could be pooling layers - map 2x2 inputs to one neuron/pixel.
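The list above can be sketched in plain Python: one shared kernel and bias slide over the image with a given stride to produce a feature map, and a 2x2 max pool then condenses it. The activation function is left out to keep it short, and max pooling is just one pooling choice; function names are my own.

```python
def convolve2d(image, kernel, bias=0.0, stride=1):
    """Slide one shared kernel over the image (local receptive fields)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(0, len(image) - kh + 1, stride):
        row = []
        for c in range(0, len(image[0]) - kw + 1, stride):
            total = bias
            for i in range(kh):
                for j in range(kw):
                    total += kernel[i][j] * image[r + i][c + j]
            row.append(total)
        out.append(row)
    return out

def max_pool2x2(fmap):
    """Map each non-overlapping 2x2 block of the feature map to its maximum."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]               # same weights used at every location
fmap = convolve2d(image, kernel)
print(fmap)                     # 2x2 feature map
print(max_pool2x2(fmap))        # pooled down to a single value
```

Using several kernels on the same input gives several feature maps, one per feature.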

Recurrent Neural Networks (RNNs)
  1. Output of a neuron might be determined by its earlier value. Time based. Might fit Speech and Natural language problems.
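A very reduced sketch of the idea, assuming a single recurrent neuron with scalar weights and tanh activation (all of these are illustrative choices): the output at each time step depends on the current input and on the neuron's own previous output.

```python
import math

def rnn_run(xs, w_x, w_h, b, h0=0.0):
    """Feed a sequence through one recurrent neuron and return all outputs."""
    h = h0
    history = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)   # previous output feeds back in
        history.append(h)
    return history

# The second input is 0, yet the output is non-zero because of the memory.
print(rnn_run([1.0, 0.0], w_x=1.0, w_h=1.0, b=0.0))
# With w_h = 0 there is no memory, so the second output is exactly 0.
print(rnn_run([1.0, 0.0], w_x=1.0, w_h=0.0, b=0.0))
```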

Deep Belief Networks (DBNs)
  1. Generative - not only recognize digits, but able to produce them as well.
  2. Able to do unsupervised learning too.
  3. Restricted Boltzmann machines are a key component of DBNs.

What's going on with NNs
  1. Playing video games
  2. NLP

Tuesday, February 14, 2017

Neural network notes - 2

Feed forward vs Recurrent nets
Feed forward nets simply give the output of each layer as input to the next layer.
Recurrent nets can give output to a previous layer and that could come back as input after some time to the same layer.
But as of now, learning algorithms for Recurrent nets are less powerful than those for Feed forward nets.

Weights and biases/Gradient Descent

Neural networks notes 1

Perceptrons can compute anything since they can act as NAND gates. NN is a
network of perceptrons which can adjust weights and biases, hence
better than a conventional laid out circuit. Their inputs and outputs
are 0/1. Output = 1 if w.x + b > 0, else 0, where w.x is dot product of
weights and inputs and b is bias. Bias is -threshold.
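A quick check of the NAND claim. The weights (-2, -2) and bias 3 are one choice that works, picked for illustration.

```python
def perceptron(weights, inputs, bias):
    """Output 1 if w.x + b > 0, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def nand(x1, x2):
    return perceptron([-2, -2], [x1, x2], 3)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, nand(x1, x2))   # only the input 1, 1 gives 0
```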

But inputs to/outputs from Sigmoid neurons can be anything between 0
and 1, e.g. 0.683. Output activation function = 1/(1 + e^(-z)) where z =
w.x + b. If we plot it, it's a smoothed version of the step function
used by the Perceptron. That gives it the property that small changes in
weights and biases result in small changes in output, unlike the
Perceptron. This property is helpful in tuning a NN, otherwise small
changes in weights could result in significant changes down the line.

Still, Sigmoids and Perceptrons are similar in the sense that for
large positive z the output is close to 1 and for large negative z it
is close to 0.
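To see the similarity and the difference side by side, here is a small comparison of the step function and the sigmoid at a few arbitrary z values.

```python
import math

def step(z):
    """Perceptron output as a function of z = w.x + b."""
    return 1 if z > 0 else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(z, step(z), round(sigmoid(z), 4))
```

At the extremes the two agree almost exactly; near z = 0 the sigmoid moves smoothly while the step jumps.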

Essentially, for small changes, Δoutput is approximately a linear
function of the changes Δwj and Δb in the weights and bias.
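A numerical sanity check of that linear approximation for one sigmoid neuron. The weights, inputs and step sizes are arbitrary, and the partial derivatives use sigmoid'(z) = a * (1 - a) where a is the neuron's output.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(w, x, b):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def linear_estimate(w, x, b, dw, db):
    """Sum of (d out / d w_j) * dw_j plus (d out / d b) * db."""
    a = neuron(w, x, b)
    slope = a * (1.0 - a)                       # sigmoid'(z)
    return sum(slope * xj * dwj for xj, dwj in zip(x, dw)) + slope * db

w, x, b = [0.5, -0.3], [1.0, 2.0], 0.1
dw, db = [0.001, -0.002], 0.0005
w_new = [wi + dwi for wi, dwi in zip(w, dw)]
actual = neuron(w_new, x, b + db) - neuron(w, x, b)
print(actual, linear_estimate(w, x, b, dw, db))   # nearly identical
```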

We can use other activation functions too, but σ(z) ≡ 1/(1+e^-z) is
popular since exponential has nice differential properties.

letsencrypt interface not found

rm -rf /root/.local/share/letsencrypt
chmod +x letsencrypt-auto
./letsencrypt-auto --debug renew

exponential function and e

d/dx(e^x) = e^x, rate of change at any x = e^x
integration(1/x) = ln(x), i.e. area under the curve 1/x from x = 1 to x =
k is ln(k), for k = e, area is 1.
slope at x=0 of e^x is 1
compound interest, (1 + 1/n)^n -> e as n approaches infinity
e = sigma(1/fact(n)), n = 0 to infinity
e^x = sigma(x^n/fact(n)), n = 0 to infinity
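These can be checked numerically with the standard library. The function names and the number of terms are arbitrary choices for the sketch.

```python
import math

def compound(n):
    """(1 + 1/n)^n, which approaches e as n grows."""
    return (1.0 + 1.0 / n) ** n

def exp_series(x, terms):
    """Partial sum of x^n / n! for n = 0 .. terms - 1."""
    return sum(x ** n / math.factorial(n) for n in range(terms))

print(compound(10 ** 6), math.e)            # close, converges slowly
print(exp_series(1.0, 20), math.e)          # series for e converges fast
print(exp_series(2.0, 30), math.exp(2.0))   # same series gives e^x

# slope of e^x at x = 0, estimated with a small finite difference
h = 1e-6
print((math.exp(h) - math.exp(-h)) / (2 * h))   # about 1
```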

Sunday, February 12, 2017

Numpy vs Tensorflow Matrix multiplication

Numpy is much faster on this small 4x4 test (Note: Used the CPU version of Tensorflow, not the GPU version)
import numpy
import tensorflow as tf
import time

def getTestData():
    A = [[1., 2., 3., 4.],[3.,4.,5.,6.],[7.,8,9.,10.],[11.,12.,13.,14.]]
    return 6,A

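def tfMatMul():
    n,A = getTestData()
    A = tf.constant(A)
    # TensorFlow 1.x API; under 2.x, Session lives in tf.compat.v1
    sess = tf.Session()
    for num in range(1,n):
        A = tf.matmul(A,A)
        # run the graph to get the product as a concrete numpy array
        output = sess.run(A)
        A = tf.convert_to_tensor(output)
    sess.close()
    return output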

def numPyMatMul():
    n,A = getTestData()
    for num in range(1,n):
        A = numpy.matmul(A,A)
    return A

def timedRun(methodToRun):
    start = time.time()
    result = methodToRun()
    end = time.time()
    diff = end - start
    print("Time Taken :"+str(diff))
    return result

# time both versions
timedRun(numPyMatMul)
timedRun(tfMatMul)

