Monday, March 31, 2014

installing and using matplotlib on centos 6.4

yum install -y python-matplotlib
yum install pygtk2

then, in python:

import matplotlib as mpl
import matplotlib.pyplot as plt

Monday, March 17, 2014

installing scipy

yum install scipy

installing the python sklearn module on centos 6.4

yum install gcc-c++
pip install -U scikit-learn

summary hilary mason machine learning intro - part 1/2/3/4

Code :
Google Prediction API :

Classification :
1. Using NYTimes Developer API
2. Naive Bayes algo

Clustering :
1. Agglomerative
2. K-means
3. pycluster
4. cluster delicious bookmarks
5. Recommendation systems are one application of clustering.
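
To make the k-means item concrete, here is a tiny pure-python sketch (python 3). It is not the code from the talk: the function name kmeans, the 2-D sample points, and seeding the centroids with the first k points are all assumptions made for illustration. A real project would more likely use scikit-learn.

```python
def kmeans(points, k, iterations=20):
    """Tiny k-means sketch for 2-D points given as (x, y) tuples."""
    # Assumption: seed centroids with the first k points (deterministic, not robust).
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical sample data: two visually separate groups.
points = [(1, 1), (8, 8), (1.5, 2), (9, 9.5), (1, 0.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

The two steps inside the loop (assign, then update) are the whole algorithm; everything else is bookkeeping.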

summary hilary mason machine learning intro - part 5

A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set. The price paid for this efficiency is that a Bloom filter is a probabilistic data structure: it tells us that the element either definitely is not in the set or may be in the set.
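
A minimal sketch of the idea in python 3, assuming a fixed-size bit array and k positions derived from salted SHA-256 digests. The class name BloomFilter, the method names, and the sizes are made up for illustration and are not from the talk.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: a bit array plus k hash functions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256("{}:{}".format(i, item).encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not in the set"; True means "may be in the set".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["apple", "banana", "cherry"]:
    bf.add(word)

print(bf.might_contain("apple"))   # True: added items are never reported absent
print(bf.might_contain("durian"))  # usually False, but a false positive is possible
```

Note that there is no remove operation: clearing a bit could also erase evidence of other items that share that position.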

Suppose you have two sets, A and B, and you would like to know how similar they are. First you might ask, how big is their intersection?

|A \cap B|

That’s nice, but it isn’t comparable across sets of different sizes, so let’s normalize it by the size of their union.

\frac{|A \cap B|}{|A \cup B|}

This is called the Jaccard Index, and is a common measure of set similarity. It has the nice property of being 0 when the sets are disjoint, and 1 when they are identical.
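
The formula maps directly onto python sets. A short sketch (python 3); the function name jaccard and the choice to return 1.0 for two empty sets are assumptions, since the formula itself is undefined when the union is empty.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)


print(jaccard({1, 2, 3}, {2, 3, 4}))  # 0.5 (2 shared items out of 4 total)
print(jaccard({1, 2}, {3, 4}))        # 0.0 (disjoint)
print(jaccard({1, 2}, {1, 2}))        # 1.0 (identical)
```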

a hash function usually maps different values to totally different hash values, even when the inputs are nearly identical
simhash is a hash where similar items are hashed to similar hash values
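
A simplified simhash sketch in python 3, to show where the "similar in, similar out" behaviour comes from. The assumptions here are mine, not from the talk: tokens are whitespace-separated words, every token has equal weight, MD5 is the per-token hash, and the fingerprint is 64 bits. Similarity is then measured as the Hamming distance between fingerprints (fewer differing bits means more similar).

```python
import hashlib


def simhash(text, bits=64):
    """Simplified simhash: each token votes +1/-1 on every bit of the fingerprint."""
    counts = [0] * bits
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set when the majority of tokens had that bit set.
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(x, y):
    """Number of bit positions where the two fingerprints differ."""
    return bin(x ^ y).count("1")


a = simhash("the quick brown fox jumps over the lazy dog")
b = simhash("the quick brown fox jumped over the lazy dog")
c = simhash("installing scipy on centos with yum")

print(hamming_distance(a, b))  # near-duplicates tend to differ in few bits
print(hamming_distance(a, c))  # unrelated text tends to differ in many more
```

Because each bit is a majority vote over all tokens, changing one token out of many only flips the bits where the vote was close.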
