Tuesday, September 23, 2014

Command for computing max number of requests per second

 cat access_log.2014-09-23.txt | awk '{print $4}' | sort | uniq -c | sort -n

(assuming $4 is the timestamp)

per hour (sorted on hour):
cat access_log.2015-02-08.txt | awk '{print substr($4,14,2)}' | sort | uniq -c | sort -n -k2

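List of IPs with the number of requests in a particular hour (15):
cat access_log.2015-03-09.txt | awk 'substr($4,14,2) == "15" {print $1}' | sort | uniq -c | sort -n

(the hour is quoted with double quotes inside awk, so it is compared as a string and an hour like "09" also works)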
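
The same counts can be sketched in Python, which is handy when the log needs further processing. This is only a minimal sketch: it assumes, like the awk commands above, that the 4th whitespace-separated field is the timestamp in the form [23/Sep/2014:15:04:05. The function names and the sample lines are made up for illustration.

```python
from collections import Counter


def timestamp_field(line):
    """Return the 4th whitespace-separated field, or None if the line is too short."""
    fields = line.split()
    return fields[3] if len(fields) > 3 else None


def busiest_second(lines):
    """Return (timestamp, count) for the second with the most requests."""
    counts = Counter()
    for line in lines:
        ts = timestamp_field(line)
        if ts is not None:
            counts[ts] += 1
    return counts.most_common(1)[0] if counts else None


def requests_per_hour(lines):
    """Count requests per hour; ts[13:15] matches awk's substr($4,14,2)."""
    counts = Counter()
    for line in lines:
        ts = timestamp_field(line)
        if ts is not None:
            counts[ts[13:15]] += 1  # awk is 1-based, Python slices are 0-based
    return dict(counts)


# Made-up sample lines; for a real log, pass an open file object instead.
sample = [
    '10.0.0.1 - - [23/Sep/2014:15:04:05 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [23/Sep/2014:15:04:05 +0000] "GET /a HTTP/1.1" 200 128',
    '10.0.0.1 - - [23/Sep/2014:16:00:01 +0000] "GET /b HTTP/1.1" 404 0',
]
print(busiest_second(sample))
print(requests_per_hour(sample))
```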

Saturday, June 28, 2014

Notes from Violent Python

1. The European Network and Information Security Agency provides an excellent resource for analyzing network traffic. They provide a live DVD ISO image that contains several network captures and exercises. You can download a copy from http://www.enisa.europa.eu/activities/cert/support/exercise/live-dvd-iso-images

2. Nmap can be used from Python via a library (python-nmap).

3. Pexpect(python) can be used for automating interactive applications - for e.g.   

4. ftplib for brute-forcing FTP user credentials.

5. Metasploit as a penetration testing tool.

6. wigle.net for finding lat/long for a wifi router   

Sunday, June 22, 2014

Python simple script to connect to a host/port and get response.

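__author__ = 'admin'
import socket

# Connect to the host/port and print whatever the server sends first (its banner).
s = socket.socket()
s.settimeout(3)  # give up if nothing happens within 3 seconds
s.connect(("server.com", 21))
ans = s.recv(1024)  # read up to 1024 bytes
print(ans)  # print() with parentheses also works on Python 3
s.close()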

Thursday, June 19, 2014

MySQL db restore from dump failing

Solution : increase max_allowed_packet

Details :
Recently I had a strange problem. While running source dbdump.sql in the mysql client (dbdump.sql was generated by mysqldump), the initial data loaded fine, but after a while rows were not being restored properly because foreign key checks were failing.

The dump file disables foreign key checks at the top and enables them again at the bottom, so this was highly surprising.

To debug this I wrote a PHP script that executed the queries from the dump line by line. It failed at a query where the data being inserted was huge, which made the database connection go away. When the connection came back, foreign key checks were enabled again (the setting is per session), which in turn made a lot of later inserts fail.

So I increased max_allowed_packet to 64M in my.ini and restarted the MySQL server. Solved.
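
A quick way to guess how large max_allowed_packet needs to be is to look for the longest statement in the dump. The sketch below is only a rough check: it assumes each statement sits on one line (as mysqldump usually writes its INSERTs) and treats the line length in bytes as an approximation of the packet size. The function name and the in-memory dump are made up for illustration.

```python
import io


def largest_line_bytes(stream):
    """Return (line_number, size_in_bytes) of the longest line in a binary stream."""
    biggest = (0, 0)
    for number, line in enumerate(stream, start=1):
        if len(line) > biggest[1]:
            biggest = (number, len(line))
    return biggest


# Hypothetical in-memory dump; for a real file use open("dbdump.sql", "rb").
dump = io.BytesIO(
    b"SET FOREIGN_KEY_CHECKS=0;\n"
    b"INSERT INTO t VALUES (1,'" + b"x" * 100 + b"');\n"
    b"SET FOREIGN_KEY_CHECKS=1;\n"
)
line_no, size = largest_line_bytes(dump)
print(line_no, size)
```

If the reported size is close to or above the server's max_allowed_packet, that statement is a likely candidate for the dropped connection.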

Monday, May 5, 2014

debugging eclipse app from command line

Go to the directory containing the class file, say A.class,
and run

java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=1044 A

Then, in Eclipse, launch a Remote Java Application debug configuration
with port 1044

Friday, April 11, 2014

downloading youtube video with VLC

First, go to the page of the YouTube video you want to download. Now open VLC. Click Media-->Open Network Stream, paste the URL of the YouTube video page and click Play. VLC will start streaming the video.

To download the video, click Tools-->Codec Information; at the bottom of the window you will see a Location box. Copy the URL from it and paste it into your browser’s address bar. The download will now start.

Monday, March 31, 2014

Monday, March 17, 2014

installing scipy

yum install scipy

installing module sklearn python on centos 6.4

yum install gcc-c++
pip install -U scikit-learn


summary Hilary Mason machine learning intro - part 1/2/3/4

Code : https://github.com/hmason/ml_class
Google Prediction API : https://cloud.google.com/products/prediction-api/

Classification :
1. Using NYTimes Developer API
2. Naive Bayes algo

Clustering :
1. Agglomerative
2. K-means
3. pycluster
4. cluster delicious bookmarks
5. Recommendation systems are an example application of clustering.

summary Hilary Mason machine learning intro - part 5

A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set. The price paid for this efficiency is that a Bloom filter is a probabilistic data structure: it tells us that the element either definitely is not in the set or may be in the set.
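
To make the "definitely not / maybe" behaviour concrete, here is a minimal Bloom filter sketch. It is not a tuned implementation: the class name, the bit-array size, the number of hashes and the use of salted SHA-256 as the hash family are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        """Yield num_hashes bit positions for item, one per salted hash."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["apple", "banana", "cherry"]:
    bf.add(word)
print(bf.might_contain("apple"))   # always True for an added item
print(bf.might_contain("durian"))  # usually False, but a false positive is possible
```

Added items never produce a false negative, because every bit they set stays set. The only error the structure can make is a false positive.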

Suppose you have two sets, A and B, and you would like to know how similar they are. First you might ask, how big is their intersection?

$$|A \cap B|$$

That’s nice, but it isn’t comparable across sets of different sizes, so let’s normalize it by the size of the union of the two sets.

$$\frac{|A \cap B|}{|A \cup B|}$$

This is called the Jaccard Index, and is a common measure of set similarity. It has the nice property of being 0 when the sets are disjoint, and 1 when they are identical.
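
The index is a one-liner on Python sets. This is a small sketch; the function name is mine, and returning 1.0 for two empty sets is a convention assumed here, since the formula itself is undefined in that case.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two iterables treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)


print(jaccard_index({1, 2, 3}, {2, 3, 4}))  # 2 shared out of 4 total -> 0.5
print(jaccard_index({1, 2}, {3, 4}))        # disjoint -> 0.0
print(jaccard_index({1, 2}, {1, 2}))        # identical -> 1.0
```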


SimHash
A hash function usually maps different values to totally different hash values.
SimHash is a hash function under which similar items get similar hash values.
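
A rough sketch of the idea, assuming whitespace-separated tokens with equal weight and a 64-bit fingerprint built from SHA-256 (all of these are illustrative choices, not something from the talk): each token votes on every bit, and the sign of the vote total gives the fingerprint bit. Similar texts share most tokens, so most votes agree and the fingerprints differ in few bits.

```python
import hashlib

BITS = 64  # fingerprint width, an arbitrary choice for this sketch


def simhash(text):
    """Return a 64-bit SimHash fingerprint of text."""
    votes = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash of the token
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(x, y):
    """Number of bit positions in which two fingerprints differ."""
    return bin(x ^ y).count("1")


a = simhash("the quick brown fox jumps over the lazy dog")
b = simhash("the quick brown fox jumped over the lazy dog")
c = simhash("bloom filters trade accuracy for memory")
print(hamming_distance(a, b))  # near-duplicate text: expected to be small
print(hamming_distance(a, c))  # unrelated text: expected to be larger
```

Similarity is then measured as the Hamming distance between fingerprints, where a smaller distance means more similar items.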
