Friday, November 26, 2010

Web Server Issues

1. N/W  I/O
2. Disk I/O
3. CPU
4. Memory

A. Dig for the root cause : too many processes may start thrashing,
which may seem like a disk I/O problem but is actually a memory problem.

Tools : sar, sysstat, iostat, top, vmstat, netstat
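One quick way to separate thrashing (a memory problem) from genuine disk I/O load is to watch the si/so (swap-in/swap-out) columns of vmstat. A minimal sketch, parsing a made-up sample line; on a live box you would run `vmstat 1 5` instead:

```shell
#!/bin/sh
# vmstat fields: r b swpd free buff cache si so bi bo in cs us sy id wa
# The sample line below is invented for illustration.
line=" 2  1 524288  10240  2048  65536  850  900   120   140  300  800 10  5 20 65"
si=$(echo "$line" | awk '{print $7}')   # pages swapped in per second
so=$(echo "$line" | awk '{print $8}')   # pages swapped out per second
if [ "$si" -gt 100 ] || [ "$so" -gt 100 ]; then
    echo "swapping heavily: suspect memory, not disk"
fi
```

Sustained non-zero si/so means the box is paging: the disk looks busy, but adding memory (or reducing the process count) is the real fix.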

Apache performance testing : TSUNG

Apache usually dies at around 5000 concurrent connections.
It usually runs in the pre-fork or worker model.
Pre-fork : the OS kernel does the load balancing.
Worker : the Apache master process does the load balancing.

Usually every process spawns many threads.
A process has 4GB of virtual address space (on 32-bit systems),
shared by all its threads.
So, if there are too many threads, for e.g. 500,
one misbehaving thread can spoil the party for everyone else.

So, it's better to have 5 processes with 100 threads each.
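With the worker MPM in Apache 2.2, that "5 processes x 100 threads" shape can be expressed roughly like this (the numbers are illustrative, not tuned values):

```apacheconf
# worker MPM: a few processes, many threads each
<IfModule mpm_worker_module>
    ServerLimit         5
    ThreadsPerChild   100
    MaxClients        500   # ServerLimit * ThreadsPerChild
</IfModule>
```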

But in Apache, creating many threads has its own problems :
every thread has its own kernel stack and user-space stack, which eats
up resources.

Lighttpd works on an event-driven model.
It accepts connections and passes the baton
to a FastCGI process; when that returns, lighttpd
returns the response to the requester.

Since it is event driven, it scales better.

Work is ongoing to make lighttpd work better on multiple cores,
so that it can spawn a separate process for every core.

FrontEnd optimization : 
- In .htaccess, set headers for static images (for e.g. a TTL (time to live): the time after which the client should fetch the image again).
- Caching proxy layer : varnish/squid : saves DB query time and PHP script running time.
- A separate caching layer, as opposed to PHP's built-in caching, may help diagnose issues better.
- It's a proxy because caching runs at :80, while the server runs at :8080.
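A minimal sketch of the .htaccess idea, assuming mod_expires is enabled (the 30-day TTL is just an example value):

```apacheconf
# Tell clients to cache static images for 30 days before re-fetching.
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType image/png  "access plus 30 days"
    ExpiresByType image/jpeg "access plus 30 days"
    ExpiresByType image/gif  "access plus 30 days"
</IfModule>
```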

you can also look into php opcode caching...


Look at the 31 Yahoo tips, the Yahoo performance blog, and the MySQL performance blog.

If there are many threads running, and you use external
libraries (code) from them, it's better to kill a worker after it has served
5000 (n) requests, since after that it will have accumulated memory leaks,
which are hard to find.
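In Apache httpd the closest knob for this is the MaxRequestsPerChild directive, which recycles a child process after it has served N requests; a one-line config sketch:

```apacheconf
# Recycle each child process after 5000 requests to bound leak accumulation.
MaxRequestsPerChild 5000
```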

MySql General Info

ib_logfile files are write-ahead (redo) logs (changes are written to the log before being applied to the ibdata data files)
- used for recovering a crashed server

binlog -> log of queries after they commit
- used for master-slave replication

Slave-level replication is async, i.e. there are 2 threads : 
1. an I/O thread reads queries from the master and writes them to the slave's relay log
2. a SQL thread reads from the relay log and executes them
hence it's async.

If it were sync, it would hamper the performance of the master, since
the master would have to wait for the slave to write to its DB.


To set up a slave : 
When you take the dump from the master, specify the --master-data option, which tells
the slave from where to start reading the logs.

You can specify in the slave config which DBs/tables to include/exclude.

show slave status\G
start slave
Keep taking periodic consistent dumps.
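A sketch of the sequence on the slave, assuming the dump was taken with mysqldump --master-data (which embeds the binlog file/position in the dump itself); the host and credentials here are illustrative:

```sql
-- after loading the dump on the slave:
CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret';
START SLAVE;
SHOW SLAVE STATUS\G
```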

Tuesday, November 23, 2010

Intellij Idea : Using the framework as an RSL (Runtime Shared Library) in a Flex project

In Flex Builder 3 : 
Project -> Properties -> Flex Build Path -> Library Path -> Framework linkage.
Flex Builder also copies the required files.

In Intellij Idea 9.0.3 : 
Right click on a Module -> Module Settings -> Flex Compiler Settings -> Advanced->
Select Use Framework as Runtime Shared Library (RSL)

Unlike Flex Builder, Idea doesn't copy the required .swz and .swf files.
You have to copy them manually to the location where your main application swf resides.
Copy them from here : 
{location of your flex sdk}/frameworks/rsls/
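The manual copy can be scripted; here is a sketch that stands up fake directories so it is self-contained (FLEX_SDK, the output directory, and the .swz file name are all placeholders for your real paths):

```shell
#!/bin/sh
# Stand-ins for the real SDK and output locations.
FLEX_SDK=./fake-sdk          # replace with your Flex SDK location
OUT_DIR=./bin-debug          # replace with where your application swf resides
mkdir -p "$FLEX_SDK/frameworks/rsls" "$OUT_DIR"
touch "$FLEX_SDK/frameworks/rsls/framework_3.2.0.3958.swz"   # placeholder RSL
# Copy every .swz/.swf from the SDK's rsls folder next to the app swf.
cp "$FLEX_SDK"/frameworks/rsls/*.sw? "$OUT_DIR"/
ls "$OUT_DIR"
```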

To verify if RSL is working for you, clear the Flash Player Cache and run your flex application.
You should get a swz or swf file of roughly 525KB in your cache.

On Windows XP, the location of the Flash Player Cache : 
C:\Documents and Settings\{User name}\Application Data\Adobe\Flash Player\AssetCache\

On Linux, it's : 
~/.adobe/Flash_Player/AssetCache/

Monday, November 22, 2010

Installing and running hadoop on windows xp in standalone mode

Prereqs : have cygwin installed
1. Unpack the file hadoop-0.20.2.tar.gz in C:/ (using tar xvf filename)
2. Add the following to conf/ in the unpacked folder : 
export JAVA_HOME=/cygdrive/c/Java/jdk1.6.0_23
(assuming that C:/Java/jdk1.6.0_23/bin contains javac.exe and other binaries)

3. Unpack the file tomwhite-hadoop-book-32dae01.tar.gz
4. In the unpacked folder : cd ch02/src/main/java
5. mkdir -p build/classes
6. javac -verbose -classpath C:\\hadoop-0.20.2\\hadoop-0.20.2-core.jar MaxTemperature*.java -d build/classes
7. export HADOOP_CLASSPATH=build/classes
8. hadoop MaxTemperature ../../../../input/ncdc/sample.txt output

Your output is in the output folder.

Thursday, November 18, 2010

Shell: showing lines which are present only in one file

comm -13 <(sort a.txt) <(sort b.txt)
will show only the lines which are present in b.txt
but not in a.txt.
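A self-contained run of the same idea, with tiny throw-away files (a.txt/b.txt are created here just for the demo):

```shell
#!/bin/sh
# comm compares two sorted files; column 1 = lines only in file1,
# column 2 = lines only in file2, column 3 = lines common to both.
printf 'apple\nbanana\ncherry\n' > a.txt
printf 'banana\ncherry\ndates\n' > b.txt
# -13 suppresses columns 1 and 3, leaving lines present only in b.txt.
comm -13 a.txt b.txt        # prints: dates
```

The inputs here are already sorted; for unsorted files, sort them first, as in comm -13 <(sort a.txt) <(sort b.txt) under bash.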

Monday, November 15, 2010

Vim regexp example

I have a file : 

alert('one');
    alert('two');
    // alert('three');

So, line 1 has "alert" without any white space.
Line 2 has white space followed by "alert".
Line 3 has white space, "//" followed by "alert".

I want to find "alert" which is not commented, i.e. not following "//".

Here is the regexp : 

/\(\/\/\s*\)\@<!alert

which is basically : 
(beginning of line OR absence of "//") followed by "alert"

Shell scripts on Cygwin, xargs in general

Lessons learned today : 
1. Convert the scripts and the files they operate upon to Unix line endings, using d2u.

2. For debugging, add #!/bin/bash -x at the top of the script, so that you can see
which variable gets expanded to what.

3. For debugging xargs : 
(i) find out whether the command being used with xargs expects a single operand or multiple operands, for e.g.
find . -name expects a single operand, whereas grep -i can operate on multiple operands.

(ii) if you need the command to run once per input item (instead of packing all inputs into one command line), do : 
xargs --max-args=1

(iii) use the -t option with xargs to see what's going on.
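A quick demonstration of the difference (this assumes GNU xargs for the long option; on BSD xargs the equivalent is -n 1):

```shell
#!/bin/sh
# Plain xargs packs all input lines into a single command line:
joined=$(printf 'one\ntwo\nthree\n' | xargs echo)
echo "$joined"      # one two three
# --max-args=1 invokes the command once per argument:
lines=$(printf 'one\ntwo\nthree\n' | xargs --max-args=1 echo | wc -l | tr -d ' ')
echo "$lines"       # 3
# Adding -t makes xargs print each constructed command on stderr before running it.
```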

Friday, November 12, 2010


cat filename | aspell list
produces the misspelled words

cat filename | aspell munch-list simple
produces the roots of the words

Built-in shell variables

# Number of arguments given to current process.

@ Command-line arguments to current process. Inside double quotes, expands to individual arguments.

* Command-line arguments to current process. Inside double quotes, expands to a single argument.

- (hyphen) Options given to shell on invocation.

? Exit status of previous command.

$ Process ID of shell process.

0 (zero) The name of the shell program.

! Process ID of last background command. Use this to save process ID numbers for later use with the wait command.

ENV Used only by interactive shells upon invocation; the value of $ENV is parameter-expanded. The result
should be a full pathname for a file to be read and executed at startup. This is an XSI requirement.

HOME Home (login) directory.

IFS Internal field separator; i.e., the list of characters that act as word separators. Normally set to space, tab,
and newline.

LANG Default name of current locale; overridden by the other LC_* variables.

LC_ALL Name of current locale; overrides LANG and the other LC_* variables.

LC_COLLATE Name of current locale for character collation (sorting) purposes.
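A few of these in action; `set --` is used to fake positional arguments so the sketch is self-contained:

```shell
#!/bin/sh
set -- alpha beta gamma   # pretend the script was called with three arguments
echo "$#"                 # 3 : number of arguments
echo "$@"                 # alpha beta gamma
false                     # a command that fails
status=$?
echo "$status"            # 1 : exit status of the previous command
echo "$$"                 # process ID of this shell (varies per run)
```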

Shell : getopts

use getopts for routine command line option processing
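A minimal getopts sketch; the option letters (-v for verbose, -o for an output file) and the defaults are made up for illustration:

```shell
#!/bin/sh
verbose=0
outfile=out.txt
parse_args() {
    # "vo:" means: -v takes no argument, -o requires one (the trailing colon).
    while getopts "vo:" opt; do
        case "$opt" in
            v) verbose=1 ;;
            o) outfile=$OPTARG ;;
            *) echo "usage: $0 [-v] [-o file]" >&2; return 1 ;;
        esac
    done
}
parse_args -v -o report.txt
echo "$verbose $outfile"    # 1 report.txt
```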

Shell: exit status of last command

echo $?

Shell: setting and unsetting functions

who_is_on () {                            # define a function
    who | awk '{ print $1 }' | sort -u    # generate sorted list of users
}
. . .
unset -f who_is_on                        # remove the function

Thursday, November 11, 2010

The Pragmatic Programmer : tips

1. Test state coverage, not code coverage
2. Test early, test often.
3. Refactor early, refactor often.
4. Costly tools don't produce better designs.

The Pragmatic Programmer : tips

1. Invest regularly in your knowledge portfolio
2. Keep knowledge in plain text
3. You can't write perfect software
4. Don't be a slave to formal methods
5. Sign your work 


Wednesday, November 10, 2010

Mysql : percentile of queries completed in a specific duration, on weekly basis

What I have :
A MySql table with 2 columns : load_start_time,load_end_time, both the
columns contain unix timestamps.

What I need :
Weekwise data in the following format :
Week Time_taken_for_load_to_complete percentile_of_loads_completed_in_that_time

Here are the requisite queries :

drop table if exists tbl;

create table tbl as (
    select
        week(date(from_unixtime(load_start_time))) as w,
        round((load_end_time - load_start_time)/10) as 10SecInterval,
        count(1) as numReqs
    from user_load_data
    where load_start_time is not null
      and load_end_time is not null
    group by w, 10SecInterval
    having (10SecInterval < 240 and 10SecInterval >= 0)
);

-- compute percentile
drop table if exists tbl1;

create table tbl1 as (
    select
        a.w,
        a.10SecInterval*10 as seconds,
        round(100 * (
            (select sum(numReqs) from tbl as b
             where b.w = a.w and b.10SecInterval <= a.10SecInterval)
            /
            (select sum(numReqs) from tbl as c where c.w = a.w)
        )) as percentile
    from tbl as a
);

select * from tbl1 limit 200;
