Wednesday, September 21, 2016

java rmq using sparse table algorithm

// build the sparse table once, then answer queries in O(1)
int[][] m = process(a);
int ans = query(i, j, m, a);

// returns the minimum of a[i..j] (inclusive)
public static int query(int i, int j, int[][] m, int[] a) {
    int k = binlog(j - i + 1);
    // two (possibly overlapping) blocks of length 2^k cover [i, j]
    return Math.min(a[m[i][k]], a[m[j + 1 - (1 << k)][k]]);
}

// floor(log2(bits)); returns 0 for bits = 0
public static int binlog(int bits) {
    int log = 0;
    if ((bits & 0xffff0000) != 0) { bits >>>= 16; log = 16; }
    if (bits >= 256) { bits >>>= 8; log += 8; }
    if (bits >= 16)  { bits >>>= 4; log += 4; }
    if (bits >= 4)   { bits >>>= 2; log += 2; }
    return log + (bits >>> 1);
}

// m[i][j] holds the index of the minimum of a[i .. i + 2^j - 1]
private static int[][] process(int[] a) {
    int n = a.length;
    int[][] m = new int[n][binlog(n) + 1];
    // initialize M for the intervals with length 1
    for (int i = 0; i < n; i++)
        m[i][0] = i;
    // compute values from smaller to bigger intervals
    for (int j = 1; 1 << j <= n; j++)
        for (int i = 0; i + (1 << j) - 1 < n; i++)
            if (a[m[i][j - 1]] < a[m[i + (1 << (j - 1))][j - 1]])
                m[i][j] = m[i][j - 1];
            else
                m[i][j] = m[i + (1 << (j - 1))][j - 1];
    return m;
}
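
A quick usage check, assuming the three methods above sit together in one class; the sample array and expected outputs are illustrative only:

public static void main(String[] args) {
    int[] a = {4, 6, 1, 5, 7, 3};
    int[][] m = process(a);
    System.out.println(query(1, 4, m, a)); // min of a[1..4] = 1
    System.out.println(query(3, 5, m, a)); // min of a[3..5] = 3
}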

java rmq using sqrt approach - range minimum query using square root decomposition

int[] a; // input array
// bucket (block) size ~ sqrt(n)
int bucketSize = (int) Math.ceil(Math.sqrt(a.length));

// m[k] holds the index of the minimum element inside bucket k
int[] m = process(a, bucketSize);

int ans = query(i, j, m, a, bucketSize);

// returns the minimum of a[i..j] (inclusive)
private static int query(int i, int j, int[] m, int[] a, int bucketSize) {
    int curr_min = a[i];
    // short ranges are scanned directly
    if (j - i + 1 < 2 * bucketSize) {
        for (int k = i; k <= j; k++)
            curr_min = Math.min(curr_min, a[k]);
        return curr_min;
    }
    // first and last buckets fully contained in [i, j]
    int seg_start = (i + bucketSize - 1) / bucketSize;
    int seg_end = (j + 1) / bucketSize - 1;
    // minimum over the fully covered buckets
    curr_min = a[m[seg_start]];
    for (int k = seg_start; k <= seg_end; k++)
        curr_min = Math.min(curr_min, a[m[k]]);
    // partial bucket on the left
    for (int k = i; k < seg_start * bucketSize; k++)
        curr_min = Math.min(curr_min, a[k]);
    // partial bucket on the right
    for (int k = j; k >= (seg_end + 1) * bucketSize; k--)
        curr_min = Math.min(curr_min, a[k]);
    return curr_min;
}

// computes, for each bucket, the index of its minimum element
private static int[] process(int[] a, int bucketSize) {
    int[] m = new int[(int) Math.ceil(a.length * 1.0 / bucketSize)];
    int count = 0;
    for (int i = 0; count < a.length; i++) {
        int min = a[count];
        m[i] = count;
        for (int j = 0; j < bucketSize && count < a.length; j++) {
            if (a[count] < min) {
                min = a[count];
                m[i] = count;
            }
            ++count;
        }
    }
    return m;
}
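
The same kind of check for the square-root version, again assuming the methods above sit in one class (sample values are illustrative):

public static void main(String[] args) {
    int[] a = {4, 6, 1, 5, 7, 3};
    int bucketSize = (int) Math.ceil(Math.sqrt(a.length)); // 3
    int[] m = process(a, bucketSize);
    System.out.println(query(1, 4, m, a, bucketSize)); // min of a[1..4] = 1
    System.out.println(query(0, 5, m, a, bucketSize)); // min of a[0..5] = 1
}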

Thursday, September 15, 2016

aws cli s3

aws configure   (set access key, secret key and default region)

copy an s3 bucket to the current local directory
aws s3 sync s3://bucket-name .

copy a local folder to an s3 bucket
aws s3 sync folder/ s3://bucket-name/folder

Thursday, August 11, 2016

Kafka

Producer:
  1. Three ways to send: fire & forget; sync (block on the returned Future with future.get()); async with a callback (see the sketch after this list).
  2. Two kinds of errors: retriable (connection error, no leader) and non-retriable (message too large).
  3. producer.send(record) returns a Future<RecordMetadata>; the RecordMetadata (from future.get()) carries the offset of the message.
  4. Producer config needs the key serializer, value serializer and bootstrap.servers (list at least 2 brokers so that if one is down the other can take over; the rest will be discovered from them). A ProducerRecord needs the topic name plus an optional key and the value.
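
A minimal sketch of the three send styles using the Java client, assuming a broker at localhost:9092 and a topic named test-topic (both placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder; list >= 2 brokers in practice
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("test-topic", "key", "value");

            // 1. fire & forget: ignore the returned Future
            producer.send(record);

            // 2. sync: block until the broker acknowledges; metadata carries partition/offset
            RecordMetadata meta = producer.send(record).get();
            System.out.println(meta.partition() + "/" + meta.offset());

            // 3. async: callback runs when the send completes or fails
            producer.send(record, (metadata, exception) -> {
                if (exception != null) exception.printStackTrace();
            });
        }
    }
}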


Consumers:

  1. One consumer group per application. If #consumers > #partitions, the extra consumers remain idle.
  2. One consumer per thread is the rule.
  3. How does a consumer commit offsets:
    1. auto commit every 5 seconds (the default interval) during poll.
    2. or commit manually - consumer.commitSync()
    3. or commitAsync() with a callback
    4. commitAsync() for the regular commits, commitSync() in the finally block
    5. commitSync() and commitAsync() can also be called with an explicit topic, partition and offset.
  4. ConsumerRebalanceListener
    1. onPartitionsRevoked
    2. onPartitionsAssigned
  5. seekToBeginning(), seekToEnd(), or seek() to a specific offset.
  6. Consumer clean exit: call consumer.wakeup() from a shutdown hook; poll() then throws WakeupException - catch it and call consumer.close() (see the poll-loop sketch after this list).
  7. A standalone consumer (assign() instead of subscribe()) is also possible, as opposed to a consumer group.
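
A minimal poll-loop sketch tying the commit and clean-exit points above together; broker address, group id and topic name are placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

public class ConsumerLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "my-app");                  // one group per application
        props.put("enable.auto.commit", "false");         // commit manually
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // shutdown hook: wakeup() makes the blocked poll() throw WakeupException
        // (a real app would also join the main thread here)
        Runtime.getRuntime().addShutdownHook(new Thread(consumer::wakeup));

        try {
            consumer.subscribe(Collections.singletonList("test-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> rec : records)
                    System.out.println(rec.offset() + ": " + rec.value());
                consumer.commitAsync();      // regular commits
            }
        } catch (WakeupException e) {
            // expected on shutdown, ignore
        } finally {
            try {
                consumer.commitSync();       // last commit is synchronous
            } finally {
                consumer.close();
            }
        }
    }
}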



Serialization:

  1. Custom serializers (see the sketch after this list)
  2. Avro: the schema can evolve without breaking existing messages (typically with a schema registry)
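
A minimal custom serializer sketch implementing Kafka's Serializer interface; the Customer type and its wire layout (id + name length + name bytes) are made up for illustration - and this brittleness is exactly why Avro is usually preferred:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.kafka.common.serialization.Serializer;

// hypothetical payload type, made up for this example
class Customer {
    private final int id;
    private final String name;
    Customer(int id, String name) { this.id = id; this.name = name; }
    int getId() { return id; }
    String getName() { return name; }
}

public class CustomerSerializer implements Serializer<Customer> {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public byte[] serialize(String topic, Customer data) {
        if (data == null) return null;
        byte[] name = data.getName() == null
                ? new byte[0]
                : data.getName().getBytes(StandardCharsets.UTF_8);
        // 4 bytes id + 4 bytes name length + name bytes
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + name.length);
        buf.putInt(data.getId());
        buf.putInt(name.length);
        buf.put(name);
        return buf.array();
    }

    @Override
    public void close() { }
}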

Partitioning
  1. Default: hash of the key - messages with the same key always land in the same partition (null keys are spread across partitions).
  2. Custom partitioner (see the sketch after this list)
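
A minimal custom partitioner sketch; pinning one oversized key ("bigcustomer" here, purely illustrative) to the last partition while hashing everything else is an assumed use case, not something these notes prescribe:

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class BigCustomerPartitioner implements Partitioner {
    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        // assumes every record has a non-null key and the topic has > 1 partition
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if ("bigcustomer".equals(key))
            return numPartitions - 1;                          // reserve the last partition
        // hash all other keys over the remaining partitions
        return (Utils.murmur2(keyBytes) & 0x7fffffff) % (numPartitions - 1);
    }

    @Override
    public void close() { }
}

It is plugged in through the producer config partitioner.class.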

Friday, July 22, 2016

kafka

Basic unit: Message (like a row in a table). A message can have a key (metadata) associated with it.
Message schema: Apache Avro is the most popular; JSON/XML are also fine.
Topic (like a DB table) / Partition: one topic can have multiple partitions.
Consumer (group): a consumer can select specific partition(s) to listen on.

1. There is no guarantee of time-ordering of messages across the entire topic, just within a single partition. 

2. Each partition can be hosted on a different server, which means that a single topic can be scaled horizontally across multiple servers for performance.

3. The term stream is often used when discussing data within systems like Kafka. Most often, a stream is considered to be a single topic of data, regardless of the number of partitions: a single stream of data moving from the producers to the consumers. This way of referring to messages is most common when discussing stream processing, where frameworks such as Kafka Streams, Apache Samza, and Storm operate on the messages in real time.

Broker (a single server) - receives messages from producers and serves fetch requests from consumers.
Broker cluster - one of the brokers acts as the controller.

Retention of data by time/size.

Topics may also be configured as log compacted, which means that Kafka will retain only the last message produced with a specific key. This can be useful for changelog-type data, where only the last update is interesting.
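
As a sketch, creating such a log-compacted topic with the Java admin client; topic name, partition/replication counts and broker address are placeholders, and the same setting can be passed to the kafka-topics tool as --config cleanup.policy=compact:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // 1 partition, replication factor 1: placeholders for a local test
            NewTopic topic = new NewTopic("customer-changelog", 1, (short) 1)
                    .configs(Collections.singletonMap(
                            TopicConfig.CLEANUP_POLICY_CONFIG,
                            TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}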

MirrorMaker - for data replication across clusters.



Thursday, June 23, 2016

Docker

https://jpetazzo.github.io/2014/06/23/docker-ssh-considered-evil/



docker logs - print a container's logs

docker logs -f - follow the log output as it grows

docker run -d -p - run detached, publishing container port(s) to the host

docker run -it - run with an interactive terminal attached

docker attach - attach to a running container's stdin/stdout

docker port webserver 80 - show the host port mapped to container port 80

docker diff webserver - list files changed in the container's filesystem

docker cp - copy files between a container and the host

docker inspect webserver - show low-level container details as JSON

docker rm -f $(docker ps -aq) - force-remove all containers, running or stopped
-------------------------------

docker search postgres - search Docker Hub for postgres images
docker pull postgres:latest - download the latest postgres image
