Friday, July 22, 2016

kafka

Basic unit: Message(Like a row in a table). Message can have a key(metadata) associated.
Message schema: Apache Avro is most popular. JSON/XML also fine.
Topic(like a DB table)/Partition: One topic can have multiple partitions.
Consumer(Group): A consumer can select specific partition(s) to listen on.

1. There is no guarantee of time-ordering of messages across the entire topic, just within a single partition. 

2. Each partition can be hosted on a different server, which means that a single topic can be scaled horizontally across multiple servers to provide for performance.

3. The term stream is often used when discussing data within systems like Kafka. Most often, a stream is considered to be a single topic of data, regardless of the number of partitions. This represents a single stream of data moving from the producers to the consumers. This way of referring to messages is most common when discussing stream processing, which is when frameworks, some of which are Kafka Streams, Apache Samza, and Storm, operate on the messages in real time.

Broker(a single server) - receives messages from producers and services consumers.
Broker cluster - one of them is controller.

Retention of data by time/size.

Topics may also be configured as log compacted, which means that Kafka will retain only the last message produced with a specific key. This can be useful for changelog-type data, where only the last update is interesting.

MIrror Maker - for data replication across clusters.



Blog Archive