Nub Of Kafka

Kafka is an open-source publish-subscribe (pub-sub) messaging system, originally developed at LinkedIn, that receives data from disparate source systems and makes it available to target systems in real time.

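A Kafka cluster consists of multiple brokers that host multiple topics. Each topic is split into multiple partitions, which are spread across the brokers. Each partition is an ordered, append-only sequence of messages. A partition can hold messages with many different keys, but (with the default partitioner) all messages with the same key go to the same partition, so their relative order is preserved.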
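To make the partition model concrete, here is a minimal in-memory sketch. It is a toy model, not Kafka's on-disk log format: each partition is an append-only list, and a record's offset is simply its position in that list. The class `Topic` and its methods are hypothetical names used only for illustration. Note that one partition holds records with different keys, and ordering is guaranteed only within a partition.

```python
class Topic:
    """Hypothetical in-memory model of a topic: each partition is an
    append-only list, and a record's offset is its index in that list."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, key, value) -> int:
        log = self.partitions[partition]
        log.append((key, value))
        return len(log) - 1  # offset of the new record within this partition

    def read(self, partition: int, from_offset: int = 0):
        # Records are read back in offset order, i.e. the order they were appended.
        return self.partitions[partition][from_offset:]


orders = Topic("orders", num_partitions=3)
orders.append(0, "user-1", "created")
orders.append(0, "user-2", "created")  # different key, same partition
orders.append(0, "user-1", "paid")
print(orders.read(0))
# [('user-1', 'created'), ('user-2', 'created'), ('user-1', 'paid')]
```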

The main entities in a Kafka messaging system are:

  • Producers
  • Consumers
  • Kafka cluster comprising multiple brokers
  • ZooKeeper cluster

Producers

Producers are responsible for writing messages to Kafka. A producer knows which topic and partition to write to: the topic name is always given explicitly, while the partition number can either be specified directly or computed from the key supplied with the message.
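The key-to-partition step can be sketched as "hash the key bytes, then take the result modulo the number of partitions". Kafka's Java client uses a murmur2 hash for this; the sketch below swaps in `zlib.crc32` purely as a stand-in stable hash, so its partition numbers will not match a real cluster's. The function names `partition_for` and `choose_partition` are hypothetical.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition number.

    Assumption: zlib.crc32 stands in for the murmur2 hash used by Kafka's
    Java client, so the resulting numbers differ from a real cluster's."""
    # Mask to a non-negative 31-bit value, then take the modulo.
    return (zlib.crc32(key) & 0x7FFFFFFF) % num_partitions


def choose_partition(key: bytes, num_partitions: int, explicit_partition=None) -> int:
    # An explicitly supplied partition number wins; otherwise derive it from the key.
    if explicit_partition is not None:
        return explicit_partition
    return partition_for(key, num_partitions)


# The same key always maps to the same partition while the partition count is unchanged.
print(choose_partition(b"user-42", 6) == choose_partition(b"user-42", 6))  # True
print(choose_partition(b"user-42", 6, explicit_partition=2))  # 2
```

Because the mapping depends on the partition count, adding partitions to an existing topic can move a key to a different partition, which is worth keeping in mind when per-key ordering matters.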

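If the message key is null, the producer's partitioner (not the cluster) picks the partition. Older clients spread such messages across partitions in round-robin fashion; since Kafka 2.4 the default partitioner uses a "sticky" strategy that fills a batch for one partition before moving on to another.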
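The two null-key strategies can be contrasted with a small sketch. This is a simplification under stated assumptions: the real sticky partitioner switches partitions when a batch (measured in bytes) is completed, whereas here a batch is assumed to be "full" after a fixed number of records. Both class names are hypothetical.

```python
import itertools
import random


class RoundRobinPartitioner:
    """Older behaviour for null keys: cycle through partitions record by record."""

    def __init__(self, num_partitions: int):
        self._cycle = itertools.cycle(range(num_partitions))

    def partition(self) -> int:
        return next(self._cycle)


class StickyPartitioner:
    """Simplified sticky behaviour: keep sending to one partition until the
    current batch is full (assumed here to be a fixed record count), then
    switch to a different partition for the next batch."""

    def __init__(self, num_partitions: int, records_per_batch: int, seed: int = 0):
        self.num_partitions = num_partitions
        self.records_per_batch = records_per_batch
        self._rng = random.Random(seed)
        self._current = self._rng.randrange(num_partitions)
        self._count = 0

    def partition(self) -> int:
        if self._count == self.records_per_batch:
            # Batch "closed": pick a different partition for the next batch.
            choices = [p for p in range(self.num_partitions) if p != self._current]
            if choices:
                self._current = self._rng.choice(choices)
            self._count = 0
        self._count += 1
        return self._current


rr = RoundRobinPartitioner(3)
print([rr.partition() for _ in range(6)])  # [0, 1, 2, 0, 1, 2]

sticky = StickyPartitioner(3, records_per_batch=3)
print([sticky.partition() for _ in range(6)])  # three records on one partition, then three on another
```

Round-robin produces many small batches (one record per partition in turn), while sticky partitioning produces fewer, fuller batches, which is why newer clients prefer it.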

The message is always passed to the leader of the partition.
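Routing to the leader can be pictured as a lookup in the cluster metadata the producer keeps: for each (topic, partition) it knows which broker currently leads. The metadata layout, broker ids and the `broker_for` helper below are hypothetical illustrations, not the client's real data structures.

```python
from dataclasses import dataclass


@dataclass
class PartitionMetadata:
    leader: int     # broker id currently leading this partition
    replicas: list  # broker ids holding a copy (leader included)


# Hypothetical metadata the producer has fetched: (topic, partition) -> metadata.
metadata = {
    ("orders", 0): PartitionMetadata(leader=1, replicas=[1, 2, 3]),
    ("orders", 1): PartitionMetadata(leader=2, replicas=[2, 3, 1]),
}


def broker_for(topic: str, partition: int) -> int:
    """Return the broker the producer sends to: always the partition leader,
    never a follower replica."""
    return metadata[(topic, partition)].leader


print(broker_for("orders", 1))  # 2
```

If leadership moves to another broker, the producer's cached metadata becomes stale; the real client refreshes its metadata and retries against the new leader.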

Considerations while producing:

Durability: this is controlled by the acks configuration.

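  • acks=0: the producer does not wait for any acknowledgement, so a message can be lost without the producer noticing.
  • acks=1: the producer waits for an acknowledgement from the leader, which confirms that the message was written to the leader's log.
  • acks=all (equivalently acks=-1): the producer waits for an acknowledgement from the leader confirming that the message was written by all ISRs (in-sync replicas).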
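The three acks levels boil down to "which replicas must have the record before the producer treats the send as successful". The function below is a simplified model of that rule, assuming the ISR set includes the leader; `is_acknowledged` and its parameters are hypothetical. It ignores details such as the broker-side min.insync.replicas check that real brokers apply for acks=all.

```python
def is_acknowledged(acks: str, written_by: set, leader: int, isr: set) -> bool:
    """Simplified model of when a produce request counts as acknowledged.

    written_by: broker ids that have appended the record so far.
    isr:        broker ids currently in sync (the leader is one of them).
    """
    if acks == "0":
        # The producer does not wait at all, even if nothing was written.
        return True
    if acks == "1":
        # Only the leader's own log matters.
        return leader in written_by
    if acks == "all":
        # The leader and every in-sync replica must have the record.
        return leader in written_by and isr <= written_by
    raise ValueError(f"unsupported acks value: {acks!r}")


isr = {1, 2, 3}  # broker 1 is the leader
print(is_acknowledged("0", set(), leader=1, isr=isr))       # True
print(is_acknowledged("1", {1}, leader=1, isr=isr))         # True
print(is_acknowledged("all", {1}, leader=1, isr=isr))       # False: followers 2 and 3 pending
print(is_acknowledged("all", {1, 2, 3}, leader=1, isr=isr)) # True
```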

Lower acks settings give higher throughput and lower latency, at the cost of weaker durability guarantees.

Buffering, batching and compression
The Kafka producer's send() method is asynchronous: calling it adds the record to an output buffer and returns immediately. The buffer is used to batch records for efficient I/O and compression. The producer keeps a separate buffer of unsent records for each topic partition.
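The asynchronous send and per-partition buffering can be sketched with a `Future` per record, mirroring the fact that the Java client's send() returns a `Future<RecordMetadata>`. Everything else here is a hypothetical toy: `ToyAccumulator`, its `flush` method, and the `transmit` callback that stands in for the network call.

```python
from collections import defaultdict
from concurrent.futures import Future


class ToyAccumulator:
    """Hypothetical model of the producer's output buffer: unsent records are
    grouped per (topic, partition), and send() returns immediately with a Future."""

    def __init__(self):
        self.buffers = defaultdict(list)  # (topic, partition) -> [(record, future), ...]

    def send(self, topic: str, partition: int, record) -> Future:
        fut = Future()
        self.buffers[(topic, partition)].append((record, fut))
        return fut  # returns right away; nothing has been transmitted yet

    def flush(self, transmit) -> None:
        """Send each partition's buffered records as one batch.
        transmit(topic, partition, records) stands in for the network call and
        returns the base offset assigned to the batch."""
        for (topic, partition), entries in list(self.buffers.items()):
            base_offset = transmit(topic, partition, [r for r, _ in entries])
            for i, (_, fut) in enumerate(entries):
                fut.set_result(base_offset + i)  # complete each future with its offset
            del self.buffers[(topic, partition)]


acc = ToyAccumulator()
f1 = acc.send("orders", 0, b"a")
f2 = acc.send("orders", 0, b"b")
f3 = acc.send("orders", 1, b"c")
print(f1.done())  # False: the record is still sitting in the buffer

batches = []


def fake_transmit(topic, partition, records):
    batches.append((topic, partition, records))
    return 100  # pretend the broker assigned base offset 100


acc.flush(fake_transmit)
print(len(batches), f1.result(), f2.result(), f3.result())  # 2 100 101 100
```

Two records for partition 0 travel in one batch, which is where the I/O and compression savings come from.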

To reduce the number of requests and increase throughput, set linger.ms > 0. This makes the producer wait up to linger.ms before sending a batch, or until the batch fills up (batch.size), whichever comes first. Under heavy load, batches usually fill up before linger.ms elapses, so the setting has little effect. Under lighter load, lingering lets the producer accumulate larger batches, which improves broker I/O efficiency and compression.

The buffer.memory setting controls the total memory available to a producer for buffering. If records are sent faster than they can be transmitted to Kafka, this buffer fills up; additional send() calls then block for up to max.block.ms, after which the producer throws a TimeoutException.
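Both mechanisms can be sketched in a few lines: a batch is ready when it is full or has lingered long enough, and reserving buffer space blocks until memory is released or the max.block.ms deadline passes. This is a simplified model, not the client's implementation; `batch_ready` and `BufferPool` are hypothetical names, the numbers are chosen for illustration, and Python's built-in `TimeoutError` stands in for Kafka's `TimeoutException`.

```python
import threading
import time


def batch_ready(batch_bytes: int, age_ms: float, batch_size: int, linger_ms: float) -> bool:
    """A batch is sent when it is full (batch.size) or has waited linger.ms,
    whichever comes first (simplified)."""
    return batch_bytes >= batch_size or age_ms >= linger_ms


class BufferPool:
    """Toy version of buffer.memory / max.block.ms: reserving space blocks while
    the pool is exhausted and gives up with TimeoutError after max_block_ms."""

    def __init__(self, buffer_memory: int, max_block_ms: int):
        self.available = buffer_memory
        self.max_block_s = max_block_ms / 1000
        self._cond = threading.Condition()

    def allocate(self, size: int) -> None:
        deadline = time.monotonic() + self.max_block_s
        with self._cond:
            while self.available < size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    # The Java client throws a TimeoutException at this point.
                    raise TimeoutError(f"could not allocate {size} bytes within max.block.ms")
                self._cond.wait(remaining)  # sleep until memory is released or time runs out
            self.available -= size

    def release(self, size: int) -> None:
        # Called once a batch has been sent and its memory can be reused.
        with self._cond:
            self.available += size
            self._cond.notify_all()


print(batch_ready(16384, 2, batch_size=16384, linger_ms=5))  # True: batch is full
print(batch_ready(1200, 5, batch_size=16384, linger_ms=5))   # True: linger.ms elapsed
print(batch_ready(1200, 1, batch_size=16384, linger_ms=5))   # False: keep waiting

pool = BufferPool(buffer_memory=1024, max_block_ms=50)
pool.allocate(1000)
try:
    pool.allocate(100)  # only 24 bytes free and nothing is released, so this times out
except TimeoutError as exc:
    print("blocked, then:", exc)
```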
