A Brief Introduction to Kafka

Posted by: Kundan Ray Akela | 30th December 2018

In this blog post, I present a brief introduction to Apache Kafka, which has become very popular.

Apache Kafka is a distributed streaming platform with the following capabilities:

  • Message queueing through the publish-subscribe paradigm
  • Storing streams of records in a fault-tolerant way
  • Processing streams of records


Kafka is mainly used in areas such as:

  • Reliably capturing data from applications or systems in real time, and
  • Building applications that consume and react to real-time data


Let’s look at how Kafka manages to do this:

  • It runs as a cluster on one or more servers and can span multiple datacenters.
  • A Kafka cluster stores streams of records in categories called topics.
  • Each record consists of a key, a value, and a timestamp.

Image source: https://kafka.apache.org/intro
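
As a rough mental model of the points above (this is not the Kafka client API), a record and a set of topics could be pictured as follows. The names `Record`, `topics` and `publish` are hypothetical and exist only for this sketch; a real cluster also splits each topic into partitions and replicates them across servers.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Toy stand-in for a Kafka record: key, value, and timestamp."""
    key: Optional[str]
    value: str
    timestamp_ms: int


# Toy "cluster": topic name -> list of records published to that topic.
topics = defaultdict(list)


def publish(topic: str, key: Optional[str], value: str) -> Record:
    """Create a record stamped with the current time and file it under a topic."""
    record = Record(key=key, value=value, timestamp_ms=int(time.time() * 1000))
    topics[topic].append(record)
    return record


publish("orders", "user-1", "created")
publish("orders", "user-2", "created")
publish("payments", "user-1", "paid")

# Number of records stored per topic.
print({name: len(records) for name, records in topics.items()})
```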


Let’s look at the main abstraction that stores streams of records: the topic.

A topic can be understood as a feed name to which records are published. Kafka topics are multi-subscriber, which means a topic can have zero, one, or many consumers subscribed to the data written to it.

Each topic is maintained as a partitioned log, which looks like this:

Image source: https://kafka.apache.org/intro


The diagram shows a topic with three partitions (0, 1 and 2). Each partition is an ordered, immutable sequence of records that is continually appended to, forming a structured commit log. Each record in a partition is assigned a sequential id number called the offset, which uniquely identifies the record within that partition.
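
The idea of an append-only partition with offsets can be sketched in a few lines. This is only an in-memory illustration under simplifying assumptions (no persistence, no replication); `PartitionLog` and its methods are made-up names, not part of Kafka.

```python
class PartitionLog:
    """Toy append-only log: records are only ever added at the end."""

    def __init__(self):
        self._records = []

    def append(self, value: str) -> int:
        """Append a record and return its offset (its position in the log)."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int) -> str:
        """Return the record stored at the given offset."""
        return self._records[offset]


# A topic with three partitions, numbered 0, 1 and 2 as in the diagram.
topic = {partition: PartitionLog() for partition in range(3)}

first = topic[0].append("a")
second = topic[0].append("b")
other = topic[1].append("c")

# Offsets are counted per partition, so partition 1 starts again at 0.
print(first, second, other)  # prints: 0 1 0
```

Because an offset is only meaningful within its own partition, a record is identified by the pair (partition, offset) rather than by the offset alone.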

Next, let’s look at producers:

In simple words, a producer publishes data to a particular topic. The producer chooses which partition of the topic each record is written to. This can be done in a round-robin manner simply to balance the load, or according to a partitioning function, for example one based on the record's key.
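
To make the partition choice concrete, here is a minimal sketch of a partitioner that rotates through partitions for records without a key and hashes the key otherwise, so that records with the same key always land in the same partition. The function `choose_partition`, the constant `NUM_PARTITIONS` and the use of CRC32 are assumptions made for this example; they are not Kafka's actual implementation.

```python
import itertools
import zlib
from typing import Optional

NUM_PARTITIONS = 3  # assumed partition count for the example topic
_round_robin = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key: Optional[str]) -> int:
    """Pick a partition: hash the key if there is one, otherwise rotate."""
    if key is None:
        return next(_round_robin)
    # CRC32 is used here only because it gives the same result on every run;
    # Kafka's Java client uses a different hash function (murmur2).
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


keyless = [choose_partition(None) for _ in range(6)]
print(keyless)  # prints: [0, 1, 2, 0, 1, 2]

same_key = {choose_partition("user-42") for _ in range(5)}
print(len(same_key))  # prints: 1 (one key always maps to one partition)
```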

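Now let’s look at consumers:

In simple words, consumers read data from a particular topic. Every consumer belongs to a consumer group, and each record published to a topic is delivered to one consumer within each subscribing group. The consumers in a group can run in separate processes or on separate servers.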

Image source: https://kafka.apache.org/intro

The diagram shows two servers and two consumer groups, A and B. There are four partitions in total (P0, P1, P2 and P3), and both consumer groups subscribe to every partition.
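
How partitions might be divided inside each group can be sketched as below. Within one group every partition is owned by exactly one consumer, while each group independently covers all four partitions. The function `assign_partitions`, the consumer names C1 to C6 and the group sizes are illustrative assumptions, and the simple rotation used here is not necessarily the assignment strategy a real Kafka cluster would apply.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group by simple rotation."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

# Hypothetical group sizes: two consumers in group A, four in group B.
group_a = assign_partitions(partitions, ["C1", "C2"])
group_b = assign_partitions(partitions, ["C3", "C4", "C5", "C6"])

print(group_a)  # prints: {'C1': ['P0', 'P2'], 'C2': ['P1', 'P3']}
print(group_b)  # prints: {'C3': ['P0'], 'C4': ['P1'], 'C5': ['P2'], 'C6': ['P3']}
```

Adding consumers to a group spreads the partitions more thinly, but a group with more consumers than partitions would leave the extra consumers idle in this model.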

I hope this helps in understanding the basic concepts of Kafka. I will publish more about it in subsequent posts.



Kundan Ray

About Author

Kundan Ray Akela

Kundan has good programming and problem-solving skills. He is very good at explaining ideas clearly and at producing sound system designs and plans. His hobbies are playing cricket and travelling.
