In this post, I am going to give an introduction to Apache Kafka, which is very popular nowadays.
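Apache Kafka is a distributed streaming platform that can publish and subscribe to streams of records, store those streams in a fault-tolerant, durable way, and process them as they occur.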
Kafka is mainly popular and used in areas such as:
Let's understand how Kafka does this job so well that it keeps gaining popularity:
Image source: https://kafka.apache.org/intro
Let's start with the core abstraction that stores streams of records: the topic.
We can think of a topic as a feed name to which records are published. Kafka topics are multi-subscriber in nature, which means a topic can have zero, one, or many consumers subscribed to the stream of data written to it.
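To make the multi-subscriber idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Kafka client API: the `Topic` class, its methods, and the subscriber names are all hypothetical and exist only for illustration.

```python
class Topic:
    """Toy in-memory feed; NOT the Kafka API, just an illustration."""

    def __init__(self, name):
        self.name = name
        self.records = []      # everything ever published, in order
        self.positions = {}    # subscriber name -> index of next record to read

    def publish(self, record):
        # Publishing works even when nobody has subscribed yet.
        self.records.append(record)

    def subscribe(self, subscriber):
        # In this toy model, each new subscriber starts from the beginning.
        self.positions[subscriber] = 0

    def poll(self, subscriber):
        # Return the records this subscriber has not seen yet.
        start = self.positions[subscriber]
        self.positions[subscriber] = len(self.records)
        return self.records[start:]


feed = Topic("page-views")
feed.publish("view:/home")          # zero subscribers at this point
feed.subscribe("billing")
feed.subscribe("analytics")
feed.publish("view:/pricing")

billing_batch = feed.poll("billing")
analytics_batch = feed.poll("analytics")
print(billing_batch)
print(analytics_batch)
```

Each subscriber keeps its own read position, so one subscriber reading the feed does not take records away from another.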
For each topic, Kafka maintains a partitioned log, which looks like this:
Image source: https://kafka.apache.org/intro
In the diagram, the topic has three partitions (0, 1 and 2). Each partition is an ordered, immutable sequence of records that is continually appended to a structured commit log, the append-only structure Kafka uses for storage. Each record is assigned a sequential id number called the offset, which uniquely identifies the record within its partition.
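The relationship between a partition and its offsets can be sketched in a few lines of Python. This is a simplified stand-in for the idea, not how Kafka stores data on disk; the `PartitionLog` class and the record values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy stand-in for one partition: an append-only list of records."""
    records: list = field(default_factory=list)

    def append(self, record: str) -> int:
        # New records always go at the end; the offset is the record's position.
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset: int) -> str:
        # Reading never modifies the log, so existing offsets stay valid.
        return self.records[offset]


log = PartitionLog()
first = log.append("order-created")
second = log.append("order-paid")
print(first, second)
print(log.read(second))
```

Because records are only ever appended, an offset handed out once keeps pointing at the same record.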
Next, let's explore producers:
In simple words, a producer publishes data to a particular topic. The producer is responsible for choosing which partition of the topic each record is written to. This can be done in a round-robin fashion simply to balance the load, or according to a partitioning function, for example one based on a key in the record.
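Here is a small sketch of how a producer might pick a partition. It assumes a topic with three partitions and uses `zlib.crc32` as a stand-in hash purely for illustration; the real Kafka clients use their own partitioner and hash function, and `choose_partition` is a hypothetical helper, not a Kafka call.

```python
import itertools
import zlib
from typing import Optional

NUM_PARTITIONS = 3  # assumed partition count for this example
_round_robin = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key: Optional[bytes]) -> int:
    """Pick a partition: round-robin without a key, hash-based with one."""
    if key is None:
        # No key: spread records evenly across partitions.
        return next(_round_robin)
    # Same key -> same hash -> same partition, so related records stay together.
    return zlib.crc32(key) % NUM_PARTITIONS


keyless = [choose_partition(None) for _ in range(6)]
keyed_once = choose_partition(b"user-42")
keyed_again = choose_partition(b"user-42")
print(keyless)
print(keyed_once == keyed_again)
```

Round-robin balances the load, while key-based partitioning keeps all records for one key in a single partition, where their order is preserved.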
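Now let's look at consumers:
In simple words, consumers read data from a particular topic. Every consumer is part of a consumer group, and each record published to a topic is delivered to one consumer instance within each subscribing group. The consumers in a group can run in separate processes or on separate machines.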
Image source: https://kafka.apache.org/intro
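The diagram shows a two-server Kafka cluster hosting four partitions (P0, P1, P2 and P3) and two consumer groups, A and B. Group A has two consumer instances and group B has four. Both groups subscribe to the topic, so every partition is read by exactly one consumer in each group.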
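The way partitions are shared out inside each group can be sketched as follows. This assumes group A has two consumers and group B has four, as in the diagram from the Kafka documentation, and uses a simple round-robin split; Kafka's actual assignment strategies are more involved, and the `assign` helper and consumer names are hypothetical.

```python
def assign(partitions, consumers):
    """Split partitions across one group's consumers, round-robin style."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

# Each group gets its own full view of the topic.
group_a = assign(partitions, ["C1", "C2"])
group_b = assign(partitions, ["C3", "C4", "C5", "C6"])
print(group_a)
print(group_b)
```

In this sketch each consumer in group A ends up with two partitions and each consumer in group B with one, yet both groups together still cover all four partitions.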
I hope this helps in understanding the basic concepts of Kafka. I will publish more about it in subsequent posts.
Thanks,
Kundan Ray