A Brief Introduction to Kafka

Posted by: Kundan Ray Akela | 30th December 2018

In this blog, I present a brief introduction to Apache Kafka, which has become very popular in recent years.

Apache Kafka is a distributed streaming platform with three main capabilities:

  • Message queueing through the publish-subscribe paradigm
  • Storing streams of records in a fault-tolerant way
  • Processing streams of records

 

Kafka is mainly used in two areas:

  • Reliably capturing data from applications or systems in real time, and
  • Building applications that consume and act on real-time data

 

Let's understand how Kafka does the job that has made it so popular:

  • It runs as a cluster on one or more servers and can span multiple datacenters.
  • Every Kafka cluster stores streams of records in categories called topics.
  • Each record consists of a key, a value, and a timestamp.
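
To make the record structure concrete, here is a minimal Python sketch. It is an in-memory model for illustration only, not the Kafka client API; the names `Record` and `topics`, and the sample topic names, are assumptions.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical model of a record: a key, a value, and a timestamp."""
    key: Optional[str]
    value: str
    # Milliseconds since the epoch, filled in at creation time if not given.
    timestamp: int = field(default_factory=lambda: int(time.time() * 1000))


# Records are grouped into named categories (topics).
topics = defaultdict(list)
topics["page-views"].append(Record(key="user-42", value="/home"))
topics["page-views"].append(Record(key="user-7", value="/pricing"))
topics["payments"].append(Record(key="user-42", value="order-1001"))
```

Each topic here is just a list of records; a real cluster splits every topic into partitions and stores them on the servers.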

Image source: https://kafka.apache.org/intro

 

Let's look at the core abstraction that stores streams of records: the topic.

A topic can be thought of as a feed name to which records are published. Topics in Kafka are multi-subscriber in nature, which means a topic can have zero, one, or many consumers that subscribe to the data written to it.

For each topic, Kafka maintains a partitioned log, which looks like this:

Image source: https://kafka.apache.org/intro

 

In the diagram, there are three partitions of a topic (0, 1, and 2). Each partition is an ordered, immutable sequence of records that is continually appended to, forming a structured commit log. Each record in a partition is assigned a sequential id number called the offset, which uniquely identifies the record within that partition.
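
The idea of an append-only log with offsets can be sketched in a few lines of Python. This is a simplified in-memory stand-in, not how Kafka stores data on disk; the class name `Partition` and its methods are assumptions made for illustration.

```python
class Partition:
    """Hypothetical in-memory stand-in for one partition's append-only log."""

    def __init__(self):
        self._log = []

    def append(self, record):
        """Append a record to the end of the log and return its offset."""
        self._log.append(record)
        return len(self._log) - 1

    def read(self, offset):
        """Return the record stored at the given offset."""
        if not 0 <= offset < len(self._log):
            raise IndexError(f"offset {offset} out of range")
        return self._log[offset]


# A topic with three partitions, numbered 0, 1 and 2 as in the diagram.
topic = [Partition() for _ in range(3)]

first = topic[0].append("a")   # offset 0 in partition 0
second = topic[0].append("b")  # offset 1 in partition 0
other = topic[1].append("c")   # offset 0 in partition 1
```

Offsets are counted per partition, so offset 0 exists in both partition 0 and partition 1. A record is identified by its partition together with its offset.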

Next, let's look at producers:

In simple words, a producer publishes data to a particular topic. The producer is responsible for choosing which partition of the topic each record is written to. This can be done in a round-robin manner simply to balance the load, or according to a partition function, for example one based on the record key.
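
The two partitioning strategies can be sketched as follows. This is an illustrative Python model, not Kafka's partitioner: the function names are hypothetical, and CRC32 is used only as a convenient stable hash, with no claim that it matches the hashing Kafka itself applies to keys.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function mapping a record key to a partition number."""
    round_robin = itertools.cycle(range(num_partitions))

    def choose_partition(key=None):
        if key is None:
            # No key: rotate through the partitions to balance the load.
            return next(round_robin)
        # With a key: hash it so the same key always lands on the same partition.
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose_partition


choose_partition = make_partitioner(3)
unkeyed = [choose_partition() for _ in range(5)]        # cycles 0, 1, 2, 0, 1
keyed = [choose_partition("user-42") for _ in range(3)]  # same partition each time
```

Routing by key keeps all records for one key in a single partition, which preserves their relative order, since ordering is defined per partition.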

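Now let's look at consumers:

In simple words, consumers read data from a particular topic. Every consumer belongs to a consumer group, and each record published to a topic is delivered to one consumer within each subscribing group. The consumers in a group can run as separate processes or on separate machines.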

Image source: https://kafka.apache.org/intro

The diagram shows two servers and two consumer groups, A and B. There are four partitions in total (P0, P1, P2, and P3), and both consumer groups subscribe to every partition. Within a group, each partition is read by exactly one consumer.
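
How partitions are shared out inside each group can be sketched like this. It is a simplified round-robin model written for illustration; Kafka's own assignment strategies may distribute partitions differently, and the function name, consumer names, and group sizes below are assumptions.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin style."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

# Assumed group sizes for illustration: two consumers in A, four in B.
group_a = assign_partitions(partitions, ["C1", "C2"])
group_b = assign_partitions(partitions, ["C3", "C4", "C5", "C6"])
```

Each group covers all four partitions independently of the other group, so both groups see every record, while within a group the work is divided among its consumers.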

I hope this helps in understanding the basic concepts of Kafka. I will publish more about it in subsequent blogs.

 

Thanks,

Kundan Ray


About Author

Kundan Ray Akela

Kundan holds years of industry experience as a Fullstack Developer in various technologies and is focused on defining the architecture of the system to ensure reliability and resilience. He possesses good knowledge and understanding of the latest technologies and has hands-on experience in Core Java, Spring Boot, Hibernate, React, Angular, Apache Kafka messaging, AI development such as Computer Vision, Generative AI, and prediction systems, Internet of Things technologies, and relational databases such as MySQL and PostgreSQL. He is proficient in API implementation, web services, development testing and deployment, and code enhancement, and has been contributing to company values through his deliverables in various client projects, namely VirginMedia, Konfer, TIHM, Herdsy, HP1T, and many more. He has a creative mind and good analytical skills, and likes reading and exploring new technologies.
