Apache Kafka for Developers #1: Introduction to Kafka and Comparison with RabbitMQ


Apache Kafka is an open-source, distributed event streaming platform originally developed at LinkedIn and now maintained by the Apache Software Foundation. It is designed to handle real-time data feeds with high throughput and low latency.

It also offers fault tolerance, resilience, and scalability. It supports a range of use cases, including data integration across data sources through connectors (Kafka Connect), log aggregation, real-time stream processing, website activity tracking, event sourcing, and publish-subscribe messaging.

Kafka's architecture is based on a distributed commit log, where data is partitioned and replicated across multiple servers to ensure fault tolerance and scalability. Producers send data to Kafka topics, which are split into partitions, and consumers read data from these partitions.
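
To make the topic and partition idea concrete, here is a minimal in-memory sketch of how a producer might map a record key to a partition. It is not Kafka client code: `choose_partition` and `produce` are hypothetical helpers, and `zlib.crc32` is only a stand-in hash (Kafka's default partitioner uses murmur2 for keyed records).

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Assumption: crc32 is a stand-in for illustration only;
    Kafka's default partitioner hashes keys with murmur2.
    """
    return zlib.crc32(key) % num_partitions


def produce(topic, key: bytes, value: str):
    """Append a record to the partition chosen for its key.

    `topic` is a plain list of lists, one inner list per partition.
    Returns the (partition, offset) where the record landed.
    """
    partition = choose_partition(key, len(topic))
    topic[partition].append((key, value))
    offset = len(topic[partition]) - 1
    return partition, offset


# Hypothetical topic with 3 partitions.
topic = [[] for _ in range(3)]

events = [(b"user-1", "login"), (b"user-2", "login"), (b"user-1", "logout")]
for key, value in events:
    partition, offset = produce(topic, key, value)
    print(f"{key!r} -> partition {partition}, offset {offset}")
```

Because the partition depends only on the key, all records for `user-1` land in the same partition and keep their relative order there.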



Key Characteristics of a Distributed Commit Log

  1. Append-Only: New records are always appended to the end of the log, so the order of events within a partition is preserved.
  2. Immutable Records: Once a record is written to the log, it cannot be modified in place. Records are removed only when the configured retention or compaction policy discards them. This immutability supports consistency and reliability.
  3. Sequential Reads: Records are read in the order they were written, which simplifies replaying events.
  4. Replication: Data is replicated across multiple nodes to provide fault tolerance. If one node fails, the data can still be accessed from another node.
  5. Scalability: By partitioning the log across multiple nodes, the system can handle large volumes of data with high throughput.
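
The characteristics above can be sketched with a small in-memory model. This is only an illustration under simplifying assumptions: `CommitLog` and `ReplicatedLog` are hypothetical classes, and replication here is a synchronous copy, whereas Kafka followers fetch records from the partition leader.

```python
class CommitLog:
    """Append-only log: records get increasing offsets and are never edited."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        """Add a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10):
        """Return up to max_records records, in write order, from offset."""
        return tuple(self._records[offset:offset + max_records])


class ReplicatedLog:
    """One leader log plus follower copies (simplified, synchronous)."""

    def __init__(self, replicas: int = 3):
        self.leader = CommitLog()
        self.followers = [CommitLog() for _ in range(replicas - 1)]

    def append(self, record) -> int:
        offset = self.leader.append(record)
        # Assumption: copy to every follower immediately, for illustration.
        for follower in self.followers:
            follower.append(record)
        return offset


log = ReplicatedLog(replicas=3)
offsets = [log.append(event) for event in ["created", "paid", "shipped"]]
print(offsets)
print(log.leader.read(1))
print(log.followers[0].read(0))
```

Reading from a follower returns the same records in the same order as the leader, which is what lets another node take over if the leader fails.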

Two messaging models are commonly used to let applications communicate in a decoupled way:

Point-to-Point Messaging Model:

Messages are stored in a queue that one or more consumers can read from. However, each message is consumed by only one consumer. Once a consumer has read a message, it is removed from the queue.

Publish-Subscribe Messaging Model:

Messages are stored in a topic. Consumers can subscribe to one or more topics, and every subscriber receives all the messages published to those topics.

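Kafka's partitioned topic log architecture supports both models through consumer groups. Consumers in the same group split a topic's partitions among themselves, so each message is processed by only one of them (queuing, or point-to-point). Separate consumer groups each receive every message (publish-subscribe).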
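
Here is a rough sketch of how consumer groups can give both behaviours over the same partitioned topic. The group and consumer names are made up, and `assign_partitions` uses a simple round-robin rule as an assumption; Kafka ships several assignment strategies that behave differently in detail.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Give each partition exactly one owner within a group (round-robin)."""
    assignment = defaultdict(list)
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return dict(assignment)


def deliver(topic, groups):
    """Return {group: {consumer: [messages]}} for every consumer group.

    `topic` maps partition id -> list of messages.
    `groups` maps group name -> list of consumer names.
    """
    received = {}
    for group, consumers in groups.items():
        assignment = assign_partitions(sorted(topic), consumers)
        received[group] = {
            consumer: [msg for p in assignment.get(consumer, []) for msg in topic[p]]
            for consumer in consumers
        }
    return received


# Hypothetical topic with three partitions and two consumer groups.
topic = {0: ["a", "b"], 1: ["c"], 2: ["d", "e"]}
groups = {"billing": ["b1", "b2"], "audit": ["a1"]}

received = deliver(topic, groups)
for group, per_consumer in received.items():
    print(group, per_consumer)
```

Inside the `billing` group the messages are shared out between `b1` and `b2` with no overlap, like a queue. The `audit` group independently sees every message, like a topic subscription.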

Kafka vs RabbitMQ

Architecture
  Kafka: It uses a log-based architecture where messages are stored in topics. Topics are divided into partitions for scalability, and partitions are replicated for fault tolerance. Producers send messages to topics, and consumers read from them at their own pace.
  RabbitMQ: It uses a queue-based architecture where producers send messages to exchanges. Exchanges route the messages to queues based on bindings and routing keys, and consumers then read the messages from these queues.

Performance
  Kafka: It delivers high throughput and low latency. A suitably sized cluster can handle millions of messages per second.
  RabbitMQ: It delivers low latency, with throughput typically on the order of thousands of messages per second, depending on configuration.

Use cases
  Kafka: It is well suited for real-time data processing, event sourcing, log aggregation, website activity tracking, and stream processing.
  RabbitMQ: It is well suited for task queues, background job processing, communication between applications, and complex routing logic.

Message priority
  Kafka: It has no built-in support for message priorities.
  RabbitMQ: It supports priority queues, where higher-priority messages are delivered first.

Delivery model
  Kafka: It uses a pull-based model where consumers request messages from specific offsets, enabling message replay and batch processing.
  RabbitMQ: It primarily uses a push-based model, delivering messages to subscribed consumers as they arrive.

Message retention
  Kafka: Messages are stored durably for the configured retention period, whether or not they have been consumed.
  RabbitMQ: Messages are removed from the queue once consumers have acknowledged them.

Multiple consumers
  Kafka: Multiple consumers can read the same topic. Each consumer group receives every message independently of the other groups.
  RabbitMQ: Within a single queue, each message is delivered to only one consumer and is deleted once acknowledged. For several consumers to receive the same message, each needs its own queue bound to the exchange (for example, through a fanout exchange).

Protocol
  Kafka: It uses a custom binary protocol over TCP.
  RabbitMQ: It uses AMQP as its core protocol and also supports STOMP and MQTT through plugins.
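
The difference in delivery and retention from the comparison above can be sketched with two tiny in-memory models. Both classes are hypothetical and heavily simplified: `LogPartition` ignores retention limits, and `WorkQueue.consume` treats delivery and acknowledgement as a single step.

```python
from collections import deque


class LogPartition:
    """Kafka-style: reading never removes records; the consumer picks the offset."""

    def __init__(self):
        self.records = []

    def append(self, message):
        self.records.append(message)

    def poll(self, offset, max_records=100):
        """Return a batch of records starting at the requested offset."""
        return self.records[offset:offset + max_records]


class WorkQueue:
    """RabbitMQ-style queue: a message is gone once it has been consumed."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        """Hand over the next message and drop it (assumes an immediate ack)."""
        return self._messages.popleft() if self._messages else None


log = LogPartition()
queue = WorkQueue()
for message in ["m1", "m2", "m3"]:
    log.append(message)
    queue.publish(message)

first_pass = log.poll(0)   # read everything from the start
replay = log.poll(0)       # read it again from the same offset
resume = log.poll(2)       # or continue from a stored offset

drained = [queue.consume() for _ in range(3)]
after_drain = queue.consume()  # nothing left to replay

print(first_pass, replay, resume)
print(drained, after_drain)
```

The log can be re-read from any offset for as long as the records are retained, while the queue hands each message out once.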



Happy Coding :)