Apache Kafka for Developers #9: Replacing ZooKeeper with KRaft

In a traditional Kafka cluster setup, ZooKeeper is essential for managing and coordinating the cluster. Its responsibilities include:
  • Storing metadata about Kafka brokers, topics, partitions, and their configurations
  • Maintaining Kafka topic information, such as the number of partitions, replication factor, and partition leader.
  • Electing leaders for each partition to ensure there is always a leader available to handle read and write requests.
  • Electing one of the nodes as the Kafka Controller, which manages the leader-follower relationship for partitions.
  • Monitoring the health of Kafka brokers and notifying the controller of any broker failures, enabling quick failover and recovery.
  • Maintaining Access Control Lists (ACLs) for all topics in the cluster.

However, using ZooKeeper comes with several complexities:
  • Kafka and ZooKeeper are separate systems, which adds complexity and increases the risk of misconfiguration when managing a Kafka cluster.
  • Storing metadata in ZooKeeper can become a bottleneck as the Kafka cluster grows.
  • Loading metadata from ZooKeeper can be slow, particularly during startup or controller elections.
  • Synchronizing metadata between ZooKeeper and Kafka requires careful handling during version updates.
Introduction of KRaft (Kafka Raft)

Apache Kafka Raft (KRaft) is the consensus protocol introduced to remove ZooKeeper from cluster metadata management. The KRaft architecture moves metadata management into Kafka itself, eliminating the dependency on an external system.



Advantages of KRaft
  • It uses a quorum-based controller that ensures metadata is consistently replicated across the cluster.
  • The removal of ZooKeeper simplifies operational tasks, making it easier to monitor, administer, and troubleshoot Kafka clusters.
  • KRaft allows Kafka to scale more efficiently, even when a cluster reaches millions of partitions.
  • It allows a single security model for the entire system.
  • It is production-ready from Kafka version 3.3.1 onwards.
  • During startup or controller failover, a new controller can take over immediately because the metadata is already replicated across the other controllers in the quorum.
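As a concrete illustration, here is a minimal sketch of a KRaft-mode server.properties for a single combined broker/controller node. This is not a production configuration; node IDs, host names, ports, and paths are placeholders, and production clusters typically run dedicated controller nodes.

```properties
# This node acts as both broker and controller (convenient for development;
# use separate roles in production)
process.roles=broker,controller
node.id=1
# Controller quorum voters, in id@host:port form
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
log.dirs=/tmp/kraft-logs
```

Before the first start, the storage directory must be formatted with a cluster ID, e.g. `bin/kafka-storage.sh format -t <cluster-id> -c config/kraft/server.properties` — a step that did not exist in ZooKeeper-based setups.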
How KRaft Works

Quorum Controllers:
  • Metadata is managed by a group of nodes known as quorum controllers.
  • These controllers use the Raft consensus algorithm to ensure all nodes agree on the metadata state.
Event-Driven Protocol:
  • Quorum controllers use an event-driven protocol to replicate metadata changes across all controllers in the quorum.
Metadata Storage:
  • All metadata changes are stored in a dedicated Kafka topic called __cluster_metadata.
  • This topic has a single partition containing all information related to topics, partitions, and configurations.
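Conceptually, this metadata log behaves like any single-partition Kafka topic: an ordered, append-only sequence of records with monotonically increasing offsets, which controllers can replay to catch up. A toy Python sketch of that idea (illustration only, not Kafka code; record shapes are made up):

```python
class MetadataLog:
    """Toy model of the single-partition __cluster_metadata log (illustration only)."""

    def __init__(self):
        self.records = []  # append-only list; the list index is the offset

    def append(self, record):
        """Append a metadata change and return its offset."""
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset):
        """Replay records from a given offset -- how a lagging controller catches up."""
        return self.records[offset:]


log = MetadataLog()
log.append({"type": "TopicRecord", "name": "orders", "partitions": 3})
log.append({"type": "ConfigRecord", "topic": "orders", "retention.ms": 604800000})
# Every controller that replays the log from offset 0 arrives at the same state.
```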
Leader-Follower Model:
  • One of the quorum controllers acts as the leader and manages the metadata.
  • The follower controllers replicate the metadata for failover purposes.
Commitment of Changes:
  • For every metadata change, the leader controller appends the change to the metadata log and replicates it to the follower controllers.
  • The change is considered committed only after a majority of the controllers have acknowledged it.
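The majority rule above is simple arithmetic: with N controllers in the quorum, a change commits once ⌊N/2⌋ + 1 of them (the leader counts itself) have acknowledged it. A small sketch:

```python
def majority(quorum_size: int) -> int:
    """Smallest number of controllers that forms a majority of the quorum."""
    return quorum_size // 2 + 1


def is_committed(acks: int, quorum_size: int) -> bool:
    """A metadata change is committed once a majority has acknowledged it."""
    return acks >= majority(quorum_size)


# With a 3-controller quorum, 2 acknowledgments commit a change;
# the leader alone (1 acknowledgment) is not enough.
print(is_committed(2, 3))  # True
print(is_committed(1, 3))  # False
```

This is why controller quorums are usually sized at 3 or 5: the cluster stays available as long as a majority (2 of 3, or 3 of 5) remains reachable.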
Leader Election:
  • If the leader becomes unreachable, the followers initiate a new leader election.
  • A follower becomes a candidate and requests votes from the other nodes; the candidate that receives a majority of votes becomes the new leader.
Heartbeat Messages:
  • The leader sends regular heartbeat messages to followers to maintain authority.
  • If a follower doesn't receive a heartbeat within a set time, it assumes the leader has failed and starts a new election.
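Putting the last two points together, a follower's behavior can be sketched as a timer: reset it on every heartbeat, and become a candidate when it expires. A simplified Python illustration (the timeout values are made up; they are randomized so that followers rarely time out simultaneously):

```python
import random


class Follower:
    """Toy Raft-style follower: tracks heartbeats and decides when to start an election."""

    def __init__(self):
        # Randomized election timeout (ms) to avoid simultaneous candidacies
        self.election_timeout_ms = random.uniform(150, 300)
        self.elapsed_ms = 0.0

    def on_heartbeat(self):
        """Leader is alive: reset the election timer."""
        self.elapsed_ms = 0.0

    def tick(self, delta_ms):
        """Advance time; return True if this node should become a candidate."""
        self.elapsed_ms += delta_ms
        return self.elapsed_ms >= self.election_timeout_ms


f = Follower()
f.on_heartbeat()
assert not f.tick(100)  # heartbeat was recent: stay a follower
assert f.tick(300)      # no heartbeat for too long: start an election
```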
Apache Kafka for Developers Journey:

Happy Coding :)
