Kafka topic replication ensures data durability and high availability by duplicating each partition across multiple brokers in a Kafka cluster.
Kafka follows a leader-follower replica model in which each partition has a single leader and zero or more follower replicas. The leader handles all writes and, by default, all reads, while the followers replicate the data from the leader.
Replication Factor
The replication factor is set at the topic level when the topic is created. It specifies how many copies of each partition are stored in the Kafka cluster, with each copy placed on a different broker.
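To make the definition concrete, here is a minimal Python sketch of how the copies of each partition could be spread over brokers. It is a toy round-robin model, not Kafka's actual placement logic (which, for example, can also take broker racks into account). The function name `assign_replicas` and the broker ids are made up for illustration.

```python
def assign_replicas(broker_ids, num_partitions, replication_factor):
    """Toy placement: partition p starts at broker index p and wraps around."""
    if not 1 <= replication_factor <= len(broker_ids):
        raise ValueError("replication factor must be between 1 and the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        # Each replica of a partition lands on a different broker.
        assignment[partition] = [
            broker_ids[(partition + offset) % len(broker_ids)]
            for offset in range(replication_factor)
        ]
    return assignment


assignment = assign_replicas([1, 2, 3], num_partitions=2, replication_factor=3)
for partition, replicas in assignment.items():
    print(f"partition {partition}: replicas on brokers {replicas}")
# partition 0: replicas on brokers [1, 2, 3]
# partition 1: replicas on brokers [2, 3, 1]
```

In this sketch the first broker in each list plays the role of the partition leader, and the remaining brokers hold the followers.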
In-Sync Replicas (ISR)
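The replicas of a partition that are caught up with the leader's log are known as In-Sync Replicas (ISR). The leader is itself a member of the ISR, and a follower that falls too far behind is removed from the set until it catches up again. This means that every in-sync replica holds all the messages that have been committed to the partition.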
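One way to picture ISR membership is as a time-based check: a follower counts as in sync if it caught up with the leader recently enough. The sketch below assumes that simplified rule. In a real cluster the window is governed by the broker setting `replica.lag.time.max.ms`, while the function name, the timestamps, and the 30-second window used here are illustrative assumptions.

```python
def in_sync_replicas(leader_id, last_caught_up_ms, now_ms, max_lag_ms):
    """Return the broker ids treated as in sync in this simplified model."""
    isr = [leader_id]  # the leader always belongs to its own ISR
    for broker_id in sorted(last_caught_up_ms):
        lag_ms = now_ms - last_caught_up_ms[broker_id]
        if lag_ms <= max_lag_ms:
            isr.append(broker_id)
    return isr


isr = in_sync_replicas(
    leader_id=1,
    # Hypothetical times at which each follower was last fully caught up.
    last_caught_up_ms={2: 95_000, 3: 40_000},
    now_ms=100_000,
    max_lag_ms=30_000,
)
print(isr)  # [1, 2]
```

Broker 3 last caught up 60 seconds ago, which is outside the assumed window, so it drops out of the ISR until it catches up again.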
Single Broker with Replication Factor
In a Kafka setup with a single broker, the replication factor must be set to 1, because each replica of a partition has to be placed on a different broker. The replication factor counts all replicas, including the leader, so each partition has a single leader and zero followers. There is only one copy of each partition, and no replication occurs. This setup poses a significant risk of data loss and is not advisable for production environments.
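The arithmetic behind this can be sketched in a few lines of Python. The helper name `describe_replication` is hypothetical and the check only mirrors the rule stated above; Kafka itself rejects a topic whose replication factor is larger than the number of available brokers.

```python
def describe_replication(replication_factor, num_brokers):
    """Split the replication factor into one leader plus its followers."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if replication_factor > num_brokers:
        raise ValueError(
            f"replication factor {replication_factor} exceeds "
            f"{num_brokers} available broker(s)"
        )
    # The replication factor counts the leader, so followers = factor - 1.
    return {"leaders": 1, "followers": replication_factor - 1}


single_broker = describe_replication(replication_factor=1, num_brokers=1)
print(single_broker)  # {'leaders': 1, 'followers': 0}

try:
    describe_replication(replication_factor=3, num_brokers=1)
except ValueError as error:
    print("rejected:", error)
```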
Multiple Brokers with Replication Factor
In a multi-broker Kafka cluster, the replication factor can be set to a value greater than 1 (up to the number of brokers). This means that each partition has multiple copies distributed across different brokers, which provides fault tolerance, high availability, and data durability.
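The fault tolerance mentioned above comes from being able to promote another replica when the leader's broker goes down. Here is a simplified Python sketch of that idea, assuming the new leader is the first assigned replica that is both alive and in sync. The real election is carried out by Kafka's controller; `elect_leader` and the broker ids are illustrative names only.

```python
def elect_leader(replicas, in_sync, alive):
    """Return the first replica that is in sync and alive, or None if there is none."""
    for broker_id in replicas:
        if broker_id in in_sync and broker_id in alive:
            return broker_id
    return None  # no eligible replica: the partition is unavailable


# Replication factor 3: broker 1 led the partition and has just failed.
new_leader = elect_leader(replicas=[1, 2, 3], in_sync={1, 2, 3}, alive={2, 3})
print(new_leader)  # 2

# Replication factor 1: the only replica was on the failed broker.
no_leader = elect_leader(replicas=[1], in_sync={1}, alive=set())
print(no_leader)  # None
```

With a replication factor of 1 there is nothing to fail over to, which is why the single-broker setup offers no fault tolerance.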
Replication Factor 3 with three brokers and two partitions
Let's consider a Kafka cluster with 3 brokers and a topic A with 2 partitions (0 and 1). The replication factor is set to 3, meaning each partition has 3 replicas (including the leader).
Kafka Setup
- Brokers: The Kafka cluster consists of 3 brokers, named Broker 1, Broker 2, and Broker 3.
- Replication Factor: Set to 3, so each partition has 3 replicas (one leader and two followers).
- Topic: Topic A with 2 partitions, named Partition 0 and Partition 1.
- Broker 1 contains the leader for Partition 0 and a follower for Partition 1.
- Broker 2 contains a follower for Partition 0 and the leader for Partition 1.
- Broker 3 contains followers for both Partition 0 and Partition 1.
When a producer sends data to Partition 0 of Topic A, the data is first written to the leader of Partition 0 on Broker 1. It is then replicated to the follower replicas on Broker 2 and Broker 3, which stay in the ISR as long as they keep up with the leader.
If Broker 1 fails, Kafka selects a new leader for Partition 0 from the in-sync replicas (on either Broker 2 or Broker 3). This ensures that the system continues to function properly, maintaining data availability and durability.
Best practices for Kafka replication
- Starting with a replication factor of 3 on a cluster of at least three brokers is recommended. It distributes data evenly and provides fault tolerance: a partition's data can survive the failure of up to two brokers, as long as the remaining replica is in sync.
- Have at least as many brokers as the replication factor. Kafka places each replica of a partition on a different broker, so a topic cannot be created with a replication factor higher than the number of available brokers.
- Avoid setting the replication factor too high, which leads to increased resource consumption and network traffic.
- Avoid setting the replication factor too low, which can compromise data availability and fault tolerance.
Happy Coding :)
Apache Kafka for Developers Journey:
- Apache Kafka for Developers #1: Introduction to Kafka and Comparison with RabbitMQ
- Apache Kafka for Developers #2: Kafka Architecture and Components
- Apache Kafka for Developers #3: Kafka Topic Replication
- Apache Kafka for Developers #4: Kafka Producer and Acknowledgements
- Apache Kafka for Developers #5: Kafka Consumer and Consumer Group