Partition rebalancing in Kafka redistributes a topic's partitions among the consumers in a consumer group so that the load stays evenly spread. A rebalance can occur in the following situations:
New Consumer Joins the Group
When a new consumer joins the group, Kafka needs to redistribute the partitions to include the new consumer.
Consumer Leaves the Group
If a consumer crashes or is shut down, Kafka redistributes its partitions among the remaining consumers.
Partition Changes in a Topic
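Adding partitions to a topic also triggers a rebalance. Kafka does not support reducing the partition count of an existing topic, so in practice this means new partitions being added.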
Kafka Rebalancing Protocols
Eager Rebalancing Protocol
Also known as "stop-the-world" rebalancing, this protocol stops all consumers in the group during a rebalance. Every consumer gives up its partitions, rejoins the group, and receives a new partition assignment.
It is simple and straightforward. However, it can cause significant disruption and downtime, especially in large clusters, because all consumers must stop and rejoin the group.
Incremental Cooperative Rebalancing Protocol
This protocol supports incremental rebalancing, so only the consumers that need to adjust their assignments will pause and rejoin the group, while the rest continue processing.
It reduces disruption and downtime, making it ideal for large clusters. However, it is more complex to implement and manage compared to eager rebalancing.
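The difference between the two protocols can be illustrated by counting which partitions each consumer has to give up. The following is a minimal sketch, not Kafka's implementation: the function name `revoked_partitions`, the consumer names, and the before/after assignments are assumptions chosen for illustration.

```python
def revoked_partitions(old, new, protocol):
    """Return, per consumer, the partitions it must give up during a rebalance.

    `old` and `new` map consumer names to lists of partition numbers.
    `protocol` is either "eager" or "cooperative" (labels used only in this sketch).
    """
    revoked = {}
    for consumer, owned in old.items():
        if protocol == "eager":
            # Eager: every consumer gives up everything before rejoining.
            revoked[consumer] = sorted(owned)
        else:
            # Cooperative: give up only the partitions that move to someone else.
            keep = set(new.get(consumer, []))
            revoked[consumer] = sorted(p for p in owned if p not in keep)
    return revoked


# Hypothetical group: a fourth consumer joins and takes over partition 5.
old = {"C1": [0, 3], "C2": [1, 4], "C3": [2, 5]}
new = {"C1": [0, 3], "C2": [1, 4], "C3": [2], "C4": [5]}

eager = revoked_partitions(old, new, "eager")
cooperative = revoked_partitions(old, new, "cooperative")
print("eager:", eager)
print("cooperative:", cooperative)
```

In this scenario the eager protocol interrupts all six partitions, while the cooperative protocol interrupts only partition 5.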
Partition assignment strategies
During a rebalance, Kafka uses a partition assignment strategy to determine how partitions are assigned to consumers. The available strategies are:
Round-Robin Assignment
This is one of the simplest partition assignment strategies in Kafka. Partitions are assigned to consumers one after another in a circular fashion. This spreads partitions evenly across consumers, so that no single consumer is overloaded while others remain underutilized.
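This assignment uses the Eager Rebalancing Protocol while rebalancing.
Use the setting below to enable round-robin assignment:
partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
Example:
Consider a scenario with 3 consumers and 6 partitions:
- Topic: A
- Consumers: Consumer 1, Consumer 2, Consumer 3
- Partitions: Partition 0, Partition 1, Partition 2, Partition 3, Partition 4, Partition 5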
Consumer 1: Partition 0, Partition 3
Consumer 2: Partition 1, Partition 4
Consumer 3: Partition 2, Partition 5
Assignment after a new consumer C4 joins:
Consumer 1: Partition 0, Partition 4
Consumer 2: Partition 1, Partition 5
Consumer 3: Partition 2
Consumer 4: Partition 3
This assignment is suitable when partitions carry a similar load and consumers have similar processing capabilities.
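The circular hand-out described above can be reproduced with a few lines of Python. This is a simplified sketch for a single topic, not the RoundRobinAssignor source; the function name `round_robin_assign` and the consumer names are assumptions.

```python
def round_robin_assign(partitions, consumers):
    """Deal partitions to consumers one at a time, in a circular fashion."""
    ordered = sorted(consumers)
    assignment = {c: [] for c in ordered}
    for i, partition in enumerate(sorted(partitions)):
        # The i-th partition goes to consumer number i modulo the group size.
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


three = round_robin_assign(range(6), ["C1", "C2", "C3"])
four = round_robin_assign(range(6), ["C1", "C2", "C3", "C4"])
print(three)
print(four)
```

With four consumers, partitions 3, 4 and 5 all change owner, which shows that round-robin makes no attempt to preserve the previous assignment.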
Range Assignment
This approach allocates partitions to consumers in contiguous ranges, making it beneficial when partitions have a natural order and consumers can efficiently manage a range of partitions. Partitions are sorted and divided into contiguous ranges, with each range assigned to a specific consumer.
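This assignment uses the Eager Rebalancing Protocol while rebalancing.
Use the setting below to enable range assignment:
partition.assignment.strategy=org.apache.kafka.clients.consumer.RangeAssignor
Example:
Consider a scenario with 3 consumers and 6 partitions:
- Topic: A
- Consumers: Consumer 1, Consumer 2, Consumer 3
- Partitions: Partition 0, Partition 1, Partition 2, Partition 3, Partition 4, Partition 5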
Consumer 1: Partition 0, Partition 1
Consumer 2: Partition 2, Partition 3
Consumer 3: Partition 4, Partition 5
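Each consumer gets a contiguous range of partitions.
Assignment after a new consumer C4 joins:
Consumer 1: Partition 0, Partition 1
Consumer 2: Partition 2, Partition 3
Consumer 3: Partition 4
Consumer 4: Partition 5
Six partitions do not divide evenly among four consumers, so the first two consumers each take one extra partition.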
This assignment is suitable when partitions have a natural order and consumers can handle a range of partitions efficiently.
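The range calculation can be sketched as follows for a single topic. It assumes the documented RangeAssignor behaviour in which consumers are sorted and the first few take one extra partition when the count does not divide evenly. The function name `range_assign` and the consumer names are assumptions for illustration.

```python
def range_assign(num_partitions, consumers):
    """Split partitions 0..num_partitions-1 into contiguous ranges."""
    ordered = sorted(consumers)
    base, extra = divmod(num_partitions, len(ordered))
    assignment = {}
    start = 0
    for index, consumer in enumerate(ordered):
        # The first `extra` consumers receive one partition more than the rest.
        size = base + (1 if index < extra else 0)
        assignment[consumer] = list(range(start, start + size))
        start += size
    return assignment


three = range_assign(6, ["C1", "C2", "C3"])
four = range_assign(6, ["C1", "C2", "C3", "C4"])
print(three)
print(four)
```

With three consumers every range holds two partitions. With four consumers, C1 and C2 keep two partitions each, while C3 and C4 receive one each.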
Sticky Assignment
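This approach aims to minimize partition movement by sticking to previous assignments as much as possible while still keeping the distribution balanced. It reduces the overhead of rebalancing and helps maintain data locality.
This assignment uses the Eager Rebalancing Protocol while rebalancing.
Use the setting below to enable sticky assignment:
partition.assignment.strategy=org.apache.kafka.clients.consumer.StickyAssignor
Example:
Consider a scenario with 3 consumers and 6 partitions:
- Topic: A
- Consumers: Consumer 1, Consumer 2, Consumer 3
- Partitions: Partition 0, Partition 1, Partition 2, Partition 3, Partition 4, Partition 5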
Consumer 1: Partition 0, Partition 3
Consumer 2: Partition 1, Partition 4
Consumer 3: Partition 2, Partition 5
Each consumer gets two partitions, distributed in a round-robin fashion initially.
Assignment after a new consumer C4 joins:
Consumer 1: Partition 0, Partition 3
Consumer 2: Partition 1, Partition 4
Consumer 3: Partition 2
Consumer 4: Partition 5
This assignment is ideal for reducing the overhead of rebalancing and maintaining data locality.
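The idea of "keep what you have, move as little as possible" can be sketched with a simple greedy routine. This is a deliberately simplified illustration and not the algorithm used by StickyAssignor. The function name `sticky_assign` and the tie-breaking order are assumptions, and a real assignor may pick a different partition to move.

```python
def sticky_assign(previous, consumers, partitions):
    """Keep previous ownership where possible, then rebalance with minimal moves."""
    # Surviving consumers start with whatever they owned before.
    assignment = {
        c: [p for p in previous.get(c, []) if p in partitions] for c in consumers
    }
    owned = {p for ps in assignment.values() for p in ps}
    # Partitions whose owner left the group go to the least-loaded consumer.
    for partition in sorted(partitions):
        if partition not in owned:
            target = min(consumers, key=lambda c: len(assignment[c]))
            assignment[target].append(partition)
    # Move one partition at a time until loads differ by at most one.
    while True:
        # Ties are broken arbitrarily here: the last most-loaded consumer gives one up.
        most = max(reversed(consumers), key=lambda c: len(assignment[c]))
        least = min(consumers, key=lambda c: len(assignment[c]))
        if len(assignment[most]) - len(assignment[least]) <= 1:
            break
        assignment[least].append(assignment[most].pop())
    return assignment


partitions = list(range(6))
before = {"C1": [0, 3], "C2": [1, 4], "C3": [2, 5]}
after_join = sticky_assign(before, ["C1", "C2", "C3", "C4"], partitions)
after_leave = sticky_assign(after_join, ["C1", "C2", "C3"], partitions)
print(after_join)
print(after_leave)
```

When C4 joins, only one partition changes owner. When C4 leaves again, only its orphaned partition is reassigned.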
Cooperative Sticky Assignment
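This approach follows the same logic as Sticky Assignment, keeping previous assignments wherever possible. The difference is that consumers give up only the partitions that actually move to another consumer and keep processing the rest during the rebalance.
This assignment uses the Incremental Cooperative Rebalancing Protocol while rebalancing.
Use the setting below to enable cooperative sticky assignment:
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
Example:
Consider a scenario with 3 consumers and 6 partitions:
- Topic: A
- Consumers: Consumer 1, Consumer 2, Consumer 3
- Partitions: Partition 0, Partition 1, Partition 2, Partition 3, Partition 4, Partition 5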
Consumer 1: Partition 0, Partition 3
Consumer 2: Partition 1, Partition 4
Consumer 3: Partition 2, Partition 5
Each consumer gets two partitions, distributed in a round-robin fashion initially.
Assignment after a new consumer C4 joins:
Consumer 1: Partition 0, Partition 3
Consumer 2: Partition 1, Partition 4
Consumer 3: Partition 2
Consumer 4: Partition 5
The key difference is that Sticky Assignment uses the Eager Rebalancing Protocol, while Cooperative Sticky Assignment uses the Incremental Cooperative Rebalancing Protocol.
This assignment is suitable for large clusters where minimizing disruption during rebalancing is critical.
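With the cooperative protocol, a partition that changes owner is typically handed over in two steps: it is first revoked from its current owner, and only a follow-up rebalance gives it to the new one. The sketch below models those two rounds. The function name `cooperative_rounds` and the assignments are assumptions for illustration, not Kafka client code.

```python
def cooperative_rounds(old, target):
    """Return the assignment after round 1 (revoke) and round 2 (hand over)."""
    # Round 1: each consumer keeps only partitions it owns both before and after.
    # Partitions that must move are revoked and stay unowned for now.
    round1 = {
        c: [p for p in old.get(c, []) if p in target[c]] for c in target
    }
    # Round 2: the revoked partitions are given to their new owners.
    round2 = {c: sorted(ps) for c, ps in target.items()}
    return round1, round2


old = {"C1": [0, 3], "C2": [1, 4], "C3": [2, 5]}
target = {"C1": [0, 3], "C2": [1, 4], "C3": [2], "C4": [5]}
round1, round2 = cooperative_rounds(old, target)
print("after round 1:", round1)
print("after round 2:", round2)
```

Between the two rounds partition 5 has no owner, while Consumer 1 and Consumer 2 are never interrupted.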
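In Kafka, the default partition assignment strategy for consumers is the Range Assignor, which allocates partitions in contiguous ranges so that each consumer receives a set of numerically adjacent partitions. Since Kafka 3.0, the default value of partition.assignment.strategy lists RangeAssignor followed by CooperativeStickyAssignor. The Range Assignor is still the one used by default, and the second entry makes it possible to switch to cooperative rebalancing later with a single rolling bounce.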
Happy Coding :)
Apache Kafka for Developers Journey:
- Apache Kafka for Developers #1: Introduction to Kafka and Comparison with RabbitMQ
- Apache Kafka for Developers #2: Kafka Architecture and Components
- Apache Kafka for Developers #3: Kafka Topic Replication
- Apache Kafka for Developers #4: Kafka Producer and Acknowledgements
- Apache Kafka for Developers #5: Kafka Consumer and Consumer Group
- Apache Kafka for Developers #6: Kafka Consumer Partition Rebalancing