Each message (record) that a producer sends includes a Key (optional), a Value, Headers (optional), and a Timestamp.
Message/Record Key
Message keys in Kafka are optional but can be quite beneficial. When a key is provided, Kafka hashes the key to determine the partition. This guarantees that all messages with the same key are directed to the same partition, which is crucial for maintaining order in message processing.
If a Kafka producer does not provide a message key, Kafka distributes the messages evenly across the available partitions using a round-robin algorithm. This balances the load across partitions but does not guarantee ordering for related messages.
Acknowledgments (acks)
Acknowledgments (acks) in Kafka are a mechanism to ensure that messages are reliably stored in a topic partition.
Kafka producers write data only to the leader replica of a partition; follower replicas then copy the data from the leader.
Kafka producers must also set the acknowledgment level (acks), which controls how many replicas must confirm a write before it is considered successful.
acks=0
The producer sends messages without waiting for any acknowledgment from the broker. While this approach minimizes latency, it also carries a high risk of message loss since the producer doesn't receive confirmation that the message was successfully written.
This follows the At Most Once delivery model, where messages are delivered once, and if a failure occurs, they may be lost and not redelivered. This approach is suitable for scenarios where occasional data loss is acceptable and low latency is crucial.
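As a sketch, an acks=0 producer setup might look like the following. The key names follow the standard Kafka producer configuration; the broker address is a placeholder:

```python
# Hypothetical fire-and-forget producer settings (sketch, not a full client).
producer_config = {
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "acks": "0",    # send without waiting for any broker acknowledgment
    "retries": 0,   # retrying is pointless when no ack is ever read
}
```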
acks=all
The producer waits for acknowledgment from the leader partition and all in-sync replicas (ISR) before the write is confirmed to the client. This setting provides the highest level of reliability and ensures that the message is replicated across multiple brokers. However, it also increases latency because the producer waits for acknowledgments from multiple brokers.
This follows the At Least Once delivery model, where messages are delivered one or more times, and if a failure occurs, messages are not lost but may be delivered more than once. This approach is suitable for scenarios where data loss is unacceptable, and duplicates can be handled.
Example 1:
- cluster size: 3
- replication factor: 3 (including leader)
- min.insync.replicas: 3 (including leader)

When a producer sends a message to a partition, the leader broker writes the message and waits for acknowledgments from both follower replicas (three acknowledgments in total). Only then is the write considered successful; if any replica falls out of sync, writes fail until it catches up.

Example 2:
- cluster size: 3
- replication factor: 3 (including leader)
- min.insync.replicas: 2 (including leader)

When a producer sends a message to a partition, the leader broker writes the message and waits for an acknowledgment from at least one follower replica (two acknowledgments in total: one from the leader and one from a follower). Once this condition is met, the message is considered successfully written.
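The acknowledgment rule for the second configuration can be expressed as a tiny predicate. This is an illustration of the rule, not broker code:

```python
# Sketch of the acks=all success condition under replication factor 3
# with min.insync.replicas = 2 (the leader counts as one acknowledgment).
REPLICATION_FACTOR = 3
MIN_INSYNC_REPLICAS = 2

def write_succeeds(in_sync_acks: int) -> bool:
    """A write is acknowledged once min.insync.replicas copies exist."""
    return in_sync_acks >= MIN_INSYNC_REPLICAS

assert write_succeeds(2)      # leader + one follower: success
assert not write_succeeds(1)  # leader alone: not enough replicas
```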
Idempotent Producer

Idempotency in Kafka producers guarantees that each message is written to a partition exactly once, even when the producer retries after a failure. This prevents duplicate entries during retries and maintains message ordering.
Without Idempotency
Imagine a scenario where a producer sends a message, but due to a network issue, the acknowledgment from the broker is not received. The producer retries sending the message, and the broker writes it a second time, leaving a duplicate in the partition.
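This failure mode can be simulated in a few lines of plain Python (a toy model, not real Kafka): a broker that stores the message but loses the acknowledgment, and a producer that retries until it sees an ack.

```python
# Toy simulation of a lost acknowledgment causing a duplicate write.

class FlakyBroker:
    def __init__(self):
        self.log = []
        self._fail_next_ack = True  # deterministically drop the first ack

    def write(self, msg):
        self.log.append(msg)        # the message IS stored...
        if self._fail_next_ack:
            self._fail_next_ack = False
            return False            # ...but the ack is lost in transit
        return True

def send_with_retries(broker, msg, max_retries=3):
    """At-least-once: retry until an ack arrives (may duplicate)."""
    for _ in range(max_retries):
        if broker.write(msg):
            return
    raise RuntimeError("message not acknowledged")

broker = FlakyBroker()
send_with_retries(broker, "payment-123")
# The first write succeeded but its ack was lost, so the retry stored
# a second copy of the message.
assert broker.log == ["payment-123", "payment-123"]
```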
With Idempotency
Imagine a scenario where a producer sends a message, but due to a network issue, the acknowledgment from the broker is not received. The producer will retry sending the message. Here, the broker will identify the duplicate and discard it, ensuring that only one copy of the message is stored.
Kafka performs the following steps internally to ensure idempotency:
- When idempotency is enabled, each producer is assigned a unique Producer ID (PID)
- Each message sent by the producer is assigned a monotonically increasing sequence number that is unique to each partition.
- The broker tracks the highest sequence number it has received from each producer for each partition. If it receives a message with a lower sequence number, it discards it as a duplicate.
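The steps above can be sketched as a toy broker-side check. This is plain Python for illustration only; a real broker keys this state by producer ID and partition and also validates that sequence numbers arrive in order.

```python
# Toy sketch of broker-side duplicate detection via sequence numbers.

class IdempotentLog:
    def __init__(self):
        self.log = []
        self.highest_seq = {}  # (producer_id, partition) -> last sequence seen

    def append(self, producer_id, partition, seq, msg):
        key = (producer_id, partition)
        last = self.highest_seq.get(key, -1)
        if seq <= last:
            return False  # duplicate from a retry: discard it
        self.highest_seq[key] = seq
        self.log.append(msg)
        return True

log = IdempotentLog()
log.append(producer_id=1, partition=0, seq=0, msg="payment-123")
log.append(producer_id=1, partition=0, seq=0, msg="payment-123")  # retry
assert log.log == ["payment-123"]  # stored exactly once
```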
This follows the Exactly Once delivery model, where messages are delivered exactly once, even in the case of retries. This ensures no duplicates and maintains message order. This approach is suitable for scenarios where the application requires strict data consistency and no duplicates.
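Enabling this behavior is a producer-side configuration change. A sketch, using the standard Kafka producer configuration keys (note that idempotence requires acks=all):

```python
# Hypothetical idempotent producer settings (sketch, not a full client).
idempotent_config = {
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "enable.idempotence": True,  # broker assigns a PID and checks sequences
    "acks": "all",               # idempotence requires acks=all
}
```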