What is the difference between Kafka earliest and latest offset values?
- Producer sends messages 1, 2, 3, 4.
- Consumer receives messages 1, 2, 3, 4.
- Consumer crashes/disconnects.
- Producer sends messages 5, 6, 7.
- Consumer comes back up and should receive messages starting from 5 instead of 7.
To get this result, which offset value do I have to use, and what other changes or configuration do I need?
Solution 1:[1]
When a consumer joins a consumer group, it fetches the last committed offset for each assigned partition. It will therefore resume with messages 5, 6, 7 after a restart, provided that it committed its position (through message 4) before crashing.
The earliest and latest values of the auto.offset.reset property are used only when a consumer starts and there is no committed offset for the assigned partition. In that case you can choose whether to re-read all the messages from the beginning (earliest) or to read only the messages that arrive after the last one (latest).
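The rule above can be sketched as a small in-memory model of the scenario from the question. This is a simplified illustration, not Kafka client code: the `start_offset` helper, the `my-group` name, and the list standing in for a partition are all hypothetical, and offsets are assumed to be zero-based so that message n sits at offset n - 1.

```python
from typing import Dict, List, Optional


def start_offset(committed: Optional[int], reset_policy: str,
                 log_start: int, log_end: int) -> int:
    """Pick the offset a consumer starts from for one partition (simplified model)."""
    if committed is not None:
        return committed          # a committed offset always takes priority
    if reset_policy == "earliest":
        return log_start          # oldest record still in the partition
    if reset_policy == "latest":
        return log_end            # only records produced from now on
    # Kafka also accepts other values (for example "none"); they are out of scope here.
    raise ValueError(f"unsupported reset policy in this sketch: {reset_policy}")


# A list stands in for a single partition; message n is stored at offset n - 1.
log: List[int] = [1, 2, 3, 4]
committed_offsets: Dict[str, int] = {}

# The consumer reads messages 1-4 and commits the offset of the next record to read.
committed_offsets["my-group"] = len(log)

# The consumer crashes; the producer appends messages 5, 6, 7.
log.extend([5, 6, 7])

# The consumer restarts in the same group: the committed offset decides the position,
# so the reset policy passed here is never consulted.
resume = start_offset(committed_offsets.get("my-group"), "latest", 0, len(log))
received = log[resume:]

# A consumer with no committed offset falls back to the reset policy instead.
fresh_earliest = log[start_offset(None, "earliest", 0, len(log)):]
fresh_latest = log[start_offset(None, "latest", 0, len(log)):]

print(received, fresh_earliest, fresh_latest)
```

In this model the restarted consumer receives 5, 6, 7 regardless of the reset policy, while a consumer without a committed offset either re-reads everything (earliest) or sees nothing until new messages arrive (latest).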
Solution 2:[2]
To get a clear idea about this scenario we need to understand what happens when a consumer joins the same consumer group.
- Join the consumer group which triggers rebalance and assigns partitions to the new consumer.
- Look for committed offsets of the partitions assigned to the consumer. If one exists, consumption resumes from it.
- If no committed offset is found, check the auto.offset.reset configuration parameter to decide where to start consuming messages from.
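The two commonly used values for the auto.offset.reset configuration apply only when no committed offset exists:
i. earliest - starts consuming from the earliest offset available in the assigned partitions. (In your example, a consumer with no committed offset starts from 1.)
ii. latest - starts consuming from the latest offsets in the assigned partitions, so only messages produced after the consumer starts are received. (In your example, a consumer with no committed offset that starts after message 7 was produced skips 5, 6, 7.)
If the consumer committed its offset after message 4, it resumes from 5 with either value.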
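As a rough sketch of the configuration side of the question, the settings below are the consumer properties that matter for resuming from 5. The property names are standard Kafka consumer settings; the broker address and group name are placeholder assumptions, and how the mapping is handed to a client depends on the library you use.

```python
# Placeholder values: the broker address and group name are assumptions for illustration.
consumer_config = {
    "bootstrap.servers": "localhost:9092",
    # Keep the same group.id across restarts so the committed offset can be found again.
    "group.id": "my-group",
    # Fallback used only when the group has no committed offset for a partition.
    "auto.offset.reset": "earliest",
    # With auto-commit disabled, the application must commit offsets itself
    # after processing, otherwise there is nothing to resume from.
    "enable.auto.commit": False,
}

for key, value in consumer_config.items():
    print(f"{key}={value}")
```

The key point is the stable group.id combined with committed offsets. auto.offset.reset only decides what happens the first time the group reads a partition, or if its committed offset is no longer available.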
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | ppatierno |
Solution 2 | Daham Navinda |