What is the difference between Kafka earliest and latest offset values?

producer sends messages 1, 2, 3, 4

consumer receives messages 1, 2, 3, 4

consumer crashes/disconnects

producer sends messages 5, 6, 7

consumer comes back up and should receive messages starting from 5 instead of 7

To get this result, which offset value do I have to use, and what other changes or configurations do I need?



Solution 1:[1]

When a consumer joins a consumer group, it fetches the group's last committed offset for each assigned partition. So if, before crashing, it committed its position after message 4, it restarts by reading 5, 6, 7. The earliest and latest values of the auto.offset.reset property are only used when a consumer starts and there is no committed offset for an assigned partition. In that case you can choose whether to re-read all the messages from the beginning (earliest) or only the messages produced from that point on (latest).
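A minimal sketch of the scenario from the question, assuming a single partition where messages 1..7 sit at offsets 0..6. This is a toy in-memory model, not the Kafka client API: `log`, `committed`, `poll_and_commit` and the group name `"my-group"` are hypothetical. It illustrates that Kafka stores the offset of the next record to read, and that once an offset is committed the restarted consumer resumes from it, so the fallback (the role auto.offset.reset plays) is never consulted.

```python
# Toy model of one partition: the message at list index i has offset i.
log = []

# Committed offsets per consumer group. Like Kafka, this stores the offset of
# the NEXT record the group should read, not the last record it processed.
committed = {}


def poll_and_commit(group, fallback_offset):
    """Read from the group's position to the end of the log, then commit.

    fallback_offset stands in for auto.offset.reset and is only used when
    the group has no committed offset yet.
    """
    position = committed.get(group, fallback_offset)
    batch = log[position:]
    committed[group] = position + len(batch)  # position after the last record read
    return batch


log.extend([1, 2, 3, 4])                  # producer sends 1..4 (offsets 0..3)
first = poll_and_commit("my-group", 0)    # consumer reads 1..4 and commits 4
# The consumer crashes here; the committed offset 4 is kept by the "broker".
log.extend([5, 6, 7])                     # producer sends 5..7 (offsets 4..6)
second = poll_and_commit("my-group", 0)   # restart resumes at committed offset 4

print(first)                  # [1, 2, 3, 4]
print(second)                 # [5, 6, 7]
print(committed["my-group"])  # 7
```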

Solution 2:[2]

To get a clear idea of this scenario, we need to understand what happens when a consumer joins (or rejoins) its consumer group.

  1. Join the consumer group, which triggers a rebalance and assigns partitions to the new consumer.
  2. Look for committed offsets of the partitions assigned to the consumer.
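  3. If there is no committed offset for a partition (or it no longer exists, e.g. because the data was deleted by retention), check the auto.offset.reset configuration parameter to decide where to start consuming messages from.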
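The steps above can be sketched for a single partition as a small Python function. This is a simplified model, not Kafka's implementation: the function name and its parameters (`committed`, `policy`, `log_start`, `log_end`) are hypothetical, and the rebalance in step 1 is assumed to have already assigned the partition.

```python
def resolve_start_offset(committed, policy, log_start, log_end):
    """Return the first offset a consumer reads from one assigned partition.

    committed -- the group's committed offset for the partition, or None
    policy    -- the auto.offset.reset value: "earliest", "latest" or "none"
    log_start -- offset of the oldest record still retained in the partition
    log_end   -- offset the next produced record will receive
    """
    # Step 2: a committed offset that still exists always wins.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # Step 3: auto.offset.reset is consulted only when step 2 found nothing usable.
    if policy == "earliest":
        return log_start
    if policy == "latest":
        return log_end
    if policy == "none":
        raise LookupError("no valid committed offset and auto.offset.reset=none")
    raise ValueError(f"unknown auto.offset.reset value: {policy!r}")


# The question's scenario: 7 messages at offsets 0..6, so log_end is 7.
print(resolve_start_offset(4, "latest", 0, 7))       # 4 -> resumes at message 5
print(resolve_start_offset(None, "earliest", 0, 7))  # 0 -> starts at message 1
print(resolve_start_offset(None, "latest", 0, 7))    # 7 -> skips 5, 6, 7
```

Note that with a committed offset of 4 the policy makes no difference, which is why the restarted consumer in the question gets 5, 6, 7 with either setting.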

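The auto.offset.reset configuration is usually set to one of two values (the Java client also accepts none, which throws an exception when no committed offset is found):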

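i. earliest - if the group has no committed offset for a partition, start consuming from the oldest message still retained in that partition. (In your example, a brand-new group would start from 1, assuming it has not been deleted by retention.)

ii. latest - if the group has no committed offset for a partition, start consuming from the end of the log, so only messages produced after that point are read. (In your example, a brand-new group would skip 5, 6 and 7 and wait for the next message.)

If your consumer committed its offset after message 4 before crashing, it resumes from 5 with either value.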
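For the configuration side of the question, here is a hedged sketch of settings that fit the scenario, written as the config dict you could pass to `confluent_kafka.Consumer(...)`. The broker address and group name are placeholders, and disabling auto-commit is one common choice rather than a requirement; the point is that a stable group.id plus committed offsets is what makes the consumer resume at 5.

```python
# Hypothetical consumer settings for the scenario in the question.
# The property names are standard Kafka consumer configs; the values for
# bootstrap.servers and group.id are placeholders.
consumer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "group.id": "my-group",                 # keep it stable so committed offsets are found on restart
    "auto.offset.reset": "earliest",        # only used when the group has no valid committed offset
    "enable.auto.commit": False,            # commit explicitly after each message is processed
}

print(consumer_config["auto.offset.reset"])  # earliest
```

With auto-commit disabled, the application commits after processing each message (in confluent-kafka, `Consumer.commit()` accepts a `message` argument), so a crash loses at most the message that was being processed.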

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ppatierno
Solution 2 Daham Navinda