How to skip Kafka history data in a Flink job when a certain lag is encountered?

Sometimes our Kafka consumer falls behind (accumulates lag) due to external issues.

The Flink job will always consume the Kafka backlog (delayed data) with exactly-once semantics, but here is our scenario:

We want to skip delayed data when the consumer lag grows too large, so that our downstream service receives the latest data in time. I am thinking of setting a window period to do this. How should I code it?



Solution 1:[1]

You could stop the Flink job and use the kafka-consumer-groups CLI that ships with Kafka to seek the consumer group forward (this only works if Flink resumes from the group's committed offsets rather than restoring offsets from its own checkpoint/savepoint state).
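A sketch of the offset reset with the stock Kafka tooling. The group and topic names are placeholders, and the command needs a reachable broker; `--to-latest` skips everything currently in the topic (you can also use `--to-datetime` to skip only up to a point in time).

```shell
# Stop the Flink job first. Note: if the job restores from a savepoint/checkpoint,
# the offsets stored there take precedence and this reset will be ignored.
# Group and topic names below are illustrative.
kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --group my-flink-group \
  --topic my-topic \
  --reset-offsets --to-latest \
  --execute
```

Run it first with `--dry-run` instead of `--execute` to preview the new offsets before committing them.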

When the job restarts, it'll start from the new offset location

Solution 2:[2]

I'd say your least painful option is to always read all the messages, but discard the ones you want to skip as early as possible, without any further processing. Just reading and discarding is really fast.
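The discard approach boils down to a simple predicate: keep a record only if its event timestamp is within some allowed lag of the current processing time. A minimal sketch, with illustrative class and method names and a one-hour threshold that is purely an assumption; in a real Flink job this predicate would back a `FilterFunction` applied directly after the Kafka source, so stale records are dropped before any heavy processing:

```java
// Sketch of a lag-based discard predicate (names are illustrative).
// In a Flink job this logic would live inside a FilterFunction placed
// right after the Kafka source, so skipped records are dropped immediately.
public class LagFilter {
    private final long maxLagMs;

    public LagFilter(long maxLagMs) {
        this.maxLagMs = maxLagMs;
    }

    /** Keep a record only if it is no older than maxLagMs relative to now. */
    public boolean keep(long recordTimestampMs, long nowMs) {
        return nowMs - recordTimestampMs <= maxLagMs;
    }

    public static void main(String[] args) {
        // Assumed one-hour "window period" from the question.
        LagFilter filter = new LagFilter(60 * 60 * 1000L);
        long now = System.currentTimeMillis();
        System.out.println(filter.keep(now, now));                       // fresh record: kept
        System.out.println(filter.keep(now - 2 * 60 * 60 * 1000L, now)); // 2h-old record: dropped
    }
}
```

Because the filter sits before any stateful operators, catching up through a large backlog is bounded mostly by raw Kafka read throughput, and exactly-once semantics are preserved for the records you do keep.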

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: OneCricketeer
Solution 2: Gerard Garcia