Recalculate historical data using Apache Beam

I have an Apache Beam streaming project that calculates data and writes it to a database. What is the best way to reprocess all historical records, without a big delay, after a bug fix or after changing the way the data is processed?



Solution 1:[1]

It is quite application-dependent.

For example, a straightforward approach if you are using Kafka (and all data is in there):

  • Stop and relaunch the job without using a savepoint (or, for no downtime at all, launch a second job while the existing one keeps running), as sketched after this list:
    • Use a different Kafka consumer group to not mess with the existing pipeline
    • Set a new database as output to build its contents from scratch
    • Scale the job up so that it finishes reprocessing as quickly as possible
  • Swap the old database for the new one atomically
  • Scale the job back down
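The idea behind the first bullet is to point the same pipeline code at the topic again, but with its own consumer group and offsets reset to the beginning, and to direct its output at a fresh database. Below is a minimal Beam Python sketch of such a backfill job using Beam's cross-language Kafka connector; the broker address, topic name, group id, and the WriteToNewDatabase sink are placeholder assumptions you would replace with your own, and process_record stands in for the corrected business logic.

```python
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions


def process_record(record):
    """Placeholder for the fixed business logic; `record` is a (key, value) pair of bytes."""
    key, value = record
    return key, value  # apply the corrected transformation here


class WriteToNewDatabase(beam.DoFn):
    """Hypothetical sink that writes into the freshly created 'rebuild' database."""

    def process(self, element):
        # Replace with your real database client / connector.
        pass


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            # Read the topic from the very beginning with a *new* consumer group,
            # so the existing production pipeline keeps its own offsets untouched.
            | "ReadKafka" >> ReadFromKafka(
                consumer_config={
                    "bootstrap.servers": "kafka:9092",  # assumed broker address
                    "group.id": "reprocess-backfill",   # new, dedicated group
                    "auto.offset.reset": "earliest",    # replay all history
                },
                topics=["events"],                      # assumed topic name
            )
            | "Process" >> beam.Map(process_record)
            # Write into the new database, not the one the live pipeline uses.
            | "WriteNewDb" >> beam.ParDo(WriteToNewDatabase())
        )


if __name__ == "__main__":
    run()
```

Once the backfill job has caught up with the head of the topic, the new database can be swapped in for the old one and the job scaled back down or retired.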

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Gerard Garcia