Kubernetes - How do I prevent duplication of work when there are multiple replicas of a service with a watcher?

I'm trying to build an event exporter as a toy project. It has a watcher that gets informed by the Kubernetes API every time an event occurs, and as a simple case, let's assume that it wants to store the event in a database or something.
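For concreteness, here is a minimal sketch of such a watcher using client-go's shared informer for core/v1 Events. The in-cluster config and the print-instead-of-database-write are assumptions just to keep the example self-contained:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// In-cluster config; use clientcmd instead for local development.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// Shared informer factory watching Events in all namespaces.
	factory := informers.NewSharedInformerFactory(clientset, 0)
	eventInformer := factory.Core().V1().Events().Informer()

	eventInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			ev := obj.(*corev1.Event)
			// This is where the "store it in a database" step would go.
			fmt.Printf("would store event %s/%s: %s\n", ev.Namespace, ev.Name, ev.Message)
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // block forever
}
```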

Having just one running instance is probably susceptible to failures, so ideally I'd like two. In this situation, the naive implementation would have both instances trying to store the event in the database, so it would be duplicated.

  1. What strategies are there to de-duplicate? Do I have to do it at the database level (say, by using some sort of eventId or hash of the event content) and accept the extra database load, or is there a way to de-duplicate at the instance level, maybe built into the Kubernetes client code? Or do I need to implement some sort of leader election? (A database-level sketch follows this list.)

  2. I assume this is a pretty common problem. Is there a more general term for this issue that I can search on to learn more?
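On the database-level option from question 1: one common approach is to key the row on the event's UID and let a unique constraint absorb duplicate writes. This is a minimal sketch, assuming a hypothetical Postgres table `events(uid TEXT PRIMARY KEY, namespace, name, message, count)`:

```go
package exporter

import (
	"database/sql"

	corev1 "k8s.io/api/core/v1"
)

// storeEvent writes an event at most once, relying on the UNIQUE/PRIMARY KEY
// constraint on the Kubernetes event UID in the hypothetical events table.
func storeEvent(db *sql.DB, ev *corev1.Event) error {
	_, err := db.Exec(
		`INSERT INTO events (uid, namespace, name, message, count)
		 VALUES ($1, $2, $3, $4, $5)
		 ON CONFLICT (uid) DO NOTHING`, // a duplicate write becomes a no-op
		string(ev.UID), ev.Namespace, ev.Name, ev.Message, ev.Count,
	)
	return err
}
```

The write still hits the database from every replica, but the constraint makes it idempotent, which is the "accept the extra database load" trade-off mentioned above.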

I looked at the code for the GKE event exporter as a reference, but I was unable to find any de-duplication, so I assume that it happens on the receiving end.



Solution 1:[1]

You should use both leader election and de-duplication at your watcher level. Neither one alone is enough.

Why do you need leader election?

If high availability is your main concern, you should have leader election between the watcher instances. Only the leader pod writes the event to the database. Without leader election, the instances will race with each other to write to the database.
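A minimal sketch of this using client-go's leaderelection package with a Lease lock; the lease name, namespace, and the `POD_NAME` environment variable used as the holder identity are assumptions:

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One Lease object shared by all replicas; the holder identity is this pod's name.
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "event-exporter", Namespace: "default"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: os.Getenv("POD_NAME")},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the leader runs the watcher and writes to the database.
				// Start the watcher loop here (see the earlier informer sketch).
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				log.Println("lost leadership, stopping writes")
			},
		},
	})
}
```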

You could check whether the event has already been written to the database and only then write it. However, you cannot guarantee that another instance won't write to the database between your check and your write. In that case, a database-level lock or transaction might help.

Why do you need de-duplication?

Leader election alone will not save you; you also need to implement de-duplication. If your leader pod restarts, it will resync all the existing events, so you need a check that decides whether to process each event or not.

Furthermore, if a failover happens, how does the new leader know which events were already successfully exported by the previous leader?
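One way to handle both concerns is to combine a freshness cut-off with a lookup against what has already been exported. The sketch below reuses the hypothetical `events(uid, count)` table from earlier; the `maxAge` threshold and the process-start cut-off are assumptions, not something the original exporter prescribes:

```go
package exporter

import (
	"database/sql"
	"time"

	corev1 "k8s.io/api/core/v1"
)

// startTime is recorded when the process (or the newly elected leader) starts.
var startTime = time.Now()

// shouldExport decides whether an event delivered by a resync still needs
// to be written, so replayed or already-exported events are skipped.
func shouldExport(db *sql.DB, ev *corev1.Event, maxAge time.Duration) (bool, error) {
	// Skip stale events replayed by the informer after a restart/failover.
	if ev.LastTimestamp.Time.Before(startTime.Add(-maxAge)) {
		return false, nil
	}
	// Skip events the previous leader already exported with the same (or newer) count.
	var exists bool
	err := db.QueryRow(
		`SELECT EXISTS(SELECT 1 FROM events WHERE uid = $1 AND count >= $2)`,
		string(ev.UID), ev.Count,
	).Scan(&exists)
	return !exists, err
}
```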

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1