'I want multiple servers processing data from a single database table

I have a single database table on a relational database. Data will be loaded into it. I then want to have multiple servers processing that data concurrently (I don't want to have only one server running at a time). E.g. each server will:

  • Query for a fixed number of rows
  • Do some work for each row retrieved
  • Update each row to show it has been processed

How do I ensure that each row is only processed once? Note I don't want to pre-assign a row of data to a server; i'm designing for high availability so the solution should keep running if one or more servers goes down.

The solution I've gone for so far is as follows:

  1. The table has three columns: LOCKED_BY (VARCHAR), LOCKED_AT (TIMESTAMP) and PROCESSED (CHAR)
  2. Each server starts by attempting to "pseudo-lock" some rows by doing:
UPDATE THE_TABLE
SET LOCKED_BY= $servername, 
    LOCKED_AT = CURRENT_TIMESTAMP,
WHERE (LOCKED_BY = null OR (CURRENT_TIMESTAMP- LOCKED_AT > $timeout)
AND PROCSSED = 'N'

i.e. try to "pseudo-lock" rows that aren't locked already or where the pseudo-lock as expired. Only do this for unprocessed rows.

  1. More than one server may have attempted this at the same time. The current server needs to query to find out if it was successful in the "pseudo-lock":
SELECT * FROM THE_TABLE
WHERE LOCKED_BY = $server_name
AND PROCESSED = 'N'

If any rows are returned the server can process them.

  1. Once the processing has been done the row is updated
UPDATE THE_TABLE SET PROCESSED = 'Y' WHERE PRIMARYKEYCOL = $pk

Note: the update statement should ideally limit the number of rows updated.

sql


Solution 1:[1]

If you are open to changing platform then I would suggest moving to a modern, cloud-based solution like Snowflake. This will do what you want but in the background and by default - so you don't need to know what it's doing or how it's doing it (unless you want to).

This may come across as patronising, which is not my intention, but what you are attempting (in the way you are attempting it) is very complex; so if you don't already know how to do it then someone telling you how to do it is not going to give you the skills/experience you need to be able to implement it successfully

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 NickW