Understanding how multiple Gunicorn worker processes work

I have no prior knowledge of this topic; searching the internet brought me here, and now I need to handle it in my code.

I use Django REST framework, Gunicorn, and Nginx.

Suppose Gunicorn is set up with 3 worker processes.

I have a very simple view that reads a value from the database, performs another task that takes around 1 second, increments the value by 1, and saves it back to the database:

from rest_framework.views import APIView
from rest_framework.response import Response

class CreateView(APIView):
    def post(self, request):
        value = MyModel.objects.get(id=1).integerValueField
        otherTask()  # takes around 1 second (assume)
        updatedValue = value + 1
        MyModel.objects.filter(id=1).update(integerValueField=updatedValue)
        return Response()

Will this always work?

What if a different Gunicorn worker process is handling a concurrent user's request? What happens if the integerValueField is updated by another worker process in between my worker reading the value and writing it back? Is the row locked somehow to maintain integrity?

If I can get valid links to read more about the topic, that will also work well for me.



Solution 1:[1]

To expand on The Pjot's comment - no, the code you provided won't work reliably if you execute it with multiple Gunicorn workers. What happens here is called a race condition and isn't specific to Django - here is a discussion of exactly this problem in a more general database setting.

Now, here is roughly what happens in your specific case when multiple Gunicorn workers (or a single worker with multiple threads) access the same object, if we assume that MyModel.objects.get(id=1).integerValueField is 100 at the beginning:

  1. worker 1 executes value = MyModel.objects.get(id=1).integerValueField, which hits the database, retrieves the object with primary key 1 and stores it in memory. value is set to the value of integerValueField, which is 100 in our example
  2. worker 1 executes otherTask()
  3. worker 2 executes value = MyModel.objects.get(id=1).integerValueField and, just like worker 1, stores the current value of integerValueField from the database in value. value, again, will be 100, as the value hasn't changed in the database yet
  4. worker 2 executes otherTask()
  5. worker 1 now executes updatedValue = value + 1, which sets updatedValue to 101, and then executes MyModel.objects.filter(id=1).update(integerValueField=updatedValue) to save it to the database
  6. worker 2 now executes updatedValue = value + 1 as well - but updatedValue will also be 101 instead of 102, because the local copy of the database value (100) is used; no additional database access happens here. After that it executes MyModel.objects.filter(id=1).update(integerValueField=updatedValue) as well, which writes to the database but doesn't change anything - the value is still 101 and one increment has been lost

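As a side note (not part of the original answer): if the only thing that has to be atomic is the increment itself, one common Django idiom is to push the arithmetic into the database with an F() expression, so the read-modify-write never happens in Python. A minimal sketch, assuming otherTask() does not need the value it is incrementing:

    from django.db.models import F

    def increment_counter():
        otherTask()  # the slow work happens outside the update
        # The database computes integerValueField + 1 itself in a single
        # UPDATE, so two concurrent requests each add 1 instead of
        # overwriting each other's result.
        MyModel.objects.filter(id=1).update(
            integerValueField=F("integerValueField") + 1
        )

If otherTask() does need a consistent view of the value while it runs, this shortcut doesn't apply and you need the row locking described below.
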
What select_for_update does is lock the database row so that no other transaction can modify it (or take the same lock) until yours finishes - this is the concept of mutually exclusive access, usually implemented through locking. It will solve your issue of lost updates. However, keep in mind that you block access to this row for the whole time otherTask() is running (which is apparently a substantial amount of time), and this can easily lead to long delays for your clients, or worse. I'd really consider whether there isn't a better way to solve this. If not, I'd at least look into multi-threaded Gunicorn workers - here is a good discussion.
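
For reference, a minimal sketch of how select_for_update() is typically used - my illustration, not code from the answer. Note that Django requires select_for_update() to be called inside a transaction, hence transaction.atomic():

    from django.db import transaction

    def increment_with_lock():
        with transaction.atomic():
            # Locks the row with id=1 until the transaction commits;
            # other workers doing select_for_update() on the same row
            # will block on this line until then.
            obj = MyModel.objects.select_for_update().get(id=1)
            otherTask()  # the row stays locked for this whole second
            obj.integerValueField += 1
            obj.save(update_fields=["integerValueField"])

Regarding threaded workers: something like gunicorn --workers 3 --threads 2 --worker-class gthread myproject.wsgi (the module path is just a placeholder) lets each worker process serve other requests on its remaining threads while one thread sits blocked waiting for the lock.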

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 eega