Understanding how multiple Gunicorn worker processes work
I know very little about this topic; searching the internet brought me here, and now I need it for my code.
I use Django REST framework, Gunicorn, and Nginx.
Suppose I have Gunicorn set up with 3 worker processes.
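For reference, such a setup might be described in a Gunicorn configuration file like the minimal sketch below. The bind address is an assumption (whatever address Nginx proxies to), and `myproject` is a hypothetical project name. Each of the three workers is a separate OS process with its own memory and its own database connection, so no Python variables are shared between them.

```python
# gunicorn.conf.py, loaded with: gunicorn -c gunicorn.conf.py myproject.wsgi
# (myproject is a hypothetical project name)

bind = "127.0.0.1:8000"  # assumed address that Nginx proxies requests to
workers = 3              # three separate worker processes
threads = 1              # one thread per worker, so each handles one request at a time
```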
I also have a very simple view that reads a value from the database, performs another task that takes around 1 second, increments the value by 1, and saves it back to the database:
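from rest_framework.response import Response
from rest_framework.views import APIView
# MyModel and otherTask() are defined elsewhere in the project

class CreateView(APIView):
    def post(self, request):
        value = MyModel.objects.get(id=1).integerValueField
        otherTask()  # takes around 1 second (assume)
        updatedValue = value + 1
        MyModel.objects.filter(id=1).update(integerValueField=updatedValue)
        return Response()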
Will this always work?
What if concurrent users' requests are handled by different Gunicorn worker processes? What happens if another worker process updates the database (the integerValueField field) between the moment one worker reads the value and the moment it writes the updated value back? Is this locked somehow to maintain integrity?
Links to reliable resources for reading more about the topic would also help me a lot.
Solution 1:[1]
To expand on The Pjot's comment: no, the code you provided won't work reliably if you run it with multiple Gunicorn workers. What happens here is called a race condition, and it isn't specific to Django; here is a discussion of exactly this in a more general database setting.
Now, here is roughly what would happen in your specific case if multiple Gunicorn workers (or a single worker with multiple threads) access the same object, assuming `MyModel.objects.get(id=1).integerValueField` is `100` at the beginning:

- worker 1 executes `value = MyModel.objects.get(id=1).integerValueField`, which hits the database, retrieves the object with primary key `1`, and stores it in memory. `value` is set to the value of `integerValueField`, which is `100` in our example
- worker 1 executes `otherTask()`
- worker 2 executes `value = MyModel.objects.get(id=1).integerValueField` and, just like worker 1, stores the current database value of `integerValueField` in `value`. `value` is again `100`, as the value hasn't changed in the database yet
- worker 2 executes `otherTask()`
- worker 1 now executes `updatedValue = value + 1`, which sets `updatedValue` to `101`, and then executes `MyModel.objects.filter(id=1).update(integerValueField=updatedValue)` to save it to the database
- worker 2 now executes `updatedValue = value + 1` as well, but `updatedValue` is also `101` instead of `102`, because `value` is worker 2's local copy of the database value (`100`); no additional database access happens here. After that it executes `MyModel.objects.filter(id=1).update(integerValueField=updatedValue)` as well, which updates the database but doesn't change the value: it is still `101`, so one of the two increments is lost
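The interleaving above can be reproduced without Django. This is a stdlib-only sketch under a few assumptions: SQLite stands in for the real database, the table `mymodel` and its `value` column are made up, two threads with separate connections stand in for two Gunicorn worker processes (the interleaving is the same for processes), and a `threading.Barrier` replaces the 1-second `otherTask()` so that both reads reliably happen before either write.

```python
import os
import sqlite3
import tempfile
import threading


def make_db() -> str:
    """Create a throwaway SQLite file holding one row: id=1, value=100."""
    fd, path = tempfile.mkstemp(suffix=".sqlite3")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.execute("INSERT INTO mymodel VALUES (1, 100)")
    conn.commit()
    conn.close()
    return path


def unsafe_increment(path: str, both_have_read: threading.Barrier) -> None:
    """The view's read-modify-write, with its own connection like a separate worker."""
    conn = sqlite3.connect(path, timeout=10, isolation_level=None)  # autocommit
    try:
        value = conn.execute("SELECT value FROM mymodel WHERE id = 1").fetchone()[0]
        both_have_read.wait()  # force the bad interleaving: both read before either writes
        conn.execute("UPDATE mymodel SET value = ? WHERE id = 1", (value + 1,))
    finally:
        conn.close()


def run_two_workers() -> int:
    path = make_db()
    barrier = threading.Barrier(2)
    workers = [threading.Thread(target=unsafe_increment, args=(path, barrier)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    conn = sqlite3.connect(path)
    final = conn.execute("SELECT value FROM mymodel WHERE id = 1").fetchone()[0]
    conn.close()
    os.remove(path)
    return final


print(run_two_workers())  # 101, not 102: one increment is lost
```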
What `select_for_update()` does is lock the selected database row until the end of the surrounding transaction (Django requires it to be evaluated inside one, e.g. `transaction.atomic()`), so any other worker that also tries to lock or update that row has to wait until the first transaction finishes. This concept is called mutual exclusion and is often implemented through locking. It solves your issue of lost updates. However, you should consider that you will block all such access to this row while `otherTask()` is running (which apparently takes a substantial time), and this can easily lead to long delays for your clients or, when many requests queue up behind the lock, to timeouts.
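In Django, the lock is taken by reading the row with `MyModel.objects.select_for_update().get(id=1)` inside a `transaction.atomic()` block, and every worker has to read it that way for the lock to help. SQLite has no row-level locks (Django ignores `select_for_update()` on SQLite), so this stdlib-only sketch swaps in SQLite's `BEGIN IMMEDIATE`, which takes the database-wide write lock up front, as a coarser stand-in. The table `mymodel`, its `value` column, and the two threads standing in for Gunicorn worker processes are assumptions for illustration. It also measures the cost described above: the second worker has to wait out the first worker's `otherTask()`.

```python
import os
import sqlite3
import tempfile
import threading
import time


def make_db() -> str:
    """Create a throwaway SQLite file holding one row: id=1, value=100."""
    fd, path = tempfile.mkstemp(suffix=".sqlite3")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.execute("INSERT INTO mymodel VALUES (1, 100)")
    conn.commit()
    conn.close()
    return path


def locked_increment(path: str) -> None:
    """Hold a write lock from the read until the commit (stand-in for select_for_update)."""
    # isolation_level=None disables the module's implicit transactions,
    # so the BEGIN/COMMIT below are the only transaction boundaries.
    conn = sqlite3.connect(path, timeout=10, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # waits here while another worker holds the lock
        value = conn.execute("SELECT value FROM mymodel WHERE id = 1").fetchone()[0]
        time.sleep(0.2)  # stand-in for otherTask(); the lock is held the whole time
        conn.execute("UPDATE mymodel SET value = ? WHERE id = 1", (value + 1,))
        conn.execute("COMMIT")
    finally:
        conn.close()  # closing without COMMIT (after an error) rolls back


def run_two_workers() -> tuple[int, float]:
    path = make_db()
    workers = [threading.Thread(target=locked_increment, args=(path,)) for _ in range(2)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    elapsed = time.perf_counter() - start
    conn = sqlite3.connect(path)
    final = conn.execute("SELECT value FROM mymodel WHERE id = 1").fetchone()[0]
    conn.close()
    os.remove(path)
    return final, elapsed


final, elapsed = run_two_workers()
print(final)  # 102
print(f"{elapsed:.2f}s")  # at least 0.40s: the two otherTask() calls ran one after the other
```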
I'd seriously consider whether there isn't a better way to solve this. If not, I'd at least look into multi-threaded Gunicorn workers; here is a good discussion.
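One such alternative, not part of the original answer, is to let the database perform the increment in a single statement. In Django that is an `F()` expression (`from django.db.models import F`), e.g. `MyModel.objects.filter(id=1).update(integerValueField=F("integerValueField") + 1)`, which issues an UPDATE of the form `SET integerValueField = integerValueField + 1`. It only fits when the new value depends on nothing but the old one, as in the question's view. The sketch below shows the same idea in plain SQL with SQLite, under assumed names (table `mymodel`, column `value`) and with two threads standing in for Gunicorn worker processes; `otherTask()` runs without holding any lock.

```python
import os
import sqlite3
import tempfile
import threading
import time


def make_db() -> str:
    """Create a throwaway SQLite file holding one row: id=1, value=100."""
    fd, path = tempfile.mkstemp(suffix=".sqlite3")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.execute("INSERT INTO mymodel VALUES (1, 100)")
    conn.commit()
    conn.close()
    return path


def atomic_increment(path: str, start_together: threading.Barrier) -> None:
    """Do the slow work unlocked, then increment inside the database."""
    conn = sqlite3.connect(path, timeout=10, isolation_level=None)  # autocommit
    try:
        start_together.wait()  # both workers run concurrently, as in the race
        time.sleep(0.2)        # stand-in for otherTask(); no lock is held here
        # Read and write happen in one statement, so no increment can be lost
        conn.execute("UPDATE mymodel SET value = value + 1 WHERE id = 1")
    finally:
        conn.close()


def run_two_workers() -> int:
    path = make_db()
    barrier = threading.Barrier(2)
    workers = [threading.Thread(target=atomic_increment, args=(path, barrier)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    conn = sqlite3.connect(path)
    final = conn.execute("SELECT value FROM mymodel WHERE id = 1").fetchone()[0]
    conn.close()
    os.remove(path)
    return final


print(run_two_workers())  # 102
```

Unlike the locking approach, the two `otherTask()` calls overlap here; the database only serializes the short UPDATE statements.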
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | eega |