'SQLAlchemy update multiple rows in one transaction
How can I update multiple, existing rows in a database, using dictionary that maps existing values for one column, to the required new values for another column?
I have a table:
class MyTable(BaseModel):
col1 = sa.Column(sa.String(256))
col2 = sa.Column(sa.String(256))
Given that col1
has already values and col2
is empty, how can I update col2
if I have the set of data as a dictionary:
payload = {'x': 'y', 'a': 'b', 'c': 'd'}
So this payload maps values for col1
, to a new value for col2
; after the update you'd get [{'col1': 'x', 'col2': 'y'}, ...]
from the database.
I tried a couple of ways, which actually work but I think they are not as optimal as it could be ex.:
my_table = MyTable.__table__
for key, value in payload.items():
stm = my_table.update()
stm = stm.where(getattr(sales_order_item.c, 'col1') == key)
stm = stm.values({'col2': value})
session.execute(stm)
Or like this
for key, value in payload.items():
query = session.query(MyTable).filter(MyTable.col1==key)
query.update({MyTable.col2: value})
Now both of these solutions work as expected the only thing that is bothering me is the time it takes, for example for a payload of 100 elements it takes up to 6 sec, and I'm almost sure that there should be a better way to do that, isn't it?
I was thinking if there is a way of making it work with the in_
function:
query(MyTable).filter(
MyTable.col1.in_(payload.keys())
)
but I don't know how to structure the update query.
Solution 1:[1]
Yes, updating a larger number of rows with a single bulk UPDATE
statement will be a lot faster than using individual UPDATE
s on each and every object. An IN
filter would only help you limit what rows are updated, but you still need to tell the database what value to use for the col2
updates.
You can use a CASE ... WHEN ... THEN
construct for that, with the case()
function:
from sqlalchemy.sql import case
query(MyTable).filter(
MyTable.col1.in_(payload)
).update({
MyTable.col2: case(
payload,
value=MyTable.col1,
)
}, synchronize_session=False)
The above a) selects rows where the col1
value is a key in the payload
dictionary, then b) updates the col2
column value using a CASE
statement that picks values from that same dictionary to update that column based on matching col1
against the keys.
With payload
set to {'x': 'y', 'a': 'b', 'c': 'd'}
, the above executes the following query (give or take the exact order of WHEN
clauses and values in the IN
test):
UPDATE mytable
SET
col2=CASE mytable.col1
WHEN 'x' THEN 'y'
WHEN 'a' THEN 'b'
WHEN 'c' THEN 'd'
END
WHERE
mytable.col1 IN ('x', 'a', 'c')
I set synchronize_session
to False
there, as updating all possible cached MyTable
instances at once is perhaps not the best idea when updating a large number of rows. Your other options are 'evaluate'
and 'fetch'
.
We can't use the default
'evaluate'
(which would find existing objects in the session that match thewhere
clause, to update in-place), because SQLAlchemy currently doesn't know how to process anIN
filter (you get anUnevaluatableError
exception).If you do use
'fetch'
then all instances ofMyTable
cached in the session that were affected are updated with new values forcol2
(as mapped by their primary key).
Note that a commit would expire the session anyway, so you'd only want to use 'fetch'
if you need to do some more work with the updated rows before you can commit the current transaction.
See the Query.update()
documentation for more information on what synchronize_session
options you have.
Solution 2:[2]
Another solution that I found before the chosen answer which also works fast would be:
# payload = {'x': 'y', 'a': 'b', 'c': 'd'}
all_rows = query(MyTable).filter(
MyTable.col1.in_(payload)
)
for row in all_rows:
row.col2=payload[row.col1]
This would do aSELECT
though, which will add a couple of seconds, but is helpful to someone who finds it more convenient having the full row, also giving more flexibility.
Solution 3:[3]
As an extension to @Martijn's excellent answer i've done it in a way that allows you to increment rather than set the value:
# List of items to increment [id, increment value]
increments = [
[1,1],
[2,5],
[3,8]
]
# Extract ids
ids = [v[0] for v in increments]
# Create payload
payload = {
v[0]:Table.col2 + v[1] for v in increments
}
# Get the product
query(Table) \
.filter(Products.col1.in_(ids)) \
.update({
Table.col2: case(
payload,
value=Table.col1
)
})
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | onlyphantom |
Solution 3 | Robert Franklin |