'SQLAlchemy update multiple rows in one transaction

How can I update multiple, existing rows in a database, using dictionary that maps existing values for one column, to the required new values for another column?

I have a table:

class MyTable(BaseModel):
    col1 = sa.Column(sa.String(256))
    col2 = sa.Column(sa.String(256))

Given that col1 has already values and col2 is empty, how can I update col2 if I have the set of data as a dictionary:

payload = {'x': 'y', 'a': 'b', 'c': 'd'}

So this payload maps values for col1, to a new value for col2; after the update you'd get [{'col1': 'x', 'col2': 'y'}, ...] from the database.

I tried a couple of ways, which actually work but I think they are not as optimal as it could be ex.:

my_table = MyTable.__table__
for key, value in payload.items():
    stm = my_table.update()
    stm = stm.where(getattr(sales_order_item.c, 'col1') == key)
    stm = stm.values({'col2': value})
    session.execute(stm)

Or like this

for key, value in payload.items():
    query = session.query(MyTable).filter(MyTable.col1==key)
    query.update({MyTable.col2: value})

Now both of these solutions work as expected the only thing that is bothering me is the time it takes, for example for a payload of 100 elements it takes up to 6 sec, and I'm almost sure that there should be a better way to do that, isn't it?

I was thinking if there is a way of making it work with the in_ function:

query(MyTable).filter(
        MyTable.col1.in_(payload.keys())
    )

but I don't know how to structure the update query.



Solution 1:[1]

Yes, updating a larger number of rows with a single bulk UPDATE statement will be a lot faster than using individual UPDATEs on each and every object. An IN filter would only help you limit what rows are updated, but you still need to tell the database what value to use for the col2 updates.

You can use a CASE ... WHEN ... THEN construct for that, with the case() function:

from sqlalchemy.sql import case

query(MyTable).filter(
    MyTable.col1.in_(payload)
).update({
    MyTable.col2: case(
        payload,
        value=MyTable.col1,
    )
}, synchronize_session=False)

The above a) selects rows where the col1 value is a key in the payload dictionary, then b) updates the col2 column value using a CASE statement that picks values from that same dictionary to update that column based on matching col1 against the keys.

With payload set to {'x': 'y', 'a': 'b', 'c': 'd'}, the above executes the following query (give or take the exact order of WHEN clauses and values in the IN test):

UPDATE mytable
SET
    col2=CASE mytable.col1
        WHEN 'x' THEN 'y'
        WHEN 'a' THEN 'b'
        WHEN 'c' THEN 'd'
    END
WHERE
    mytable.col1 IN ('x', 'a', 'c')

I set synchronize_session to False there, as updating all possible cached MyTable instances at once is perhaps not the best idea when updating a large number of rows. Your other options are 'evaluate' and 'fetch'.

  • We can't use the default 'evaluate' (which would find existing objects in the session that match the where clause, to update in-place), because SQLAlchemy currently doesn't know how to process an IN filter (you get an UnevaluatableError exception).

  • If you do use 'fetch' then all instances of MyTable cached in the session that were affected are updated with new values for col2 (as mapped by their primary key).

Note that a commit would expire the session anyway, so you'd only want to use 'fetch' if you need to do some more work with the updated rows before you can commit the current transaction.

See the Query.update() documentation for more information on what synchronize_session options you have.

Solution 2:[2]

Another solution that I found before the chosen answer which also works fast would be:

# payload = {'x': 'y', 'a': 'b', 'c': 'd'}
all_rows = query(MyTable).filter(
    MyTable.col1.in_(payload)
)
for row in all_rows:
    row.col2=payload[row.col1]

This would do aSELECT though, which will add a couple of seconds, but is helpful to someone who finds it more convenient having the full row, also giving more flexibility.

Solution 3:[3]

As an extension to @Martijn's excellent answer i've done it in a way that allows you to increment rather than set the value:

# List of items to increment [id, increment value]
increments = [
    [1,1],
    [2,5],
    [3,8]
]
# Extract ids
ids = [v[0] for v in increments]
# Create payload
payload = {
    v[0]:Table.col2 + v[1] for v in increments
}
# Get the product
query(Table) \
    .filter(Products.col1.in_(ids)) \
    .update({
        Table.col2: case(
            payload, 
            value=Table.col1
        )
    })

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 onlyphantom
Solution 3 Robert Franklin