SQLite executemany and DELETE
executemany seems to be very slow with deletion (insertion is fine), and I was wondering if anyone knows why it takes so long.
Consider the code below:
import sqlite3
db = sqlite3.connect("mydb")
c = db.cursor()
c.execute("DROP TABLE IF EXISTS testing ")
c.execute("CREATE TABLE testing (val INTEGER);")
my_vals2 = [[x] for x in range(1,10000)]
def insertmany(vals):
    c.executemany("INSERT INTO testing (val) VALUES (?)", vals)
    db.commit()

def deletemany1(vals):
    c.executemany("DELETE FROM testing WHERE val=?", vals)
    db.commit()

def deletemany2(vals):  # this is fastest, even though I'm looping once to convert to strings and again to join...
    vals = ["'%s'" % v[0] for v in vals]
    c.execute("DELETE FROM testing WHERE val IN (%s)" % ",".join(vals))
    # i.e. DELETE FROM testing WHERE val IN (1,2,3,...)
And here are the %time results from IPython (timeit was giving funny data, so :/):
%time insertmany(my_vals2)
#CPU times: user 0.60 s, sys: 0.00 s, total: 0.60 s Wall time: 0.60 s
%time deletemany1(my_vals2)
#CPU times: user 3.58 s, sys: 0.00 s, total: 3.58 s Wall time: 3.58 s
%time deletemany2(my_vals2)
#CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s Wall time: 0.02 s
And just for the sake of completeness, here are the timeit results (but I think timeit is broken on the second test, or else the ms is a different unit than in the first test):
%timeit insertmany(my_vals2)
#1 loops, best of 3: 358 ms per loop
%timeit deletemany1(my_vals2)
#1 loops, best of 3: 8.34 ms per loop <- in reality this is not faster than the insert above!!!! (timeit lies?)
%timeit deletemany2(my_vals2)
#100 loops, best of 3: 2.3 ms per loop
So why is executemany soooooo slow with deletes?
Solution 1:[1]
SQLite stores table records in a B+ tree, sorted by rowid. When you are inserting with an automatically generated rowid, all records are just appended at the end of the table.
However, when deleting, SQLite has to search for each record first. This is slow if the val column is not indexed; either create an explicit index (as proposed by John), or declare the column as INTEGER PRIMARY KEY to make it the rowid.
Inserting into an indexed table becomes faster if you don't maintain the index during the inserts, i.e., if you create the index only after the bulk inserts.
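A minimal sketch of both options, assuming an in-memory database (the table and column names come from the question; the index name and the rest of the setup are illustrative):

import sqlite3

db = sqlite3.connect(":memory:")  # illustrative; the question uses a file database
c = db.cursor()

# Option 1: declare val as INTEGER PRIMARY KEY so it *is* the rowid;
# each DELETE ... WHERE val=? then becomes a direct B-tree lookup.
c.execute("CREATE TABLE testing (val INTEGER PRIMARY KEY)")
vals = [(x,) for x in range(1, 10000)]
c.executemany("INSERT INTO testing (val) VALUES (?)", vals)
db.commit()
c.executemany("DELETE FROM testing WHERE val=?", vals)
db.commit()

# Option 2: keep the original schema, bulk-insert with no index to maintain,
# then build an explicit index once so the deletes can use it.
c.execute("DROP TABLE testing")
c.execute("CREATE TABLE testing (val INTEGER)")
c.executemany("INSERT INTO testing (val) VALUES (?)", vals)
db.commit()
c.execute("CREATE INDEX idx_testing_val ON testing (val)")  # hypothetical index name
db.commit()
c.executemany("DELETE FROM testing WHERE val=?", vals)
db.commit()

Either way, each delete locates its row through an index instead of scanning the whole table for every parameter set.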
Your last delete command deletes all records at once. If you know that you are deleting all records in the table, you could speed it up even further by using just DELETE FROM testing, which doesn't need to look at any records at all.
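For example, a sketch of that whole-table variant (the setup is illustrative; SQLite can apply its truncate optimization to a WHERE-less DELETE and clear the table without visiting individual rows):

import sqlite3

db = sqlite3.connect(":memory:")  # illustrative setup
c = db.cursor()
c.execute("CREATE TABLE testing (val INTEGER)")
c.executemany("INSERT INTO testing (val) VALUES (?)",
              [(x,) for x in range(1, 10000)])
db.commit()

# No WHERE clause: the whole table is cleared without examining each record.
c.execute("DELETE FROM testing")
db.commit()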
Solution 2:[2]
I'm just taking a punt: Because it has to search exhaustively for the ones to delete. Try it with an index and report back.
CREATE INDEX foo ON testing (val)
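A sketch of that experiment (the index name foo comes from the answer; the in-memory database and timing harness are just illustrative):

import sqlite3
import time

db = sqlite3.connect(":memory:")
c = db.cursor()
c.execute("CREATE TABLE testing (val INTEGER)")
c.execute("CREATE INDEX foo ON testing (val)")

vals = [(x,) for x in range(1, 10000)]
c.executemany("INSERT INTO testing (val) VALUES (?)", vals)
db.commit()

start = time.time()
c.executemany("DELETE FROM testing WHERE val=?", vals)  # now an index lookup per row
db.commit()
print("indexed executemany delete: %.3f s" % (time.time() - start))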
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | CL.
Solution 2 | John Mee