SQLite3 "database or disk is full" on CSV imports

This issue has been discussed on a number of threads, but none of the proposals seem to apply to my case.

I have a very large SQLite database (4 TB). I am trying to import CSV files from the terminal:

sqlite3 -csv -separator "       " /data/mydb.db ".import '|cat *.csv' mytable"

I intermittently receive "SQLite3 database or disk is full" errors. Re-running the command after an error usually succeeds.

Some notes:

  1. /data has 3.2 TB free.
  2. /tmp has 1.8 TB free.
  3. *.csv takes up approximately 802 GB.
  4. Both /tmp and /data use ext4, which has a maximum file size of 16 TB.
  5. The only process accessing the database is the one mentioned above.
  6. PRAGMA integrity_check returns ok.
  7. Tested on both sqlite3 versions - 3.38.1 2022-03-12 13:37:29 38c210fdd258658321c85ec9c01a072fda3ada94540e3239d29b34dc547a8cbc and 3.31.1 2020-01-27 19:55:54 3bfa9cc97da10598521b342961df8f5f68c7388fa117345eeb516eaa837balt1
  8. OS - Ubuntu 20.04

Any thoughts on what could be happening?

(Unless there is an informed reason why I am exceeding the limits of SQLite, I would prefer to avoid suggestions that I move to a client/server RDBMS.)
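For what it's worth, "database or disk is full" is SQLite's SQLITE_FULL message, and it is raised not only when the filesystem is full but also when the database hits its page-count cap. A minimal sketch reproducing it with Python's built-in sqlite3 module (throwaway temp-file path, artificially tiny cap):

```python
import os
import sqlite3
import tempfile

# Throwaway database in a temp directory.
path = os.path.join(tempfile.mkdtemp(), "tiny.db")
con = sqlite3.connect(path)

# Artificially cap the database at 10 pages so the limit is easy to hit.
con.execute("PRAGMA max_page_count = 10")
con.execute("CREATE TABLE t (x)")

try:
    # ~10 MB of inserts against a ~40 KB cap (10 pages * 4096-byte default page size).
    con.executemany("INSERT INTO t VALUES (?)",
                    (("x" * 1000,) for _ in range(10_000)))
    con.commit()
except sqlite3.OperationalError as e:
    print(e)  # -> database or disk is full
```

So the error can appear even when `df` shows plenty of free space, which matches the symptoms above.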



Solution 1:[1]

I didn't figure it out myself, but someone else did. I'm pretty sure this will "fix it" until you reach roughly 8 TB:

sqlite3 ... "PRAGMA main.max_page_count=2147483647; .import '|cat *.csv' mytable"
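The ~8 TB figure falls out of the arithmetic: with SQLite's default page_size of 4096 bytes, a max_page_count of 2^31 - 1 pages caps the database at about 8.8 TB. A quick check:

```python
# Default SQLite page size in bytes, and the page cap set by the PRAGMA above.
page_size = 4096
max_page_count = 2**31 - 1

ceiling_bytes = page_size * max_page_count
print(ceiling_bytes)                 # 8796093018112
print(round(ceiling_bytes / 1e12, 1))  # 8.8 (TB)
```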

However the invocation

sqlite3 ... "PRAGMA main.journal_mode=DELETE; PRAGMA main.max_page_count=2147483647; PRAGMA main.page_size=65536; VACUUM; .import '|cat *.csv' mytable"

should allow the db to grow to roughly 140 TB (2^31 - 1 pages at 65536 bytes each), but the VACUUM, which is needed to apply the new page_size, requires a lot of free space to run and will probably take a long time =/

The good news is that you only need to run that once, and it is a permanent change to your db; subsequent invocations only need sqlite3 ... ".import '|cat *.csv' mytable"

notably, this will probably break again around the ~140 TB mark
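To confirm the one-time change stuck, and to see where the new ceiling comes from, here is a sketch using Python's built-in sqlite3 module against a fresh throwaway database (substitute your real database path; the temp path is illustrative only):

```python
import os
import sqlite3
import tempfile

# Fresh throwaway database; use your real path to check an existing one.
path = os.path.join(tempfile.mkdtemp(), "big.db")
con = sqlite3.connect(path)

# Set the larger page size; VACUUM rebuilds the file so it takes effect.
con.execute("PRAGMA page_size = 65536")
con.execute("VACUUM")

# The setting is persisted in the database file itself.
print(con.execute("PRAGMA page_size").fetchone()[0])  # 65536

# New ceiling: 2^31 - 1 pages * 65536 bytes/page ~ 140 TB.
print(round(65536 * (2**31 - 1) / 1e12, 1))  # 140.7 (TB)
```

On an existing database, `PRAGMA page_size` returning 65536 after the VACUUM is the sign that the rebuild succeeded and subsequent imports can skip the pragmas.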

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
