Discussion around potential reasons for going over Postgres `work_mem`
Curious to get some discussion / opinion on a particular performance problem I'm having in Postgres. Greatly appreciate anyone who chimes in!
The essence of the problem, and some questions:
I have a long-running pipeline in which data flows into the database via many single upserts, all inside one transaction that gets committed once at the end of the pipeline. Each of these upserts seems to be pushing us over `work_mem` and consequently writing to disk (individual upserts can be seen going over in the logs).

There are two unique indexes on the table we're upserting against: one is required for the pipeline to run and one isn't. Could dropping the non-required index before the pipeline and rebuilding it after help alleviate some of the pressure on `work_mem`, and if so, why?
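To make that question concrete, here is a minimal sketch of the drop / rebuild idea. The index name and column are hypothetical stand-ins, and the rebuild would happen after the pipeline's transaction commits, since `CREATE INDEX CONCURRENTLY` can't run inside a transaction block.

```sql
-- Before the pipeline: drop the unique index the pipeline doesn't need
-- ("ix_some_table_optional" and "some_other_field" are hypothetical names).
DROP INDEX IF EXISTS ix_some_table_optional;

-- ...pipeline transaction runs all of its upserts and commits...

-- After the pipeline: rebuild it. CONCURRENTLY avoids blocking writers,
-- but cannot run inside a transaction block.
CREATE UNIQUE INDEX CONCURRENTLY ix_some_table_optional
    ON some_table (some_other_field);
```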
Due to some nuance around how I'm receiving the data, running the whole pipeline in one transaction felt like a requirement at the time I engineered all of this. Could running all of these upserts in a single transaction be contributing to the problem, and if so, why?
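Purely as a hypothetical, if that constraint could be relaxed, I imagine the batched alternative would be shaped something like this (batch boundaries are arbitrary):

```sql
-- Hypothetical alternative shape: commit in batches rather than holding
-- one transaction open for the entire pipeline.
BEGIN;
-- ...first batch of upserts...
COMMIT;

BEGIN;
-- ...next batch of upserts...
COMMIT;
```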
For context, the underlying SQL for each of these upserts looks something like the snippet below. Is there some nuance around how these are done in CTEs and then selected from that could be contributing to the problem? We can assume each individual upsert contains a `MASSIVE_JSON_PAYLOAD` that is smaller than the current `work_mem` value.
```sql
WITH cte__some_table_inserted_rowsets AS (
    INSERT INTO "some_table" (...SOME_FIELDS, BIG_JSONB_ARRAY_FIELD)
    VALUES (
        ...SOME_FIELDS,
        ...BIG_JSON_PAYLOAD
    )
    ON CONFLICT ("some_field", "some_field_2", "some_field_3")
    DO UPDATE SET
        ...APPEND BIG_JSON_PAYLOAD TO EXISTING BIG_JSONB_ARRAY_FIELD
    RETURNING *
)
SELECT * FROM cte__some_table_inserted_rowsets;
```
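For comparison, this is what I believe the equivalent statement looks like with the CTE removed. Column names are hypothetical stand-ins for the placeholders above, and I'm assuming the append uses the jsonb `||` concatenation operator:

```sql
-- Hypothetical flattened version of the upsert, without the wrapping CTE.
-- $1..$4 are bind parameters supplied by the application.
INSERT INTO "some_table" (some_field, some_field_2, some_field_3, big_jsonb_array_field)
VALUES ($1, $2, $3, $4::jsonb)
ON CONFLICT ("some_field", "some_field_2", "some_field_3")
DO UPDATE SET
    -- Append the incoming jsonb array to the existing one.
    big_jsonb_array_field = some_table.big_jsonb_array_field || EXCLUDED.big_jsonb_array_field
RETURNING *;
```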
The logs that get reported look something like this:
```
sql_error_code = 00000 LOG: temporary file: path "...", size n
sql_error_code = 00000 STATEMENT: ...THE_UPSERT
```
The setting responsible for enabling / tweaking these kinds of logs is `log_temp_files`.
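For anyone wanting to reproduce the logging: `log_temp_files` takes a size threshold in kB, with 0 logging every temporary file. It can be set per session (superuser only) or persisted cluster-wide:

```sql
-- Log every temporary file for this session (requires superuser).
SET log_temp_files = 0;

-- Or persist it cluster-wide and reload the configuration.
ALTER SYSTEM SET log_temp_files = 0;
SELECT pg_reload_conf();
```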
This is all being done in PG10, and as far as how big `size n` is comparatively:

- `work_mem` is currently set to 32MB (see the snippet after this list for how I'm checking / adjusting it).
- The payload being processed by a single one of these upserts will never be more than a few MB.
- It's very common to see one of these logs for a single upsert with a `size n` of 50+ MB.
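For completeness, this is how the value can be inspected, and what raising it for just the pipeline's session would look like (the 64MB figure is an arbitrary example, not a recommendation):

```sql
-- Inspect the current value.
SHOW work_mem;

-- Raise it for this session only; each sort/hash operation in a query
-- may use up to this much memory before spilling to disk.
SET work_mem = '64MB';
```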
Source: Stack Overflow, licensed under CC BY-SA 3.0.