PostgreSQL out of memory: Linux OOM killer

I am having issues with a large query that I suspect are caused by wrong settings in my postgresql.conf. My setup is PostgreSQL 9.6 on Ubuntu 17.10 with 32 GB RAM and a 3 TB HDD. The query runs pgr_dijkstraCost to create an OD matrix of ~10,000 points in a network of 25,000 links, so the resulting table is expected to be very big (~100,000,000 rows with columns from, to, costs). However, a simple test such as select x, 1 as c2, 2 as c3 from generate_series(1, 90000000) as x succeeds.
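For context, the statement I am running is essentially of this shape (the table and column names below are placeholders, not my exact query):

CREATE TABLE od_matrix AS
SELECT start_vid AS "from", end_vid AS "to", agg_cost AS costs
FROM pgr_dijkstraCost(
    'SELECT id, source, target, cost FROM network_links',   -- placeholder edge table (~25,000 links)
    (SELECT array_agg(node_id) FROM building_nodes),        -- ~10,000 origin points
    (SELECT array_agg(node_id) FROM building_nodes),        -- same points as destinations
    directed := false
);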

The query plan:

                                     QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Function Scan on pgr_dijkstracost  (cost=393.90..403.90 rows=1000 width=24)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=196.82..196.83 rows=1 width=32)
           ->  Seq Scan on building_nodes b  (cost=0.00..166.85 rows=11985 width=4)
   InitPlan 2 (returns $1)
     ->  Aggregate  (cost=196.82..196.83 rows=1 width=32)
           ->  Seq Scan on building_nodes b_1  (cost=0.00..166.85 rows=11985 width=4)

This leads to a crash of PostgreSQL:

WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server process exited      
normally and possibly corrupted shared memory.

Running dmesg, I could trace it down to an out-of-memory issue:

Out of memory: Kill process 5630 (postgres) score 949 or sacrifice child
[ 5322.821084] Killed process 5630 (postgres) total-vm:36365660kB, anon-rss:32344260kB, file-rss:0kB, shmem-rss:0kB
[ 5323.615761] oom_reaper: reaped process 5630 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[11741.155949] postgres invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
[11741.155953] postgres cpuset=/ mems_allowed=0

While running the query I can also observe with top that my free RAM goes down to zero before the crash. The amount of committed memory just before the crash:

$ grep Commit /proc/meminfo
CommitLimit:    18574304 kB
Committed_AS:   42114856 kB
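If I read the overcommit accounting correctly, CommitLimit is derived from swap plus a fraction of RAM, so (assuming the default vm.overcommit_ratio of 50) the value above would imply only about 2 GB of swap on this machine:

CommitLimit = Swap + MemTotal * vm.overcommit_ratio / 100
            ≈ Swap + 32 GB * 0.50
18574304 kB ≈ 17.7 GB  =>  roughly 2 GB of swap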

I would expect the HDD to be used to write/buffer temporary data when RAM is not sufficient, but the available space on my HDD does not change during processing. So I began to dig for missing settings (expecting issues due to my relocated data directory), following different sites:

https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
https://www.credativ.com/credativ-blog/2010/03/postgresql-and-linux-memory-management
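The documentation linked above suggests disabling memory overcommit, so that an oversized allocation fails with an error instead of triggering the OOM killer; roughly along these lines (not yet verified on my machine):

$ sudo sysctl -w vm.overcommit_memory=2     # strict accounting instead of heuristic overcommit
$ # to persist across reboots, add to /etc/sysctl.conf:
$ #   vm.overcommit_memory = 2
$ # with little swap, vm.overcommit_ratio (default 50) may also need raising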

My original postgresql.conf settings are at their defaults, except for the relocated data directory:

data_directory = '/hdd_data/postgresql/9.6/main'
shared_buffers = 128MB          # min 128kB
#huge_pages = try           # on, off, or try
#temp_buffers = 8MB         # min 800kB
#max_prepared_transactions = 0      # zero disables the feature
#work_mem = 4MB             # min 64kB
#maintenance_work_mem = 64MB        # min 1MB
#replacement_sort_tuples = 150000   # limits use of replacement selection sort
#autovacuum_work_mem = -1       # min 1MB, or -1 to use maintenance_work_mem
#max_stack_depth = 2MB          # min 100kB
dynamic_shared_memory_type = posix  # the default is the first option

I changed the config:

shared_buffers = 128MB
work_mem = 40MB             # min 64kB
maintenance_work_mem = 64MB

I reloaded the configuration with sudo service postgresql reload and tested the same query, but found no change in behavior. Does this simply mean that such a large query cannot be done? Any help is appreciated.
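To double-check that the reload picked up the new value (work_mem is changeable with a plain reload, no restart needed), something like this should report 40MB:

$ sudo -u postgres psql -c "SHOW work_mem;"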



Solution 1:[1]

I'm having similar trouble, but not with PostgreSQL (which is running happily): what is happening is simply that the kernel cannot allocate any more memory to the process, whichever process it is.

It would certainly help to add some swap to your configuration. To check how much RAM and swap you have, run free -h. On my machine, here is what it returns:

               total        used        free      shared  buff/cache   available
Mem:           7.7Gi       5.3Gi       928Mi       865Mi       1.5Gi       1.3Gi
Swap:          9.4Gi       7.1Gi       2.2Gi

You can clearly see that my machine is quite overloaded: about 8 GB of RAM and 9 GB of swap, of which 7 GB are used.

When the RAM-hungry process got killed after running out of memory, I saw both RAM and swap at 100% usage.

So, allocating more swap may alleviate the problem.
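If you want to try that, a swap file can be added roughly along these lines (a generic sketch; adjust the size to your machine):

$ sudo fallocate -l 16G /swapfile    # or: dd if=/dev/zero of=/swapfile bs=1M count=16384
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
$ # make it permanent:
$ echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab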

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Pierre