How to limit the memory that is available for a PostgreSQL server?
For an academic experiment I need to restrict the total amount of memory that is available for a pgSQL server to compute a given set of queries.
I know that I can do this through the postgresql.conf file, where I can adjust some parameters related to Resource Management.
The problem is that it's not clear to me, given the several parameters available in the config file, which parameter I should change.
When I first opened the config file I was expecting something like max_server_memory. Instead I found a lot of: shared_buffers, temp_buffers, work_mem, and so on...
Given that, I consulted the pgSQL docs on Resource Consumption and came up with shared_buffers as the best candidate for what I'm looking for: the parameter that restricts the total amount of memory that a pgSQL server can use to perform its computation. But I'm not completely sure about this.
Can you give me some insight into which parameters I should adjust to restrict the pgSQL server's memory, please?
Solution 1:[1]
Unfortunately, PostgreSQL does not have any such easy-to-adjust parameter. One might expect a single tunable along the lines of max_memory_usage = 1500MB
where everything is automatically scaled to offer the best performance for the given resources.
However, as that's not available, the next best thing seems to be:
Actual max RAM = shared_buffers + (temp_buffers + work_mem) * max_connections
As a rough guide, shared_buffers should be set to about 40% of the memory you are willing to use for PostgreSQL, max_connections to the maximum number of parallel connections you want to have, and temp_buffers and work_mem so that you don't go over your RAM budget. If you don't use temporary tables, setting temp_buffers pretty low (the default is 8 MB) will allow setting work_mem a bit higher. work_mem is mostly used for sorting rows, so you'll need a lot of it if you handle queries dealing with high row counts as intermediate or final results. PostgreSQL will work with very low settings if needed, but many queries will then need to create temporary files on the server instead of keeping things in RAM, which obviously results in sub-par performance.
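For example, a minimal postgresql.conf sketch for a hypothetical machine where PostgreSQL is allowed roughly 4 GB in total might look like this (the figures are illustrative placeholders, not tuned recommendations):

```
# Hypothetical budget: PostgreSQL may use roughly 4 GB in total.
shared_buffers = 1600MB    # ~40% of the 4 GB budget
max_connections = 20       # expected number of parallel connections
temp_buffers = 8MB         # default; raise only if temporary tables are used
work_mem = 64MB            # 1600MB + (8MB + 64MB) * 20 = ~3 GB worst case
```

With these values the worst case of the formula above stays safely under the 4 GB budget, leaving some headroom for the backends' own overhead.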
Note that if you set shared_buffers higher than a couple of gigabytes, you should enable the "huge pages" feature of your OS. Otherwise quite a big chunk of RAM will be lost to virtual memory page tables, and you'll also get slightly lower overall performance with the default 4 KB pages. Unfortunately, configuring huge pages is a somewhat complex task on any OS. (This is not a bug in PostgreSQL but a shortcoming of virtual memory handling on 32- and 64-bit x86 processors, and a huge pages setup is the best workaround for the issue. Again, everything will work without doing the huge pages dance, but performance will not be as good as it could be.) That said, if you're running a recent enough Linux kernel, you can usually just add something like vm.nr_overcommit_hugepages=25000 to the end of /etc/sysctl.conf. This example will allow using up to 50 GB of your RAM for huge pages, but since this is overcommit mode the memory is not immediately reserved. PostgreSQL will then automatically try to reserve its shared_buffers from this RAM area. Be warned that huge pages cannot be swapped out, and if you run out of RAM with huge pages, the kernel OOM killer may be triggered even before swap is full. It's a good idea to set nr_overcommit_hugepages to a value less than your full system RAM; note that the default size of a single huge page is 2 MB, so a good value could be 0.45 * real installed RAM in MB.
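As a concrete sketch, assuming a hypothetical machine with 16 GB of RAM and following the 0.45 * RAM-in-MB rule of thumb above:

```
# /etc/sysctl.conf (illustrative): 0.45 * 16384 MB ≈ 7373 pages, which at
# 2 MB per huge page allows up to ~14.4 GB to be used for huge pages.
vm.nr_overcommit_hugepages = 7373
```

After applying the setting (for example with sysctl -p), PostgreSQL's huge_pages parameter, which defaults to try, should pick the huge pages up automatically.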
In addition, some internal limits are scaled according to max_connections (e.g. predicate locks), so in some cases you need to set max_connections way higher than your actual connection count, especially if you run long transactions with SERIALIZABLE or REPEATABLE READ isolation levels. For larger installations, you also want to use pgbouncer or pgpool between the client and PostgreSQL to avoid having so many parallel connections that performance suffers. Ideally you would limit the actual number of connections to about CPU count x 1.5 and use transaction pooling instead of connection pooling to get maximum performance. Depending on your use case this may not be possible, if clients need to run transactions longer than a couple of queries, or if they are not compatible with transaction pooling because they use connection-specific settings or features. Note that if your clients are not active 100% of the transaction time, you may need to increase your max connection limit to push CPU usage up to 100% for maximum performance.
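A minimal pgbouncer sketch with transaction pooling might look like the following; the database name, addresses and pool size are placeholders for illustration, not recommendations:

```
; pgbouncer.ini (illustrative)
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction    ; transaction pooling, as discussed above
default_pool_size = 12     ; roughly CPU count x 1.5 on an 8-core machine
```

Clients then connect to port 6432 instead of 5432, and pgbouncer keeps at most default_pool_size server connections per database/user pair open toward PostgreSQL.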
Note that if you run PostgreSQL 12.x or older, PostgreSQL may go over the limit you have set for work_mem if a query planner misestimation causes hash aggregation to be incorrectly chosen. See https://dba.stackexchange.com/a/285423/29183 for details. Version 13.x doesn't suffer from this bug; there, a misestimation only causes reduced performance for that query instead of exceeding the memory limit, as expected.
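If you are on 13.x or newer, a hedged sketch of keeping hash-based operations within budget (the values are assumptions, not recommendations) could look like this:

```
# postgresql.conf (illustrative, PostgreSQL 13+)
work_mem = 32MB              # per sort/hash operation
hash_mem_multiplier = 1.0    # hash tables may use work_mem * this factor
```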
Solution 2:[2]
You can find the answer on the PostgreSQL wiki, but to summarize up front: the most that can be done is to configure shared memory and the number of simultaneous connections. See this link: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Arthur Melo |