Intermittent SIGBUS on shared memory segment
I have a server process that allocates a big chunk of memory using the System V XSI shared memory calls (shmget/shmat), from address 0x500000000 to 0x1d00000000, and then binds it to the first NUMA node. See the pseudocode below (it should be logically correct but might not compile).
The problem is that I sometimes get a SIGBUS signal when I access the memory, but most of the time it works fine. I am using signal handlers to catch the signals gracefully (SIGSEGV as well as SIGBUS).
I don't understand why this is flaky. What can cause a SIGBUS only some of the time?
#include <sys/ipc.h>
#include <sys/shm.h>
#include <numaif.h>                         // mbind, MPOL_BIND

size_t totalSize_ = 96ULL << 30;            // 96 GiB
const int flags = IPC_CREAT                 // create a new shared memory segment
    | SHM_HUGETLB                           // huge pages
    | SHM_NORESERVE                         // don't reserve swap space for this
    | 0666                                  // leave the leading 0 (octal)
    | SHM_R | SHM_W                         // these are redundant with 0666
    ;
shmid_ = shmget(key_, totalSize_, flags);
void* desiredBase = (void*)0x500000000;
auto base = shmat(shmid_, desiredBase, 0);
unsigned long nodemask = 1;                 // bit 0 set: bind to the first NUMA node
mbind(base, totalSize_, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0);
Then I run some tests to make sure the memory is actually accessible (because of the NUMA binding), by stepping through the address space and calling memcpy every GiB:
for (char* addr = (char*)base; addr < (char*)base + totalSize_; addr += 1ULL << 30)
    memcpy(addr, "testpattern", 10);
and then I look for SIGBUS signals. If the NUMA node doesn't have enough huge page memory assigned to it, I get SIGBUS; that's expected. However, if I restart the server with the exact same code and settings, it sometimes gives me a SIGBUS on the 3rd huge page, and when I restart it again it works fine for all 96 huge pages. Our system has 100 GiB of huge page memory per NUMA node (4 nodes).
How do I debug this? What log files are useful? And does it make any sense to add a retry loop on the memcpy until the SIGBUS goes away?
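For illustration, the way I catch SIGBUS around the probe is roughly equivalent to the sketch below (simplified; the handler and the probeRange helper are illustrative names, not my production code):

#include <csetjmp>
#include <csignal>
#include <cstring>
#include <cstdio>

static sigjmp_buf probeJmp;

static void onSigbus(int) { siglongjmp(probeJmp, 1); }

// Returns true if a 10-byte write succeeds at every 1 GiB step of the range.
static bool probeRange(void* base, size_t size) {
    struct sigaction sa {}, old {};
    sa.sa_handler = onSigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    bool ok = true;
    for (size_t off = 0; off < size; off += 1ULL << 30) {
        if (sigsetjmp(probeJmp, 1) == 0) {
            memcpy(static_cast<char*>(base) + off, "testpattern", 10);
        } else {
            std::fprintf(stderr, "SIGBUS at offset %zu GiB\n", off >> 30);
            ok = false;
            break;
        }
    }
    sigaction(SIGBUS, &old, nullptr);   // restore the previous handler
    return ok;
}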
The first part of /proc/self/maps looks like this:
00400000-01213000 r-xp 00000000 103:07 4986562 /opt/daemon-0.0.9/bin/daemon
01413000-01415000 r-xp 00e13000 103:07 4986562 /opt/daemon-0.0.9/bin/daemon
01415000-0141b000 rwxp 00e15000 103:07 4986562 /opt/daemon-0.0.9/bin/daemon
0141b000-01427000 rwxp 00000000 00:00 0
01b5d000-01bfa000 rwxp 00000000 00:00 0 [heap]
500000000-1d00000000 rwxs 00000000 00:0d 44072961 /SYSV50420051 (deleted)
2aaac0000000-2aab00000000 rwxp 00000000 00:0d 83594486 /anon_hugepage (deleted)
7ff9e4000000-7ff9e4021000 rwxp 00000000 00:00 0
7ff9e4021000-7ff9e8000000 ---p 00000000 00:00 0
Solution 1:
The SHM_NORESERVE among your flags is a likely culprit. From the docs for shmget():

Do not reserve swap space for this segment. When swap space is reserved, one has the guarantee that it is possible to modify the segment. When swap space is not reserved one might get SIGSEGV upon a write if no physical memory is available.

(Emphasis added)
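For comparison, here is a minimal sketch of the same allocation with SHM_NORESERVE dropped and error checks added (the helper name and parameters are illustrative, not from the question's code). Without the flag, the kernel has to reserve the huge pages up front, so a shortage should show up as an error return from shmget() or shmat() rather than as a signal when the memory is first touched:

#include <sys/ipc.h>
#include <sys/shm.h>
#include <cstdio>

// Illustrative helper: create and attach a huge-page segment without
// SHM_NORESERVE, so a huge-page shortage is reported here as an errno
// (e.g. ENOMEM) instead of surfacing later as SIGBUS on first access.
void* attachHugeSegment(key_t key, size_t size, void* desiredBase) {
    int shmid = shmget(key, size, IPC_CREAT | SHM_HUGETLB | 0666);
    if (shmid == -1) {
        std::perror("shmget");      // e.g. not enough huge pages to reserve
        return nullptr;
    }
    void* base = shmat(shmid, desiredBase, 0);
    if (base == (void*)-1) {
        std::perror("shmat");
        return nullptr;
    }
    return base;
}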
The fact that the segment creation, binding, and subsequent access usually work suggests some kind of contextual effect, such as one arising from the amount of physical memory presently available, and a segfault is explicitly among the risks of creating the segment with that flag. SIGSEGV is not exactly the same thing as SIGBUS, but the two are generated under similar circumstances.
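One way to test that theory is to look at the per-node huge page counters in sysfs right after a failure: if free_hugepages on the node you bind to has dropped below what the segment needs, the SIGBUS is expected. A small sketch (this assumes 1 GiB huge pages, as the 96-pages-for-96-GiB figure suggests; use the hugepages-2048kB directory instead for 2 MiB pages):

#include <cstdio>

int main() {
    const char* counters[] = { "free_hugepages", "nr_hugepages" };
    for (int node = 0; node < 4; ++node) {          // the question mentions 4 nodes
        for (const char* counter : counters) {
            char path[128];
            std::snprintf(path, sizeof(path),
                          "/sys/devices/system/node/node%d/hugepages/"
                          "hugepages-1048576kB/%s", node, counter);
            if (FILE* f = std::fopen(path, "r")) {
                long value = 0;
                if (std::fscanf(f, "%ld", &value) == 1)
                    std::printf("node%d %s = %ld\n", node, counter, value);
                std::fclose(f);
            }
        }
    }
    return 0;
}

The same numbers are available from a shell, e.g. grep . /sys/devices/system/node/node*/hugepages/hugepages-*/free_hugepages, and the system-wide counters are in /proc/meminfo. If the free count on the bound node is below 96 at the time of the failure, the intermittent SIGBUS is consistent with the explanation above.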
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | John Bollinger