Why is my .NET app only using a single NUMA node?

I have a server with 2 NUMA nodes with 16 CPUs each. I can see all 32 CPUs in Task Manager: the first 16 (NUMA node 1) in the first 2 rows and the next 16 (NUMA node 2) in the last 2 rows.

In my app I start 64 threads using Thread.Start(). The app is CPU intensive, but when I run it only the first 16 CPUs are busy; the other 16 CPUs are idle.

Why? I am using Interlocked.Increment() a lot. Could this be a reason? Is there a way I can start threads on a specific NUMA node?



Solution 1:[1]

In addition to gcServer, we should enable GCCpuGroup and Thread_UseAllCpuGroups, so the config should look more like:

<configuration>
   <runtime>
      <gcServer enabled="true"/>
      <GCCpuGroup enabled="true"/>
      <Thread_UseAllCpuGroups enabled="true"/>
   </runtime>
</configuration>

GCCpuGroup enables garbage collection across multiple CPU groups, and Thread_UseAllCpuGroups lets the runtime distribute managed threads across all CPU groups.
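
To confirm that the settings above were actually picked up, a small check like the following may help. It is only a sketch: the class name RuntimeCheck and the method Describe are made up for illustration, while GCSettings.IsServerGC and Environment.ProcessorCount are standard .NET APIs. On the machine from the question I would expect ProcessorCount to report 32 once all three options are active, but treat that as an assumption to verify rather than a guarantee.

```csharp
using System;
using System.Runtime;

static class RuntimeCheck   // hypothetical helper, not part of any library
{
    // Builds a one-line summary of what the runtime reports at startup.
    public static string Describe()
    {
        return string.Format(
            "ServerGC={0}, ProcessorCount={1}",
            GCSettings.IsServerGC,        // true when <gcServer enabled="true"/> took effect
            Environment.ProcessorCount);  // logical processors the runtime reports
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

If ServerGC prints False, the app.config was most likely not deployed next to the executable (it must be named YourApp.exe.config at run time).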

Solution 2:[2]

The first thing to check is indeed the app.config, making sure the necessary options are set:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
    <runtime>
        <gcServer enabled="true" />
        <Thread_UseAllCpuGroups enabled="true" />
        <GCCpuGroup enabled="true" />
    </runtime>
    <startup> 
        <!-- 4.5 and later should work, use the one targeted -->
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6.2"/>       
    </startup>
</configuration>

If app.config wizardry isn't helping, it is likely that your machine uses multiple kernel groups (Kgroups) when it shouldn't. In that case, check your BIOS for NUMA Group Size Optimization if you have a Gen9 HP server. If it is set to Clustered mode, the current CLR (2017, .NET 4.6.2) only utilizes the first group. If the machine has no more than 64 cores, you should be able to select the Flat layout, which puts all cores in the same group. If you cannot find the setting, you may need a BIOS update.

For a lot more detail, see "Unable to use more than one processor group for my threads in a C# app" here on Stack Overflow. It even comes with its own diagnostics tool.
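
Independently of that tool, a quick way to see how Windows has split the processors into groups is to ask the kernel directly. The sketch below is not the diagnostics tool from the linked question. It assumes Windows 7 / Server 2008 R2 or later, where kernel32.dll exports GetActiveProcessorGroupCount and GetActiveProcessorCount, and the class name ProcessorGroups is made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

static class ProcessorGroups   // hypothetical helper, not part of any library
{
    [DllImport("kernel32.dll")]
    private static extern ushort GetActiveProcessorGroupCount();

    [DllImport("kernel32.dll")]
    private static extern uint GetActiveProcessorCount(ushort groupNumber);

    // Returns the number of active logical processors in each processor group,
    // or an empty array when not running on Windows.
    public static uint[] CountsPerGroup()
    {
        if (Environment.OSVersion.Platform != PlatformID.Win32NT)
            return new uint[0];

        ushort groups = GetActiveProcessorGroupCount();
        var counts = new uint[groups];
        for (ushort g = 0; g < groups; g++)
            counts[g] = GetActiveProcessorCount(g);
        return counts;
    }

    static void Main()
    {
        uint[] counts = CountsPerGroup();
        Console.WriteLine("Processor groups: " + counts.Length);
        for (int i = 0; i < counts.Length; i++)
            Console.WriteLine("  group " + i + ": " + counts[i] + " logical processors");
    }
}
```

If this reports two groups of 16 on a 32-CPU machine, the BIOS is splitting the processors into groups as described above, and switching to the Flat layout should merge them into one.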

Solution 3:[3]

Have you set the garbage collector to the server version?

In app.config, try:

<configuration>
   <runtime>
      <gcServer enabled="true"/>
   </runtime>
</configuration>

Because of the way the heaps are allocated, the server GC makes a massive difference when churning through a lot of objects/data on many threads on a machine with many cores.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 user3473830
Solution 2 Community
Solution 3 Nathan Tuggy