I'm running a simple kernel which adds two streams of double-precision complex-values. I've parallelized it using OpenMP with custom scheduling: the slice_indic
I have a server with 2 NUMA node with 16 CPUs each. I can see all the 32 CPUs in task manager, first 16 (NUMA node 1) in the first 2 rows and the next 16 (NUMA
I'm attempting to benchmark the memory bandwidth on a ccNUMA system with 2x Intel(R) Xeon(R) Platinum 8168: 24 cores @ 2.70 GHz, L1 cache 32 kB, L2 cache 1 MB a