Category "microbenchmark"

How to test the problem size scaling performance of code

I'm running a simple kernel that adds two streams of double-precision complex values. I've parallelized it using OpenMP with custom scheduling: the slice_indic

Measuring bandwidth on a ccNUMA system

I'm attempting to benchmark the memory bandwidth on a ccNUMA system with 2x Intel(R) Xeon(R) Platinum 8168: 24 cores @ 2.70 GHz, L1 cache 32 kB, L2 cache 1 MB a