Why is my parallel std::for_each only using 1 thread?

I'm trying to parallelize this C++ code, which computes a continuous Fourier transform of points modeled as Dirac impulses. The code compiles and produces correct results, but it only uses 1 thread. Is there something else I need to do to get multiple threads working? This is on a Mac with 4 cores (8 threads), compiled with GCC 10.

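#include <algorithm>  // std::for_each
#include <cmath>      // cos, sin, sqrt, M_PI
#include <execution>  // std::execution::par_unseq
#include <numeric>    // std::iota
#include <vector>
using std::vector;

// Point is assumed to be a struct with double members x and y
// (its definition is not shown in the question).
vector<double> GetFourierImage(const Point pts[],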
                               const int num_samples,
                               const int res,
                               const double freq_step) {
  vector<double> fourier_img(res*res, 0.0);
  double half_res = 0.5 * res;

  vector<int> rows(res);
  std::iota(rows.begin(), rows.end(), 0);
  std::for_each(  // Why doesn't this parallelize?
      std::execution::par_unseq,
      rows.begin(), rows.end(),
      [&](int i) {
    double y = freq_step * (i - half_res);
    for (int j = 0; j < res; j++) {
      double x = freq_step * (j - half_res);

      double fx = 0.0, fy = 0.0;
      for (int pt_idx = 0; pt_idx < num_samples; pt_idx++) {
        double dot = (x * pts[pt_idx].x) + (y * pts[pt_idx].y);
        double exp = -2.0 * M_PI * dot;
        fx += cos(exp);
        fy += sin(exp);
      }
      fourier_img[i*res + j] = sqrt((fx*fx + fy*fy) / num_samples);
    }
  });

  return fourier_img;
}


Solution 1:[1]

In GCC 9, the execution policies had a hard dependency on TBB: if the library was not present, the build would fail. That changed in GCC 10 (and remains the case in GCC 11), where if the library is not present, the for_each falls back to a sequential loop. This can be seen at https://github.com/gcc-mirror/gcc/blob/releases/gcc-10.1.0/libstdc++-v3/include/bits/c++config#L679. To fix your issue, try linking against TBB with -ltbb. Doing so resolved the same issue for me on Ubuntu 20.04 with GCC 11.2.
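One way to confirm whether the fallback is in effect is to count how many distinct threads actually run the callback. The following is only a rough sketch: CountWorkerThreads is a hypothetical helper name, the busy loop is arbitrary filler work, and it uses std::execution::par rather than par_unseq because it locks a mutex inside the callback, which is not permitted under par_unseq.

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <mutex>
#include <numeric>
#include <set>
#include <thread>
#include <vector>

// Hypothetical helper (not part of any library): runs a parallel for_each
// over n items and returns how many distinct threads ran the callback.
std::size_t CountWorkerThreads(int n) {
  std::vector<int> items(n);
  std::iota(items.begin(), items.end(), 0);

  std::mutex mu;
  std::set<std::thread::id> seen;
  std::for_each(std::execution::par, items.begin(), items.end(), [&](int) {
    // Burn a little CPU so a parallel backend has a reason to spread the work.
    volatile double sink = 0.0;
    for (int k = 0; k < 100000; ++k) sink = sink + k * 0.5;

    // Record which thread handled this item.
    std::lock_guard<std::mutex> lock(mu);
    seen.insert(std::this_thread::get_id());
  });
  return seen.size();
}
```

If printing CountWorkerThreads(64) always gives 1, the sequential fallback is probably being used. A larger count suggests the parallel backend is active, although the exact number depends on the scheduler and the machine.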

Solution 2:[2]

I had the same problem on macOS. In my case, adding the path to the TBB header files to the include search path resolved it. For g++-11 and TBB installed with Homebrew, the command was g++-11 -O3 -std=c++17 -I/opt/homebrew/include -o main main.cpp -ltbb; the /opt/homebrew/include directory contains a tbb header folder. Without the -I flag, my code compiles but runs single-threaded, as described by @Andrew. In my case, adding the -fopenmp flag was not necessary, but -ltbb is, as pointed out by @Ryan H.
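To check whether the compiler can see the TBB headers with a given set of flags, a small probe based on the standard C++17 __has_include preprocessor check may help. This is a sketch only: DEMO_TBB_HEADER_VISIBLE and DescribeTbbHeader are hypothetical names, and it assumes that tbb/tbb.h is the header that matters here.

```cpp
#include <string>

// Hypothetical macro: 1 if <tbb/tbb.h> is on the include search path for
// this compilation, 0 otherwise.
#if defined(__has_include)
#  if __has_include(<tbb/tbb.h>)
#    define DEMO_TBB_HEADER_VISIBLE 1
#  endif
#endif
#ifndef DEMO_TBB_HEADER_VISIBLE
#  define DEMO_TBB_HEADER_VISIBLE 0
#endif

// Hypothetical helper: returns a human-readable result of the probe.
std::string DescribeTbbHeader() {
  return DEMO_TBB_HEADER_VISIBLE ? "tbb/tbb.h found" : "tbb/tbb.h not found";
}
```

Compiling this once with and once without -I/opt/homebrew/include and printing DescribeTbbHeader() should show whether the flag changes what the compiler sees. The probe does not link against TBB, so -ltbb is still needed for the real program.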

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 mismou