Simple "parallel for" using C++11 std::thread

Alec Jacobson

February 25, 2016

It seems like Apple's built-in clang will never support OpenMP. I really only ever used OpenMP for its #pragma omp parallel for. Now that C++11 has support for multi-threading, I tried writing a very basic parallel for without OpenMP.

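#include <algorithm>  // std::for_each
#include <functional> // std::bind
#include <iostream>
#include <mutex>      // std::mutex, std::lock_guard
#include <thread>
#include <vector>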

int main(int argc, char *argv[])
{
  const size_t nloop = 11;

  // Serial version
  {
    // Pre loop
    std::cout<<"serial:"<<std::endl;
    // loop over all items
    for(int i = 0;i<nloop;i++)
    {
      // inner loop
      {
        const int j = i*i;
        std::cout<<j<<std::endl;
      }
    }
    // Post loop
    std::cout<<std::endl;
  }

  // Parallel version
  // number of threads
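  // hardware_concurrency() may return 0 if the count cannot be determined
  const size_t hw = std::thread::hardware_concurrency();
  const size_t nthreads = hw ? hw : 1;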
  {
    // Pre loop
    std::cout<<"parallel ("<<nthreads<<" threads):"<<std::endl;
    std::vector<std::thread> threads(nthreads);
    std::mutex critical;
    for(int t = 0;t<nthreads;t++)
    {
      threads[t] = std::thread(std::bind(
        [&](const int bi, const int ei, const int t)
        {
          // loop over all items
          for(int i = bi;i<ei;i++)
          {
            // inner loop
            {
              const int j = i*i;
              // (optional) make output critical
              std::lock_guard<std::mutex> lock(critical);
              std::cout<<j<<std::endl;
            }
          }
        },t*nloop/nthreads,(t+1)==nthreads?nloop:(t+1)*nloop/nthreads,t));
    }
    std::for_each(threads.begin(),threads.end(),[](std::thread& x){x.join();});
    // Post loop
    std::cout<<std::endl;
  }
}
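
The only subtle part of the parallel version is the pair of index expressions handed to std::bind, which give each thread a contiguous, half-open range of the loop. As a rough sketch, that arithmetic can be pulled out on its own. The helper name chunk_bounds is my own for illustration and does not appear in the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not part of the original listing): returns the
// half-open range [begin,end) of loop indices handled by thread t.
std::pair<std::size_t, std::size_t> chunk_bounds(
  const std::size_t t, const std::size_t nthreads, const std::size_t nloop)
{
  assert(nthreads > 0 && t < nthreads);
  // Integer division rounds down, so chunk sizes differ by at most one
  const std::size_t begin = t*nloop/nthreads;
  // The explicit check for the last thread mirrors the listing above
  const std::size_t end = (t+1)==nthreads ? nloop : (t+1)*nloop/nthreads;
  return std::make_pair(begin, end);
}
```

With nloop = 11 and 4 threads this gives the ranges [0,2), [2,5), [5,8) and [8,11), so every index is visited exactly once.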

It's not as simple as slapping down #pragma omp parallel for, but it's really just a few lines above and below the for loop. It can even determine the number of cores available and handle simple atomic operations (here, a std::mutex guarding the output).
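
Since the extra lines are the same every time, one option is to wrap them in a small function template. The following is only a sketch under my own assumptions: the names parallel_for and sum_of_squares are made up for illustration, the threads are started with a plain lambda rather than std::bind, and the accumulation uses std::atomic instead of a mutex.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (the name parallel_for is not from the listing above):
// calls func(i) once for every i in [0,nloop), split into contiguous chunks.
template <typename Func>
void parallel_for(const std::size_t nloop, const Func & func)
{
  // hardware_concurrency() may return 0 if the count cannot be determined
  const std::size_t hw = std::thread::hardware_concurrency();
  const std::size_t nthreads = hw ? hw : 1;
  assert(nthreads > 0); // guards the divisions below
  std::vector<std::thread> threads;
  threads.reserve(nthreads);
  for(std::size_t t = 0;t<nthreads;t++)
  {
    const std::size_t bi = t*nloop/nthreads;
    const std::size_t ei = (t+1)*nloop/nthreads;
    // Chunk bounds are captured by value; func is shared by reference
    threads.emplace_back([bi,ei,&func]()
    {
      for(std::size_t i = bi;i<ei;i++)
      {
        func(i);
      }
    });
  }
  // Wait for every chunk to finish before returning
  for(std::thread & th : threads)
  {
    th.join();
  }
}

// Example use: 0^2 + 1^2 + ... + (n-1)^2, accumulated with an atomic
long sum_of_squares(const std::size_t n)
{
  std::atomic<long> sum(0);
  parallel_for(n,[&sum](const std::size_t i)
  {
    sum += static_cast<long>(i*i);
  });
  return sum.load();
}
```

Calling sum_of_squares(11) adds up the same squares that the listing prints. The body passed to parallel_for has to be safe to run from several threads at once, just like the body of an OpenMP parallel for.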