Intel TBB vs CilkPlus [closed] - c++

I am developing time-demanding simulations in C++ targeting Intel x86_64 machines.
After researching a little, I found two interesting libraries to enable parallelization:
Intel Threading Building Blocks
Intel Cilk Plus
As stated in the docs, both target parallelism on multicore processors, but I still haven't figured out which one is better. AFAIK Cilk Plus simply adds three keywords for easier parallelism (which requires a GCC build that supports these keywords), while TBB is just a library that promotes better parallel development.
Which one would you recommend?
Consider that I am having many, many problems installing Cilk Plus (still trying and still screaming). So I was wondering, should I check out TBB first? Is Cilk Plus better than TBB? What would you recommend?
Are they compatible?
Should I succeed in installing Cilk Plus (still praying for this), would it be possible to use TBB together with it? Can they work together? Has anyone developed software with both Cilk Plus and TBB? Would you recommend using them together?

Here is some FAQ-type information relevant to the question in the original post.
Cilk Plus vs. TBB vs. Intel OpenMP
In short, it depends on what type of parallelization you are trying to implement and how your application is coded.

I can answer this question in the context of TBB. The pros of using TBB are:
No compiler support is needed to run the code.
The generic C++ algorithms of TBB let users create their own objects and map them to threads as tasks.
Users don't need to worry about thread management. The built-in task scheduler automatically detects the number of available hardware threads. However, users can choose to fix the number of threads for performance studies.
Flow graphs for creating tasks that respect dependencies let users easily exploit functional as well as data parallelism.
TBB is naturally scalable, obviating the need for code modification when migrating to larger systems.
The forum is active and the documentation is continually updated.
With Intel compilers, the latest version of TBB performs really well.
The cons can be:
The user base in the open source community is small, making it difficult to find examples.
Examples in the documentation are very basic, and in older versions they are even wrong. However, the Intel forum is always ready to help resolve issues.
The level of abstraction in the template classes is very high, making the learning curve very steep.
The overhead of creating tasks is high. Users have to make sure that the problem size is large enough for the partitioner to create tasks of optimal grain size.
I have not worked with Cilk, but it's apparent that, of the users in the two camps, the majority use TBB. If Intel keeps pushing TBB through updated documentation and free support, the TBB user community is likely to grow.
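To make the grain-size point above concrete, here is a minimal sketch in standard C++ rather than TBB. The function name chunked_sum and the one-thread-per-chunk scheme are assumptions for illustration only; TBB's partitioners and task scheduler work differently and far more efficiently. The idea it shows is simply that each chunk of at most `grain` elements becomes one unit of work, so a very small grain produces many units and the per-unit overhead starts to dominate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` by splitting it into chunks of at most
// `grain` elements. Each chunk is handled by its own thread here, purely to
// keep the sketch short; a real task scheduler would reuse a fixed pool.
long long chunked_sum(const std::vector<int>& data, std::size_t grain) {
    assert(grain > 0);
    const std::size_t n = data.size();
    const std::size_t chunks = (n + grain - 1) / grain;  // ceiling division
    std::vector<long long> partial(chunks, 0);
    std::vector<std::thread> workers;
    workers.reserve(chunks);
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = c * grain;
        const std::size_t end = std::min(n, begin + grain);
        // Each worker writes only to its own slot, so no locking is needed.
        workers.emplace_back([&data, &partial, c, begin, end] {
            long long sum = 0;
            for (std::size_t i = begin; i < end; ++i) sum += data[i];
            partial[c] = sum;
        });
    }
    for (std::thread& t : workers) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling chunked_sum(data, data.size() / 4) on a large vector creates four units of work, while a grain of 1 would create one per element, which is the situation the last con warns about.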

They can be used to complement each other (Cilk and TBB). Usually, that's the best approach. But from my experience you will use TBB the most.
TBB and Cilk will scale automatically with the number of cores (by creating a tree of tasks, and then using recursion at run time).
TBB is a runtime library for C++ that uses programmer-defined task patterns instead of threads. At run time, TBB decides on the number of threads, the task granularity, and performance-oriented scheduling (automatic load balancing through task stealing, cache efficiency, and memory reuse). Tasks are created recursively (for a tree this is logarithmic in the number of tasks).
Cilk Plus is a C/C++ language extension that requires compiler support.
Code may not be portable to different compilers and operating systems. It supports fork-join parallelism, which makes it extremely easy to parallelize recursive algorithms. Lastly, it has only a few constructs (spawn, sync), with which you can parallelize code very easily (not a lot of rewriting is needed!).
Other differences, that might be interesting:
a) Cilk's random work-stealing scheduler for countering "waiting" processes.
b) TBB steals from the most heavily loaded process.
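The fork-join pattern mentioned above can be sketched in standard C++ without Cilk Plus compiler support. This is not Cilk code: std::async stands in for cilk_spawn and future::get() stands in for cilk_sync, and the names fib_serial and fib_fork_join as well as the depth cutoff are assumptions made for this illustration. Unlike a work-stealing runtime, std::launch::async starts a new thread per call, hence the cutoff.

```cpp
#include <cassert>
#include <future>

// Plain recursive Fibonacci, used below the parallel cutoff.
long long fib_serial(int n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Fork-join Fibonacci: forks one branch while `depth` is positive,
// then falls back to the serial version to limit thread creation.
long long fib_fork_join(int n, int depth) {
    assert(n >= 0);
    if (n < 2) return n;
    if (depth <= 0) return fib_serial(n);
    // "spawn": run one branch asynchronously on another thread.
    std::future<long long> left =
        std::async(std::launch::async, fib_fork_join, n - 1, depth - 1);
    // The parent continues with the other branch in the meantime.
    const long long right = fib_fork_join(n - 2, depth - 1);
    // "sync": wait for the spawned branch before combining results.
    return left.get() + right;
}
```

With Cilk Plus the same structure would be written with the cilk_spawn and cilk_sync keywords directly in the recursive function, which is why so little rewriting is needed.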

Is there a reason you can't use the pre-built GCC binaries we make available at ? It's built from the cilkplus_4-8_branch, and should be reasonably current.
Which solution you choose is up to you. Cilk provides a very natural way to express recursive algorithms, and its child-stealing scheduler can be very cache friendly if you use cache-oblivious algorithms. If you have questions about Cilk Plus, you'll get the best response to them in the Intel Cilk Plus forum at
Cilk Plus and TBB are aware of each other, so they should play well together if you mix them. Instead of getting a combinatorial explosion of threads, you'll get at most the number of threads in the TBB thread pool plus the number of Cilk worker threads, which usually means 2P threads (where P is the number of cores) unless you change the defaults with library calls or environment variables. You can use the vectorization features of Cilk Plus with either threading library.
- Barry Tannenbaum
Intel Cilk Plus developer

So, as a request from the OP:
I have used TBB before and I'm happy with it. It has good docs and the forum is active. It's not rare to see the library developers answering the questions. Give it a try. (I never used cilkplus so I can't talk about it).
I worked with it both in Ubuntu and Windows. You can download the packages via the package manager in Ubuntu or you can build the sources yourself. In that case, it shouldn't be a problem. In Windows I built TBB with MinGW under the cygwin environment.
As for compatibility issues, there shouldn't be any. TBB works fine with Boost.Thread or OpenMP, for example; it was designed so it could be mixed with other threading solutions.


Intel Thread Building Blocks alternatives & licensing

Before you mark this question as a duplicate, please take into consideration that most of the similar questions are 5+ years old!
I have two questions:
The dual-licensing. What does it mean?! Would I have to buy the commercial version to make a commercial closed-source project?
At this page, it says that
Intel® TBB is offered commercially for customers who want the additional support that comes with Intel® Premier Support. The commercial version is also available for developers who cannot follow the GPLv2 with the runtime exception license. - See more at:
But in this answer he says that the only advantage of buying it, is that you get support. Please notice this question is 7 years old, and therefore I can't trust it, as things could have changed.
If I can't use TBB for a closed-source commercial project, what are some alternatives? I will most likely only need features like concurrent maps and queues.
Edit: Also, if the commercial version is required, could I wait to buy it until the release of my app?
Re: #2 (TBB alternatives), if you're on Windows, the PPL provides parallel containers and algorithms that are somewhat source compatible with TBB.
Also, Boost.Lockfree has lock-free queue and stack implementations.
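If only a basic concurrent queue is needed, a lock-based one can also be written in a few lines of standard C++11. The sketch below is not Boost.Lockfree or PPL code and is not lock-free; the class name BlockingQueue and its interface are assumptions for illustration. It may be good enough when contention is low.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical mutex-protected FIFO queue usable from several threads.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();  // wake one consumer waiting in pop()
    }

    // Blocks until an item is available, then removes and returns it.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        assert(!items_.empty());  // guaranteed by the wait predicate
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

    // Non-blocking variant: returns false if the queue is empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};
```

A producer thread would call push() while consumers call pop() or try_pop(); whether this or a lock-free container performs better depends on the workload and should be measured.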
If you need parallel algorithms and don't mind being on the bleeding edge, take a look at HPX as an alternative to TBB. It's under very active development, though, so it might be a bit of a moving target... In their latest 0.9.11 release they've implemented some aspects of the Parallelism TS, so there might be some API stability there that could make you (somewhat) well positioned to transition to standard algorithms if those ever materialize. Relevant docs are here.

c++11 multi threading vs boost_thread

I am a beginner in C++ parallel computing. However, my project requires that I use C++98 (libstdc++). I searched online and it seems most of the tutorials are based on C++11 threads. I noted that boost_thread is an implementation for C++98, but there seem to be far fewer tutorials available. So I would like to ask what the best way is for me to learn and implement parallel computing for my project.
Eventually, my project will require calculations on hundreds of cores and computing nodes. Would multi-threading be sufficient, or do I have to use Boost_MPI? Thank you.
If you are limited to c++98 that means that you won't have all the thread managing and locking mechanisms as part of the language.
Therefore you will have to implement them by yourself based on available OS APIs.
There are different APIs for Windows and Linux.
Here is an example of C++ wrapper for Linux pthread library.
And this is an example of C++ wrapper for Windows Threads.
So your project won't be portable unless you create (or find somewhere) a class that hides these libraries behind a common interface, implementing the same logic for Windows and Linux and selecting between them with #ifdef WINDOWS / #ifdef LINUX.
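A minimal sketch of such a common interface, written in C++98 style, might look like the following. The class name PortableThread and the helper increment are hypothetical, and the sketch switches on the compiler-predefined _WIN32 macro instead of a hand-defined WINDOWS/LINUX macro. Error handling and return values are deliberately omitted.

```cpp
#include <cassert>
#if defined(_WIN32)
#include <windows.h>
#else
#include <pthread.h>
#endif

// Hypothetical wrapper: runs a plain function taking one void* argument.
class PortableThread {
public:
    typedef void (*Entry)(void*);

    PortableThread(Entry entry, void* arg)
        : entry_(entry), arg_(arg), started_(false) {}

    // Starts the thread; returns false if the OS call failed.
    bool start() {
        assert(!started_);
#if defined(_WIN32)
        handle_ = CreateThread(NULL, 0, &PortableThread::trampoline, this, 0, NULL);
        started_ = (handle_ != NULL);
#else
        started_ = (pthread_create(&handle_, NULL,
                                   &PortableThread::trampoline, this) == 0);
#endif
        return started_;
    }

    // Waits for the thread to finish; does nothing if it was never started.
    void join() {
        if (!started_) return;
#if defined(_WIN32)
        WaitForSingleObject(handle_, INFINITE);
        CloseHandle(handle_);
#else
        pthread_join(handle_, NULL);
#endif
        started_ = false;
    }

private:
    PortableThread(const PortableThread&);             // non-copyable
    PortableThread& operator=(const PortableThread&);  // (C++98 idiom)

#if defined(_WIN32)
    static DWORD WINAPI trampoline(LPVOID self) {
        PortableThread* t = static_cast<PortableThread*>(self);
        t->entry_(t->arg_);
        return 0;
    }
    HANDLE handle_;
#else
    static void* trampoline(void* self) {
        PortableThread* t = static_cast<PortableThread*>(self);
        t->entry_(t->arg_);
        return NULL;
    }
    pthread_t handle_;
#endif
    Entry entry_;
    void* arg_;
    bool started_;
};

// Example entry point: increments the int that `arg` points to.
void increment(void* arg) { ++*static_cast<int*>(arg); }
```

Usage would be along the lines of constructing PortableThread t(increment, &counter), calling t.start(), and then t.join() before reading counter. On Linux the program typically needs to be linked with the pthread library.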
what is the best way for me to learn and implement parallel computing
for my project.
There is no single correct answer to this. Look for some basic multi-threading tutorials. Try to implement a few simple programs (before you move to a big project) and come back with more specific questions when you face difficulties.
I have heard about Boost but never used it, so I can't provide any feedback on that. But again, you need to ask a specific question. You can state some specific requirements of your project and ask questions based on them.
Anyway, dive into the Boost documentation; you can find thread-related libraries there (also pay attention to the Boost license).

Is threadpools okay for production? If not, is there any alternative libraries?

I am porting some Java code to C++ and wanted to find something that works like Java's ThreadPoolExecutor. I saw a few posts suggesting threadpool, but in a few other forums I have read about problems (memory leaks, etc.), and browsing the code base I see the last update was over 3 years ago. So my problem is that I'm not quite up to speed enough to write my own thread-pool library, but I don't want to use something that is not actively maintained.
Looking around, there are a few threadpool projects, but they don't seem heavily used (I'm basing this on how many favorites/watches they have on GitHub). I was wondering what other people are using for thread pools in production environments. I'm looking for 2 types of thread pools: one fixed and one that grows dynamically.
Which platform? If you are on Windows and can use PPL with the Visual C++ compiler, then take a look at task_group and make_task to create tasks. Intel TBB is another option.
If you can use Boost, then the Boost concurrent programming APIs can be useful.
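For the fixed-size case, the core of a thread pool is small enough to sketch in standard C++11. The class below is an illustrative assumption (the name FixedThreadPool and its interface are not taken from PPL, TBB, Boost, or the threadpool library), and it omits features a production pool would need, such as futures for results, exception handling inside tasks, and dynamic growth.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: N worker threads pull tasks from one queue.
class FixedThreadPool {
public:
    explicit FixedThreadPool(std::size_t threads) {
        assert(threads > 0);
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Finishes all queued tasks, then joins the workers.
    ~FixedThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    // Queues a task; tasks are expected not to throw.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()> > tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: started after the rest
};
```

A pool is created with, for example, FixedThreadPool pool(4), and work is queued with pool.submit([] { /* work */ }). The destructor blocks until the queue is empty, which roughly mirrors shutdown() followed by awaitTermination() in Java's ThreadPoolExecutor.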

When should the OpenMP library be used?

As I understand it, OpenMP is a standard and also a library for implementing multi-threading in C++ code.
Visual C++ already has threading APIs for Windows, and UNIX has POSIX threading. I don't understand why OpenMP is required, or in which scenarios it is applicable.
EDIT: Does OpenMP also improve performance, in comparison to using CreateThread or the POSIX functions (assuming similar code was parallelized)?
System threading APIs (such as POSIX threads) require you to do an awful lot of work manually (setting up the threads, splitting up the work between the threads, synchronising when they are complete, tearing down the threads, etc.). Lots and lots of code bloat that obscures what you're really trying to do. And error-prone. And tedious. And platform-dependent.
OpenMP does all of this for you. In my opinion, it's most suitable for data parallelism; in many cases, it's as simple as putting a #pragma omp directive before e.g. a for loop, and that loop will be automatically multi-threaded. But it can also be used for task parallelism.
OpenMP doesn't improve performance, in the sense that it's always possible to write manual threading code that performs at least as well as the OpenMP version. But very often, OpenMP will get you 90%+ of theoretical optimum performance, with 5 minutes of coding effort (assuming you have written your loops in a thread-friendly way in the first place).
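As an illustration of the "one directive before a for loop" case, here is a small sketch of a dot product. The function name dot is an assumption for this example. The reduction clause gives each thread a private copy of sum and combines the copies at the end of the loop, which avoids a data race on the accumulator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of two equally sized vectors.
// Compiled with OpenMP enabled (e.g. -fopenmp for GCC, /openmp for MSVC),
// the loop iterations are split across threads. Without that flag the
// pragma is ignored and the loop simply runs serially.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        sum += a[k] * b[k];
    }
    return sum;
}
```

The loop index is a signed integer because older OpenMP versions (including the one shipped with Visual C++) require signed loop variables. Note that floating-point results can differ slightly between runs, since the order of the additions depends on how iterations are distributed.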
I recommend reading the Wikipedia article for some good examples.
When you're trying to write portable code, for example. OpenMP works on both Windows and Unix systems.
Moreover, most of the time it's a lot easier to use than manipulating threads directly.

Package for distributing calculations

Do you know of any package for distributing calculations on several computers and/or several cores on each computer? The calculation code is in c++, the package needs to be able to cope with data >2GB and work on a windows x64 machine. Shareware would be nice, but isn't a requirement.
A suitable solution would depend on the type of calculation and data you wish to process, the granularity of parallelism you wish to achieve, and how much effort you are willing to invest in it.
The simplest would be to just use a suitable solver/library that supports parallelism (e.g. ScaLAPACK). Or if you wish to roll your own solvers, you can squeeze some parallelisation out of your current code using OpenMP or compilers that provide automatic parallelisation (e.g. the Intel C/C++ compiler). All these will give you a reasonable performance boost without requiring massive restructuring of your code.
At the other end of the spectrum, you have the MPI option. It can afford you the most performance boost if your algorithm parallelises well. It will however require a fair bit of reengineering.
Another alternative would be to go down the threading route. There are libraries and tools out there that will make this less of a nightmare. These are worth a look: the Boost C++ parallel programming libraries and Threading Building Blocks.
You may want to look at OpenMP
There's an MPI library and the DVM system working on top of MPI. These are generic tools widely used for parallelizing a variety of tasks.