using OpenMP in a Qt/C++ program - c++

I would like to parallelize a Qt/C++ program with OpenMP, so I can compare against Qt threading tools. I have some questions.
What do I have to include, both in code and in project files to have OpenMP working properly?
Would it be painful to use OpenMP in a deliverable software project? Would each version need OpenMP updates and a lot of maintenance?
What performance have you experienced with OpenMP?
Is nested parallel work with OpenMP trustworthy?
Is OpenMP supported on the same platforms as Qt?
Any references would be appreciated. Thanks a lot!

What do I have to include, both in code and in project files to have OpenMP working properly?
You'll have to add OpenMP pragmas to the code and enable OpenMP in your compiler, which may also mean linking against the OpenMP runtime library. The changes to your build system are limited.
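To make that concrete, here is a minimal sketch of a pragma-only loop. The function name sum_of_squares is made up for illustration. With GCC or Clang you would typically compile and link with -fopenmp (MSVC uses /openmp); in a qmake project that usually means adding the flag to QMAKE_CXXFLAGS and to the link flags, but check your own toolchain.

```cpp
#include <vector>

// Hypothetical helper: sums the squares of all elements.
// The reduction clause gives every thread a private partial sum
// and combines the partial sums when the loop ends.
double sum_of_squares(const std::vector<double>& values) {
    double total = 0.0;
    const int n = static_cast<int>(values.size());
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += values[i] * values[i];
    }
    return total;
}
```

No OpenMP header is needed here because only a pragma is used; omp.h is required only when you call runtime functions such as omp_get_max_threads().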
Would it be painful to use OpenMP in a deliverable software project? Would each version need OpenMP updates and a lot of maintenance?
I'm not sure what you mean by "painful". I know a lot of projects successfully using OpenMP. There might be some maintenance needed from time to time (but I guess this is also true for Qt).
What performance have you experienced with OpenMP?
More or less what should be expected of any good thread-based parallelization tool. If the workload is sufficiently heavy, OpenMP in itself should not add much overhead to your code and Amdahl's law will be your limit.
Is nested parallel work with OpenMP trustworthy?
Yes
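For reference, a minimal sketch of what nesting looks like; row_sums is a made-up name. Many runtimes serialize inner regions unless nesting is enabled, so the sketch calls omp_set_max_active_levels (available from OpenMP 3.0) when the compiler supports it. Whether the inner region actually gets extra threads depends on your runtime and settings.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: sums each row of a jagged matrix.
// The outer region splits the rows, the inner region splits one row.
std::vector<long> row_sums(const std::vector<std::vector<int>>& rows) {
#if defined(_OPENMP) && _OPENMP >= 200805
    omp_set_max_active_levels(2);  // allow two nested parallel levels
#endif
    const int row_count = static_cast<int>(rows.size());
    std::vector<long> sums(rows.size(), 0);
    #pragma omp parallel for
    for (int r = 0; r < row_count; ++r) {
        long sum = 0;  // private to the thread handling this row
        const int col_count = static_cast<int>(rows[r].size());
        #pragma omp parallel for reduction(+:sum)
        for (int c = 0; c < col_count; ++c) {
            sum += rows[r][c];
        }
        sums[r] = sum;
    }
    return sums;
}
```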
Is OpenMP supported on the same platforms as Qt?
Unlike Qt, which is a framework, OpenMP support is mostly done by compilers. You might find platforms on which Qt is compilable, but where the C++ compilers don't support OpenMP. Of course, it depends on what type of platforms you target.

Related

Intel TBB vs CilkPlus [closed]

I am developing time-demanding simulations in C++ targeting Intel x86_64 machines.
After researching a little, I found two interesting libraries to enable parallelization:
Intel Threading Building Blocks
Intel Cilk Plus
As stated in the docs, they both target parallelism on multicore processors, but I still haven't figured out which one is better. AFAIK Cilk Plus simply adds three keywords for easier parallelism (which means GCC has to be rebuilt to support them), while TBB is just a library that promotes better parallel development.
Which one would you recommend?
Consider that I am having many many many problems installing CilkPlus (still trying and still screaming). So I was wondering, should I go check TBB first? Is Cilkplus better than TBB? What would you recommend?
Are they compatible?
Should I manage to install Cilk Plus (still praying for this), would it be possible to use TBB together with it? Can they work together? Has anyone developed software with both Cilk Plus and TBB? Would you recommend using them together?
Thank you
Here is some FAQ-type information relevant to the question in the original post.
Cilk Plus vs. TBB vs. Intel OpenMP
In short it depends what type of parallelization you are trying to implement and how your application is coded.
I can answer this question in context to TBB. The pros of using TBB are:
No compiler support needed to run the code.
The generic C++ algorithms of TBB let users create their own objects and map them to threads as tasks.
The user doesn't need to worry about thread management. The built-in task scheduler automatically detects the number of available hardware threads. However, the user can choose to fix the number of threads for performance studies.
Flow graphs for creating tasks that respect dependencies let the user easily exploit functional as well as data parallelism.
TBB is naturally scalable obviating the need for code modification when migrating to larger systems.
Active forum and documentation being continually updated.
With Intel compilers, the latest version of TBB performs really well.
The cons can be
A small user base in the open source community, which makes it difficult to find examples.
The examples in the documentation are very basic, and in older versions they are even wrong. However, the Intel forum is always ready to help resolve issues.
The level of abstraction in the template classes is very high, which makes the learning curve very steep.
The overhead of creating tasks is high. The user has to make sure that the problem size is large enough for the partitioner to create tasks of optimal grain size.
I have not worked with Cilk either, but it seems that, of the users of the two, the majority use TBB. If Intel keeps pushing TBB with updated documentation and free support, the TBB user community is likely to grow.
Cilk and TBB can be used to complement each other. Usually, that's the best option. But in my experience you will use TBB the most.
TBB and CILK will scale automatically with the number of cores. (by creating a tree of tasks, and then using recursion at run-time).
TBB is a runtime library for C++ that uses programmer-defined task patterns instead of threads. TBB decides at run time on the optimal number of threads, the task granularity and performance-oriented scheduling (automatic load balancing through task stealing, cache efficiency and memory reuse). Tasks are created recursively (for a tree, the depth is logarithmic in the number of tasks).
Cilk (Plus) is a C/C++ language extension that requires compiler support.
Code may not be portable to different compilers and operating systems. It supports fork-join parallelism, which makes it extremely easy to parallelize recursive algorithms. It has only a few constructs (spawn, sync), with which you can parallelize code very easily; not a lot of rewriting is needed.
Other differences, that might be interesting:
a) Cilk uses a random work-stealing scheduler to counter "waiting" processes.
b) TBB steals from the most heavily loaded process.
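The fork-join style described above (spawn two children, then sync) can be sketched without Cilk Plus or TBB. The example below swaps in OpenMP tasks as a stand-in, so it is not Cilk or TBB code, and the names fib_serial, fib_tasks, fib_parallel and the cutoff of 20 are made up for illustration. The cutoff also reflects the point about task overhead: below a certain problem size it is cheaper to stay serial.

```cpp
// Plain recursive Fibonacci, used below the cutoff.
long fib_serial(int n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Fork-join recursion: each call forks two child tasks and joins them.
long fib_tasks(int n) {
    if (n < 20) {
        return fib_serial(n);  // too small to be worth a task
    }
    long left = 0;
    long right = 0;
    #pragma omp task shared(left)   // "spawn" the first child
    left = fib_tasks(n - 1);
    #pragma omp task shared(right)  // "spawn" the second child
    right = fib_tasks(n - 2);
    #pragma omp taskwait            // "sync": wait for both children
    return left + right;
}

// Entry point: one thread creates the root task, the team executes the tasks.
long fib_parallel(int n) {
    long result = 0;
    #pragma omp parallel
    #pragma omp single
    result = fib_tasks(n);
    return result;
}
```

Calling fib_parallel(30) should give the same value as fib_serial(30); if the compiler ignores the pragmas, the code simply runs serially.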
Is there a reason you can't use the pre-built GCC binaries we make available at https://www.cilkplus.org/download#gcc-development-branch ? It's built from the cilkplus_4-8_branch, and should be reasonably current.
Which solution you choose is up to you. Cilk provides a very natural way to express recursive algorithms, and its child-stealing scheduler can be very cache friendly if you use cache-oblivious algorithms. If you have questions about Cilk Plus, you'll get the best response to them in the Intel Cilk Plus forum at http://software.intel.com/en-us/forums/intel-cilk-plus/.
Cilk Plus and TBB are aware of each other, so they should play well together if you mix them. Instead of getting a combinatorial explosion of threads you'll get at most the number of threads in the TBB thread pool plus the number of Cilk worker threads. Which usually means you'll get 2P threads (where P is the number of cores) unless you change the defaults with library calls or environment variables. You can use the vectorization features of Cilk Plus with either threading library.
- Barry Tannenbaum
Intel Cilk Plus developer
So, as a request from the OP:
I have used TBB before and I'm happy with it. It has good docs and the forum is active. It's not rare to see the library developers answering the questions. Give it a try. (I never used cilkplus so I can't talk about it).
I worked with it both in Ubuntu and Windows. You can download the packages via the package manager in Ubuntu or you can build the sources yourself. In that case, it shouldn't be a problem. In Windows I built TBB with MinGW under the cygwin environment.
As for the compatibility issues, there shouldn't be any. TBB works fine with Boost.Thread or OpenMP, for example; it was designed so it could be mixed with other threading solutions.

When should the OpenMP library be used?

As I understand, OpenMP is a standard and also a library to implement multi-threading in C++ code.
Visual C++ already has threading APIs for Windows, and UNIX has POSIX threading. I don't understand why OpenMP is required, or in which scenarios it is applicable.
EDIT: Does OpenMP also improve performance in comparison to using CreateThread or POSIX functions (assuming similar code was parallelized)?
System threading APIs (such as POSIX threads) require you to do an awful lot of work manually (setting up the threads, splitting up the work between the threads, synchronising when they are complete, tearing down the threads, etc.). Lots and lots of code bloat that obscures what you're really trying to do. And error-prone. And tedious. And platform-dependent.
OpenMP does all of this for you. In my opinion, it's most suitable for data-parallelism; in many cases, it's as simple as putting a #pragma omp directive before e.g. a for loop, and that loop will be automatically multi-threaded. But it can also be used for task-parallelism as well.
OpenMP doesn't improve performance, in the sense that it's always possible to write manual threading code that performs at least as well as the OpenMP version. But very often, OpenMP will get you 90%+ of theoretical optimum performance, with 5 minutes of coding effort (assuming you have written your loops in a thread-friendly way in the first place).
I recommend reading the Wikipedia article for some good examples.
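As a small sketch of the "one pragma before a for loop" case (the function name scale_and_add is made up): every iteration writes a different element, so the loop needs no further synchronization.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: computes y[i] = a * x[i] + y[i] for every index.
// Iterations are independent, so a single directive is enough.
void scale_and_add(float a, const std::vector<float>& x, std::vector<float>& y) {
    const int n = static_cast<int>(std::min(x.size(), y.size()));
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The equivalent with CreateThread or POSIX threads would need code to split the index range, start the threads and join them again.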
When you're trying to write portable code, for example. OpenMP works on both Windows and Unix systems.
Moreover, it's most of the time a lot easier to use than manipulating threads.

Cross platform multithreading in C++?

Basically, the title explains it all; I'm looking to make a game in C++ and I want to use multithreading for stuff like the physics engine and keeping the animation smooth on the loading screen. I've seen a few multithreading libraries, but I'm wondering which is best for my application, which will work well on Windows Mac and Linux. Does such a library exist?
You probably want boost::thread or Intel's Threading Building Blocks. I'd recommend TBB, but I think it's not free, so boost::thread for the free option.
If you can use c++0x threads, then use that.
If not, boost::thread is the best free multi-platform library.
My favourite is QThread. Part of Qt library.
Currently my recommendation would be OpenMP (libgomp on g++, IBM XL C++ and MSVC++ all support it).
OpenMP offers a simple way of exploiting parallelism without interfering with algorithm design; an OpenMP program compiles and operates correctly in both parallel and serial execution environments. Using OpenMP's directive-based parallelism also simplifies the act of converting existing serial code to efficient parallel code.
See msdn
And GOMP
for starting points
Random quote:
To remain relevant, free software development tools must support emerging technologies. By implementing OpenMP, GOMP provides a simplified syntax tools for creating software targeted at parallel architectures. OpenMP's platform-neutral syntax meshes well with the portability goals of GCC and other GNU projects
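To illustrate the point about serial and parallel builds, here is a sketch that compiles either way. The names available_threads and count_multiples are made up. The only assumption is the standard _OPENMP macro, which compilers define when OpenMP is enabled.

```cpp
#ifdef _OPENMP
#include <omp.h>  // only available (and only needed) in OpenMP builds
#endif

// Number of threads a parallel region could use; 1 in a serial build.
int available_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Counts the integers in [1, limit] that are divisible by divisor.
// Without OpenMP the pragma is ignored and the loop runs serially.
long count_multiples(int limit, int divisor) {
    long count = 0;
    #pragma omp parallel for reduction(+:count)
    for (int i = 1; i <= limit; ++i) {
        if (i % divisor == 0) {
            ++count;
        }
    }
    return count;
}
```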
Another nice library that includes cross platform threads is poco

Data parallel libraries in C/C++

I have a C# prototype that is heavily data parallel, and I've had extremely successful order of magnitude speed ups using the Parallel.For construct in .NETv4. Now I need to write the program in native code, and I'm wondering what my options are. I would prefer something somewhat portable across various operating systems and compilers, but if push comes to shove, I can go with something Windows/VC++.
I've looked at OpenMP, which looks interesting, but I have no idea whether this is looked upon as a good solution by other programmers. I'd also love suggestions for other portable libraries I can look at.
If you're happy with the level of parallelism you're getting from Parallel.For, OpenMP is probably a pretty good solution for you -- it does roughly the same kinds of things. There's also work and research being done on parallelized implementations of the algorithms in the standard library. Code that uses the standard algorithms heavily can gain from this with even less work.
Some of this is done using OpenMP, while other is using hand-written code to (attempt to) provide greater benefits. In the long term, we'll probably see more of the latter, but for now, the OpenMP route is probably a bit more practical.
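If you want something that reads like Parallel.For, a thin wrapper around an OpenMP loop gets close. This is only a sketch: parallel_for_index and squares are made-up names, not part of OpenMP or any library, and the body must be safe to call from several threads at once.

```cpp
#include <vector>

// Hypothetical wrapper: calls body(i) for every i in [first, last),
// spreading the iterations over the OpenMP thread team.
template <typename Body>
void parallel_for_index(int first, int last, Body body) {
    #pragma omp parallel for
    for (int i = first; i < last; ++i) {
        body(i);
    }
}

// Usage example: each iteration writes its own element, so there is no race.
std::vector<int> squares(int count) {
    std::vector<int> out(count);
    parallel_for_index(0, count, [&out](int i) { out[i] = i * i; });
    return out;
}
```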
If you're using Parallel.For in .Net 4.0, you should also look at the Parallel Pattern Library in VS2010 for C++; many things are similar and it is library based.
If you need to run on a platform other than Windows, the PPL is also implemented in Intel's Threading Building Blocks, which has an open source version.
Regarding the other comment on similarities / differences vs. .Net 4.0, the parallel loops in the PPL (parallel_for, parallel_for_each, parallel_invoke) are virtually identical to the .Net 4.0 loops. The task model is slightly different, but should be straightforward.
You should check out Intel's Threading Building Blocks. Visual Studio 2010 also offers a native Concurrency Runtime. The .NET 4.0 libraries and the ConcRT are designed very similarly, if not identically.
If you want something versatile, as in portable across various operating systems and environments, it would be very difficult not to consider Java. It is very similar to C#, so it would be a very easy transition.
Unless you want to pull out your ninja scalpel and make your code extremely efficient, I would pick Java over VC++ or C++.

Package for distributing calculations

Do you know of any package for distributing calculations on several computers and/or several cores on each computer? The calculation code is in c++, the package needs to be able to cope with data >2GB and work on a windows x64 machine. Shareware would be nice, but isn't a requirement.
A suitable solution would depend on the type of calculation and data you wish you process, the granularity of parallelism you wish to achieve, and how much effort you are willing to invest in it.
The simplest would be to just use a suitable solver/library that supports parallelism (e.g. ScaLAPACK). Or, if you wish to roll your own solvers, you can squeeze some parallelisation out of your current code using OpenMP or compilers that provide automatic parallelisation (e.g. the Intel C/C++ compiler). All these will give you a reasonable performance boost without requiring massive restructuring of your code.
At the other end of the spectrum, you have the MPI option. It can afford you the most performance boost if your algorithm parallelises well. It will however require a fair bit of reengineering.
Another alternative would be to go down the threading route. There are libraries and tools out there that will make this less of a nightmare. These are worth a look: the Boost C++ parallel programming library and Threading Building Blocks.
You may want to look at OpenMP
There's an MPI library and the DVM system working on top of MPI. These are generic tools widely used for parallelizing a variety of tasks.