C++ most common parallel processing method. Pthreads? [closed]

I just need some insight into parallel processing methods in C++. My parallelization work until now has been in C, where the main method is using pthreads.
However, I've also worked with OpenMP and Cilk Plus.
I just want to ask: what is the common way of parallelizing code in C++? Are pthreads considered a good or a bad approach? Should I continue using them in C++?

C++ delivers concurrency in its standard library at different layers of abstraction.
The lowest-level approach to threads is the use of std::thread in conjunction with the synchronization classes std::mutex, std::unique_lock, std::condition_variable, etc. Furthermore, there is support for atomic operations (compare_exchange_weak/strong, fetch_add and the like), including memory-ordering facilities such as fences.
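For illustration, here is a minimal sketch of that lowest layer (a toy, not tuned code): two threads bump one counter under a std::mutex and another via std::atomic.

#include <atomic>
#include <iostream>
#include <mutex>
#include <thread>

int main() {
    long counter = 0;                   // protected by the mutex below
    std::mutex m;
    std::atomic<long> atomic_counter{0};

    auto work = [&] {
        for (int i = 0; i < 100000; ++i) {
            {
                std::lock_guard<std::mutex> lock(m);  // locks here, unlocks on scope exit
                ++counter;
            }
            atomic_counter.fetch_add(1, std::memory_order_relaxed);  // lock-free update
        }
    };

    std::thread t1(work), t2(work);
    t1.join();
    t2.join();

    std::cout << counter << ' ' << atomic_counter << '\n';  // both print 200000
}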
The next higher level of abstraction makes use of std::async, std::future and std::promise. Instead of having to control several threads yourself, you focus on the execution of tasks, which may run sequentially or in parallel, whichever suits the specific situation best. The implementation may even decide on its own when to run in parallel and when not to.
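A small sketch of that task-based style; with the default launch policy, whether the lambda runs on another thread is left to the implementation.

#include <future>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> data(1000000, 1);

    // Launch the summation as a task; the default policy lets the implementation
    // choose between running it asynchronously and deferring it until get().
    std::future<long> sum = std::async([&data] {
        return std::accumulate(data.begin(), data.end(), 0L);
    });

    // ... do other work here ...

    std::cout << sum.get() << '\n';  // blocks until the result is ready
}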
Finally, C++17 adds parallel implementations of the familiar sequential STL algorithms, such as std::for_each and std::reduce (which behaves similarly to std::accumulate), that use concurrency internally. This, in conjunction with features yet to come, such as coroutines, ranges and execution contexts, can lead to very robust, performant and, above all, readable code.
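A brief sketch of those parallel algorithms, assuming a toolchain that actually ships the execution policies (e.g. a recent MSVC, or GCC linked against TBB):

#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> v(10000000, 0.5);

    // Transform every element, potentially on several threads.
    std::for_each(std::execution::par, v.begin(), v.end(),
                  [](double& x) { x *= 2.0; });

    // Parallel reduction; unlike std::accumulate, std::reduce may reorder operations.
    double total = std::reduce(std::execution::par, v.begin(), v.end(), 0.0);
    return total > 0.0 ? 0 : 1;
}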
Summarizing, there are several portable tools for parallel programming in C++. As with everything in programming, one should use the most abstract tool available in order to avoid dealing with details that are not relevant to the actual problem. Therefore, unless there is a good reason, I would not use pthreads in C++ any longer.

As a first choice, unless your needs are more complex, use std::thread, std::mutex, std::condition_variable, etc.
See the documentation here:
http://en.cppreference.com/w/cpp/header/thread
http://en.cppreference.com/w/cpp/header/mutex
http://en.cppreference.com/w/cpp/header/future
http://en.cppreference.com/w/cpp/header/condition_variable
The standard thread tools are not fantastic or bleeding edge, but they are good enough for most tasks - certainly as a replacement for pthreads.
The advantage of the std:: constructs is that the visibility of changes to shared data is guaranteed to be correct, even when the optimiser reorders loads and stores under the as-if rule.
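As a minimal sketch of those pieces working together (a toy producer/consumer, not production code), note that the mutex is also what provides the visibility guarantee just mentioned:

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

int main() {
    std::queue<int> q;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread producer([&] {
        for (int i = 0; i < 5; ++i) {
            { std::lock_guard<std::mutex> lock(m); q.push(i); }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> lock(m); done = true; }
        cv.notify_one();
    });

    std::thread consumer([&] {
        std::unique_lock<std::mutex> lock(m);
        for (;;) {
            cv.wait(lock, [&] { return !q.empty() || done; });  // wakes when notified
            while (!q.empty()) { std::cout << q.front() << '\n'; q.pop(); }
            if (done) break;
        }
    });

    producer.join();
    consumer.join();
}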
If your needs are more complex, then building your own concurrency tools, or using a library that is built on top of them, is a good idea.


In the performance's point of view: will C++ 11 perform better than its predecessors [closed]

I know that C++11 and later standards provide a lot of hands-on tools that make developers' work easier. However, from a performance point of view, will C++11 perform better than its predecessors?
For instance, C++11's std::thread works cross-platform, whereas before C++11 we had to write a wrapper for both Windows and Linux. In terms of performance, will std::thread perform better?
C++11's main performance benefits are support for move construction/assignment and standard-specified copy elision. The latter improves a lot of code for free, and the former still benefits existing code that is simply recompiled, through the STL collections and other standard types (which often support and benefit from move semantics even if the code using them doesn't explicitly opt in; the benefits are limited unless the code uses std::move and the like appropriately, but they are still there).
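As a small illustration of what that buys you (the function name below is made up for the example):

#include <string>
#include <utility>
#include <vector>

std::vector<std::string> make_lines() {
    std::vector<std::string> lines(100000, "some text");
    return lines;                               // moved (or elided), not deep-copied
}

int main() {
    std::vector<std::string> a = make_lines();  // no copy of 100000 strings
    std::vector<std::string> b = std::move(a);  // explicit move: steals a's buffer
    return b.empty() ? 1 : 0;
}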
The vast majority of the rest of it is essentially syntactic sugar to my knowledge (std::thread is just wrapping existing threading APIs; it's largely templated/inlined, so overhead is trivial or nonexistent, but neither is it somehow gaining you a performance benefit). That said, runtime performance is often less important than developer cycles; the syntactic sugar is a huge benefit because it means it's easier to write C++11. The existence of auto alone means that templates can do much more complex things, without forcing the developer to write absurd declarations describing what the template is doing; they just use auto and let type deduction figure it out.
Several C++11 features, namely move semantics, can result in dramatic performance improvement. But mostly for code that's explicitly written to take advantage of it.
But a lot of the performance improvements will also apply automatically to unmodified code, as the compiler is able to deduce where move semantics and other new language features can be used.
But, for best results, write your code to explicitly take advantage of C++11's features.
Some performance improvements will be indirect. Other language features make it easier, and faster, to write optimal code; I specifically have constexpr in mind. This often results in better performance, too.
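A tiny constexpr sketch (the factorial function is just a made-up illustration): the work is done at compile time, so there is no runtime cost for it.

#include <array>

constexpr int factorial(int n) {
    return n <= 1 ? 1 : n * factorial(n - 1);
}

int main() {
    constexpr int f10 = factorial(10);          // evaluated entirely at compile time
    std::array<char, factorial(5)> buffer{};    // usable wherever a constant is required
    return (f10 > 0 && buffer.size() == 120) ? 0 : 1;
}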

Differences between Wrapper and Library [closed]

I am curious to know the differences between a Wrapper and a Library.
From what I have been able to find online I can't really see any major difference between the two. I often came across "Wrapper Library" or "Library Wrapper", and it makes it seem as if they are basically one and the same.
However, my assumption is that a Library is a collection of finely tuned functions that provide a means to accomplish a task that isn't part of the core functionality in the language.
And a Wrapper is a facade that makes it easier and quicker to set up certain functionality within your program so that you have less typing to do.
To me, though, these two descriptions just make it seem like the same thing with different wording... so I come to you, Stack Overflow:
What are your opinions and professional points of view on Wrappers/Libraries?
Sidenote/Background:
This question stems from my C++ Design Patterns Class I finished a month ago:
In my C++ Software Design Patterns class we had to create a Socket Library that took the WinSock Library and created a thin (or thick, our choice) wrapper over it. We then had to create two applications that used our library so that we didn't have to repeat the basic setup and tedious coding of the WinSock Library. I ended up with 100% on the project, but I wish to know more about the actual differences or similarities because I want to expand on this project in my personal time.
In general, I personally think of it like this:
A library is typically an actual concrete implementation of some functionality.
A wrapper is mostly just a layer of abstraction around existing functionality. It is only intended to offer a better, cleaner interface (or at least one that feels more native to the language or technology it targets) to something that already exists.
Now obviously there are times when the distinction between the two is crystal clear and times when the line is blurred, so some subjectivity may be inevitable in the latter scenarios. In other words, what you consider an implementation and what you consider simply an abstraction can be ambiguous occasionally.
For instance, a wrapper can have argument-checking functionality or better state bookkeeping that the underlying system did not have. If this mostly serves the correctness of the abstraction, then this still leans toward being a wrapper. If some genuinely new functionality is added, it starts becoming a library and could be called a wrapper library.
On the other hand, a full-fledged library may not be a wrapper at all, in the sense that it may leverage an underlying system to offer some functionality, but without exposing any considerable part of that underlying system in any clean interface (other than the specific functionality it adds). This would be called a library, but most probably it wouldn't be called a wrapper.
(I should clarify that what I said is how I think of the matter in general, not specifically in regard to C++. If the C++ world has less ambiguous definitions of these terms, please do correct me.)
A library is a generic implementation of some sort of functionality. A wrapper is a specific implementation intended to provide an abstraction over an existing library.
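To make the distinction concrete, here is a minimal sketch of a thin wrapper: it adds no capability of its own, it only gives the existing C stdio API an RAII-style interface. A library, by contrast, would implement the buffering and I/O itself rather than delegate.

#include <cstdio>
#include <stdexcept>
#include <string>

class File {                        // wrapper: every call delegates to <cstdio>
public:
    File(const std::string& path, const char* mode)
        : f_(std::fopen(path.c_str(), mode)) {
        if (!f_) throw std::runtime_error("cannot open " + path);
    }
    ~File() { std::fclose(f_); }    // cleaner interface: closing is automatic
    File(const File&) = delete;
    File& operator=(const File&) = delete;

    void write(const std::string& s) { std::fwrite(s.data(), 1, s.size(), f_); }

private:
    std::FILE* f_;
};

int main() {
    File f("example.txt", "w");
    f.write("hello\n");
}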

dispatch_async() from Grand Central Dispatch and std::async from C++11

I have got some experience using GCD for concurrency and removing explicit locks and threads.
C++11 provides std::async, which seems to provide some similar features (I'm not a C++ expert, so please don't blame me for mistakes here).
Putting aside arguments on flavours and language preferences, is there any benchmark comparing the two for their performance, especially for platforms like iOS?
Is C++11's std::async worth trying from a practical perspective?
EDIT:
As stackmonster answered, C++11 does not provide something exactly like a dispatch queue per se. However, isn't it possible to make an ad-hoc serial queue with atomic data structures (and arguably lambda functions) to achieve this?
C++11's std::async is not nearly as sophisticated as Grand Central Dispatch.
It's more analogous to the async concurrency model provided by the java.util.concurrent package:
it provides templates for callbacks, but with no built-in performance advantages.
I would say that the difference between them is simply this.
A callback template has no particular performance characteristics. GCD is all about performance: it threads/multiplexes those callbacks to reduce thread-creation overhead and allows queuing, task dependencies and thread pooling.
The launch policies of std::async do not compare in their sophistication to GCD and are not implementation portable.
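For reference, a short sketch of those launch policies; they are only a hint about where and when the callable runs, nothing like a GCD queue:

#include <future>
#include <iostream>

int heavy_work() { return 42; }

int main() {
    // Force a new thread (closest in spirit to handing work to a concurrent queue).
    auto eager = std::async(std::launch::async, heavy_work);

    // Deferred: runs lazily on the calling thread, only when .get() is called.
    auto lazy = std::async(std::launch::deferred, heavy_work);

    std::cout << eager.get() + lazy.get() << '\n';
}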
I'm not really sure what a benchmark between the two would really prove, since they are not really that similar.
As others have already pointed out, this comparison is generally meaningless due to its apples/oranges nature, though if you really wanted to I suppose you could test std::async and std::future against some GCD based futures implementation you cobble together yourself and see which provides futures the quickest for a known set of computations. Might be vaguely interesting, but you'd have to be the one to do it since the experiment is likely too strange and esoteric to be of interest to anyone else. :-)

To STL or !STL, that is the question [closed]

Unquestionably, I would choose to use the STL for most C++ programming projects. The question was presented to me recently however, "Are there any cases where you wouldn't use the STL?"...
The more I thought about it, the more I realized that perhaps there SHOULD be cases where I choose not to use the STL... For example, a really large, long-term project whose codebase is expected to last years... Perhaps a custom container solution that precisely fits the project's needs is worth the initial overhead? What do you think, are there any cases where you would choose NOT to use the STL?
The main reasons not to use STL are that:
Your C++ implementation is old and has horrible template support.
You can't use dynamic memory allocation.
Both are very uncommon requirements in practice.
For a long-term project, rolling your own containers that overlap in functionality with the STL is just going to increase maintenance and development costs.
Projects with strict memory requirements such as for embedded systems may not be suited for the STL, as it can be difficult to control and manage what's taken from and returned to the heap. As Evan mentioned, writing proper allocators can help with this, but if you're counting every byte used or concerned with memory fragmentation, it may be wiser to hand-roll a solution that's tailored for your specific problem, as the STL has been optimized for the most general usage.
You may also choose not to use STL for a particular case because more applicable containers exist that are not in the current standard, such as boost::array or boost::unordered_map.
There are just so many advantages to using the STL. For a long-term project the benefits outweigh the costs.
New programmers can understand the containers from day one, which gives them more time to learn the other code in the project (assuming they already know the STL, like any competent C++ programmer would).
Fixing bugs in containers sucks and wastes time that could be spent enhancing the business logic.
Most likely you're not going to write them as well as the STL is implemented anyway.
That being said, the STL containers don't deal with concurrency at all. So in an environment where you need concurrency I would use other containers, like the Intel TBB concurrent containers. These are far more advanced, using fine-grained locking so that different threads can modify the container concurrently and you don't have to serialize access to it.
Usually, I find that the best bet is to use the STL with custom allocators instead of replacing STL containers with hand rolled ones. The nice thing about the STL is you pay only for what you use.
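A bare-bones sketch of that approach; LoggingAllocator below is a made-up example that merely reports each allocation, but a real one could draw from a fixed pool instead of the heap:

#include <cstddef>
#include <iostream>
#include <vector>

template <class T>
struct LoggingAllocator {
    using value_type = T;

    LoggingAllocator() = default;
    template <class U> LoggingAllocator(const LoggingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        std::cout << "allocating " << n * sizeof(T) << " bytes\n";
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return false; }

int main() {
    std::vector<int, LoggingAllocator<int>> v;   // same container, custom allocation
    for (int i = 0; i < 100; ++i) v.push_back(i);
}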
I think it's a typical build vs buy scenario. However, I think that in this case I would almost always 'buy', and use STL - or a better solution (something from Boost perhaps), before rolling my own. You should be focusing most of your effort on what your application does, not the building blocks it uses.
I don't really think so. In making my own containers, I would even try to make them compatible with the STL, because the power of the generic algorithms is too great to give up. The STL should at least be nominally used, even if all you do is write your own container and specialize every algorithm for it. That way, sorting can always be invoked as sort(c.begin(), c.end()), even if your specialization of sort achieves the same effect by working differently.
Coding for Symbian.
STLPort does support Symbian 9, so the case against using STL is weaker than it used to be ("it's not available" is a pretty convincing case), but STL is still alien to all the Symbian libraries, so may be more trouble than just doing things the Symbian way.
Of course it might be argued on these grounds that coding for Symbian is not "a C++ programming project".
Most of the projects I have worked on had a codebase way older than any really usable version of STL - therefore we chose not to introduce it now.
Introduction:
The STL is a great library, and useful in many cases, but it definitely doesn't cover every situation. Answering "STL or !STL" is like answering "Does the STL meet your needs or not?"
Pros of STL
In most situations, the STL has a container that fits a given solution.
It is well documented
It is well known ( Programmers usually already know it, getting into a project is shorter)
It is tested and stable.
It is cross-platform
It is included with every compiler (does not add a 3rd library dependency)
STL is already implemented and ready
STL is shiny, ...
Cons of STL
It does not matter whether you need a simple graph, a red-black tree, or a very complex database of elements with an AI managing concurrent access through a quantum computer. The fact is, the STL does not, and never will, solve everything.
The following aspects are only a few examples, but they are basically consequences of this fact: the STL is a real library, with limits.
Exceptions: the STL relies on exceptions, so if for any reason you cannot accept exceptions (e.g. safety-critical code), you cannot use the STL. True, exceptions may be disabled, but that does not change the fact that the STL's design relies on them, and it will eventually lead to a crash.
Need of specific (not yet included) data structure: graph, tree, etc.
Special constraints of complexity: You could discover that the STL's general-purpose container is not the most optimal one for your bottleneck code.
Concurrency considerations: Either you need concurrency and the STL does not provide what you need (e.g. a reader-writer lock cannot (easily) be used because the [] operator serves both reads and writes), or you could design a container that takes advantage of multi-threading for much faster access/searching/inserting/whatever.
The STL needs to fit your needs, but the reverse is also true: you need to fulfil the needs of the STL. Don't try to use std::vector on an embedded microcontroller with 1 KB of unmanaged RAM.
Compatibility with other libraries: It may be that, for historical reasons, the libraries you use do not accept the STL (e.g. QtWidgets makes intensive use of its own QList). Converting containers in both directions might not be the best solution.
Implementing your own container
After reading that, you might think: "Well, I am sure I can do something better for my specific case than the STL does." WAIT!
Correctly implementing your own container very quickly becomes a huge task: it is not only about implementing something that works; you might also have to:
Document it thoroughly, including limitations, algorithmic complexity, etc.
Expect bugs, and fix them
Handle additional incoming needs: you know, this missing function, that conversion between types, etc.
After a while, you may want to refactor, and change all the dependencies (too late?)
....
Something used as deep down in the code as a container definitely takes time to implement, and should be thought through carefully.
Using 3rd party library
Not using the STL does not necessarily mean rolling your own. There are plenty of good libraries on the net, some even with permissive open-source licenses.
Adding an additional 3rd-party library or not is another topic, but it is worth considering.
One situation where this might occur is when you are already using an external library that already provides the abilities you need from the STL. For instance, my company develops an application in space-limited areas, and already uses Qt for the windowing toolkit. Since Qt provides STL-like container classes, we use those instead of adding the STL to our project.
I have found problems in using STL in multi-threaded code. Even if you do not share STL objects across threads, many implementations use non-thread safe constructs (like ++ for reference counting instead of an interlocked increment style, or having non-thread-safe allocators).
In each of these cases, I still opted to use STL and fix the problems (there are enough hooks to get what you want).
Even if you opt to make your own collections, it would be a good idea to follow STL style for iterators so that you can use algorithms and other STL functions that operate only on iterators.
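A tiny sketch of that idea: a hypothetical hand-rolled container that exposes STL-style iterators, so the standard algorithms keep working on it.

#include <algorithm>
#include <iostream>

class FixedBuffer {                 // made-up container with a fixed capacity
public:
    using iterator = int*;
    using const_iterator = const int*;

    iterator begin() { return data_; }
    iterator end()   { return data_ + size_; }

    void push_back(int v) { data_[size_++] = v; }   // no bounds check: just a sketch

private:
    int data_[16] = {};
    int size_ = 0;
};

int main() {
    FixedBuffer b;
    b.push_back(3); b.push_back(1); b.push_back(2);

    std::sort(b.begin(), b.end());                  // standard algorithms just work
    std::for_each(b.begin(), b.end(),
                  [](int v) { std::cout << v << ' '; });  // prints 1 2 3
}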
The main issue I've seen is having to integrate with legacy code that relies on non-throwing operator new.
I started programming C back in about 1984 or so and have never used the STL. Over the years I have rolled my own function libraries, and they have evolved and grown at a time when the STL was not yet stable and/or lacked cross-platform support. My common library has grown to include code by others (mostly things like libjpeg, libpng, ffmpeg, mysql) and a few others, and I would rather keep the amount of external code in it to a minimum. I'm sure the STL is great now, but frankly I'm happy with the items in my toolbox and see no need at this point to load it up with more tools. But I certainly see the great leaps and bounds that new programmers can make by using the STL without having to code all that from scratch.
Standard C++ perversely allows implementations of some iterator operations to throw exceptions. That possibility can be problematic in some cases. You might therefore implement your own simple container that is guaranteed not to throw exceptions for critical operations.
Since almost everybody who answered before me seemed so keen on STL containers, I thought it would be useful to compile a list of good reasons not to use them, from actual problems I have encountered myself.
These can be reasonably grouped into three broad categories:
1) Poor efficiency
STL containers typically run slower AND use too much memory for the job. The reason for this can be partly blamed on overly generic implementations of the underlying data structures and algorithms, with additional performance costs deriving from all the extra design constraints imposed by the tons of API requirements that are irrelevant to the task at hand.
Reckless memory use and poor performance go hand in hand, because the CPU addresses cached memory in lines of 64 bytes, and if you don't use locality of reference to your advantage, you waste cycles AND precious KB of cache memory.
For instance, std::list requires 24 bytes per element rather than the optimal 4.
https://lemire.me/blog/2016/09/15/the-memory-usage-of-stl-containers-can-be-surprising/
This is because it is implemented by packing two 64-bit pointers, 1 int and 4 bytes of memory padding, rather than doing anything as basic as allocating small amounts of contiguous memory and separately tracking which elements are in use, or using the pointer xor technique to store both iteration directions in one pointer.
https://en.wikipedia.org/wiki/XOR_linked_list
Depending on your program needs, these inefficiencies can and do add up to large performance hits.
2) Limitations / creeping standards
Of course, sometimes the problem is that you need some perfectly common function or slightly different container class that is just not implemented in STL, such as decrease_min() in a priority queue.
A common practice is then to wrap the container in a class and implement the missing functionality yourself, with extra state external to the container and/or multiple calls to container methods. This may emulate the desired behavior, but with much lower performance and higher O() complexity than a real implementation of the data structure, since there is no way of extending the container's inner workings (a sketch of this pattern follows below). Alternatively you end up mashing up two or more different containers together because you simultaneously need two or more things that are fundamentally incompatible in any one given STL container, such as a minmax heap, a trie (since you need to be able to use agnostic pointers), etc.
These solutions may be ugly and add on top of the other inefficiencies, and yet, the way the language is evolving, the tendency is to only add new STL methods to match C++'s feature creep and to ignore the missing core functionality.
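Here is a sketch of that wrap-and-emulate pattern (the class is a made-up example): std::priority_queue offers no erase or decrease-key, so the wrapper fakes an erase with external bookkeeping and lazy deletion, at the cost of extra memory and worse worst-case behaviour.

#include <iostream>
#include <queue>
#include <unordered_map>

class ErasablePQ {
public:
    void push(int v) { pq_.push(v); ++live_[v]; }

    // "Erase" is deferred: we only remember that one pushed copy of v is gone.
    void erase(int v) {
        auto it = live_.find(v);
        if (it != live_.end() && it->second > 0) --it->second;
    }

    // Assumes the queue is not logically empty.
    int pop_max() {
        while (live_[pq_.top()] == 0) pq_.pop();   // skip lazily erased entries
        int v = pq_.top();
        pq_.pop();
        --live_[v];
        return v;
    }

private:
    std::priority_queue<int> pq_;
    std::unordered_map<int, int> live_;   // value -> copies still considered present
};

int main() {
    ErasablePQ q;
    q.push(5); q.push(9); q.push(2);
    q.erase(9);
    std::cout << q.pop_max() << '\n';     // prints 5, because the 9 was erased
}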
3) Concurrency/parallelism
STL containers are not thread-safe, much less concurrent. In the present age of 16-thread consumer CPUs, it's surprising the go-to default container implementation for a modern language still requires you to write mutexes around every memory access like it's 1996. This is, for any non-trivial parallel program, kind of a big deal, because having memory barriers forces threads to serialize their execution, and if these happen with the same frequency as an STL call, you can kiss your parallel performance goodbye.
In short, STL is good as long as you don't care about performance, memory usage, functionality or parallelism.
The STL is of course still perfectly fine for the many times you are not bound by any of these concerns, and other priorities like readability, portability, maintainability or coding speed take precedence.

C++ Parallelization Libraries: OpenMP vs. Thread Building Blocks [closed]

I'm going to retrofit my custom graphics engine so that it takes advantage of multicore CPUs. More exactly, I am looking for a library to parallelize loops.
It seems to me that both OpenMP and Intel's Thread Building Blocks are very well suited for the job. Also, both are supported by Visual Studio's C++ compiler and most other popular compilers. And both libraries seem quite straightforward to use.
So, which one should I choose? Has anyone tried both libraries and can give me some cons and pros of using either library? Also, what did you choose to work with in the end?
Thanks,
Adrian
I haven't used TBB extensively, but my impression is that they complement each other more than competing. TBB provides threadsafe containers and some parallel algorithms, whereas OpenMP is more of a way to parallelise existing code.
Personally I've found OpenMP very easy to drop into existing code where you have a parallelisable loop or bunch of sections that can be run in parallel. However it doesn't help you particularly for a case where you need to modify some shared data - where TBB's concurrent containers might be exactly what you want.
If all you want is to parallelise loops where the iterations are independent (or can be fairly easily made so), I'd go for OpenMP. If you're going to need more interaction between the threads, I think TBB may offer a little more in that regard.
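To make the contrast concrete, here is a rough sketch of the same loop written both ways (TBB headers and linkage, and an OpenMP-enabled compiler, are assumed):

#include <cstddef>
#include <vector>
#include <tbb/parallel_for.h>

// TBB: the library splits the range and schedules chunks onto its thread pool.
void scale_tbb(std::vector<float>& v, float k) {
    tbb::parallel_for(std::size_t(0), v.size(),
                      [&](std::size_t i) { v[i] *= k; });
}

// OpenMP: one pragma on the loop, compiled with -fopenmp or /openmp.
void scale_omp(std::vector<float>& v, float k) {
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(v.size()); ++i)
        v[i] *= k;
}

int main() {
    std::vector<float> v(1000000, 1.0f);
    scale_tbb(v, 2.0f);
    scale_omp(v, 0.5f);
}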
From Intel's software blog: Compare Windows* threads, OpenMP*, Intel® Threading Building Blocks for parallel programming
It is also a matter of style - for me, TBB feels very C++-like, while I don't like OpenMP pragmas that much (they reek of C a bit; I would use them if I had to write in C).
I would also consider the existing knowledge and experience of the team. Learning a new library (especially when it comes to threading/concurrency) does take some time. I think that for now, OpenMP is more widely known and deployed than TBB (but this is just my opinion).
Yet another factor - though considering the most common platforms, probably not an issue - is portability. But the license might be an issue.
TBB incorporates some nice ideas originating from academic research, for example its recursive data-parallel approach.
There is some work on cache-friendliness, for example.
Reading the Intel blog is really interesting.
In general I have found that using TBB requires much more time-consuming changes to the code base with a high payoff, while OpenMP gives a quick but moderate payoff. If you are starting a new module from scratch and thinking long term, go with TBB. If you want small but immediate gains, go with OpenMP.
Also, TBB and OpenMP are not mutually exclusive.
I've actually used both, and my general impression is that if your algorithm is fairly easy to make parallel (e.g. loops with evenly sized iterations, not too much data interdependence), OpenMP is easier and quite nice to work with. In fact, if you find you can use OpenMP, it's probably the better way to go, if you know your platform will support it. I haven't used OpenMP's new task structures, which are much more general than the original loop and section options.
TBB gives you more data structures up front, but definitely requires more up-front work. As a plus, it might be better at making you aware of race-condition bugs. What I mean by this is that in OpenMP it is fairly easy to introduce race conditions by not declaring something shared (or private) when it should be. You only see this when you get bad results. I think this is a bit less likely to occur with TBB.
Overall my personal preference was for OpenMP, especially given its increased expressiveness with tasks.
As far as I know, TBB (an open-source version is available under the GPLv2) addresses the C++ area more than the C area. These days it is hard to find C++- and general OOP-specific parallelization information; most of it addresses functional-style stuff like C (the same with CUDA or OpenCL). If you need C++ support for parallelization, go for TBB!
Yes, TBB is much more C++-friendly, while OpenMP is more appropriate for FORTRAN-style C code, given its design. The new task feature in OpenMP looks very interesting, while at the same time lambdas and function objects in C++0x may make TBB easier to use.
In Visual Studio 2008, you can add the following line to parallelize any "for" loop. It even works with multiple nested for loops. Here is an example:
#pragma omp parallel for private(i,j)
for (i = 0; i < num_particles; i++)
{
    p[i].fitness = fitnessFunction(p[i].present);
    if (p[i].fitness > p[i].pbestFitness)
    {
        p[i].pbestFitness = p[i].fitness;
        for (j = 0; j < p[i].numVars; j++)
            p[i].pbest[j] = p[i].present[j];
    }
}
gbest = pso_get_best(num_particles, p);
After we added the #pragma omp parallel for, both cores on my Core 2 Duo were used to their maximum capacity, so total CPU usage went from 50% to 100%.