Is combining std::execution and OpenMP advisable?

I have been using OpenMP for some time now. Recently, on a new project, I chose to use C++17 for some of its features.
Because of that, I became interested in std::execution, which allows algorithms to be parallelized. That seems really powerful and elegant, but there are a lot of really useful OpenMP features that are not easy to use with algorithms (barrier, SIMD, critical, etc.).
So I am thinking of mixing std::execution::par (or par_unseq) with OpenMP. Is it a good idea, or should I stay only with OpenMP?

Unfortunately this is not officially supported. It may or may not work, depending on the implementation, but it is not portable.
Only the most recent version, OpenMP 5.0, even defines the interaction with C++11. In general, using anything from C++11 onward "may result in unspecified behavior". The specification states: "While future versions of the OpenMP specification are expected to address the following features, currently their use may result in unspecified behavior."
Alignment support
Standard layout types
Allowing move constructs to throw
Defining move special member functions
Concurrency
Data-dependency ordering: atomics and memory model
Additions to the standard library
Thread-local storage
Dynamic initialization and destruction with concurrency
C++11 library
While C++17 and its specific high-level parallelism support are not mentioned, it is clear from this list that they are unsupported.
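As a rough illustration of staying within a single model, here is a minimal OpenMP-only sketch of a reduction that might otherwise be written with a parallel algorithm. The function name sum_of_squares is made up for this example; only pragmas are used, so without -fopenmp the code simply runs serially.

```cpp
#include <vector>

// Sum of squares using OpenMP only. The reduction clause gives every thread a
// private partial sum and combines the partial sums at the end of the loop.
// If the compiler is not run with OpenMP enabled, the pragma is ignored and
// the loop executes serially with the same result.
double sum_of_squares(const std::vector<double>& values) {
    double total = 0.0;
    const long n = static_cast<long>(values.size());
    #pragma omp parallel for simd reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += values[i] * values[i];
    }
    return total;
}
```

The same loop gives access to the other OpenMP constructs mentioned in the question (barrier, critical, simd) without relying on how a particular standard library schedules its parallel algorithms.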

Related

Std mutex or boost mutex? Which is preferable?

What is the real difference between std::mutex and boost::mutex? Which one is faster in terms of implementation and compilation? Are both of them portable? I read many questions related to this, but there is no clear mention of the difference. std::mutex has only been supported since C++11, so older compilers don't support it. Is boost::mutex supported by older compilers or not? If the foremost requirement is that the code be portable, which should be preferred?

As a default choice you should prefer std::anything to boost::samething because it's part of the standard library and hence more portable, since it doesn't introduce an external dependency.
You can't really compare std::mutex and boost::mutex in general because there is no one and only std::mutex; its implementation depends on the standard library implementation that you are using, which is usually part of your toolchain.
There might be a case where you discover, using empirical evidence, that the std::mutex you are using is in some regard "worse" than boost::mutex. In this case it might be reasonable to switch to it, but only if it's really justified and you have actual evidence (e.g. performance measurements) of that. Even then it seems like a last resort. It might be better to switch to a different standard library implementation or a different toolchain, or to upgrade your current one if possible.

Consider boost as a laboratory for prototyping std features. Many std facilities were originally implemented in boost. The difference is that std takes care of consistency and forward compatibility, while boost targets new horizons. Nothing prevents boost from applying breaking changes in forthcoming versions, but it also provides answers to more questions than std. My personal preference is std first, when possible, and boost next, when needed. I generally avoid pre-C++11 platforms unless I am forced to face them.

std::mutex for me every time, for the reason @Henri states: it is (obviously) part of the C++ standard, so you can rely on it being available everywhere.
Using boost, on the other hand, means that you have to link against the boost library. While this is widely available and offers a number of handy extra features it's quite heavyweight and you wouldn't want to pull it in just for this.
Also, std::mutex may be faster. The cross-platform nature of boost means that things that rely on OS support (which would include mutexes) can sometimes be less efficient. But this would not be a major factor in my thinking.
But if measuring performance is important to you, you should run your own benchmark. You can do this (roughly) over at (say) Wandbox - they support the boost library there.

The focus of Boost is trying new techniques and introducing new capabilities. The focus of the C++ standard is specifying requirements in a way that (in most cases) can be implemented portably. A number of features from boost have found their way into the C++ standard, but were often changed quite a bit in that transition - to improve portability, sometimes improve reliability, etc.
If your implementation (compiler and library) is C++11 or later, and you do not intend to port to older implementations, then definitely use std::mutex. It has been part of the standard since 2011, so it is preferable. No need to rely on third-party libraries. You will only need boost if you need bleeding-edge features of boost that the C++ standard does not support (which means things other than mutex).
Some exceptions to the above: there are some features of boost (including related to threading and mutexes) that haven't made their way into a C++ standard, and some features in the C++ standard that are not in boost.
If you need to use (or support or port to) an older implementation, then consider using boost::mutex. You will, in most cases, need to install a version of boost separately, with your chosen implementation (some compiler versions have shipped with a version of boost, but don't rely on it). If there isn't a version of boost that works with your compiler/library, then (to state the obvious) you will not be able to use boost::mutex.
Boost has had the thread library (which includes mutex) since about version 1.25.0, which dates from late 2001. That suggests boost is an option if your compiler is no older than (rough guess) the early 2000s.
If you need to support an implementation that is significantly older than the early 2000s, then you may be out of luck using boost::mutex, and will need to resort to other libraries/frameworks or get your hands dirty writing OS-specific code.
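For reference, a minimal sketch of the std::mutex route recommended above, assuming a C++11 or later toolchain. The function name count_with_mutex and its parameters are hypothetical and exist only for this illustration.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Increment a shared counter from several threads, serialising access with
// std::mutex. std::lock_guard releases the mutex when it goes out of scope.
long count_with_mutex(int thread_count, int increments_per_thread) {
    long counter = 0;
    std::mutex counter_mutex;
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> lock(counter_mutex);
                ++counter;
            }
        });
    }
    // All workers are joined before the locals they reference are destroyed.
    for (std::thread& worker : workers) {
        worker.join();
    }
    return counter;
}
```

Nothing here needs an external library. On a pre-C++11 compiler the same structure would have to be expressed with boost::mutex and boost::thread (and without the lambda), which is the situation the last paragraphs describe.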

Mixing C++11 atomics and OpenMP

OpenMP has its own support for atomic access, however, there are at least two reasons for preferring C++11 atomics: they are significantly more flexible and they are part of the standard. On the other hand, OpenMP is more powerful than the C++11 thread library.
The standard specifies the atomic operations library and the thread support library in two distinct chapters. This makes me believe that the components for atomic access are more or less orthogonal to the thread library used. Can I indeed combine C++11 atomics and OpenMP?
There is a very similar question on Stack Overflow; however, it has been basically unanswered for three years, since its answer does not answer the actual question.

Update:
OpenMP 5.0 defines the interactions with C++11 and later. Among other things, it says that using the following features may result in unspecified behavior:
Data-dependency ordering: atomics and memory model
Additions to the standard library
C++11 library
So clearly, mixing C++11 atomics and OpenMP 5.0 will result in unspecified behavior. At least the standard itself promises that "future versions of the OpenMP specification are expected to address [these] features".
Old discussion:
Interestingly, the OpenMP 4.5 standard (2.13.6) has a rather vague reference to C++11 atomics, or more specifically std::memory_order:
The intent is that, when the analogous operation exists in C++11 or
C11, a sequentially consistent atomic construct has the same semantics
as a memory_order_seq_cst atomic operation in C++11/C11. Similarly, a
non-sequentially consistent atomic construct has the same semantics as
a memory_order_relaxed atomic operation in C++11/C11.
Unfortunately this is only a note, there is nothing that defines that they are playing nicely together. In particular, even the latest OpenMP 5.0 preview still refers to C++98 as the only normative reference for C++. So technically, OpenMP doesn't even support C++11 itself.
That aside, it will probably work most of the time in practice. I would agree that using std::atomic together with OpenMP has less potential for trouble than using C++11 threading with it. But if there is any trouble, it may not be as obvious. The worst case would be an atomic that doesn't operate atomically, even though I have serious trouble imagining a realistic scenario where this may happen. At the end of the day, it may not be worth it, and the safest thing is to stick with pure OpenMP or pure C++11 threads/atomics.
Maybe Hristo has something to say about this; in the meantime check out this answer for a more general discussion. While a bit dated, I'm afraid it still holds.

This is currently unspecified by OpenMP 4.5. In practice, you can use C++11 atomic operations with OpenMP threads in most compilers, but there is no formal guarantee that it will work.
Because of the unspecified behavior, GCC did not support C11 atomics (which are nearly identical in semantics to C++11 atomics) and OpenMP threads until recently. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65467 for details.
OpenMP 5.0 made an attempt to address this. The normative language references were updated to C11 and C++11. However, the atomics and memory model from these are "not supported", which means implementation-defined. I wish OpenMP 5.0 said more, but it is extremely difficult to define the interaction of OpenMP and ISO language atomics.
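To make the two options concrete, here is a small sketch contrasting a pure OpenMP atomic update with the mixed std::atomic variant discussed above. Both function names are invented for this example. The second function is the combination that the answers describe as working in practice on most compilers but not guaranteed by the OpenMP specification.

```cpp
#include <atomic>

// Pure OpenMP: the shared counter is updated through an OpenMP atomic
// construct, which is fully specified by OpenMP itself.
long count_even_omp(long n) {
    long hits = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        if (i % 2 == 0) {
            #pragma omp atomic
            ++hits;
        }
    }
    return hits;
}

// Mixed: a C++11 std::atomic updated from OpenMP threads. This usually works,
// but OpenMP 4.5 and 5.0 leave the interaction unspecified, so treat it as an
// assumption about your particular compiler rather than a guarantee.
long count_even_std_atomic(long n) {
    std::atomic<long> hits{0};
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        if (i % 2 == 0) {
            hits.fetch_add(1, std::memory_order_relaxed);
        }
    }
    return hits.load();
}
```

For a simple counter like this, a reduction clause would normally be preferable to either form; the atomics are used here only because they are the subject of the question.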

What are the implications of using _GLIBCXX_USE_CXX11_ABI to use the pre-5.1 C++ ABI with C++11/14 features?

From the manual:
In the GCC 5.1 release libstdc++ introduced a new library ABI that includes new implementations of std::string and std::list. These changes were necessary to conform to the 2011 C++ standard which forbids Copy-On-Write strings and requires lists to keep track of their size.
It is possible to use the _GLIBCXX_USE_CXX11_ABI macro to control whether the library headers use the old or the new ABI, independently of which "-std" is being used.
I'd like to know what the implications of using this "compatibility ABI" would be. I guess that the run-time performance of small-string operations will be impacted (negatively, I assume), and that list-size access goes from O(1) (C++11 ABI) to O(N) (compatibility ABI).
Are my guesses correct and can anyone elaborate?
Are there other implications which I have missed? What about atomics and concurrency features? Any impact?

Your first question is actually answered by the manual itself:
... the choice of ABI to use is independent of the -std option used to
compile your code... This ensures that the -std does not change the ABI, so that it is
straightforward to link C++03 and C++11 code together.
Regarding the second question, I'm afraid it's hard to generalize the impact because it depends on how your code uses the standard library. Does it copy strings a lot? How often is a list's size queried? Is the code multi-threaded?
Although atomics and concurrency were only introduced in the C++11 standard, I'd guess that the libstdc++ copy-on-write mechanism already used a variation of them anyhow. Those implementations are typically thread-safe.
Perhaps one thing you didn't directly mention is the impact on other std components that depend on those behaviors, such as list::splice.
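When experimenting with this, it can help to confirm which ABI a given translation unit was actually built with. The following is a small sketch that only inspects the _GLIBCXX_USE_CXX11_ABI macro from the manual; the function name libstdcxx_abi and the returned strings are made up for illustration.

```cpp
#include <string>

// Report which libstdc++ ABI this translation unit was compiled against.
// The macro is defined by the libstdc++ headers (GCC 5.1 and later); other
// standard libraries do not define it, so that case is handled separately.
std::string libstdcxx_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
#  if _GLIBCXX_USE_CXX11_ABI
    return "libstdc++, new (C++11) ABI";
#  else
    return "libstdc++, old (pre-5.1) ABI";
#  endif
#else
    return "not libstdc++, or a libstdc++ older than GCC 5.1";
#endif
}
```

Building the same file once normally and once with -D_GLIBCXX_USE_CXX11_ABI=0 should flip the result on a typical GCC installation, which makes it easy to benchmark your own string and list usage under both settings.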

Best practice for using the 'volatile' keyword in VS2012

Since upgrading our development and build environments from VS2008 to VS2012, I am confused about the implications of using the volatile keyword in our legacy codebase (it's quite extensive as there is a much copied pattern for managing threads from the "old" days).
Microsoft has the following remarks in the VS2012 documentation:
If you are familiar with the C# volatile keyword, or familiar with the behavior of volatile in earlier versions of Visual C++, be aware that the C++11 ISO Standard volatile keyword is different and is supported in Visual Studio when the /volatile:iso compiler option is specified. (For ARM, it's specified by default). The volatile keyword in C++11 ISO Standard code is to be used only for hardware access; do not use it for inter-thread communication. For inter-thread communication, use mechanisms such as std::atomic<T> from the C++ Standard Template Library.
It goes on to say:
When the /volatile:ms compiler option is used—by default when architectures other than ARM are targeted—the compiler generates extra code to maintain ordering among references to volatile objects in addition to maintaining ordering to references to other global objects.
I take this to mean, that our existing code won't break but won't necessarily be portable (not a problem for us).
However, it does raise these questions, on which I would like some advice, if possible:
Should we remove uses of volatile qualifiers in our code and replace them with C++11 ISO Standard compliant equivalents, even though we would not port the code away from MS?
If we don't do the above, is there any downside?
I appreciate that this is not really a specific programming problem but we're embarking on some quite major refactoring and I would like to be able to offer some sensible guidelines for this work.

If you have the time for it. The benefits are not that great: C++11 atomics may give you more precise control over exactly what kind of synchronization you need, and they have more clearly defined semantics, which may allow the compiler to optimize the code better.
In theory, though it is very unlikely, a future version of the compiler might drop support for the MS-style volatile completely. Or one day you may actually want to port away from the MS compiler, even if you stay on Windows. If you are refactoring now anyway, that might be a good time to replace the volatiles with atomics, saving you from doing the work in the future.
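As a sketch of what such a replacement might look like, assume the legacy pattern is a `volatile bool` stop flag polled by a worker thread (the question does not show its code, so this is a guess at a typical case). The function and variable names below are hypothetical.

```cpp
#include <atomic>
#include <thread>

// Legacy pattern (relies on /volatile:ms semantics):
//     volatile bool stop_requested = false;
// Standard replacement: std::atomic<bool> with explicit memory ordering.
long run_worker_until_stopped() {
    std::atomic<bool> stop_requested{false};
    long iterations = 0;

    std::thread worker([&] {
        // Acquire load pairs with the release store in the controlling thread.
        while (!stop_requested.load(std::memory_order_acquire)) {
            ++iterations;  // touched only by the worker until join()
        }
    });

    stop_requested.store(true, std::memory_order_release);
    worker.join();  // join() synchronises, so reading iterations is safe here
    return iterations;
}
```

Using the default sequentially consistent operations (plain `stop_requested = true;` and `if (stop_requested)`) is also valid and is the closest drop-in match when it is unclear which ordering the old code depended on.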

Is there a non-atomic equivalent of std::shared_ptr? And why isn't there one in <memory>?

This is a bit of a two part question, all about the atomicity of std::shared_ptr:
1. As far as I can tell, std::shared_ptr is the only smart pointer in <memory> that's atomic. I'm wondering if there is a non-atomic version of std::shared_ptr available (I can't see anything in <memory>, so I'm also open to suggestions outside of the standard, like those in Boost). I know boost::shared_ptr is also atomic (if BOOST_SP_DISABLE_THREADS isn't defined), but maybe there's another alternative? I'm looking for something that has the same semantics as std::shared_ptr, but without the atomicity.
2. I understand why std::shared_ptr is atomic; it's kinda nice. However, it's not nice for every situation, and C++ has historically had the mantra of "only pay for what you use." If I'm not using multiple threads, or if I am using multiple threads but am not sharing pointer ownership across threads, an atomic smart pointer is overkill. My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11? (assuming there is a why) (if the answer is simply "a non-atomic version was simply never considered" or "no one ever asked for a non-atomic version" that's fine!).
With question #2, I'm wondering if someone ever proposed a non-atomic version of shared_ptr (either to Boost or the standards committee) (not to replace the atomic version of shared_ptr, but to coexist with it) and it was shot down for a specific reason.

1. I'm wondering if there is a non-atomic version of std::shared_ptr available
Not provided by the standard. There may well be one provided by a "3rd party" library. Indeed, prior to C++11, and prior to Boost, it seemed like everyone wrote their own reference counted smart pointer (including myself).
2. My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11?
This question was discussed at the Rapperswil meeting in 2010. The subject was introduced by National Body Comment CH 20, submitted by Switzerland. There were strong arguments on both sides of the debate, including those you provide in your question. However, at the end of the discussion, the vote was overwhelmingly (but not unanimously) against adding an unsynchronized (non-atomic) version of shared_ptr.
Arguments against included:
Code written with the unsynchronized shared_ptr may end up being used in threaded code down the road, ending up causing difficult to debug problems with no warning.
Having one "universal" shared_ptr that is the "one way" to traffic in reference counting has benefits: From the original proposal:
Has the same object type regardless of features used, greatly facilitating interoperability between libraries, including third-party libraries.
The cost of the atomics, while not zero, is not overwhelming. The cost is mitigated by the use of move construction and move assignment which do not need to use atomic operations. Such operations are commonly used in vector<shared_ptr<T>> erase and insert.
Nothing prohibits people from writing their own non-atomic reference-counted smart pointer if that's really what they want to do.
The final word from the LWG in Rapperswil that day was:
Reject CH 20. No consensus to make a change at this time.
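The argument about move construction in the list above can be observed directly through use_count(). The following is a minimal sketch; the struct UseCounts and the function copy_versus_move are invented for this example.

```cpp
#include <memory>
#include <utility>

struct UseCounts {
    long after_copy;
    long after_move;
    bool source_empty;
};

// Copying a shared_ptr has to bump the shared reference count; moving one
// only transfers ownership and leaves the count untouched.
UseCounts copy_versus_move() {
    std::shared_ptr<int> original = std::make_shared<int>(42);

    std::shared_ptr<int> copy = original;          // reference count goes 1 -> 2
    const long after_copy = original.use_count();

    std::shared_ptr<int> moved = std::move(copy);  // count stays at 2
    const long after_move = original.use_count();

    // A moved-from shared_ptr is guaranteed to be empty.
    return UseCounts{after_copy, after_move, copy == nullptr};
}
```

This is why operations such as vector<shared_ptr<T>> insert and erase, which relocate elements by moving them, do not pay for reference-count updates on each element.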

Howard's answered the question well already, and Nicol made some good points about the benefits of having a single standard shared pointer type, rather than lots of incompatible ones.
While I completely agree with the committee's decision, I do think there is some benefit to using an unsynchronized shared_ptr-like type in special cases, so I've investigated the topic a few times.
If I'm not using multiple threads, or if I am using multiple threads but am not sharing pointer ownership across threads, an atomic smart pointer is overkill.
With GCC when your program doesn't use multiple threads shared_ptr doesn't use atomic ops for the refcount. This is done by updating the reference counts via wrapper functions that detect whether the program is multithreaded (on GNU/Linux this is done by checking a special variable in Glibc that says if the program is single-threaded[1]) and dispatch to atomic or non-atomic operations accordingly.
I realised many years ago that because GCC's shared_ptr<T> is implemented in terms of a __shared_ptr<T, _LockPolicy> base class, it's possible to use the base class with the single-threaded locking policy even in multithreaded code, by explicitly using __shared_ptr<T, __gnu_cxx::_S_single>. You can use an alias template like this to define a shared pointer type that is not thread-safe, but is slightly faster[2]:
template<typename T>
using shared_ptr_unsynchronized = std::__shared_ptr<T, __gnu_cxx::_S_single>;
This type would not be interoperable with std::shared_ptr<T> and would only be safe to use when it is guaranteed that the shared_ptr_unsynchronized objects would never be shared between threads without additional user-provided synchronization.
This is of course completely non-portable, but sometimes that's OK. With the right preprocessor hacks your code would still work fine with other implementations if shared_ptr_unsynchronized<T> is an alias for shared_ptr<T>, it would just be a little faster with GCC.
[1] Before Glibc 2.33 added that variable, the wrapper functions would detect whether the program links to libpthread.so as an imperfect method of checking for single-threaded vs multi-threaded.
[2] Unfortunately because that wasn't an intended use case it didn't quite work optimally before GCC 4.9, and some operations still used the wrapper functions and so dispatched to atomic operations even though you've explicitly requested the `_S_single` policy. See point (2) at http://gcc.gnu.org/ml/libstdc++/2007-10/msg00180.html for more details and a patch to GCC to allow the non-atomic implementation to be used even in multithreaded apps. I sat on that patch for years but I finally committed it for GCC 4.9.
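The "preprocessor hacks" mentioned above could look roughly like the following sketch. It assumes that checking the __GLIBCXX__ macro is an acceptable way to detect libstdc++, and it relies on the internal std::__shared_ptr class, which is not a documented interface and may change between GCC releases. The helper demo_use_count is hypothetical.

```cpp
#include <memory>

#if defined(__GLIBCXX__)
#include <ext/concurrence.h>  // declares __gnu_cxx::_S_single
// libstdc++ only: internal base class with the single-threaded lock policy.
template <typename T>
using shared_ptr_unsynchronized = std::__shared_ptr<T, __gnu_cxx::_S_single>;
#else
// Other standard libraries: fall back to the ordinary, synchronized type.
template <typename T>
using shared_ptr_unsynchronized = std::shared_ptr<T>;
#endif

// Small demonstration: two owners of the same object.
long demo_use_count() {
    shared_ptr_unsynchronized<int> first(new int(7));
    shared_ptr_unsynchronized<int> second = first;
    return second.use_count();
}
```

As noted above, objects of this type must never be shared between threads without additional synchronization, and on libstdc++ they are not interchangeable with std::shared_ptr<T>.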

My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11? (assuming there is a why).
One could just as easily ask why there isn't an intrusive pointer, or any number of other possible variations of shared pointers one could have.
The design of shared_ptr, handed down from Boost, has been to create a minimum standard lingua franca of smart pointers. The idea is that, generally speaking, you can just pull it down off the wall and use it. It's something that would be used generally, across a wide variety of applications. You can put it in an interface, and odds are good people will be willing to use it.
Threading is only going to get more prevalent in the future. Indeed, as time passes, threading will generally be one of the primary means to achieve performance. Requiring the basic smart pointer to do the bare minimum needed to support threading facilitates this reality.
Dumping a half-dozen smart pointers with minor variations between them into the standard, or even worse a policy-based smart pointer, would have been terrible. Everyone would pick the pointer they like best and forswear all others. Nobody would be able to communicate with anyone else. It'd be like the current situations with C++ strings, where everyone has their own type. Only far worse, because interoperation with strings is a lot easier than interoperation between smart pointer classes.
Boost, and by extension the committee, picked a specific smart pointer to use. It provided a good balance of features and was widely and commonly used in practice.
std::vector has some inefficiencies compared to naked arrays in some corner cases too. It has some limitations; some uses really want to have a hard limit on the size of a vector, without using a throwing allocator. However, the committee didn't design vector to be everything for everyone. It was designed to be a good default for most applications. Those for whom it can't work can just write an alternative that suits their needs.
Just as you can for a smart pointer if shared_ptr's atomicity is a burden. Then again, one might also consider not copying them around so much.

Boost provides a shared_ptr that's non-atomic. It's called local_shared_ptr, and can be found in the smart pointers library of boost.

I am preparing a talk on shared_ptr at work. I have been using a modified boost shared_ptr that avoids a separate malloc (like what make_shared can do) and takes a template parameter for the lock policy, like the shared_ptr_unsynchronized mentioned above. As a test I am using the program from
http://flyingfrogblog.blogspot.hk/2011/01/boosts-sharedptr-up-to-10-slower-than.html
after cleaning up the unnecessary shared_ptr copies. The program uses the main thread only, and the test argument is shown in parentheses. The test environment is a notebook running Linux Mint 14. Here are the times taken, in seconds:
test run setup    boost(1.49)    std with make_shared        modified boost
mt-unsafe(11)     11.9           9/11.5 (-pthread on)        8.4
atomic(11)        13.6           12.4                        13.0
mt-unsafe(12)     113.5          85.8/108.9 (-pthread on)    81.5
atomic(12)        126.0          109.1                       123.6
Only the 'std' version uses -std=c++11, and -pthread likely switches the lock policy in g++'s __shared_ptr class.
From these numbers, I see the impact of atomic instructions on code optimization. The test case does not use any C++ containers, but vector<shared_ptr<some_small_POD>> is likely to suffer if the object doesn't need the thread protection. Boost suffers less, probably because the additional malloc limits the amount of inlining and code optimization.
I have yet to find a machine with enough cores to stress test the scalability of atomic instructions, but using std::shared_ptr only when necessary is probably better.
