Can C++ attributes be used to replace OpenMP pragmas? - c++

C++ attributes provide a convenient and standardized way to mark up code with extra information for the compiler and/or other tools.
Using OpenMP involves adding a lot of #pragma omp ... lines to the source (for example, to mark a loop for parallel processing). These #pragma lines seem to be excellent candidates for a facility such as generalized attributes.
For example, #pragma omp parallel for might become [[omp::parallel(for)]].
The often inaccurate cppreference.com uses such an attribute as an example here, which confirms it has at least been considered (by someone).
Is there a mapping of OpenMP pragmas to C++ attributes currently available and supported by any/all of the major compilers? If not, are there any plans underway to create one?

This is definitely a possibility and it's even something the OpenMP language committee is looking at. Take a look at OpenMP Technical Report 8 (https://www.openmp.org/wp-content/uploads/openmp-TR8.pdf) page 36, where a syntax for using OpenMP via attributes is proposed. Inclusion in TR8 doesn't guarantee its inclusion in version 5.1, but it shows that it's being discussed. This syntax is largely based on the work done in the original proposal for C++ attributes.
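For illustration, here is a rough side-by-side sketch. The attribute spelling follows the [[omp::directive(...)]] syntax along the lines of what TR8 proposes; it is illustrative only, compiler support is not yet there, and process() and n are placeholders:

    // Conventional pragma form, which works with today's OpenMP compilers:
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        process(i);

    // Attribute form along the lines of the TR8 proposal (illustrative only):
    [[omp::directive(parallel for)]]
    for (int i = 0; i < n; ++i)
        process(i);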
If you have specific feedback on this, I'd encourage you to share it via the OpenMP forum (http://forum.openmp.org/forum/viewforum.php?f=26).

Related

Is combining std::execution and OpenMP advisable?

I have been using OpenMP for some time now. Recently, on a new project, I chose to use C++17 for some of its features.
Because of that, I have become interested in std::execution, which allows algorithms to be parallelized. That seems really powerful and elegant, but there are a lot of really useful OpenMP features that are not easy to use with the standard algorithms (barriers, SIMD, critical sections, etc.).
So I am thinking of mixing std::execution::par (or par_unseq) with OpenMP. Is that a good idea, or should I stay with OpenMP only?
Unfortunately this is not officially supported. It may or may not work, depending on the implementation, but it is not portable.
Only the most recent version, OpenMP 5.0, even defines the interaction with C++11. In general, using anything from C++11 onward "may result in unspecified behavior". The specification states that while future versions of OpenMP are expected to address the following features, currently their use may result in unspecified behavior:
Alignment support
Standard layout types
Allowing move constructs to throw
Defining move special member functions
Concurrency
Data-dependency ordering: atomics and memory model
Additions to the standard library
Thread-local storage
Dynamic initialization and destruction with concurrency
C++11 library
While C++17 and its specific high-level parallelism support are not mentioned, it is clear from this list that they are unsupported.
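To make the question concrete, here is a hedged sketch of the kind of mixing being asked about (the names are placeholders): a C++17 parallel algorithm on the outside with an OpenMP construct inside the per-element work. Whether this behaves sensibly is entirely implementation dependent, because the OpenMP specification does not define it:

    #include <algorithm>
    #include <cstddef>
    #include <execution>
    #include <vector>

    void scale_rows(std::vector<std::vector<float>>& rows)
    {
        // Outer parallelism from the C++17 parallel algorithms...
        std::for_each(std::execution::par, rows.begin(), rows.end(),
                      [](std::vector<float>& row) {
                          // ...inner vectorization from an OpenMP construct.
                          // Combining the two is not covered by either standard.
                          #pragma omp simd
                          for (std::size_t i = 0; i < row.size(); ++i)
                              row[i] *= 2.0f;
                      });
    }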

How to implement user-defined reduction with OpenACC?

Is there a way to implement a user-defined reduction with OpenACC similar to declare reduction in OpenMP?
So that I could write something like
#pragma acc loop reduction(my_function:my_result)
Or what would be the appropriate way to implement an efficient reduction without the predefined operators?
User-defined reductions aren't yet part of the OpenACC standard. While I'm not part of the OpenACC technical committee, I believe they have received requests for this, but I'm not sure whether it's being considered for the 3.0 standard.
Since the OpenACC standard is largely user driven, I'd suggest you send a note to the OpenACC folks requesting this support. The more folks that request it, the more likely it is to be adopted in the standard.
Contact info for OpenACC can be found at the bottom of https://www.openacc.org/about
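In the meantime, one common workaround is to compute partial results in parallel and apply the user-defined combining step yourself. The following is only a rough sketch, assuming the combiner can be written as an ordinary function; my_combine, my_value, and the chunking scheme are all placeholders for your own code and tuning:

    #include <vector>

    #pragma acc routine seq
    double my_combine(double a, double b);   // hypothetical user-defined combiner
    #pragma acc routine seq
    double my_value(double x);               // hypothetical per-element value

    double custom_reduce(const double* data, int n, double identity)
    {
        const int chunks = 1024;             // tuning parameter
        std::vector<double> partial(chunks, identity);
        double* p = partial.data();

        // Each chunk produces one partial result; the chunks run in parallel.
        #pragma acc parallel loop copy(p[0:chunks]) copyin(data[0:n])
        for (int c = 0; c < chunks; ++c) {
            double acc = identity;
            for (int i = c; i < n; i += chunks)   // strided slice, sequential per chunk
                acc = my_combine(acc, my_value(data[i]));
            p[c] = acc;
        }

        // Final fold over the partials on the host.
        double result = identity;
        for (int c = 0; c < chunks; ++c)
            result = my_combine(result, p[c]);
        return result;
    }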

What exactly does C++17 change regarding parallelism? (And where is the authoritative doc?)

Is parallelism as such a part of the C++17 changes? When I Google "c++ parallelism" (without quotes), I come across a few different docs, and I can't piece together the timeline or the definitive changes.
There's at least one Technical Specification at open-std.org, such as N4578.
There's this doc at the ISO CPP website.
A fairly recent and deep dive into parallel computing in C++ doesn't mention C++17 at all.
Where's the single source of truth? Is parallelism part of C++17 or a separate TS?
Well, you could have a look at the in-depth C++17 feature list. There, you will see that the Parallelism TS is part of C++17.
The single source of truth is isocpp. If you look at the status page, you will see that Parallelism I is shown in dark green, meaning that it will be merged into the C++ Standard.
The change is that most (if not all) of the algorithms in <algorithm> gain an additional overload that takes an ExecutionPolicy, with which you can specify that the algorithm should run in parallel. Here is a complete list.
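For example, a minimal sketch of the new overloads in use (it needs a standard library that actually ships the parallel algorithms):

    #include <algorithm>
    #include <execution>
    #include <vector>

    int main()
    {
        std::vector<int> v{5, 3, 1, 4, 2};
        // The new overload: the first argument selects the execution policy.
        std::sort(std::execution::par, v.begin(), v.end());
    }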

C++1z Coroutines a language feature?

Why will coroutines (as of now in the newest drafts for C++1z) be implemented as a core language feature (fancy keywords and all) as opposed to a library extension?
There already exist a couple of implementations of them (Boost.Coroutine, etc.), some of which can be made platform independent, from what I have read.
I'm not saying they shouldn't, but Bjarne Stroustrup himself mentioned in some talk (I don't know which one any more) that new features should be implemented in libraries as far as possible rather than touching the core language.
So is there a good reason to do so? What are the benefits?
While there are library implementation of coroutines, these tend to have specific restrictions. For example, a library implementation cannot detect what variables need to be maintained when a coroutine is suspended. It is possible to work around this need, e.g., by making the used variables explicit in some form. However, when coroutines should behave like normal functions as much as possible, it should be possible to define local variables.
I don't think any of the implementers of Boost coroutines thinks that their respective library interface is ideal. While it is the best that can be achieved in the current language, the overall usability can be improved.
At CppCon 2015, Gor Nishanov from Microsoft made the argument that C++ Coroutines can be a negative overhead abstraction. The paper from his talk is here.
If you take a look at his example, the ability to use a coroutine simplified the control flow of the network code, and when implemented at the compiler level gives you smaller code that has twice the throughput of the original. He makes the argument that really the ability to yield should be a feature of a C++ function.
They have an initial implementation in Visual Studio 2015, so you can try it out for your use case and see how it compares to the Boost implementation. It looks like they are still trying to hash out whether they will use the async/yield keywords, though, so keep an eye on where the standard goes.
The resumable functions proposal for C++ can be found here and the update here. Unfortunately, it didn't make it into C++17, but is now a Technical Specification (P0057R2). On the upside, it looks like there is support in Clang with the -fcoroutines-ts flag and in Visual Studio 2015 Update 2. The keywords also have co_ prepended to them, so co_await, co_yield, etc.
Coroutines are a built-in feature in Go, D, Python, C#, and will be in the new JavaScript standard (ES6). If C++ comes up with a more efficient implementation, I wonder whether it would displace Go adoption.
Resumable functions from C++1z support stackless context switching, while Boost.Coroutine(2) provides stackful context switching.
The difference is that with stackful context switching the stack frames of functions called within the coroutine remain intact when the context is suspended, while the stack frames of subroutines are removed when a resumable function (C++1z) is suspended.
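To make the "behave like a normal function" point concrete, here is a minimal generator sketch in the style of the coroutines TS, using <experimental/coroutine> (under C++20 the header is <coroutine> and the std::experimental qualification goes away). The IntGenerator type is hand-written illustrative boilerplate, not part of any proposal:

    #include <experimental/coroutine>

    namespace coro = std::experimental;

    struct IntGenerator {
        struct promise_type {
            int current = 0;
            IntGenerator get_return_object() {
                return {coro::coroutine_handle<promise_type>::from_promise(*this)};
            }
            auto initial_suspend() { return coro::suspend_always{}; }
            auto final_suspend() noexcept { return coro::suspend_always{}; }
            auto yield_value(int v) { current = v; return coro::suspend_always{}; }
            void return_void() {}
            void unhandled_exception() {}
        };

        coro::coroutine_handle<promise_type> handle;

        ~IntGenerator() { if (handle) handle.destroy(); }
        bool next() { handle.resume(); return !handle.done(); }
        int value() const { return handle.promise().current; }
    };

    // Local state (i and limit) is preserved across every co_yield suspension,
    // which is exactly what a pure library implementation cannot detect for you.
    IntGenerator counter(int limit) {
        for (int i = 0; i < limit; ++i)
            co_yield i;
    }

    // Usage:
    //   auto g = counter(3);
    //   while (g.next()) consume(g.value());   // produces 0, 1, 2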

Best practice for using the 'volatile' keyword in VS2012

Since upgrading our development and build environments from VS2008 to VS2012, I am confused about the implications of using the volatile keyword in our legacy codebase (it's quite extensive as there is a much copied pattern for managing threads from the "old" days).
Microsoft has the following remarks in the VS2012 documentation:
If you are familiar with the C# volatile keyword, or familiar with the behavior of volatile in earlier versions of Visual C++, be aware that the C++11 ISO Standard volatile keyword is different and is supported in Visual Studio when the /volatile:iso compiler option is specified. (For ARM, it's specified by default). The volatile keyword in C++11 ISO Standard code is to be used only for hardware access; do not use it for inter-thread communication. For inter-thread communication, use mechanisms such as std::atomic<T> from the C++ Standard Template Library.
It goes on to say:
When the /volatile:ms compiler option is used—by default when architectures other than ARM are targeted—the compiler generates extra code to maintain ordering among references to volatile objects in addition to maintaining ordering to references to other global objects.
I take this to mean that our existing code won't break but won't necessarily be portable (not a problem for us).
However, it does raise these questions, on which I would like some advice, if possible:
Should we remove uses of volatile qualifiers in our code and replace them with C++11 ISO Standard compliant equivalents, even though we would not port the code away from MS?
If we don't do the above, is there any downside?
I appreciate that this is not really a specific programming problem but we're embarking on some quite major refactoring and I would like to be able to offer some sensible guidelines for this work.
If you have the time for it, yes. The benefits are not that great: C++11 atomics may allow more precise control over exactly what kind of synchronization you need, and they have more clearly defined semantics, which may allow the compiler to optimize the code better.
In theory (though it is very unlikely), a future version of the compiler might drop support for MS-style volatile completely. Or one day you may actually want to port away from the MS compiler, even if you stay on Windows. If you're doing refactoring now, that might be a good time to replace the volatiles with atomics, saving you from doing the work later.
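As a concrete illustration of the replacement, here is a sketch of the typical legacy stop-flag pattern and its C++11 equivalent (the names are illustrative):

    #include <atomic>
    #include <thread>

    // Legacy pattern, which relies on /volatile:ms semantics:
    //   volatile bool stop_requested = false;

    std::atomic<bool> stop_requested{false};   // portable C++11 replacement

    void worker()
    {
        while (!stop_requested.load(std::memory_order_acquire)) {
            // ... do work ...
        }
    }

    void request_stop(std::thread& t)
    {
        stop_requested.store(true, std::memory_order_release);
        t.join();
    }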