Is there a way to get my hands on the intermediate source code produced by the OpenMP pragmas?
I would like to see how each kind of pragma is translated.
Cheers.
OpenMP pragmas are part of a C/C++ compiler's implementation (Fortran compilers support the equivalent directives). Before using them, you therefore need to make sure your compiler supports them. If they are not supported, they are silently ignored, so you may get no errors at compilation, but the multi-threading won't work. In any case, since the pragmas are handled inside the compiler, the best intermediate result you can get is lower-level code, not transformed C/C++ source. OpenMP is a language extension plus libraries, macros, etc., as opposed to Pthreads, which arms you purely with a library.
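If you just want to see what the compiler does with the directives, one option (assuming GCC; other compilers have their own dump facilities, and the file name here is only a placeholder) is to ask for its internal dump files:
gcc -fopenmp -fdump-tree-all -c foo.c
Among the many dump files this produces are the ones written by the OpenMP lowering and expansion passes; the exact file names vary by GCC version, and the contents are GIMPLE-like intermediate code rather than transformed C/C++ source.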
Related
OpenACC has some pragmas and runtime routines that can be used to achieve essentially the same thing.
For example, there is #pragma acc wait and acc_wait() or #pragma acc update [...] and acc_update_[...]().
I started to mostly use the runtime routines in my C++ code.
Is there a difference? Should I prefer one over the other or is it just a matter of style and personal preference?
In general, the pragmas are preferred, since they will be ignored by other compilers and when compiling without OpenACC enabled. The runtime API calls would need to be guarded by a macro, like "#ifdef _OPENACC", to maintain portability.
Though if you don't mind adding the macro guards, or losing portability, then it's mostly a matter of style. Functionally, they are equivalent.
I'm trying to emulate C asserts in Fortran in order to enforce pre- and post-conditions of all my procedures. This way, I get to provide the user with more detailed information about run-time errors than I could reasonably be expected to maintain otherwise.
To implement this, I used the preprocessor macros __FILE__ and __LINE__ and defined an assert macro which expands to a Fortran subroutine call (a rough sketch of the idea is shown after the build commands below). Rather than try to describe it in full here, I made a git repo with a little bit of example code in it. If you build it with
make test
./test
the function hangs, because you called a function that expects a positive argument with a negative one. However, if you build with
make test DEBUG=1
./test
the error is caught by an assertion.
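For reference, a minimal sketch of the kind of macro involved (the names ASSERT, assert_failed and DEBUG are illustrative here, not necessarily the ones used in the repo):
#ifdef DEBUG
#define ASSERT(cond) if (.not. (cond)) call assert_failed(__FILE__, __LINE__)
#else
#define ASSERT(cond)
#endif

subroutine assert_failed(file, line)
  ! Report where the failed assertion lives, then abort.
  character(len=*), intent(in) :: file
  integer, intent(in) :: line
  write (*, '(a, a, a, i0)') 'Assertion failed at ', file, ':', line
  error stop
end subroutine assert_failed
A call such as ASSERT(x > 0) then expands to the runtime check in a debug build and to nothing otherwise (presumably make test DEBUG=1 simply passes something like -DDEBUG to the compiler).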
This works with both gfortran and the Intel Fortran compiler. I don't have access to other Fortran compilers. Can I reasonably expect other compilers to do the necessary source pre-processing if the file extension is .F90? Or should I be relying on the -cpp flag? What's the most portable way to do this? Should I even do this at all?
It is reasonable to expect that a suitable "C-like" preprocessor is available, though undoubtedly there will be exceptions for some compilers or tools.
The definition of portability can depend on your perspective, but given that a large number of systems run with a case-insensitive file system, it is not reasonable to rely on the case of the file extension alone to specify that the preprocessor needs to be run. Most Fortran-related build systems will have some way of making that specification explicit.
Whether this is a good idea is a bit more subjective. Perhaps it is only nominal in terms of impact, but requiring the preprocessor still represents a reduction in portability and an increase in build complexity. Depending on compiler, use of the preprocessor may hinder use of things like standard conformance diagnostics.
Consequently, my preference for relatively simple use cases like this is to have the assertion coded as normal Fortran source - an if statement testing a named constant from a debug module or similar, invoking [ERROR] STOP (not exit) with a descriptive message if the assertion expression fails.
USE DebuggingFlags
IF (debug_flag) THEN
  IF (x <= 0) ERROR STOP 'negative or zero x in sqrrt!'
END IF
This won't give you file and line information, but as long as you are somewhat selective with the STOP message the relevant source shouldn't be too hard to locate.
"Release" builds are made with the debugging flag constant defined as false (or your chosen equivalent) and with any sort of reasonable compiler optimisation active the object code associated with the assertion should be identified and eliminated as dead.
But there are pros and cons.
Fortran 95+ has conditional compilation (COCO) defined in a separate part of the standard, but only a few compilers support it. The de facto standard (since Fortran 90, I believe) is still cpp and fpp, which most compilers support. The two are highly compatible, though not 100%. Given that, relying on a cpp/fpp-style preprocessor should be safe for most cases.
Adding to the answer of @LeleDumbo:
As far as I can tell, the Fortran 2008 standard does not specify any form of preprocessing. This also means that there is no standard way of specifying how to invoke the preprocessor. However, it is common practice to use the .F90 extension to indicate the need for preprocessing.
Concerning COCO: the third part of the Fortran standard, ISO 1539-3, which was to specify conditional compilation, was withdrawn.
Consider this simple case scenario:
I download the pre-built binaries of a C++ compiler (say Clang or GCC or anything else) for my generic OS (which is not Windows). I compile my code, which consists of some computationally expensive mathematical calculation, with the optimization flag -O3, and I get an execution time of T1.
On a different attempt, instead of using the pre-built binaries, I download the source code and build the compiler myself on my generic machine. I compile the same code with the same optimization flag and get an execution time of T2.
Will T2 < T1, or will they be more or less the same?
In other words, is the execution time independent of the way the compiler is built?
The compiler's optimization of your code is the result of the behavior of the compiler, not the performance of the compiler.
As long as the compiler has the same behavioral design, it will produce exactly the same output.
Generally, the same compiler version should generate the same assembler code given the same C or C++ input. However, there are certain things that might further affect the code that is generated when you run the compiler.
A distro might have backported (or even created its own) patches from other versions.
Modern compilers often have library dependencies (e.g. CLooG) that may behave differently in different versions, causing the compiler to base its code-generation decisions on essentially different data.
These libraries may (in some compiler versions) be optional at compile time (you might need to give --enable switches to configure, or configure tries to autodetect them).
Compiler switches like -march=native will look at the hardware you compile on and try to optimize accordingly.
A time limit in the compiler's optimizer might trigger, essentially allowing better optimizations on faster machines; the same goes for memory limits (I don't think that is found in modern compilers anymore, though).
That said, even the same assembler output might perform differently on your machine and theirs, e.g. because the code is optimized for AMD but run on Intel, or vice versa.
In my opinion, and in theory, compilation speed can be faster, since you can tell the compiler that compiles the compiler to target your computer, so the resulting compiler binary can use your processor's own instructions.
But I don't think the compiler's optimization of your code gets any better that way. To make the compiler's optimization better, something new has to be put into the compiler itself, not just a recompile.
That depends on how that compiler is implemented and on your platform, but the answer is most likely "no".
If your platform provides specific functionality that can improve the performance of your program, the optimizer in your compiler might use that functionality to produce a faster program. The optimizer can do so only if the compiler writer was aware of the functionality and implemented special treatment for your platform in the optimizer. If that is the case, the detection might be done dynamically in the optimizer, meaning any build of the optimizer can detect the platform and optimize your code. Only if the detection has to occur at compile time of the optimizer for some reason could recompiling it on your platform give that advantage. But if such a better build for your platform exists, the compiler vendor has most likely provided binaries for it.
So, with all these ifs, it's unlikely that your program will be any faster when you recompile the compiler on your platform. There is a chance, however, that the compiler itself will be a bit faster if it is optimized for your platform rather than being a generic binary, resulting in shorter compile times.
I have a program written in Fortran with more than 100 subroutines, around 30 of which contain OpenMP code. I was wondering what the best procedure to compile these subroutines is. When I compiled all the files at once, I found that the OpenMP-compiled code runs even slower than the one built without OpenMP. Should I compile the subroutines with OpenMP directives separately? What is the best practice under these conditions?
Thank you so much.
Best Regards,
Jdbaba
OpenMP-aware compilers look for the OpenMP sentinel (in Fortran, a directive marker such as !$omp at the beginning of what otherwise looks like a comment line). Therefore, sources without OpenMP code compiled with an OpenMP-aware compiler should result in identical or very similar object files (and executable).
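For illustration, in free-form Fortran an OpenMP directive is just a line starting with the !$omp sentinel, which a compiler that is not in OpenMP mode treats as an ordinary comment (the variable names here are placeholders):
!$omp parallel do
do i = 1, n
   a(i) = b(i) + c(i)
end do
!$omp end parallel do
With gfortran, this loop is parallelised only when -fopenmp is given; without the flag, the directive lines are simply ignored and the loop compiles as serial code.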
Edit: One should note that, as stated by Hristo Iliev, enabling OpenMP could affect the serial code, for example by using OpenMP versions of libraries that may differ in algorithm (to be more effective in parallel) and in optimizations.
Most likely, the problem here is more related to the algorithms in your code.
Or perhaps you did not compile with the same optimization flags when comparing OpenMP and non-OpenMP versions.
I have a task to create optimized C++ source code and give it to a friend for compilation. That means I do not control the final compilation; I just write the source code of the C++ program.
I know that I can enable optimization during compilation with the -O1 (and -O2 and other) options of GCC. But how can I get this optimized source code instead of the compiled program? I am not able to configure the parameters of my friend's compiler, which is why I need to produce good source code on my side.
The optimizations performed by GCC are low level; that means you won't get C++ code back, but assembly code at best, and you won't be able to convert that back into source.
In sum: optimize at the source-code level, not at the object level.
You could ask GCC to dump its internal (Gimple, ...) representations, at various "stages". The middle-end of GCC is made of hundreds of passes, and you could ask GCC to dump them, with arguments like -fdump-tree-all or -fdump-gimple-all; beware that you can get hundreds of dump files for a single compilation!
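For example (the file name is only a placeholder):
g++ -O2 -fdump-tree-all -c foo.cpp
This writes one numbered dump file per pass; the naming scheme varies between GCC versions.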
However, GCC internal representations are quite low level, and you should not expect to understand them without reading a lot of material.
The dump options I am mentioning are mostly useful to those working inside GCC, or extending it through plugins coded in C or extensions coded in MELT (a high-level domain-specific language to extend GCC). I am not sure they will be very useful to your friend. However, they can be useful for understanding that the optimization passes do a lot of complex processing.
And don't forget that premature optimization is evil: you should first make your program run correctly, then benchmark and profile it, and only then optimize the few parts worth your effort. You probably won't be able to write correct and efficient programs without testing and running them yourself before giving them to your friend.
Easy - choose the best algorithm possible, let the rest be handled by the optimizer.
Optimizing the source code is different from optimizing the binary. You optimize the source code; the compiler will optimize the binary.
For anything more than algorithm choice, you'll need to do some profiling. Sure, there are practices that can speed up the code, but some make it less readable. Only optimize when you have to, and after you measure.