When compiling a C++ program with OpenMP directives using LLVM clang++ 4.9.2, I see that a hidden function #.omp_outlined. was generated in the bitcode. I wonder what this hidden function is and whether it is possible to avoid generating it.
No, you cannot prevent the generation of outlined code in Clang. Outlining the parallel region code into its own function is how OpenMP is implemented by Clang and many other compilers, including GCC, Intel, and MSVC. More details on how GCC implements it can be found here and here. Clang follows more or less the same method.
To my knowledge, the only OpenMP compiler that does not explicitly outline OpenMP parallel regions is the one from PGI. It does some stack frame magic to enable multiple threads to execute portions of the function code.
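As an illustration, here is roughly what outlining does to a parallel region. This is a hand-written C++ sketch with made-up names (shared_ctx, omp_runtime_fork), not Clang's actual output; the real fork entry point is __kmpc_fork_call in Clang's runtime and GOMP_parallel in GCC's libgomp:

    // What you write:
    //     void scale(int n, double* a) {
    //         #pragma omp parallel for
    //         for (int i = 0; i < n; ++i)
    //             a[i] *= 2.0;
    //     }

    // Conceptually what the compiler generates instead (illustrative names):
    struct shared_ctx { int n; double* a; };   // variables shared with the region

    // hypothetical runtime entry point; the real one is __kmpc_fork_call in
    // Clang's runtime (GOMP_parallel in GCC's libgomp)
    void omp_runtime_fork(void (*fn)(shared_ctx*, int, int), shared_ctx* ctx);

    // the outlined parallel region, executed by every thread of the team;
    // this is what shows up as the hidden .omp_outlined. in the bitcode
    static void omp_outlined(shared_ctx* ctx, int tid, int nthreads) {
        for (int i = tid; i < ctx->n; i += nthreads)   // each thread takes a slice
            ctx->a[i] *= 2.0;
    }

    void scale(int n, double* a) {
        shared_ctx ctx{n, a};
        omp_runtime_fork(&omp_outlined, &ctx);   // fork the team, run, join
    }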
Related
I was told that clang is a driver that works like gcc, doing the preprocessing, compilation and linking work. During compilation and linking, as far as I know, it's actually LLVM that does the optimization ("-O1", "-O2", "-O3", "-Os", "-flto").
But I just cannot understand how llvm is involved.
It seems that compiling source code doesn't even need a static library such as libLLVMCore.a. Instead, on Debian the clang package depends on another package called libllvm-3.4 (the clang version is 3.4), which contains libLLVM-3.4.so(.1). Does clang use this shared library for the optimization?
I've looked through the clang source code for a while and found that include/clang/Driver/Options.td contains the related options, but unfortunately I failed to find the source files that include that file, so I'm still unclear about the mechanism.
I hope someone might give me some hints.
(TL;DontWannaRead - skip to the end of this answer)
To answer your question properly you first need to understand the difference between a compiler's front-end and back-end (especially the first one).
Clang is a compiler front-end (http://en.wikipedia.org/wiki/Clang) for C, C++, Objective C and Objective C++ languages.
Clang's duty is translating from C++ source code (or C, or Objective C, etc.) to LLVM IR, a textual, lower-level representation of what that code should do. In order to do this, Clang employs a number of sub-modules whose descriptions you can find in any decent compiler construction book: a lexer, a parser plus a semantic analyzer (Sema), etc.
LLVM is a set of libraries whose primary task is this: suppose we have the LLVM IR representation of the following C++ function
    int double_this_number(int num) {
        int result = 0;
        result = num;
        result = result * 2;
        return result;
    }
the core LLVM passes should optimize that LLVM IR code:
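For example, the straight translation and the optimized result could look roughly like this (a sketch; exact instruction spellings and register names vary between LLVM versions):

    ; unoptimized IR, roughly what the front-end emits at -O0
    define i32 @double_this_number(i32 %num) {
      %result = alloca i32
      store i32 0, i32* %result        ; int result = 0;
      store i32 %num, i32* %result     ; result = num;
      %tmp = load i32* %result
      %mul = mul nsw i32 %tmp, 2       ; result = result * 2;
      store i32 %mul, i32* %result
      %ret = load i32* %result
      ret i32 %ret                     ; return result;
    }

    ; after the optimization passes: the dead stores are gone and the
    ; multiply by 2 has been strength-reduced to a left shift
    define i32 @double_this_number(i32 %num) {
      %1 = shl i32 %num, 1
      ret i32 %1
    }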
What to do with the optimized LLVM IR code is entirely up to you: you can translate it to x86_64 executable code or modify it and then spit it out as ARM executable code or GPU executable code. It depends on the goal of your project.
The term "back-end" is often confusing since there are many papers that would define the LLVM libraries a "middle end" in a compiler chain and define the "back end" as the final module which does the code generation (LLVM IR to executable code or something else which no longer needs processing by the compiler). Other sources refer to LLVM as a back end to Clang. Either way, their role is clear and they offer a powerful mechanism: whatever the language you're targeting (C++, C, Objective C, Python, etc..) if you have a front-end which translates it to LLVM IR, you can use the same set of LLVM libraries to optimize it and, as long as you have a back-end for your target architecture, you can generate optimized executable code.
Recalling that LLVM is a set of libraries (not just optimization passes but also data structures, utility modules, diagnostic modules, etc.), Clang also leverages many LLVM libraries during its own front-end work. You can't really tear every LLVM module away from Clang, since the latter is built on the former.
As for the reason why Clang is called a "compilation driver": Clang handles interpreting the command line parameters (descriptions and many declarations are TableGen'd and might require a bit more than a simple grep to track through the sources), decides which jobs and phases are to be executed, sets up the CodeGenOptions according to the desired/possible optimization and transformation levels, and invokes the appropriate modules (clangCodeGen, in BackendUtil.cpp, is the one that populates a module pass manager with the optimizations to apply) and external tools (e.g. the platform's linker). It steers the compilation process from the very beginning to the end.
Finally, I would suggest reading the Clang and LLVM documentation; they're quite thorough, and they should be the first place you look for an answer to most of your questions.
It's not exactly like GCC, so don't spend too much time trying to match the two precisely.
The LLVM compiler is a compiler for one specific language: LLVM IR. What Clang does is compile C++ code to LLVM IR, without optimizations. Clang can then invoke the LLVM compiler to turn that LLVM IR into optimized assembly.
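You can watch this split happen yourself with the standalone tools (a sketch; exact flag spellings may vary slightly between versions):

    clang++ -S -emit-llvm foo.cpp -o foo.ll   # front end: C++ -> LLVM IR
    opt -O3 -S foo.ll -o foo.opt.ll           # middle end: optimize the IR
    llc foo.opt.ll -o foo.s                   # back end: IR -> target assembly

In normal use, the clang driver runs all of these stages internally in a single invocation.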
I've been struggling with a weird problem for the last few days. We create some libraries using GCC 4.8 which link some of their dependencies statically, e.g. log4cplus or boost. For these libraries we have created Python bindings using boost-python.
Every time such a library used TLS (like log4cplus does in its static initialization, or libstdc++ does when throwing an exception - not only during the initialization phase), the whole thing crashed with a segfault - and every time, the address of the thread-local variable was 0.
I tried everything: recompiling, ensuring -fPIC is used, ensuring -ftls-model=global-dynamic is used, etc. No success. Then today I found out that the reason for these crashes was the way we linked OpenMP in: we did it using "-lgomp" instead of just using "-fopenmp". Since changing this, everything works fine - no crashes, no nothing. Fine!
But I'd really like to know what the cause of the problem was. So what's the difference between these two ways of linking in OpenMP?
We have a CentOS 5 machine here on which we installed GCC 4.8 in /opt/local/gcc48, and we verified (via LD_DEBUG) that the libgomp from /opt/local/gcc48 was used, as well as the libstdc++ from there.
Any ideas? Haven't found anything on Google - or I used the wrong keywords :)
OpenMP is an intermediary between your code and its execution. Each #pragma omp statement is converted to a call to the corresponding OpenMP library function, and that's all there is to it. The multithreaded execution (launching threads, joining and synchronizing them, etc.) is always handled by the operating system (OS). All OpenMP does is handle these low-level, OS-dependent threading calls for us, portably, behind a short and sweet interface.
The -fopenmp flag is a high-level one that does more than link in GCC's OpenMP implementation (gomp). The gomp library in turn requires other libraries to access the threading functionality of the OS. On POSIX-compliant OSes, OpenMP is usually based on pthreads, which needs to be linked in. It may also need the realtime extension library (librt) on some OSes, while not on others. When using dynamic linking, everything should be discovered automatically, but when you specified -static, I think you fell into the situation described by Jakub Jelinek here. But nowadays, pthread (and rt if needed) should be linked in automatically when -static is used.
Aside from linking in dependencies, the -fopenmp flag also activates the processing of the pragma statements. You can see throughout the GCC code (as here and here) that without the -fopenmp flag (which isn't triggered by merely linking the gomp library), the pragmas won't be converted to the appropriate OpenMP function calls. I just tried with some example code (a minimal version is sketched below), and both -lgomp and -fopenmp produce a working executable that links against the same libraries. The only difference in my simple example is that the -fopenmp binary has a symbol the -lgomp one doesn't: GOMP_parallel@@GOMP_4.0 (code here), which is the function that initializes the parallel section, performing the forks requested by the #pragma omp parallel in my example code. Thus, the -lgomp version did not translate the pragma into a call to GCC's OpenMP implementation. Both produced a working executable, but only the -fopenmp flag produced a parallel executable in this case.
To wrap up, -fopenmp is needed for GCC to process all the OpenMP pragmas. Without it, your parallel sections won't fork any threads, which could wreak havoc depending on the assumptions your inner code makes.
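A minimal reproduction of that experiment (the thread count in the output depends on your machine):

    // omp_test.cpp
    #include <cstdio>
    #include <omp.h>

    int main() {
        #pragma omp parallel
        {
            std::printf("thread %d of %d\n",
                        omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }

Built with "g++ -fopenmp omp_test.cpp", it prints one line per thread. Built with "g++ omp_test.cpp -lgomp", it links fine (omp_get_thread_num and omp_get_num_threads are resolved by libgomp), but the pragma is silently ignored and it prints a single "thread 0 of 1".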
I searched the gcc 4.8.1 documents but couldn't find an answer to this:
I have some SSE4.1 code and fallback code; at runtime I detect whether the system supports SSE4.1, and if it doesn't, I use the fallback code.
So far so good, but with latest gcc versions this is what happens:
- my application crashes because SSE4.1 instructions get spread throughout the code wherever a string comparison is performed
Since I'm compiling all my files with -msse4.1 this sounds reasonable, but it crashes my code. My question is this: is there any way to restrict SSE4.1 usage to just the code which explicitly makes use of SSE4.1? Unfortunately these are header files used everywhere, so it would be rather difficult to compile just those translation units with -msse4.1.
As of GCC 4.8, you can use multi-versioned functions; see http://gcc.gnu.org/gcc-4.8/changes.html and look for "Function Multiversioning Support with G++". Disclaimer: I have not used this myself (as of yet).
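For reference, a minimal sketch of what it looks like (multiversioning was C++ only in GCC 4.8; the function name here is made up for illustration):

    #include <cstdio>

    // Default version: the portable fallback, used on CPUs without SSE4.1.
    __attribute__((target("default")))
    const char* which_version() { return "fallback"; }

    // SSE4.1 version: GCC emits a resolver that picks this implementation
    // at load time when the CPU reports SSE4.1 support.
    __attribute__((target("sse4.1")))
    const char* which_version() { return "sse4.1"; }

    int main() {
        std::printf("dispatched to: %s\n", which_version());
        return 0;
    }

The dispatch happens once, through an IFUNC resolver, so the rest of your code simply calls which_version() and automatically gets the best version for the host CPU.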
I'm currently working on a scientific project in which we use CUDA Fortran from PGI. We have found that PGI's OpenMP support is so poor that we cannot get parallel OpenMP tasks working with pgfortran. So we decided to pull part of the code out, rewrite it in C++, and compile it with g++.
The other reason we want to mix PGI's pgfortran compiler with GNU's g++ is that we need pgfortran to compile the CUDA Fortran part of our code, and we also need NVCC from Nvidia (which uses g++ to compile the host code) to compile those OpenMP segments which contain CUDA and which cannot be compiled correctly by pgfortran.
Finally, when I link the C++ object files compiled by g++ together with the Fortran object files compiled by pgfortran, there are link errors unless I add the -lgomp option; but if I do add it, the behavior of the OpenMP threads is weird.
I think the problem is that the two compilers use different OpenMP runtime libraries, and gomp is GNU's. So does anyone know how to link them correctly? Or can someone tell me how to link against PGI's OpenMP library?
I want to know whether there is any function/method in LLVM for adding OpenMP constructs to LLVM IR. Does llvm-3.0 still support OpenMP directives?
OpenMP is a high-level language extension, so it's the C/C++/Fortran front-end that has to lower the pragmas into the necessary runtime calls and code alterations.
I don't see how OpenMP could be "added" to LLVM IR. If you need a C/C++/Fortran compiler which supports OpenMP pragmas and emits LLVM IR, try dragonegg.
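To illustrate what "lowering into runtime calls" means, here is roughly how a libgomp-based front-end (GCC, and hence dragonegg) rewrites a parallel region. This is a simplified hand-written sketch, not actual compiler output:

    extern void do_work();

    // What the programmer writes:
    //     #pragma omp parallel
    //     { do_work(); }

    // Roughly what the front-end emits instead, using libgomp entry points:
    extern "C" void GOMP_parallel_start(void (*fn)(void*), void* data,
                                        unsigned num_threads);
    extern "C" void GOMP_parallel_end(void);

    static void outlined_region(void* data) {
        do_work();   // the body of the region, outlined into its own function
    }

    void lowered() {
        GOMP_parallel_start(outlined_region, nullptr, 0);  // fork the workers
        outlined_region(nullptr);   // the master thread runs the body as well
        GOMP_parallel_end();        // join all threads
    }

So by the time the code reaches LLVM IR, the OpenMP directives are already gone: all that remains are ordinary functions and calls into the runtime library.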