Difference between linking OpenMP with -fopenmp and -lgomp - c++

I've been struggling with a weird problem for the last few days. We create some libraries using GCC 4.8 which link some of their dependencies statically - e.g. log4cplus or boost. For these libraries we have created Python bindings using boost-python.
Every time such a library used TLS (like log4cplus does in its static initialization, or libstdc++ does when throwing an exception - not only during the initialization phase) the whole thing crashed with a segfault - and every time the address of the thread-local variable was 0.
I tried everything: recompiling, ensuring -fPIC is used, ensuring -ftls-model=global-dynamic is used, etc. No success. Then today I found out that the reason for these crashes was our way of linking OpenMP in. We did this using "-lgomp" instead of just using "-fopenmp". Since I changed this everything works fine - no crashes, no nothing. Fine!
But I'd really like to know what the cause of the problem was. So what's the difference between these two possibilities to link in OpenMP?
We have a CentOS 5 machine here where we have installed GCC 4.8 in /opt/local/gcc48, and we are also sure that the libgomp from /opt/local/gcc48 was used, as well as the libstdc++ from there (checked with LD_DEBUG).
Any ideas? Haven't found anything on Google - or I used the wrong keywords :)

OpenMP is an intermediary between your code and its execution. Each #pragma omp statement is converted to a call to the corresponding OpenMP library function, and that's all there is to it. The multithreaded execution (launching threads, joining and synchronizing them, etc.) is always handled by the operating system (OS). All OpenMP does is wrap these low-level, OS-dependent threading calls in a short, portable interface.
The -fopenmp flag is a high-level one that does more than include GCC's OpenMP implementation (gomp). The gomp library needs further libraries to access the threading functionality of the OS. On POSIX-compliant OSes, OpenMP is usually based on pthread, which needs to be linked. It may also need the realtime extension library (librt) on some OSes but not on others. When using dynamic linking, everything should be discovered automatically, but when you specified -static, I think you fell into the situation described by Jakub Jelinek here. Nowadays, though, pthread (and rt if needed) should be linked automatically when -static is used.
Aside from linking dependencies, the -fopenmp flag also activates the processing of OpenMP pragmas. You can see throughout the GCC code (as here and here) that without the -fopenmp flag (which isn't triggered by merely linking the gomp library), the pragmas won't be converted to the appropriate OpenMP function calls. I just tried with some example code, and both -lgomp and -fopenmp produce a working executable that links against the same libraries. The only difference in my simple example is that the -fopenmp build has a symbol that the -lgomp build doesn't have: GOMP_parallel@@GOMP_4.0+ (code here), which is the function that initializes the parallel section and performs the forks requested by the #pragma omp parallel in my example code. Thus, the -lgomp version did not translate the pragma into a call to GCC's OpenMP implementation. Both produced a working executable, but only the -fopenmp flag produced a parallel executable in this case.
To wrap up, -fopenmp is needed for GCC to process the OpenMP pragmas at all. Without it, your parallel sections won't fork any threads, which could wreak havoc depending on the assumptions your code makes.
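A minimal sketch of the effect described above; it is not the example code the answer refers to, and the function names are made up for illustration. Built with g++ -fopenmp, the pragmas are honored and the _OPENMP macro is defined. Built without it (even if -lgomp is added at link time), the compiler ignores the pragmas and the region body runs exactly once.

```cpp
// Hypothetical example; the function names are illustrative only.

// True only when the compiler itself ran with OpenMP enabled
// (e.g. g++ -fopenmp). Linking with -lgomp alone does not define _OPENMP.
bool compiled_with_openmp() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Counts how many threads execute the parallel region.
// Without -fopenmp the pragmas are ignored and the result is 1.
int count_team_members() {
    int members = 0;
#pragma omp parallel
    {
#pragma omp atomic
        ++members;
    }
    return members;
}
```

Calling count_team_members() from a build without -fopenmp returns 1; with -fopenmp it returns the size of the thread team, which depends on the machine and on OMP_NUM_THREADS.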

How can I statically link standard library to my C++ program?

I'm using Code::Blocks IDE(v13.12) with GNU GCC Compiler.
I want the linker to link static versions of the required runtime libraries for my programs. How can I do this?
I already know that my executable size will increase. Would you please tell me the other downsides?
What about doing this in Visual C++ Express?
Since nobody else has come up with an answer yet, I will give it a try. Unfortunately, I don't know that Code::Blocks IDE so my answer will only be partial.
1 How to Create a Statically Linked Executable with GCC
This is not IDE specific but holds for GCC (and many other compilers) in general. Assume you have a simplistic “hello, world” program in main.cpp (no external dependencies except for the standard library and runtime library). You'd compile and statically link it via:
Compile main.cpp to main.o (the output file name is implicit):
$ g++ -c -Wall main.cpp
The -c tells GCC to stop after the compilation step (not run the linker). The -Wall turns on most diagnostic messages. If novice programmers would use it more often and pay more attention to it, many questions on this site would not have been asked. ;-)
Link main.o (could list more than one object file) statically pulling in the standard and runtime library and put the executable in the file main:
$ g++ -o main main.o -static
Without the -o main switch, GCC would have put the final executable in the not so well-named file a.out (which once stood for “assembler output”).
Especially at the beginning, I strongly recommend doing such things “by hand” as it will help get a better understanding of the build tool-chain.
As a matter of fact, the above two commands could have been combined into just one:
$ g++ -Wall -o main main.cpp -static
Any reasonable IDE should have options for specifying such compiler / linker flags.
2 Pros and Cons of Static Linking
Reasons for static linking:
You have a single file that can be copied to any machine with a compatible architecture and operating system and it will just work, no matter what version of what library is installed.
You can execute the program in an environment where the shared libraries are not available. For example, putting a statically linked CGI executable into a chroot() jail might help reduce the attack surface on a web server.
Since no dynamic linking is needed, program startup might be faster. (I'm sure there are situations where the opposite is true, especially if the shared library was already loaded for another process.)
Since the linker can hard-code function addresses, function calls might be faster.
On systems that have more than one version of a common library (LAPACK, for example) installed, static linking can help make sure that a specific version is always used without worrying about setting the LD_LIBRARY_PATH correctly. Obviously, this is also a disadvantage since now you cannot select the library any more without recompiling. If you always wanted the same version, why would you have installed more than one in the first place?
Reasons against static linking:
As you have already mentioned, the size of the executable might grow dramatically. This depends of course heavily on what libraries you link in.
The operating system might be smart enough to load the text section of a shared library into the RAM only once if several processes need the library at the same time. By linking statically, you void this advantage and the system might run short of memory more quickly.
Your program no longer profits from library upgrades. Instead of simply replacing one shared library with a (hopefully ABI compatible) newer release, a system administrator will have to recompile and reinstall every program that uses it. This is the most severe drawback in my opinion.
Consider for example the OpenSSL library. When the Heartbleed bug was discovered and fixed earlier this year, system administrators could install a patched version of OpenSSL and restart all services to fix the vulnerability within a day as soon as the patch was out. That is, if their services were linking dynamically against OpenSSL. For those that have been linked statically, it would have taken weeks until the last one was fixed and I'm pretty sure that there is still proprietary “all in one” software out in the wild that did not see a fix up to the present day.
Your users cannot replace a shared library on the fly. For example, the torsocks script (and associated library) allows users to replace (via setting LD_PRELOAD appropriately) the networking system library by one that routes their traffic through the Tor network. And this even works for programs whose developers never even thought of that possibility. (Whether this is secure and a good idea is the subject of an unrelated debate.) Another common use case is debugging or “hardening” applications by replacing malloc and the like with specialized versions.
In my opinion, the disadvantages of static linking outweigh the advantages in all but very special cases. As a rule of thumb: link dynamically if you can and statically if you have to.
A Addendum
As Alf has pointed out (see comments), there is a special GCC option to selectively link in the C++ standard library statically but not link the whole program statically. From the GCC manual:
-static-libstdc++
When the g++ program is used to link a C++ program, it normally automatically links against libstdc++. If libstdc++ is available as a shared library, and the -static option is not used, then this links against the shared version of libstdc++. That is normally fine. However, it is sometimes useful to freeze the version of libstdc++ used by the program without going all the way to a fully static link. The -static-libstdc++ option directs the g++ driver to link libstdc++ statically, without necessarily linking other libraries statically.
In Visual C++, the /MT option does a static link and the /MD option does a dynamic link. (see http://msdn.microsoft.com/en-us/library/2kzt1wy3.aspx)
I'd recommend using /MD and redistributing the C++ runtime, which is freely available from Microsoft. Once the C++ runtime is installed, then any program requiring the runtime will continue to work. You would need to pass the proper option to tell the compiler which runtime to use. There is a good explanation here: Should I compile with /MD or /MT?
On Linux, I'd recommend redistributing libstdc++ instead of linking it statically. If the user's system libstdc++ works, I'd let the user just use that. System libraries, such as libpthread and libgcc, should just use the system default. This requires compiling the program on a system with symbols compatible with all Linux versions you are distributing for.
On Mac OS X, just redistribute the app with dynamic linking to libstdc++. Anyone using the same OS version should be able to use your program.

position independent executable (-pie) for arm(cortex-m3)

I'm programming for stm32 (Cortex-M3) with CodeSourcery G++ Lite (based on gcc 4.7.2), and I want the executables to be loaded dynamically.
I know I have two options available:
1. relocatable ELF, which needs an ELF parser.
2. position independent code (PIC) with a global offset register
I prefer PIC with a global offset register, because it seems easier to implement and I'm not familiar with ELF or any ELF library. Also, it's easy to generate a .bin file from an ELF file with some tools.
I've tried building my program with "-msingle-pic-base -fpic" compiling options and "-pie" linking options, but then I got a linking error:
...path...ld.exe: ...path...thumb2\libstdc++.a(pure.o): relocation
R_ARM_THM_MOVW_ABS_NC against `a local symbol' can not be used when
making a shared object; recompile with -fPIC
I don't quite understand the error message. It seems the default standard c/c++ library can't go with my options and I need to get the source of the library and rebuild for my own purpose.
So,
1. Could anyone provide me any useful information/link on how to work with the position independent executable ?
2. with the -msingle-pic-base option, I don't need to care too much about the GOT and ld script anymore, right?
Note: Without the "-pie" linking option I can build the program. But the program fails when calling a c++ virtual function (when I'm using the IDE(keil)'s simulator to debug my program). I don't understand what's going on and what I've been missing.
----------------------------------------------------------------------
-- added 20130314
with the -msingle-pic-base option, I don't need to care too much about the GOT and ld script anymore, right?
From my experiments, the register (r9 is used in my program) should point to the beginning of the .got.plt section. If I delete the "-pie" option, the linking succeeds, and (with r9 properly set) the C++ virtual function is called successfully. However, I still think the "-pie" option is important, as it may ensure that the current standard library is position independent. Could anyone explain this for me?
----------------------------------------------------------------------
-- added 20130315
I took a look at the ABI documents on ARM's website, but they were of little help because they do not target a specific platform. There seems to be a concept of EABI (I'm using Sourcery's arm-none-eabi edition), but I couldn't find any documentation on "EABI" on ARM's website. I can't find documentation on this topic from Sourcery or gcc either. There is more than one implementation of PIC, so which one is Sourcery G++ using in the none-eabi case? I think the behaviors of the "-msingle-pic-base", "-fpie" and "-pie" options are very poorly documented!
-----------------------------------------------------------------------
From the disassembly, I just figured out that, with "-msingle-pic-base", r9 should point to the base address of the ".got" section. The pointers in the .got section are absolute pointers, and the addressing of variables is similar to the description in the article: Position Independent Code (PIC) in shared libraries. So I still need to modify the ".got" section on loading. I don't know what the ".got.plt" section is used for in my program. It seems that function calls use PC-relative addressing.
How to build with the "-pie" or how to link a standard library compiled with "-fpic" is still a problem for me.
The error message tells you to recompile the libstdc++ library, which is usually built together with the gcc compiler.
Thus you must recompile your standard libraries (libstdc++, libgcc_*, libc, libm and all the rest) with -fPIC and link your project against them.
If you rely on prebuilt compiler packages, you're mostly out of luck in the microcontroller world. If you build your compiler yourself (which is, by the way, not too difficult, but an advanced/expert task), you are good to go.
It is also possible to compile your standard libraries yourself with the compiler you have. You will need the sources of the libraries, and you have to figure out how the compiler package's build system builds them and mimic this. Perhaps there are some experts here who can advise you on this approach.
There's a nice blog post on this topic, eight years after asking the question initially, but it's there: https://mcuoneclipse.com/2021/06/05/position-independent-code-with-gcc-for-arm-cortex-m/
The general outline is that you have to:
Set up GOT from linker-generated information
Set up PLT from Program Header information
Implement a binder based on the GOT entries
Compile your library as a shared relocatable binary: -msingle-pic-base -mpic-register=r9 -mno-pic-data-is-text-relative -fPIC
Set R9 accordingly
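The first step of the outline, patching absolute GOT entries when the image is loaded at a different address than it was linked for, can be sketched as follows. This is an illustration under stated assumptions and is not taken from the blog post: relocate_got is a hypothetical helper, the GOT is treated as a plain array of 32-bit absolute addresses, zero entries are assumed to be unresolved weak symbols and are skipped, and a single offset is applied to every entry (a real loader may need separate offsets for code and data).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: adjusts every non-zero GOT entry by the difference
// between the address the image was linked for and the address it was
// actually loaded at. Assumes one common offset for all entries.
void relocate_got(std::uint32_t* got, std::size_t entry_count,
                  std::uint32_t link_base, std::uint32_t load_base) {
    const std::uint32_t delta = load_base - link_base;  // wraps modulo 2^32
    for (std::size_t i = 0; i < entry_count; ++i) {
        if (got[i] != 0u) {  // assumption: 0 marks an unresolved weak symbol
            got[i] += delta;
        }
    }
}
```

After this fix-up, the loader would point r9 at the start of the relocated table before jumping into the loaded code.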

Static linking of Glibc

How can I compile my app so that glibc is linked statically, but only the code my app needs (not the whole library)?
Now my compile command:
g++ -o newserver test.cpp ... -lboost_system -lboost_thread -std=c++0x
Thanks!
That's what -static does (as described in another answer): unneeded modules won't get linked into your program. But your expectations on the amount of stuff which is needed (in a sense that we can't convince linker to the contrary) may be too optimistic.
If you are trying to do it for portability (running an executable on other machines with an older glibc or something like that), there is one easy test question to see if you're going to get what you want:
Did you think of the problem with libnss, and are you sure it is not going to bite you?
If your answer is yes, maybe it makes sense to go on. If the answer is no, or the question seems too obscure and there is no answer, just quit your experiments with statically linked glibc: it has more chance to hurt than help.
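For context, the libnss question is about name-service lookups. Functions such as getaddrinfo() or getpwnam() make glibc load NSS modules at run time, even in a -static binary, and the linker warns that such a program needs the shared libraries of the glibc version used for linking. A minimal sketch of code that triggers this, where resolve_host is a hypothetical helper:

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: returns getaddrinfo's status code (0 on success).
// Linking a program that calls getaddrinfo with -static makes the GNU
// linker warn that glibc's shared libraries are still needed at run time.
int resolve_host(const char* host) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // accept IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;

    addrinfo* results = nullptr;
    const int status = getaddrinfo(host, nullptr, &hints, &results);
    if (status == 0) {
        freeaddrinfo(results);
    }
    return status;
}
```

If your program (or a library it uses, such as boost.asio) resolves host names or user names this way, a statically linked glibc can behave differently on a machine whose glibc version does not match.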
Add -static to the compile line. It will only add what your application needs [and of course, any functions the functions your application calls, and any functions those functions call, including a bunch of startup code and some other bits and pieces], so it will be around 800K (for a simple "hello world" program) on an x86 machine. Other architectures vary. Since boost probably also calls the standard library at least a little bit, it's likely that you will have more than 800K added to your application. But it only adds functions used by any of the code in the final binary, not the entire library [about 2MB as a shared library].
If you ONLY want to link glibc statically, you will need to add the following to the linking part of your compile line:
-Wl,-Bstatic -lc -Wl,-Bdynamic. This will prevent any other library from being linked statically [you sometimes need more than one of these statements, as sometimes something pulled in by another library requires "more" from glibc to be pulled in - don't worry, it won't bring in anything more than the linker thinks is necessary].

In g++ is C++ 11 thread model using pthreads in the background?

I am just trying my hands on g++ 4.6 and C++11 features.
Every time I compile a simple threading code using -std=c++0x flag, either it crashes with segmentation fault or it just throws some weird exception.
I read some questions related to C++11 threads and I realized that, I also need to use -pthread flag to compile the code properly. Using -pthread worked fine and I was able to run the threaded code.
My question is, whether the C++11 multi-threading model uses Pthreads in the background?
Or is it written from the scratch?
I don't know if any of the members are gcc contributors but I am just curious.
If you run g++ -v it will give you a bunch of information about how it was configured. One of those things will generally be a line that looks like
Thread model: posix
which means that it was configured to use pthreads for its threading library (std::thread in libstdc++), and which means you also need to use any flags that might be required for pthreads on your system (-pthread on Linux).
This has nothing specific to do with the standard; it's just a detail of how the standard is implemented by g++.
C++ doesn't specify how threads are implemented. In practice C++ threads are generally implemented as thin wrappers over pre-existing system thread libraries (like pthreads or windows threads). There is even a provision to access the underlying thread object with std::thread::native_handle().
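A minimal sketch of that wrapper relationship, assuming libstdc++ on a system where g++ -v reports "Thread model: posix"; add_on_worker_thread is a hypothetical name used only for illustration.

```cpp
#include <thread>

// Hypothetical helper: computes a + b on a separate std::thread.
// With libstdc++ on Linux, build with: g++ -std=c++11 -pthread
int add_on_worker_thread(int a, int b) {
    int result = 0;
    std::thread worker([&result, a, b] { result = a + b; });

    // native_handle() exposes the underlying system handle (a pthread_t
    // under the posix thread model). It is not used further here and is
    // shown only to illustrate that std::thread wraps a system thread.
    std::thread::native_handle_type handle = worker.native_handle();
    (void)handle;

    worker.join();  // join() also makes the write to result visible here
    return result;
}
```

On toolchains where threading support lives in a separate libpthread, leaving out -pthread is what leads to the run-time failures described in the question.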
The reason that it crashes is that if you do not specify -pthread or -lpthread, a number of weakly defined pthread stub functions from libc are linked. These stub functions are enough to get your program to link without error. However, actually creating a thread requires the full libpthread library, and when the dynamic linker (dl) tries to resolve those missing functions, you get a segmentation violation.

compatibility of c++11 and MPI library

After installing gcc and the mpich library on my Linux system, I can compile my code with the mpicxx compiler. Is it possible to use C++11 with the MPI library just by upgrading the gcc compiler?
Changing the compiler with a newer version should work in general unless some strong code generation changes are observed (e.g. different data alignment or different ABIs). MPI is a library and as such it doesn't care what language constructs you are using as long as those constructs don't mess up with its internals. Since you are going to use C++11 for the threading it provides, there are some things that you should be aware of.
First, multithreading doesn't always play nice with MPI. Most MPI implementations are internally threaded themselves but are not thread safe by default.
Second, MPI defines four levels of threading support:
MPI_THREAD_SINGLE: no threading support - MPI would function safely only when used by a single-threaded application;
MPI_THREAD_FUNNELED: partial threading support - MPI can be used in a multithreaded application but only the main thread may call to MPI;
MPI_THREAD_SERIALIZED: partial threading support - MPI can be used in a multithreaded application but no concurrent calls in different threads are allowed. That is, each thread can call into MPI but a serialisation mechanism has to be in place;
MPI_THREAD_MULTIPLE: full threading support - MPI can be called freely from many threads.
The truth is that most MPI implementations support at most MPI_THREAD_FUNNELED out of the box, with many of them supporting only MPI_THREAD_SINGLE. Open MPI, for example, has to be compiled with a non-default option in order to get full threading support.
Multithreaded applications should initialise the MPI library using MPI_Init_thread() instead of MPI_Init(), and the thread that makes the initialisation call becomes the main thread - the very same main thread that is the only one allowed to call into MPI when the supported level is MPI_THREAD_FUNNELED. One gives MPI_Init_thread() the desired level of threading support, and the function returns the supported level, which might be lower than desired. In that case, correct and portable programs are supposed to act accordingly and either switch to non-threaded operation or abort with an appropriate error message to the user.
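The initialisation pattern described above can be sketched like this. It is only an illustration: choose_run_mode, init_mpi_funneled, RunMode and EXAMPLE_HAVE_MPI are hypothetical names, and the header is included conditionally so that the decision logic also compiles on a machine without MPI installed. The comparison relies on the MPI standard's guarantee that the four level constants are monotonically ordered.

```cpp
// Include <mpi.h> only when it is installed, so the decision logic below
// can still be compiled and tried on a machine without MPI.
#if defined(__has_include)
#  if __has_include(<mpi.h>)
#    include <mpi.h>
#    define EXAMPLE_HAVE_MPI 1
#  endif
#endif

enum class RunMode { Threaded, SingleThreaded };

// The level constants are ordered SINGLE < FUNNELED < SERIALIZED < MULTIPLE,
// so a plain comparison tells whether the request was satisfied.
RunMode choose_run_mode(int requested, int provided) {
    return provided >= requested ? RunMode::Threaded : RunMode::SingleThreaded;
}

#ifdef EXAMPLE_HAVE_MPI
// Requests MPI_THREAD_FUNNELED and reports whether the library granted it.
// The calling thread becomes the "main thread" in MPI's sense.
inline RunMode init_mpi_funneled(int* argc, char*** argv) {
    int provided = MPI_THREAD_SINGLE;
    MPI_Init_thread(argc, argv, MPI_THREAD_FUNNELED, &provided);
    return choose_run_mode(MPI_THREAD_FUNNELED, provided);
}
#endif
```

A program would call init_mpi_funneled(&argc, &argv) at the top of main() and fall back to single-threaded operation (or abort with a message) when RunMode::SingleThreaded comes back.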
More information about how MPI works together with threads can be found in the MPI Standard v2.2.
No problem as far as I can think of, since you shouldn't be able to tamper with the MPI directives in any way, and other than that, MPI and C++11 concerns are orthogonal.
By the way, issuing mpic++ or mpicxx on my machine (gcc 4.6.3, MPICH2 1.4.1) simply translates into
c++ -Wl,-Bsymbolic-functions -Wl,-z,relro -I/usr/include/mpich2 -L/usr/lib -lmpichcxx -lmpich -lopa -lmpl -lrt -lcr -lpthread
You can check that on your own machine with mpic++ -show.
It is no problem to combine C++11 with MPI.
mpic++ and mpicxx are only wrappers and use either the standard compiler or a user-specified compiler. So you can make mpic++ and mpicxx use a compiler that is compatible with C++11.
I do not know the exact command for mpich. For Open MPI you need to set these environment variables:
export OMPI_CC='gcc-mp-4.7'
export OMPI_CXX='g++-mp-4.7'
In my case I use openmpi 1.5.5 with the gcc 4.7 compiler from macports.