position independent executable (-pie) for ARM (Cortex-M3) - C++

I'm programming for STM32 (Cortex-M3) with CodeSourcery G++ Lite (based on GCC 4.7.2), and I want my executables to be loadable dynamically.
As far as I know, I have two options available:
1. relocatable ELF, which needs an ELF parser;
2. position independent code (PIC) with a global offset register.
I prefer PIC with a global offset register, because it seems easier to implement and I'm not familiar with ELF or any ELF library. Also, it's easy to generate a .bin file from an ELF file with existing tools.
I've tried building my program with the "-msingle-pic-base -fpic" compiler options and the "-pie" linker option, but then I got a linking error:
...path...ld.exe: ...path...thumb2\libstdc++.a(pure.o): relocation
R_ARM_THM_MOVW_ABS_NC against `a local symbol' can not be used when
making a shared object; recompile with -fPIC
I don't quite understand the error message. It seems the prebuilt standard C/C++ library is incompatible with my options, and I would need to get the library's source and rebuild it for my own purposes.
So,
1. Could anyone provide useful information/links on how to work with position independent executables?
2. With the -msingle-pic-base option, I don't need to care too much about the GOT and the ld script anymore, right?
Note: without the "-pie" linker option I can build the program, but it fails when calling a C++ virtual function (observed while debugging with the Keil IDE's simulator). I don't understand what's going on or what I'm missing.
----------------------------------------------------------------------
-- added 20130314
With the -msingle-pic-base option, I don't need to care too much about the GOT and the ld script anymore, right?
From my experiments, the register (r9 in my program) should point to the beginning of the .got.plt section. If I delete the "-pie" option, linking succeeds, and (with r9 properly set) the C++ virtual function is then called successfully. However, I still think the "-pie" option matters, since it may ensure that the standard library itself is position independent. Could anyone explain this for me?
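For reference, the way I enter the loaded image after setting r9 looks roughly like this (just a sketch; got_base and entry are values my loader computes, and the names are mine):

#include <stdint.h>

/* Point the PIC base register (r9, as selected by -msingle-pic-base)
   at the GOT, then branch to the image's entry point. */
static void enter_image(uint32_t got_base, uint32_t entry)
{
    __asm__ volatile("mov r9, %0" : : "r"(got_base) : "r9");
    ((void (*)(void))entry)();   /* bit 0 of entry must be set for Thumb code */
}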
----------------------------------------------------------------------
-- added 20130315
I took a look at the ABI documents on ARM's website, but they were of little help because they don't target a specific platform. There seems to be a concept of an EABI (I'm using Sourcery's arm-none-eabi edition), but I couldn't find any documentation on "EABI" on ARM's website, nor could I find documentation on this topic from Sourcery or GCC. There is more than one implementation of PIC, so which one does Sourcery G++ use in the none-eabi case? The behaviors of the "-msingle-pic-base", "-fpie" and "-pie" options seem very poorly documented!
-----------------------------------------------------------------------
From the disassembly, I just figured out that with "-msingle-pic-base", r9 should point to the base address of the ".got" section; the pointers in the .got section are absolute, and variable addressing is similar to the description in the article Position Independent Code (PIC) in shared libraries. So I still need to patch the ".got" section on loading. I don't know what the ".got.plt" section is used for in my program; function calls seem to use PC-relative addressing.
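So the load-time fix-up I have in mind is roughly the following (a sketch only; __got_start__ and __got_end__ are placeholder symbols I would have to define around .got in my ld script):

#include <stdint.h>

extern uint32_t __got_start__[];   /* hypothetical ld-script symbols */
extern uint32_t __got_end__[];

/* Add the load offset (actual load address minus link-time address)
   to every absolute pointer stored in the .got section. */
static void relocate_got(uint32_t load_offset)
{
    for (uint32_t *entry = __got_start__; entry < __got_end__; ++entry)
        *entry += load_offset;
}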
How to build with "-pie", or how to link against a standard library compiled with "-fpic", is still an open problem for me.

The error message tells you to recompile libstdc++, which is usually built together with the GCC compiler itself.
Thus you must recompile your standard libraries (libstdc++, libgcc_*, libc, libm and all the rest) with -fPIC and link your project against them.
If you rely on prebuilt compiler packages, you're mostly out of luck in the microcontroller world. If you build your compiler yourself (which is, by the way, not too difficult, but an advanced/expert task), you're good to go.
It is also possible to compile the standard libraries yourself with the compiler you have. You will need the library sources, and you'll have to figure out how the compiler package's build system builds them and mimic that. Perhaps there are experts here who can advise you on that route.

There's a nice blog post on this topic, written eight years after the question was asked, but it's there: https://mcuoneclipse.com/2021/06/05/position-independent-code-with-gcc-for-arm-cortex-m/
The general outline is that you have to:
Set up GOT from linker-generated information
Set up PLT from Program Header information
Implement a binder based on the GOT entries
Compile your library as a shared relocatable binary: -msingle-pic-base -mpic-register=r9 -mno-pic-data-is-text-relative -fPIC (a sketch of this invocation follows the list)
Set R9 accordingly
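For reference, the compile step from that list might look like this (a sketch; the PIC flags are from the post, while the Cortex-M3/Thumb flags and file names are my assumptions):

$ arm-none-eabi-gcc -c -mcpu=cortex-m3 -mthumb \
      -msingle-pic-base -mpic-register=r9 \
      -mno-pic-data-is-text-relative -fPIC \
      mylib.c -o mylib.o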

Related

Load and link obj files at Runtime in C/C++ (JIT)

I'm searching for a C or C++ library which can load and link object files (ELF or COFF, it doesn't matter) dynamically at runtime. I spent some time searching for such a library, but without success.
What I tried:
LLVM:
Currently my best solution! I used Clang to generate .obj files in LLVM's bitcode format and used its JIT facilities to dynamically load and execute the functions. But LLVM is huge, and my PC at home doesn't have the power to compile all of LLVM just for the JIT. I also encountered problems with relocation overflows and unimplemented relocation types.
libjit:
I read that it can load and link .elf files too. Sadly, I couldn't compile it for Windows, so I couldn't try it.
Nanojit and NativeJit:
It seems like they don't support JITting an object file.
So... what can I do? Do I have to stick with LLVM? Are there any alternatives?
As a first approximation, a .bc file is similar to an .o (or .obj) file in that it is just a translation of C++ code into an intermediate language, and it can contain references to functions not defined in it, to be resolved from libraries.
And the JIT-ted code is similar to a DLL, in the sense that it is linked dynamically into the executable it runs in.
You don't need to compile LLVM -- you can download binaries for LLVM and assorted utilities (like Clang) from the LLVM Download Page.
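For reference, here is a minimal sketch of loading an object file with ORC's LLJIT and calling a function from it (assuming a recent LLVM, roughly 15 or newer -- the lookup API has changed across versions; "add.o" and the C-linkage symbol "add" are hypothetical):

#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/TargetSelect.h"

int main() {
    llvm::InitializeNativeTarget();
    llvm::InitializeNativeTargetAsmPrinter();

    // Create the JIT.
    auto jit = llvm::cantFail(llvm::orc::LLJITBuilder().create());

    // Load a relocatable object file, e.g. produced by `clang -c add.c`.
    auto buf = llvm::cantFail(llvm::errorOrToExpected(
        llvm::MemoryBuffer::getFile("add.o")));
    llvm::cantFail(jit->addObjectFile(std::move(buf)));

    // Resolve the symbol and call through it.
    auto addr = llvm::cantFail(jit->lookup("add"));
    auto add = addr.toPtr<int (*)(int, int)>();
    return add(20, 22) == 42 ? 0 : 1;
}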

How can I statically link standard library to my C++ program?

I'm using Code::Blocks IDE(v13.12) with GNU GCC Compiler.
I want the linker to link the static versions of the required runtime libraries for my programs; how can I do this?
I already know that my executable size will increase. Could you tell me the other downsides?
What about doing this in Visual C++ Express?
Since nobody else has come up with an answer yet, I will give it a try. Unfortunately, I don't know the Code::Blocks IDE, so my answer will only be partial.
1 How to Create a Statically Linked Executable with GCC
This is not IDE specific but holds for GCC (and many other compilers) in general. Assume you have a simplistic “hello, world” program in main.cpp, with no external dependencies except for the standard library and runtime library, for example:
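// main.cpp – the simplistic “hello, world” program assumed here
#include <iostream>

int main()
{
    std::cout << "hello, world\n";
    return 0;
}

You'd then compile and statically link it in two steps.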
Compile main.cpp to main.o (the output file name is implicit):
$ g++ -c -Wall main.cpp
The -c tells GCC to stop after the compilation step (not run the linker). The -Wall turns on most diagnostic messages. If novice programmers used it more often and paid more attention to it, many questions on this site would not have been asked. ;-)
Link main.o (could list more than one object file) statically pulling in the standard and runtime library and put the executable in the file main:
$ g++ -o main main.o -static
Without the -o main switch, GCC would have put the final executable in the not-so-well-named file a.out (which once stood for “assembler output”).
Especially at the beginning, I strongly recommend doing such things “by hand” as it will help get a better understanding of the build tool-chain.
As a matter of fact, the above two commands could have been combined into just one:
$ g++ -Wall -o main main.cpp -static
Any reasonable IDE should have options for specifying such compiler / linker flags.
2 Pros and Cons of Static Linking
Reasons for static linking:
You have a single file that can be copied to any machine with a compatible architecture and operating system and it will just work, no matter what version of what library is installed.
You can execute the program in an environment where the shared libraries are not available. For example, putting a statically linked CGI executable into a chroot() jail might help reduce the attack surface on a web server.
Since no dynamic linking is needed, program startup might be faster. (I'm sure there are situations where the opposite is true, especially if the shared library was already loaded for another process.)
Since the linker can hard-code function addresses, function calls might be faster.
On systems that have more than one version of a common library (LAPACK, for example) installed, static linking can help make sure that a specific version is always used without worrying about setting the LD_LIBRARY_PATH correctly. Obviously, this is also a disadvantage since now you cannot select the library any more without recompiling. If you always wanted the same version, why would you have installed more than one in the first place?
Reasons against static linking:
As you have already mentioned, the size of the executable might grow dramatically. This depends of course heavily on what libraries you link in.
The operating system might be smart enough to load the text section of a shared library into RAM only once if several processes need the library at the same time. By linking statically, you forfeit this advantage and the system might run short of memory more quickly.
Your program no longer profits from library upgrades. Instead of simply replacing one shared library with a (hopefully ABI compatible) newer release, a system administrator will have to recompile and reinstall every program that uses it. This is the most severe drawback in my opinion.
Consider for example the OpenSSL library. When the Heartbleed bug was discovered and fixed earlier this year, system administrators could install a patched version of OpenSSL and restart all services to fix the vulnerability within a day as soon as the patch was out. That is, if their services were linking dynamically against OpenSSL. For those that have been linked statically, it would have taken weeks until the last one was fixed and I'm pretty sure that there is still proprietary “all in one” software out in the wild that did not see a fix up to the present day.
Your users cannot replace a shared library on the fly. For example, the torsocks script (and associated library) allows users to replace (by setting LD_PRELOAD appropriately) the networking system library with one that routes their traffic through the Tor network. And this even works for programs whose developers never thought of that possibility. (Whether this is secure and a good idea is the subject of an unrelated debate.) Another common use-case is debugging or “hardening” applications by replacing malloc and the like with specialized versions.
In my opinion, the disadvantages of static linking outweigh the advantages in all but very special cases. As a rule of thumb: link dynamically if you can and statically if you have to.
A Addendum
As Alf has pointed out (see comments), there is a special GCC option to selectively link in the C++ standard library statically but not link the whole program statically. From the GCC manual:
-static-libstdc++
When the g++ program is used to link a C++ program, it normally automatically links against libstdc++. If libstdc++ is available as a shared library, and the -static option is not used, then this links against the shared version of libstdc++. That is normally fine. However, it is sometimes useful to freeze the version of libstdc++ used by the program without going all the way to a fully static link. The -static-libstdc++ option directs the g++ driver to link libstdc++ statically, without necessarily linking other libraries statically.
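So, for example, freezing libstdc++ while keeping everything else dynamic would be (reusing the main.o from above):

$ g++ -o main main.o -static-libstdc++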
In Visual C++, the /MT option does a static link and the /MD option does a dynamic link. (see http://msdn.microsoft.com/en-us/library/2kzt1wy3.aspx)
I'd recommend using /MD and redistributing the C++ runtime, which is freely available from Microsoft. Once the C++ runtime is installed, then any program requiring the runtime will continue to work. You would need to pass the proper option to tell the compiler which runtime to use. There is a good explanation here: Should I compile with /MD or /MT?
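For example, from the Visual C++ command line (main.cpp is a placeholder):

> cl /MD main.cpp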
On Linux, I'd recommend redistributing libstdc++ instead of statically linking it. If the user's system libstdc++ works, I'd let them just use that. System libraries, such as libpthread and libgcc, should just use the system default. This requires compiling the program on a system with symbols compatible with all the Linux versions you are distributing for.
On Mac OS X, just redistribute the app with dynamic linking to libstdc++. Anyone using the same OS version should be able to use your program.

What is a .so.2 file?

I compiled Intel TBB from source using GCC. It generates a libtbb.so and a libtbb.so.2. It looks like the .so.2 file is the real shared library; libtbb.so just contains a single line of text:
INPUT (libtbb.so.2)
What is the purpose of generating these two files instead of one? And for the INPUT (libtbb.so.2), what is the syntax? I want to know more about it.
Usually when you build shared objects (.so) you also take care of versioning by adding suffixes such as mylib.so.2.3.1. To make sure your programs can load this lib or later compatible versions, you create links with names:
mylib.so -> mylib.so.2.3.1
mylib.so.2 -> mylib.so.2.3.1
mylib.so.2.3 -> mylib.so.2.3.1
So, everything after .so represents version.sub-version.build (or similar)
Also, it is possible for more than one version of the same lib to coexist under this scheme, and all that is necessary to switch programs to a particular version is to have the appropriate links in place. As for the INPUT (libtbb.so.2) syntax: GNU ld treats a plain-text file like that libtbb.so as a linker script, and the INPUT command pulls in the named file exactly as if it had appeared on the command line.
A dynamically linked ELF binary (whether another library or an executable) uses the shared-object name, or soname, to identify the library that should be loaded at execution time.
When a library is created as an ELF shared library, the compile-time link editor inserts a DT_SONAME field, recording the library's SONAME, into the library itself. The DT_SONAME is defined in the ELF standard as:
This element holds the string table offset of a null-terminated
string, giving the name of the shared object. The offset is an index
into the table recorded in the DT_STRTAB entry. See ‘‘Shared Object
Dependencies’’ below for more information about these names.
So now, when an executable is created, the SONAMEs of the libraries it links against are embedded into it. When the executable is run, the dynamic linker uses those names to look for the libraries in the predefined locations for dynamic libraries. On Windows, the predefined location would be wherever DLLs reside; on Linux, Mac OS X and other System V compatible systems it would be /lib and /usr/lib, and possibly other spots. It depends on the linker used, and can be set in the linker's own configuration.
In any event, the dynamic linker checks whether the library named in the soname entry is present in any of those locations; if it is, it will be used.
Note that the standard says the soname is just a STRING; the versioning conventions became a de facto standard after the fact, and go something like this:
Make the soname libmyname.so.A and the library filename libmyname.so.A.B or libmyname.so.A.B.C (under Mac OS X it's libmyname.A.B.dylib). Create a softlink libmyname.so.A pointing to libmyname.so.A.B[.C].
A is kept the same while the library's ABI stays the same.
B (or B.C) becomes the minor version.
Under Linux it's really common that the library version would be the same as the package version number. This has its pros and cons.
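Concretely, the convention above can be produced like this (a sketch; the library and file names are examples, and -Wl,-soname passes the soname through to the linker):

$ g++ -shared -fPIC -Wl,-soname,libmyname.so.1 -o libmyname.so.1.2.3 myname.cpp
$ ln -s libmyname.so.1.2.3 libmyname.so.1    # what the dynamic linker looks for
$ ln -s libmyname.so.1.2.3 libmyname.so      # what the compile-time linker looks for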
libtool formalization
GNU libtool is used a lot to build dynamic libraries; it has a more formal versioning system, with strong logic behind it. The libtool versioning system for sonames works very well and is adopted by complex libraries to keep things straight.
Under libtool, the versioning is as follows:
libmylib-current.release.age
Under libtool the idea is that as libraries evolve they will add and remove functionality.
Let's say you are developing a library. Start with version 0.0.0.
Now let's say you fix a few bugs; you would then only increase the release number.
So the new name would become libmylib.0.1.0 or libmylib.0.2.0, etc., for every release that only fixes bugs without changing the ABI.
Along the way you say: ugh! I could've done this sub-functionality better. So you add a new set of functions that do it better, but because others are still using your library, you leave the old (deprecated) functionality in place.
The rules are as follows:
Start with version information of ‘0:0:0’ for each libtool library.
Update the version information only immediately before a public
release of your software. More frequent updates are unnecessary, and
only guarantee that the current interface number gets larger faster.
If the library source code has changed at all since the last update,
then increment revision (‘c:r:a’ becomes ‘c:r+1:a’).
If any interfaces have been added, removed, or changed since the last
update, increment current, and set revision to 0.
If any interfaces have been added since the last public release, then increment age.
If any interfaces have been removed or changed since the last public
release, then set age to 0.
You can read more about it in the libtool documentation
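A worked example of those rules (for a hypothetical library; versions written current:revision:age):

0:0:0   first public release
0:1:0   bug-fix release: source changed, so revision is incremented
1:0:1   release adding an interface: current incremented, revision reset to 0, age incremented
2:0:0   release removing an interface: current incremented, revision reset to 0, age reset to 0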
Update ...
The following was a comment claiming that my explanation has an error. It does not, but showing why requires more detail than fits in an answer comment, so see below.
Original objection
There is an error here: on linux, the version is of the form
libmylib.(current-age).release.age, where the parentheses indicate an
expression to be evaluated. For example GLPK 4.54 with
current:revision:age = 37:1:1 on linux installs the library file
libglpk.so.36.1.1. For more info, see, e.g.,
<autotools.io/libtool/version.html>.
Rebuttal
TL;DR: autotools.io is not an authoritative source.
Explanation
While Flameeyes is an amazing developer and one of the Gentoo maintainers, it was he who made the mistake, creating a loose "rule of thumb" interpretation of the libtool spec. This is not going to break systems 99% of the time, but consider his ad-hoc way of updating current:
The rules of thumb, when dealing with these values are:
Always increase the revision value.
Increase the current value whenever an interface has been added,
removed or changed.
Increase the age value only if the changes made to the ABI are
backward compatible.
He then goes on to say that when maintaining multiple versions of Gtk, it would be best to just append the library version to the library NAME and effectively dump the version number (as they do in GTK+):
In this situation, the best option is to append part of the library's
version information to the library's name, which is exemplified by
Glib's libglib-2.0.so.0 soname. To do so, the declaration in the
Makefile.am has to be like this:
lib_LTLIBRARIES = libtest-1.0.la
libtest_1_0_la_LDFLAGS = -version-info 0:0:0
Well, that's just a crockpot approach that renders the power of dynamic linking and symbol-resolution versioning completely moot! He's saying: just turn it off. Horse boogers! No wonder even experienced developers have a hard time building and maintaining open source projects, and we constantly run into binaries dying every time a new version of a library is installed (because they clobber each other).
The libtool versioning approach is VERY WELL THOUGHT OUT. It is an algorithm whose steps (the ordered instructions 1 to 6 above) are to be followed every time there is an update to the code of a dynamically linked library.
For new and current developers, please read them carefully and visualize what will happen to the library version number throughout the life of your amazing software. If you do, you will notice that every piece of previously linked software will always use the most current and accurate version of your amazing library correctly, none of them will ever clobber or stomp on each other, AND you never have to add a blooming number to the name of your library (unless it's for pleasure or aesthetics).

Static linking of Glibc

How can I compile my app statically linking the glibc library, but including only the code my app needs (not the whole library)?
My current compile command:
g++ -o newserver test.cpp ... -lboost_system -lboost_thread -std=c++0x
Thanks!
That's what -static does (as described in another answer): unneeded modules won't get linked into your program. But your expectations about how much stuff is needed (in the sense that we can't convince the linker otherwise) may be too optimistic.
If you're trying to do this for portability (running the executable on machines with an older glibc or something like that), there is one easy test question to see whether you're going to get what you want:
Did you think of the problem with libnss, and are you sure it is not going to bite you?
If your answer is yes, maybe it makes sense to go on. If the answer is no, or the question seems too obscure and there is no answer, just quit your experiments with statically linked glibc: it is more likely to hurt than help.
Add -static to the compile line. It will only add what your application needs [and, of course, any functions the functions your application calls, and any functions those functions call, including a bunch of startup code and some other bits and pieces], so it will add around 800K (for a simple "hello world" program) on an x86 machine; other architectures vary. Since Boost probably also calls the standard library at least a little, you will likely have more than 800K added to your application. But it only includes functions actually used by the code in the final binary, not the entire library [about 2MB as a shared library].
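Applied to the compile command from the question, that would be (assuming static versions of the Boost libraries are installed, since -static affects them too):

$ g++ -static -o newserver test.cpp -lboost_system -lboost_thread -std=c++0x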
If you ONLY want glibc linked statically, you will need to modify the link line to include:
-Wl,-Bstatic -lc -Wl,-Bdynamic
This will prevent any other library from being linked statically [you sometimes need more than one of these option pairs, as something pulled in by another library may require "more" of glibc to be pulled in - don't worry, it won't bring in anything more than the linker thinks is necessary].

Merge Mach-O executable with a static lib?

Suppose you have
a pre-built iOS executable app (for simulator or device).
a pre-built static archive library which among other things contains C++ static initializers.
Now it should be possible to merge the two built products to produce a new iOS executable which is like the old one, except that it is now also linked with the additional static library, and on execution will run the static library's static initializers.
Which tool (if any) could help solve this merge problem?
Edit: an acceptable solution is also to dynamically load the library using dlopen. The whole purpose of this is application testing, so the re-linked app will never see the App Store.
How a compiler works (a simple explanation)
The most popular C++ compilers (say, GCC) work by translating all the C++ (and Obj-C, C, etc.) code to assembly.
They then call the appropriate assembler for the target processor to create the object binaries.
Then they call the linker, which searches those binaries for the symbols that describe what links to what. A common optimisation linkers do is to strip from the final binary anything from the statically linked libraries that is unused; another is to not even attempt to link unused libraries.
Finally, the linker removes the information that only it needed (such as the relocation and symbol data it used to do its job).
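With GCC you can watch these stages separately (file names are examples):

$ g++ -S main.cpp -o main.s    # translate C++ to assembly
$ as main.s -o main.o          # assemble into an object binary
$ g++ main.o -o main           # link: resolve symbols, strip the rest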
What this means in your case
You have a library, and the library has its linking symbols. You also have an executable, which has had its linking symbols stripped; in fact, depending on how it was optimised, the internal jumps might be just a couple of jmp instructions to arbitrary addresses in the code. No machine can do what you want automatically, because the needed information simply isn't in the executable anymore.
How to do it anyway
You would need to disassemble the executable, figure out on your own where the function calls are, and then manually reassemble it with your library, changing those calls to jump to addresses in your library instead.
This process is sometimes used by game modders to change the video drivers of old games (for example, to update their OpenGL version, or to force Glide games to use some newer drivers, and so on).
So if you want to do it anyway (I warn you: it is absurdly crazy to do...), ask those guys :) I can't point you to anyone right now, but they exist.
Analogy
In the normal linking phase, the compiled object files are like source code that the machine understands, full of function calls as needed.
After the final link, all those function calls have become gotos.
So if you were a linker tasked with doing what you want done, imagine reading source code filled with gotos to seemingly random places (sometimes even into the middle of loops), and having to figure out which of them should be changed to jump to the new part you are trying to paste in.