What is a .so.2 file?

What is a .so.2 file? - c++

I compiled Intel TBB from source using GCC. It generates a libtbb.so and littbb.so.2. Looks like the .so.2 file is the real shared library, libtbb.so just contains a line of text
INPUT (libtbb.so.2)
What is the purpose to generate these two files instead of one? For the INPUT (libtbb.so.2), what is the syntax? I want to know more about it.

Usually when you build shared objects (.so) then you also take care of versions by adding suffixes such as mylib.so.2.3.1. To make sure your programs can load this lib or other later versions you create links with names
mylib.so -> mylib.so.2.3.1
mylib.so.2 -> mylib.so.2.3.1
mylib.so.2.3 -> mylib.so.2.3.1
So, everything after .so represents version.sub-version.build (or similar)
Also, it is possible for more than one version of the same lib to coexist with this scheme, and all that is necessary to switch programs to using a particular version is to have the appropriate links in place.

Dynamically linked ELF binary (whether another library or an executable) uses shared-object name or soname to identify the library that the executable should be linked against upon execution.
When a library created as an ELF shared library, the compile-time link editor inserts a DT_SONAME field in the executable which the library's SONAME into the library itself. The DT_SONAME is defined in the ELF standard as:
This element holds the string table offset of a null-terminated
string, giving the name of the shared object. The offset is an index
into the table recorded in the DT_STRTAB entry. See ‘‘Shared Object
Dependencies’’ below for more information about these names.
So now when an executable is create the SONAME is embedded into it. When when the executable is run is used by the linker to look for the library in the files in the predifined locations for dynamic library. The predefined location in windows would be wherever DLLs reside. In Linux and Mac OS X and other System V compatible systems they would be /lib and /usr/lib and possibly other spots, it depends on the linker used, and can be defined in linkers own configurations.
In all events the linker looks to see if the library named in soname entry is present in any of those locations, if it is it will use it.
Note that the standard says that the soname is a STRING and the versioning conventions became a defacto standard after the fact and goes something like this:
Make the soname to be libmyname.so.A and make the library filename be libmyname.so.A.B or libmyname.so.A.B.C (under MacOSX it's libmyname.A.B.dylib). Create a softlink from libmyname.so.A.B[.C]? to libmyname.so.A.
A is kept the same while the library's ABI stays the same.
B (or B.C) becomes the minor version.
Under Linux it's really common that the library version would be the same as the package version number. This has its pros and cons.
libtool formalization
GNU libtool is used a lot to build dynamic libraries, and has a more formal versioning system and has strong logic for it. The libtool versioning system for sonames works very well and is adopted by complex libraries to keep things straight.
Under libtool, the versioning is as under:
libmylib-current.release.age
Under libtool the idea is that as libraries evolve they will add and remove functionality.
Let's say you are developing a library. Start by using a version as 0.0.0.
Now let's say you fix a few bugs, you would only increase the release number.
So new name would be come libmylib.0.1.0 or libmylib.0.2.0 etc.. for every release that just fixes bugs but doesn't change any of the ABI.
Along the way you say. Ugh! I could've done this subfunctionality better, So you add a new set of functions to do something better, but because others are still using your library so you still leave the old (deprecated) functionality in there.
The rules are as under:
Start with version information of ‘0:0:0’ for each libtool library.
Update the version information only immediately before a public
release of your software. More frequent updates are unnecessary, and
only guarantee that the current interface number gets larger faster.
If the library source code has changed at all since the last update,
then increment revision (‘c:r:a’ becomes ‘c:r+1:a’).
If any interfaces have been added, removed, or changed since the last
update, increment current, and set revision to 0.
If any interfaces have been added since the last public release, then increment age.
If any interfaces have been removed or changed since the last public
release, then set age to 0.
You can read more about it in the libtool documentation
Update ...
The following was a comment that my explanation has an error. It does not, which, requires a bit more detail than can be put into an answer comment, so see below.
Original objection
There is an error here: on linux, the version is of the form
libmylib.(current-age).release.age, where the parentheses indicate an
expression to be evaluated. For example GLPK 4.54 with
current:revision:age = 37:1:1 on linux installs the library file
libglpk.so.36.1.1. For more info, see, e.g.,
<autotools.io/libtool/version.html>.
Rebuttal
TLDR: autotools.io's not authortative source.
Explanation
Whilst the Flameeyes is an amazing developer and he is one of Gentoo maintainers, it was he who made the mistake, and created a "rule of thumb" loose interpretation of the libtool spec. While this is not going to break systems 99% of the time, if we were to follow the ad-hoc way of updating current:
The rules of thumb, when dealing with these values are:
Always increase the revision value.
Increase the current value whenever an interface has been added,
removed or changed.
Increase the age value only if the changes made to the ABI are
backward compatible.
he then goes on to say that maintaining multiple versions of Gtk it would be best to just append the library version into the library NAME and simply dump the version number. (as they do in GTK+):
In this situation, the best option is to append part of the library's
version information to the library's name, which is exemplified by
Glib's libglib-2.0.so.0 soname. To do so, the declaration in the
Makefile.am has to be like this:
lib_LTLIBRARIES = libtest-1.0.la
libtest_1_0_la_LDFLAGS = -version-info 0:0:0
Well that's just crockpot approach to mucking up the power of dynamic linking and symbol resolution versioning completely moot!. He's saying just turn it off. Horse boogers! No wonder even experienced developers have had a hard time building and maintaining open source projects and we are constantly running into binaries dying every time new versions of libraries are installed (because they clobber each other).
The libtool versioning approach is VERY WELL THOUGHT OUT. It is an algorithm and its steps are ordered instructions 1 to 6 are to be followed every time there is an update to the code of a dynamic linked library.
For new and current developers, please read them carefully and visualize what will happen to the library version number throughout the life of your amazing software. If you do you will notice that every piece of previously linked software will always use the most current and accurate version of your amazing library correctly, and none of them will ever clobber or stomp on each other, AND you never have to add a blooming number in the name of your library (unless it's for pleasure or esthetics).

Related

Compiled shared library has dependency on specific libicuuc.so.46 version

I am compiling a shared library using g++ on SUse Linux with cmake that depends on libicuuc.so and friends.
Suse has libicuuc.so, libicuuc.so.46 and libicuuc.so.46.1 in /usr/lib.
Now when I use ldd to list the dependencies of my library it tells me that it depends on libicuuc.so.46.
Since I want to distribute my library in binary form (it takes about 45 minutes to compile on a fast PC) this dependency is a problem. The target PCs have different versions of libicuuc.so.
What can I do so that my library depends on libicuuc.so and not libicuuc.so.46?
I tried to remove the so.46 versions in my /usr/lib folder before compiling but libicuuc.so depends on libicudata.so.46 so I keep that dependency on a 46 version what I try to avoid.

Read about external library versioning here.
What can I do so that my library depends on libicuuc.so and not libicuuc.so.46?
You can't do anything about that. The libicuuc.so that you have has SONAME set to libicuuc.so.46, and the linker dutifully records that dependency (as it should).
If developers release libicuuc.so.47, they would do so because the new library is not ABI-compatible with the old one (at least that's what they should do if they are not clueless).
If your library loaded libcuuc.so.47 (as you want it to), it would most likely crash due to the ABI incompatibility. Or worse: corrupt your end user's data. So achieving your desired result would get you into worse trouble than what you have now (not running is better than randomly crashing or corrupting data).
Update:
The libicuuc.so documentation explicitly states that "Binary compatibility is for versions that share the same major+minor number."
That means: you can't link a library compiled with version 4.6 (SONAME libicuuc.so.46) and expect it work with version 4.7.
You must either rebuild your library for each version of ICUUC, or distribute matching libicuuc.so.NN with your library (and hope that the user is not already using some other version of libicuuc).
Another possible alternative: statically link libicuuc.a into your library, and hide all of libicuuc.a symbols so they don't conflict with anything else. Note: this has licensing implications.

Why should we set symbolic link when we build a library?

I have a question related to building a library in C++ for multiple platforms. I notice that many libraries expect a "symbolic link". With CMake, the symbolic link is done by the following codes:
set_target_properties({library_name}, PROPERTIES VERSION, ${library_string_version} SOVERSION {library_string_shortversion})
I cannot understand why symbolic link is necessary for a library. Moreover, it seems to me symbolic link is always related to the version of the library, and are there any relationships between them? Thanks!

The advantage of using a symbolic link is that you can easily update the library, with a new version, maintaining a consistent name, while at the same time having the version in the library name accessible. So applications can always link against the same name if even if you update it. Only when they need a specific version, they can link to that instead.
Also it makes it easier to move it around if need be, because the application doesn't need to know where it comes from.
I often wish I had symbolic links in MS Windows as well, as it makes life much easier.

It allows for side-by-side versioning of the library.
libfoo.so -> libfoo.2.so
libfoo.1.so -> libfoo.1.23.so
libfoo.1.23.so
libfoo.2.so -> libfoo.2.1.so
libfoo.2.1.so
This way, libfoo.so is always the latest version. If you know, (for compatibility reasons) that you need version 1 and not version 2, you can link against libfoo.1.so, and always have the latest v1 version.

position independent executable (-pie) for arm(cortex-m3)

I'm programming for stm32 (Cortex-m3) with codesourcery g++ lite(based on gcc4.7.2 version). And I want the executables to be loaded dynamically.
I knew I have two options available:
1. relocatable elf, which needs a elf parser.
2. position independent code (PIC) with a global offset register
I prefer PIC with global offset register, because it seems it's easier to implement and I'm not familiar with elf or any elf library. Also, It's easy to generate a .bin file from an elf file with some tools.
I've tried building my program with "-msingle-pic-base -fpic" compiling options and "-pie" linking options, but then I got a linking error:
...path...ld.exe: ...path...thumb2\libstdc++.a(pure.o): relocation
R_ARM_THM_MOVW_ABS_NC against `a local symbol' can not be used when
making a shared object; recompile with -fPIC
I don't quite understand the error message. It seems the default standard c/c++ library can't go with my options and I need to get the source of the library and rebuild for my own purpose.
So,
1. Could anyone provide me any useful information/link on how to work with the position independent executable ?
2. with the -msingle-pic-base option, I don't need to care too much about the GOT and ld script anymore, right?
Note: Without the "-pie" linking option I can build the program. But the program fails when calling a c++ virtual function (when I'm using the IDE(keil)'s simulator to debug my program). I don't understand what's going on and what I've been missing.
----------------------------------------------------------------------
-- added 20130314
with the -msingle-pic-base option, I don't need to care too much about the GOT and ld script anymore, right?
From my experiments, the register (r9 is used in my program) should point to the beginning of the got.plt sections. Delete the "-pie" option, the linking will success, (with r9 properly set) then the c++ virtual function is called successfully. However, I still think the "-pie" option is important, which may ensure that the current standard library is position independent. Could anyone explain this for me?
----------------------------------------------------------------------
-- added 20130315
I took a look at the documents on ABI from ARM's website. But it was of little help because they are not targeting a specific platform. There seems to be a concept of EABI (I'm using sourcery's arm-none-eabi edition), but I couldn't find any documentation on "EABI" from arm's website. I can't neither find documentation on this topic from sourcery and gcc's. There're more than one implementation of PIC, so which one is the sourcery g++ using in the none-eabi case? I think the behaviors of the "-msingle-pic-base", "-fpie", "-pie" options are so poorly documented !
-----------------------------------------------------------------------
From the dis-assembly code, I just figured out that, whit the "-msingle-pic-base", the r9 should point to the base address of the ".got" section, the pointers in the .got sections are absolute pointer and the addressing of variable is similar to the description in the article : Position Independent Code (PIC) in shared libraries. So I still need to modify the ".got" sections on loading. I don't know what is the ".got.plt" section used for in my program. It seems that function calls are using PC-relative addressing.
How to build with the "-pie" or how to link a standard library compiled with "-fpic" is still a problem for me.

The error message tells you to recompile the libstdc++ library, which is most often built, when the gcc compiler is built.
Thus you must recompile your standard libraries (libstdc++, libgcc_*, libc, libm and the all) with -fPIC and link your project against them.
If you rely on prebuilt compiler packages, you're mostly out of the game in the microcontroller world. If you build your compiler yourself (which is, by the way, not too difficult, but an advanced/expert task) you are on the go.
It is also possible to compile your stdandard libraries yourself with the compiler you have. You will need the sources of libraries and figure out, how the compiler package build system builds them and you have to mimic this. Perhaps here are some experts, who can advise you on this way.

There's a nice blog post on this topic, eight years after asking the question initially, but it's there: https://mcuoneclipse.com/2021/06/05/position-independent-code-with-gcc-for-arm-cortex-m/
The general outline is that you have to:
Set up GOT from linker-generated information
Set up PLT from Program Header information
Implement a binder based on the GOT entries
Compile your library as a shared relocatable binary: -msingle-pic-base -mpic-register=r9 -mno-pic-data-is-text-relative -fPIC
Set R9 accordingly

Merge Mach-O executable with a static lib?

Suppose you have
a pre-built iOS executable app (for simulator or device).
a pre-built static archive library static library which among other things contains c++ static initializers.
Now it should be possible to merge the two built products to produce the a new iOS executable which is like the old one, except that it is now also linked with the additional static library, and on execution will run the static library's static initializers.
Which tool (if any) could help solve this merge problem?
Edit: An acceptable solution is also to dynamically load the library using dlopen. The whole purpose of this is for application testing, so the re-linked app will never see app store.

How a compiler work (in a simple explanation)
The most popular C++ compilers (like say, GCC), work by translating all the C++ (and Obj-C, C, etc...) code to ASM.
Then it calls the appropriate assembler for the target processor, and create the object binaries.
Then it calls the linker, that search on those binaries for the symbols that explain what links with what. A common optimisation that linkers can do, is also strip of the final binary anything from the statically linked libraries that was not used, other common optimisation is not attempt to link at all unused libraries.
Also finally, the linker removes the things that only it needed.
What this mean in your case
You have a library, the library has the linking symbols. You also has a executable, that one had its linking symbols stripped, in fact depending on how it was optimised the internal jumps might be only a couple of jmp instructions to arbitrary addresses on the code. No machine, can do what you want in a automatic manner, because you don't have the needed information on the executable.
How to do it anyway
You need to disassemble the executable, figure on your own where are the function calls, and then manually reassemble it with your library, changing those functions call to jump to addresses in your library instead.
This process is sometimes used by game moders to change the video drivers of old games (for example to update their OpenGL version, or to force Glide games to use some newer drivers, and so on).
So if you want to do that anyway (I warn you: it is absurdly crazy to do though...) ask those guys :) I don't remember right now anyone to point to you, but they exist.
Analogy
When you are in normal linking phase, the compiled object files are like a source code that the machine understands, full of function calls as needed.
After it is compiled, all function calls became goto.
So if you are a linker tasked in doing what you want to do, imagine that you would be reading a source code filled with goto to random places in the code (sometimes even to inside loops) and that you have to somehow figure what ones of those you want to change to jump to the new part you are trying to paste there.

Linking Statically with glibc and libstdc++

I'm writing a cross-platform application which is not GNU GPL compatible. The major problem I'm currently facing is that the application is linked dynamically with glibc and libstdc++, and almost every new major update to the libraries are not backwards compatible. Hence, random crashes are seen in my application.
As a workaround, I distribute binaries of my application compiled on several different systems (with different C/C++ runtime versions). But I want to do without this. So my question is, keeping licensing and everything in mind, can I link against glibc and libstdc++ statically? Also, will this cause issues with rtld?

You don't need to.
Copy the original libraries you linked against to a directory (../lib in this example) in your application folder.
Like:
my_app_install_path
.bin
lib
documentation
Rename you app for something like app.bin. Substitute your app for a little shell script that sets the enviroment variable LD_LIBRARY_PATH to the library path (and concatenate the previous LD_LIBRARY_PATH contents, if any). Now ld should be able to find the dynamic libraries you linked against and you don't need to compile them statically to your executable.
Remember to comply with the LGPL adding the given attribution to the libraries and pointing in the documentation where the source can be downloaded.

glibc is under the LGPL. Under section 6. of LGPL 2.1, you can distribute your program linked to the library provided you comply with one of five options. The first is to provide the source code of the library, along with the object code (source is optional, not required) of your own program, so it can be relinked with the library. You can alternatively provide a written offer of the same. Your own code does not have to be under the LGPL, and you don't have to release source.
libstdc++ is under the GPL, but with a major exception. You can basically just distribute under the license of your choice without providing source for either your own code or libstdc++. The only condition is that you compile normally, without e.g. proprietary modifications or plugins to GCC.
IANAL, and you should consider consulting one if you need real legal advice.

Specifying the option -static-libgcc to the linker would cause it to link against a static version of the C library, if available on the system. Otherwise it is ignored.

I must question what the heck you are doing with the poor library functions?
I have some cross platform software as well. It runs fine on Linux systems of all sorts. Build with the oldest version of software that you want to support. The glibc and libstdc++ libraries are really very backward compatible.
I have built on CentOS 4 and run it on RHEL 6 beta. No problems.
I can build on stable Debian and run it on testing.
Now, I do sometimes have trouble with some libraries if I try to build on, say old Debian and try to run it on CentOS 5.4. That is usually due to distribution configuration choices that are different, like choosing threading or non-threading.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js