Why is the .a file smaller than the .so file? - c++

[Screenshots: the .a file, the .so file, and the CMakeLists.txt]

My guess is that the .a file is a simple archive of .o object files (which are, more or less, the result of converting textual assembly into binary), while the .so is a full-fledged ELF binary that contains additional data sections, plus metadata in its header.
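One way to check this guess is to look at the first bytes of each file. An ar archive begins with the 8-byte global header "!<arch>\n", while every ELF file (object, shared library, or executable) begins with 0x7F 'E' 'L' 'F'. The sketch below classifies an in-memory buffer by those magic numbers. Reading the file into the buffer is left out, and the FileKind/classify names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

enum class FileKind { ArArchive, Elf, Unknown };

// Rough classification of a binary by its leading "magic" bytes.
FileKind classify(const unsigned char* data, std::size_t len) {
    // Classic ar archives (.a) start with the 8-byte global header "!<arch>\n".
    static const char kArMagic[] = "!<arch>\n";
    // ELF files (.o, .so, executables) start with 0x7F 'E' 'L' 'F'.
    static const unsigned char kElfMagic[] = {0x7F, 'E', 'L', 'F'};

    if (len >= 8 && std::memcmp(data, kArMagic, 8) == 0) return FileKind::ArArchive;
    if (len >= 4 && std::memcmp(data, kElfMagic, 4) == 0) return FileKind::Elf;
    return FileKind::Unknown;
}
```

On Linux, the .o members stored inside a .a are themselves ELF relocatable objects, so the archive only adds a small per-member header (and usually a symbol index) around them.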

Related

How to link multiple LLVM IRs into one .so file?

I have three files as follows:
Counter.cpp
Counter.h
native-lib.cpp
These three files are JNI code.
I want to convert the .cpp files into LLVM IR first, and then make some modifications to the IR.
Finally, the modified IR should be built into a single .so file.
Which LLVM commands can do this?

What exactly is in a .o / .a / .so file?

I was wondering what exactly is stored in a .o or a .so file that results from compiling a C++ program. This post gives quite a good overview of the compilation process and the function of a .o file in it, and as far as I understand from this post, .a and .so files are just multiple .o files merged into a single file that is linked in a static (.a) or dynamic (.so) way.
But I wanted to check if I understand correctly what is stored in such a file. After compiling the following code
void f();
void f2(int);
const int X = 25;
void g() {
    f();
    f2(X);
}
void h() {
    g();
}
I would expect to find the following items in the .o file:
Machine code for g(), containing some placeholder addresses where f() and f2(int) are called.
Machine code for h(), with no placeholders
Machine code for X, which would be just the number 25
Some kind of table that specifies at which addresses in the file the symbols g(), h() and X can be found
Another table that specifies which placeholders were used to refer to the undefined symbols f() and f2(int), which have to be resolved during linking.
Then a program like nm would list all the symbol names from both tables.
I suppose that the compiler could optimize the call f2(X) by calling f2(25) instead, but it would still need to keep the symbol X in the .o file since there is no way to know if it will be used from a different .o file.
Would that be about correct? Is it the same for .a and .so files?
Thanks for your help!
You're pretty much correct in the general idea for object files. In the "table that specifies at which addresses in the file" I would replace "addresses" with "offsets", but that's just wording.
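To make the symbol-table/relocation idea concrete, here is a deliberately simplified C++ model of an object file. Defined symbols are stored as offsets into the code, and each relocation names an undefined symbol and the offset of a 4-byte placeholder that the linker patches once the symbol's address is known. All type and function names are hypothetical. Real ELF objects keep this information in .symtab and .rela.* sections and use many relocation kinds (PC-relative, GOT-relative, with addends), none of which this sketch models.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy model, not the real ELF layout.
struct Relocation {
    std::size_t offset;   // where in `text` the 4-byte placeholder lives
    std::string symbol;   // undefined symbol whose address must be patched in
};

struct ObjectFile {
    std::vector<std::uint8_t> text;              // "machine code" with zeroed placeholders
    std::map<std::string, std::size_t> defined;  // symbol -> offset within text
    std::vector<Relocation> relocations;         // placeholders referring to undefined symbols
};

// Fill every placeholder with the final address of its symbol (32-bit, little-endian).
void apply_relocations(ObjectFile& obj, const std::map<std::string, std::uint32_t>& addresses) {
    for (const Relocation& r : obj.relocations) {
        auto it = addresses.find(r.symbol);
        if (it == addresses.end())
            throw std::runtime_error("undefined reference to " + r.symbol);
        std::uint32_t value = it->second;
        for (int i = 0; i < 4; ++i)
            obj.text.at(r.offset + i) = static_cast<std::uint8_t>(value >> (8 * i));
    }
}
```

A missing address corresponds to the familiar "undefined reference" link error.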
.a files are just archives (an old format that predates tar, but does the same thing). You could replace .a files with tar files as long as you taught the linker to unpack them and link with all the .o files contained in them (more or less; there's a bit more logic to avoid linking object files from the archive that aren't needed, but that's just an optimization).
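As a rough illustration of how simple the archive format is, this sketch walks the members of a classic ar archive held in a std::string. It assumes the common System V/GNU layout. It does not decode the special "/" (symbol index) and "//" (long names) members that GNU ar adds, and the list_members name is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

struct ArMember {
    std::string name;
    std::size_t size;  // size of the member's data in bytes
};

// Layout: "!<arch>\n", then per member a 60-byte text header
// (name[16] date[12] uid[6] gid[6] mode[8] size[10] "`\n"),
// followed by the data, padded with '\n' to an even offset.
std::vector<ArMember> list_members(const std::string& ar) {
    if (ar.compare(0, 8, "!<arch>\n") != 0)
        throw std::runtime_error("not an ar archive");
    std::vector<ArMember> members;
    std::size_t pos = 8;
    while (pos + 60 <= ar.size()) {
        std::string header = ar.substr(pos, 60);
        if (header.compare(58, 2, "`\n") != 0)
            throw std::runtime_error("bad member header");
        std::string name = header.substr(0, 16);
        name.erase(name.find_last_not_of(' ') + 1);  // names are space-padded
        std::size_t size = std::strtoul(header.substr(48, 10).c_str(), nullptr, 10);
        members.push_back({name, size});
        pos += 60 + size + (size % 2);  // skip the data and the optional padding byte
    }
    return members;
}
```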
.so files are different. They are closer to a final binary than to an object file. An .so file with all symbols resolved can, at least theoretically, be run as a program. In fact, with PIE (position-independent executables), the difference between a shared library and a program is (at least in theory) just a few bits in the header. Shared libraries contain instructions telling the dynamic linker how to load the library (more or less the same instructions as a normal program) and a relocation table that tells the dynamic linker how to resolve the external symbols (again, the same as in a program). All unresolved symbols in a dynamic library (and in a program) are accessed through indirection tables, which get populated at dynamic linking time (program start or dlopen).
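The "few bits in the header" show up in the e_type field of the ELF header (the two bytes at offset 16). Relocatable objects use ET_REL (1), classic executables use ET_EXEC (2), and both shared libraries and PIE executables use ET_DYN (3). The minimal sketch below reads this field, assuming the buffer already holds the first bytes of the file. Telling a PIE executable apart from a .so needs other parts of the file, such as the program headers or dynamic section, which this does not check. The elf_kind name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Read e_type from an ELF header buffer and describe the file kind.
std::string elf_kind(const unsigned char* hdr, std::size_t len) {
    if (len < 18 || hdr[0] != 0x7F || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return "not ELF";
    // e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    std::uint16_t type = static_cast<std::uint16_t>(
        hdr[5] == 2 ? (hdr[16] << 8 | hdr[17]) : (hdr[17] << 8 | hdr[16]));
    switch (type) {
        case 1: return "relocatable object (.o)";
        case 2: return "executable (non-PIE)";
        case 3: return "shared object (.so or PIE executable)";
        default: return "other";
    }
}
```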
If we simplify this a lot, the difference between objects and shared libraries is that much more work has been done in the shared library to avoid text relocations (this is not strictly necessary or always enforced, but it's the general idea). In object files, the assembler has only generated placeholders for addresses, which the linker then fills in. In a shared library, those addresses point into jump tables instead, so the text of the library doesn't need to be changed; only a limited jump table does.
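The jump-table idea can be imitated in plain C++ as an analogy. The "library code" below never changes: it always calls through a function-pointer slot, and a stand-in for the dynamic linker fills that slot by name at startup. This only mirrors the role of the GOT/PLT. Real dynamic linking works on machine code and relocation records, and every name here (JumpTable, resolve, twice) is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

using Fn = int (*)(int);

// Writable table with one slot per external symbol; empty until "dynamic linking".
struct JumpTable {
    Fn f2 = nullptr;
};

JumpTable table;  // the only thing the "dynamic linker" writes to

// "Library text": fixed code that always calls through the table.
int g(int x) { return table.f2(x); }

// Stand-in for the dynamic linker resolving a slot by name.
bool resolve(const std::map<std::string, Fn>& exports) {
    auto it = exports.find("f2");
    if (it == exports.end()) return false;
    table.f2 = it->second;
    return true;
}

// A function "exported" by some other library.
int twice(int x) { return 2 * x; }
```

Only the table entry is written at load time, which is why the code pages can stay read-only and shared between processes.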
By the way, I'm talking about ELF. Older formats had more differences between programs and libraries.
What you described in your question (machine code for functions, initialization data and relocation tables) is pretty much exactly what is inside .o (object) and .so (shared object) files.
.a (archives) are basically multiple .o (object) files bunched together for easier reference during linking. ("Link libraries")
.so (shared object) files include some additional metadata, such as which other .so files need to be linked in. For example, xyz.so might reference some functions that reside in abc.so; the information that abc.so needs to be linked in, plus optionally the path where to find abc.so (the RPATH), has to be encoded in xyz.so.
Windows .dll (dynamic link library) files are basically shared objects (.so) with a different name.
Disclaimer: This is simplifying things significantly, but is close enough to "The Truth (tm)" to serve for everyday developer needs.

Large C++ static library file after linking

I am building a library from my source code, which contains both header (.hpp) and source (.cpp) files. I have a makefile that compiles each source file into its own object file, followed by a library creation step (ar rcs ...) that combines all the *.o files into a static library. The resulting library file is huge (around 17 MB). By contrast, when I run g++ -o a.out *.cpp, the output file is only 1.4 MB. Is the archiver command (ar rcs) not removing redundant information from the individual object files? I also created shared objects and those were small as well, but I need a static library for my purpose.
Try running strip on the library; the debug information and symbol tables might be taking up the extra space.
Also, the s option to ar (which writes a symbol index) might inflate the resulting archive (again, strip it, or use ar rc instead of ar rcs).
What happens when you link your static library into a final binary? There's a lot of extra data that should get stripped out during the link process into a binary.

Binary image code size comparison between Cocoa Touch Static Library and DLL dynamic library

I am migrating a C++ DLL from Windows to iOS as a static library (.a).
Some issues have come up regarding the binary image size.
Here is some data I collected.
Image size:
dynamic library (DLL compiled on Windows): 1.4M
static library (.a, compiled as a Cocoa Touch Static Library): 34M
I compared the output of each compiled C++ source file (.obj on Windows, .o on iOS), and their sizes are almost the same. However, after linking them together, the DLL is 1.4M while the .a image is 34M. The size of the .a seems to be approximately the sum of all the .o files.
Are there any suggestions or guidelines for migrating C++ code to iOS, especially regarding image size?
Is there a link flag for compiling C++ source code in a Cocoa Touch Static Library project?
In Unix-based systems (including iOS), a ".a" file is simply an archive (man 5 ar) that combines a bunch of ".o" files into a single unit (possibly including a symbol table). So your observation that the size of the ".a" file is the sum of the ".o" files is exactly correct.
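The arithmetic behind this observation is simple. In the classic ar layout, the archive adds an 8-byte global header plus a 60-byte header (and at most one padding byte) per member, so its size is essentially the sum of the object files. The sketch below does that calculation. It ignores the symbol index and any long-name data (BSD-style archives, as used on Apple platforms, may store long member names at the start of the member data), and archive_size is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Approximate size of a classic ar archive holding members of the given sizes.
std::size_t archive_size(const std::vector<std::size_t>& member_sizes) {
    std::size_t total = 8;  // "!<arch>\n"
    for (std::size_t s : member_sizes)
        total += 60 + s + (s % 2);  // member header + data + padding to an even offset
    return total;
}
```

For example, members of 100 and 201 bytes give 8 + (60 + 100) + (60 + 201 + 1) = 430 bytes. Next to 34 MB of object code this overhead is negligible, so the gap to the 1.4M DLL most likely comes from what the linker leaves out when building the DLL, not from the archive format.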
When you link an application against a ".a" file, the final executable will include only the elements of the ".a" file that are actually referenced.
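That selection step can be modelled roughly as follows. A member is pulled in only if it defines a symbol that is still undefined, and pulling it in may create new undefined references, so the scan repeats until nothing changes. This is a simplified sketch with hypothetical names. Real linkers consult the archive's symbol index and differ in details such as whether earlier archives on the command line are revisited.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct Member {
    std::string name;
    std::set<std::string> defines;     // symbols this .o provides
    std::set<std::string> references;  // symbols this .o needs
};

// Return the names of the archive members the link actually needs.
std::vector<std::string> select_members(const std::vector<Member>& archive,
                                        std::set<std::string> undefined) {
    std::vector<std::string> chosen;
    std::set<std::string> defined;
    std::vector<bool> used(archive.size(), false);
    bool changed = true;
    while (changed) {  // rescan until no new member is pulled in
        changed = false;
        for (std::size_t i = 0; i < archive.size(); ++i) {
            if (used[i]) continue;
            bool needed = false;
            for (const auto& sym : archive[i].defines)
                if (undefined.count(sym)) needed = true;
            if (!needed) continue;
            used[i] = true;
            changed = true;
            chosen.push_back(archive[i].name);
            for (const auto& sym : archive[i].defines) {
                defined.insert(sym);
                undefined.erase(sym);
            }
            for (const auto& sym : archive[i].references)
                if (!defined.count(sym)) undefined.insert(sym);
        }
    }
    return chosen;
}
```

With an archive holding a.o (defines a, needs b), b.o (defines b) and unused.o (defines z), a program that references only a pulls in a.o and b.o; unused.o is never selected. This is why an executable linked against a large archive can be far smaller than the archive itself.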

Exporting member function of a static library

Is it possible (or relevant at all) to export member functions of a static library?
When I run dumpbin /EXPORTS on my .lib file, I don't see any of my defined class members.
Linking against this lib file succeeds, but I use an external tool that fails to read non-exported symbols.
I also tried adding a .def file, with no result.
A static library is just a collection of .o files. It is linked into your executable in exactly the same way as .o files, so whatever works for .o files will work for static libraries.