Merge a Mach-O executable with a static lib? - C++

Suppose you have
a pre-built iOS executable app (for simulator or device).
a pre-built static archive library which, among other things, contains C++ static initializers.
Now it should be possible to merge the two built products to produce a new iOS executable that is like the old one, except that it is now also linked with the additional static library and, on execution, will run the static library's static initializers.
Which tool (if any) could help solve this merge problem?
Edit: An acceptable solution is also to dynamically load the library using dlopen. The whole purpose of this is application testing, so the re-linked app will never see the App Store.
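A minimal sketch of the dlopen route (the library path below is a stand-in, and this assumes the test build can bundle the code as a .dylib):

#include <dlfcn.h>
#include <cstdio>

void load_test_library() {
    // Loading the image is enough: the dynamic loader runs the library's
    // C++ static initializers as part of dlopen itself.
    void* handle = dlopen("/path/to/libtests.dylib", RTLD_NOW);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    }
    // Keep the handle around; dlclose(handle) would run the destructors.
}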

How a compiler works (a simple explanation)
The most popular C++ compilers (like, say, GCC) work by translating all the C++ (and Obj-C, C, etc.) code to assembly.
Then the compiler calls the appropriate assembler for the target processor and creates the object binaries.
Then it calls the linker, which searches those binaries for the symbols that describe what links with what. A common optimisation linkers perform is to strip from the final binary anything in the statically linked libraries that was not used; another common optimisation is to not even attempt to link unused libraries.
Finally, the linker removes the information that only it needed.
What this means in your case
You have a library, and the library has its linking symbols. You also have an executable, but that one has had its linking symbols stripped; in fact, depending on how it was optimised, the internal jumps might be just a couple of jmp instructions to arbitrary addresses in the code. No machine can do what you want automatically, because the executable no longer carries the needed information.
How to do it anyway
You would need to disassemble the executable, figure out on your own where the function calls are, and then manually reassemble it with your library, changing those function calls to jump to addresses in your library instead.
This process is sometimes used by game modders to change the video drivers of old games (for example, to update their OpenGL version, or to force Glide games to use newer drivers, and so on).
So if you want to do that anyway (I warn you: it is absurdly crazy to do, though...), ask those guys :) I can't point you to anyone right now, but they exist.
Analogy
When you are in the normal linking phase, the compiled object files are like source code that the machine understands, full of function calls as needed.
After compilation, all function calls become gotos.
So if you were a linker tasked with doing what you want to do, imagine reading source code filled with gotos to random places in the code (sometimes even into loops), and having to somehow figure out which of them to change so they jump to the new part you are trying to paste in.

Related

Resolving C/C++ library version conflicts, mainly on Linux

I have a situation more or less as illustrated here; it exists both on Windows and Linux, but I’ll draw it using Linux notation which is slightly clearer:
libbar.so.1 and libbar.so.2 contain procedures with the same names and signatures but different behaviours. If I link the applications ‘naïvely’, then symbols from the library that is loaded last will ‘hide’ the ones from the one that is loaded first (at least, if I understand this reference correctly), and I have no way of using both libraries at the same time, nor of determining which library's symbols actually get used on the day.
Certainly, the error messages I’m getting at runtime are consistent with this mental model.
Now, if I could recompile all the libraries, then as described in the reference I could use the --default-symver option (at least when compiling and linking with gcc) to disambiguate fully the symbols from each library, and force libfoo.so to look for the function baz@@libbar.so.1 and the application to look for the function baz@@libbar.so.2 with no ambiguity.
What, though, if I can only recompile my application and libbar.so.2? Will libfoo.so ignore the function baz@@libbar.so.2, or will it look no further than the baz part of it and, potentially, still use it instead of the one in libbar.so.1?
It’s a fully cross-platform application, so I have exactly the same issue to address on Windows. Here, I have an inkling that I may be able to use the SxS mechanism, as the situation isn't all that different from, say, having an application compiled in VS2019 that links to a library compiled in VS2013, so that the application would need to link to one version of the runtimes and the library to the other. But whilst I know how to handle this with system libraries, I'm less sure how to proceed with other libraries.
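For the Linux side, a minimal sketch of what --default-symver does (GNU ld assumed; source file names are placeholders):

gcc -fPIC -shared -Wl,-soname,libbar.so.1 -Wl,--default-symver -o libbar.so.1 bar1.c
gcc -fPIC -shared -Wl,-soname,libbar.so.2 -Wl,--default-symver -o libbar.so.2 bar2.c
# each export is now versioned with its library's soname; verify with:
readelf --dyn-syms libbar.so.1 | grep baz    # baz@@libbar.so.1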

Detecting and intercepting linked library dependencies at runtime

On a UNIX system is there a simple way to identify whether a dynamic (shared) library depends on other dynamic libraries?
I'm exploring system-level APIs such as dlopen and friends in C and C++. I have a controlled environment where I will be calling dlopen and mapping specific functions. The idea is that people will be able to write native plugins for the application (and let us assume for the moment that it is in the author's best interest to make sure the library actually performs the correct function and is not malicious).
However, as a security measure I would like to ensure at runtime that the loaded dynamic library does not link to any other dynamic libraries (the intention being to disallow system calls and only permit functions from the local maths library) - and if it does, don't load it.
Some (long and/or difficult) solutions I've come up with:
Write a compiler shim that compiles the plugin code with a very specific set of compiler options, and injects some sort of checksum function that can be used to validate that the compiler shim was used.
Individually slice out all function calls referencing function names I don't want to be used.
Intercept all syscalls with some linker path trickery.
Is there a simple(r) way of checking dynamic libraries' dependencies?
However, as a security measure I would like to ensure at runtime that the loaded dynamic library does not link to any other dynamic libraries
Note that a dynamic library can import a symbol without explicitly linking to any other dynamic library. For example:
#include <fcntl.h>
int foo() { return open("/etc/passwd", O_RDONLY); }
gcc -fPIC -shared -o foo.so foo.c -nostdlib
Now foo.so does not have any DT_NEEDED shared library dependencies, yet will still open /etc/passwd at will.
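This is easy to confirm with binutils:

readelf -d foo.so | grep NEEDED          # prints nothing: no DT_NEEDED entries
readelf --dyn-syms foo.so | grep open    # open is still an undefined import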
(the intention being to disallow system calls and only permit functions from the local maths library) - and if it does, don't load it.
There is so much wrong with your intention, it's not even funny.
For a start, you don't need to link to any external library to execute system calls directly; it can be trivially done in assembly.
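For illustration, a minimal sketch on x86-64 Linux: a direct write(2) syscall with no library dependency at all.

#include <cstddef>

// invoke write(2) via the raw syscall instruction; nothing here touches libc
long raw_write(int fd, const void* buf, std::size_t len) {
    long ret;
    asm volatile("syscall"
                 : "=a"(ret)
                 : "a"(1L /* SYS_write */), "D"((long)fd), "S"(buf), "d"(len)
                 : "rcx", "r11", "memory");
    return ret;
}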
Also, the plugin can find and modify the dynamic loader's data, and once it does, it can make your main program do arbitrary things. For example, it can hijack all of your main program's calls into libc.so and redirect them elsewhere.
Once you run untrusted code in your process, it's game over.
solutions I've come up with
None of them has even a remote chance of working. Any time you spend on this is time totally wasted.

About Linux C++ static libraries

I have several questions about static libraries and need your help.
From some books I learned that a static library (.a on Linux) contains a set of compiled objects, and when it's linked into an executable, the link tool will only pull in those objects that are actually referenced.
So if the .a contains 1.o, 2.o, and 3.o and my application only uses functions in 1.o, then only 1.o will be built into the executable. Is this correct?
Then let's go further: say we have two .a libs, the first containing 1.o, 2.o, and 3.o and the second containing 3.o, 4.o, and 5.o. If my application only uses functions in 1.o, 2.o, 3.o, and 4.o, then only these four .o files will be built into the executable. Is that correct?
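(A quick way to see this object-level behaviour for yourself, with a GNU toolchain and hypothetical file names:

gcc -c 1.c 2.c 3.c 4.c 5.c
ar rcs libfirst.a 1.o 2.o 3.o
ar rcs libsecond.a 3.o 4.o 5.o
gcc -o app main.c libfirst.a libsecond.a -Wl,--trace
# --trace lists exactly which archive members were extracted;
# 5.o will not appear if nothing referenced it)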
I raise this question because I'm building some .a files to use with MSVC. These .a libraries are built with MinGW and should then be compatible with MSVC. I can include these libs in the MSVC project and build my program successfully, but the generated executable is 5 MB (the total size of all the .a files is about 8 MB) even when my program is an empty one (just an empty main function).
Does this mean that a .a used with MSVC, or a .lib (a static lib for Windows), is built into the executable as a whole, and not in the way it behaves under Linux?
I also have a question about the following:
If I can use -static to link against the static version of libtiff, why does it need to link against other libs? Shouldn't a static lib already contain all the code it needs?
Thanks
A static library won't necessarily contain all the code it needs. Imagine a library that uses printf() to output error messages. This library will still depend on the static version of libc; it won't include the code for printf itself.
In your case, since TIFF supports various internal representation formats, one of which is JPEG, the static libtiff wants you to link the static libjpeg.
There is no fundamental difference between Windows and Linux in this. When you do a static link against libtiff and libjpeg, only the libjpeg functions that are actually needed by libtiff get linked, but not, for example, the parts that handle the JFIF JPEG wrapper.
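In practice that just means naming the dependencies on the link line, dependents before dependencies (the exact library list is an assumption; it varies by libtiff build):

gcc -static -o app app.c -ltiff -ljpeg -lz -lm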
EDIT - answer to comments
There's a lot of stuff going on before your main() gets called. It's not that much on Linux/Unix, where the OS delivers arguments the way main() wants them, but on Windows a different function, normally called WinMain(), gets control when the program starts. This WinMain() is hidden in the library. It gets the whole command line in one string and has to parse arguments to pass them to main(), which means checking for spaces, which is probably implemented using isspace(), which pulls in ctype, which pulls in lots of locale-dependent stuff, and so on. So much of your 5 MB might be code that serves to make your program work on Windows just like it would on Unix. Also, if you're compiling with debug options on, those debug symbols take a lot of space as well.
Keltar's comment about a static library calling a dynamic one is right as well - but that adds a complication you don't need very often. There are more or less two reasons for static linkage:
You want your program to be able to run even when something goes wrong with dynamic libraries. If I accidentally rm /lib/libc.so.*, I will be glad to have statically linked versions of mount and cp to copy it back from somewhere else. Thus, installers and "emergency programs" are often linked statically.
You want to make sure your program uses a specific version of the libraries, the one that was current on your system when you compiled it, not the dynamically linked version that might be installed on some system five years later.
Neither of these reasons makes much sense if some but not all libraries are static.
There are exceptions, though: imagine you need a specific feature in libtiff. You browse the documentation and it says nothing about the feature. You check the source code and find that it implements specific_feature(), with a big "This is experimental and might go away in a future version" comment. If you decide you need that feature now, you might want to link libtiff statically to protect your program from failing when future versions of libtiff no longer implement the function. Of course, you'll still want the dynamic versions of libjpeg and libc. I'll leave the decision whether this is good practice to you.
Windows is a special case, as it always uses kernel32.dll and user32.dll; there are no static versions of those, even if the rest of your program is linked statically.
So while a libtiff.a could require a libjpeg.so, and a libtiff.lib could require a libjpeg.dll, there's generally not much reason to do so.

How does the Import Library work? Details?

I know this may seem quite basic to geeks. But I want to make it crystal clear.
When I want to use a Win32 DLL, usually I just call APIs like LoadLibrary() and GetProcAddress(). But recently, I have been developing with DirectX 9, and I need to add d3d9.lib, d3dx9.lib, etc. files.
I have heard enough that LIB is for static linking and DLL is for dynamic linking.
So my current understanding is that a LIB contains the implementation of the methods and is statically linked at link time as part of the final EXE file, while a DLL is dynamically loaded at runtime and is not part of the final EXE file.
But sometimes there are LIB files coming with the DLL files, so:
What are these LIB files for?
How do they achieve what they are meant for?
Are there any tools that can let me inspect the internals of these LIB files?
Update 1
After checking Wikipedia, I remember that these LIB files are called import libraries.
But I am wondering how it works with my main application and the DLLs to be dynamically loaded.
Update 2
Just as RBerteig said, there is some stub code in the LIB files born with the DLLs. So the calling sequence should be like this:
My main application --> stub in the LIB --> real target DLL
So what information should be contained in these LIBs? I could think of the following:
The LIB file should contain the full path of the corresponding DLL, so the DLL can be loaded by the runtime.
The relative address (or file offset?) of each DLL export method's entry point should be encoded in the stub, so correct jumps/method calls can be made.
Am I right on this? Is there something more?
BTW: Is there any tool that can inspect an import library? If I can see it, there'll be no more doubts.
Linking to a DLL file can occur implicitly at compile/link time, or explicitly at run time. Either way, the DLL ends up loaded into the process's memory space, and all of its exported entry points are available to the application.
If used explicitly at run time, you use LoadLibrary() and GetProcAddress() to manually load the DLL and get pointers to the functions you need to call.
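A minimal sketch of the explicit route (user32.dll is just a convenient example):

#include <windows.h>

int main() {
    // load the DLL at run time; no import library involved
    HMODULE dll = LoadLibraryA("user32.dll");
    if (!dll) return 1;
    // resolve an export by name and cast to the correct signature
    using MsgBoxA = int (WINAPI*)(HWND, LPCSTR, LPCSTR, UINT);
    auto msgBox = reinterpret_cast<MsgBoxA>(
        GetProcAddress(dll, "MessageBoxA"));
    if (msgBox) msgBox(nullptr, "Hello via GetProcAddress", "Demo", MB_OK);
    FreeLibrary(dll);
    return 0;
}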
If linked implicitly when the program is built, then stubs for each DLL export used by the program get linked in to the program from an import library, and those stubs get updated as the EXE and the DLL are loaded when the process launches. (Yes, I've simplified more than a little here...)
Those stubs need to come from somewhere, and in the Microsoft tool chain they come from a special form of .LIB file called an import library. The required .LIB is usually built at the same time as the DLL, and contains a stub for each function exported from the DLL.
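A sketch of the implicit side, with hypothetical names (mathdll.lib here would be the import library produced alongside mathdll.dll):

// mathdll.h - shipped with the DLL
extern "C" __declspec(dllimport) int add(int a, int b);

// client.cpp
#include "mathdll.h"
int main() {
    return add(2, 3);   // resolved through the stub in mathdll.lib at link time
}

// build: cl client.cpp mathdll.lib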
Confusingly, a static version of the same library would also be shipped as a .LIB file. There is no trivial way to tell them apart, except that LIBs that are import libraries for DLLs will usually be smaller (often much smaller) than the matching static LIB would be.
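One way to peek inside, assuming the Visual Studio command-line tools are available (mylib.lib is a placeholder):

lib /list mylib.lib
dumpbin /exports mylib.lib
dumpbin /headers mylib.lib

The first lists the member objects (an import library's members name the DLL); the others show the symbols covered and whether the members are real object files or import descriptors.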
If you use the GCC toolchain, incidentally, you don't actually need import libraries to match your DLLs. The version of the Gnu linker ported to Windows understands DLLs directly, and can synthesize most any required stubs on the fly.
Update
If you just can't resist knowing where all the nuts and bolts really are and what is really going on, there is always something at MSDN to help. Matt Pietrek's article An In-Depth Look into the Win32 Portable Executable File Format is a very complete overview of the format of the EXE file and how it gets loaded and run. It's even been updated to cover .NET and more since it originally appeared in MSDN Magazine ca. 2002.
Also, it can be helpful to know how to learn exactly what DLLs are used by a program. The tool for that is Dependency Walker, aka depends.exe. A version of it is included with Visual Studio, but the latest version is available from its author at http://www.dependencywalker.com/. It can identify all of the DLLs that were specified at link time (both early load and delay load) and it can also run the program and watch for any additional DLLs it loads at run time.
Update 2
I've reworded some of the earlier text to clarify it on re-reading, and to use the terms of art implicit and explicit linking for consistency with MSDN.
So, we have three ways that library functions might be made available to be used by a program. The obvious follow-up question is then: "How do I choose which way?"
Static linking is how the bulk of the program itself is linked. All of your object files are listed and get collected together into the EXE file by the linker. Along the way, the linker takes care of minor chores like fixing up references to global symbols so that your modules can call each other's functions.
Libraries can also be statically linked. The object files that make up the library are collected together by a librarian in a .LIB file, which the linker searches for modules containing symbols that are needed. One effect of static linking is that only those modules from the library that are used by the program are linked to it; other modules are ignored. For instance, the traditional C math library includes many trigonometry functions, but if you link against it and use cos(), you don't end up with a copy of the code for sin() or tan() unless you also called those functions.
For large libraries with a rich set of features, this selective inclusion of modules is important. On many platforms, such as embedded systems, the total size of code available for use in the library can be large compared to the space available to store an executable on the device. Without selective inclusion, it would be harder to manage the details of building programs for those platforms.
However, having a copy of the same library in every running program creates a burden on a system that normally runs lots of processes. With the right kind of virtual memory system, pages of memory that have identical content need only exist once in the system but can be used by many processes. This creates a benefit from increasing the chances that the pages containing code are identical to some page in as many other running processes as possible. But if programs statically link to the runtime library, then each has a different mix of functions, each laid out in that process's memory map at different locations, and there aren't many sharable code pages unless it is a program that is itself run in more than one process. So the idea of a DLL gained another major advantage.
A DLL for a library contains all of its functions, ready for use by any client program. If many programs load that DLL, they can all share its code pages. Everybody wins. (Well, until you update a DLL with a new version, but that isn't part of this story. Google "DLL Hell" for that side of the tale.)
So the first big choice to make when planning a new project is between dynamic and static linkage. With static linkage, you have fewer files to install, and you are immune from third parties updating a DLL you use. However, your program is larger, and it isn't quite as good a citizen of the Windows ecosystem. With dynamic linkage, you have more files to install, you might have issues with a third party updating a DLL you use, but you are generally being friendlier to other processes on the system.
A big advantage of a DLL is that it can be loaded and used without recompiling or even relinking the main program. This can allow a third party library provider (think Microsoft and the C runtime, for example) to fix a bug in their library and distribute it. Once an end user installs the updated DLL, they immediately get the benefit of that bug fix in all programs that use that DLL. (Unless it breaks things. See DLL Hell.)
The other advantage comes from the distinction between implicit and explicit loading. If you go to the extra effort of explicit loading, then the DLL might not even have existed when the program was written and published. This allows for extension mechanisms that can discover and load plugins, for instance.
These .LIB import library files are used in the project property Linker->Input->Additional Dependencies when building DLLs that need additional information at link time, which is supplied by the import library .LIB files. In the example below, to avoid linker errors I need to reference DLLs A, B, C, and D through their lib files. (Note: for the linker to find these files, you may need to include their deployment path in Linker->General->Additional Library Directories, or else you will get a build error about being unable to find one of the provided lib files.)
If your solution builds all dynamic libraries, you may be able to avoid this explicit dependency specification by relying instead on the reference flags exposed under the Common Properties->Framework and References dialog. These flags appear to do the linking automatically on your behalf using the *.lib files.
This, however, is, as it says, a Common Property, which is not configuration- or platform-specific. We needed to support a mixed build scenario: our application had one build configuration that produced a fully static build, and a special configuration that built a constrained subset of the assemblies deployed as dynamic libraries. I had set the Use Library Dependency Inputs and Link Library Dependencies flags to true in various cases to get things building, but when my code was introduced into the static builds it produced a ton of linker warnings and made the static builds incredibly slow. I wound up with a bunch of warnings of this sort...
warning LNK4006: "bool __cdecl XXX::YYY()" already defined in CoreLibrary.lib(JSource.obj); second definition ignored - D.lib(JSource.obj)
And I wound up using the manual specification of Additional Dependencies to satisfy the linker for the dynamic builds, while keeping the static builds happy by not using a common property that slowed them down. When I deploy the dynamic subset build, I deploy only the DLL files, as the lib files are used only at link time, not at runtime.
Here are some related MSDN topics to answer my question:
Linking an Executable to a DLL
Linking Implicitly
Determining Which Linking Method to Use
Building an Import Library and Export File
There are three kinds of libraries: static, shared and dynamically loaded libraries.
Static libraries are linked with the code at the linking phase, so they are actually inside the executable, unlike a shared library, for which the executable contains only stubs (symbols) to look up in the shared library file, which is loaded at run time before the main function gets called.
The dynamically loaded ones are much like shared libraries, except that they are loaded when, and if, the need arises, by code you've written.
In my mind, there are two methods to link a DLL to an EXE:
Use the DLL and its import library (.lib file) implicitly
Use functions like LoadLibrary() explicitly

Building C++ source code as a library - where to start?

Over the months I've written some functionality that is nice and generic enough that I want to build it as a library and link against it dynamically, rather than importing 50-odd header/source files.
The project is maintained in Xcode and Dev-C++ (I do understand that I might have to go to the command line to do what I want) and has to link against OpenGL and SDL (dynamically in SDL's case). Target platforms are Windows and OS X.
What am I looking at, overall?
What will be the entry point of my library, if it needs one?
What do I have to change in my code? (Calling conventions?)
How do I release it? My understanding is that the headers and the compiled library (.dll, .dylib (, .framework), whatever it'll be) need to be available for the project - especially as template functionality cannot be included in the library by nature.
What else do I need to be aware of?
I'd recommend building a static library rather than a DLL. A lot of the issues of exporting C++ functions and classes go away if you do this, provided you only intend to link with code produced by the same compiler you built the library with.
Building a static library is very easy, as it is just a collection of .o/.obj files - a bit like a ZIP file but without compression. There is no need to export anything - just include the library in the list of files that your application links with. To access specific functions or classes, just include the relevant header file. Note that you can't get rid of header files - the C++ compilation model, particularly for templates, depends on them.
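For reference, creating the archive itself is a one-liner (object file and library names assumed):

libtool -static -o libmylib.a a.o b.o    # macOS
ar rcs libmylib.a a.o b.o                # Linux / MinGW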
It can be problematic to export a C++ class library from a dynamic library, but it is possible.
You need to mark each function to be exported from the DLL (the syntax depends on the compiler). I'm poking around to see if I can find out how to do this from Xcode. In VC it's __declspec(dllexport), and in CodeWarrior it's #pragma export on/#pragma export off.
This is perfectly reasonable if you are only using your binary in-house. However, one issue is that C++ methods are named differently by different compilers. This means that nobody who uses a different compiler will be able to use your DLL, unless you are only exporting C functions.
Also, you need to make sure the calling conventions match in the DLL and the DLL's client. This either means you should have the same default calling convention flag passed to the compiler for both the DLL and the client, or better, explicitly set the calling convention on each exported function in the DLL, so that it won't matter what the default is for the client.
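A common pattern covering both points, with illustrative macro and function names (GCC/Clang use a visibility attribute where MSVC uses __declspec):

#if defined(_WIN32)
  #if defined(MYLIB_BUILDING)      // defined only while building the DLL itself
    #define MYLIB_API __declspec(dllexport)
  #else
    #define MYLIB_API __declspec(dllimport)
  #endif
  #define MYLIB_CALL __cdecl       // pin the calling convention explicitly
#else
  #define MYLIB_API __attribute__((visibility("default")))
  #define MYLIB_CALL
#endif

MYLIB_API int MYLIB_CALL mylib_frobnicate(int value);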
This article explains the naming issue:
http://en.wikipedia.org/wiki/Name_decoration
The C++ standard doesn't define a standard ABI, and that's bad news for people trying to build C++ libraries. This means that you get different behavior from your compiled code depending on which flags were used to compile it, and that can lead to mysterious bugs in code that compiles and links just fine.
This extends beyond just different calling conventions - C++ code can be compiled to support or not support RTTI and exception handling, and with various optimizations that can affect the memory layout of class instances, which C++ code relies on.
So, what can you do? I would build C++ libraries inside my source tree, and make sure that they're built as part of my project's build, and that all the libraries and the code that links to them use the same compiler flags.
Note that name mangling, which was supposed to at least prevent you from linking object files that were compiled with different compilers/compiler flags, only mostly works, and there are certain things you can do, especially with GCC, that will result in code that links just fine and fails at runtime.
You have to be extra careful with vendor-supplied dynamic C++ libraries (Qt on most Linux distributions, for example). I've seen instances of vendor-supplied libraries that were compiled in ways that prevented certain things from working properly. For example, some Red Hat Linux releases (maybe all of them) disabled exceptions in Qt, which made it impossible to catch exceptions in main() if the exceptions were thrown in a Qt callback. Fun.