Background: I am looking at developing a package manager similar to portage in Gentoo Linux (I may end up forking portage). For those who know little about Gentoo: it is a source-based distro, which means that all packages are compiled from source code. Currently it is possible to compile a program into object files and then into an executable.
$ gcc -c a.c -o a.o
$ gcc -c b.c -o b.o
$ gcc a.o b.o -o executable
The improvements I would like to make to portage are the following.
Ability to re-compile only the object files that have been updated (track changes using Git or otherwise).
Decompile/unlink an executable back into its object files.
Re-compile/re-link the object files, replacing only the old object files with the updated ones (changes tracked using Git or otherwise).
Then the newly compiled package replaces the old package (a trivial task).
Reasoning: I am an Arch Linux user who loves the idea of a source-based distribution but cannot be bothered with the enormous task of keeping my system up to date. I also do most of my work on a laptop with a small hard drive, hence the reason for de-compiling/un-linking the executable into object files rather than just keeping the object files, which take up a large amount of space. It would also likely decrease the overall compile time of the system, as the need to re-compile most of the source code would be greatly reduced. It would also allow for an easy way to change the USE flags on a package without the need to completely re-compile.
Question: Is it possible to compile object files into an executable and then to de-compile it back into object files? An example of this is below.
$ gcc -c a.c -o a.o
$ gcc -c b.c -o b.o
$ gcc a.o b.o -o executable
and then
$ SomeCommand executable
output << a.o b.o
If this is not currently possible, would it be doable to modify a version of GNU's linker (ld) to log the changes it makes when linking object files, so as to intentionally make the program reverse-engineerable?
Edit: Another use for this would be to separate a single object file from the executable of a large project, swap the separated object file with a new one, and re-link again. This would reduce the overhead of re-linking large projects built from many different files when only one is updated. It would allow for incremental compilation at the binary level.
No, this is not possible. A large amount of the linker's work is replacing symbolic references (valid for any combination of object files being linked together) with numeric offsets (valid only for the particular way the linker decided to lay out that particular combination of object files, that particular time). Once the references are "baked" in this way, they cannot be recovered.
It might be doable if you alter/configure ld to keep the sections for each object file apart and also keep the relocation table for each object file in the executable. You would also have to make sure ld stores the object file names in the executable if you want to recover the original file names.
Basically, a linker could just join the object files together and then apply the relocations; if the relocations are invertible, you should be able to reverse the process.
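For what it's worth, GNU ld already has options that point in this direction. A hedged sketch (not a full solution), reusing a.o and b.o from above:
$ gcc a.o b.o -o executable -Wl,--emit-relocs    # keep the relocation sections in the output
$ readelf -r executable                          # inspect the retained relocations
This does not by itself keep the sections of each object file apart, but it shows that the relocation information need not be discarded at link time.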
Related
When I want to run some source code, why does this work:
gcc test.c -o test.o
then
./test.o
but this does not work:
gcc -c test.c
then
./test.o
and I get this message:
bash: ./test.o: Permission denied
First of all, you are not creating an object file but an executable file. Object files are intermediate files used as input to the linker to create the executable file. That you name it with a .o suffix doesn't matter.
Secondly, due to tradition, if you do not specify an output filename with the -o option, the compiler frontend and linker will create an executable named a.out.
But that's not all, because with the second example you are actually creating a real object file, and those are not executable. As mentioned above, they need to be passed to a separate linking step to create the executable file.
You either need to create an executable file:
gcc test.c
./a.out
Or you should link the object file into an executable file:
gcc -c test.c # Create object file
gcc test.o -o test # Use object file to create executable file
./test # Run the executable file
You get that message because the compiler doesn't set the executable bit on object files, because - well, because they are not executable. If you set the executable bit manually and try to run it, you'll get something like "unknown executable format".
Now, it's not just a format problem - the point is that an object file is just half of the work to get something that can actually be executed. In particular, it's missing the linking step, where the linker finds unresolved references and patches them with the addresses from other object files - including the ones you don't specify explicitly, like the standard library - and generates a proper executable file that the kernel knows how to load and execute.
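A quick way to see this for yourself, assuming a test.c that calls printf:
gcc -c test.c
nm -u test.o          # lists the undefined (unresolved) symbols, e.g. printf
gcc test.o -o test    # linking resolves them or defers them to shared libraries
./test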
In the first case you just name the resulting file test.o by using -o; it has been compiled, assembled, and linked.
In the second case you merely compile and assemble; the result can't run without being linked. See gcc --help or the Overall Options section of the GCC manual for -c:
-c
Compile or assemble the source files, but do not link. The linking stage simply is not done. The ultimate output is in the form of an object file for each source file.
By default, the object file name for a source file is made by replacing the suffix ‘.c’, ‘.i’, ‘.s’, etc., with ‘.o’.
Unrecognized input files, not requiring compilation or assembly, are ignored.
(Emphasis mine)
You need to link it and then execute it:
gcc -o a.out test.o
You cannot run an object file. It is not executable and needs to be linked to become an executable.
Try
gcc -o test test.c and run using ./test
This is a fundamental point about gcc. Never use the -c option when you want to get an executable file from a single command, such as
gcc -c xx.c yy.c -o new
But you can get an executable file with -c using the following commands:
gcc -c xx.c yy.c
gcc xx.o yy.o -o new
It's equivalent to
gcc xx.c yy.c -o new
For background, I'm creating some C++ software that uses dynamically loaded shared library plugins for hardware output (the specifics of it aren't relevant here).
I'm building the executable by compiling everything into object files and then linking the ones needed, which is simple using an exclusion list. I can then build the shared library by specifying its primary object file (the one that's dynamically loaded and accessed at runtime) along with every other object file referenced by the primary one.
My question is this: Is there a way to provide the linker with the primary object file, and create a shared library containing only the objects it depends upon? All of the object files are in the same directory, I'm not using a Makefile (yet; if one could solve the problem, it's a valid answer), and compilation speed isn't an issue.
I've looked into the linker options --as-needed, --gc-sections, and --no-undefined, but I haven't been able to piece together a working build process.
Example: For source files main.cpp, a.cpp, b.cpp, a.h, and b.h, where main.cpp and a.cpp both include b.h:
gcc -fPIC -c *.cpp -I. builds object files main.o, a.o, and b.o.
gcc -o main.out *.o builds the final executable main.out from the object files... including a.o, which is unused. (--gc-sections should fix this.)
gcc -fPIC -shared -o a.so a.o -Wl,--as-needed !(a).o builds the final shared library a.so from all of the object files... including main.o, which is unused. How do I prevent main.o from being included in a.so?
Is there a way to provide the linker with the primary object file, and create a shared library containing only the objects it depends upon?
Yes: package all objects into an archive library liball.a, then link like this:
gcc -shared -o a.so a.o liball.a
The linker will then pull out from liball.a all objects that a.o depends on, and only these objects, as explained here.
Note: liball.a may contain a.o, there is no harm (as above link explains).
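Putting the pieces together, a minimal sketch of the whole sequence (object names are illustrative, and the objects are assumed to have been compiled with -fPIC as in the question):
ar rcs liball.a a.o b.o c.o          # package every object into an indexed archive
gcc -shared -o a.so a.o liball.a     # only the archive members a.o needs are pulled in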
Update:
Is there a way to do it without needing to create an archive first?
I don't know of any portable way to do that. The Gold linker has --start-lib and --end-lib command line flags that achieve exactly that.
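A hedged sketch of the gold approach, with b.o and main.o standing in for "every other object file":
gcc -fuse-ld=gold -shared -o a.so a.o -Wl,--start-lib b.o main.o -Wl,--end-lib
The objects between --start-lib and --end-lib are treated as if they were archive members, so only the ones a.o actually needs end up in a.so.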
Suppose one has about 50,000 different .cpp files.
Each .cpp file contains just one class with about ~1000 lines of code in it (the code itself is not complicated -- it involves in-memory operations on matrices & vectors -- i.e., no special libraries are used).
I need to build a project (in a Linux environment) that will have to import & use all of these 50,000 different .cpp files.
A couple of questions come to mind:
How long will it roughly take to compile this? What will be the approx. size of the compiled file?
What would be a better approach -- keep 50,000 different .so files (compiled extensions) and have the main program import them one by one, or alternatively, unite these 50,000 different .cpp files into one large .cpp file and just deal with that? Which method would be faster / more efficient?
Any insights are greatly appreciated.
There is no answer, just advice.
Right back at you: What are you really trying to do? Are you trying to make a code library from different source files? Or is that an executable? Did you actually code that many .cpp files?
50,000 source files is, well... a massively sized project. Are you trying to do something common across all files (e.g. every source file represents a resource, record, image, or something unique)? Or is it just 50K disparate code files?
Most of your compile time will not be based on the size of each source file. It will be based on the number of header files (and the headers they include) that are brought in with each .cpp file. Headers, while not usually containing implementations, just declarations, still have to go through a compile process. And redundant headers across the code base can slow your build time down.
Large projects at that kind of scale use precompiled headers. You can include all the commonly used header files in one header file (common.h) and build common.h. Then all the other source files just include "common.h". The compiler can be configured to automatically use the compiled header file when it sees the #include "common.h" for each source.
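A hedged sketch of what that looks like with GCC, using the hypothetical common.h described above:
g++ -x c++-header common.h -o common.h.gch    # build the precompiled header once
g++ -c file1.cpp -o file1.o                   # the .gch is used automatically for #include "common.h"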
(i) There are way too many factors involved in determining this; even an approximation is impossible. Compilation can be memory, CPU, or hard-drive bound. The complexity of the files matters (from your description, your complexity is low).
(ii) The typical way of doing this is to make a library and let the system figure out linking or loading. You can choose static or dynamic linking.
static linking
Assuming you are using gcc, this would look like this:
g++ -c file1.cpp -o file1.o
g++ -c file2.cpp -o file2.o
...
g++ -c filen.cpp -o filen.o
ar -rc libvector.a file1.o file2.o ... filen.o
Then, when you build your own code, your final link looks like this:
g++ myfile.cpp libvector.a -o mytask
dynamic linking
Again, assuming you are using gcc, this would look like this:
g++ -c file1.cpp -fPIC -o file1.o
g++ -c file2.cpp -fPIC -o file2.o
...
g++ -c filen.cpp -fPIC -o filen.o
g++ -shared file1.o file2.o ... filen.o -o libvector.so
Then, when you build your own code, your final link looks like this:
g++ myfile.cpp libvector.so -o mytask
You will need libvector.so to be in the loader's path for your executable to work.
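For example (illustrative only), either of these makes libvector.so findable at run time:
LD_LIBRARY_PATH=. ./mytask                                    # quick test from the build directory
g++ myfile.cpp libvector.so -Wl,-rpath,'$ORIGIN' -o mytask    # bake a search path into the executable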
In any case, as long as the 50,000 files don't change, you will only need to do the last command (which will be much faster).
You can build each object file from a .cpp while keeping the .h files full of (and I MEAN LOTS of) forward declarations - so when you change a .h file it does not force a recompile of the rest of the program. Usually a function/method only needs the name of a type in its parameters or return value; if it needs other details, then yes, the header needs to be included.
Please get a book by Scott Meyers - it will help you a lot.
Oh - when trying to eat a big cake, divide it up. The slices are more manageable.
We can't really say how long it will take to compile, but what you should do is compile each .cpp/.h pair into a .o file:
$ g++ -c -o test.o test.cpp ...
Once you have all of these, you compile the main program as so:
$ g++ -c -o main.o main.cpp
$ g++ -o main main.o test.o blah.o otherThings.o foo.o bar.o baz.o etc...
Your idea of using .sos is pretty much asking "how quickly can I crash the program and possibly the OS?". Shared libraries are meant for large libraries in small numbers, not 50,000 .sos linked to a binary (especially if you load them dynamically... that would be BAD).
I have recently begun writing unit tests (using GoogleTest) for a C++ project. Building the main project is fairly simple: I use GCC's -MM and -MD flags to automatically generate dependencies for my object files, then I link all of the object files together for the output executable. No surprises.
But as I'm writing unit tests, is there a way to have make or GCC figure out which object files are needed to compile each test? Right now, I have a fairly naive solution (if you can call it that) which compiles ALL available object files together for EVERY unit test, which is obviously wasteful (in terms of both time and space). Is there a way (with make, gcc, sed, or whatever else) to divine which object files are needed for a given unit test in a fashion similar to how dependencies are generated for the original source files?
It sounds like you have two groups of source files: one that actually implements your program, and another that's all the unit tests. I assume each unit test has its own main function, and unit tests never need to call each other.
If all that's true, you can put all the files from the first group in a static library, and link each of the unit tests against that library. The linker will automatically pull from the library only the object files that are needed.
In concrete Makefile terms:
LIBRARY_OBJECTS = a.o b.o c.o d.o # etc
UNIT_TESTS = u1 u2 u3 u4 # etc
UNIT_TEST_OBJECTS = $(UNIT_TESTS:=.o)
libprogram.a: $(LIBRARY_OBJECTS)
	ar cr $@ $?

$(UNIT_TESTS): %: %.o libprogram.a
	$(CC) $(CFLAGS) -o $@ $< -L. -lprogram
You should look into a higher-level project-management abstraction, like CMake or GNU Automake.
In your Makefile
SOURCES.cpp = a.cpp b.cpp ...
OBJECTS = $(SOURCES.cpp:%.cpp=%.o)
all: program
program: $(OBJECTS)
	$(LINK) -o $@ $(OBJECTS)
Maybe, depending on how orderly your test system is.
If there's a nice one-to-one relationship between header and source files, then you can use some text-converting functions (or a call to sed) to convert the machine-generated rule you already have:
foo.o: foo.cc foo.h bar.h gaz.h
into a rule for the corresponding test:
unit_test_foo: unit_test_foo.o foo.o stub_bar.o stub_gaz.o
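With GNU sed, the conversion could be sketched like this (the dependency-file name deps.mk and the stub naming convention are assumptions):
sed -E -e 's/^([A-Za-z_]+)\.o: \1\.cc \1\.h (.*)$/unit_test_\1: unit_test_\1.o \1.o \2/' \
       -e 's/([A-Za-z_]+)\.h/stub_\1.o/g' deps.mk
Applied to the machine-generated rule above, this prints exactly the unit_test_foo rule shown.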
Or if you use a lot of stubs without corresponding headers (which is a warning sign) you can link with every stub except stub_foo.o. These object files are small and don't change often, so it's cheap.
What is the difference between these two file types? I see that my C++ app links against both types during the construction of the executable.
How do I build .a files? Links, references, and especially examples are highly appreciated.
.o files are objects. They are the output of the compiler and input to the linker/librarian.
.a files are archives. They are groups of objects or static libraries and are also input into the linker.
Additional Content
I didn't notice the "examples" part of your question. Generally you will be using a makefile to generate static libraries.
AR = ar
CC = gcc
objects := hello.o world.o
libby.a: $(objects)
	$(AR) rcu $@ $(objects)

%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@
This will compile hello.c and world.c into objects and then archive them into the library. Depending on the platform, you may also need to run a utility called ranlib to generate the table of contents for the archive.
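As a quick illustration with the names from the Makefile above:
ar t libby.a          # list the archive's members: hello.o, world.o
ranlib libby.a        # (re)build the symbol index, if your platform requires it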
An interesting side note: .a files are technically archive files and not libraries. They are analogous to zip files without compression though they use a much older file format. The table of contents generated by utilities like ranlib is what makes an archive a library. Java archive files (.jar) are similar in that they are zip files that have some special directory structures created by the Java archiver.
A .o file is the result of compiling a single compilation unit (essentially a source-code file, with associated header files) while a .a file is one or more .o files packaged up as a library.
D Shawley's answer is good; I just wanted to add a couple of points because other answers reflect an incomplete understanding of what's going on.
Keep in mind that archive files (.a) are not restricted to containing object files (.o). They may contain arbitrary files. Not often useful, but see dynamic linker dependency info embedded in an archive for a stupid linker trick.
Also notice that object files (.o) are not necessarily the result of a single compilation unit. It is possible to partially link several smaller object files into a single larger file.
http://www.mihaiu.name/2002/library_development_linux/ -- search in this page for "partial"
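A minimal sketch of such a partial (incremental) link with GNU ld, using illustrative file names; -r produces a relocatable object that can itself be fed to a later link:
ld -r -o combined.o a.o b.o
gcc main.o combined.o -o program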
There is one more aspect of linking against .a vs .o files: when linking, all .os passed as arguments are included in the final executable, whereas entries from any .a arguments are only included in the linker output if they resolve a symbol dependency in the program.
More specifically, each .a file is an archive comprising multiple .o files. You can think of each .o being an atomic unit of code. If the linker needs a symbol from one of these units, the whole unit gets sucked into the final binary; but none of the others are unless they too are needed.
In contrast, when you pass a .o on the command line, the linker sucks it in because you requested it.
To illustrate this, consider the following example, where we have a static library comprising two objects a.o and b.o. Our program will only reference symbols from a.o. We will compare how the linker treats passing a.o and b.o together, vs. the static library which comprises the same two objects.
// header.hh
#pragma once
void say_hello_a();
void say_hello_b();
// a.cc
#include "header.hh"
#include <iostream>
char hello_a[] = "hello from a";
void say_hello_a()
{
std::cout << hello_a << '\n';
}
// b.cc
#include "header.hh"
#include <iostream>
char hello_b[] = "hello from b";
void say_hello_b()
{
std::cout << hello_b << '\n';
}
// main.cc
#include "header.hh"
int main()
{
say_hello_a();
}
We can compile the code using this Makefile:
.PHONY: compile archive link all clean

all: link

compile:
	@echo ">>> Compiling..."
	g++ -c a.cc b.cc main.cc

archive: compile
	@echo ">>> Archiving..."
	ar crs lib.a a.o b.o

link: archive
	@echo ">>> Linking..."
	g++ -o main_o main.o a.o b.o
	g++ -o main_a main.o lib.a

clean:
	rm *.o *.a main_a main_o
and obtain two executables, main_o and main_a, which differ in that the contents of a.cc and b.cc were provided through two .o files in the first case and through a .a archive in the second.
Lastly we examine the symbols of the final executables using the nm tool:
$ nm --demangle main_o | grep hello
00000000000011e9 t _GLOBAL__sub_I_hello_a
000000000000126e t _GLOBAL__sub_I_hello_b
0000000000004048 D hello_a
0000000000004058 D hello_b
0000000000001179 T say_hello_a()
00000000000011fe T say_hello_b()
$ nm --demangle main_a | grep hello
00000000000011e9 t _GLOBAL__sub_I_hello_a
0000000000004048 D hello_a
0000000000001179 T say_hello_a()
and observe that main_a is in fact lacking the unneeded symbols from b.o. That is, the linker did not suck in the contents of b.o within the archive lib.a because none of the symbols from b.cc were referenced.
You can use ar to create a .a file (static library) from .o files (object files).
See man ar for details.
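A minimal invocation (file names are illustrative) looks like this:
ar rcs libfoo.a a.o b.o    # create libfoo.a from a.o and b.o and write a symbol index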
I believe an .a file is an archive that can contain multiple object files.