Wrapping symbols during linking on OS X

I'm trying to wrap one symbol with another at link time. As I understand it, this is easily done with ld's --wrap option, but that option is not available on OS X. There is '-idefinition:indirection'; here is how I'm trying to trick main into printing 42:
a.cpp:
int foo() {
    return 1;
}
b.cpp:
int wrap_foo() {
    return 42;
}
main.cpp:
#include <cstdio>
int foo();
int wrap_foo();
int main() {
    printf("%d\n", foo());
}
How I build and link them:
$ gcc -c *.cpp
$ gcc -Wl,-i__Z3foov:__Z8wrap_foov *.o
duplicate symbol __Z3foov in:
a.o
b.o
ld: 1 duplicate symbol for architecture x86_64
Is it even possible to achieve what I want?
EDIT:
In my task I don't have control over a.cpp and main.cpp (have only objects for them), but I can write any code and do whatever I need with the objects before link.
I had already found the following question, which relates to a similar task (and describes it better):
GNU gcc/ld - wrapping a call to symbol with caller and callee defined in the same object file
Rereading the answers, I see that my case is the same: the symbol is already present, so I will get a collision.
How to remove foo from a.o without touching source code?

Does this have to be done at link time rather than at runtime? Not entirely sure what you're trying to do, so just asking.
The problem with using that linker command is that it creates a new symbol, which is why you get the duplicate symbol.
-alias symbol_name alternate_symbol_name
Create an alias named alternate_symbol_name for the symbol
symbol_name. By default the alias symbol has global visibility.
This option was previous the -idef:indir option.
If you don't link a.o, which defines foo (i.e. you omit the original foo), you can do something like this:
tmp$ gcc -Wl,-alias,__Z8wrap_foov,__Z3foov b.o main.o
tmp$ ./a.out
42
What you could always do is have a.cpp define the function under another name, such as original_foo. Then, whenever you want that implementation, you can use -alias to map it to foo. Not perfect, but it would achieve roughly what you want. You're already going to the effort of adjusting things at link time, so it seems potentially acceptable.
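The names passed to -alias are mangled C++ names as they appear in the Mach-O symbol table, so it helps to confirm what they stand for. A minimal sketch, assuming GCC or Clang (which ship <cxxabi.h>); the helper name demangle is made up for illustration:

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Hypothetical helper: turn a mangled symbol into a readable C++ name.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const std::string& symbol) {
    // nm on OS X prints "__Z3foov"; the Itanium ABI name is "_Z3foov",
    // so drop the extra leading underscore that Mach-O adds.
    std::string abi_name = symbol;
    if (abi_name.compare(0, 3, "__Z") == 0) abi_name.erase(0, 1);

    int status = 0;
    char* text = abi::__cxa_demangle(abi_name.c_str(), nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) return symbol;

    std::string result(text);
    std::free(text);  // the buffer is allocated with malloc
    return result;
}
```

Calling demangle("__Z3foov") gives "foo()" and demangle("__Z8wrap_foov") gives "wrap_foo()", which matches the two functions in the question.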

Related

C++ static instance of a user-defined class results a double-call to constructor when compiled and linked in separate steps

So I have reduced the problem to a very simple program of an empty main() function and a very simple class as follows.
A.cpp
#include <iostream>
class A {
public:
    A() {std::cout<<"Inside A()"<<std::endl;}
};
static A a;
test.cpp
#include "A.cpp"
int main() {}
Now consider 2 options for building this simple program into 2 different executables:
Generating program #1:
Compile with the following command (generate .o files from the .cpp files):
g++ -c test.cpp A.cpp
And then link with the following command:
g++ test.o A.o -o linkedTest
Generating program #2:
Compile and link at once with the following command:
g++ test.cpp -o test
So at this point we have 2 programs next to the source files (alongside the intermediate .o files): linkedTest and test.
Now, running the program test (command ./test) executes the constructor of class A only once and prints the text "Inside A()". In contrast, running the program linkedTest (command ./linkedTest) executes the constructor of class A twice!
So my questions are: Why is this happening? Shouldn't the same compiler (at least) generate the same program out of the same source code? What exactly is happening behind the scenes, and how can I take control of it? Is this expected compiler/linker behavior or is it an (un)known bug?
Any C++ gurus out there who could shed some light on this...?
For your reference, my GCC version is : gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Remember that static variables are defined per compilation unit.
In the case of:
g++ -c test.cpp A.cpp
g++ test.o A.o -o linkedTest
the compiler creates two object files, each with its own static variable a.
Whereas building only one object:
g++ test.cpp -o test
gives you one compilation unit and therefore one definition of a.

When you compile both test.cpp and A.cpp, you have two compilation units that both define a variable named a. Since that variable is declared as static, that is legal (otherwise you'd get an error about a being defined twice) and causes two independent variables to be defined with the same name. And since you get two variables, you also get two constructor calls.
When you only compile test.cpp, there's only one compilation unit, only one a and thus only one constructor call.
PS: It's generally a bad idea to include source files into each other because it leads to issues like this.

It's unusual, and normally a bad idea, to #include a *.cpp file.
But you would get the same behavior if you used a header file like normal, and a second *.cpp file that includes it:
// A.hpp:
#ifndef TEST_CLASS_A_HPP
#define TEST_CLASS_A_HPP
#include <iostream>
class A {
public:
    A() {std::cout<<"Inside A()"<<std::endl;}
};
static A a;
#endif
// A.cpp:
#include "A.hpp"
// and nothing else!
// test.cpp:
#include "A.hpp"
int main() {}
When compiling the program above in the normal way, there are two "translation units": one for A.cpp, which includes everything in A.hpp, and one for test.cpp, which also includes everything in A.hpp. Outside of any class or function, the keyword static means a definition has "internal linkage", so that it cannot be used from another translation unit, and if another translation unit defines something similar, it's defining a different object or function with the same name. So yes, the program has, and automatically creates, two objects both named a of type A.
Your original program which was made from both A.cpp and test.cpp which included A.cpp similarly had two translation units, each with its own object named a of type A. The version just compiling test.cpp had just the one translation unit, and one a object.
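If the goal is a single object shared by every translation unit, one common approach (not something the answers above prescribe) is a function-local static inside an inline function. A minimal sketch, assuming C++11 or later; shared_a and construction_count are names made up for illustration:

```cpp
#include <iostream>

// Hypothetical counter so the number of constructor runs can be checked.
inline int& construction_count() {
    static int count = 0;
    return count;
}

class A {
public:
    A() {
        ++construction_count();
        std::cout << "Inside A()" << std::endl;
    }
};

// An inline function has external linkage, and its local static is a single
// object for the whole program, no matter how many translation units
// include this header.
inline A& shared_a() {
    static A a;  // constructed on the first call only
    return a;
}
```

Unlike static A a; at namespace scope, the object here is constructed lazily on the first call to shared_a() rather than before main() starts. With C++17, an inline variable at namespace scope (inline A a;) is another option that keeps construction at startup.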

Trying to understand basic linker behaviour

I have a minimal cpp file with contents,
int main() {}
I then compile and link using
$ g++ -c main.cpp
$ g++ -o main main.o -Wl,-u,foo
Expected Behaviour: Since the linker cannot find the symbol foo, because it is not defined anywhere, I expected the linker to throw an error saying that the symbol foo is not found/resolved.
Actual behaviour: The link step succeeds.
Can someone help me understand this behaviour and how I can force the linker to error out when it cannot find the undefined symbol, possibly using some linker flag?
$ g++ --version
g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
Thanks in advance!!!
EDIT: It is probably related to my misunderstanding about undefined symbols, so let me clarify my understanding of the linker. The linker searches for undefined symbols in the list sequentially through -llib... and foo in this case is one such undefined symbol. I was expecting that at the end of traversing the whole list of libraries, there shouldn't be any undefined symbols left i.e. all symbols must be defined. Am I wrong in thinking this?

That's simply not what -Wl,-u does. From the ld documentation: "Force symbol to be entered in the output file as an undefined symbol." So -u creates a symbol table in your main executable and adds foo to it.
I.e. the option doesn't affect whether an output file is generated.
You may want to try g++ -c main.cpp -u foo - pass it to the compiler instead of the linker, so the compiler can put foo in the symbol table of main.o. Now the linker will see foo in its inputs, and it will need to act on that.

GNU linker: Adapt to change of name mangling algorithm

I am trying to re-compile an existing C++ application.
Unfortunately, I must rely on a proprietary library I only have a pre-compiled static archive of.
I use g++ version 7.3.0 and ld version 2.30.
Whatever GCC version it was compiled with, it is ancient.
The header file defines the method:
class foo {
    int bar(int & i);
};
As nm lib.a shows, the library archive contains the corresponding exported function:
T bar__4fooRi
nm app.o shows my recent compiler employing a different kind of name mangling:
U _ZN4foo9barERi
Hence the linker cannot resolve the symbols provided by the library.
Is there any option to choose the name mangling algorithm?
Can I introduce a map or define the mangled names explicitly?

@Botje's suggestion led me to write a linker script like this (the spaces in the PROVIDE stanza are significant):
EXTERN(bar__4fooRi);
PROVIDE(_ZN4foo9barERi = bar__4fooRi);
As far as I understood, this will regard bar__4fooRi as an externally defined symbol (which it is). If _ZN4foo9barERi is searched for, but not defined, bar__4fooRi will take its place.
I am calling the linker from the GNU toolchain like this (mind the order: the script needs to come after the dependent object but before the defining library):
g++ -o application application.o script.ld -lfoo
It looks like this could work.
At least in theory.
The linker now considers other parts of the library, which in turn depend on other unresolvable symbols including (but not limited to) __throw, __cp_pop_exception, and __builtin_delete. I have no idea where these functions are defined nowadays. Joxean Koret shows some locations in this blog post based on guesswork (__builtin_new probably is malloc), but I am not that confident.
These findings lead me to the conclusion that the library relies on a different style of exception handling and probably memory management, too.
EDIT: The result may be purely academic due to ABI changes, as pointed out by @eukaryota, but a linker script can indeed be used to "alias" symbols. Here is a complete minimal example:
foo.h:
class Foo {
public:
    int bar(int);
};
foo.cpp:
#include "foo.h"
int Foo::bar(int i) {
    return i+21;
}
main.cpp:
class Foo {
public:
    int baa(int); // use in-place "header" to simulate different name mangling algorithm
};
int main(int, char**) {
    Foo f;
    return f.baa(21);
}
script.ld:
EXTERN(_ZN3Foo3barEi);
PROVIDE(_ZN3Foo3baaEi = _ZN3Foo3barEi); /* declare "alias" */
Build process:
g++ -o libfoo.o -c foo.cpp
ar rvs libfoo.a libfoo.o # simulate building a library
g++ -o main.o -c main.cpp
g++ -o app main.o -L. script.ld -lfoo
app links, can be executed, and returns the expected result.
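For comparison, when the aliased function is defined in your own source rather than in a pre-compiled archive, GCC and Clang offer a source-level counterpart to the PROVIDE line: the alias function attribute. A minimal sketch, assuming an ELF target such as Linux (the attribute is not supported on OS X); bar_impl and baa are illustrative names, declared extern "C" so that the symbol names are not mangled:

```cpp
// The real implementation; extern "C" keeps the symbol name "bar_impl".
extern "C" int bar_impl(int i) {
    return i + 21;
}

// A second symbol, "baa", that refers to the same code as "bar_impl".
// The alias target must be defined in this same translation unit.
extern "C" int baa(int i) __attribute__((alias("bar_impl")));
```

Calling baa(21) runs the body of bar_impl and returns 42. Because the target has to live in the same translation unit, this does not replace the linker script when the definition sits in a library you cannot recompile.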

Why doesn't the linker complain of duplicate symbols?

I have a dummy.hpp
#ifndef DUMMY
#define DUMMY
void dummy();
#endif
and a dummy.cpp
#include <iostream>
void dummy() {
    std::cerr << "dummy" << std::endl;
}
and a main.cpp which uses dummy()
#include "dummy.hpp"
int main(){
    dummy();
    return 0;
}
Then I compiled dummy.cpp to three libraries, libdummy1.a, libdummy2.a, libdummy.so:
g++ -c -fPIC dummy.cpp
ar rvs libdummy1.a dummy.o
ar rvs libdummy2.a dummy.o
g++ -shared -fPIC -o libdummy.so dummy.cpp
When I try to compile main and link the dummy libs
g++ -o main main.cpp -L. -ldummy1 -ldummy2
There is no duplicate symbol error produced by linker. Why does this happen when I link two identical libraries statically?
When I try
g++ -o main main.cpp -L. -ldummy1 -ldummy
There is also no duplicate symbol error. Why?
The loader seems always to choose dynamic libs and not the code compiled in the .o files.
Does it mean the same symbol is always loaded from the .so file if it is both in a .a and a .so file?
Does it mean symbols in the static symbol table in static library never conflict with those in the dynamic symbol table in a .so file?

There's no error in either Scenario 1 (dual static libraries) or Scenario 2 (static and shared libraries) because the linker takes the first object file from a static library, or the first shared library, that it encounters that provides a definition of a symbol it has not yet got a definition for. It simply ignores any later definitions of the same symbol because it already has a good one. In general, the linker only takes what it needs from a library. With static libraries, that's strictly true. With shared libraries, all the symbols in the shared library are available if it satisfied any missing symbol; with some linkers, the symbols of the shared library may be available regardless, but other versions only record the use of a shared library if that shared library provides at least one definition.
It's also why you need to link libraries after object files. You could add dummy.o to your linking commands and as long as that appears before the libraries, there'll be no trouble. Add the dummy.o file after libraries and you'll get doubly-defined symbol errors.
The only time you run into problems with this double definitions is if there's an object file in Library 1 that defines both dummy and extra, and there's an object file in Library 2 that defines both dummy and alternative, and the code needs the definitions of both extra and alternative — then you have duplicate definitions of dummy that cause trouble. Indeed, the object files could be in a single library and would cause trouble.
Consider:
/* file1.h */
extern void dummy();
extern int extra(int);
/* file1.cpp */
#include "file1.h"
#include <iostream>
void dummy() { std::cerr << "dummy() from " << __FILE__ << '\n'; }
int extra(int i) { return i + 37; }
/* file2.h */
extern void dummy();
extern int alternative(int);
/* file2.cpp */
#include "file2.h"
#include <iostream>
void dummy() { std::cerr << "dummy() from " << __FILE__ << '\n'; }
int alternative(int i) { return -i; }
/* main.cpp */
#include "file1.h"
#include "file2.h"
int main()
{
    return extra(alternative(54));
}
You won't be able to link the object files from the three source files shown because of the double-definition of dummy, even though the main code does not call dummy().
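The rules described above (objects are always linked, archive members only when they satisfy an undefined symbol, duplicate definitions are an error) can be sketched as a toy model. This is a simplified illustration, not how any real linker is implemented; Object, Archive and ToyLinker are names made up for the sketch:

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model of an object file: what it defines and what it references.
struct Object {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> needs;
};

using Archive = std::vector<Object>;  // a static library is a bag of objects

class ToyLinker {
public:
    // An object that is linked in contributes all of its definitions.
    void add_object(const Object& obj) {
        for (const auto& sym : obj.defines) {
            auto inserted = defined_.emplace(sym, obj.name);
            if (!inserted.second)
                throw std::runtime_error("duplicate symbol " + sym + " in " +
                                         inserted.first->second + " and " + obj.name);
            undefined_.erase(sym);
        }
        for (const auto& sym : obj.needs)
            if (!defined_.count(sym)) undefined_.insert(sym);
    }

    // An archive member is pulled in only if it defines a symbol that is
    // still undefined; rescan until no further member is needed.
    void add_archive(const Archive& lib) {
        std::set<std::string> taken;
        bool progress = true;
        while (progress) {
            progress = false;
            for (const auto& member : lib) {
                if (taken.count(member.name) || !satisfies_undefined(member)) continue;
                add_object(member);
                taken.insert(member.name);
                progress = true;
            }
        }
    }

    const std::map<std::string, std::string>& defined() const { return defined_; }
    const std::set<std::string>& undefined() const { return undefined_; }

private:
    bool satisfies_undefined(const Object& obj) const {
        for (const auto& sym : obj.defines)
            if (undefined_.count(sym)) return true;
        return false;
    }

    std::map<std::string, std::string> defined_;  // symbol -> providing object
    std::set<std::string> undefined_;
};
```

In this model, adding main.o (which needs dummy) and then two archives that each contain dummy.o succeeds: the second dummy.o is never pulled in. Adding main.o (which needs extra and alternative), then an archive with file1.o, then an archive with file2.o throws, because file2.o is pulled in for alternative and brings a second dummy with it.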
Regarding:
The loader seems always to choose dynamic libs and not compiled in the .o files.
No; the linker always attempts to load object files unconditionally. It scans libraries as it encounters them on the command line, collecting definitions it needs. If the object files precede the libraries, there's not a problem unless two of the object files define the same symbol (does 'one definition rule' ring any bells?). If some of the object files follow libraries, you can run into conflicts if libraries define symbols that the later object files define. Note that when it starts out, the linker is looking for a definition of main. It collects the defined symbols and referenced symbols from each object file it is told about, and keeps adding code (from libraries) until all the referenced symbols are defined.
Does it means the same symbol is always loaded from .so file, if it is both in .a and .so file?
No; it depends which was encountered first. If the .a was encountered first, the .o file is effectively copied from the library into the executable (and the symbol in the shared library is ignored because there's already a definition for it in the executable). If the .so was encountered first, the definition in the .a is ignored because the linker is no longer looking for a definition of that symbol — it's already got one.
Does it mean that symbols in static symbol table in a static library are never in conflict with those in dynamic symbol table in .so file?
You can have conflicts, but the first definition encountered resolves the symbol for the linker. It only runs into conflicts if the code that satisfies the reference causes a conflict by defining other symbols that are needed.
If I link 2 shared libs, can I get conflicts and the link phase failed?
As I noted in a comment:
My immediate reaction is "Yes, you can". It would depend on the content of the two shared libraries, but you could run into problems, I believe. […cogitation…] How would you show this problem? … It's not as easy as it seems at first sight. What is required to demonstrate such a problem? … Or am I overthinking this? … […time to go play with some sample code…]
After some experimentation, my provisional, empirical answer is "No, you can't" (or "No, on at least some systems, you don't run into a conflict"). I'm glad I prevaricated.
Taking the code shown above (2 headers, 3 source files), and running with GCC 5.3.0 on Mac OS X 10.10.5 (Yosemite), I can run:
$ g++ -O -c main.cpp
$ g++ -O -c file1.cpp
$ g++ -O -c file2.cpp
$ g++ -shared -o libfile2.so file2.o
$ g++ -shared -o libfile1.so file1.o
$ g++ -o test2 main.o -L. -lfile1 -lfile2
$ ./test2
$ echo $?
239
$ otool -L test2
test2:
libfile2.so (compatibility version 0.0.0, current version 0.0.0)
libfile1.so (compatibility version 0.0.0, current version 0.0.0)
/opt/gcc/v5.3.0/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.21.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1213.0.0)
/opt/gcc/v5.3.0/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
$
It is unconventional to use .so as the extension on Mac OS X (it's usually .dylib), but it seems to work.
Then I revised the code in the .cpp files so that extra() calls dummy() before the return, and so does alternative() and main(). After recompiling and rebuilding the shared libraries, I ran the programs. The first line of output is from the dummy() called by main(). Then you get the other two lines produced by alternative() and extra() in that order because the calling sequence for return extra(alternative(54)); demands that.
$ g++ -o test2 main.o -L. -lfile1 -lfile2
$ ./test2
dummy() from file1.cpp
dummy() from file2.cpp
dummy() from file1.cpp
$ g++ -o test2 main.o -L. -lfile2 -lfile1
$ ./test2
dummy() from file2.cpp
dummy() from file2.cpp
dummy() from file1.cpp
$
Note that the function called by main() is the first one that appears in the libraries it is linked with. But (on Mac OS X 10.10.5 at least) the linker does not run into a conflict. Note, though, that the code in each shared object calls 'its own' version of dummy() — there is disagreement between the two shared libraries about which function is dummy(). (It would be interesting to have the dummy() function in separate object files in the shared libraries; then which version of dummy() gets called?) But in the extremely simple scenario shown, the main() function manages to call just one of the dummy() functions. (Note that I'd not be surprised to find differences between platforms for this behaviour. I've identified where I tested the code. Please let me know if you find different behaviour on some platform.)

Link dynamic shared library in Linux - Undefined reference to function

I know there are many questions related to shared libraries on Linux but maybe because I'm tired of having a hard day trying to create a simple dynamic library on Linux (on Windows it would have taken less than 10 minutes) I can't find what happens in this case.
So, I am trying to create a library to be linked at build-time and used at run-time (not a static library, not a library to be embedded into the executable, in other words). For now it contains a simple function. These are my files:
1.
// gugulibrary.cpp
// This is where my function is doing its job
#include "gugulibrary.h"
namespace GuGu {
    void SayHello() {
        puts("Hello!");
    }
}
2.
// gugulibrary.h
// This is where I declare my shared functions
#include <stdio.h>
namespace Gugu {
    void SayHello();
}
3.
// guguapp.cpp
// This is the executable using the library
#include "gugulibrary.h"
int main() {
    GuGu::SayHello();
    return 0;
}
This is how I try to build my project (and I think this is what is wrong):
gcc -Wall -s -O2 -fPIC -c gugulibrary.cpp -o gugulibrary.o
ld -shared -o bin/libGugu.so gugulibrary.o
gcc -Wall -s -O2 guguapp.cpp -o bin/GuGu -ldl
export LD_LIBRARY_PATH=bin
This is saved as a .sh file which I click and execute in a terminal. The error I get when trying to link the library is this:
/tmp/ccG05CQD.o: In function `main':
guguapp.cpp:(.text.startup+0x7): undefined reference to `SayHello'
collect2: ld returned 1 exit status
And this is where I am lost. I want the library to sit in the same folder as the executable for now and maybe I need some symbols/definitions file or something, which I don't know how to create.
Thanks for your help!

In your C++ file, GuGu::SayHello is declared as a C++ symbol. In your header, you are wrapping it in an extern "C" block. This is actually undefined, as you aren't allowed to use C++ syntax (namespace) in that context. But my guess is that what the compiler is doing is ignoring the namespace and generating a C symbol name of "SayHello". Obviously such a function was never defined by your library. Take out the extern "C" bits, because your API as defined cannot be used from C anyway.

Your spelling of GuGu is inconsistent: there are also some Gugu's running around. Make the spelling consistent and it works (at least on my computer, which now has some Gugu's of its own).
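A minimal single-file sketch of the consistent version, with the header and the source file shown back to back. The function run_app is a made-up stand-in for main() in guguapp.cpp so the snippet can be called from a test:

```cpp
#include <cstdio>

// gugulibrary.h: the declaration, with the namespace spelled "GuGu"
namespace GuGu {
void SayHello();
}

// gugulibrary.cpp: the definition, same spelling, so it defines the
// function declared above instead of a new one in another namespace
namespace GuGu {
void SayHello() {
    std::puts("Hello!");
}
}

// guguapp.cpp: hypothetical stand-in for main()
int run_app() {
    GuGu::SayHello();
    return 0;
}
```

When the library is built as a separate shared object, the final link command also has to name it (for example -Lbin -lGugu for bin/libGugu.so); the command in the question only passes -ldl.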
