I want to programmatically mangle the name of a C++ function or variable - to get the symbol name which would appear in a compiled object file. I'm using Linux and GCC.
Why is this not trivial? Can't you just use typeid(foo).name()? No, because that doesn't do what you want. Consider the following program:
#include <iostream>
#include <typeinfo>
extern int foo(int x) { return 0; }
extern double bar(int x) { return 1.0; }
int main()
{
std::cout << typeid(foo).name() << std::endl;
std::cout << typeid(bar).name() << std::endl;
}
Let's see what that gives us:
$ g++ -o a.o -O0 -c a.cpp
$ objdump -t a.o | egrep "(bar|foo)"
00000000000000dd l F .text 0000000000000015 _GLOBAL__sub_I__Z3fooi
0000000000000000 g F .text 000000000000000e _Z3fooi
000000000000000e g F .text 000000000000001b _Z3bari
$ ./a
FiiE
FdiE
Not at all the same thing.
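For the simplest case, a free function in the global namespace whose parameters are all builtin types, the mangled name can be assembled by hand. The sketch below is only an illustration under that assumption: mangle_simple and its type table are hypothetical names, not part of any library, and it does not handle namespaces, classes, templates, pointers, cv-qualifiers or substitutions.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: mangles a free function in the global namespace
// whose parameters are all builtin types (Itanium C++ ABI, as used by GCC).
std::string mangle_simple(const std::string& name,
                          const std::vector<std::string>& params) {
    // One-letter codes for a few builtin types.
    static const std::map<std::string, std::string> codes = {
        {"void", "v"}, {"bool", "b"}, {"char", "c"}, {"int", "i"},
        {"unsigned int", "j"}, {"long", "l"}, {"unsigned long", "m"},
        {"float", "f"}, {"double", "d"}};

    // "_Z" prefix, then the length-prefixed source name.
    std::string out = "_Z" + std::to_string(name.size()) + name;

    // An empty parameter list is encoded as a single "v".
    if (params.empty()) return out + "v";

    for (const std::string& p : params) {
        auto it = codes.find(p);
        if (it == codes.end())
            throw std::invalid_argument("unsupported parameter type: " + p);
        out += it->second;
    }
    return out;
}
```

Called as mangle_simple("foo", {"int"}), this yields _Z3fooi, the symbol objdump reported. The return type plays no part in the name of an ordinary function, which is why foo and bar differ only in the source name.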
Notes:
Were you thinking of namespace abi? So was I. The source doesn't seem to do mangling, only demangling.
Solutions for the same problem on other platform+compiler combinations are also interesting, and if you have them I'll expand the question scope.
Motivation: I want to dlsym() a non-extern-C function or a variable.
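As the note about namespace abi says, <cxxabi.h> only goes in the demangling direction. Here is a minimal sketch of that direction, which can at least be used to check a guessed mangled name before handing it to dlsym(). The wrapper name demangle is hypothetical; abi::__cxa_demangle is the real entry point.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Hypothetical wrapper: demangles an Itanium-ABI symbol name.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with malloc.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);  // the caller owns the malloc'ed buffer
    return result;
}
```

For example, demangle("_Z3fooi") gives foo(int), so a candidate name can be verified even though it cannot be generated this way.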
After reading Visibility in the GNU wiki, the mechanism itself is clear to me.
Taking this example from C++ Tutorials
// classes example
#include <iostream>
using namespace std;
class Rectangle {
int width, height;
public:
void set_values (int,int);
int area() {return width*height;}
};
void Rectangle::set_values (int x, int y) {
width = x;
height = y;
}
Is it possible to make area() public and set_values(int,int) local as shown in the first link without altering the code?
I wrote my makefile to get the .so
someproj.so : someproj.cpp
g++ --std=c++11 -O3 -fPIC -shared someproj.cpp -o someproj.so
Modified to make all symbols hidden by adding -fvisibility=hidden
someproj.so : someproj.cpp
g++ --std=c++11 -O3 -fvisibility=hidden -fPIC -shared someproj.cpp -o someproj.so
Is it possible to customize which functions are exposed by modifying the compilation command above?
Currently using 4.7.2 version of gcc
Is it possible to customize which functions are exposed by modifying the compilation command above?
No. Compilation option -fvisibility=[default|internal|hidden|protected]
(and note it is not a linkage option) makes the compiler attribute the specified dynamic visibility type to all global symbols
generated in the compilation unit except those that are specifically excluded by having a countervailing __attribute__((visibility(....)))
applied in the source code. Which makes the answer to your other question:
Is it possible to make area() public and set_values(int,int) local as shown in the first link without altering the code?
also No.
How would you change the source code to make Rectangle::area() dynamically
visible while all other global symbols are hidden for dynamic linkage by -fvisibility=hidden?
Here is a walk-through:
Let's start with:
rectangle.cpp (1)
class Rectangle {
int width, height;
public:
void set_values (int,int);
int area() {return width*height;}
};
void Rectangle::set_values (int x, int y) {
width = x;
height = y;
}
and simply compile it to a PIC rectangle.o so:
$ g++ -Wall -c -fPIC rectangle.cpp
Then check the global symbol table:
$ nm -C rectangle.o
0000000000000000 T Rectangle::set_values(int, int)
Note that Rectangle::area isn't there. It's not available for
linkage at all, so the question of its dynamic visibility just does not arise.
That is because it is defined inline in the class definition and never called
in the compilation unit, so gcc need not even compile its definition. It vanishes
from the object file.
Rectangle::set_values, on the other hand, is not defined inline, so the compiler
emits a global symbol and definition.
To make Rectangle::area eligible for some visibility type, we first need to make
it a global symbol by not defining it inline:
rectangle.cpp (2)
class Rectangle {
int width, height;
public:
void set_values (int,int);
int area();
};
int Rectangle::area() {return width*height;}
void Rectangle::set_values (int x, int y) {
width = x;
height = y;
}
Recompile and again check the global symbol table:
$ g++ -Wall -c -fPIC rectangle.cpp
$ nm -C rectangle.o
000000000000001a T Rectangle::set_values(int, int)
0000000000000000 T Rectangle::area()
Good. Now a global definition of Rectangle::area appears.
Next let's make a shared library librectangle.so from rectangle.o:
$ g++ -o librectangle.so --shared rectangle.o
Here are the Rectangle::* symbols in its global symbol table:
$ nm -C librectangle.so | grep 'Rectangle::'
00000000000005d4 T Rectangle::set_values(int, int)
00000000000005ba T Rectangle::area()
And here are the Rectangle::* symbols in its dynamic symbol table:
$ nm -CD librectangle.so | grep 'Rectangle::'
00000000000005d4 T Rectangle::set_values(int, int)
00000000000005ba T Rectangle::area()
They're the same.
Now let's hide those symbols for dynamic linkage. We need to recompile rectangle.cpp
then relink the shared library:
$ g++ -Wall -c -fPIC -fvisibility=hidden rectangle.cpp
$ g++ -o librectangle.so --shared rectangle.o
Here again are the Rectangle::* symbols now in the global symbol table:
$ nm -C librectangle.so | grep 'Rectangle::'
0000000000000574 t Rectangle::set_values(int, int)
000000000000055a t Rectangle::area()
The same two symbols are still there, though nm now marks them t rather than T.
And here are the Rectangle::* symbols now in the dynamic symbol table:
$ nm -CD librectangle.so | grep 'Rectangle::'; echo Done
Done
Now there are none, thanks to -fvisibility=hidden.
Finally, let's make just Rectangle::area dynamically visible, keeping all
the other global symbols dynamically hidden. We need to change the source code
again:
rectangle.cpp (3)
class Rectangle {
int width, height;
public:
void set_values (int,int);
__attribute__((visibility("default"))) int area();
};
int Rectangle::area() {return width*height;}
void Rectangle::set_values (int x, int y) {
width = x;
height = y;
}
Then recompile and relink:
$ g++ -Wall -c -fPIC -fvisibility=hidden rectangle.cpp
$ g++ -o librectangle.so --shared rectangle.o
The global symbol table still shows:
$ nm -C librectangle.so | grep 'Rectangle::'
00000000000005a4 t Rectangle::set_values(int, int)
000000000000058a T Rectangle::area()
And the dynamic symbol table only shows:
$ nm -CD librectangle.so | grep 'Rectangle::'
000000000000058a T Rectangle::area()
Rectangle::area is now the only symbol that the shared library exposes for
dynamic linkage.
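A common way to keep such annotations readable is to wrap the attributes in macros. The following is a sketch of rectangle.cpp (3) in that style; RECT_EXPORT and RECT_LOCAL are hypothetical names chosen for this example. Because both members are annotated explicitly, I would expect their visibility not to depend on whether -fvisibility=hidden is passed, but check with nm -CD as above.

```cpp
// Hypothetical macro names; the attributes are the GCC ones used above.
#define RECT_EXPORT __attribute__((visibility("default")))
#define RECT_LOCAL  __attribute__((visibility("hidden")))

class Rectangle {
    int width = 0, height = 0;
public:
    RECT_LOCAL void set_values(int, int);  // hidden for dynamic linkage
    RECT_EXPORT int area();                // exposed in the dynamic symbol table
};

int Rectangle::area() { return width * height; }

void Rectangle::set_values(int x, int y) {
    width = x;
    height = y;
}
```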
And before you go...
One thing more about:
Is it possible to make area() public and set_values(int,int) local as shown in the first link without altering the code?
Making a symbol hidden for dynamic linkage doesn't make it local. Dynamic visibility (default|internal|hidden|protected)
is an attribute of global symbols only. For linkage purposes, local symbols don't exist. The only ways to
make a symbol local that would otherwise be global are:
- In C or C++ source, qualify its definition with the static keyword
- In C++ source, enclose its definition in an anonymous namespace
Then the symbol does not appear in the global, or dynamic, symbol tables.
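The two options can be sketched like this. The function names are made up for the example; only exported_entry has external linkage, so it is the only one I would expect nm to list as a global symbol (the other two may show up as local symbols or be inlined away entirely).

```cpp
namespace {
// Anonymous namespace: internal linkage, so no global symbol is emitted.
int twice(int x) { return 2 * x; }
}  // namespace

// static at namespace scope: also internal linkage.
static int plus_one(int x) { return x + 1; }

// Ordinary external linkage: the only function here that gets a global symbol.
int exported_entry(int x) { return plus_one(twice(x)); }
```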
I wanted to introduce a weak symbol into my code, however, I am unable to comprehend its behavior when *.a files are used.
This is my minimal example:
file a.h:
void foo() __attribute__((weak));
file a.c:
#include "a.h"
#include <stdio.h>
void foo() { printf("%s\n", __FILE__); }
file b.c:
#include <stdio.h>
void foo() { printf("%s\n", __FILE__); }
file main.cpp:
#include "a.h"
#include <stdio.h>
int main() { if (foo) foo(); else printf("no foo\n"); }
Now, depending on whether I use *.o files (gcc -c a.c and gcc -c b.c) or *.a files (ar cr a.a a.o and ar cr b.a b.o), the output is different:
1) g++ main.cpp a.o b.o prints b.c
2) g++ main.cpp b.o a.o prints b.c
3) g++ main.cpp a.a b.a prints no foo
4) g++ main.cpp b.a a.a prints no foo
1), 2) work just fine but the output for 3), 4) seems to be a little unexpected.
I was desperately trying to make this example work with archives so I made few changes:
file a.h:
void foo();
file a.c:
#include "a.h"
#include <stdio.h>
void __attribute__((weak)) foo() { printf("%s\n", __FILE__); }
After this modification:
1) g++ main.cpp a.a b.a prints a.c
2) g++ main.cpp b.a a.a prints b.c
So it works a bit better. Running nm a.a shows W _Z3foov, so there is no violation of the ODR. However, I don't know if this is a correct usage of the weak attribute. According to the gcc documentation:
The weak attribute causes the declaration to be emitted as a weak symbol rather than a global. This is primarily useful in defining library functions which can be overridden in user code, though it can also be used with non-function declarations. Weak symbols are supported for ELF targets, and also for a.out targets when using the GNU assembler and linker.
Yet I use weak attribute on the function definition not the declaration.
So the question is why weak doesn't work with *.a files? Is usage of weak attribute on a definition instead of a declaration correct?
UPDATE
It has dawned on me that the weak attribute on the foo() definition had no impact on symbol resolution. Without the attribute, the final binary prints the same:
1) g++ main.cpp a.a b.a prints a.c
2) g++ main.cpp b.a a.a prints b.c
So simply the first definition of the symbol is used, and this is consistent with default gcc behaviour. Even though nm a.a shows that a weak symbol was emitted, it doesn't seem to affect static linking.
Is it possible to use weak attribute with static linking at all?
DESCRIPTION OF THE PROBLEM I WANT TO SOLVE
I have a library that is used by >20 clients, let's call it library A. I also provide a library B which contains testing utils for A. Somehow I need to know that library A is used in testing mode, so the simplest solution seems to be replacing a symbol during linking with B (because clients are already linking with B).
I know there are cleaner solutions to this problem, however I absolutely can't impact clients' code or their build scripts (adding a parameter that would indicate testing for A, or some DEFINE for compilation, is not an option).
To explain what's going on here, let's talk first about your original source files, with
a.h (1):
void foo() __attribute__((weak));
and:
a.c (1):
#include "a.h"
#include <stdio.h>
void foo() { printf("%s\n", __FILE__); }
The mixture of .c and .cpp files in your sample code is irrelevant to the
issues, and all the code is C, so we'll say that main.cpp is main.c and
do all compiling and linking with gcc:
$ gcc -Wall -c main.c a.c b.c
$ ar rcs a.a a.o
$ ar rcs b.a b.o
First let's review the differences between a weakly declared symbol, like
your:
void foo() __attribute__((weak));
and a strongly declared symbol, like
void foo();
which is the default:
When a weak reference to foo (i.e. a reference to weakly declared foo) is linked in a program, the
linker need not find a definition of foo anywhere in the linkage: it may remain
undefined. If a strong reference to foo is linked in a program,
the linker needs to find a definition of foo.
A linkage may contain at most one strong definition of foo (i.e. a definition
of foo that declares it strongly). Otherwise a multiple-definition error results.
But it may contain multiple weak definitions of foo without error.
If a linkage contains one or more weak definitions of foo and also a strong
definition, then the linker chooses the strong definition and ignores the weak
ones.
If a linkage contains just one weak definition of foo and no strong
definition, inevitably the linker uses the one weak definition.
If a linkage contains multiple weak definitions of foo and no strong
definition, then the linker chooses one of the weak definitions arbitrarily.
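The first of these rules, that a weak reference may stay undefined, is what makes the if (foo) test in main meaningful. Here is a minimal sketch, assuming an ELF target with GCC or Clang; optional_test_hook is a hypothetical symbol that nothing in this example defines.

```cpp
// Hypothetical hook, declared weak: the program links whether or not
// any input defines it. Assumes an ELF target with GCC or Clang.
extern "C" void optional_test_hook() __attribute__((weak));

// True only if some linked object supplied a definition of the hook.
bool hook_is_linked() {
    return optional_test_hook != nullptr;
}

// Calls the hook when present; returns whether it was called.
bool call_hook_if_linked() {
    if (!hook_is_linked()) return false;
    optional_test_hook();
    return true;
}
```

Linked on its own, hook_is_linked() returns false. Linking in any object file that defines optional_test_hook flips it to true without a source change.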
Next, let's review the differences between inputting an object file in a linkage
and inputting a static library.
A static library is merely an ar archive of object files that we may offer to
the linker from which to select the ones it needs to carry on the linkage.
When an object file is input to a linkage, the linker unconditionally links it
into the output file.
When a static library is input to a linkage, the linker examines the archive to
find any object files within it that provide definitions it needs for unresolved symbol references
that have accrued from input files already linked. If it finds any such object files
in the archive, it extracts them and links them into the output file, exactly as
if they were individually named input files and the static library was not mentioned at all.
With these observations in mind, consider the compile-and-link command:
gcc main.c a.o b.o
Behind the scenes gcc breaks it down, as it must, into a compile-step and link
step, just as if you had run:
gcc -c main.c # compile
gcc main.o a.o b.o # link
All three object files are linked unconditionally into the (default) program ./a.out. a.o contains a
weak definition of foo, as we can see:
$ nm --defined a.o
0000000000000000 W foo
Whereas b.o contains a strong definition:
$ nm --defined b.o
0000000000000000 T foo
The linker will find both definitions and choose the strong one from b.o, as we can
also see:
$ gcc main.o a.o b.o -Wl,-trace-symbol=foo
main.o: reference to foo
a.o: definition of foo
b.o: definition of foo
$ ./a.out
b.c
Reversing the linkage order of a.o and b.o will make no difference: there's
still exactly one strong definition of foo, the one in b.o.
By contrast consider the compile-and-link command:
gcc main.c a.a b.a
which breaks down into:
gcc -c main.c # compile
gcc main.o a.a b.a # link
Here, only main.o is linked unconditionally. That puts an undefined weak reference
to foo into the linkage:
$ nm --undefined main.o
w foo
U _GLOBAL_OFFSET_TABLE_
U puts
That weak reference to foo does not need a definition. So the linker will
not attempt to find a definition that resolves it in any of the object files in either a.a or b.a and
will leave it undefined in the program, as we can see:
$ gcc main.o a.a b.a -Wl,-trace-symbol=foo
main.o: reference to foo
$ nm --undefined a.out
w __cxa_finalize@@GLIBC_2.2.5
w foo
w __gmon_start__
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
U __libc_start_main@@GLIBC_2.2.5
U puts@@GLIBC_2.2.5
Hence:
$ ./a.out
no foo
Again, it doesn't matter if you reverse the linkage order of a.a and b.a,
but this time it is because neither of them contributes anything to the linkage.
Let's turn now to the different behavior you discovered by changing a.h and a.c
to:
a.h (2):
void foo();
a.c (2):
#include "a.h"
#include <stdio.h>
void __attribute__((weak)) foo() { printf("%s\n", __FILE__); }
Once again:
$ gcc -Wall -c main.c a.c b.c
main.c: In function ‘main’:
main.c:4:18: warning: the address of ‘foo’ will always evaluate as ‘true’ [-Waddress]
int main() { if (foo) foo(); else printf("no foo\n"); }
See that warning? main.o now contains a strongly declared reference to foo:
$ nm --undefined main.o
U foo
U _GLOBAL_OFFSET_TABLE_
so the code (when linked) must have a non-null address for foo. Proceeding:
$ ar rcs a.a a.o
$ ar rcs b.a b.o
Then try the linkage:
$ gcc main.o a.o b.o
$ ./a.out
b.c
And with the object files reversed:
$ gcc main.o b.o a.o
$ ./a.out
b.c
As before, the order makes no difference. All the object files are linked. b.o provides
a strong definition of foo, a.o provides a weak one, so b.o wins.
Next try the linkage:
$ gcc main.o a.a b.a
$ ./a.out
a.c
And with the order of the libraries reversed:
$ gcc main.o b.a a.a
$ ./a.out
b.c
That does make a difference. Why? Let's redo the linkages with diagnostics:
$ gcc main.o a.a b.a -Wl,-trace,-trace-symbol=foo
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
main.o
(a.a)a.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
main.o: reference to foo
a.a(a.o): definition of foo
Ignoring the default libraries, the only object files of ours that got
linked were:
main.o
(a.a)a.o
And the definition of foo was taken from the archive member a.o of a.a:
a.a(a.o): definition of foo
Reversing the library order:
$ gcc main.o b.a a.a -Wl,-trace,-trace-symbol=foo
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
main.o
(b.a)b.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
main.o: reference to foo
b.a(b.o): definition of foo
This time the object files linked were:
main.o
(b.a)b.o
And the definition of foo was taken from b.o in b.a:
b.a(b.o): definition of foo
In the first linkage, the linker had an unresolved strong reference to
foo for which it needed a definition when it reached a.a. So it
looked in the archive for an object file that provides a definition,
and found a.o. That definition was a weak one, but that didn't matter. No
strong definition had been seen. a.o was extracted from a.a and linked,
and the reference to foo was thus resolved. Next b.a was reached, where
a strong definition of foo would have been found in b.o, if the linker still needed one
and looked for it. But it didn't need one any more and didn't look. The linkage:
gcc main.o a.a b.a
is exactly the same as:
gcc main.o a.o
And likewise the linkage:
$ gcc main.o b.a a.a
is exactly the same as:
$ gcc main.o b.o
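The rules at work in all four linkages can be condensed into a toy model. This is only a sketch of the behaviour described above, not how GNU ld is implemented: Object, Input and simulate_link are hypothetical names, and a second strong definition is silently ignored rather than reported as an error.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of one object file: which symbols it defines and references.
struct Object {
    std::string name;
    std::map<std::string, bool> defs;  // symbol -> true if the definition is weak
    std::map<std::string, bool> refs;  // symbol -> true if the reference is weak
};

// One command-line input: a single object file or an archive of members.
struct Input {
    bool is_archive;
    std::vector<Object> members;
};

Input object_file(const Object& o) { return Input{false, {o}}; }
Input archive(const std::vector<Object>& members) { return Input{true, members}; }

// Returns, for each defined symbol, the name of the object whose definition wins.
// Simplification: a second strong definition is ignored instead of reported
// as a multiple-definition error.
std::map<std::string, std::string> simulate_link(const std::vector<Input>& inputs) {
    std::map<std::string, std::string> winner;  // symbol -> defining object
    std::map<std::string, bool> winner_is_weak;
    std::set<std::string> unresolved;           // strong references still undefined

    auto load = [&](const Object& o) {
        for (const auto& d : o.defs) {
            const bool first = winner.count(d.first) == 0;
            // First definition seen wins, unless a strong one replaces a weak one.
            if (first || (winner_is_weak[d.first] && !d.second)) {
                winner[d.first] = o.name;
                winner_is_weak[d.first] = d.second;
            }
            unresolved.erase(d.first);
        }
        for (const auto& r : o.refs)
            if (!r.second && winner.count(r.first) == 0) unresolved.insert(r.first);
    };

    for (const Input& in : inputs) {
        if (!in.is_archive) {  // object files are always linked
            for (const Object& o : in.members) load(o);
            continue;
        }
        // Archive: extract a member only if it defines an unresolved strong
        // reference; repeat, since an extracted member may add new references.
        std::set<std::string> extracted;
        for (bool progress = true; progress;) {
            progress = false;
            for (const Object& o : in.members) {
                if (extracted.count(o.name)) continue;
                bool needed = false;
                for (const auto& d : o.defs)
                    if (unresolved.count(d.first)) needed = true;
                if (needed) {
                    load(o);
                    extracted.insert(o.name);
                    progress = true;
                }
            }
        }
    }
    return winner;
}
```

Fed a main.o with a strong reference to foo, a weak definition in a.o and a strong one in b.o, the model picks b.o when both are given as object files, and whichever archive comes first when they are given as a.a and b.a. With a weak reference in main.o and only archives as inputs, foo stays undefined.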
Your real problem...
... emerges in one of your comments to the post:
I want to override [the] original function implementation when linking with a testing framework.
You want to link a program inputting some static library lib1.a
which has some member file1.o that defines a symbol foo, and you want to knock out
that definition of foo and link a different one that is defined in some other object
file file2.o.
__attribute__((weak)) isn't applicable to that problem. The solution is more
elementary. You just make sure to input file2.o to the linkage before you input
lib1.a (and before any other input that provides a definition of foo).
Then the linker will resolve references to foo with the definition provided in file2.o and will not try to find any other
definition when it reaches lib1.a. The linker will not consume lib1.a(file1.o) at all. It might as well not exist.
And what if you have put file2.o in another static library lib2.a? Then inputting
lib2.a before lib1.a will do the job of linking lib2.a(file2.o) before
lib1.a is reached and resolving foo to the definition in file2.o.
Likewise, of course, every definition provided by members of lib2.a will be linked in
preference to a definition of the same symbol provided in lib1.a. If that's not what
you want, then don't link lib2.a: link file2.o itself.
Finally
Is it possible to use [the] weak attribute with static linking at all?
Certainly. Here is a first-principles use-case:
foo.h (1)
#ifndef FOO_H
#define FOO_H
int __attribute__((weak)) foo(int i)
{
return i != 0;
}
#endif
aa.c
#include "foo.h"
int a(void)
{
return foo(0);
}
bb.c
#include "foo.h"
int b(void)
{
return foo(42);
}
prog.c
#include <stdio.h>
extern int a(void);
extern int b(void);
int main(void)
{
puts(a() ? "true" : "false");
puts(b() ? "true" : "false");
return 0;
}
Compile all the source files, requesting a separate ELF section for each function:
$ gcc -Wall -ffunction-sections -c prog.c aa.c bb.c
Note that the weak definition of foo is compiled via foo.h into both
aa.o and bb.o, as we can see:
$ nm --defined aa.o
0000000000000000 T a
0000000000000000 W foo
$ nm --defined bb.o
0000000000000000 T b
0000000000000000 W foo
Now link a program from all the object files, requesting the linker to
discard unused sections (and give us the map-file, and some diagnostics):
$ gcc prog.o aa.o bb.o -Wl,--gc-sections,-Map=mapfile,-trace,-trace-symbol=foo
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
prog.o
aa.o
bb.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
aa.o: definition of foo
This linkage is no different from:
$ ar rcs libaabb.a aa.o bb.o
$ gcc prog.o libaabb.a
Despite the fact that both aa.o and bb.o were loaded, and each contains
a definition of foo, no multiple-definition error results, because each definition
is weak. aa.o was loaded before bb.o and the definition of foo was linked from aa.o.
So what happened to the definition of foo in bb.o? The mapfile shows us:
mapfile (1)
...
...
Discarded input sections
...
...
.text.foo 0x0000000000000000 0x13 bb.o
...
...
The linker discarded the function section that contained the definition
in bb.o
Let's reverse the linkage order of aa.o and bb.o:
$ gcc prog.o bb.o aa.o -Wl,--gc-sections,-Map=mapfile,-trace,-trace-symbol=foo
...
prog.o
bb.o
aa.o
...
bb.o: definition of foo
And now the opposite thing happens. bb.o is loaded before aa.o. The
definition of foo is linked from bb.o and:
mapfile (2)
...
...
Discarded input sections
...
...
.text.foo 0x0000000000000000 0x13 aa.o
...
...
the definition from aa.o is chucked away.
There you see how the linker arbitrarily chooses one of multiple
weak definitions of a symbol, in the absence of a strong definition. It simply
picks the first one you give it and ignores the rest.
What we've just done here is effectively what the GCC C++ compiler does for us when we
define a global inline function. Rewrite:
foo.h (2)
#ifndef FOO_H
#define FOO_H
inline int foo(int i)
{
return i != 0;
}
#endif
Rename our source files *.c -> *.cpp; compile and link:
$ g++ -Wall -c prog.cpp aa.cpp bb.cpp
Now there is a weak definition of foo (C++ mangled) in each of aa.o and bb.o:
$ nm --defined aa.o bb.o
aa.o:
0000000000000000 T _Z1av
0000000000000000 W _Z3fooi
bb.o:
0000000000000000 T _Z1bv
0000000000000000 W _Z3fooi
The linkage uses the first definition it finds:
$ g++ prog.o aa.o bb.o -Wl,-Map=mapfile,-trace,-trace-symbol=_Z3fooi
...
prog.o
aa.o
bb.o
...
aa.o: definition of _Z3fooi
bb.o: reference to _Z3fooi
and throws away the other one:
mapfile (3)
...
...
Discarded input sections
...
...
.text._Z3fooi 0x0000000000000000 0x13 bb.o
...
...
And as you may know, every instantiation of a C++ function template in
global scope (or instantiation of a class template member function) is
treated by GCC like an inline global function. Rewrite again:
foo.h (3)
#ifndef FOO_H
#define FOO_H
template<typename T>
T foo(T i)
{
return i != 0;
}
#endif
Recompile:
$ g++ -Wall -c prog.cpp aa.cpp bb.cpp
Again:
$ nm --defined aa.o bb.o
aa.o:
0000000000000000 T _Z1av
0000000000000000 W _Z3fooIiET_S0_
bb.o:
0000000000000000 T _Z1bv
0000000000000000 W _Z3fooIiET_S0_
each of aa.o and bb.o has a weak definition of:
$ c++filt _Z3fooIiET_S0_
int foo<int>(int)
and the linkage behaviour is now familiar. One way:
$ g++ prog.o aa.o bb.o -Wl,-Map=mapfile,-trace,-trace-symbol=_Z3fooIiET_S0_
...
prog.o
aa.o
bb.o
...
aa.o: definition of _Z3fooIiET_S0_
bb.o: reference to _Z3fooIiET_S0_
and the other way:
$ g++ prog.o bb.o aa.o -Wl,-Map=mapfile,-trace,-trace-symbol=_Z3fooIiET_S0_
...
prog.o
bb.o
aa.o
...
bb.o: definition of _Z3fooIiET_S0_
aa.o: reference to _Z3fooIiET_S0_
Our program's behavior is unchanged by the rewrites:
$ ./a.out
false
true
So the application of the weak attribute to symbols in the linkage of ELF objects -
whether static or dynamic - enables the GCC implementation of C++ templates
for the GNU linker. You could fairly say it enables the GCC implementation of modern C++.
The best explanation I have found is this one:
The linker will only search through libraries to resolve a reference if it cannot resolve that reference after searching all input objects. If required, the libraries are searched from left to right according to their position on the linker command line. Objects within the library will be searched by the order in which they were archived. As soon as armlink finds a symbol match for the reference, the searching is finished, even if it matches a weak definition.
The ELF ABI section 4.6.1.2 says:
"A weak definition does not change the rules by which object files are selected from libraries. However, if a link set contains both a weak definition and a non-weak definition, the non-weak definition will always be used."
The "link set" is the set of objects that have been loaded by the linker. It does not include objects from libraries that are not required.
Therefore archiving two objects where one contains the weak definition of a given symbol and the other contains the non-weak definition of that symbol, into a library or separate libraries, is not recommended.
Observe the following. I basically renamed the files: mv a.c definition.c, mv b.c noweak.c and mv second_a.c declaration.c.
> for i in Makefile *.c; do echo "cat $i <<EOF"; cat $i; echo EOF; done
cat Makefile <<EOF
tgt=
tgt+=only_weak_1.out only_weak_2.out
tgt+=definition.out declaration.out noweak.out
tgt+=definition_static.out declaration_static.out noweak_static.out
tgt+=1.out 2.out 3.out 4.out
tgt+=5.out 6.out 7.out 8.out
tgt+=10.out 11.out 12.out
tgt+=13.out
tgt+=14.out
only_weak_1_obj= definition.o declaration.o
only_weak_2_obj= declaration.o definition.o
definition_obj= definition.o
declaration_obj= declaration.o
noweak_obj= noweak.o
definition_static_obj= definition.a
declaration_static_obj= declaration.a
noweak_static_obj= noweak.a
1_obj= declaration.o noweak.o
2_obj= noweak.o declaration.o
3_obj= declaration.a noweak.a
4_obj= noweak.a declaration.a
5_obj= definition.o noweak.o
6_obj= noweak.o definition.o
7_obj= definition.a noweak.a
8_obj= noweak.a definition.a
10_obj= noweak.a definition.a declaration.a
11_obj= definition.a declaration.a noweak.a
12_obj= declaration.a definition.a noweak.a
13_obj= all.a
14_obj= all.o
.PRECIOUS: % %.o %.c %.a
def: run
.PHONY: run
run: $(tgt)
{ $(foreach a,$^,echo "$($(a:.out=)_obj)#->#$(a)#:#$$(./$(a))";) } | { echo; column -t -s'#' -N 'objects, ,executable, ,output' -o' '; echo; }
.SECONDEXPANSION:
%.out: main.o $$(%_obj)
$(CC) -o $@ $^
%.o: %.c
$(CC) -c -o $@ $^
%.a: %.o
ar cr $@ $^
all.a: declaration.o definition.o noweak.o
ar cr $@ $^
all.o: declaration.o definition.o noweak.o
$(LD) -i -o $@ $^
clean:
rm -fv *.o *.a *.out
EOF
cat declaration.c <<EOF
#include <stdio.h>
__attribute__((__weak__)) void foo();
void foo() { printf("%s\n", __FILE__); }
EOF
cat definition.c <<EOF
#include <stdio.h>
__attribute__((__weak__)) void foo() { printf("%s\n", __FILE__); }
EOF
cat main.c <<EOF
#include <stdio.h>
void foo();
int main() {
if (foo) foo(); else printf("no foo\n");
return 0;
}
EOF
cat noweak.c <<EOF
#include <stdio.h>
void foo() { printf("%s\n", __FILE__); }
EOF
> make
cc -c -o definition.o definition.c
cc -c -o declaration.o declaration.c
cc -c -o main.o main.c
cc -o only_weak_1.out main.o definition.o declaration.o
cc -o only_weak_2.out main.o declaration.o definition.o
cc -o definition.out main.o definition.o
cc -o declaration.out main.o declaration.o
cc -c -o noweak.o noweak.c
cc -o noweak.out main.o noweak.o
ar cr definition.a definition.o
cc -o definition_static.out main.o definition.a
ar cr declaration.a declaration.o
cc -o declaration_static.out main.o declaration.a
ar cr noweak.a noweak.o
cc -o noweak_static.out main.o noweak.a
cc -o 1.out main.o declaration.o noweak.o
cc -o 2.out main.o noweak.o declaration.o
cc -o 3.out main.o declaration.a noweak.a
cc -o 4.out main.o noweak.a declaration.a
cc -o 5.out main.o definition.o noweak.o
cc -o 6.out main.o noweak.o definition.o
cc -o 7.out main.o definition.a noweak.a
cc -o 8.out main.o noweak.a definition.a
cc -o 10.out main.o noweak.a definition.a declaration.a
cc -o 11.out main.o definition.a declaration.a noweak.a
cc -o 12.out main.o declaration.a definition.a noweak.a
ar cr all.a declaration.o definition.o noweak.o
cc -o 13.out main.o all.a
ld -i -o all.o declaration.o definition.o noweak.o
cc -o 14.out main.o all.o
{ echo "definition.o declaration.o#->#only_weak_1.out#:#$(./only_weak_1.out)"; echo "declaration.o definition.o#->#only_weak_2.out#:#$(./only_weak_2.out)"; echo "definition.o#->#definition.out#:#$(./definition.out)"; echo "declaration.o#->#declaration.out#:#$(./declaration.out)"; echo "noweak.o#->#noweak.out#:#$(./noweak.out)"; echo "definition.a#->#definition_static.out#:#$(./definition_static.out)"; echo "declaration.a#->#declaration_static.out#:#$(./declaration_static.out)"; echo "noweak.a#->#noweak_static.out#:#$(./noweak_static.out)"; echo "declaration.o noweak.o#->#1.out#:#$(./1.out)"; echo "noweak.o declaration.o#->#2.out#:#$(./2.out)"; echo "declaration.a noweak.a#->#3.out#:#$(./3.out)"; echo "noweak.a declaration.a#->#4.out#:#$(./4.out)"; echo "definition.o noweak.o#->#5.out#:#$(./5.out)"; echo "noweak.o definition.o#->#6.out#:#$(./6.out)"; echo "definition.a noweak.a#->#7.out#:#$(./7.out)"; echo "noweak.a definition.a#->#8.out#:#$(./8.out)"; echo "noweak.a definition.a declaration.a#->#10.out#:#$(./10.out)"; echo "definition.a declaration.a noweak.a#->#11.out#:#$(./11.out)"; echo "declaration.a definition.a noweak.a#->#12.out#:#$(./12.out)"; echo "all.a#->#13.out#:#$(./13.out)"; echo "all.o#->#14.out#:#$(./14.out)"; } | { echo; column -t -s'#' -N 'objects, ,executable, ,output' -o' '; echo; }
objects executable output
definition.o declaration.o -> only_weak_1.out : definition.c
declaration.o definition.o -> only_weak_2.out : declaration.c
definition.o -> definition.out : definition.c
declaration.o -> declaration.out : declaration.c
noweak.o -> noweak.out : noweak.c
definition.a -> definition_static.out : definition.c
declaration.a -> declaration_static.out : declaration.c
noweak.a -> noweak_static.out : noweak.c
declaration.o noweak.o -> 1.out : noweak.c
noweak.o declaration.o -> 2.out : noweak.c
declaration.a noweak.a -> 3.out : declaration.c
noweak.a declaration.a -> 4.out : noweak.c
definition.o noweak.o -> 5.out : noweak.c
noweak.o definition.o -> 6.out : noweak.c
definition.a noweak.a -> 7.out : definition.c
noweak.a definition.a -> 8.out : noweak.c
noweak.a definition.a declaration.a -> 10.out : noweak.c
definition.a declaration.a noweak.a -> 11.out : definition.c
declaration.a definition.a noweak.a -> 12.out : declaration.c
all.a -> 13.out : declaration.c
all.o -> 14.out : noweak.c
In case only weak symbols are used (case only_weak_1 and only_weak_2) the first definition is used.
In case of only static libraries (case 3, 4, 7, 8, 10, 11, 12, 13) the first definition is used.
In case only object files are used (cases 1, 2, 5, 6, 14) the weak symbols are omitted and only the symbol from noweak is used.
From the link I provided:
There are different ways to guarantee armlink selecting a non-weak version of a given symbol:
- Do not archive such objects
- Ensure that the weak and non-weak symbols are contained within the same object before archiving
- Use partial linking as an alternative.
I have a C++ dynamic library (on macOS) that has a templated function with some explicit instantiations that are exported in the public API. Client code only sees the template declaration; they have no idea what goes on inside it and are relying on these instantiations to be available at link time.
For some reason, only some of these explicit instantiations are made visible in the dynamic library.
Here is a simple example:
// libtest.cpp
#define VISIBLE __attribute__((visibility("default")))
template<typename T> T foobar(T arg) {
    return arg;
}
template int VISIBLE foobar(int);
template int* VISIBLE foobar(int*);
I would expect both instantiations to be visible, but only the non-pointer one is:
$ clang++ -dynamiclib -O2 -Wall -Wextra -std=c++1z -stdlib=libc++ -fvisibility=hidden -fPIC libtest.cpp -o libtest.dylib
$ nm -gU libtest.dylib | c++filt
0000000000000f90 T int foobar<int>(int)
This test program fails to link because the pointer one is missing:
// client.cpp
template<typename T> T foobar(T); // assume this was in the library header
int main() {
    foobar<int>(1);
    foobar<int*>(nullptr);
    return 0;
}
$ clang++ -O2 -Wall -Wextra -std=c++1z -stdlib=libc++ -L. -ltest client.cpp -o client
Undefined symbols for architecture x86_64:
"int* foobar<int*>(int*)", referenced from:
_main in client-e4fe7d.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
There does seem to be some connection between the types and the visibility. If I change the return type to void, they are all visible (even if the template arguments are still pointers or whatever). Especially bizarre, this exports both:
template auto VISIBLE foobar(int) -> int;
template auto VISIBLE foobar(int*) -> int*;
Is this a bug? Why would apparent syntactic sugar change behavior?
It works if I change the template definition to be visible, but it seems non-ideal because only a few of these instantiations should be exported... and I still want to understand why this is happening, either way.
I am using Apple LLVM version 8.0.0 (clang-800.0.42.1).
Your problem is reproducible on Linux:
$ clang++ --version
clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
$ clang++ -shared -O2 -Wall -Wextra -std=c++1z -fvisibility=hidden \
-fPIC libtest.cpp -o libtest.so
$ nm -C libtest.so | grep foobar
0000000000000620 W int foobar<int>(int)
0000000000000630 t int* foobar<int*>(int*)
The non-pointer instantiation is weakly global but the pointer instantiation is
local.
The cause of this is obscured by clang's slack diagnosing of the __attribute__
syntax extension, which after all is a GCC invention. If we compile with
g++ instead we get:
$ g++ -shared -O2 -Wall -Wextra -std=c++1z -fvisibility=hidden -fPIC libtest.cpp -o libtest.so
libtest.cpp:9:36: warning: ‘visibility’ attribute ignored on non-class types [-Wattributes]
 template int * VISIBLE foobar(int *);
                                    ^
Notice that g++ ignores the visibility attribute only in the pointer instantiation,
and, just like clang - and consistent with that warning - it emits code with:
$ nm -C libtest.so | grep foobar
0000000000000610 W int foobar<int>(int)
0000000000000620 t int* foobar<int*>(int*)
Clearly clang is doing the same thing, but not telling us why.
The difference between the instantiation that satisfies g++ and the one that
dissatisfies it is the difference between int and int *.
On that basis we'd expect g++ to be satisfied with the change:
template int VISIBLE foobar(int);
//template int * VISIBLE foobar(int *);
template float VISIBLE foobar(float);
And so it is:
$ g++ -shared -O2 -Wall -Wextra -std=c++1z -fvisibility=hidden -fPIC libtest.cpp -o libtest.so
$ nm -C libtest.so | grep foobar
0000000000000650 W float foobar<float>(float)
0000000000000640 W int foobar<int>(int)
And so is clang:
$ clang++ -shared -O2 -Wall -Wextra -std=c++1z -fvisibility=hidden -fPIC libtest.cpp -o libtest.so
$ nm -C libtest.so | grep foobar
0000000000000660 W float foobar<float>(float)
0000000000000650 W int foobar<int>(int)
Both of them will do what you want for instantiations with T a non-pointer type, but
not with T a pointer type.
What you face here, however, is not a ban on dynamically visible functions
that return pointers rather than non-pointers. It couldn't have escaped notice if
visibility was as broken as that. It is just a ban on types of the form:
D __attribute__((visibility("...")))
where D is a pointer or reference type, as distinct from types of the form:
E __attribute__((visibility("..."))) *
or:
E __attribute__((visibility("..."))) &
where E is not a pointer or reference type. The distinction is between:
A (pointer or reference that has visibility ...) to type D
and:
A (pointer or reference to type E) that has visibility ...
See:
$ cat demo.cpp
int xx ;
int __attribute__((visibility("default"))) * pvxx; // OK
int * __attribute__((visibility("default"))) vpxx; // Not OK
int __attribute__((visibility("default"))) & rvxx = xx; // OK
int & __attribute__((visibility("default"))) vrxx = xx; // Not OK
$ g++ -shared -Wall -Wextra -std=c++1z -fvisibility=hidden -o libdemo.so demo.cpp
demo.cpp:3:46: warning: ‘visibility’ attribute ignored on non-class types [-Wattributes]
 int * __attribute__((visibility("default"))) vpxx; // Not OK
                                              ^
demo.cpp:5:46: warning: ‘visibility’ attribute ignored on non-class types [-Wattributes]
 int & __attribute__((visibility("default"))) vrxx = xx; // Not OK
                                              ^
$ nm -C libdemo.so | grep xx
0000000000201030 B pvxx
0000000000000620 R rvxx
0000000000201038 b vpxx
0000000000000628 r vrxx
0000000000201028 b xx
The OK declarations become global symbols; the Not OK ones become local,
and only the former are dynamically visible:
$ nm -CD libdemo.so | grep xx
0000000000201030 B pvxx
0000000000000620 R rvxx
This behaviour is reasonable. We can't expect a compiler to attribute
global, dynamic visibility to a pointer or reference that could point or
refer to something that does not have global or dynamic visibility.
This reasonable behaviour only appears to frustrate your objective because
- as you probably now see:
template int VISIBLE foobar(int);
template int* VISIBLE foobar(int*);
doesn't mean what you thought it did. You thought that, for given type U,
template U VISIBLE foobar(U);
declares a template instantiating function that has default
visibility, accepting an argument of type U and returning the same. In fact,
it declares a template instantiating function that accepts an argument of
type U and returns type:
U __attribute__((visibility("default")))
which is allowed for U = int, but disallowed for U = int *.
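The same distinction can be sketched with ordinary, non-template functions. This is only an illustration, assuming GCC or clang; the names lead_attr and trail_attr are hypothetical. In both declarations the attribute sits where it applies to the function rather than to the pointer return type, so neither should draw the -Wattributes warning:

```cpp
#define VISIBLE __attribute__((visibility("default")))

// Attribute leads the declaration: it applies to the declared function,
// whatever that function returns.
VISIBLE int* lead_attr(int* p);

// Attribute trails the declarator of a non-defining declaration:
// it also applies to the function, not to int*.
int* trail_attr(int* p) VISIBLE;

// Definitions carry no attribute; they pick it up from the declarations above.
int* lead_attr(int* p) { return p; }
int* trail_attr(int* p) { return p; }
```

Built with something like g++ -shared -fPIC -fvisibility=hidden, both names would be expected to appear in the output of nm -CD, even though both return a pointer.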
To express your intention that instantiations of template<typename T> T foobar(T arg)
shall be dynamically visible functions, qualify the template function itself
with the visibility attribute. GCC's documentation of the __attribute__ syntax,
which admittedly says nothing specific concerning templates, requires that an
attribute placed after a function's declarator appear in a declaration other than
the function's definition. Complying with that, you'd revise your code like this:
// libtest.cpp
#define VISIBLE __attribute__((visibility("default")))
template<typename T> T foobar(T arg) VISIBLE;
template<typename T> T foobar(T arg) {
    return arg;
}
template int foobar(int);
template int* foobar(int*);
g++ no longer has any gripes:
$ g++ -shared -O2 -Wall -Wextra -std=c++1z -fvisibility=hidden -fPIC libtest.cpp -o libtest.so
$ nm -CD libtest.so | grep foobar
0000000000000640 W int foobar<int>(int)
0000000000000650 W int* foobar<int*>(int*)
and both of the instantiations are dynamically visible. The same goes for clang:
$ clang++ -shared -O2 -Wall -Wextra -std=c++1z -fvisibility=hidden -fPIC libtest.cpp -o libtest.so
$ nm -CD libtest.so | grep foobar
0000000000000650 W int foobar<int>(int)
0000000000000660 W int* foobar<int*>(int*)
With any luck, you'll get the same result with clang on macOS.
I'm on Linux with g++ 5.3.0.
I thought I'd make myself an object file that, when linked, would initialize global variables Argc and Argv so that the main arguments would be available throughout the process.
Argv.hh:
#pragma once
extern char** Argv;
extern int Argc;
Argv.cc:
char** Argv;
int Argc;
static int set_argv(int argc, char** argv, char** env) { Argc = argc; Argv = argv; return 0; }
/* Put the function into the init_array */
__attribute__((section(".init_array"))) static void *ctr = (void*)&set_argv;
main.cc:
#include "Argv.hh"
#include <stdio.h>
int main(){
    for (int i = 0; i < Argc; ++i){
        puts(Argv[i]);
    }
    return 0;
}
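For comparison, here is a minimal sketch of the same idea using GCC's constructor attribute instead of a hand-placed .init_array entry. It assumes glibc, which passes argc, argv and envp to initialization functions; other C libraries may not. The names SavedArgc, SavedArgv and capture_args are hypothetical and deliberately differ from the ones above.

```cpp
// Hypothetical globals; not the Argc/Argv from Argv.cc above.
int SavedArgc = 0;
char** SavedArgv = nullptr;

// Assumption: the C library calls constructors with (argc, argv, envp),
// as glibc does for .init_array entries. This is not guaranteed elsewhere.
__attribute__((constructor))
static void capture_args(int argc, char** argv, char** /*envp*/) {
    SavedArgc = argc;
    SavedArgv = argv;
}
```

The compiler then emits the .init_array entry itself, so no cast of a function pointer to void* is needed.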
My original build script was:
com='g++ -std=c++1y'
for cc in *.cc; do $com -c $cc; done
g++ *.o
but it kept giving me a linking error. So I changed com to
gcc -x c -std=c99 and it worked; it also worked with plain com=g++.
All three compiler invocations compile successfully; only the link step fails, and only with g++ -std=c++1y.
nm *.o outputs:
For gcc -x c -std=c99:
Argv.o:
0000000000000004 C Argc
0000000000000008 C Argv
0000000000000000 t ctr
0000000000000000 t set_argv
main.o:
U Argc
U Argv
0000000000000000 T main
U puts
For g++:
Argv.o:
0000000000000008 B Argc
0000000000000000 B Argv
0000000000000000 t _ZL3ctr
0000000000000000 t _ZL8set_argviPPcS0_
main.o:
U Argc
U Argv
0000000000000000 T main
U puts
And for g++ -std=c++1y:
Argv.o:
0000000000000008 B Argc
0000000000000000 B Argv
0000000000000000 t _ZL3ctr
0000000000000000 t _ZL8set_argviPPcS0_
main.o:
0000000000000008 B Argc
0000000000000000 B Argv
0000000000000000 T main
U puts
The last set of object files fails to link with
main.o:(.bss+0x0): multiple definition of `Argv'
Argv.o:(.bss+0x0): first defined here
main.o:(.bss+0x8): multiple definition of `Argc'
Argv.o:(.bss+0x8): first defined here
collect2: error: ld returned 1 exit status
Why does g++ -std=c++1y generate B symbols for extern declarations when the other two generate (as they should?) undefined references? Is this a bug?