I have a C++ library with a C API, and I have set the -fvisibility=hidden compiler flag,
and then I have set __attribute__ ((visibility ("default"))) on C API methods.
However, I still see visible C++ symbols. When I create a debian package for my library,
I get the following symbols file
Why are these symbols still visible?
You should run your symbols file through c++filt, which converts the "mangled" symbol names into readable C++ names.
If you do, you'll find that two thirds of the symbols are std::whatever, not your own symbols. They are being pulled in because of the STL, so you may not be able to control them.
The other symbols are grk_*, if that helps.
There are object file utilities (e.g. readelf, objdump, objcopy, etc) that may allow you to edit/patch your object files.
Or, you might be able to use a linker script.
Or, you could compile with -S to get a .s file. You could then write a [perl/python] script to modify the asm source and add/change whatever attribute(s) you need to change the visibility. Then, just do: c++ -c modified.s
For a given symbol (e.g.):
int __attribute__((visibility("hidden")))
main(void)
{
    return 0;
}
The asm file is:
.file "main.c"
.text
.globl main
.hidden main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 8.3.1 20190223 (Red Hat 8.3.1-2)"
.section .note.GNU-stack,"",@progbits
Notice the asm directive:
.hidden main
Even without such a directive, it should be easy to write a script to add one [after the corresponding .globl]
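As a rough illustration of that idea, here is a minimal sketch of such a filter, written in C to match the other examples rather than in perl/python. The function name add_hidden and the keep_prefix parameter are hypothetical (my own invention), and the sketch assumes GNU as syntax where each exported symbol is announced by its own .globl NAME line shorter than 1024 bytes. Symbols whose names start with keep_prefix (for example grk_ for the C API) are left exported; an empty prefix hides everything.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy GNU as source from `in` to `out`, adding ".hidden NAME" after each
 * ".globl NAME" line unless NAME starts with `keep_prefix`.
 * Returns the number of .hidden directives added.
 * Assumption: one symbol per .globl line, lines shorter than 1024 bytes.
 */
int add_hidden(FILE *in, FILE *out, const char *keep_prefix)
{
    char line[1024];
    char name[1024];
    size_t keep_len;
    int added = 0;

    assert(in != NULL && out != NULL && keep_prefix != NULL);
    keep_len = strlen(keep_prefix);

    while (fgets(line, sizeof line, in) != NULL) {
        size_t len = strlen(line);

        fputs(line, out);
        if (len > 0 && line[len - 1] != '\n')
            fputc('\n', out);   /* final line had no trailing newline */

        /* Only lines of the form ".globl NAME" are of interest. */
        if (sscanf(line, " .globl %1023s", name) != 1)
            continue;
        /* Leave the public API (names with the given prefix) exported. */
        if (keep_len > 0 && strncmp(name, keep_prefix, keep_len) == 0)
            continue;

        fprintf(out, "\t.hidden\t%s\n", name);
        added++;
    }
    return added;
}
```

Calling add_hidden(stdin, stdout, "grk_") from a small main would turn this into a filter that sits between the -S step and c++ -c modified.s. It does not check whether a symbol already has a .hidden line, so treat the behaviour in that case as untested.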
Related
Ftrace supports dynamic trace, that is, it can trace any global function in the kernel and modules. It uses the -pg compilation option of gcc to add a stub at the beginning of each function, so that when needed, the function can be controlled to jump to the specified code for execution. gcc 4.6 newly added -pg -mfentry support, so that an instruction to call fentry can be inserted at the very beginning of the function, like:
[root@localhost kernel-4.4.27]# echo 'void foo(){}' | gcc -x c -S -o - - -pg -mfentry
foo:
.LFB0:
.cfi_startproc
call __fentry__
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
What I need is something like the -pg option, but I want to leave 5 extra bytes at both the beginning and the end of each function that I can modify for my own use. I would like some advice on how to modify clang or LLVM to achieve this (gcc may be too hard for me to understand and modify).
Why is the file size of the executable larger than the source? I made the example below (the simplest one I could think of), and the executable is still huge compared to the source, even though (I think) it's not using any libraries.
Simplest.cpp: 33 bytes
Simplest.s: 386 bytes
Simplest.exe: 60076 bytes
Simplest.cpp:
int main(void)
{
return 0;
}
Simplest.s:
.file "Simplest.cpp"
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
call ___main
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE0:
.ident "GCC: (GNU) 4.8.3"
Not sure if it's relevant, but I'm using g++ compiler on cygwin on Windows 8 with an Intel processor.
The executable is linked against several libraries, so its size grows once the linker is done. The C library, and for C++ programs the C++ standard library as well, is linked in by default when you build a C or C++ program.
You could read this article about gcc's linking process.
ld's manpage says
ld combines a number of object and archive files, relocates their data and ties up symbol references. Usually the last step in compiling a program is to run ld.
All in all, linkers may put a lot of extra material into the executable, so it is no wonder that it can be larger than the source file.
Note: the links above are about linking on Unix, not Windows, but Cygwin tries to somehow simulate the behavior of Linux/Unix systems, so they're still relevant.
ForceBru has explained what's going on at a high level, but it seems that you already understand that linking libraries could increase executable size, but (mistakenly) believe that your program uses no libraries.
Actually, because you linked your program by running gcc, when ld was invoked, gcc passed some extra options. To control this, read about gcc Link Options
Of particular interest are the -nostdlib and -nodefaultlibs options, described as follows:
-nodefaultlibs
Do not use the standard system libraries when linking. Only the libraries you specify are passed to the linker, and options specifying linkage of the system libraries, such as -static-libgcc or -shared-libgcc, are ignored. The standard startup files are used normally, unless -nostartfiles is used.
The compiler may generate calls to memcmp, memset, memcpy and memmove. These entries are usually resolved by entries in libc. These entry points should be supplied through some other mechanism when this option is specified.
-nostdlib
Do not use the standard system startup files or libraries when linking. No startup files and only the libraries you specify are passed to the linker, and options specifying linkage of the system libraries, such as -static-libgcc or -shared-libgcc, are ignored.
The compiler may generate calls to memcmp, memset, memcpy and memmove. These entries are usually resolved by entries in libc. These entry points should be supplied through some other mechanism when this option is specified.
One of the standard libraries bypassed by -nostdlib and -nodefaultlibs is libgcc.a, a library of internal subroutines which GCC uses to overcome shortcomings of particular machines, or special needs for some languages. (See Interfacing to GCC Output, for more discussion of libgcc.a.) In most cases, you need libgcc.a even when you want to avoid other standard libraries. In other words, when you specify -nostdlib or -nodefaultlibs you should usually specify -lgcc as well. This ensures that you have no unresolved references to internal GCC library subroutines. (An example of such an internal subroutine is __main, used to ensure C++ constructors are called; see collect2.)
Because you haven't used these options, your code is in fact being linked with multiple libraries.
To understand some of the behavior provided by those libraries, without which even your tiny program will fail, you might read the blog series Hello from a libc-free world! (Part 2)
Consider the following code:
#include <stdio.h>
void __attribute__ ((constructor)) a_constructor()
{
    printf("%s\n", __func__);
}
void __attribute__ ((constructor)) b_constructor()
{
    printf("%s\n", __func__);
}
int main()
{
    printf("%s\n", __func__);
}
I compile the above code as : gcc -ggdb prog2.c -o prog2. The code runs as expected.
a_constructor
b_constructor
main
But when I look at its disassembly using objdump -d prog2 > f, there is neither a call to __do_global_ctors_aux in _init or anywhere else, nor a definition of __do_global_ctors_aux. So, how do the constructors get called? Where is the definition of __do_global_ctors_aux? Is this some optimization?
I also tried compiling it with no optimization like this: gcc -ggdb -O0 prog2.c -o prog2. Please clarify.
The compilation is being done on 32 bit linux machine.
EDIT
My output from gdb bt is:
Breakpoint 1, a_constructor () at prog2.c:5
5 printf("%s\n", __func__);
(gdb) bt
#0 a_constructor () at prog2.c:5
#1 0x080484b2 in __libc_csu_init ()
#2 0xb7e31a1a in __libc_start_main (main=0x8048445 <main>, argc=1, argv=0xbffff014, init=0x8048460 <__libc_csu_init>,
fini=0x80484d0 <__libc_csu_fini>, rtld_fini=0xb7fed180 <_dl_fini>, stack_end=0xbffff00c) at libc-start.c:246
#3 0x08048341 in _start ()
So, how do the constructors get called?
If you look at the assembly produced with gcc -g -O0 -S -fverbose-asm prog2.c -o prog2.s, you will find the following:
.text
.Ltext0:
.globl a_constructor
.type a_constructor, @function
a_constructor:
.LFB0:
.file 1 "test.c"
.loc 1 4 0
.cfi_startproc
pushq %rbp #
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp #,
.cfi_def_cfa_register 6
.loc 1 5 0
movl $__func__.2199, %edi #,
call puts #
.loc 1 6 0
popq %rbp #
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size a_constructor, .-a_constructor
.section .init_array,"aw"
.align 8
.quad a_constructor
In the above, the function a_constructor is placed in the .text section, and a pointer to the function is appended to the .init_array section. Before calling main, glibc iterates over this array and invokes every constructor function found there.
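To make that last step concrete, here is a minimal sketch of the loop, simulated with an ordinary array so that it runs anywhere. It is not glibc's actual code: fake_init_array, run_init_array and simulate_startup are hypothetical names, and the real startup code walks the linker-provided bounds of .init_array rather than a hand-written table.

```c
#include <assert.h>
#include <string.h>

typedef void (*init_fn)(void);

static char call_log[8];   /* records the order of calls */

static void a_constructor(void) { strcat(call_log, "a"); }
static void b_constructor(void) { strcat(call_log, "b"); }

/* Stand-in for the .init_array section: one pointer per constructor. */
static const init_fn fake_init_array[] = { a_constructor, b_constructor };

/* Call every function pointer in [begin, end), in order. */
static void run_init_array(const init_fn *begin, const init_fn *end)
{
    assert(begin <= end);
    for (const init_fn *p = begin; p != end; ++p)
        (*p)();
}

/* Mimic startup: run the constructors, then "main" (logged as 'm'). */
const char *simulate_startup(void)
{
    size_t count = sizeof fake_init_array / sizeof fake_init_array[0];

    call_log[0] = '\0';
    run_init_array(fake_init_array, fake_init_array + count);
    strcat(call_log, "m");
    return call_log;
}
```

simulate_startup() returns "abm", the same a_constructor, b_constructor, main order that the program in the question prints.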
The details are implementation-specific and you don't mention your implementation.
A perfectly valid strategy used by some implementations is to create a run-time library that contains the real entry point for your program. That real entry point first calls all constructors, and then calls main. If your program is dynamically linked and the code behind that real entry point resides in a shared library (like, say, libc), then clearly disassembling your program cannot possibly show you where the constructor gets called.
A simple approach for figuring where precisely the call is coming from is by loading your program in a debugger, setting a breakpoint on one of the constructors, and asking for the call stack when the breakpoint is hit. For example, on Cygwin:
$ gdb ./test
GNU gdb (GDB) 7.8
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-cygwin".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./test...done.
(gdb) b a_constructor
Breakpoint 1 at 0x4011c6: file test.cc, line 5.
(gdb) run
Starting program: /home/Harald van Dijk/test
[New Thread 4440.0x1734]
[New Thread 4440.0xa8c]
b_constructor
Breakpoint 1, a_constructor () at test.cc:5
5 printf("%s\n", __func__);
(gdb) bt
#0 a_constructor () at test.cc:5
#1 0x61006986 in __main () from /usr/bin/cygwin1.dll
#2 0x004011f6 in main () at test.cc:14
(gdb)
This shows that on Cygwin, a variant of the strategy I mentioned is used: the real entry point is the main function, but the compiler inserts a call to a Cygwin-specific __main function right at the start, and it's that __main function that searches for all constructors and calls them directly.
(Incidentally, clearly this breaks if main is called recursively: the constructors would run a second time. This is why C++ does not allow main to be called recursively. C does allow it, but then, standard C doesn't have constructor functions.)
And you can get a hint of how that __main function searches for them, by not disassembling the executable program, but asking the compiler for the generated assembly:
$ gcc -S test.c -o -
I won't copy the whole assembly listing here, but it shows that on this particular implementation, constructor functions get emitted in a .ctors segment, so it would be easy for a __main function to simply call all functions in that segment, without the compiler having to enumerate each such function one by one.
I am confused about what LLVM native code looks like. For example, there are two kinds of main.s; which one is native code?
This one:
.file "main.bc"
.text
.globl main
.align 16, 0x90
.type main,@function
main: # @main
# BB#0:
subl $12, %esp
movl $.L.str, (%esp)
calll puts
calll foo
xorl %eax, %eax
addl $12, %esp
ret
.Ltmp0:
.size main, .Ltmp0-main
.type .L.str,@object # @.str
.section .rodata.str1.1,"aMS",@progbits,1
.L.str:
.asciz "This is a shared library test..."
.size .L.str, 33
.section ".note.GNU-stack","",@progbits
or:
@.str = private unnamed_addr constant [33 x i8] c"This is a shared library test...\00", align 1
define i32 @main() nounwind {
  %1 = alloca i32, align 4
  store i32 0, i32* %1
  %2 = call i32 @puts(i8* getelementptr inbounds ([33 x i8]* @.str, i32 0, i32 0))
  call void @foo()
  ret i32 0
}
declare i32 @puts(i8*)
declare void @foo()
The first one was generated by llvm-llc, and the second was by -emit-llvm -S.
If I want to use LLVM to transfer a static library or a shared library into native code, how can I do with LLVM?
There is no such thing as "LLVM native code".
The first code section is in (what looks like) x86 assembly. These files usually get the ".s" extension, and they can be converted into object files by an assembler. LLVM's "llc" tool generates those by default, but there's nothing LLVM-specific about those files - every x86 compiler can generate them. This is also sometimes called x86 native code.
The second code section is in "LLVM Intermediate Representation" (IR), the textual form of "LLVM bitcode". This is an LLVM-specific intermediate language, and such files usually get the ".ll" extension. However, running "clang -emit-llvm -S" will by default generate those under the ".s" extension, which might explain your confusion here.
You ask:
If I want to use LLVM to transfer a static library or a shared library into native code, how can I do with LLVM?
If you are talking about static libraries and shared libraries that are already built - e.g. ".so" or ".lib" files - then they are already in "native code", though you may want to use a disassembler if you want to get a human-friendly representation of them. If those libraries are not already built, you can use Clang to build them, just like any other compiler. If those libraries are provided in LLVM bitcode, you can use LLVM's "llc" to convert them to assembly files (though "llc" doesn't do optimizations - you need to manually run LLVM's "opt" too for that).
I used llvm online compiler to compile my sample C code,
int main() { return 0; }
the generated LLVM assembly,
; ModuleID = '/tmp/webcompile/_31588_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
define i32 @main() nounwind uwtable {
  %1 = alloca i32, align 4
  store i32 0, i32* %1
  ret i32 0
}
then I compiled LLVM assembly to obj file,
llc -filetype=obj a.ll
when I tried to link the obj file using link.exe a.o I got the error
fatal error LNK1107: invalid or corrupt file: cannot read at 0x438
How can I generate the right obj file to feed into link.exe?
More information
I built LLVM using Visual Studio 11. I didn't have cygwin installed.
link.exe is also from Visual Studio 11
LLVM was built from latest source code.
If I compile the same code using VC++ compiler into assembly, it looks like this,
; Listing generated by Microsoft (R) Optimizing Compiler Version 17.00.50402.0
include listing.inc
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC main
; Function compile flags: /Odtp
_TEXT SEGMENT
main PROC
; File c:\tmp\u.c
; Line 1
xor eax, eax
ret 0
main ENDP
_TEXT ENDS
END
llc -filetype=asm j.ll generates the following code. It also fails with ml.exe.
.def _main;
.scl 2;
.type 32;
.endef
.text
.globl _main
.align 16, 0x90
_main: # @main
# BB#0:
pushl %eax
movl $0, (%esp)
xorl %eax, %eax
popl %edx
ret
You are in a strange land, you can read over: http://llvm.org/docs/GettingStartedVS.html which may help
Your IR targets x86-64 Linux, as stated in its target triple. So llc will, by default, generate an ELF object file for you, not COFF, and link.exe will not accept it.
Note that you cannot just change the target triple to a Windows one and assume everything will work:
COFF object code emission is WIP
C/C++ are not target-independent languages, so, you cannot obtain target-independent IR here: http://llvm.org/docs/FAQ.html#platformindependent