Failed to link obj files generated by LLVM compiler using MS Linker - llvm

I used the LLVM online compiler to compile my sample C code:
int main() { return 0; }
The generated LLVM assembly:
; ModuleID = '/tmp/webcompile/_31588_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
define i32 @main() nounwind uwtable {
  %1 = alloca i32, align 4
  store i32 0, i32* %1
  ret i32 0
}
Then I compiled the LLVM assembly to an object file:
llc -filetype=obj a.ll
When I tried to link the object file using link.exe a.o, I got the error:
fatal error LNK1107: invalid or corrupt file: cannot read at 0x438
How can I generate the right obj file to feed into link.exe?
More information
I built LLVM using Visual Studio 11. I didn't have Cygwin installed.
link.exe is also from Visual Studio 11.
LLVM was built from the latest source code.
If I compile the same code to assembly using the VC++ compiler, it looks like this:
; Listing generated by Microsoft (R) Optimizing Compiler Version 17.00.50402.0
include listing.inc
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC main
; Function compile flags: /Odtp
_TEXT SEGMENT
main PROC
; File c:\tmp\u.c
; Line 1
xor eax, eax
ret 0
main ENDP
_TEXT ENDS
END
llc -filetype=asm j.ll generates the following code. It fails with ml.exe as well.
.def _main;
.scl 2;
.type 32;
.endef
.text
.globl _main
.align 16, 0x90
_main: # @main
# BB#0:
pushl %eax
movl $0, (%esp)
xorl %eax, %eax
popl %edx
ret

You are in a strange land; you can read over http://llvm.org/docs/GettingStartedVS.html, which may help.

Your IR is for x86-64/Linux, as stated in the target triple. So llc will by default generate an ELF object file for you, not COFF. link.exe will certainly not accept it.
Note that you cannot just change the target triple to some Windows triple and assume everything will work:
COFF object code emission is a work in progress
C/C++ are not target-independent languages, so you cannot obtain target-independent IR here: http://llvm.org/docs/FAQ.html#platformindependent

Related

Advice about modifying the compiler to add extra bytes to each function

Ftrace supports dynamic tracing; that is, it can trace any global function in the kernel and in modules. It uses gcc's -pg compilation option to add a stub at the beginning of each function so that, when needed, the function can be redirected to jump to specified code. gcc 4.6 added -pg -mfentry support, which inserts an instruction to call __fentry__ at the very beginning of the function, like:
[root@localhost kernel-4.4.27]# echo 'void foo(){}' | gcc -x c -S -o - - -pg -mfentry
foo:
.LFB0:
.cfi_startproc
call __fentry__
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
What I need is similar to the -pg option, except that I want to leave 5 extra bytes both at the beginning and at the end of each function to patch for my own use, so I need some advice on how to modify clang or LLVM to achieve this (gcc may be too hard for me to understand and modify).

Hiding C++ symbols with -fvisibility=hidden [duplicate]

This question already has answers here:
Hiding instantiated templates in shared library created with g++
(5 answers)
Closed 2 years ago.
I have a C++ library with a C API, and I have set the -fvisibility=hidden compiler flag,
and then I have set __attribute__ ((visibility ("default"))) on C API methods.
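Roughly, the setup looks like this (a simplified sketch; the names are made up, not my real code), built with g++ -fvisibility=hidden -fPIC -shared:
#include <vector>

#define MYLIB_API __attribute__((visibility("default")))

// Internal C++ code: hidden by default because of -fvisibility=hidden.
static int sum_impl(const std::vector<int>& v) {
    int s = 0;
    for (int x : v) s += x;
    return s;
}

// Exported C API function: explicitly given default visibility.
extern "C" MYLIB_API int mylib_sum(const int* data, int n) {
    return sum_impl(std::vector<int>(data, data + n));
}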
However, I still see visible C++ symbols. When I create a debian package for my library,
I get the following symbols file
Why are these symbols still visible?
You should run your symbols file through c++filt, which converts the "mangled" symbol names to something readable [in the C++ sense].
If you do, you'll find that two-thirds of the symbols are std::whatever, and not your symbols. So they are being pulled in because of the STL. You may not be able to control them.
The other symbols are grk_*, if that helps.
There are object file utilities (e.g. readelf, objdump, objcopy, etc) that may allow you to edit/patch your object files.
Or, you might be able to use a linker script.
Or, you could compile with -S to get a .s file. You could then write a [perl/python] script to modify the asm source and add/change whatever attribute(s) you need to change the visibility. Then, just do: c++ -c modified.s
For a given symbol (e.g.):
int __attribute__((visibility("hidden")))
main(void)
{
return 0;
}
The asm file is:
.file "main.c"
.text
.globl main
.hidden main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 8.3.1 20190223 (Red Hat 8.3.1-2)"
.section .note.GNU-stack,"",@progbits
Notice the asm directive:
.hidden main
Even without such a directive, it should be easy to write a script to add one [after the corresponding .globl].
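Purely as an illustration (sketched in C++ here rather than perl/python), such a filter could copy the .s file from stdin to stdout and emit a .hidden directive after every .globl (or .global) line; in practice you would only do this for the symbols you actually want hidden:
#include <iostream>
#include <sstream>
#include <string>

// Usage (hypothetical file names): ./hide < main.s > modified.s && c++ -c modified.s
int main() {
    std::string line;
    while (std::getline(std::cin, line)) {
        std::cout << line << '\n';
        std::istringstream iss(line);
        std::string directive, symbol;
        // After ".globl NAME" (or ".global NAME"), add ".hidden NAME".
        if (iss >> directive >> symbol &&
            (directive == ".globl" || directive == ".global"))
            std::cout << "\t.hidden " << symbol << '\n';
    }
    return 0;
}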

__do_global_ctors_aux not shown in objdump

Consider the following code:
#include <stdio.h>
void __attribute__ ((constructor)) a_constructor()
{
printf("%s\n", __func__);
}
void __attribute__ ((constructor)) b_constructor()
{
printf("%s\n", __func__);
}
int main()
{
printf("%s\n",__func__);
}
I compile the above code as: gcc -ggdb prog2.c -o prog2. The code runs as expected:
a_constructor
b_constructor
main
But when I look at its dump using objdump -d prog2 > f, there is neither a call to __do_global_ctors_aux anywhere in _init or anywhere else, nor a definition of __do_global_ctors_aux. So how do the constructors get called? Where is the definition of __do_global_ctors_aux? Is this some optimization?
I also tried compiling it with no optimization, like this: gcc -ggdb -O0 prog2.c -o prog2. Please clarify.
The compilation is being done on a 32-bit Linux machine.
EDIT
My output from gdb bt is:
Breakpoint 1, a_constructor () at prog2.c:5
5 printf("%s\n", __func__);
(gdb) bt
#0 a_constructor () at prog2.c:5
#1 0x080484b2 in __libc_csu_init ()
#2 0xb7e31a1a in __libc_start_main (main=0x8048445 <main>, argc=1, argv=0xbffff014, init=0x8048460 <__libc_csu_init>,
fini=0x80484d0 <__libc_csu_fini>, rtld_fini=0xb7fed180 <_dl_fini>, stack_end=0xbffff00c) at libc-start.c:246
#3 0x08048341 in _start ()
So, how do the constructors get called?
If you look at the assembly produced with gcc -g -O0 -S -fverbose-asm prog2.c -o prog2.s, there is the following:
.text
.Ltext0:
.globl a_constructor
.type a_constructor, @function
a_constructor:
.LFB0:
.file 1 "test.c"
.loc 1 4 0
.cfi_startproc
pushq %rbp #
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp #,
.cfi_def_cfa_register 6
.loc 1 5 0
movl $__func__.2199, %edi #,
call puts #
.loc 1 6 0
popq %rbp #
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size a_constructor, .-a_constructor
.section .init_array,"aw"
.align 8
.quad a_constructor
In the above, the function a_constructor is placed in the .text section, and a pointer to the function is appended to the .init_array section. Before calling main, glibc iterates over this array and invokes all constructor functions found there.
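To make this concrete, here is a minimal sketch (assuming a GNU/Linux toolchain) that registers an initializer by hand in .init_array, mimicking what the compiler does for __attribute__((constructor)):
#include <cstdio>

static void my_init() { std::puts("my_init runs before main"); }

// Append a pointer to my_init to the .init_array section by hand.
// "used" keeps the otherwise-unreferenced pointer from being discarded.
__attribute__((used, section(".init_array")))
static void (*my_init_entry)() = my_init;

int main() {
    std::puts("main");
    return 0;
}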
The details are implementation-specific and you don't mention your implementation.
A perfectly valid strategy used by some implementations is to create a run-time library that contains the real entry point for your program. That real entry point first calls all constructors, and then calls main. If your program is dynamically linked and the code behind that real entry point resides in a shared library (like, say, libc), then clearly disassembling your program cannot possibly show you where the constructor gets called.
A simple approach to figuring out precisely where the call is coming from is to load your program in a debugger, set a breakpoint on one of the constructors, and ask for the call stack when the breakpoint is hit. For example, on Cygwin:
$ gdb ./test
GNU gdb (GDB) 7.8
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-cygwin".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./test...done.
(gdb) b a_constructor
Breakpoint 1 at 0x4011c6: file test.cc, line 5.
(gdb) run
Starting program: /home/Harald van Dijk/test
[New Thread 4440.0x1734]
[New Thread 4440.0xa8c]
b_constructor
Breakpoint 1, a_constructor () at test.cc:5
5 printf("%s\n", __func__);
(gdb) bt
#0 a_constructor () at test.cc:5
#1 0x61006986 in __main () from /usr/bin/cygwin1.dll
#2 0x004011f6 in main () at test.cc:14
(gdb)
This shows that on Cygwin, a variant of the strategy I mentioned is used: the real entry point is the main function, but the compiler inserts a call to a Cygwin-specific __main function right at the start, and it's that __main function that searches for all constructors and calls them directly.
(Incidentally, clearly this breaks if main is called recursively: the constructors would run a second time. This is why C++ does not allow main to be called recursively. C does allow it, but then, standard C doesn't have constructor functions.)
And you can get a hint of how that __main function searches for them, by not disassembling the executable program, but asking the compiler for the generated assembly:
$ gcc -S test.c -o -
I won't copy the whole assembly listing here, but it shows that on this particular implementation, constructor functions get emitted in a .ctors segment, so it would be easy for a __main function to simply call all functions in that segment, without the compiler having to enumerate each such function one by one.
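To make that walking idea concrete, here is a rough sketch that mimics it with a user-defined section (a hypothetical name, my_ctors) and GNU ld's automatic __start_/__stop_ symbols, rather than the real .ctors machinery and its crtbegin/crtend markers:
#include <cstdio>

typedef void (*ctor_fn)();

static void ctor_a() { std::puts("ctor_a"); }
static void ctor_b() { std::puts("ctor_b"); }

// Record the constructors in a custom section named "my_ctors".
__attribute__((used, section("my_ctors"))) static ctor_fn reg_a = ctor_a;
__attribute__((used, section("my_ctors"))) static ctor_fn reg_b = ctor_b;

// GNU ld defines these for any section whose name is a valid C identifier.
extern ctor_fn __start_my_ctors[], __stop_my_ctors[];

static void run_ctors() {
    for (ctor_fn* p = __start_my_ctors; p != __stop_my_ctors; ++p)
        (*p)();  // call each recorded constructor
}

int main() {
    run_ctors();  // what __main / the startup code does implicitly
    std::puts("main");
    return 0;
}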

How am I supposed to use the sanitizer in clang?

I'm sorry if this is an extremely easy concept, but I find it hard to acquire the right mindset to correctly use the sanitizers provided by clang.
float foo(float f) { return (f / 0); }
I compile this small snippet with
clang++ -fsanitize=float-divide-by-zero -std=c++11 -stdlib=libc++ -c source.cpp -o osan
and I also compile a "normal" version of my object without using the sanitizer
clang++ -std=c++11 -stdlib=libc++ -c source.cpp -o onorm
I was expecting some verbose output, or some error on the console, but when inspecting the files with nm I found only one difference:
nm o* --demangle
onorm:
0000000000000000 T foo(float)
osan:
U __ubsan_handle_divrem_overflow
0000000000000000 T foo(float)
So in the sanitized version there is an undefined symbol whose name resembles the sanitizer I used when compiling, but everything is really "silent", with no output at all from the clang frontend.
How am I supposed to use the sanitizer, and what is the right workflow? What is the point of that undefined symbol?
The undefined symbol is a function that implements the sanitizer's check. If you look at the generated code:
No sanitizer:
_Z3foof: # @_Z3foof
.cfi_startproc
# BB#0:
xorps %xmm1, %xmm1
divss %xmm1, %xmm0
ret
With sanitizer:
_Z3foof: # @_Z3foof
.cfi_startproc
.long 1413876459 # 0x54460aeb
.quad _ZTIFffE
# BB#0:
pushq %rax
.Ltmp1:
.cfi_def_cfa_offset 16
movss %xmm0, 4(%rsp) # 4-byte Spill
movd %xmm0, %esi
movl $__unnamed_1, %edi
xorl %edx, %edx
callq __ubsan_handle_divrem_overflow
xorps %xmm1, %xmm1
movss 4(%rsp), %xmm0 # 4-byte Reload
divss %xmm1, %xmm0
popq %rax
ret
You see it's added the code to do the check using that function.
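In source terms, the instrumented foo is roughly equivalent to the sketch below; the real check calls __ubsan_handle_divrem_overflow in the sanitizer runtime, and the reporting helper here is only a hypothetical stand-in to show where the check runs:
#include <cstdio>

// Hypothetical stand-in for the sanitizer runtime's handler.
static void report_divide_by_zero(const char* file, int line) {
    std::fprintf(stderr, "%s:%d: runtime error: division by zero\n", file, line);
}

float foo_instrumented(float f) {
    float divisor = 0.0f;
    if (divisor == 0.0f)                          // the inserted check
        report_divide_by_zero("source.cpp", 1);   // report, then continue (recoverable by default)
    return f / divisor;                           // the original division still happens
}

int main() {
    foo_instrumented(1.0f);
    return 0;
}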
The compiler should automatically link in the appropriate sanitizer runtime library. Then, for me, the following complete program:
float foo(float f) { return (f / 0); }
int main() {
foo(1.0f);
}
Produces the following output when executed:
main.cpp:1:32: runtime error: division by zero
I built and ran using the command clang++ -fsanitize=undefined main.cpp && ./a.out
If you want compile-time checks, you want to enable either more compiler warnings or the static analyzer. However, there doesn't seem to be any warning or static-analysis check for floating-point divide-by-zero errors.
Here's a program that produces an analyzer report:
#include <malloc.h>
int main() {
int *i = (int*) malloc(sizeof(int));
}
Compiled with clang++ -std=c++11 main.cpp it produces no diagnostics, but compiled with clang++ -std=c++11 --analyze main.cpp it reports the following:
main.cpp:4:10: warning: Value stored to 'i' during its initialization is never read
int *i = (int*) malloc(sizeof(int));
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~
main.cpp:5:1: warning: Potential leak of memory pointed to by 'i'
}
^
The dead store can also be detected with -Weverything [-Wunused-value], but the leak is only detected by the analyzer.
By default, full analysis results are written to a plist file. You can also run the analyzer with the commands:
clang++ --analyze -Xanalyzer -analyzer-output=text main.cpp
clang++ --analyze -Xanalyzer -analyzer-output=html -o html-dir main.cpp
to get detailed walk-throughs of detected issues on standard output, or as an HTML display of annotated source code, respectively, instead of in a plist.
Analyzer checks are listed here.
Note that to work best the analyzer needs to analyze whole programs, which means it needs to tie into the build system. The usual interface is via an IDE (Xcode) or the scan-build tool with make. CMake has some clang features, such as producing clang JSON compilation database files, but I'm not sure offhand whether CMake has any built-in support for the clang analyzer.
So if we look at the documentation in the Controlling Code Generation section, it says (emphasis mine):
Turn on runtime checks for various forms of undefined or suspicious behavior.
This option controls whether Clang adds runtime checks for various forms of undefined or suspicious behavior, and is disabled by default. If a check fails, a diagnostic message is produced at runtime explaining the problem.
So these are runtime checks, not compile-time checks. If you use foo in your code, you will see the following output:
runtime error: division by zero
See this example live using -fsanitize=undefined:
float foo(float f) { return (f / 0); }
int main()
{
int x = 1 << 100 ;
foo( 2.0f ) ;
}
It generates two runtime messages:
main.cpp:6:19: runtime error: shift exponent 100 is too large for 32-bit type 'int'
main.cpp:2:36: runtime error: division by zero
Update
With respect to static checkers, in my answer to A C++ implementation that detects undefined behavior? I mention several tools: STACK, kcc and of course Frama-C.
Apparently clang allows you to use --analyze to run its static checker, but it seems like it may be disabled eventually, and the correct way to run it would be through scan-build.
Also in my self-answered question Why do constant expressions have an exclusion for undefined behavior? I show how constexprs can be used to catch undefined behavior at compile time.
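For instance, a minimal sketch of that constexpr trick, reusing the oversized shift from above: forcing the call into a constant expression turns the undefined behavior into a compile-time error.
constexpr int shift_by(int x, int n) { return x << n; }

constexpr int ok = shift_by(1, 4);        // fine: a valid constant expression
// constexpr int bad = shift_by(1, 100);  // rejected at compile time: the oversized
                                          // shift is UB, so it is not a constant expression

int main() { return ok; }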

Can a libXXX.so be translated into a native code file? What does LLVM native code look like?

It confuses me: what does LLVM native code look like? For example, here are two kinds of main.s; which one is native code?
This one:
.file "main.bc"
.text
.globl main
.align 16, 0x90
.type main,@function
main: # @main
# BB#0:
subl $12, %esp
movl $.L.str, (%esp)
calll puts
calll foo
xorl %eax, %eax
addl $12, %esp
ret
.Ltmp0:
.size main, .Ltmp0-main
.type .L.str,@object # @.str
.section .rodata.str1.1,"aMS",@progbits,1
.L.str:
.asciz "This is a shared library test..."
.size .L.str, 33
.section ".note.GNU-stack","",#progbits
or:
@.str = private unnamed_addr constant [33 x i8] c"This is a shared library test...\00", align 1
define i32 @main() nounwind {
  %1 = alloca i32, align 4
  store i32 0, i32* %1
  %2 = call i32 @puts(i8* getelementptr inbounds ([33 x i8]* @.str, i32 0, i32 0))
  call void @foo()
  ret i32 0
}
declare i32 @puts(i8*)
declare void @foo()
The first one was generated by llvm-llc, and the second by -emit-llvm -S.
If I want to use LLVM to translate a static library or a shared library into native code, how can I do that with LLVM?
There is no such thing as "LLVM native code".
The first code section is in (what looks to be) x86 assembly. These files usually get the ".s" extension, and they can be converted into object files by an assembler. LLVM's "llc" tool generates them by default, but there's nothing LLVM-specific about these files - every x86 compiler can generate them. This is also sometimes called x86 native code.
The second code section is in "LLVM bitcode" or "LLVM Intermediate Representation" (IR). This is an LLVM-specific intermediate language, and it usually gets the ".ll" extension. However, running "clang -emit-llvm -S" will by default write it to a file with a ".s" extension, which might explain your confusion here.
You ask:
If I want to use LLVM to transfer a static library or a shared library into native code, how can I do with LLVM?
If you are talking about static libraries and shared libraries that are already built - e.g. ".so" or ".lib" files - then they are already in "native code", though you may want to use a disassembler if you want to get a human-friendly representation of them. If those libraries are not already built, you can use Clang to build them, just like any other compiler. If those libraries are provided in LLVM bitcode, you can use LLVM's "llc" to convert them to assembly files (though "llc" doesn't do optimizations - you need to manually run LLVM's "opt" too for that).