How am I supposed to use the sanitizer in clang? - c++

I'm sorry if this is an uber-easy concept, but I find it hard to acquire the right mindset to correctly use the sanitizer provided by clang.
float foo(float f) { return (f / 0); }
I compile this small snippet with
clang++ -fsanitize=float-divide-by-zero -std=c++11 -stdlib=libc++ -c source.cpp -o osan
and I also compile a "normal" version of my object without using the sanitizer
clang++ -std=c++11 -stdlib=libc++ -c source.cpp -o onorm
I was expecting some verbose output, or some error on the console, but when inspecting the files with nm I found only one difference
nm o* --demangle
onorm:
0000000000000000 T foo(float)
osan:
U __ubsan_handle_divrem_overflow
0000000000000000 T foo(float)
So in the sanitized version there is an undefined symbol whose name resembles the sanitizer I used when compiling; but everything is really "silent", with no output at all from the clang frontend.
How am I supposed to use the sanitizer, and what is the right workflow? What's the point of that undefined symbol?

The undefined symbol is a function that implements the sanitizer's check. If you look at generated code:
No sanitizer:
_Z3foof: # @_Z3foof
.cfi_startproc
# BB#0:
xorps %xmm1, %xmm1
divss %xmm1, %xmm0
ret
With sanitizer:
_Z3foof: # @_Z3foof
.cfi_startproc
.long 1413876459 # 0x54460aeb
.quad _ZTIFffE
# BB#0:
pushq %rax
.Ltmp1:
.cfi_def_cfa_offset 16
movss %xmm0, 4(%rsp) # 4-byte Spill
movd %xmm0, %esi
movl $__unnamed_1, %edi
xorl %edx, %edx
callq __ubsan_handle_divrem_overflow
xorps %xmm1, %xmm1
movss 4(%rsp), %xmm0 # 4-byte Reload
divss %xmm1, %xmm0
popq %rax
ret
As you can see, the compiler has added code that performs the check by calling that function.
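To make the instrumentation concrete, here is a rough C++ sketch of what the sanitized function does conceptually. The handler report_division_by_zero and the counter g_reports are hypothetical stand-ins for the runtime's __ubsan_handle_divrem_overflow; the real handler's signature and report format belong to the sanitizer runtime and are not reproduced here.

```cpp
#include <cstdio>

// Hypothetical stand-in for the sanitizer runtime's handler
// (__ubsan_handle_divrem_overflow in the listing above).
static int g_reports = 0;

void report_division_by_zero(const char* file, int line) {
    ++g_reports;
    std::fprintf(stderr, "%s:%d: runtime error: division by zero\n", file, line);
}

// Mirrors the instrumented code: report first, then perform the division anyway.
float checked_div(float numerator, float denominator) {
    if (denominator == 0.0f) {
        report_division_by_zero(__FILE__, __LINE__);
    }
    return numerator / denominator;
}
```

Because the real handler lives in the sanitizer runtime rather than in your object file, nm shows it as an undefined symbol until the program is linked.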
When you link through the compiler driver with the same -fsanitize option, it should automatically link in the appropriate sanitizer runtime library. For me, the following complete program:
float foo(float f) { return (f / 0); }
int main() {
foo(1.0f);
}
Produces the following output when executed:
main.cpp:1:32: runtime error: division by zero
I built and ran using the command clang++ -fsanitize=undefined main.cpp && ./a.out
If you want compile-time checks, either enable more compiler warnings or use the static analyzer. However, there doesn't seem to be any warning or static-analysis check for floating-point divide-by-zero errors.
Here's a program that produces an analyzer report:
#include <malloc.h>
int main() {
int *i = (int*) malloc(sizeof(int));
}
Compiled with clang++ -std=c++11 main.cpp it produces no diagnostics, but compiled with clang++ -std=c++11 --analyze main.cpp it reports the following:
main.cpp:4:10: warning: Value stored to 'i' during its initialization is never read
int *i = (int*) malloc(sizeof(int));
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~
main.cpp:5:1: warning: Potential leak of memory pointed to by 'i'
}
^
The dead store can also be detected with -Weverything [-Wunused-value], but the leak is only detected by the analyzer.
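For comparison, here is a variant that actually uses the allocation and releases it. One would expect neither of the two reports above for it, though that is an expectation rather than something checked against a particular analyzer version. The function name roundtrip is made up for illustration.

```cpp
#include <cstdlib>

// Hypothetical helper: stores a value in heap memory, reads it back, frees it.
int roundtrip(int value) {
    int* p = static_cast<int*>(std::malloc(sizeof(int)));
    if (p == nullptr) {
        return value;  // allocation failed; nothing to free
    }
    *p = value;        // the allocation is actually used...
    int result = *p;
    std::free(p);      // ...and released before the pointer goes out of scope
    return result;
}
```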
By default, full analysis results are written to a plist file. You can also run the analyzer with the commands:
clang++ --analyze -Xanalyzer -analyzer-output=text main.cpp
clang++ --analyze -Xanalyzer -analyzer-output=html -o html-dir main.cpp
to get detailed walk-throughs of detected issues on standard output or as an HTML display of annotated source code, respectively, instead of in a plist.
Analyzer checks are listed here.
Note that, to work best, the analyzer needs to analyze whole programs, which means it needs to tie into the build system. The usual interface is via an IDE (Xcode) or the scan-build tool with make. CMake has some clang features, such as producing clang JSON compilation database files, but I'm not sure offhand whether CMake has any built-in support for the clang analyzer.

So if we look at the documentation, the Controlling Code Generation section says (emphasis mine):
Turn on runtime checks for various forms of undefined or suspicious behavior.
This option controls whether Clang adds runtime checks for various forms of undefined or suspicious behavior, and is disabled by default. If a check fails, a diagnostic message is produced at runtime explaining the problem.
So these are runtime checks, not compile-time checks. If you used foo in your code, you would see the following output:
runtime error: division by zero
See this example live using -fsanitize=undefined:
float foo(float f) { return (f / 0); }
int main()
{
int x = 1 << 100 ;
foo( 2.0f ) ;
}
it generates two run-time messages:
main.cpp:6:19: runtime error: shift exponent 100 is too large for 32-bit type 'int'
main.cpp:2:36: runtime error: division by zero
Update
With respect to static checkers, in my answer to A C++ implementation that detects undefined behavior? I mention several tools: STACK, kcc and of course Frama-C.
Apparently clang allows you to use --analyze to run its static checker, but it seems that option may eventually be disabled, and the correct way to run it would be through scan-build.
Also in my self-answered question Why do constant expressions have an exclusion for undefined behavior? I show how constexprs can be used to catch undefined behavior at compile time.
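A minimal sketch of the constexpr idea, assuming a C++11 compiler; shift_left is a name chosen for illustration. An expression whose evaluation would have undefined behavior is not a constant expression, so forcing compile-time evaluation makes the compiler reject it.

```cpp
// C++11 constexpr function: a single return statement.
constexpr int shift_left(int value, int amount) {
    return value << amount;
}

// Well-defined: evaluated at compile time.
static_assert(shift_left(1, 4) == 16, "1 << 4 is 16");

// Uncommenting the next line should fail to compile, because the shift
// count exceeds the width of int and the expression is therefore not a
// constant expression:
// constexpr int bad = shift_left(1, 100);
```

The same function called with run-time arguments is an ordinary function, so this only catches cases where you can force evaluation in a constant-expression context.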

Related

Advice about modify compiler to add extra bytes for each function

Ftrace supports dynamic trace, that is, it can trace any global function in the kernel and modules. It uses the -pg compilation option of gcc to add a stub at the beginning of each function, so that when needed, the function can be controlled to jump to the specified code for execution. gcc 4.6 newly added -pg -mfentry support, so that an instruction to call fentry can be inserted at the very beginning of the function, like:
[root@localhost kernel-4.4.27]# echo 'void foo(){}' | gcc -x c -S -o - - -pg -mfentry
foo:
.LFB0:
.cfi_startproc
call __fentry__
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
What I need is something like the -pg option; however, I want to leave 5 extra bytes at both the beginning and the end of each function to modify for my own use. So I need some advice on how to modify clang or LLVM to achieve this (because gcc may be too hard for me to understand and modify).

How can I compile a hybrid (asm, C++) source code into a 32-bit program?

I am using Cygwin 32-bit under Win7 in a 64-bit machine.
The following program
makefile:
runme: main.cpp asm.o
g++ main.cpp asm.o -o executable
asm.o: asm.asm
nasm -f elf asm.asm -o asm.o
asm.asm:
section .data
section .bss
section .text
global GetValueFromASM
GetValueFromASM:
mov eax, 9
ret
main.cpp:
#include <iostream>
using namespace std;
extern "C" int GetValueFromASM();
int main()
{
cout<<"GetValueFromASM() returned = "<<GetValueFromASM()<<endl;
return 0;
}
is giving me the following error:
$ make
nasm -f elf asm.asm -o asm.o
g++ main.cpp asm.o -o executable
/tmp/cc3F1pPh.o:main.cpp:(.text+0x26): undefined reference to `GetValueFromASM'
collect2: error: ld returned 1 exit status
make: *** [makefile:2: runme] Error 1
I don't understand why this error is being generated.
How can I get rid of this issue?
You have to prefix your symbols with _, as is customary in Windows/Cygwin:
section .data
section .bss
section .text
global _GetValueFromASM
_GetValueFromASM:
mov eax, 9
ret
The rest of your code should work fine.
An alternative would be to compile with -fno-leading-underscore. However, this may break linking with other (Cygwin system) libraries. I suggest using the first option if portability to other platforms does not matter to you.
Quoting from the GNU Online Docs:
-fleading-underscore
This option and its counterpart, -fno-leading-underscore, forcibly change the way C symbols are represented in the object file. One use is to help link with legacy assembly code.
Warning: the -fleading-underscore switch causes GCC to generate code that is not binary compatible with code generated without that switch. Use it to conform to a non-default application binary interface. Not all targets provide complete support for this switch.
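If you want to see which convention your toolchain uses, GCC and Clang predefine the macro __USER_LABEL_PREFIX__, which expands to the prefix placed in front of C symbol names (an underscore on targets that use one, nothing otherwise). A small sketch; the helper name object_symbol_for is hypothetical, and the fallback branch for other compilers is an assumption:

```cpp
#include <string>

#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// Returns the object-file name the current target gives to a C symbol.
std::string object_symbol_for(const std::string& c_name) {
#ifdef __USER_LABEL_PREFIX__
    return std::string(STRINGIFY(__USER_LABEL_PREFIX__)) + c_name;
#else
    return c_name;  // assumption: no prefix when the macro is unavailable
#endif
}
```

On the 32-bit Cygwin setup from the question, this would be expected to give _GetValueFromASM, which is the name the NASM file has to export.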

LLVM IR optimization

I am trying to follow this link in order to generate an IR representation of some C code. The C code that I am using is as follows:
void main() {
int c1 = 17;
int c2 = 25;
int c3 = c1 + c2;
printf("Value = %d\n", c3);
}
I save it as const.c. Once it is saved, I use the following command to generate a .bc file.
clang -c -emit-llvm const.c -o const.bc
Once the .bc file is generated, I want to use the following command in order to generate the optimized version of the const.bc file which is named const.reg.bc.
opt -mem2reg const.bc > const.reg.bc
I don't have any issues generating these files, but for some reason both of them are exactly the same and no optimization happens. The results should be different; const.reg.bc should be an optimized version of const.bc, but that does not happen. Can someone tell me what I am doing wrong?
You can pass -Xclang -disable-O0-optnone to clang to prevent generation of the optnone attribute.
When you run clang somefile.c, it defaults to the -O0 optimization level, which emits the main function with the optnone attribute. This attribute prevents optimizations, which is why you don't see the result of mem2reg.
You have to remove the optnone attribute if you want opt to do its work:
clang -S -emit-llvm const.c -o - | sed s/optnone// | opt -S -mem2reg
Note that mem2reg and its counterpart reg2mem are not strictly optimization passes. They just convert the IR to and from SSA form.
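If you want a smaller input to experiment with, a printf-free variant of the program such as the sketch below works; add_constants is a made-up name. At -O0 clang gives each local its own alloca with loads and stores around it, and those are what mem2reg promotes to SSA values.

```cpp
// Each local below becomes an alloca plus loads/stores at -O0;
// after mem2reg they become plain SSA values.
int add_constants() {
    int c1 = 17;
    int c2 = 25;
    int c3 = c1 + c2;
    return c3;
}
```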

Why is the file size of the executable file larger than the .cpp source file with no includes?

Why is the file size of the executable file larger than the source? I made the example below (the simplest one I could think of) and the executable is still huge compared to the source, even though (I think) it's not using any libraries.
Simplest.cpp: 33 bytes
Simplest.s: 386 bytes
Simplest.exe: 60076 bytes
Simplest.cpp:
int main(void)
{
return 0;
}
Simplest.s:
.file "Simplest.cpp"
.def ___main; .scl 2; .type 32; .endef
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
call ___main
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE0:
.ident "GCC: (GNU) 4.8.3"
Not sure if it's relevant, but I'm using g++ compiler on cygwin on Windows 8 with an Intel processor.
The executable is linked with lots of libraries, so once your compiler is done linking, the file size has increased. By default, libc or libc++ is linked in when you're building a C or C++ program.
You could read this article about gcc's linking process.
ld's manpage says
ld combines a number of object and archive files, relocates their data and ties up symbol references. Usually the last step in compiling a program is to run ld.
All in all, linkers may put lots of stuff into the executable. No wonder its size can be greater than the source file's size.
Note: the links above are about linking on Unix, not Windows, but Cygwin tries to somehow simulate the behavior of Linux/Unix systems, so they're still relevant.
ForceBru has explained what's going on at a high level, but it seems that you already understand that linking libraries could increase executable size, but (mistakenly) believe that your program uses no libraries.
Actually, because you linked your program by running gcc, gcc passed some extra options when it invoked ld. To control this, read about gcc Link Options.
Of particular interest are the -nostdlib and -nodefaultlibs options, described as follows:
-nodefaultlibs
Do not use the standard system libraries when linking. Only the libraries you specify are passed to the linker, and options specifying linkage of the system libraries, such as -static-libgcc or -shared-libgcc, are ignored. The standard startup files are used normally, unless -nostartfiles is used.
The compiler may generate calls to memcmp, memset, memcpy and memmove. These entries are usually resolved by entries in libc. These entry points should be supplied through some other mechanism when this option is specified.
-nostdlib
Do not use the standard system startup files or libraries when linking. No startup files and only the libraries you specify are passed to the linker, and options specifying linkage of the system libraries, such as -static-libgcc or -shared-libgcc, are ignored.
The compiler may generate calls to memcmp, memset, memcpy and memmove. These entries are usually resolved by entries in libc. These entry points should be supplied through some other mechanism when this option is specified.
One of the standard libraries bypassed by -nostdlib and -nodefaultlibs is libgcc.a, a library of internal subroutines which GCC uses to overcome shortcomings of particular machines, or special needs for some languages. (See Interfacing to GCC Output, for more discussion of libgcc.a.) In most cases, you need libgcc.a even when you want to avoid other standard libraries. In other words, when you specify -nostdlib or -nodefaultlibs you should usually specify -lgcc as well. This ensures that you have no unresolved references to internal GCC library subroutines. (An example of such an internal subroutine is __main, used to ensure C++ constructors are called; see collect2.)
Because you haven't used these options, your code is in fact being linked with multiple libraries.
To understand some of the behavior provided by those libraries, without which even your tiny program will fail, you might read the blog series Hello from a libc-free world! (Part 2)

Failed to link obj files generated by LLVM compiler using MS Linker

I used the LLVM online compiler to compile my sample C code,
int main() { return 0; }
the generated LLVM assembly,
; ModuleID = '/tmp/webcompile/_31588_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
define i32 @main() nounwind uwtable {
%1 = alloca i32, align 4
store i32 0, i32* %1
ret i32 0
}
then I compiled LLVM assembly to obj file,
llc -filetype=obj a.ll
when I tried to link the obj file using link.exe a.o I got the error
fatal error LNK1107: invalid or corrupt file: cannot read at 0x438
How can I generate the right obj file to feed into link.exe?
More information
I built LLVM using Visual Studio 11. I didn't have cygwin installed.
link.exe is also from Visual Studio 11
LLVM was built from latest source code.
If I compile the same code using VC++ compiler into assembly, it looks like this,
; Listing generated by Microsoft (R) Optimizing Compiler Version 17.00.50402.0
include listing.inc
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC main
; Function compile flags: /Odtp
_TEXT SEGMENT
main PROC
; File c:\tmp\u.c
; Line 1
xor eax, eax
ret 0
main ENDP
_TEXT ENDS
END
llc -filetype=asm j.ll generates the following code. It fails with ml.exe as well.
.def _main;
.scl 2;
.type 32;
.endef
.text
.globl _main
.align 16, 0x90
_main: # @main
# BB#0:
pushl %eax
movl $0, (%esp)
xorl %eax, %eax
popl %edx
ret
You are in a strange land; reading http://llvm.org/docs/GettingStartedVS.html may help.
Your IR is for x86-64 Linux, as stated in the target triple. So llc will (by default) generate an ELF object file for you, not COFF, and link.exe will certainly not accept it.
Note that you cannot just change the target triple to some Windows target and assume everything will work:
COFF object code emission is WIP
C/C++ are not target-independent languages, so, you cannot obtain target-independent IR here: http://llvm.org/docs/FAQ.html#platformindependent
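To illustrate the last point: the preprocessor and sizeof are resolved by the front end, before any IR exists, so the IR already has target choices baked in. A small sketch; describe_target is a hypothetical name and the macro list is deliberately incomplete:

```cpp
#include <climits>
#include <string>

// Which branch is compiled, and what sizeof(long) is, depend on the target
// the front end was invoked for.
std::string describe_target() {
#if defined(_WIN32)
    const std::string os = "windows";
#elif defined(__linux__)
    const std::string os = "linux";
#else
    const std::string os = "other";
#endif
    return os + ", long is " + std::to_string(sizeof(long) * CHAR_BIT) + " bits";
}
```

On x86-64 Linux long is 64 bits, while 64-bit Windows keeps it at 32 bits, so the same source yields different IR for the two triples.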