I've recently compiled clang on Windows (host: x86_64-pc-windows64; compiler: i686-pc-mingw32; target: i686-pc-mingw32).
The CMakeCache (for the config) can be found: here
My issue is that while clang works fine (for C), clang++ (for C++) will "successfully" compile and link, but the resulting program won't run and exits with error code 1. Here's a sample session (oh-my-zsh):
➜ bin cat test.c
#include <stdio.h>
int main()
{
    printf("Hello World!\n");
    return 0;
}
➜ bin cat test.cpp
#include <iostream>
int main()
{
    std::cout << "Hello World!" << std::endl;
    return 0;
}
➜ bin ./clang++ test.cpp -o a.exe
➜ bin ./clang test.c -o b.exe
➜ bin ./a.exe
➜ bin ./b.exe
Hello World!
➜ bin
As is visible here, b.exe (C) works fine, but a.exe (C++), though it compiles and links, gives no output.
Could anyone hint at why this is so, and how I can fix it?
Note: the pre-compiled snapshot of clang for Windows (also 32-bit) works fine with my current path configuration.
Note: a.exe (C++, failed) returns a non-zero exit code.
DATA:
CLANG VERSIONS:
Snap: clang version 3.5 (208017) ; Comp: clang version 3.4 (tags/RELEASE_34/final)
LLVM FILES: snapshot ; compiled ; diff
PREPROCESSING FILES: snapshot ; compiled ; diff
ASM FILES: snapshot ; compiled ; diff
VERBOSE OUTPUT: snapshot ; compiled
Your new clang uses a different (incorrect) calling convention, not x86_thiscallcc.
snap.s from good clang:
movl $__ZStL8__ioinit, %ecx
calll __ZNSt8ios_base4InitC1Ev
movl %esp, %ecx
movl $__ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, (%ecx)
movl %eax, %ecx
calll __ZNSolsEPFRSoS_E
Same code from your custom clang, comp.s:
leal __ZStL8__ioinit, %eax
movl %eax, (%esp)
calll __ZNSt8ios_base4InitC1Ev
movl %eax, (%esp)
movl %ecx, 4(%esp)
calll __ZNSolsEPFRSoS_E
and several others.
In the LLVM bitcode (*.ll files), the right calling convention is marked with x86_thiscallcc in function declarations and on call instructions:
< call void @_ZNSt8ios_base4InitC1Ev(%"class.std::ios_base::Init"* @_ZStL8__ioinit)
> call x86_thiscallcc void @_ZNSt8ios_base4InitC1Ev(%"class.std::ios_base::Init"* @_ZStL8__ioinit)
< declare void @_ZNSt8ios_base4InitC1Ev(%"class.std::ios_base::Init"*) #0
> declare x86_thiscallcc void @_ZNSt8ios_base4InitC1Ev(%"class.std::ios_base::Init"*) #0
32c33
< declare void @_ZNSt8ios_base4InitD1Ev(%"class.std::ios_base::Init"*) #0
> declare x86_thiscallcc void @_ZNSt8ios_base4InitD1Ev(%"class.std::ios_base::Init"*) #0
< call void @_ZNSt8ios_base4InitD1Ev(%"class.std::ios_base::Init"* @_ZStL8__ioinit)
> call x86_thiscallcc void @_ZNSt8ios_base4InitD1Ev(%"class.std::ios_base::Init"* @_ZStL8__ioinit)
< %3 = call %"class.std::basic_ostream"* @_ZNSolsEPFRSoS_E(%"class.std::basic_ostream"* %2, %"class.std::basic_ostream"* (%"class.std::basic_ostream"*)* @_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_)
> %call1 = call x86_thiscallcc %"class.std::basic_ostream"* @_ZNSolsEPFRSoS_E(%"class.std::basic_ostream"* %call, %"class.std::basic_ostream"* (%"class.std::basic_ostream"*)* @_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_)
< declare %"class.std::basic_ostream"* @_ZNSolsEPFRSoS_E(%"class.std::basic_ostream"*, %"class.std::basic_ostream"* (%"class.std::basic_ostream"*)*) #0
> declare x86_thiscallcc %"class.std::basic_ostream"* @_ZNSolsEPFRSoS_E(%"class.std::basic_ostream"*, %"class.std::basic_ostream"* (%"class.std::basic_ostream"*)*) #0
In the preprocessed files I see the difference. In snap.E many functions are declared with __attribute__((__cdecl__)), while in comp.E they are declared with just __cdecl__. You should check why the declarations differ after preprocessing. Your new clang may predefine a different set of macros (gcc has the -dM -E options to dump predefined macros; clang accepts the same flags). Or your clang may simply be using different headers (or different versions of the headers; you can list the headers actually included with clang's -H option).
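For example, a quick way to compare the two predefined macro sets (a sketch; snap-clang and comp-clang stand in for your two binaries, and a POSIX-style shell such as your zsh is assumed):
snap-clang -x c++ -dM -E /dev/null | sort > snap.macros
comp-clang -x c++ -dM -E /dev/null | sort > comp.macros
diff snap.macros comp.macros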
Another approach is to check whether __attribute__((__cdecl__)) is supposed to be equivalent to __cdecl__, and whether the newer version of clang changed anything in how it handles them.
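You can also test the calling-convention handling in isolation. A minimal sketch (conv.cpp is a hypothetical file name; compile it with both compilers and look for x86_thiscallcc on the member function in the IR):
struct S { void m(); };
void S::m() {}
// compile: clang -target i686-pc-mingw32 -emit-llvm -S conv.cpp -o -
// the good clang prints something like:
//   define x86_thiscallcc void @_ZN1S1mEv(%struct.S* %this)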
Related
When I use AddressSanitizer (clang 3.4) to detect memory leaks, I found that using any -O option (except -O0) always leads to a no-leak-detected result.
The code is simple:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    int* array = (int *)malloc(sizeof(int) * 100);
    for (int i = 0; i < 100; i++) // Initialize
        array[i] = 0;
    return 0;
}
When compiled with -O0,
clang -fsanitize=address -g -O0 main.cpp
it detects the leak correctly:
==2978==WARNING: Trying to symbolize code, but external symbolizer is not initialized!
=================================================================
==2978==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 400 byte(s) in 1 object(s) allocated from:
#0 0x4652f9 (/home/mrkikokiko/sdk/MemoryCheck/a.out+0x4652f9)
#1 0x47b612 (/home/mrkikokiko/sdk/MemoryCheck/a.out+0x47b612)
#2 0x7fce3603af44 (/lib/x86_64-linux-gnu/libc.so.6+0x21f44)
SUMMARY: AddressSanitizer: 400 byte(s) leaked in 1 allocation(s).
However, when -O is added,
clang -fsanitize=address -g -O main.cpp
nothing is detected! And I can find nothing about this in the official documentation.
This is because your code is completely optimized away. The resulting assembly is something like:
main: # @main
xorl %eax, %eax
retq
Without any call to malloc, there is no memory allocation... and therefore no memory leak.
In order to have AddressSanitizer detect the memory leak, you can either:
Compile with optimizations disabled, as Simon Kraemer mentioned in the comments.
Mark array as volatile, preventing the optimization; the resulting assembly keeps the call to malloc (a source sketch follows it):
main: # @main
pushq %rax
movl $400, %edi # imm = 0x190
callq malloc # <<<<<< call to malloc
movl $9, %ecx
.LBB0_1: # =>This Inner Loop Header: Depth=1
movl $0, -36(%rax,%rcx,4)
movl $0, -32(%rax,%rcx,4)
movl $0, -28(%rax,%rcx,4)
movl $0, -24(%rax,%rcx,4)
movl $0, -20(%rax,%rcx,4)
movl $0, -16(%rax,%rcx,4)
movl $0, -12(%rax,%rcx,4)
movl $0, -8(%rax,%rcx,4)
movl $0, -4(%rax,%rcx,4)
movl $0, (%rax,%rcx,4)
addq $10, %rcx
cmpq $109, %rcx
jne .LBB0_1
xorl %eax, %eax
popq %rcx
retq
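For reference, here is a minimal sketch of the volatile variant (my reading of the suggestion above: the qualifier goes on the pointer variable itself, so the store of malloc's result becomes observable and the call cannot be removed):
#include <stdlib.h>
int main(void)
{
    /* 'int *volatile': the pointer object is volatile, so the
       initialization must happen and malloc cannot be dropped */
    int *volatile array = (int *)malloc(sizeof(int) * 100);
    for (int i = 0; i < 100; i++)
        array[i] = 0;
    return 0;
}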
Look into the generated code.
Both GCC & Clang actually know about the semantics of malloc, because on my Linux/Debian system <stdlib.h> contains
extern void *malloc (size_t __size) __THROW __attribute_malloc__ __wur;
and __attribute_malloc__ & __wur (and __THROW) are macros defined elsewhere. Read about Common Function Attributes in the GCC documentation; the Clang documentation says:
Clang aims to support a broad range of GCC extensions.
I strongly suspect that with -O the call to malloc is optimized away entirely.
On my Linux/x86-64 machine using clang -O -S psbshdk.c (with clang 3.8) I am indeed getting:
.globl main
.align 16, 0x90
.type main,@function
main: # @main
.cfi_startproc
# BB#0:
xorl %eax, %eax
retq
.Lfunc_end0:
.size main, .Lfunc_end0-main
.cfi_endproc
The address sanitizer is working on the emitted binary (which won't contain any malloc call).
BTW, you could compile with clang -O -g and then use valgrind, or compile with clang -O -fsanitize=address -g. Both clang & gcc are able to optimize and still emit some debug information (which might be "approximate" when optimizing a lot).
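Another hedged option, if you want to keep -O but still see the leak (assuming your clang supports the -fno-builtin-FUNCTION family of flags, as clang and gcc both do): stop the compiler from treating malloc as a builtin, so the call cannot be optimized out:
clang -O -fsanitize=address -fno-builtin-malloc -g main.cpp
For the valgrind route, a typical invocation would be:
clang -O -g main.cpp && valgrind --leak-check=full ./a.out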
I've added an intrinsic to some input code using an LLVM pass. I'm able to see the intrinsic call, yet I can't figure out how to compile the code to my target architecture (x86_64). I'm running the following command:
clang++ $(llvm-config --ldflags --libs all) ff.s -o foo
But the linker complains about undefined references:
/tmp/ff-2ada42.o: In function `fact(unsigned int)':
/home/rubens/Desktop/ff.cpp:9: undefined reference to `llvm.x86.sse3.mwait.i32.i32'
/tmp/ff-2ada42.o: In function `fib(unsigned int)':
/home/rubens/Desktop/ff.cpp:16: undefined reference to `llvm.x86.sse3.mwait.i32.i32'
/home/rubens/Desktop/ff.cpp:16: undefined reference to `llvm.x86.sse3.mwait.i32.i32'
/home/rubens/Desktop/ff.cpp:16: undefined reference to `llvm.x86.sse3.mwait.i32.i32'
Despite using the ldflags from llvm-config, linking fails. Any ideas on what should be done for the code to compile properly?
To generate the assembly code, I've done the following:
# Generating optimized code
clang++ $(llvm-config --cxxflags) -emit-llvm -c ff.cpp -o ff.bc
opt ff.bc -load path/to/mypass.so -mypass > opt_ff.bc
# Generating assembly
llc opt_ff.bc -o ff.s
I'm currently using llvm version 3.4.2; clang version 3.4.2 (tags/RELEASE_34/dot2-final); gcc version 4.9.2 (GCC); and Linux 3.17.2-1-ARCH x86_64.
Edit: adding the IR with the intrinsic:
File ~/llvm/include/llvm/IR/IntrinsicsX86.td:
...
// Thread synchronization ops.
let TargetPrefix = "x86" in {  // All intrinsics start with "llvm.x86.".
  def int_x86_sse3_monitor : GCCBuiltin<"__builtin_ia32_monitor">,
      Intrinsic<[], [llvm_ptr_ty,
                     llvm_i32_ty, llvm_i32_ty], []>;
  def int_x86_sse3_mwait : GCCBuiltin<"__builtin_ia32_mwait">,
      Intrinsic<[], [llvm_i32_ty,
                     llvm_i32_ty], []>;
}
...
And calls (from file ff.s):
...
.Ltmp2:
callq llvm.x86.sse3.mwait.i32.i32
movl $_ZStL8__ioinit, %edi
callq _ZNSt8ios_base4InitC1Ev
movl $_ZNSt8ios_base4InitD1Ev, %edi
movl $_ZStL8__ioinit, %esi
movl $__dso_handle, %edx
callq __cxa_atexit
popq %rax
ret
...
Edit 2: Here's how I'm adding the intrinsic during the opt pass:
Function *f(bb->getParent());
Module *m(f->getParent());
std::vector<Type *> types(2, Type::getInt32Ty(getGlobalContext()));
Function *mwait = Intrinsic::getDeclaration(m, Intrinsic::x86_sse3_mwait, types);
std::vector<Value *> args;
IRBuilder<> builder(&bb->front());
for (uint32_t i : {1, 2}) args.push_back(builder.getInt32(i));
ArrayRef<Value *> args_ref(args);
builder.CreateCall(mwait, args_ref);
EDIT:
I am currently writing an LLVM pass that basically does what you tried to do in this question. The problem with your code is the following:
std::vector<Type *> types(2, Type::getInt32Ty(getGlobalContext()));
Function *mwait = Intrinsic::getDeclaration(m, Intrinsic::x86_sse3_mwait, types);
You are trying to get the declaration of an intrinsic function with the name llvm.x86.sse3.mwait.i32.i32, and this intrinsic does not exist. However, llvm.x86.sse3.mwait exists, and therefore you have to write this:
Function *mwait = Intrinsic::getDeclaration(m, Intrinsic::x86_sse3_mwait);
Notice the missing type argument in the call. This is because llvm.x86.sse3.mwait has no overloads; the type list passed to Intrinsic::getDeclaration is only for resolving overloaded intrinsics, and supplying it here is what produced the mangled llvm.x86.sse3.mwait.i32.i32 name.
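Put together, a corrected version of the fragment from your Edit 2 would look roughly like this (a sketch against the LLVM 3.4 API; bb is the BasicBlock* from your original code):
// needs llvm/IR/Intrinsics.h and llvm/IR/IRBuilder.h
Function *f = bb->getParent();
Module *m = f->getParent();
// no type list: llvm.x86.sse3.mwait is not overloaded
Function *mwait = Intrinsic::getDeclaration(m, Intrinsic::x86_sse3_mwait);
IRBuilder<> builder(&bb->front());
std::vector<Value *> args;
args.push_back(builder.getInt32(1));
args.push_back(builder.getInt32(2));
builder.CreateCall(mwait, args);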
I hope you figured it out in the meantime.
OK, since I won't be able to answer you for a while, here is a wild-guess answer.
The problem is the way you add the intrinsic in your optimizer pass. It looks like you are just creating a function with the same name as the intrinsic, not the intrinsic itself.
Here is a little C++ program that just uses the clang builtin to get the intrinsic into the IR (I use clang 3.5, but this should not have any impact):
int main()
{
    __builtin_ia32_mwait(4, 2);
}
Compiling it with clang -emit-llvm -S I get:
; ModuleID = 'intrin.cpp'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
  call void @llvm.x86.sse3.mwait(i32 4, i32 2)
  ret i32 0
}
; Function Attrs: nounwind
declare void @llvm.x86.sse3.mwait(i32, i32) #1
attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5.0 "}
Please note that the SSE3 intrinsic has no type overloads, unlike in your version.
Running llc on the generated file gives me:
.Ltmp2:
.cfi_def_cfa_register %rbp
movl $4, %ecx
movl $2, %eax
mwait
xorl %eax, %eax
popq %rbp
retq
Proper assembly was created.
So I assume the way you are introducing the intrinsic into the function is wrong in your opt pass.
Get the intrinsic function and call it:
vector<Type*> types;
types.push_back(IntegerType::get(/*LLVM context*/, 32));
types.push_back(IntegerType::get(/*LLVM context*/, 32));
Function* func = Intrinsic::getDeclaration(/* module */, Intrinsic::x86_sse3_mwait, types);
CallInst* call = CallInst::Create(func, /* arguments */);
Environment Details:
Machine: Core i5 M540 processor running 64-bit CentOS in a virtual machine in VMware Player.
GCC: 4.8.2 built from source tar.
Issue:
I am trying to learn more about SIMD functions in C/C++, and to that end I created the following hello-world program.
#include <iostream>
#include <pmmintrin.h>
int main(void){
    __m128i a, b, c;
    a = _mm_set_epi32(1, 1, 1, 1);
    b = _mm_set_epi32(2, 3, 4, 5);
    c = _mm_add_epi32(a, b);
    std::cout << "Value of first int: " << c[0];
}
When I look at the assembly output produced by the following command, I do not see any SIMD instructions.
g++ -S -I/usr/local/include/c++/4.8.2 -msse3 -O3 hello.cpp
Sample of the assembly generated:
movl $.LC2, %esi
movl $_ZSt4cout, %edi
call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
movabsq $21474836486, %rsi
movq %rax, %rdi
call _ZNSo9_M_insertIxEERSoT_
xorl %eax, %eax
Please advise on the correct way of writing or compiling SIMD code.
Thank you!
It looks like your compiler is optimizing away the calls to _mm_foo_epi32, since all the values are known at compile time. Try taking all the relevant inputs from the user and see what happens; see the sketch below.
Alternately, compile with -O0 instead of -O3 and see what happens.
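Something along these lines (a sketch, not from the original post: reading x at run time prevents constant folding, and the result is written to memory instead of using the c[0] vector-subscript extension):
#include <iostream>
#include <pmmintrin.h>
int main() {
    int x;
    std::cin >> x;  // run-time input: the sum can no longer be precomputed
    __m128i a = _mm_set1_epi32(x);
    __m128i b = _mm_set_epi32(2, 3, 4, 5);
    __m128i c = _mm_add_epi32(a, b);
    int out[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), c);
    std::cout << "Value of first int: " << out[0] << std::endl;
}
With an unknown input, g++ -msse3 -O3 should have to emit the actual paddd instruction.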
Have a look at this piece of code:
#include <iostream>
#include <string>
void foo(int (*f)()) {
    std::cout << f() << std::endl;
}
void foo(std::string (*f)()) {
    std::string s = f();
    std::cout << s << std::endl;
}
int main() {
    auto bar = [] () -> std::string {
        return std::string("bla");
    };
    foo(bar);
    return 0;
}
Compiling it with
g++ -o test test.cpp -std=c++11
leads to:
bla
as it should. Compiling it with
clang++ -o test test.cpp -std=c++11 -stdlib=libc++
leads to:
zsh: illegal hardware instruction ./test
And compiling it with
clang++ -o test test.cpp -std=c++11 -stdlib=libstdc++
also leads to:
zsh: illegal hardware instruction ./test
Clang/GCC Versions:
clang version 3.2 (tags/RELEASE_32/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
gcc version 4.7.2 (Gentoo 4.7.2-r1 p1.5, pie-0.5.5)
Anyone any suggestions what is going wrong?
Thanks in advance!
Yes, it is a bug in Clang++. I can reproduce it with Clang 3.2 on i386-pc-linux-gnu.
And now some random analysis...
I've found that the bug is in the conversion from lambda to pointer-to-function: the compiler creates a kind of thunk with the appropriate signature that calls the lambda, but it has the instruction ud2 where a ret should be.
The instruction ud2, as you probably know, explicitly raises the "Invalid Opcode" exception: it is an instruction intentionally left undefined.
Take a look at the disassembly; this is the thunk function:
main::$_0::__invoke():
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl 8(%ebp), %eax
movl %eax, (%esp)
movl %ecx, 4(%esp)
calll main::$_0::operator()() const ; this calls to the real lambda
subl $4, %esp
ud2 ; <<<-- What the...!!!
So a minimal example of the bug is simply:
#include <string>
int main() {
    std::string (*f)() = [] () -> std::string {
        return "bla";
    };
    f();
    return 0;
}
Curiously enough, the bug doesn't happen if the return type is a simple type, such as int. Then the generated thunk is:
main::$_0::__invoke():
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl %eax, (%esp)
calll main::$_0::operator()() const
addl $8, %esp
popl %ebp
ret
I suspect that the problem is in the forwarding of the return value. If it fits in a register, such as eax, all goes well. But if it is a big struct, such as std::string, it is returned on the stack, and the compiler gets confused and emits the ud2 in desperation.
This is most likely a bug in clang 3.2. I can't reproduce the crash with clang trunk.
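If you are stuck on clang 3.2, a hedged workaround (my suggestion, not from the answers above) is to avoid the lambda-to-function-pointer thunk entirely, e.g. by taking a std::function instead:
#include <functional>
#include <iostream>
#include <string>
// overload taking std::function: no conversion thunk is generated
void foo(std::function<std::string()> f) {
    std::cout << f() << std::endl;
}
int main() {
    foo([]() -> std::string { return std::string("bla"); });
    return 0;
}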
I've seen a few tools like Pin and DynInst that do dynamic code manipulation in order to instrument code without having to recompile. They seem like heavyweight solutions to what should be a straightforward problem: retrieving accurate function-call data from a program.
I want to write something such that in my code, I can write
void SomeFunction() {
    StartProfiler();
    ...
    StopProfiler();
}
and post-execution, retrieve data about what functions were called between StartProfiler() and StopProfiler() (the whole call tree) and how long each of them took.
Preferably I could read out debug symbols too, to get function names instead of addresses.
Here's one interesting hint at a solution I discovered.
gcc (and llvm >= 3.0) has a -pg option when compiling, which is traditionally for gprof support. When you compile your code with this flag, the compiler adds a call to the function mcount at the beginning of every function definition. You can override this function, but you'll need to do it in assembly; otherwise the mcount you define will itself be instrumented with a call to mcount, and you'll quickly run out of stack space before main even gets called.
Here's a little proof of concept:
foo.c:
#include <stdio.h>
int total_calls = 0;
void foo(int c) {
    if (c > 0)
        foo(c-1);
}
int main() {
    foo(4);
    printf("%d\n", total_calls);
}
foo.s:
.globl mcount
mcount:
movl _total_calls(%rip), %eax
addl $1, %eax
movl %eax, _total_calls(%rip)
ret
compile with clang -pg foo.s foo.c -o foo. Result:
$ ./foo
6
That's 1 for main and 5 for foo: foo(4) recurses down through foo(0), and library calls such as printf are not instrumented, since libc was not compiled with -pg.
Here's the asm that clang emits for foo:
_foo:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl %edi, -8(%rbp) ## 4-byte Spill
callq mcount
movl -8(%rbp), %edi ## 4-byte Reload
...
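A related alternative worth mentioning (my addition, not from the answer above): gcc and clang also support -finstrument-functions, which calls user-defined hooks with both the function address and the call site, so you can rebuild the whole call tree without writing assembly. A minimal sketch (hooks.c is a hypothetical file; compile with something like clang -finstrument-functions foo.c hooks.c -o foo):
#include <stdio.h>
/* no_instrument_function keeps the hooks themselves from being instrumented */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *fn, void *call_site) {
    fprintf(stderr, "enter %p from %p\n", fn, call_site);
}
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *fn, void *call_site) {
    fprintf(stderr, "exit  %p\n", fn);
}
Addresses can then be mapped back to function names post-execution with addr2line, or at run time with dladdr.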