LLVM - What optimizations frontend has done - llvm

I know that frontend (such as llvm-clang or llvm-gcc ) has also done some optimizations from native code to IR level.
But what's optimizations that frontend has done ? Is there a list or a document I can check.
Thanks.

You can print all the passes which the code goes through by using:
clang -O2 -Rpass=.* code.cc -o code
This will also print the information from each of the optimization passes that were used to process the code when O2 level is used with clang, for example.
See this link for more details: http://clang.llvm.org/docs/UsersManual.html#options-to-emit-optimization-reports

Related

llc: unsupported relocation on symbol

Problem
llc is giving me the following error:
LLVM ERROR: unsupported relocation on symbol
Detailed compilation flow
I am implementing an LLVM frontend for a middle-level IR (MIR) of a compiler, and after I convert various methods to many bitcode files, I link them (llvm-link), optimize them (opt), convert them to machine code (llc), make them a shared library (clang for it's linker wrapper), and dynamically load them.
llc step fails for some of the methods that I am compiling!
Step 1: llvm-link: Merge many bitcode files
I may have many functions calling each other, so I llvm-link the different bitcode files that might interact with each other. This step has no issues. Example:
llvm-link function1.bc function2.bc -o lnk.bc
Step 2: opt: Run optimization passes
For now I am using the following:
opt -O3 lnk.bc -o opt.bc
This step proceeds with no issues, but that's the one that CAUSES the problem!
Also, it's necessary because in the future I will need this step to pass extra passes, e.g. loop-unroll
Step 3: llc: Generate machine code (PIC)
I am using the following command:
llc -march=thumb -arm-reserve-r9 -mcpu=cortex-a9 -filetype=obj -relocation-model pic opt.bc -o obj.o
I have kept the arch specific flags I've set just in case they contribute to the issue. I am using Position Independent Code because on next step I will be building a shared object.
This command fails with the error I've written on top of this answer.
Step 4: clang: Generate Shared Object
For the cases that Step 3 fails, this step isn't reached.
If llc succeeds, this step will succeed too!
Additional information
Configuration
The following run on an llvm3.6, which runs on an arm device.
Things I've noticed
If I omit -O3 (or any other level) with the opt step, then llc would work.
If I don't, and instead I omit them from llc, llc would still fail. Which makes me think that opt -O<level> is causing the issue.
If I use llc directly it will work, but I won't be able to run specific passes that opt allows me, so this is not a option for me.
I've faced this issue ONLY with 2 functions that I've compiled so far (from their original MIR), which use loops. The others produce working code!
If I don't use pic model at llc, it can generate obj.o, but then I'll have problems creating an .so from it!
Questions
Why is this happening??!!
Why opt has -relocation-model option? Isn't that supposed to be just an llc thing? I've tried setting this both at opt and llc to pic, but still fails.
I am using clang because it has a wrapper to a linker to get the the .so. Is there a way to do this step with an LLVM tool instead?
First of all, do not use neither llc nor opt. These are developer-side tools that should be never used in any production environment. Instead of this, implement your own proper optimization and codegeneration runtime via LLVM libraries.
As for this particular bug - the thumb codegenerator might contain some bugs. Please reduce the problem and report it. Or don't use Thumb mode at all :)

TI DM3730 (Design reference: beagleboard) computes wrong floating point operation results

The Situation
We have a board with a TI DM3730 processor (also known from the Beagleboard) with a Cortex A8 core (r3p2) in use with the following parameters:
Beagleboard Reference Design: Beagleboard-xM Rev-C
Kernel version: 3.2.8
Open CV library: 2.4.6
U-Boot: uboot-2013.04
Toolchain: Sourcery CodeBench ARM 2011.03
Buildroot: 2012.02
The setup is derived from this blog
Now we have written a program (written in C++ and compiled with GCC Version 4.5.2.) which uses the OpenCV library (to calculate some scores using support vector machines) and which behaves in some strange way:
The program runs on the board in its own process using defined test data: It produces repeatedly correct results.
The program runs in two or more processes (with the same defined test data): The results start to become wrong for each process, processes die with segfaults. The last remaining process runs correctly again.
The program runs in its own process (with the same defined test data again). Additionally, another process changes some exposure settings of an attached camera: The program starts to produce wrong results.
So we assume this is a very low level floating point problem.
What we tried
The complete system (all libraries, kernel, boot loader, etc.) have been compiled with compiler flags as suggested on the pandorawiki.org regarding Floating_Point_Optimization
-O3 -mcpu=cortex-a8 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp
-ffast-math -fsingle-precision-constant
We tried to enable L1NEON in Cortex-A8 aux ctrl register according to the Beagle board FAQ and tried the other options mentioned there as well, but unfortunately to no avail.
All three different behaviors are reproducible, but not in the form of a minimal working example.
The same program source and the first and second scenario run correctly on Windows (using Visual Studio) and on a desktop running Linux (GCC), so it's probably not something our code does.
So the questions are now:
Are there any other known bugs with this setup and floating point operations which we are not aware of?
Are there any known compiler options which should be set or omitted which can lead to the observed results?
If a MWE would be helpful, we will look into providing one.
Any clues are welcome.
Ok, we now use an up-to-date buildroot (2014.08) with the included toolchain (arm-buildroot-linux-uclibcgueabi-), Linux-kernel 3.9.11, boost 1.55, Qt 4.8.6, and still OpenCV 2.4.6.
When compiling, we optimize for size (–Os) and for target-optimization we only use –pipe.
The following compiler-flags are currently not used anymore:
-mcpu=cortex-a8 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp -ffast-math -fsingle-precision-constant
Unfortunately, we still don't know the exact reason for the original problem, but we are quite happy that the problem went away with this setup.
So maybe this answer helps some poor soul in the future... ;)

clang interleaved source and assembly

Wondering if it is possible to generate interleaved source and assembly from clang?
I am looking for something equivalent to gcc command (as demonstrated at http://www.fclose.com/240/generate-a-mixed-source-and-assembly-listing-using-gcc/)
gcc -Wa,-adhln -g source_code.c > assembly_list.s
I have visited Link: How do you get assembler output from C/C++ source in gcc? but it gets so far as to list the assembly - but no interleaving.
Also Visual Studio does give you pretty nice interleaved assembly output, details here: How to view the assembly behind the code using Visual C++?
Thank you for all the help.
Sarang
There seems to be a bug reported sometimes last year stating exactly this: http://llvm.org/bugs/show_bug.cgi?id=16647
Bug 16647 - No option to produce mixed source + assembly listing?
So since it is still NEW I guess clang does not have this supported yet.
As an alternative, how about compiling your code and then use objdump -S ? The output format is somewhat similar ...
As of August 2016, the bug that #dragosht mentioned still is open. However, there is a workaround offered by the linked bug 17465: clang -no-integrated-as -Xassembler -adhln. It disables the clang-integrated assembler and calls an external assembler, which hopefully supports the listing-generating options.
That works OK in Linux, but it doesn't work in Mac OS X (as of 10.11.6). The problem is that even the external assembler in OS X does not support the listing-generating options - you can check that with man as.
objdump -S is an alternative that also works well in Linux, but Mac OS X's alternative to objdump is otool, which does provide disassembly but not source interlacing. Hopefully that will change soon-ish, because otool seems to be on its way out while llvm grows its own objdump. See man llvm-otool.
Finally, for OS X the best option seems to be using gobjdump -S, from binutils. It can be installed with MacPorts or brew.
You can Generate Assembly Code from a .cc/.cpp source file by using this command
clang++ -c -S test-function.cc

How to make the Clang Static Analyzer output its working from command line?

I'm running Clang 3.4 on Ubuntu 12.10 (from http://llvm.org/apt/). I ran the analyzer (clang --analyze) over some code, and it found a couple of issues:
Blah.C:429:9: warning: Declared variable-length array (VLA) has zero size
unsigned char separatedData[groupDataLength];
^~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~
But the specific issue isn't important. I want to know the steps of how it came to that conclusion (the code is complex enough for me not to see it within 15 mins).
I see a screenshot from the Clang site that shows steps of working viewed in a web browser:
That's probably obtained from Xcode.
The question is: how do I get Clang to output such steps of working from the command line? Or even output results to a browser if it so wishes? This would make the analyzer significantly more useful, and make fixing things much quicker.
(I have noticed that GCC's documentation is very excellent, but Clang/LLVM's documentation is very poor. I've tried "clang --analyze -Xanalyzer '-v'" as a stab in the dark to tell the analyzer to be more verbose -- the -Xanalyzer switch was from the man pages.)
In addition to text output on the console:
clang++ --analyze -Xanalyzer -analyzer-output=text main.cpp
You can get the full html output:
clang++ --analyze -Xanalyzer -analyzer-output=html -o html-dir main.cpp
Additionally, you can select specific checkers to enable. This page lists available checks. For example, you can enable all of the C++ checks in the alpha group using the flags:
-Xanalyzer -analyzer-checker=alpha.cplusplus
http://coliru.stacked-crooked.com/a/7746c4004704d4a7
main.cpp:5:1: warning: Potential leak of memory pointed to by 'x'
}
^
main.cpp:4:12: note: Memory is allocated
int *x = new int;
^~~~~~~
main.cpp:5:1: note: Potential leak of memory pointed to by 'x'
}
^
Apparently the front end exposes
-analyzer-config <Option Name>=<Value>
E.g.
-analyzer-config -analyzer-checker=alpha.cplusplus
which might be better supported than -Xanalyzer and may be getting extended to support options to individual checkers: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-October/039552.html
You are on the right track, but to get the full trace leading to a bug you additionally need to ask clang for output in text format (don't ask why). Since you will probably need to adjust e.g. include paths or defines for your project anyway I'd suggest you use clang-check which acts as a wrapper around clang's analyzer pass. It can also hook into the static analyzer tools exposed in e.g. scan-build. You can then
$ clang-check -analyze -extra-arg -Xclang -extra-arg -analyzer-output=text
Like you wrote the documentation for these very nice tools is abysmal. I cobbled above call together from bits and pieces from Chandler Carruth's GoingNative2013 talk.
You have to use scanbuild: http://clang-analyzer.llvm.org/scan-build.html
You type the commands that generate your build, but you pre-pend them with scan-build.
Example:
instead of
make
type
scan-build make
instead of
./configure
make
type
scan-build ./configure
scan-build make
Clear the build before launching the analyzer, otherwise make will state that everything has been built already and the analyzer will not run.

What information does GCC Profile Guided Optimization (PGO) collect and which optimizations use it?

Which information does GCC collect when I enable -fprofile-generate and which optimization does in fact uses the collected information (when setting the -fprofile-use flag) ?
I need citations here. I've searched for a while but didn't found anything documented.
Information regarding link-time optimization (LTO) would be a plus! =D
-fprofile-generate enables -fprofile-arcs, -fprofile-values and -fvpt.
-fprofile-use enables -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops and -ftracer
Source: http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Optimize-Options.html#Optimize-Options
PS. Information about LTO also on that page.
"What Every Programmer Should Know About Memory" by Ulrich Drepper
https://people.freebsd.org/~lstewart/articles/cpumemory.pdf
http://www.akkadia.org/drepper/cpumemory.pdf
In section 7.4
compilation with --profile-generate generates .gcno file for each object file. (the same file that is used for gcov coverage reports)
then you must run a few tests, during runtime it records coverage data into .gcda files
recompile with --profile-use : it will gather the coverage data and infer if an branch is likely (__builtin_expect( .. , 1 ) or unlikely (__builtin_expect( .. , 0)
The result should run faster as it should be better at prefetching code into the processor instruction cache.