How to reproduce clang's -O2 optimizations with LLVM tools? - llvm

Suppose I generate an unoptimized foo.ll using clang -S -emit-llvm foo.c.
What steps do I need to take to get from foo.ll to optimized.s that's optimized as if it was produced by clang directly using -O2?
(opt -S -O2 foo.ll -o optimized.ll; llc optimized.ll is not it.)

You can compile using
llc -O2 -optimize-regalloc foo.ll
-optimize-regalloc allows better physical register allocation.
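One possible reason the opt/llc pipeline from the question does not match clang -O2 is that recent clang releases mark every function optnone when compiling at -O0, so opt leaves them alone. The sketch below is one way around that, not a confirmed fix for the original poster. It assumes clang, opt, and llc all come from the same LLVM release; the helper name reproduce_o2 and the intermediate file names are made up for illustration. The result should be close to what clang -O2 -S produces, but it is not guaranteed to be identical.

```shell
# Sketch only: split "clang -O2 -S foo.c" into front end, opt, and llc.
# reproduce_o2 is a hypothetical helper name, not an LLVM tool.
reproduce_o2() {
    src=$1
    base=${src%.c}
    # Run the front end at -O2 but skip LLVM's own passes, so the IR is
    # still unoptimized yet carries no optnone attributes.
    clang -O2 -Xclang -disable-llvm-passes -S -emit-llvm "$src" -o "$base.ll" &&
    # Middle-end optimizations.
    opt -S -O2 "$base.ll" -o "$base.opt.ll" &&
    # Code generation at the matching optimization level.
    llc -O2 "$base.opt.ll" -o "$base.s"
}
```

Calling reproduce_o2 foo.c writes foo.s, which can then be compared against the output of clang -O2 -S foo.c.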

Related

LLVM: why do I get nothing from llc -print-after=stack-protector?

I have the same problem as in "Run LLVM pass with opt".
Following the approach there, I ran
llc -print-before=stack-protector hello.bc
but got no output.
-print-before-all works, but there is no stack-protector pass in its output.
I compiled with clang -emit-llvm -S -fstack-protector hello.c -o hello.bc, and the stack protector is generated. So the pass ran, but why can't it be printed?
LLVM version: 4.0

Vectorization with gcc5 gives no information

I was trying some vectorisation after upgrading g++ from version 4.8.5 to 5.4.1, with these flags:
g++ particles-v3.cpp -o v3 -O3 -msse4.2 -mfpmath=sse -ftree-vectorizer-verbose=5 -ffast-math -m32 -march=native -std=c++11
While the same command gives over 4000 lines of detailed information about the vectorization with g++-4.8, with g++-5.4 it does not say anything.
Is there some major change in g++-5 that makes -ftree-vectorizer-verbose=X unusable, or is there simply something wrong in the command line? How can I change it so that it works?
EDIT:
I found out that using -fopt-info-vec-all gives exactly the info I wanted, so the question is solved.
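For reference, the command from the question with the old flag swapped for the one from the edit might look like the sketch below. The =vec.log suffix is an optional addition that sends the report to a file instead of stderr; the file name and the helper name vectorize_report are chosen here purely for illustration.

```shell
# Same flags as in the question, with -ftree-vectorizer-verbose=5
# replaced by -fopt-info-vec-all. "=vec.log" is optional and the file
# name is an assumption made for this example.
VEC_REPORT_FLAGS="-fopt-info-vec-all=vec.log"

# vectorize_report is a hypothetical helper name.
vectorize_report() {
    g++ particles-v3.cpp -o v3 -O3 -msse4.2 -mfpmath=sse $VEC_REPORT_FLAGS \
        -ffast-math -m32 -march=native -std=c++11
}
```

If the full report is too noisy, -fopt-info-vec-optimized and -fopt-info-vec-missed restrict it to loops that were or were not vectorized.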

What is option -O3 for g++ and nvcc?

I'm new to nvcc and I've seen a library where compilation is done with option -O3, for g++ and nvcc.
CC=g++
CFLAGS=--std=c++11 -O3
NVCC=nvcc
NVCCFLAGS=--std=c++11 -arch sm_20 -O3
What is -O3 doing?
It's optimization level 3, basically a shortcut for several other options related to speed optimization (see the links below).
I can't find any documentation on it.
... it is one of the best known options:
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#options-for-altering-compiler-linker-behavior
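To see what the shortcut expands to on the g++ side, GCC can report which optimizer flags a given level turns on. The sketch below wraps that in a small helper; the name show_opt_flags is hypothetical, and the exact list printed depends on the GCC version installed. It covers only g++; for nvcc the documentation linked above remains the reference.

```shell
# List the optimizer flags g++ enables at a given -O level.
# show_opt_flags is a hypothetical helper name.
show_opt_flags() {
    # -Q --help=optimizers prints every optimizer flag with its state
    # (enabled/disabled) for the level passed in "$1".
    g++ -Q --help=optimizers "$1" | grep enabled
}
```

Running show_opt_flags -O3 and show_opt_flags -O2 and comparing the two outputs shows which flags -O3 adds.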

Unable to link PAPI library with opt llvm

I am working on a project where I need to generate just the bitcode using clang, run some optimization passes using opt and then create an executable and measure its hardware counters.
I am able to link through clang directly using:
clang -g -O0 -w -I/opt/apps/papi/5.3.0/include -Wl,-rpath,$PAPI_LIB -L$PAPI_LIB \
-lpapi /scratch/02681/user/papi_helper.c prog.c -o a.out
However now I want to link it after using the front end of clang and applying optimization passes using opt.
I am trying the following way:
clang -g -O0 -w -c -emit-llvm -I/opt/apps/papi/5.3.0/include -Wl,-rpath,$PAPI_LIB -L$PAPI_LIB \
-lpapi /scratch/02681/user/papi_helper.c prog.c -o prog.o
llvm-link prog.o papi_helper.o -o prog-link.o
# run optimization passes
opt -licm prog-link.o -o prog-opt.o
llc -filetype=obj prog-opt.o -o prog-exec.o
clang prog-exec.o
After going through the above process I get the following error:
undefined reference to `PAPI_event_code_to_name'
It's not able to resolve papi functions. Thanks in advance for any help.
You need to add -lpapi to the last clang invocation, since that is the step that actually links. How else would the linker know about libpapi?
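Put together, the pipeline might look like the sketch below. It is an illustration under assumptions, not the poster's verified build: each source is compiled to bitcode separately, the intermediate files use a .bc suffix chosen here for clarity, $PAPI_LIB is assumed to be set as in the question, and the helper name build_with_papi is hypothetical. The linker flags are dropped from the compile steps, where they have no effect, and moved to the final link.

```shell
# Sketch of the bitcode pipeline with the PAPI link flags on the final step.
# Assumes $PAPI_LIB is already set, as in the question.
PAPI_INC=/opt/apps/papi/5.3.0/include

# build_with_papi is a hypothetical helper name.
build_with_papi() {
    # Front end only: one bitcode file per source, no linker flags needed.
    clang -g -O0 -w -c -emit-llvm -I"$PAPI_INC" \
        /scratch/02681/user/papi_helper.c -o papi_helper.bc &&
    clang -g -O0 -w -c -emit-llvm -I"$PAPI_INC" prog.c -o prog.bc &&
    # Merge the two modules.
    llvm-link prog.bc papi_helper.bc -o prog-link.bc &&
    # Run optimization passes (legacy syntax, as in the question;
    # newer LLVM releases spell this -passes=licm).
    opt -licm prog-link.bc -o prog-opt.bc &&
    # Lower to a native object file.
    llc -filetype=obj prog-opt.bc -o prog-exec.o &&
    # The real link happens here, so this is where libpapi must be named.
    clang prog-exec.o -L"$PAPI_LIB" -Wl,-rpath,"$PAPI_LIB" -lpapi -o a.out
}
```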

What optimization passes are done for -O4 in clang?

We are trying to implement a JIT compiler whose performance is supposed to match clang -O4. Is there a place where I could easily get the list of optimization passes that clang invokes when -O4 is specified?
As far as I know, -O4 means the same thing as -O3 plus LTO (Link Time Optimization) enabled.
See the following code fragments:
Tools.cpp // Manually translate -O to -O2 and -O4 to -O3;
Driver.cpp // Check for -O4.
Also see here:
You can produce bitcode files from clang using -emit-llvm or -flto, or the -O4 flag which is synonymous with -O3 -flto.
For optimizations used with -O3 flag see this PassManagerBuilder.cpp file (look for OptLevel variable - it will have value 3).
Note that as of Apple LLVM version 5.1, -O4 no longer implies link time optimization. If you want LTO, you need to pass -flto explicitly. See the Xcode 5 Release Notes.
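Besides reading PassManagerBuilder.cpp, the pass list can also be printed by opt itself. The sketch below shows two variants, depending on which pass manager the installed LLVM uses; the helper names are hypothetical, and the pipeline opt reports may differ in details from what the clang driver sets up.

```shell
# Print the passes opt runs at -O3, using an empty module as input.
# Both helper names are hypothetical.

# Older LLVM releases (legacy pass manager).
list_o3_passes_legacy() {
    llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments
}

# Recent LLVM releases (new pass manager).
list_o3_passes() {
    llvm-as < /dev/null | opt -passes='default<O3>' -print-pipeline-passes -disable-output
}
```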