LLVM opt tool does not vectorize IR generated by clang -O0 - llvm

I'm trying to build a JIT compiler based on the optimization pipeline borrowed from the opt tool.
However, I'm stuck on a problem: my JIT does not vectorize the code.
I tried to reproduce it with opt on a simple example: https://godbolt.org/z/eRKrLa
In that example clang -O3 emits vectorized IR; however, if I run opt on the IR generated by clang -O0, it does not make any changes.
What am I doing wrong?

This is expected. The output of clang -O0 is not intended to be re-optimized; among other things, recent versions of clang mark every function with the optnone attribute at -O0, which tells opt's passes to skip it. You'd need to do something like clang -O3 -mllvm -disable-llvm-optzns, which sets the IR up for optimization without actually running the LLVM passes.
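As a sketch of that workflow (file names are placeholders, and the exact flag spelling varies between clang versions; newer clangs use -Xclang -disable-llvm-passes for the same effect):

```shell
# Emit LLVM IR with the -O3 front-end settings, but without running
# the LLVM optimization passes (so the IR is not marked 'optnone').
clang -O3 -mllvm -disable-llvm-optzns -emit-llvm -S test.c -o test.ll

# Now opt's own -O3 pipeline can vectorize the IR.
opt -O3 -S test.ll -o test.opt.ll
```

The resulting test.ll is what your JIT's pipeline should be fed if you want it to behave like opt.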

Related

Cross compile C++ for ARM64/x86_64, using clang, with core2-duo enabled

OK, so I am new to cross compilation.
I am writing some shell-scripts to compile some C++ files, on my Mac.
I want to build a "Fat universal binary", so I want this to work for Arm64 and x86_64.
After a lot of searching, I found that using --arch arm64 --arch x86_64 solves my problem of cross compilation.
However, my old "optimisation flags" conflicted with this. I used them to make my code run faster, for a computer game I was making. Well... these are the flags:
-march=core2 -mfpmath=sse -mmmx -msse -msse2 -msse4.1 -msse4.2
Unfortunately... clang can't figure out that I mean to use these for the Intel build only. I get the error message:
clang: error: the clang compiler does not support '-march=core2'
If I remove --arch arm64 --arch x86_64 the code compiles again.
I tried various things like --arch x86_64+sse4, which seem to be allowed by the gcc documentation, but clang does not recognise them.
As far as I know, gcc/clang do not emit SSE4.2 instructions by default, despite the CPUs being released about 17 years ago. This is quite a bad assumption, I think.
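One workaround (a sketch, not from the thread; file names are made up): compile each architecture slice separately with its own flags, then merge them with lipo, which is essentially what the multi-arch driver does internally:

```shell
# Build the Intel slice with the x86-specific tuning flags...
clang++ -arch x86_64 -march=core2 -msse4.2 -O2 -c game.cpp -o game_x86_64.o

# ...build the ARM slice without them...
clang++ -arch arm64 -O2 -c game.cpp -o game_arm64.o

# ...then merge the two slices into one fat object.
lipo -create game_x86_64.o game_arm64.o -output game_universal.o
```

Apple's clang driver also accepts -Xarch_x86_64 <flag> to apply a single flag to one slice only, which can avoid the two-step build for simple cases.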

GNU Fortran compiler optimisation flags for Ivy Bridge architecture

May I please ask for your suggestions on the GNU Fortran compiler (v6.3.0) flags to optimise the code for the Ivy Bridge architecture (Intel Xeon CPU E5-2697v2 Ivy Bridge @ 2.7 GHz)?
At the moment I’m compiling the code with the following flags:
-O3 -march=ivybridge -mtune=ivybridge -ffast-math -mavx -m64 -w
Unless you use intrinsics specific to Ivy Bridge, the Sandy Bridge flag is sufficient. I expect you should find some advantage in additionally setting -funroll-loops --param max-unroll-times=2
Sometimes -O2 -ftree-vectorize will work out better than -O3.
If you have complex data types you will want to check against -fno-cx-limited-range, as the default under -ffast-math may be too aggressive.
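Putting the answer's suggestions together, an invocation to try might look like this (a sketch; the source file name is a placeholder, and the best combination should be benchmarked, not assumed):

```shell
# Sandy Bridge tuning covers Ivy Bridge unless IVB-specific intrinsics are used.
gfortran -O3 -march=sandybridge -mtune=sandybridge -ffast-math \
         -funroll-loops --param max-unroll-times=2 solver.f90 -o solver

# Alternative worth measuring against the above:
gfortran -O2 -ftree-vectorize -march=sandybridge solver.f90 -o solver
```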

What is the -O5 flag for compiling gfortran .f90 files?

I see the flag in the documentation of how to compile some f90 code I have acquired (specifically, mpfi90 -O5 file.f90), but researching the -O5 flag turned up nothing in the gfortran docs, mpfi docs, or anywhere else. I assume it is an optimization flag like -O1, etc., but I'm not sure.
Thanks!
Source: http://publib.boulder.ibm.com/infocenter/comphelp/v7v91/index.jsp?topic=%2Fcom.ibm.xlf91a.doc%2Fxlfug%2Fhu00509.htm
The flag -O5 is an optimization level like -O3 and -O2; it comes from IBM XL Fortran, not gfortran. The linked source says:
-qnoopt / -O0: Fast compilation, debuggable code, conserved program semantics.
-O2 (same as -O): Comprehensive low-level optimization; partial debugging support.
-O3: More extensive optimization; some precision trade-offs.
-O4 and -O5: Interprocedural optimization; loop optimization; automatic machine tuning.
Each higher number includes all the optimizations of the lower levels. Note that gfortran itself only defines levels up to -O3 and treats any higher -O number as -O3.
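You can check what a given -O level actually enables with gfortran's option-reporting flags (assuming a GCC-based gfortran is installed); because levels above 3 are clamped, -O5 and -O3 report the same set:

```shell
# List which optimizations each level enables; the outputs match,
# since GCC treats any -O level above 3 as -O3.
gfortran -O5 -Q --help=optimizers > o5.txt
gfortran -O3 -Q --help=optimizers > o3.txt
diff o5.txt o3.txt && echo "identical"
```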

How to generate assembly code with clang in Intel syntax?

As this question shows, with g++, I can do g++ -S -masm=intel test.cpp.
Also, with clang, I can do clang++ -S test.cpp, but -masm=intel is not supported by clang (warning: argument unused during compilation: '-masm=intel'). How do I get Intel syntax with clang?
As noted below by @thakis, newer versions of Clang (3.5+) accept the -masm=intel argument.
For older versions, this should get clang to emit assembly code with Intel syntax:
clang++ -S -mllvm --x86-asm-syntax=intel test.cpp
You can use -mllvm <arg> to pass llvm options through the clang command line. Sadly, this option doesn't appear to be well documented, so I only found it by browsing through the llvm mailing lists.
As of clang r208683 (clang 3.5+), it understands -masm=intel. So if your clang is new enough, you can just use that.
Presuming you can have Clang emit LLVM bitcode (or textual IR), you can then use llc to compile it to assembly, and use its --x86-asm-syntax=intel option to get the result in Intel syntax.
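As a sketch of that two-step route (file names are placeholders):

```shell
# Step 1: have clang emit LLVM IR instead of native code.
clang++ -S -emit-llvm test.cpp -o test.ll

# Step 2: compile the IR to Intel-syntax assembly with llc.
llc --x86-asm-syntax=intel test.ll -o test.s
```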

What is the difference between -O0 ,-O1 and -g

I am wondering about the use of -O0, -O1 and -g for enabling debug symbols in a lib.
Some suggest using -O0 to enable debug symbols, and some suggest using -g.
So what is the actual difference between -g and -O0, what is the difference between -O1 and -O0, and which is best to use?
-O0 is optimization level 0 (no optimization, same as omitting the -O argument)
-O1 is optimization level 1.
-g generates and embeds debugging symbols in the binaries.
See the gcc docs and manpages for further explanation.
For actual debugging, debuggers are usually not able to make sense of code that's been compiled with optimization, though debug symbols are still useful for other things even with optimization, such as generating a stack trace.
-OX specifies the optimisation level that the compiler will perform. -g is used to generate debug symbols.
From the GCC manual:
http://gcc.gnu.org/onlinedocs/
3.10 Options That Control Optimization
-O
-O1
Optimize. Optimizing compilation takes somewhat more time, and a lot more memory for a large function. With -O, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.
-O2
Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O, this option increases both compilation time and the performance of the generated code.
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize and -fipa-cp-clone options.
-O0
Reduce compilation time and make debugging produce the expected results. This is the default.
-g
Produce debugging information in the operating system's native format (stabs, COFF, XCOFF, or DWARF 2). GDB can work with this debugging information.
-O0 doesn't enable debug symbols, it just disables optimizations in the generated code so debugging is easier (the assembly code follows the C code more or less directly). -g tells the compiler to produce symbols for debugging.
It's possible to generate symbols for optimized code (just continue to specify -g), but trying to step through code or set breakpoints may not work as you expect because the emitted code will likely not "follow along" with the original C source closely. So debugging in that situation can be considerably trickier.
-O1 (which is the same as -O) performs a minimal set of optimizations. -O0 essentially tells the compiler not to optimize. There are a slew of options that allow a very fine control over how you might want the compiler to perform: http://gcc.gnu.org/onlinedocs/gcc-4.6.3/gcc/Optimize-Options.html#Optimize-Options
As mentioned by others, the -O options set the level of optimization performed by the compiler, whereas the -g option adds debugging symbols.
For a more detailed understanding, please refer to the following links:
http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options