I am trying to find a command line option that tells LLVM: "generate code for the host you are compiling on; take advantage of all the CPU features available; the result need not run on any other system".
I have fairly recent LLVM versions across all the platforms I use; they are all ARM, ARM64, x86, or x86-64. All code is either C or C++.
Is there a generic option for this in LLVM?
The command line parameters you want are -march=native (on x86/x86-64) or -mcpu=native (on ARM), optionally combined with -mtune=native, which tunes scheduling for the host CPU without changing the instruction set baseline.
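For example (illustrative invocations; the file name is a placeholder):

    clang -O3 -march=native -mtune=native file.c    # x86/x86-64 host
    clang -O3 -mcpu=native file.c                   # ARM/ARM64 host

A binary built this way may use instructions (SSE4, AVX, NEON, and so on) that older or different CPUs lack, so it should only run on the machine that built it.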
Related
Suppose you have a cross-compilation tool-chain that produces binaries for the ARM architecture.
Your tool-chain is like this (running on an x86_64 machine with Linux):
arm-linux-gnueabi-gcc.exe : for cross-compilation for Linux, running on ARM.
arm-gcc.exe : for bare-metal cross-compilation targeting ARM.
... and the plethora of other tools for cross-compilation on ARM.
Points that I'm interested in are:
(E)ABI differences between binaries (if any)
limitations in case of bare-metal (like dynamic memory allocations, usage of static constructors in case of C++, threading models, etc)
binary-level differences between the 2 cases in terms of information specific to each of them (like debug info support, etc);
ABI differences come down to how you invoke the compiler; for example, GCC has -mabi, which can be one of 'apcs-gnu', 'atpcs', 'aapcs', 'aapcs-linux' and 'iwmmxt'.
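For instance (an illustrative invocation; the source file name is a placeholder):

    arm-linux-gnueabi-gcc -mabi=aapcs-linux -c foo.c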
On bare metal, limitations on various runtime features exist only because nobody has provided them, be it zero-initializing allocated areas or supplying C++ runtime support. If you can supply them yourself, they will work.
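As a rough sketch of what "supplying them" means, a minimal bare-metal startup routine might zero the .bss section and run C++ static constructors before entering main(). The symbol names below (__bss_start__, __init_array_start, and so on) come from a typical linker script and are assumptions, not universal:

    /* Minimal bare-metal startup sketch. Symbol names depend on
       your linker script; these are common conventions only. */
    extern char __bss_start__, __bss_end__;
    typedef void (*ctor_t)(void);
    extern ctor_t __init_array_start[], __init_array_end[];
    extern int main(void);

    void _start(void)
    {
        /* Zero-initialize the .bss section. */
        for (char *p = &__bss_start__; p < &__bss_end__; ++p)
            *p = 0;

        /* Run C++ static constructors collected in .init_array. */
        for (ctor_t *c = __init_array_start; c < __init_array_end; ++c)
            (*c)();

        main();

        /* No OS to return to: spin forever. */
        for (;;)
            ;
    }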
Binary-level differences are also up to how you invoke the compiler.
You can check GCC ARM options online.
I recently started a little project to use a Linux standard C library in a bare-metal environment. I've been describing it on my blog: http://ellcc.org/blog/?page_id=289
Basically, what I've done is set up a way to handle Linux system calls so that, by implementing simplified versions of certain system calls, I can use functions from the standard library. For example, the current state of the ARM port implements simplified versions of read(), readv(), write(), writev() and brk(). This allows me to use printf(), fgets(), and malloc() unchanged.
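To make that concrete, a simplified write() along those lines might just copy every byte to a memory-mapped UART. This is a sketch only: the register address is hypothetical, so substitute the one from your board's datasheet:

    #include <stddef.h>
    #include <sys/types.h>

    /* Hypothetical memory-mapped UART transmit register. */
    #define UART_TX (*(volatile unsigned int *)0x101F1000)

    /* Simplified write(): ignores fd and sends everything to the UART. */
    ssize_t write(int fd, const void *buf, size_t count)
    {
        (void)fd;
        const char *p = buf;
        for (size_t i = 0; i < count; ++i)
            UART_TX = p[i];
        return (ssize_t)count;  /* report all bytes as written */
    }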
In my case, I use the same compiler for targeting Linux and bare metal. Since it is clang/LLVM based, I can also use the same compiler to target other processors. I'm working on a bare-metal example for MIPS right now.
So I guess the answer is that there doesn't have to be any difference.
Is it possible to cross-compile D source code for MIPS?
For example, I want to compile a D "Hello, world." program that will run on TI AR7-based devices, which have MIPS32 processor and typically run Linux 2.4.17 kernel with MontaVista patches and uClibc (using the MIPS I generic target; ELF 32-bit LSB executable, MIPS, MIPS-I version 1 SYSV).
http://en.wikipedia.org/wiki/TI-AR7
The reference compiler, DMD, does not generate MIPS code, so you'll have to use GDC or LDC2, which support generating code for whatever architectures their backends support (GCC and LLVM, respectively).
However, it's not as simple as generating the code. To get all of D's features working, you'll need to port druntime and Phobos to MIPS, as druntime is quite architecture-specific. Without that, you'll be stuck without a GC and everything that depends on it.
So it is possible, but how possible definitely depends on how dedicated you are.
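As a starting point, a cross-compile with LDC2 might look like this (the triple and library paths are assumptions; the exact spelling depends on how your LDC and cross toolchain were built, and linking still requires the ported druntime/Phobos mentioned above):

    ldc2 -mtriple=mips-linux-gnu -c hello.d
    mips-linux-gnu-gcc hello.o -o hello \
        -L/path/to/ported/druntime -lphobos2-ldc -ldruntime-ldc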
I have written a benchmark method to test my C++ program (which searches a game tree), and I am noticing that compiling with the "LLVM compiler 2.0" option in Xcode 4.0.2 gives me a significantly faster binary than if I compile with the latest version of clang++ from MacPorts.
If I understand correctly I am using a clang front-end and llvm back-end in both cases. Has Apple made improvements to their clang/llvm distribution to produce faster binaries for Mac OS? I can't find much information about the project.
Here are the benchmarks my program produces for various compilers, all using -O3 optimization (higher is better):
(Xcode) "gcc 4.2": 38.7
(Xcode) "llvm gcc 4.2": 51.2
(Xcode) "llvm compiler 2.0": 50.6
g++-mp-4.6: 43.4
clang++: 40.6
Also, how do I compile with the clang/LLVM that Xcode is using from the terminal? I can't find the command.
EDIT: The scores I output are "thousands of games per second", calculated over a long enough run of the program. Scores are very consistent over multiple runs, and recent major algorithmic improvements have given me 1%-5% speed-ups, for example. A 25% speed-up from 40 to 50 is huge for my program.
UPDATE: I wasn't invoking clang++ from the command line with -flto. Now when I compare clang++ -O3 -flto to /Developer/usr/bin/clang++ -O3 -flto from the command line the results are closer, but the Apple one is still 6.5% faster.
Now, how do I enable link-time optimization for GCC? When I try g++ -flto I get the following error:
cc1plus: error: LTO support has not been enabled in this configuration
Apple LLVM Compiler should be available under /Developer/usr/bin/clang.
I can't think of any particular reason why MacPorts clang++ would generate slower code... I would check whether you're passing in comparable command-line options. One thing that would make a large difference is if you're producing 32-bit code with one compiler, and 64-bit code with the other.
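One quick way to rule that out (illustrative commands; the binary and source names are placeholders):

    file ./mygame                        # shows whether the binary is i386 or x86_64
    clang++ -O3 -arch x86_64 main.cpp    # Apple driver spelling for 64-bit
    clang++ -O3 -m64 main.cpp            # generic driver spelling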
If your GCC has no LTO support, then you need to build it yourself:
http://solarianprogrammer.com/2012/07/21/compiling-gcc-4-7-1-mac-osx-lion/
For LTO you need to add 'libelf' to the instructions.
http://sourceforge.net/apps/trac/mingw-w64/wiki/LTO%20and%20GCC
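Per those instructions, the configure step looks roughly like this (a sketch; the paths and version are placeholders, and the libelf requirement comes from the linked notes):

    ../gcc-4.7.1/configure --prefix=$HOME/gcc-4.7.1 \
        --enable-lto --with-libelf=/path/to/libelf \
        --enable-languages=c,c++
    make && make install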
The exact speed of an algorithm can depend on all kinds of things that are entirely outside your and the compiler's control. You may have a loop where the execution time depends on precisely how the instructions are aligned in memory, in a way the compiler couldn't predict. I have seen cases where a loop could enter different "states" with different execution times per iteration (so after a context switch, it could enter a state where each iteration took either 12 or 13 cycles, rather randomly). This can all be coincidence.
And you might be using different libraries, which is quite possibly the reason. On Mac OS X, they are using a new and presumably faster implementation of std::string and std::vector, for example.
Well... when I was searching for a good compiler, I came across clang/LLVM. This compiler gives me the same results as other compilers like ICC and PGI, but the problem is that there are very few tutorials for it. Kindly let me know where I can find tutorials on the clang compiler.
Note: I have compiled my C code using the following flags: clang -O3 -mfpmath=sse file.c
Clang (the command line compiler) takes GCC-compatible options, but accepts and ignores a lot of flags that GCC takes (like -mfpmath=sse). We aim to generate good code out of the box. There are some flags that allow clang to violate the language standards and that can be useful in some scenarios, such as -ffast-math.
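For example, if your code tolerates relaxed floating-point semantics (the file name is a placeholder):

    clang -O3 -ffast-math file.c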
If you're looking for good performance, I highly recommend experimenting with link-time-optimization, which allows clang to optimize across source files in your application. Depending on what platform you're on, this is enabled by passing -O4 to the compiler. If you're on linux, you need to use the "gold" linker (see http://llvm.org/docs/GoldPlugin.html). If you're on the mac, it should "just work" with any recent version of Xcode.
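In practice that looks something like this (file names are placeholders; with -O4 the object files contain LLVM bitcode rather than native code, and the cross-module optimization happens at link time):

    clang++ -O4 -c a.cpp
    clang++ -O4 -c b.cpp
    clang++ -O4 a.o b.o -o app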
Clang is not a compiler by itself; it is just the frontend of the LLVM compiler. So, when you call clang, it parses the C/C++ file, but the optimization and code generation are handled by LLVM itself.
Here you can find documentation of LLVM's optimization and analysis options: http://llvm.org/docs/Passes.html
The full documentation is here http://llvm.org/docs/
Also, useful options are listed here: http://linux.die.net/man/1/llvmc (I suspect clang will accept most of them too)
Are there C++ compilers that I can run on a brand X system that will create machine code that will run on a brand Y system? Would this be considered a cross compiler?
Yes, that's exactly what a cross compiler is.
The GNU Compiler Collection (which includes gcc and g++) is a prime example of a portable cross compiler with hundreds of supported CPU types.
There are many ways to configure GCC to be a cross compiler. Most of it depends on the target machine and what's available from it - mostly because of the C runtime environment. For example, compiling GCC for an ARM Linux target requires an ARM Linux glibc pre-compiled to build the cross compiler libstdc++.
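The shape of such a build is roughly this (a sketch; the prefix, sysroot path, and exact triple are assumptions that vary by target):

    ../gcc/configure --target=arm-linux-gnueabi \
        --prefix=/opt/cross \
        --with-sysroot=/opt/cross/arm-linux-gnueabi/sysroot \
        --enable-languages=c,c++
    make && make install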
This is quite common using gcc.
There are a number of tutorials around that describe building your own cross-compiler environment.
This is one, but a quick google will probably provide a link to someone doing exactly the mix of environments that you're after.
edit:
This is a more thorough tutorial, using crosstool
Yes there are such compilers; yes, they are called cross compilers. For example, GCC can be configured in this manner, so that I can run the compiler on a 32-bit x86 system, but produce 64-bit code for an x64 system, and that's just the tip of the iceberg.
Crosstool is a really handy tool suite for creating a cross-compiling GCC.
HTH,
Eric Melski
Electric Cloud, Inc.
Yes, this is what cross-compilation is about (although I would say "brand X" isn't an appropriate classification term).