Clang's different stages of processing - c++

Similar to GCC, clang supports stopping at different stages when processing C/C++. For example, passing a -E flag causes it to stop after the pre-processor and -c stops before linking.
So far, I am aware of,
-E : pre-processing
-fsyntax-only : syntax checking
-S : assembly
-c : object code
Am I missing any stopping points between those, or is that it?

You can also use -S -emit-llvm to generate LLVM IR assembly files and just -emit-llvm for LLVM bitcode object files. These are the language-independent code representations that clang and other LLVM front-ends generate and pass to LLVM to compile into an executable.

Related

Is there a Gfortran flag similar to intel ifort's -ipo-c, that would generate single optimized object from all compiled files?

Intel Fortran compiler/linker has the optional flag -ipo-c or /Qipo-c which enables the generation of a single interprocedurally-optimized object file from all files, which can be later used for linking. Is there an equivalent flag to Intel's -ipo-cin gfortran?
GCC has -fwhole-program, does that work for gfortran?
Or if you don't want to pass all the Fortran source files on one giant command line, there's -flto link-time optimization which uses a linker "plugin" to run the optimizer on GIMPLE stored in .o files (instead of or as well as machine code).
LTO means you should pass all your optimization options to the invocation of gfortran that does the linking, as well as the gfortran -c that compiles to .o.
So you might use gfortran -ffast-math -O3 -march=native -flto to compile and link, assuming gfortran supports the same options as gcc. (And that -march=native is what you want: make an executable optimized for the computer you compiled on, which might SIGILL on other computers without all the ISA extensions this one supports.)

Object code generation for new RISCV instruction emitted by LLVM backend

From https://github.com/riscv/riscv-llvm,
Using the llvm-riscv is fairly simple to build a full executable
however you need riscv64-unknown-*-gcc to do the assembling and
linking. An example of compiling hello world:
$ clang -target riscv64 -mriscv=RV64IAMFD -S hello.c -o hello.S
$ riscv64-unknown-elf-gcc -o hello.riscv hello.S
My question is: if I change the LLVM backend and get it to emit a new instruction in the hello.S file, how will riscv64-unknown-elf-gcc know how to convert it into object code? Do I also need to make changes in riscv64-unknown-elf-gcc so that it knows the format of the new instruction?
riscv64-unknown-elf-gcc calls as, i.e. usually GNU as from the binutils to assemble assembly code (i.e. hello.S in your snippet) into executable machine code. Thus you would have to modify the binutils if you want to assemble a new instruction.

Does "clang -S -emit-llvm file.cpp" run any other executables except clang?

Does clang -S -emit-llvm file.cpp (compiling c++ source code to LLVM IR) run any other executables except clang behind the scene (like linker or smth)?
Kind of. Clang will spawn another instance of clang, as what you're really spawning with this kind of invocation is merely the driver, which then runs the compiler proper and then possibly invokes the assembler, the linker and any other tools necessary — but none are needed in the case of just -S -emit-llvm.
You can see it for yourself by running Clang with -v, it will print all the processes spawned, their arguments and their output.

Get the compiler options from a compiled executable?

It there a way to see what compiler and flags were used to create an executable file in *nix? I have an old version of my code compiled and I would like to see whether it was compiled with or without optimization. Google was not too helpful, but I'm not sure I am using the correct keywords.
gcc has a -frecord-gcc-switches option for that:
-frecord-gcc-switches
This switch causes the command line that was used to invoke the compiler to
be recorded into the object file that is being created. This switch is only
implemented on some targets and the exact format of the recording is target
and binary file format dependent, but it usually takes the form of a section
containing ASCII text.
Afterwards, the ELF executables will contain .GCC.command.line section with that information.
$ gcc -O2 -frecord-gcc-switches a.c
$ readelf -p .GCC.command.line a.out
String dump of section '.GCC.command.line':
[ 0] a.c
[ 4] -mtune=generic
[ 13] -march=x86-64
[ 21] -O2
[ 25] -frecord-gcc-switches
Of course, it won't work for executables compiled without that option.
For the simple case of optimizations, you could try using a debugger if the file was compiled with debug info. If you step through it a little, you may notice that some variables were 'optimized out'. That suggests that optimization took place.
If you compile with the -frecord-gcc-switches flag, then the command line compiler options will be written in the binary in the note section. See also the docs.
Another option is -grecord-gcc-swtiches (note, not -f but -g). According to gcc docs it'll put flags into dwarf debug info. And looks like it's enabled by default since gcc 4.8.
I've found dwarfdump program to be useful to extract those cflags. Note, strings program does not see them. Looks like dwarf info is compressed.
As long as the executable was compiled by gcc with -g option, the following should do the trick:
readelf --debug-dump=info /path/to/executable | grep "DW_AT_producer"
For example:
% cat test.c
int main() {
return 42;
}
% gcc -g test.c -o test
% readelf --debug-dump=info ./test | grep "DW_AT_producer"
<c> DW_AT_producer : (indirect string, offset: 0x2a): GNU C17 10.2.0 -mtune=generic -march=x86-64 -g
Sadly, clang doesn't seem to record options in similar way, at least in version 10.
Of course, strings would turn this up too, but one has to have at least some idea of what to look for as inspecting all the strings in real-world binary with naked eyes is usually impractical. E.g. with the binary from above example:
% strings ./test | grep march
GNU C17 10.2.0 -mtune=generic -march=x86-64 -g -O3
This is something that would require compiler support. You don't mention what compiler you are using but since you tagged your question linux I will assume you are using gcc -- which does not default the feature you're asking about (but -frecord-gcc-switches is an option to perform this).
If you want to inspect your binary, the strings command will show you everything that appears to be a readable character string within the file.
If you still have the compiler (same version) you used, and it is only one flag you're unsure about, you can try compiling your code again, once with and once without the flag. Then you can compare the executables. Your old one should be identical, or very similar, to one of the new ones.
I highly doubt it is possible:
int main()
{
}
When compiled with:
gcc -O3 -ffast-math -g main.c -o main
None of the parameters can be found in the generated object:
strings main | grep -O3
(no output)

Compiling previously preprocessed file changes output

I have a source file which I preprocess using the options -E and -P (using GCC 4.1.2 for a vxWorks-based embedded platform). All other options are the same as when I compile the file. These options are:
-Wall
-march=pentium
-nostdinc
-O0
-fno-builtin
-fno-defer-pop
-g
-c
-o
as well as all include-paths. Now when I compile this preprocessed file, the resulting object-file is much smaller (about 30%) than when I compile the original directly. And when I then link the program, the linker complains about missing symbols (all in user-code), which again does not happen when using the original source-file. Why is there a difference? Is there any way to make this work?
You're sure you're not missing any -D defines from your command line? Your result would be consistent with parts not being compiled due to conditionals.
Another possibility (since you don't name the compiler specifically) is that you're using a generic gcc -E rather than the arch-specific cross compiler for your vxWorks environment. The cross-gcc will predefine some variables that you'll need for gcc -E.
When compiling the preprocessed output, try passing the -fpreprocessed option to tell GCC not to preprocess again.
The only difference I can think of is macros that result in expanding to an identifier that's a macro name that has already been expanded - the preprocessor stops expansion at that point, but if you ran the preprocessor again, the identifier would be expanded again. I would have expected any instances of this to probably cause a compiler error, but who knows?