Object code generation for a new RISC-V instruction emitted by the LLVM backend

From https://github.com/riscv/riscv-llvm:
Using llvm-riscv, it is fairly simple to build a full executable;
however, you need riscv64-unknown-*-gcc to do the assembling and
linking. An example of compiling hello world:
$ clang -target riscv64 -mriscv=RV64IAMFD -S hello.c -o hello.S
$ riscv64-unknown-elf-gcc -o hello.riscv hello.S
My question is: if I change the LLVM backend and get it to emit a new instruction in the hello.S file, how will riscv64-unknown-elf-gcc know how to convert it into object code? Do I also need to make changes in riscv64-unknown-elf-gcc so that it knows the format of the new instruction?

riscv64-unknown-elf-gcc calls as (usually GNU as from binutils) to assemble assembly code (i.e. hello.S in your snippet) into machine code. Thus you would have to modify binutils if you want to assemble a new instruction.
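If the new instruction has a fixed encoding, one workaround that avoids modifying binutils is to emit the raw instruction word from the .S file yourself, since GNU as happily assembles data directives without knowing the mnemonic. A minimal sketch (the encoding 0x0000700B and the label name are made up for illustration; newer binutils versions also provide the .insn directive for this purpose):

```shell
# Hypothetical: emit a fixed 32-bit encoding for a custom instruction
# without teaching GNU as a new mnemonic. 0x0000700B is an invented
# encoding used purely for illustration.
cat > custom.S <<'EOF'
    .text
    .globl use_custom
use_custom:
    .word 0x0000700B      # raw encoding of the new instruction
    ret
EOF
riscv64-unknown-elf-gcc -c custom.S -o custom.o
riscv64-unknown-elf-objdump -d custom.o
```

To get a real mnemonic, you would instead add an entry for the instruction to the RISC-V opcode table in binutils (opcodes/riscv-opc.c) and rebuild the assembler.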

Related

Add custom llvm optimization command (opt) after compilation within CMake projects

I created my own LLVM optimization pass in LLVM 3.7.0, and I want to use this pass within a CMake project.
I need the pass to run last, after all optimization passes of -O2 (or -O3) have been executed by clang (or clang++).
Unfortunately, I did not find a mechanism to invoke the pass by passing a flag directly to clang (if you can point out a way to do that, that would already be helpful).
Assuming there is no way to run the pass by giving a flag to clang, I need an extra optimization step in my toolchain, placed between the compilation and linking phases. I need it throughout the whole CMake project.
The commands I would need to generate a binary from two source files are:
clang -c -g -emit-llvm -O3 mySource0.c -o mySource0.bc
clang -c -g -emit-llvm -O3 mySource1.c -o mySource1.bc
llvm-link mySource0.bc mySource1.bc -o main.bc
opt -load myAnalysis.so -myAnalysis main.bc -o main.analysis.bc
clang <libraryRelatedFlags> main.analysis.bc -o myExecutable
My pass is registered as:
static RegisterPass<myAnalysis> X("myAnalysis", "Implement my analysis", false, false);
as in:
http://llvm.org/docs/WritingAnLLVMPass.html#basic-code-required
If I understand your question correctly, you are simply aiming to add your pass so that it is run as part of -O3.
You'll need to edit $(llvm-dir)/tools/opt/opt.cpp to get your pass to run under -O3: find where the OptLevelO3 flag is used to add passes, and add your pass there as well.
If instead you just want your pass to run under its own flag, you'll need to initialize your pass properly on top of registering it. We can look at DependenceAnalysis.cpp as a good example of how to do this:
INITIALIZE_PASS_BEGIN(DependenceAnalysis, "da", "Dependence Analysis", true, true)
INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ScalarEvolution)
INITIALIZE_AG_DEPENDENCY(AliasAnalysis)
INITIALIZE_PASS_END(DependenceAnalysis, "da", "Dependence Analysis", true, true)
You also mentioned that you want your pass to run after some other passes. Simply mark them as dependencies, as DependenceAnalysis does with:
INITIALIZE_PASS_DEPENDENCY(ScalarEvolution)
to make sure that your pass runs AFTER the passes it depends on.
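As a sketch of an alternative that avoids patching opt.cpp: opt can run the standard pass pipelines itself, so you can run -O3 inside opt and schedule your custom pass afterwards. This assumes the myAnalysis.so from the question was built against the same LLVM version as opt; with much newer clang versions you may additionally need -Xclang -disable-O0-optnone so the unoptimized bitcode is not marked optnone:

```shell
# Hedged alternative to editing opt.cpp: let opt run the full -O3
# pipeline, then run the custom pass on the result so it executes
# after every -O3 pass. File names follow the question.
clang -c -g -emit-llvm mySource0.c -o mySource0.bc
clang -c -g -emit-llvm mySource1.c -o mySource1.bc
llvm-link mySource0.bc mySource1.bc -o main.bc
opt -O3 main.bc -o main.opt.bc                        # standard -O3 pipeline
opt -load myAnalysis.so -myAnalysis main.opt.bc -o main.analysis.bc
clang main.analysis.bc -o myExecutable
```

This keeps the pass out of the LLVM tree entirely, at the cost of running the -O3 pipeline in opt rather than inside clang.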

Why does a 2-stage command-line build with clang not generate a dSYM directory?

I have a simple project I want to debug, and I would like to produce a dSYM folder with debugging symbols.
Running:
clang++ -std=c++14 -stdlib=libc++ -g -o Lazy Lazy.cpp
Creates Lazy.dSYM as I expect.
However:
clang++ -std=c++14 -stdlib=libc++ -g -c Lazy.cpp
clang++ -stdlib=libc++ -g -o Lazy Lazy.o
Does not create Lazy.dSYM (It seems that the symbols are embedded in the binary).
Sadly the 2-step build is what my modified makefile does. How can I generate Lazy.dSYM from a 2-stage compile-and-link build?
I don't need a dSYM directory, just debugging symbols, but would like to understand when and why it is created.
The creation of the .dSYM bundle is done by a tool called dsymutil. When Apple added support for DWARF debugging information, they decided to separate "executable linking" from "debug information linking". As such, the debug information linking is not done by the normal linker, it's done by dsymutil.
As a convenience, when you build a program all in one step, the compiler invokes dsymutil on your behalf. That's because it knows it has all of the inputs. If you add the -v (a.k.a. --verbose) option to the compile command, you will see the invocation of dsymutil as the last step it does.
In other cases, though, it doesn't do that. It leaves the debug information linking step for the user to do manually. You can do it by simply issuing the command:
dsymutil <your_program>
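Putting it together, the two-stage build from the question only needs that extra debug-link step at the end (macOS; dsymutil ships with the Xcode tools):

```shell
# Two-stage build with an explicit debug-link step. File names follow
# the question; dsymutil is only available on macOS.
clang++ -std=c++14 -stdlib=libc++ -g -c Lazy.cpp
clang++ -stdlib=libc++ -g -o Lazy Lazy.o
dsymutil Lazy        # produces Lazy.dSYM
```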
Here's an article by an Apple engineer who helped design and implement Apple's support for DWARF explaining their thinking. He also answered a question here on Stack Overflow about this stuff.

Clang's different stages of processing

Similar to GCC, clang supports stopping at different stages when processing C/C++. For example, passing a -E flag causes it to stop after the pre-processor and -c stops before linking.
So far, I am aware of,
-E : pre-processing
-fsyntax-only : syntax checking
-S : assembly
-c : object code
Am I missing any stopping points between those, or is that it?
You can also use -S -emit-llvm to generate LLVM IR assembly (.ll) files and -c -emit-llvm for LLVM bitcode (.bc) object files. These are the language-independent code representations that clang and other LLVM front ends generate and pass to LLVM to compile into an executable.
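For completeness, here is each stopping point exercised on the same file, as a sketch assuming clang is on your PATH (hello.c is a stand-in source file):

```shell
# Every stopping point, from preprocessing through linking.
echo 'int main(void){return 0;}' > hello.c
clang -E hello.c -o hello.i               # preprocess only
clang -fsyntax-only hello.c               # parse and type-check, no output
clang -S -emit-llvm hello.c -o hello.ll   # LLVM IR assembly
clang -c -emit-llvm hello.c -o hello.bc   # LLVM bitcode
clang -S hello.c -o hello.s               # native assembly
clang -c hello.c -o hello.o               # object file
clang hello.o -o hello                    # link
```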

F77: problem compiling with g77 a program that was previously compiled with the Absoft compiler

I am not a Fortran programmer (just a short experience), but I need to compile a program partly written in F77. Someone has compiled it with Absoft compiler before me, but now I need to repeat the procedure on another machine with g77. For Absoft, the makefile has
f77 -f -w -O -B100 -B108 -c *.f
mv *.f flib && mv *.o olib
f77 -B100 -o runme olib/*.o clib/*.o -L/usr/X11R6/lib64 -L/usr/X11R6/lib -lX11 -L$PVM_ROOT/lib/$PVM_ARCH -lfpvm3 -lpvm3 -L$ABSOFT/lib -lU77
I have modified these lines to be
g77 -w -O -B100 -B108 -c *.f
mv *.f flib && mv *.o olib
g77 -B100 -o runme olib/*.o clib/*.o -L/usr/X11R6/lib64 -L/usr/X11R6/lib -lX11 -L$PVM_ROOT/lib/$PVM_ARCH -lfpvm3 -lpvm3 -lgfortran -lgfortranbegin
But I get the following error messages
somefile.f:(.text+0x93): undefined reference to `for_open'
somefile.f:(.text+0xf4): undefined reference to `for_write_seq_fmt'
somefile.f:(.text+0x128): undefined reference to `for_write_seq_fmt_xmit'
somefile.f:(.text+0x454): undefined reference to `for_read_seq'
How can I fix this?
UPDATE1
If I add -libifcore to the end of the last line (linker), then I get
/usr/bin/ld: cannot find -libifcore
I have located the library
$ find /opt/intel/* -name 'libifcore*'
/opt/intel/fce/9.1.036/lib/libifcore.a
/opt/intel/fce/9.1.036/lib/libifcore.so
/opt/intel/fce/9.1.036/lib/libifcore.so.5
/opt/intel/fce/9.1.036/lib/libifcore_pic.a
/opt/intel/fce/9.1.036/lib/libifcoremt.a
/opt/intel/fce/9.1.036/lib/libifcoremt.so
/opt/intel/fce/9.1.036/lib/libifcoremt.so.5
/opt/intel/fce/9.1.036/lib/libifcoremt_pic.a
But even if I do the following in the source directory
$ export PATH=$PATH:/opt/intel/fce/9.1.036/lib/
$ ln -s /opt/intel/fce/9.1.036/lib/libifcore.so
it is not found.
Moreover, it is the same machine where I get another problem: "How to pass -libm to MPICC? libimf.so: warning: feupdateenv is not implemented and will always fail".
It seems that the compiler should find the library, if needed
$ echo $LD_LIBRARY_PATH
/opt/intel/fce/9.1.036/lib:/opt/intel/cce/9.1.042/lib:/usr/local/lib/openmpi:/usr/local/lib:/usr/lib:
Absoft accepted an extended version of Fortran 77 that is not completely compatible with the extended version of Fortran 77 accepted by g77.
So there is no guarantee that you can do this without editing the code. I seem to recall that the Absoft compiler accepted a handy initialization syntax that cannot be replicated with g77.
If you want to compile and link using g77, the easiest way is to use the g77 command itself. (Which compiler does f77 invoke on your computer? Try "f77 -v" or similar to find out.) It should automatically find the g77 Fortran-specific libraries, so you should not need to link to Fortran libraries explicitly, and especially not to the libraries of gfortran, which is a different compiler. You could also compile and link with gfortran; it will probably recognize that the source code is Fortran 77 and compile it appropriately if the files have the correct file type, otherwise you will have to use extra options. For this compiler, use the "gfortran" command.
With g77 and gfortran you should not need the Intel libraries (the undefined for_* symbols come from the Intel Fortran runtime). Maybe f77 is connected to ifort, the Intel compiler, on your computer?
Edited later:
I suggest trying something simpler first to test your setup.
Try this FORTRAN 77 program as file "junk.f"
C234567
      write (6, *) 'Hello World'
      stop
      end
Try this command:
g77 junk.f -o junk.exe
Run it via:
./junk.exe
This will test whether g77 is working.
It looks like you are trying to link with libifcore.
Edit:
You can include this library by adding
'-lifcore' to your compiler options. To quote the gcc tutorial
In general, the compiler option -lNAME will attempt to link object files with a library file ‘libNAME.a’ in the standard library directories.
Why do you use g77 and not gfortran?
What do you mean by multiprocessing: OpenMP or vectorization?
You can use OpenMP with the gfortran compiler, and if you want vectorization like the ifort compiler provides, you have to enable SSE explicitly in the compiler options.
It seems that the problem was an error in one of the source files, which was not a big deal for the Absoft compiler. g77 gave a warning about it, but compiled this file and produced the original errors (mentioned in the question) without producing a binary.
When I tried ifort, compilation of that file was aborted, but other files were compiled and a binary was created.
fortcom: Error: somefile.f, line 703: An extra comma appears in the format list. [)]
& (1p5e12.3,5h ...,))
-------------------------^
compilation aborted for somefile.f (code 1)
When I removed the extra comma, both compilers compiled everything and created binaries, although ifort produced a number of warnings.
Then, when I tried to run both binaries, the one made by the Intel compiler was working fine, but the one made by g77 was behaving very strangely and didn't really do what I wanted.
So now the original problem is resolved; however, the code doesn't run in multiprocessing mode, so the binary is unfortunately useless for me.

GCC dies trying to compile 64bit code on OSX 10.6

I have a brand-new off-the-cd OSX 10.6 installation.
I'd now like to compile the following trivial C program as a 64bit binary:
#include <stdio.h>

int main()
{
    printf("hello world");
    return 0;
}
I invoke gcc as follows:
gcc -m64 hello.c
However, this fails with the following error:
Undefined symbols:
  "___gxx_personality_v0", referenced from:
      _main in ccUAOnse.o
      CIE in ccUAOnse.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
What's going on here? Why is gcc dying?
Compiling without the -m64 flag works fine.
Two things:
I don't think you actually used gcc -m64 hello.c. The error you got is usually the result of doing something like gcc -m64 hello.cc, i.e. using the C compiler driver to compile C++ code.
shell% gcc -m64 hello.c
shell% ./a.out
hello world [added missing newline]
shell% cp hello.c hello.cc
shell% gcc -m64 hello.cc
Undefined symbols:
  "___gxx_personality_v0", referenced from:
      _main in ccYaNq32.o
      CIE in ccYaNq32.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
You can "get this to work" with the following:
shell% gcc -m64 hello.cc -lstdc++
shell% ./a.out
hello world
Second, -m64 is not the preferred way of specifying that you'd like to generate 64-bit code on Mac OS X. The preferred way is to use -arch ARCH, where ARCH is one of ppc, ppc64, i386, or x86_64. There may be more (or fewer) architectures available depending on how your tools are set up (i.e., iPhone ARM, ppc64 deprecated, etc.). Also, on 10.6, gcc defaults to -arch x86_64, i.e. generating 64-bit code by default.
Using this style, it's possible to have the compiler create "fat binaries" automatically: you can pass -arch multiple times. For example, to create a "Universal Binary":
shell% gcc -arch x86_64 -arch i386 -arch ppc hello.c
shell% file a.out
a.out: Mach-O universal binary with 3 architectures
a.out (for architecture x86_64): Mach-O 64-bit executable x86_64
a.out (for architecture i386): Mach-O executable i386
a.out (for architecture ppc7400): Mach-O executable ppc
EDIT: The following was added to answer the OP's question "I did make a mistake and call my file .cc instead of .c. I'm still confused about why this should matter?"
Well... that's a sort of complicated answer. I'll give a brief explanation, but I'll ask that you have a little faith that "there's actually a good reason."
It's fair to say that "compiling a program" is a fairly complicated process. For both historical and practical reasons, when you execute gcc -m64 hello.cc, the work is actually broken up into several discrete steps behind the scenes. These steps, each of which usually feeds its result to the next, are approximately:
Run the C pre-processor, cpp, on the source code that is being compiled. This step is responsible for processing all the #include directives, various #define macro expansions, and other "pre-processing" stuff.
Run the C compiler proper on the pre-processed result. The output of this step is a .s file, i.e. the C code compiled to assembly language.
Run the as assembler on the .s source. This assembles the assembly language into a .o object file.
Run the ld linker on the .o file(s) to link the various compiled object files and various static and dynamically linked libraries into a usable executable.
Note: This is a "typical" flow for most compilers. An individual implementation of a compiler doesn't have to follow the above steps. Some compilers combine multiple steps into one for performance reasons. Modern versions of gcc, for example, don't use a separate cpp pass. The tcc compiler, on the other hand, performs all the above steps in one pass, using no additional external tools or intermediate steps.
In the above, traditional compiler tool chain flow, the cc (or, in our case, gcc) command is called a "compiler driver". It's a "logical front end" to all of the above tools and steps and knows how to intelligently apply all the steps and tools (like the assembler and linker) in order to create a final executable. In order to do this, though, it usually needs to know "the kind of" file it is dealing with. You can't really feed an assembled .o file to the C compiler, for example. Therefore, there are a couple of "standard" .* designations used to specify the "kind" of file (see man gcc for more info):
.c, .h C source code and C header files.
.m Objective-C source code.
.cc, .cp, .cpp, .cxx, .c++ C++ Source code.
.hh C++ header file.
.mm, .M Objective-C++ source code.
.s Assembly language source code.
.o Assembled object code.
.a ar archive or static library.
.dylib Dynamic shared library.
It's also possible to over-ride this "automatically determined file type" using various compiler flags (see man gcc for how to do this), but it's generally MUCH easier to just stick with the standard conventions so that everything "just works" automatically.
And, in a roundabout way, if you had used the C++ "compiler driver", g++, in your original example, you wouldn't have encountered this problem:
shell% g++ -m64 hello.cc
shell% ./a.out
hello world
The reason for this is gcc essentially says "Use C rules when driving the tool chain" and g++ says "Use C++ rules when driving the tool chain". g++ knows that to create a working executable, it needs to pass -lstdc++ to the linker stage, whereas gcc obviously doesn't think this is necessary even though it knew to use the C++ compiler at the "Compile the source code" stage because of the .cc file ending.
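You can reproduce the difference between the two drivers on the same file; a sketch assuming g++ (and therefore cc1plus) is installed:

```shell
# The same C++ file through the C driver and the C++ driver.
cat > hello.cc <<'EOF'
#include <iostream>
int main() { std::cout << "hello world\n"; return 0; }
EOF
gcc hello.cc -o a_gcc 2>&1 | head -2   # typically fails at link time
gcc hello.cc -lstdc++ -o a_gcc          # works: C driver + explicit C++ runtime
g++ hello.cc -o a_gpp                   # works: g++ links libstdc++ itself
./a_gpp
```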
Some of the other C/C++ compilers available to you on Mac OS X 10.6 by default: gcc-4.0, gcc-4.2, g++-4.0, g++-4.2, llvm-gcc, llvm-g++, llvm-gcc-4.0, llvm-g++-4.0, llvm-gcc-4.2, llvm-g++-4.2, clang. These tools (usually) swap out the first two steps in the tool chain flow and use the same lower-level tools like the assembler and linker. The llvm- compilers use the gcc front end to parse the C code and turn it into an intermediate representation, and then use the llvm tools to transform that intermediate representation into code. Since the llvm tools use a "low-level virtual machine" as their near-final output, they allow for a richer set of optimization strategies, the most notable being that they can perform optimizations across different, already compiled .o files. This is typically called link time optimization. clang is a completely new C compiler that also targets the llvm tools as its output, allowing for the same kinds of optimizations.
So, there you go. The not so short explanation of why gcc -m64 hello.cc failed for you. :)
EDIT: One more thing...
It's a common "compiler driver technique" to have commands like gcc and g++ sym-link to the same "all-in-one" compiler driver executable. Then, at run time, the compiler driver checks the path and file name that was used to create the process and dynamically switch rules based on whether that file name ends with gcc or g++ (or equivalent). This allows the developer of the compiler to re-use the bulk of the front end code and then just change the handful of differences required between the two.
I don't know why this happens (it works fine for me), but:
Try compiling with g++, or link against libstdc++. ___gxx_personality_v0 is a symbol used by GNU C++ exception handling (stack unwinding and destructor cleanup), so for some reason C++ code is creeping into your C code. Or
Remove the -m64 flag. Binaries generated by GCC 4.2 on 10.6 default to 64-bit as far as I know. You can check by running file on the output and ensuring it reads "Mach-O 64-bit executable x86_64". Or
Reinstall the latest Xcode from http://developer.apple.com/technology/xcode.html.
OK:
Adding -m64 doesn't do anything; a normal gcc with no options produces a 64-bit compile.
If you are really just off-the-CD, then you should update and install a new Xcode.
Your program works fine for me, with or without -m64, on 10.6.2.
We had a similar issue when working with CocoaPods, when creating and using a pod that contained Objective-C++ code, so I think it's also worth mentioning:
You should edit the .podspec of the pod that contains the C++ code and add a line like:
s.xcconfig = {
'OTHER_LDFLAGS' => '$(inherited) -lstdc++',
}