How to generate machine code with llvm - c++

I'm currently working on a compiler project using llvm. I have followed various tutorials to the point where I have a parser to create a syntax tree and then the tree is converted into an llvm Module using the provided IRBuilder.
My goal is to create an executable, and I am confused about what to do next. All the tutorials I've found just create the llvm module and print out the assembly using Module.dump(). Additionally, the only documentation I can find is for llvm developers, and not end users of the project.
If I want to generate machine code, what are the next steps? The llvm-mc project looks like it may do what I want, but I can't find any sort of documentation on it.
Perhaps I'm expecting llvm to do something that it doesn't. My expectation is that I can build a Module, then there would be an API that I can call with the Module and a target triple and an object file will be produced. I have found documentation and examples on producing a JIT, and I am not interested in that. I am looking for how to produce compiled binaries.
I am working on OS X, if that has any impact.

Use llc -filetype=obj to emit a linkable object file from your IR. You can look at the code of llc to see the LLVM API calls it makes to emit such code. At least for Mac OS X and Linux, the objects emitted in such a manner should be pretty good (i.e. this is not an "alpha quality" option by now).
LLVM does not contain a linker (yet!), however. So to actually link this object file into some executable or shared library, you will need to use the system linker. Note that even if you have an executable consisting of a single object file, the latter has to be linked anyway. Developers in the LLVM community are working on a real linker for LLVM, called lld. You can visit its page or search the mailing list archives to follow its progress.
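The two steps described above (emit an object file with llc, then hand it to the system linker) can be sketched as a small helper that builds the two command lines. This is only an illustration: the function name buildCommands and the default use of cc as the linker driver are assumptions, not part of LLVM. The -filetype=obj and -o options are existing llc options.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not an LLVM API): builds the shell commands that turn
// an LLVM bitcode/IR file into an executable using llc and the system
// compiler driver, which in turn invokes the system linker.
std::vector<std::string> buildCommands(const std::string &irFile,
                                       const std::string &objFile,
                                       const std::string &exeFile,
                                       const std::string &driver = "cc") {
    assert(!irFile.empty() && !objFile.empty() && !exeFile.empty());
    std::vector<std::string> commands;
    // Step 1: llc lowers the IR to a native object file.
    commands.push_back("llc -filetype=obj " + irFile + " -o " + objFile);
    // Step 2: the compiler driver links the object into an executable.
    commands.push_back(driver + " " + objFile + " -o " + exeFile);
    return commands;
}
```

Each returned string could then be passed to std::system, for example for buildCommands("test.bc", "test.o", "a.out"). Paths containing spaces would need quoting, which this sketch does not attempt.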

As you can read in the llc guide, it is indeed intended to just generate the assembly, and then "The assembly language output can then be passed through a native assembler and linker to generate a native executable" - e.g. the GNU assembler (as) and linker (ld).
So the main answer here is to use native tools for assembling and linking.
However, there's experimental support for generating the native object directly from an IR file, via llc:
-filetype - Choose a file type (not all types are supported by all targets):
    =asm - Emit an assembly ('.s') file
    =obj - Emit a native object ('.o') file [experimental]
Or you can use llvm-mc to assemble it from the .s file:
-filetype - Choose an output file type:
    =asm - Emit an assembly ('.s') file
    =null - Don't emit anything (for timing purposes)
    =obj - Emit a native object ('.o') file
I don't know about linkers, though.
In addition, I recommend checking out the tools/bugpoint/ToolRunner.h file, which exposes a wrapper combining llc and the platform's native C toolchain for generating machine code. From its header comment:
This file exposes an abstraction around a platform C compiler, used to compile C and assembly code.

Check out these functions in llvm-c/TargetMachine.h:
/** Emits an asm or object file for the given module to the filename. This
    wraps several c++ only classes (among them a file stream). Returns any
    error in ErrorMessage. Use LLVMDisposeMessage to dispose the message. */
LLVMBool LLVMTargetMachineEmitToFile(LLVMTargetMachineRef T, LLVMModuleRef M,
    char *Filename, LLVMCodeGenFileType codegen, char **ErrorMessage);
/** Compile the LLVM IR stored in \p M and store the result in \p OutMemBuf. */
LLVMBool LLVMTargetMachineEmitToMemoryBuffer(LLVMTargetMachineRef T, LLVMModuleRef M,
    LLVMCodeGenFileType codegen, char** ErrorMessage, LLVMMemoryBufferRef *OutMemBuf);

To run the example BrainF program, compile it and run:
echo ,. > test.bf
./BrainF test.bf -o test.bc
llc -filetype=obj test.bc
gcc test.o -o a.out
./a.out
then type a single letter and press Enter. It should echo that letter back to you. (That's what ,. does.)
The above was tested with LLVM version 3.5.0.

Related

How can I isolate the compiled logic of a C++ program from the rest of the file header data?

I want to observe the difference in the compiled op code output between two versions of a very basic C++ program, for example 2 + 2 = ?, with no libraries called. Being new to compiled programs, I expected the compiled output to be a tiny file of binary op codes with a few small headers, but the headers are large.
simple.cpp
int main()
{
    unsigned int a = 2;
    unsigned int b = 2;
    unsigned int c = a + b;
}
compiler:
g++ -std=c++0x simple.cpp -o simple
Is there a format that I can export to that doesn't contain headers, just op code binary that we instruct the machine to execute? If not, what bytes or location in the resulting file can I look for to isolate the relevant logic from the program?
I need the machine code, not assembly, since my project is the analysis of differently obfuscated versions of a source file to attempt recognizing one based on the other. A complex subject with questionable feasibility, but nevertheless that's why I'm asking to isolate the machine code and not just the assembly - to test analysis against the true machine code outputs.
I tried googling the header structure but can't seem to find much info.
According to the ld(1) man page (GNU linker), you can use the --oformat=output-format option to specify the output format.
binary is a format that doesn't have headers.
The gcc(1) man page (GNU project C/C++ compiler) shows that the -Wl option passes options through to the linker.
The -nostdlib option is also useful for keeping the standard startup files and libraries out of the output.
Combining these, you can try this command:
g++ -std=c++0x simple.cpp -nostdlib -Wl,--oformat=binary -o simple
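Once two header-free binaries exist, comparing them comes down to reading the raw bytes and finding the positions that differ. A minimal sketch of that comparison follows. The function names readBytes and diffOffsets are hypothetical, and the sketch assumes both files are small enough to load into memory.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file into a byte vector (empty if the file cannot be opened).
std::vector<unsigned char> readBytes(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Returns the offsets at which the two byte sequences differ. Offsets past
// the end of the shorter sequence are reported as differences too.
std::vector<std::size_t> diffOffsets(const std::vector<unsigned char> &a,
                                     const std::vector<unsigned char> &b) {
    std::vector<std::size_t> offsets;
    const std::size_t longest = a.size() > b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < longest; ++i) {
        const bool inBoth = i < a.size() && i < b.size();
        if (!inBoth || a[i] != b[i]) offsets.push_back(i);
    }
    assert(offsets.size() <= longest);  // cannot report more than one per byte
    return offsets;
}
```

A caller would pass the results of readBytes for the two output files to diffOffsets. A plain positional comparison like this is only a starting point: an inserted instruction shifts every later byte, so an alignment-aware diff may be needed in practice.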

llvm JIT add library to module

I am working on a JIT that uses LLVM. The language has a small run-time written in C++ which I compile down to LLVM IR using clang
clang++ runtime.cu --cuda-gpu-arch=sm_50 -c -emit-llvm
and then load the *.bc files, generate additional IR, and execute on the fly. The reason for the CUDA stuff is that I want to add some GPU acceleration to the runtime. However, this introduces CUDA-specific external functions, which give errors such as:
LLVM ERROR: Program used external function 'cudaSetupArgument' which could not be resolved!
As discussed here, this is usually solved by including the appropriate libraries when compiling the program:
g++ main.c cmal.o -L/usr/local/cuda/lib64 -lcudart
However, I am not sure how to include libraries in JITed modules using LLVM. I found this question, which suggested that it used to be possible to add libraries to modules in the JIT like this:
[your module]->addLibrary("m");
Unfortunately, this has been deprecated. Can anyone tell me the best way to do this now? Let me know if I need to provide more information!
Furthermore, I am not really sure if this is the best way to be incorporating GPU offloading into my JIT, so if anyone can point me to a better method then please do :)
Thanks!
EDIT: I am using LLVM 5.0 and the JIT engine I am using is from llvm/ExecutionEngine/ExecutionEngine.h, more specifically I create it like this:
EngineBuilder EB(std::move(module));
ExecutionEngine *EE = EB.create(targetMachine);
You need to teach your JIT engine about other symbols explicitly.
If they are in a dynamic library (dylib, so, dll) then you can just call
sys::DynamicLibrary::LoadLibraryPermanently("path_to_some.dylib")
with a path to the dynamic library.
If the symbols are in an object file or an archive, then it requires a bit more work: you would need to load them into memory and add to the ExecutionEngine using its APIs.
Here is an example for an object file:
std::string objectFileName("some_object_file.o");
ErrorOr<std::unique_ptr<MemoryBuffer>> buffer =
    MemoryBuffer::getFile(objectFileName.c_str());
if (!buffer) {
    // handle error
}
Expected<std::unique_ptr<ObjectFile>> objectOrError =
    ObjectFile::createObjectFile(buffer.get()->getMemBufferRef());
if (!objectOrError) {
    // handle error
}
std::unique_ptr<ObjectFile> objectFile(std::move(objectOrError.get()));
auto owningObject = OwningBinary<ObjectFile>(std::move(objectFile),
                                             std::move(buffer.get()));
executionEngine.addObjectFile(std::move(owningObject));
For archives replace template types ObjectFile with Archive, and call
executionEngine.addArchive(std::move(owningArchive));
at the end.

How to use SWIG for D from C++ on Windows?

I want to use LEAP Motion in D.
However, it has no C library, only a C++ library.
I tried SWIG 2.0.9 with the command below.
swig -c++ -d -d2 leap.i
This command outputs Leap.d, Leap_im.d, Leap_wrap.cxx, and Leap_wrap.h.
However, I don't know how to use the wrapper in D, and I can't find any instructions for it.
Using the generated files as they are produces link errors.
How do I use these wrappers in D2?
And can I use them without Leap.cpp (the source of Leap.dll)?
Update:
Thanks for the two answers, and sorry for the late reply; I have been busy.
To state the conclusion first: I was able to build the Leap sample code on Win64 by following the steps below.
Generate the wrappers with the command above.
Create an x64 DLL with VC2010 from Leap_wrap.cxx and Leap_wrap.h, linking against Leap.lib (x64).
Compile Leap.d and Leap_im.d with dmd -c.
Build LeapTest.d with Leap.obj and Leap_im.obj.
All the commands are below.
swig -c++ -d -d2 leap.i
dmd -c Leap.d Leap_im.d -m64
dmd LeapTest.d Leap.obj Leap_im.obj -m64
Then execute LeapTest.exe (this requires the x64 Leap.dll and Leap_wrap.dll).
I could run the Leap program.
But the program crashes in the onFrame event callback.
I'll try again on x86 and investigate the cause.
A few helpful links (some information may be outdated):
http://klickverbot.at/blog/2010/11/announcing-d-support-in-swig/
http://www.swig.org/Doc2.0/D.html
http://www.swig.org/tutorial.html
I have never used SWIG personally, but my guess, based on general knowledge about SWIG, is:
Leap_wrap.cxx is a C++ source file that wraps calls to C++ functions from the target library in extern(C) calls
Leap_wrap.h is a header file with all the extern(C) wrappers listed
Leap_im.d is a D module based on Leap_wrap.h with the same extern(C) functions listed
Leap.d is a D module that uses Leap_im.d as an implementation and reproduces an API similar to the original C++ one.
So in your D code you want to import the Leap.d module. Then compile Leap_wrap.cxx to an object file with your C++ compiler and provide the D object files, Leap_wrap.o, and the target library at the linking stage. That should do the trick.
P.S. Leap.cpp source should not be needed. All stuff links directly from Leap_wrap.cxx to target library binary.
Go to IRC, either FreeNode or OFTC, channel #D. In order to help you, we have to see what is in those files. My first guess is that you have to compile both D files and the C++ file into object files, and link them together. I suppose SWIG is going to flatten the C++ API into a bunch of C functions, and that is probably what Leap_wrap.cxx does.
If the LEAP API is not complex (i.e. just a bunch of simple C++ classes), it may be possible to interface with it directly. Read more about it here: http://dlang.org/cpp_interface.html .
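To make the "flattening" idea from the answers above concrete, here is a hand-written sketch of what a C-linkage wrapper around a C++ class can look like. Every name in it (Counter, Counter_new, and so on) is invented for illustration. The code SWIG actually generates in Leap_wrap.cxx uses its own naming scheme and extra bookkeeping.

```cpp
#include <cassert>

// A stand-in for a class from a C++-only library.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}
    int increment() { return ++value_; }
private:
    int value_;
};

// C-linkage wrappers: unmangled names and an opaque pointer, so that a
// language with a C FFI can call into the class.
extern "C" {
void *Counter_new(int start) { return new Counter(start); }

int Counter_increment(void *self) {
    assert(self != nullptr);
    return static_cast<Counter *>(self)->increment();
}

void Counter_delete(void *self) { delete static_cast<Counter *>(self); }
}
```

On the D side, functions like these would be declared extern(C) and then wrapped in a D class, which is roughly the split between Leap_im.d and Leap.d guessed at above.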

Is it possible to symbolicate C++ code?

I have been running into trouble recently trying to symbolicate a crash log of an iOS app. For some reason the UUID of the dSYM was not indexed in Spotlight. After some manual search and a healthy dose of command line incantations, I managed to symbolicate partially the crash log.
At first I thought the dSYM might be incomplete or something like that, but then I realized that the method calls missing were the ones occurring in C++ code: this project is an Objective-C app that calls into C++ libraries (via Objective-C++) which call back to Objective-C code (again, via Objective-C++ code). The calls that I'm missing are, specifically, the ones that happen in C++ land.
So, my question is: is there some way that the symbolication process can resolve the function calls of C++ code? Which special options do I need to set, if any?
One useful program that comes with the Apple SDK is atos (address to symbol). Basically, here's what you want to do:
atos -o myExecutable -arch armv7 0x(address here)
It should print out the name of the symbol at that address.
I'm not well versed in Objective-C, but I'd make sure that the C++ code is being compiled with symbols. Particularly, did you make sure to include -rdynamic and/or -g when compiling the C++ code?
Try
dwarfdump --lookup=0xYOUR_ADDRESS YOUR_DSYM_FILE
You will have to look up each address manually (or write a script to do this), but if the symbols are OK (your dSYM file is bigger than, say, 20 MB) this will do the job.
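The per-address lookup described above can be scripted. As a rough sketch, the helper below pulls every hexadecimal address out of a line of text and builds one dwarfdump command per address, using the --lookup option shown above. The function names are hypothetical, running the commands is left to the caller, and the sketch assumes the addresses are already the ones the dSYM expects (no load-address adjustment is done).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Extracts tokens of the form 0x<hex digits> from a line of text.
std::vector<std::string> extractAddresses(const std::string &line) {
    std::vector<std::string> addresses;
    std::size_t pos = 0;
    while ((pos = line.find("0x", pos)) != std::string::npos) {
        std::size_t end = pos + 2;
        while (end < line.size() &&
               std::isxdigit(static_cast<unsigned char>(line[end]))) {
            ++end;
        }
        // Keep the token only if at least one hex digit followed "0x".
        if (end > pos + 2) addresses.push_back(line.substr(pos, end - pos));
        pos = end;
    }
    return addresses;
}

// Builds one "dwarfdump --lookup=..." command per address found in the line.
std::vector<std::string> lookupCommands(const std::string &line,
                                        const std::string &dsymPath) {
    assert(!dsymPath.empty());
    std::vector<std::string> commands;
    for (const std::string &addr : extractAddresses(line)) {
        commands.push_back("dwarfdump --lookup=" + addr + " " + dsymPath);
    }
    return commands;
}
```

Crash-log lines that contain more than one hexadecimal value would yield one command per value, so a real script would need to pick the relevant one.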

dll Export/init problem (static vars init?) Visual Studio C++

I want to run an example plugin for Clang/LLVM, specifically llvm\tools\clang\examples\PrintFunctionNames. I managed to build it and I see a PrintFunctionNames.exports file, but I don't think Visual Studio supports it. The file is simply _ZN4llvm8Registry*. I have no idea what that is, but I suspect it's namespace llvm, class Registry, which is defined as
template <typename T, typename U = RegistryTraits<T> >
class Registry {
I suspect the key line is at the end of the example file
static FrontendPluginRegistry::Add<PrintFunctionNamesAction> X("print-fns", "print function names");
print-fns is the name, while the second parameter is the description. When I try loading/running the DLL via
clang -cc1 -load printFunctionNames.dll -plugin print-fns a.c
I get an error about not finding print-fns. I suspect it's because the static variable is never initialized, so it never registers the plugin. (A wrong DLL name would instead produce an error about loading the module.)
I created a .def file and added it to my project. It compiled, but still no luck. Here is my .def file:
LIBRARY printFunctionNames
EXPORTS
X DATA
How do I register the plugin or get this example working?
OK, this is becoming slightly clearer. To summarize: Visual Studio has nothing to do with it, really. This is a plugin for the clang executable. Therefore, there must be a method to communicate between them (the plugin interface). This appears to be an undocumented interface, so it's taking a bit of guesswork.
Troubleshooting DLL issues is done with "Dependency Walker" aka "Depends". It offers a profiling mode, in which all symbol lookups can be profiled. I.e. if you profile clang -cc1 -load printFunctionNames.dll -plugin print-fns a.c, you will see what symbols clang expects from your DLL, and in what order.
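For background on why the static variable in the question matters: a line of the form Add<...> X(...) works because constructing a static object runs its constructor when the module is loaded, and that constructor records the plugin in a registry. The sketch below is a simplified stand-in for that pattern using only the standard library. PluginRegistry, Plugin, and HelloPlugin are hypothetical names, and this is not LLVM's actual Registry implementation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct Plugin {
    virtual ~Plugin() = default;
    virtual std::string run() = 0;
};

class PluginRegistry {
public:
    using Factory = std::function<std::unique_ptr<Plugin>()>;

    // A function-local static avoids depending on initialization order
    // between translation units.
    static std::map<std::string, Factory> &entries() {
        static std::map<std::string, Factory> table;
        return table;
    }

    // Constructing an Add<T> object registers T under the given name.
    template <typename T>
    struct Add {
        Add(const std::string &name, const std::string &description) {
            assert(!name.empty());
            (void)description;  // kept only to mirror the two-argument form
            entries()[name] = [] { return std::unique_ptr<Plugin>(new T()); };
        }
    };

    // Returns a new plugin instance, or nullptr if the name is unknown.
    static std::unique_ptr<Plugin> create(const std::string &name) {
        auto it = entries().find(name);
        if (it == entries().end()) return nullptr;
        return it->second();
    }
};

struct HelloPlugin : Plugin {
    std::string run() override { return "hello"; }
};

// Registration happens as a side effect of initializing this static object.
static PluginRegistry::Add<HelloPlugin> X("print-fns", "print function names");
```

In a scheme like this, if the static initializer never runs in the process that owns the registry, or the plugin ends up with its own separate copy of the registry, the name is simply absent from the table, which would be consistent with the "not finding print-fns" symptom.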
It looks like you're trying to mix C++ code built with two different, incompatible compilers. That's not supported, and the error you're seeing is a typical sign of that: C++ compilers usually use a "name mangling scheme", and if two compilers are incompatible then their name mangling schemes don't line up. One compiler may mangle llvm::Registry as _ZN4llvm8Registry* while another refers to it as llvm__Registry.
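The _ZN4llvm8Registry prefix follows the Itanium C++ ABI scheme used by GCC and Clang: after _ZN, each name component is written as its length followed by its characters. A deliberately tiny sketch that splits such a prefix into its components is shown below. The function name splitNestedName is hypothetical, and it handles only this simple nested-name case, not templates, substitutions, or complete symbols.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits "_ZN<len><name><len><name>..." into its name components.
// Returns an empty vector if the input does not start with "_ZN".
std::vector<std::string> splitNestedName(const std::string &mangled) {
    std::vector<std::string> parts;
    if (mangled.compare(0, 3, "_ZN") != 0) return parts;
    std::size_t i = 3;
    while (i < mangled.size() &&
           std::isdigit(static_cast<unsigned char>(mangled[i]))) {
        // Parse the decimal length prefix of the next component.
        std::size_t len = 0;
        while (i < mangled.size() &&
               std::isdigit(static_cast<unsigned char>(mangled[i]))) {
            len = len * 10 + static_cast<std::size_t>(mangled[i] - '0');
            ++i;
        }
        assert(i <= mangled.size());
        if (len == 0 || len > mangled.size() - i) break;  // malformed length
        parts.push_back(mangled.substr(i, len));
        i += len;
    }
    return parts;
}
```

Applied to the string from the .exports file, this yields the components llvm and Registry, matching the guess in the question. A symbol produced by a compiler with a different mangling scheme would not start with _ZN at all.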