I was working on a project to convert a C+ACSL language into another language, by first converting it to LLVM IR and then translating that into the target language. I cannot say much about the target language because it belongs to a group of people who do not want to disclose it, but it is very close to C. I have completed the work up to LLVM IR and have also written code to convert it back, with the help of CBackend, but I don't know how to use it, that is, how I should run it on my LLVM IR. Is there a command in LLVM for doing this, or something else that can help me?
UPDATE 1
My input is LLVM IR and the output is going to be C-like code, not C, as it will follow a syntax different from C.
Related
Object code can be disassembled into an assembly language. Is there a way to turn object code or an executable into LLVM IR?
I mean, yes, you can convert machine language to LLVM IR. The IR is Turing-complete, meaning it can compute whatever some other Turing-complete system can compute. At worst, you could have an LLVM IR representation of an x86 emulator, and just execute the machine code given as a string.
But your question specifically asked about converting "back" to IR, in the sense of the IR result being similar to the original IR. And the answer is, no, not really. The machine language code will be the result of various optimization passes, and there's no way to determine what the code looked like before that optimization. (arrowd mentioned McSema in a comment, which does its best, but in general the results will be very different from the original code.)
I am a beginner and want to build a translator that can convert LLVM bitcode to Java bytecode.
Can somebody please explain briefly, or list the major steps for, how to go about it?
In our company (Altimesh), we did the same thing for CIL. For Java Bytecode, the task is likely very similar.
I can tell you it's quite a long task.
First thing: the LLVM libraries are written in C++.
That means you either have to learn C++ and find a way to generate Java bytecode from C++, or export the symbols you need from the LLVM libraries through JNI. I strongly recommend the second option, as you'll get a pure Java implementation (and you'll soon figure out that you don't need that many symbols from the LLVM API).
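For illustration, here is a rough sketch of what one such exported symbol could look like on the C++ side; the Java class name LlvmBridge and the handle-based design are made up for this example, not anything LLVM provides:
#include <jni.h>
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/SourceMgr.h"

// Exposed to Java as a static native method LlvmBridge.parseModule(String);
// returns the Module* as an opaque handle the Java side passes back later.
extern "C" JNIEXPORT jlong JNICALL
Java_LlvmBridge_parseModule(JNIEnv* env, jclass, jstring path)
{
    static llvm::LLVMContext context;
    const char* filename = env->GetStringUTFChars(path, nullptr);
    llvm::SMDiagnostic diag;
    llvm::Module* module = llvm::parseIRFile(filename, diag, context).release();
    env->ReleaseStringUTFChars(path, filename);
    return reinterpret_cast<jlong>(module);
}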
Once you figured that out, you need to:
Parse modules from files
Here is a simple example (using the LLVM 3.9 API, which is quite old now):
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/SourceMgr.h"

llvm::Module* llvm__Module_buildFromFile(llvm::LLVMContext* context, const char* filename)
{
    // Read the file into a buffer, then parse it as IR (no error handling shown here).
    llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> buf = llvm::MemoryBuffer::getFile(filename);
    llvm::SMDiagnostic diag;
    return llvm::parseIR(buf->get()->getMemBufferRef(), diag, *context).release();
}
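As a rough illustration of how the rest of the translator consumes that (assuming the function above; emitJavaBytecodeFor is a hypothetical emitter, not an LLVM API), once you have a Module you mostly just walk functions, basic blocks and instructions:
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"

int main()
{
    llvm::LLVMContext context;
    llvm::Module* module = llvm__Module_buildFromFile(&context, "input.ll");

    // The core loop of a transpiler: visit every instruction and emit target code for it.
    for (llvm::Function& F : *module) {
        for (llvm::BasicBlock& BB : F) {
            for (llvm::Instruction& I : BB) {
                // e.g. emitJavaBytecodeFor(I);  -- hypothetical per-opcode emitter
                (void)I;
            }
        }
    }
}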
Parse debug info
#include "llvm/IR/DebugInfo.h"

void llvm__DebugInfoFinder__processModule(llvm::DebugInfoFinder* self, llvm::Module* M)
{
    // Collect the compile units, subprograms, types, etc. referenced by the module.
    self->processModule(*M);
}
Debug info, or metadata in general, is quite a pain with LLVM, as it changes very frequently (compared to instructions). So you either have to stick to one LLVM version (probably a bad choice), or update your code as soon as a new LLVM release comes out.
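Once the finder has processed a module, you can walk what it collected, roughly like this (method names are from the 3.9-era API and may differ in newer releases):
#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/Support/raw_ostream.h"

void dumpSubprograms(llvm::Module* M)
{
    llvm::DebugInfoFinder finder;
    finder.processModule(*M);

    // Source-level names and line numbers of the functions in the module.
    for (const llvm::DISubprogram* SP : finder.subprograms()) {
        llvm::outs() << SP->getName() << " defined at line " << SP->getLine() << "\n";
    }
}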
Once you're there, most of the pain is behind you, and you enter the world of fun.
I strongly recommend starting with something very, very simple, such as a simple addition program.
Then always keep two windows open: Godbolt showing the input LLVM IR you need to parse, and a Java window showing the target (here is an example for MSIL).
Once you're able to transpile your first program (hurrah, I can add two integers :)), you will soon want to transpile more stuff, and soon you will face two insanities:
getelementptr. This is how arrays, memory, structures and so on are accessed in LLVM. It is a pretty magic instruction.
phi. A crucial instruction in the LLVM system, as it enables static single assignment (SSA) form, which is fairly important for the backend (register allocator and co). I don't know about Java bytecode, but this was obviously not available in MSIL. A minimal dispatch sketch covering both of these instructions follows below.
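Here is that sketch: just the skeleton of how a translator typically recognizes and dispatches on these two instructions via the LLVM API; the actual lowering logic in the comments is the hard part and is left out:
#include "llvm/IR/Instructions.h"
#include "llvm/Support/Casting.h"

void translateInstruction(llvm::Instruction& I)
{
    if (auto* gep = llvm::dyn_cast<llvm::GetElementPtrInst>(&I)) {
        // Base pointer plus a chain of struct-field / array indices:
        // walk gep->idx_begin() .. gep->idx_end() and fold them into an address computation.
        (void)gep;
    } else if (auto* phi = llvm::dyn_cast<llvm::PHINode>(&I)) {
        // One incoming value per predecessor block:
        // phi->getIncomingValue(i) / phi->getIncomingBlock(i), for i < phi->getNumIncomingValues().
        // Typically lowered by inserting copies at the end of each predecessor.
        (void)phi;
    } else {
        // ...all the other opcodes.
    }
}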
Once all of that is done, you enter the endless world of pain of special cases, weird C constructs you didn't know about, GCC extensions and so on...
Anyway good luck!
I've developed an LLVM front end that generates LLVM IR as the target code from some source language X. If I extend this front end to embed debug information within the generated IR, is it possible to use LLDB for debugging my source language? I mean, does LLDB support any source language targeting LLVM IR?
You'll have to get a DWARF language code and get lldb to recognize it. If we get some DWARF for an unknown language, we'll just ignore it...
Beyond that, with no further support, some things will work and others won't.
If you emit correct line table information, you should be able to map back to your source, and that should get stepping working as well. Other things start getting hard.
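For the line table part, a front end typically attaches it through DIBuilder. A minimal sketch follows; exact signatures vary quite a bit between LLVM versions, so treat this as the shape rather than the letter:
#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"

void attachDebugInfo(llvm::Module& M)
{
    llvm::DIBuilder DIB(M);
    llvm::DIFile* file = DIB.createFile("main.x", "/path/to/project");

    // The DWARF language code is what lldb looks at first; for a custom language you
    // would need your own code, or you can borrow a C-family one as a starting point.
    DIB.createCompileUnit(llvm::dwarf::DW_LANG_C, file, "my-x-frontend",
                          /*isOptimized=*/false, /*Flags=*/"", /*RuntimeVersion=*/0);

    // ...then create a DISubprogram per function and set each instruction's debug
    // location (DILocation::get) so every IR instruction maps back to a source line.

    M.addModuleFlag(llvm::Module::Warning, "Debug Info Version",
                    DEBUG_METADATA_VERSION);
    DIB.finalize();
}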
The next hard part is how you are going to tell lldb about your type information. lldb uses Clang ASTs as the internal storage for type information in the debugger. lldb translates DWARF type information into Clang ASTs both for printing local variables (with the frame variable command) and for use with the expression parser.
If your language has a type system that looks kind of like C, lldb should be able to parse the DWARF for your types. That, plus correct variable information, should get frame variable working.
The expression parser (i.e. the expression, print or po commands) requires that lldb have a parser for your language. That can be a pretty big chunk of work.
Is there an easy way of going from llvm ir to working source code?
Specifically, I'd like to start with some simple C++ code that merely modifies PODs (mainly arrays of ints, floats, etc.), convert it to LLVM IR, perform some simple analysis and translation on it, and then convert it back into C++ code.
I don't really mind about any of the names getting mangled; I'd just like to be able to hack about with the source before doing the machine-dependent optimisations.
There are a number of options, actually. The two that you'll probably be interested in are -march=c and -march=cpp, which are options to llc.
Run:
llc -march=c -o code.c code.ll
This will convert the LLVM bitcode in code.ll back to C and put it in code.c.
Also:
llc -march=cpp -o code.cpp code.ll
This is different from the C output engine. It will actually write out C++ code that, when run, reconstructs the IR. I use this personally to embed LLVM IR in a program without having to deal with parsing bitcode files or anything.
-march=cpp has more options you can see with llc --help, such as -cppgen= which controls how much of the IR the output C++ reconstructs.
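To give a feel for it, the generated C++ is essentially a long sequence of LLVM API calls. Written by hand, the same idea looks roughly like the following; this is a made-up example that rebuilds a trivial add function, not actual CppBackend output:
#include <memory>
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"

std::unique_ptr<llvm::Module> reconstruct(llvm::LLVMContext& ctx)
{
    auto M = std::make_unique<llvm::Module>("code", ctx);
    llvm::IRBuilder<> B(ctx);

    // define i32 @add(i32, i32)
    auto* FT = llvm::FunctionType::get(B.getInt32Ty(), {B.getInt32Ty(), B.getInt32Ty()}, false);
    auto* F = llvm::Function::Create(FT, llvm::Function::ExternalLinkage, "add", M.get());
    B.SetInsertPoint(llvm::BasicBlock::Create(ctx, "entry", F));

    auto args = F->arg_begin();
    llvm::Value* a = &*args++;
    llvm::Value* b = &*args;
    B.CreateRet(B.CreateAdd(a, b, "sum"));
    return M;
}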
CppBackend was removed. There have been no -march=cpp and -march=c options since 2016-05-05, r268631.
There is an issue here... it might not be possible to easily map the IR back into the source language.
I mean, you'll probably be able to get some representation, but it might be less readable.
The issue is that the IR is not concerned with high-level semantics, and without them...
I'd rather advise you to learn to read the IR. I can read a bit of it without that much effort, and I am far from being an LLVM expert.
Otherwise, you can generate C code from the IR. It won't be much more similar to your C++ code, but you'll perhaps feel better without SSA and phi nodes.
I was reading here and there about LLVM and how it can be used to ease the pain of cross-platform compilation in C++. I was trying to read the documents, but I didn't understand how I can
use it for real-life development problems. Can someone please explain to me, in simple words, how I can use it?
The key concept of LLVM is a low-level "intermediate" representation (IR) of your program.
This IR is at about the level of assembler code, but it contains more information to facilitate optimization.
The power of LLVM comes from its ability to defer compilation of this intermediate representation to a specific target machine until just before the code needs to run. A just-in-time (JIT) compilation approach can be used for an application to produce the code it needs just before it needs it.
In many cases, you have more information at the time the program is running than you do back at head office, so the program can be optimized much further.
To get started, you could compile a C++ program to a single intermediate representation, then compile it to multiple platforms from that IR.
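For example, something along these lines (the exact triples depend on what your LLVM build supports, and see the caveat further down about target assumptions baked in at compile time):
clang++ -S -emit-llvm hello.cpp -o hello.ll
llc -mtriple=x86_64-unknown-linux-gnu hello.ll -o hello-x86_64.s
llc -mtriple=aarch64-unknown-linux-gnu hello.ll -o hello-aarch64.s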
You can also try the Kaleidoscope demo, which walks you through creating a new language without having to actually write a compiler, just write the IR.
In performance-critical applications, the application can essentially write its own code that it needs to run, just before it needs to run it.
Why don't you go to the LLVM website and check out all the documentation there. They explain in great detail what LLVM is and how to use it. For example they have a Getting Started page.
LLVM is, as its name says, a low-level virtual machine, and it has a code generator. If you want to compile to it, you can use either the GCC front end or Clang, which is a C/C++ compiler for LLVM that is still a work in progress.
It's important to note that a bunch of information about the target comes from the system header files that you use when compiling. LLVM does not defer resolving things like "size of pointer" or "byte layout", so if you compile with 64-bit headers for a little-endian platform, you cannot use that LLVM code to target a 32-bit big-endian assembly output.
There is a good chapter in a book explaining everything nicely here: www.aosabook.org/en/llvm.html