Where can I find the opcode numbers for the LLVM bitcode? - llvm

Where can I find the LLVM bitcode representation of the LLVM IR language?
Like this <result> = add <ty> <op1>, <op2>, but in binary form, like this except for LLVM instead of the JVM. More specifically, I want the opcode numbers so I can study the bitcode at the binary level.

I think the canonical source for the LLVM bitcode encodings is this file:
llvm-src/include/llvm/Bitcode/LLVMBitCodes.h
from the llvm source which can be found here: http://llvm.org/releases/
You may also want to look at the code in llvm-src/lib/Bitcode/Reader, which reads bitcode.

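You can find the opcode numbers in include/llvm/IR/Instruction.def (include/llvm/Instruction.def in older releases such as the one linked below). Note that these are the opcodes of the in-memory IR; the codes written into bitcode files are the ones defined in LLVMBitCodes.h.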
https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_14/include/llvm/Instruction.def
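For studying a .bc file at the byte level, here is a minimal sketch. It assumes a raw bitcode file (no wrapper header), which opens with the magic bytes 'B', 'C', 0xC0, 0xDE, and it lists the binary-operator codes as they appear in the BinaryOpcodes enum of LLVMBitCodes.h; check the header of your own LLVM version before relying on the numbers. The helper names looks_like_bitcode and binop_name are made up for illustration.

```python
# Magic number that opens a raw LLVM bitcode file: 'B', 'C', 0xC0, 0xDE.
BITCODE_MAGIC = bytes([0x42, 0x43, 0xC0, 0xDE])

# Binary-operator codes as listed in the BinaryOpcodes enum of
# LLVMBitCodes.h (assumption: verify against your LLVM version).
# Some codes also serve the floating-point form (for example sdiv/fdiv);
# the operand type in the record tells the two apart.
BINARY_OPCODES = {
    0: "add",
    1: "sub",
    2: "mul",
    3: "udiv",
    4: "sdiv",
    5: "urem",
    6: "srem",
    7: "shl",
    8: "lshr",
    9: "ashr",
    10: "and",
    11: "or",
    12: "xor",
}


def looks_like_bitcode(data: bytes) -> bool:
    """Return True if the buffer starts with the raw bitcode magic number."""
    return data[:4] == BITCODE_MAGIC


def binop_name(code: int) -> str:
    """Map a bitcode binary-operator code to a readable name."""
    return BINARY_OPCODES.get(code, "unknown")
```

Keep in mind that bitcode is a bitstream with variable-width fields, so these codes show up as operands of records (for binary operators, FUNC_CODE_INST_BINOP) rather than as byte-aligned values you can spot in a hex dump.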

Related

Can object code be converted back to LLVM IR?

Object code can be disassembled into an assembly language. Is there a way to turn object code or an executable into LLVM IR?
I mean, yes, you can convert machine language to LLVM IR. The IR is Turing-complete, meaning it can compute whatever some other Turing-complete system can compute. At worst, you could have an LLVM IR representation of an x86 emulator, and just execute the machine code given as a string.
But your question specifically asked about converting "back" to IR, in the sense of the IR result being similar to the original IR. And the answer is, no, not really. The machine language code will be the result of various optimization passes, and there's no way to determine what the code looked like before that optimization. (arrowd mentioned McSema in a comment, which does its best, but in general the results will be very different from the original code.)

Generate binary code (shared library) from embedded LLVM in C++

I am working on a high-performance system written in C++. At runtime, the process needs to understand some complex logic (rules) written in a simple language developed for this application. We have two options:
1. Interpret the logic: run an embedded interpreter and generate a dynamic function call which, when it receives data, works on the data according to the interpreted logic.
2. Compile the logic into a plugin.so shared library, then use dlopen and dlsym to load the plugin and call the logic function at runtime.
Option 2 looks really attractive because it would be optimized machine code and would run much faster than an interpreter embedded in the process.
The steps I am exploring are:
1. Write a compile method: string compile( string logic, list & errors, list & warnings )
   Here the input logic is a string containing logic coded in our custom language. The method generates LLVM IR and returns it as a string.
2. Write a link method: bool link(string ir, string filename, list & errors, list & warnings)
   I searched the LLVM documentation for this, but I have not been able to find out whether such a method can be written.
If I am correct, LLVM IR is converted to LLVM bytecode or assembly code. Then either the LLVM JIT is used to run it in JIT mode, or the GNU Assembler is used to generate native code.
Is there a function in LLVM that does this? It would be much nicer if it were all done from within the code rather than using a system command from C++ to invoke "as" to generate the plugin.so file.
Please let me know if you know of any way I can generate a native shared library from my process at runtime.
llc is an LLVM tool that translates LLVM IR to binary code. I think that is all you need.
Basically, you can produce your LLVM IR however you want and then run llc over it.
You can call it from the command line, or you can read the implementation of llc to find out how it works and do the same in your own program.
Here is a useful link:
http://llvm.org/docs/CommandGuide/llc.html
I hope it helps.
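As a rough sketch of the command-line route, the helper below builds the two commands involved: llc emitting a position-independent object file, and a compiler driver linking it into a shared library. It is written in Python only to keep it short; the same two commands can be launched from any host program. It assumes llc and a C compiler driver named cc are on the PATH, and the function names and the file names logic.ll, logic.o and plugin.so are hypothetical.

```python
import subprocess


def build_plugin_commands(ir_path, so_path, obj_path="logic.o"):
    """Return the command lines that turn an IR file into a shared library."""
    # llc: emit a native object file, compiled as position-independent code
    # so that it can be linked into a shared library.
    llc_cmd = ["llc", "-filetype=obj", "-relocation-model=pic",
               ir_path, "-o", obj_path]
    # Assumption: a system compiler driver called "cc" performs the link.
    link_cmd = ["cc", "-shared", obj_path, "-o", so_path]
    return [llc_cmd, link_cmd]


def build_plugin(ir_path, so_path):
    """Run both steps; raises CalledProcessError if either command fails."""
    for cmd in build_plugin_commands(ir_path, so_path):
        subprocess.run(cmd, check=True)
```

Calling build_plugin("logic.ll", "plugin.so") would then leave a plugin.so that can be opened with dlopen. Doing the same without spawning processes means using the LLVM libraries directly, which is what reading the llc source is useful for.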

Add comments to LLVM IR?

Is it possible to add comments into a BasicBlock? I only want that when I print out the IR for debugging I can have a few comments that help me. That is, I fully expect them to be lost once I pass them to the optimizer.
No, it's not possible directly. Comments, by which you probably mean the lexical elements beginning with a semicolon (;) in the textual IR representation, have no representation in the in-memory IR (and binary bitcode). As you probably know, LLVM IR has three equivalent representations (in memory API level, textual "assembly" level, binary bitcode level). Once the LLVM assembly IR parser reads the code into memory, comments are lost.
What you could do, however, is use metadata for this purpose. You can create arbitrary metadata attached to any instruction, as well as global module-level metadata. This is a hack, for sure, but if you really think you need some sort of annotation, metadata is the way. LLVM uses metadata for a number of annotation needs, like debug info and alias analysis annotations.
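To make the difference concrete, here is a small illustration. The function below is a naive stand-in for what happens to comments when textual IR is read (it is not the LLVM lexer): everything from a semicolon to the end of the line is dropped, unless the semicolon sits inside a quoted string. The sample IR attaches a metadata node under a made-up kind name, !note, which is a hypothetical choice for this example.

```python
def strip_ir_comments(text: str) -> str:
    """Drop ';' comments from textual IR, leaving quoted strings intact."""
    out_lines = []
    for line in text.splitlines():
        in_string = False
        kept = []
        for ch in line:
            if ch == '"':
                # Quotes inside IR strings are written as \22, so a bare
                # double quote always opens or closes a string.
                in_string = not in_string
            elif ch == ';' and not in_string:
                break  # the rest of the line is a comment
            kept.append(ch)
        out_lines.append("".join(kept).rstrip())
    return "\n".join(out_lines)


# Hypothetical fragment: one instruction with a comment and a metadata
# attachment, plus the metadata node carrying the annotation text.
SAMPLE = '''%sum = add i32 %a, %b, !note !0 ; running total
!0 = !{!"running total; kept as metadata"}'''

print(strip_ir_comments(SAMPLE))
```

The trailing comment disappears, while the !note attachment and the string inside the metadata node remain, which is why metadata can carry annotations through a read and print round trip (until a pass drops it).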

Converting GCC IR to LLVM IR

I have written an LLVM pass that modifies the Intermediate Representation (IR) code. To increase portability, I also want it to work with a gcc compiler. So I was wondering if there is any tool which can convert some Intermediate Representation (IR) of gcc to LLVM IR.
You probably want DragonEgg (which uses the GCC front end to produce LLVM IR).
And if you wanted to work on the GCC internal representations, MELT (a high level domain specific language to extend GCC) is probably the right tool.
It will probably be much easier to simply write another version of your code that works with gcc IR. What you want to do is likely not possible, and if it is possible, it's probably extremely difficult. (More so than writing the LLVM pass in the first place.)

llvm ir back to human-readable source language?

Is there an easy way of going from llvm ir to working source code?
Specifically, I'd like to start with some simple C++ code that merely modifies PODs (mainly arrays of ints, floats, etc), convert it to llvm ir, perform some simple analysis and translation on it and then convert it back into C++ code?
I don't really mind about any of the names getting mangled; I'd just like to be able to hack about with the source before doing the machine-dependent optimisations.
There are a number of options, actually. The two that you'll probably be interested in are -march=c and -march=cpp, which are options to llc.
Run:
llc -march=c -o code.c code.ll
This will convert the LLVM IR in code.ll back to C and put it in code.c.
Also:
llc -march=cpp -o code.cpp code.ll
This is different from the C output engine. It actually writes out C++ code that can be run to reconstruct the IR. I use this personally to embed LLVM IR in a program without having to deal with parsing bitcode files or anything.
-march=cpp has more options you can see with llc --help, such as -cppgen= which controls how much of the IR the output C++ reconstructs.
CppBackend was removed in r268631 (2016-05-05), so -march=cpp is no longer available; -march=c is gone as well.
There is an issue here... it might not be possible to easily represent the IR back into the language.
I mean, you'll probably be able to get some representation, but it might be less readable.
The issue is that the IR is not concerned with high-level semantics, and without them...
I'd rather advise you to learn to read the IR. I can read a bit of it without that much effort, and I am far from being a llvm expert.
Otherwise, you can generate C code from the IR. It won't be much closer to your C++ code, but you'll perhaps feel better without SSA and phi nodes.