When should I use "volatile" for LLVM IR? - llvm

In which scenarios should I care about it/place it for LLVM?
I have read the following doc but need more detailed examples, if anyone could help. What does volatile mean exactly in LLVM?
Citation: Volatile Memory Accesses, where "volatile" is defined in LLVM.
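One common scenario, sketched here as an assumption rather than taken from the cited doc: memory whose value can change, or whose accesses have side effects, outside the program's view (for example a memory-mapped device register). In C or C++ you mark such an object volatile, and Clang normally lowers each access to a `load volatile` or `store volatile` in the IR. The optimizer must then keep the number of volatile accesses and their order relative to each other. The names below (fake_status_register, read_twice_sum, set_status) are hypothetical, and an ordinary global stands in for a real device register.

```cpp
// Stand-in for a device register; real code would use a fixed MMIO address.
static volatile int fake_status_register = 0;

// Writes the register; Clang typically emits one `store volatile i32` here.
void set_status(int value) {
    fake_status_register = value;
}

// Reads the register twice. Because the object is volatile, the two reads
// are expected to stay two separate `load volatile i32` instructions; without
// volatile, the optimizer would be free to merge them into a single load.
int read_twice_sum() {
    int first = fake_status_register;
    int second = fake_status_register;
    return first + second;
}
```

Compiling this with clang -O2 -S -emit-llvm and searching the output for "volatile" is one way to check what your Clang version actually emits.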

Related

Clang LLVM disable va_arg expansion

I am writing an LLVM tool that works on the LLVM IR bitcode generated by Clang. For va_arg, Clang expands it into getelementptr instructions with fixed positions and memory layout instead of using the va_arg instruction. Is there any compiler flag to disable this expansion?
AFAIK, no, because variable argument handling is platform-specific.
Moreover, I tried to use the VA instructions from LLVM IR, and they sometimes resulted in wrong machine code. There are a lot of intricacies there, and as far as I know that's why the IR VA instructions are going to be deprecated.
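For reference, a minimal variadic function you could use to observe the expansion yourself. This is only a sketch; sum_ints is a hypothetical name. Compiling it with clang -S -emit-llvm should show how your target's ABI handling is spelled out in the IR. On x86-64 this is typically explicit address arithmetic rather than a va_arg instruction, though the exact output depends on target and Clang version.

```cpp
#include <cstdarg>

// Sums `count` int arguments passed after the fixed parameter.
int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);           // initialise the platform-specific va_list
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);  // this is the part Clang expands per ABI
    }
    va_end(args);
    return total;
}
```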

Can object code be converted back to LLVM IR?

Object code can be disassembled into an assembly language. Is there a way to turn object code or an executable into LLVM IR?
I mean, yes, you can convert machine language to LLVM IR. The IR is Turing-complete, meaning it can compute whatever some other Turing-complete system can compute. At worst, you could have an LLVM IR representation of an x86 emulator, and just execute the machine code given as a string.
But your question specifically asked about converting "back" to IR, in the sense of the IR result being similar to the original IR. And the answer is, no, not really. The machine language code will be the result of various optimization passes, and there's no way to determine what the code looked like before that optimization. (arrowd mentioned McSema in a comment, which does its best, but in general the results will be very different from the original code.)

Add comments to LLVM IR?

Is it possible to add comments into a BasicBlock? I only want that when I print out the IR for debugging I can have a few comments that help me. That is, I fully expect them to be lost once I pass them to the optimizer.
No, it's not possible directly. Comments, by which you probably mean the lexical elements beginning with a semicolon (;) in the textual IR representation, have no representation in the in-memory IR (and binary bitcode). As you probably know, LLVM IR has three equivalent representations (in memory API level, textual "assembly" level, binary bitcode level). Once the LLVM assembly IR parser reads the code into memory, comments are lost.
What you could do, however, is use metadata for this purpose. You can create arbitrary metadata attached to any instruction, as well as global module-level metadata. This is a hack, for sure, but if you really think you need some sort of annotation, metadata is the way. LLVM uses metadata for a number of annotation needs, like debug info and alias analysis annotations.

how can one see content of registers with c++?

Using gdb, anyone can see the contents of any register, for example:
x/x $ebp + 0x4
print $eax
I wonder, can I do the same thing with just C++? If yes, how?
C++ does not specify any particular machine architecture; therefore, it cannot do anything standard related to (machine-specific) registers. You'll have to check your compiler's documentation to see if doing these kinds of things is supported.
I believe the only way you can do this is to use assembly language to access the registers - but that's non-portable.
There's a good thread on the subject here:
http://bytes.com/topic/c/answers/626071-how-access-processor-registers
and I asked a question a while back about usage of assembly in C which would show you the basics (in the solutions) here:
How does C code call assembly code (e.g. optimized strlen)?
You can probably do this with inline assembler if your compiler supports it. http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Extended-Asm.html
You can use inline assembler along with the mov instruction, but every compiler has its own syntax for this (and the asm syntax is not always the same either).
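As a rough illustration of the inline-assembler approach, here is a sketch that reads the stack pointer using GCC/Clang extended asm. It assumes a GCC-compatible compiler and the default AT&T syntax on x86 (it would need changes with -masm=intel or with MSVC). The function name read_stack_pointer is hypothetical, and the last branch falls back to the GCC/Clang builtin __builtin_frame_address, which returns the frame address rather than the exact stack pointer.

```cpp
#include <cstdint>

// Returns the current stack pointer (or, in the fallback case, the frame
// address) as an integer. Non-portable by design.
std::uintptr_t read_stack_pointer() {
    std::uintptr_t sp;
#if defined(__x86_64__)
    asm volatile("mov %%rsp, %0" : "=r"(sp));   // copy RSP into a C++ variable
#elif defined(__i386__)
    asm volatile("mov %%esp, %0" : "=r"(sp));   // 32-bit x86 equivalent
#elif defined(__aarch64__)
    asm volatile("mov %0, sp" : "=r"(sp));      // AArch64 equivalent
#else
    // Other targets: approximate with the current frame address.
    sp = reinterpret_cast<std::uintptr_t>(__builtin_frame_address(0));
#endif
    return sp;
}
```

The same pattern works for other registers, but the value you read is only meaningful at that exact point in the generated code, since the compiler is free to use general-purpose registers however it likes around the asm statement.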

Learning to read GCC assembler output

I'm considering picking up some very rudimentary understanding of assembly. My current goal is simple: VERY BASIC understanding of GCC assembler output when compiling C/C++ with the -S switch for x86/x86-64.
Just enough to do simple things such as looking at a single function and verifying whether GCC optimizes away things I expect to disappear.
Does anyone have/know of a truly concise introduction to assembly, relevant to GCC and specifically for the purpose of reading, and a list of the most important instructions anyone casually reading assembly should know?
You should use GCC's -fverbose-asm option. It makes the compiler output additional information (in the form of comments) that make it easier to understand the assembly code's relationship to the original C/C++ code.
If you're using gcc or clang, the -masm=intel argument tells the compiler to generate assembly with Intel syntax rather than AT&T syntax, and the --save-temps argument tells the compiler to save temporary files (preprocessed source, assembly output, unlinked object file) in the directory GCC is called from.
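A small function is enough to try these options out. The sketch below uses a hypothetical file name (example.cpp) and function name (scale_by_four); the commands in the comments combine the flags mentioned above with -S and -O2. What the assembly actually looks like depends on your compiler version and target.

```cpp
// example.cpp (hypothetical file name)
// Possible invocations:
//   g++ -O2 -S -fverbose-asm example.cpp -o example.s
//   g++ -O2 -S -masm=intel example.cpp -o example.s
//   g++ -O2 --save-temps -c example.cpp

// With optimization on, the multiplication by 123 is never observed, so you
// would expect no trace of it in the output; the multiplication by 4 is
// often turned into a shift or an lea on x86.
int scale_by_four(int x) {
    int unused = x * 123;  // result is never used
    (void)unused;
    return x * 4;
}
```

Comparing the -O0 and -O2 output for the same function is a quick way to see which parts the optimizer removed.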
Getting a superficial understanding of x86 assembly should be easy with all the resources out there. Here's one such resource: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html .
You can also just use disasm and gdb to see what a compiled program is doing.
I usually hunt down the processor documentation when faced with a new device, and then just look up the opcodes as I encounter ones I don't know.
On Intel, thankfully the opcodes are somewhat sensible. PowerPC not so much in my opinion. MIPS was my favorite. For MIPS I borrowed my neighbor's little reference book, and for PPC I had some IBM documentation in a PDF that was handy to search through. (And for Intel, mostly I guess and then watch the registers to make sure I'm guessing right! heh)
The assembly itself is easy. It basically does three things: move data between memory and registers, operate on data in registers, and change the program counter. Mapping between your language of choice and the assembly will require some study (e.g. learning how to recognize a virtual function call), and for this an "integrated" source and disassembly view (like you can get in Visual Studio) is very useful.
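To make those three categories concrete, here is a small loop annotated with the kind of instruction each line usually turns into in unoptimized x86 output. The function name count_positive is hypothetical and the comments are a rough guide only: an optimizing compiler may vectorize the loop or replace the branch with a conditional move.

```cpp
#include <cstddef>

// Counts the strictly positive entries in data[0..n).
int count_positive(const int* data, std::size_t n) {
    int count = 0;
    for (std::size_t i = 0; i < n; ++i) {  // loop: compare plus a jump back (changes the program counter)
        int value = data[i];               // move: load from memory into a register
        if (value > 0) {                   // compare, then usually a conditional jump
            ++count;                       // operate: an add on a register
        }
    }
    return count;
}
```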
"casually reading assembly" lol (nicely)
I would start by following along in gdb at run time; you get a better feel for what's happening. But then maybe that's just me. It will disassemble a function for you (disass func), then you can single-step through it.
If you are doing this solely to check the optimizations - do not worry.
a) the compiler does a good job
b) you won't be able to understand what it is doing anyway (nobody can)
Unlike higher-level languages, there's really not much (if any) difference between being able to read assembly and being able to write it. Instructions have a one-to-one relationship with CPU opcodes -- there's no complexity to skip over while still retaining an understanding of what the line of code does. (It's not like a higher-level language where you can see a line that says "print $var" and not need to know or care about how it goes about outputting it to screen.)
If you still want to learn assembly, try the book Assembly Language Step-by-Step: Programming with Linux, by Jeff Duntemann.
I'm sure there are introductory books and web sites out there, but a pretty efficient way of learning it is actually to get the Intel references and then try to do simple stuff (like integer math and Boolean logic) in your favorite high-level language and then look what the resulting binary code is.