I have the following tasks that I need to do as part of my research idea:
1. Parse the C file(s) at hand to get llvm-IR.
2. Do analysis on the IR. Possibly add and remove some instructions or basic blocks.
3. Emit either x86 executable or C (need to decide later)
I think this is quite a common task for anyone writing analyses on C. I want to do all these tasks in C/C++ (as most of our research code is in C/C++). I googled a lot; plenty of documentation is available on Tasks 2 and 3, but much less on Task 1. Any ideas on this would be really helpful.
I want to hook these tasks up as a pipeline; any suggestions for this are also welcome.
-Thanks
(1) can be done by using Clang to emit LLVM IR.
(2) can be done by writing your own LLVM pass, and later invoking it (with any other passes you're interested in) via LLVM's opt tool.
(3) (to x86) can be done by LLVM's llc tool.
All of these are also accessible as APIs, not only as command-line tools, which makes it possible to incorporate them into your pipeline.
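As a rough sketch of how the three steps could be wired together from C++, the helper below only builds the command lines for clang, opt and llc. The file stem, the plugin path and the pass flag are placeholders made up for illustration, and the `opt -load` form assumes the legacy pass manager (newer LLVM releases use `-load-pass-plugin` and `-passes=` instead).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Inputs for the pipeline. All values are placeholders chosen by the caller;
// none of them come from the question.
struct PipelineConfig {
    std::string stem;        // file name without extension, e.g. "input" for input.c
    std::string passPlugin;  // shared object containing your pass (hypothetical path)
    std::string passFlag;    // flag your pass registers with opt (hypothetical name)
};

// Returns the three command lines in order:
//   (1) C source -> LLVM bitcode, (2) run the custom pass, (3) bitcode -> assembly.
std::vector<std::string> buildPipeline(const PipelineConfig& cfg) {
    assert(!cfg.stem.empty());
    const std::string src = cfg.stem + ".c";
    const std::string bc = cfg.stem + ".bc";
    const std::string optBc = cfg.stem + ".opt.bc";
    const std::string asmOut = cfg.stem + ".s";
    return {
        "clang -emit-llvm -c " + src + " -o " + bc,
        // Legacy pass-manager syntax; adjust for the LLVM version you use.
        "opt -load " + cfg.passPlugin + " " + cfg.passFlag + " " + bc + " -o " + optBc,
        "llc " + optBc + " -o " + asmOut,
    };
}
```

Each returned string can be handed to std::system in turn, stopping at the first non-zero exit status. Once the pipeline works this way, the individual steps can be replaced by the corresponding library calls.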
How can I pass command line arguments to my program when profiling with llvm-prof?
And where can I find more comprehensive documentation for llvm-prof? The "llvm-prof -help" output is too brief, and its manual is even shorter.
I would recommend staying away from llvm-prof at this point. The reason is that it was actually removed from trunk LLVM a month ago (in revision 191835). Here is the commit message that should clarify the motivation:
Remove the very substantial, largely unmaintained legacy PGO
infrastructure.
This was essentially work toward PGO based on a design that had several
flaws, partially dating from a time when LLVM had a different
architecture, and with an effort to modernize it abandoned without being
completed. Since then, it has bitrotted for several years further. The
result is nearly unusable, and isn't helping any of the modern PGO
efforts. Instead, it is getting in the way, adding confusion about PGO
in LLVM and distracting everyone with maintenance on essentially dead
code. Removing it paves the way for modern efforts around PGO.
Among other effects, this removes the last of the runtime libraries from
LLVM. Those are being developed in the separate 'compiler-rt' project
now, with somewhat different licensing specifically more approriate for
runtimes.
The way you wrote the question implies that you are trying to execute your program using llvm-prof itself, but I don't think that is how it is meant to be used. To profile, first instrument your code with counters using:
opt -disable-opt -insert-edge-profiling -o program.profile.bc program.bc
Then execute the instrumented program using lli as follows:
lli -O0 -fake-argv0 'program.bc < YOUR_ARGS' -load llvm/Debug+Asserts/lib/libprofile_rt.so program.profile.bc
Note how the arguments are passed to the program using -fake-argv0 'program.bc < YOUR_ARGS' above. This step will generate an llvmprof.out file, which can then be read with llvm-prof to generate the execution profile as follows:
llvm-prof program.profile.bc
I am working on a project where I need to track changes to a particular set of variables in any given application code in order to model memory access patterns.
I can think of two approaches mainly, please give your thoughts on them.
My initial thought is to do it the way many profilers such as gprof do: add instrumentation code to the target application before compilation, then analyze the log generated by this instrumentation code to get the required information.
To accomplish this, I can only think of some sort of source-to-source compiler (same language in and out) that parses the given code and injects instrumentation code into the application, which I can later compile and run to get the required logs.
Does this seem right or am I over-engineering? If not, are there tools that let me build a source-source compiler (relatively) easily?
I read about GDB's support for Python, so I am wondering whether I can write a Python script that reads the set of variables from a config file, sets watchpoints, and logs every time there is a write to one of the watched variables. I tried to use this GDB feature, but on my Ubuntu machine it doesn't seem to be working for now.
http://sourceware.org/gdb/onlinedocs/gdb/Python.html#Python
Also, the applications are going to be written in nesC (I guess nesC is converted to C during compilation), and they are going to run on TOSSIM like native apps on my computer.
See my paper on instrumenting code using a program transformation system (PTS); a PTS is a very general kind of "source-to-source compiler".
It shows how to install probes in code in a pretty straightforward way, once you have a grammar for the language of interest. The underlying tool, DMS, makes it fairly easy to define the grammar too.
Normally, if you want to modify LLVM IR, you need to write a pass. However, writing a pass yourself is sometimes overkill if a higher-level tool could do the job.
For example, someone might wish to log every load and store in the program. For that purpose, they would need to inject code that does the logging. If there were a higher-level tool, it could provide callbacks for us to fill in. In this case, for example, it could provide OnLoad and OnStore functions that we implement to tell the tool what to do on each load and store. Does such a tool exist?
So basically I want something similar to what is provided by Dynamic Binary Instrumentation tools but that works with LLVM, for compile time code injection.
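To make the request concrete, here is a minimal sketch of the callback style described above. It is a toy model only: `ToyInst`, `Hooks` and `instrument` are hypothetical names, and the instruction list is a stand-in for real LLVM IR, not an LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for an IR instruction; real LLVM instructions are far richer.
enum class Op { Load, Store, Other };

struct ToyInst {
    Op op;
    std::string operand;  // name of the memory location touched, if any
};

// Hypothetical callback interface: the tool walks the code and the user
// only fills in the hooks they care about.
struct Hooks {
    std::function<void(const ToyInst&)> onLoad;
    std::function<void(const ToyInst&)> onStore;
};

// Visits every instruction in order and fires the matching hook if it is set.
// Returns the number of hooks that were fired.
std::size_t instrument(const std::vector<ToyInst>& code, const Hooks& hooks) {
    std::size_t fired = 0;
    for (const ToyInst& inst : code) {
        if (inst.op == Op::Load && hooks.onLoad) {
            hooks.onLoad(inst);
            ++fired;
        } else if (inst.op == Op::Store && hooks.onStore) {
            hooks.onStore(inst);
            ++fired;
        }
    }
    return fired;
}
```

In a real tool the walker would be an LLVM pass that inserts calls to the logging code at each load and store; here the hooks are simply invoked while walking the list.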
I think you should consider using PIN instead of LLVM for such things: http://www.pintool.org/
PIN lets you insert instrumentation/analysis code at several levels of granularity: instruction, basic block, function, trace, and even load/unload of shared libraries. It may be much more practical, since you won't need to compile the application, so you can analyze programs that aren't open source, for example.
There are versions of PIN for Windows and Linux.
PS: Another tool that seems useful: http://eces.colorado.edu/~blomsted/llvmpin/llvmpin.html
I'm using the LLVM C++ API mostly as a code generator for a scripting language that is parsed and evaluated (generating code, compiling, and executing it) at runtime. Currently I'm investigating future use cases in the context of a distributed/concurrent system and wonder if and how these use cases could be implemented. Maybe you can share your thoughts:
1) Is there a way to generate LLVM code on one node in a distributed
system, serialize it to some wire format, send it to another node,
compile or recompile it there and then execute it? I'm already stuck
finding methods to serialize a module/function.
2) Are there ways to enable multi-threaded code
generation/compilation within the same LLVMContext, i.e., a pool of
threads shares an LLVMContext and generates/executes code within this
context simultaneously? What I found out so far is that there should
be an LLVMContext for each thread in this case. However, can I then
share a module between the different contexts and, relating to 1),
how could I move generated code from one module to the other?
You can definitely use LLVM bitcode format to forward the code from one node to another. See include/llvm/Bitcode/ReaderWriter.h and around for more info. You can also check the sources of LLVM tools to see how the bitcode is serialized and deserialized. You might find http://llvm.org/docs/BitCodeFormat.html useful.
I was reading here and there that LLVM can be used to ease the pain of cross-platform compilation in C++. I tried to read the documentation, but I didn't understand how I can use it for real-life development problems. Can someone please explain in simple words how I can use it?
The key concept of LLVM is a low-level "intermediate" representation (IR) of your program.
This IR is at about the level of assembler code, but it contains more information to facilitate optimization.
The power of LLVM comes from its ability to defer compilation of this intermediate representation to a specific target machine until just before the code needs to run. A just-in-time (JIT) compilation approach can be used for an application to produce the code it needs just before it needs it.
In many cases, you have more information at the time the program is running than you do back at head office, so the program can be optimized much further.
To get started, you could compile a C++ program to a single intermediate representation, then compile it to multiple platforms from that IR.
You can also try the Kaleidoscope demo, which walks you through creating a new language without having to write a whole compiler yourself; you just generate the IR.
In performance-critical applications, the application can essentially write its own code that it needs to run, just before it needs to run it.
Why don't you go to the LLVM website and check out all the documentation there? They explain in great detail what LLVM is and how to use it. For example, they have a Getting Started page.
LLVM is, as its name says, a low-level virtual machine that has a code generator. If you want to compile to it, you can use either the gcc front end or clang, which is a C/C++ compiler for LLVM that is still a work in progress.
It's important to note that a lot of information about the target comes from the system header files that you use when compiling. LLVM does not defer resolving things like "size of pointer" or "byte layout", so if you compile with 64-bit headers for a little-endian platform, you cannot use that LLVM IR to produce 32-bit big-endian assembly output later.
There is a good chapter in a book explaining everything nicely here: www.aosabook.org/en/llvm.html