LLVM function hoisting best approach

I would like to have function hoisting in my language, so that something like this works:
foo();
void foo() {
// Do stuff
}
What is the best approach for that with LLVM? Is there a pass that can be used? Would I need to write my own pass? Would it be easier to rearrange the AST before getting to LLVM?
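One common approach, offered here as a suggestion rather than an answer from the original thread, needs no LLVM pass at all: handle hoisting in the front end with two passes. LLVM IR does not require a function to be defined before it is called, so you can first create a declaration for every function in the module (for example with llvm::Function::Create) and only then emit the bodies. The sketch below models the idea with a hypothetical toy AST (Item, declare_all and codegen are illustrative names, not LLVM APIs) so that it runs without LLVM.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy AST: each top-level item is either a call statement
// or a function definition whose body contains calls.
struct Item {
    enum Kind { Call, FunctionDef } kind;
    std::string name;                // callee, or the name being defined
    std::vector<std::string> calls;  // calls inside a FunctionDef body
};

// Pass 1: declare every function before generating any code
// (the role llvm::Function::Create plays for declarations).
std::set<std::string> declare_all(const std::vector<Item>& program) {
    std::set<std::string> declared;
    for (const Item& item : program)
        if (item.kind == Item::FunctionDef) declared.insert(item.name);
    return declared;
}

// Pass 2: generate "code"; calls are checked against the complete set of
// declarations, so a call may precede the callee's definition.
std::vector<std::string> codegen(const std::vector<Item>& program) {
    const std::set<std::string> declared = declare_all(program);
    std::vector<std::string> out;
    auto emit_call = [&](const std::string& callee) {
        if (declared.count(callee) == 0)
            throw std::runtime_error("undefined function: " + callee);
        out.push_back("call " + callee);
    };
    for (const Item& item : program) {
        if (item.kind == Item::Call) {
            emit_call(item.name);
        } else {
            out.push_back("define " + item.name);
            for (const std::string& callee : item.calls) emit_call(callee);
        }
    }
    return out;
}
```

Because the declaration table is complete before any body is generated, the call to foo in the first item resolves even though foo is defined afterwards; a call to a name that is never defined is still reported as an error.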

Related

LLVM Clang C++ code injection

I am a bit confused about implementing a code injection function in LLVM Clang. Basically, I want to insert a function call before a variable or a pointer is declared in the source code. Example:
#include <iostream>
int main() {
int a;
return 0;
}
to
#include <iostream>
int main() {
foo();
int a;
return 0;
}
I read the LLVM docs to find an answer but couldn't. Please help me.
Thank you in advance.
The first step is to decide whether you want to do this in Clang or in LLVM. Although they are "connected", they are not the same thing. In Clang you can do it at the AST level, in which case you need to write a recursive AST visitor, use it to identify the function definitions you want to instrument, and insert AST nodes that call your foo function. This only works for functions whose source code Clang compiles.
There is information on how to write such a visitor here:
https://clang.llvm.org/docs/RAVFrontendAction.html
In LLVM you could write a function pass that inserts code into each function. This works for any function that is compiled to LLVM IR, regardless of the source language.
How to write an LLVM pass:
http://llvm.org/docs/WritingAnLLVMPass.html
However, while this may seem trivial at first, there are some interesting quirks. In an LLVM function, the alloca instructions should come first (in the entry block), so you would have to insert your call after them. There may also be functions that "shouldn't be instrumented": for example, if your function foo prints something using cout << something;, it would be a terrible idea to insert a call to foo into the operator<<(ostream&, ...) family of functions... ;) And you obviously don't want to instrument foo itself, or any function it calls.
Clang also provides ways to determine whether a piece of source is in the "main file" or in a header file, although that may not be enough in your case. In LLVM it is much harder to determine "which function is this".
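The entry-point logic described above can be modeled without LLVM. This sketch treats a function as a hypothetical list of instruction strings for its entry block, inserts the call after the leading allocas, and skips foo itself plus a caller-supplied list of functions foo depends on. In a real function pass you would do the equivalent with LLVM's IR-building APIs; ToyFunction and instrument are illustrative names only.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an LLVM function: its name and the textual
// instructions of its entry block.
struct ToyFunction {
    std::string name;
    std::vector<std::string> entry_block;
};

// Insert a call to foo at function entry, after any leading allocas.
// foo itself and anything on the skip list are left alone.
void instrument(ToyFunction& fn, const std::set<std::string>& skip) {
    if (fn.name == "foo" || skip.count(fn.name) != 0) return;
    auto it = fn.entry_block.begin();
    while (it != fn.entry_block.end() && it->find("= alloca ") != std::string::npos)
        ++it;  // keep the allocas grouped at the top of the entry block
    fn.entry_block.insert(it, "call void @foo()");
}
```

The skip list is where the hard part hides: deciding which functions foo (transitively) calls is exactly the "which function is this" problem mentioned above.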

Dead virtual function elimination

Question
(Can I get clang or perhaps some other optimizing tool shipped with LLVM to identify unused virtual functions in a C++ program, to mark them for dead code elimination? I guess not.)
If there is no such functionality shipped with LLVM, how would one go about implementing a thing like this? What's the most appropriate layer to achieve this, and where can I find examples on which I could build this?
Thoughts
My first thought was an optimizer working on LLVM bitcode or IR. After all, a lot of optimizers are written for that representation. Simple dead code elimination is easy enough: any function which is neither called nor has its address taken and stored somewhere is dead code and can be omitted from the final binary. But a virtual function has its address taken and stored in the virtual function table of the corresponding class. In order to identify whether that function has a chance of getting called, an optimizer would not only have to identify all virtual function calls, but also identify the type hierarchy to map these virtual function calls to all possible implementations.
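The reasoning above is essentially class hierarchy analysis (CHA): a virtual implementation is live only if some virtual call site, whose static type is its class or a base of it, could dispatch to it. The sketch below applies that rule to hypothetical inputs describing the foo/bar example given later in this question (destructors are modeled as a method called dtor). It is not an LLVM API: the hierarchy and call sites would have to be recovered from the front end or from metadata, only single inheritance is handled, and it does not check which classes are ever instantiated (rapid type analysis is the usual refinement for that).

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical inputs: single inheritance, one entry per class.
struct ClassInfo {
    std::string parent;             // "" for a root class
    std::set<std::string> defines;  // virtual methods this class implements
};
using Hierarchy = std::map<std::string, ClassInfo>;
using CallSite = std::pair<std::string, std::string>;  // (static type, method)

// True if `derived` is `base` or (transitively) inherits from it.
bool is_subclass(const Hierarchy& h, std::string derived, const std::string& base) {
    while (!derived.empty()) {
        if (derived == base) return true;
        derived = h.at(derived).parent;
    }
    return false;
}

// Walk up from `cls` to the class whose implementation of `method` it uses.
std::string implementer(const Hierarchy& h, std::string cls, const std::string& method) {
    while (!cls.empty() && h.at(cls).defines.count(method) == 0)
        cls = h.at(cls).parent;
    return cls;  // "" if nothing in the chain defines it
}

// Every "Class::method" that some virtual call site may dispatch to.
std::set<std::string> live_methods(const Hierarchy& h, const std::vector<CallSite>& calls) {
    std::set<std::string> live;
    for (const CallSite& call : calls)
        for (const auto& entry : h)
            if (is_subclass(h, entry.first, call.first)) {
                const std::string owner = implementer(h, entry.first, call.second);
                if (!owner.empty()) live.insert(owner + "::" + call.second);
            }
    return live;
}
```

For the example program, the call sites are p->a() and the delete through foo*, so foo::b and bar::b never appear in the live set and would be candidates for removal.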
This makes things look quite hard to tackle at the bitcode level. It might be better to handle this somewhere closer to the front end, at a stage where more type information is available, and where calls to a virtual function might be more readily associated with implementations of these functions. Perhaps the VirtualCallChecker could serve as a starting point.
One problem is probably the fact that while it's possible to combine the bitcode of several objects into a single unit for link time optimization, one hardly ever compiles all the source code of a moderately sized project as a single translation unit. So the association between virtual function calls and implementations might have to be somehow maintained till that stage. I don't know if any kind of custom annotation is possible with LLVM; I have seen no indication of this in the language specification.
But I'm having a bit of trouble with the language specification in any case. The only references to virtual in there are the virtuality and virtualIndex properties of MDSubprogram, and so far I have found no information at all about their semantics: no documentation, and no useful places inside the LLVM source code. I might be looking at the wrong documentation for my use case.
Cross references
eliminate unused virtual functions asked about pretty much the same thing in the context of GCC, but I'm specifically looking for a LLVM solution here. There used to be a -fvtable-gc switch to GCC, but apparently it was too buggy and got punted, and clang doesn't support it either.
Example:
struct foo {
virtual ~foo() { }
virtual int a() { return 12345001; }
virtual int b() { return 12345002; }
};
struct bar : public foo {
virtual ~bar() { }
virtual int a() { return 12345003; }
virtual int b() { return 12345004; }
};
int main(int argc, char** argv) {
foo* p = (argc & 1 ? new foo() : new bar());
int res = p->a();
delete p;
return res;
}
How can I write a tool to automatically get rid of foo::b() and bar::b() in the generated code?
clang++ -fuse-ld=gold -O3 -flto with clang 3.5.1 wasn't enough, as an objdump -d -C of the resulting executable showed.
Question focus changed
Originally I had been asking not only about how to use clang or LLVM to this effect, but possibly for third party tools to achieve the same if clang and LLVM were not up to the task. Questions asking for tools are frowned upon here, though, so by now the focus has shifted from finding a tool to writing one. I guess chances for finding one are slim in any case, since a web search revealed no hints in that direction.

unparse the intermediate representation of c code back to c

I have a file written in C that has been preprocessed using CIL. This file contains calls to a function, say foo(). I want to modify the C code in this file so that every call to foo() is under an #ifdef guard. I want only the calls to be guarded, not the function body, so that I have finer control over the calls. The calls can be inside an if condition or a while loop. The rule for the macro name: it begins with MACRO_ and ends with the line number of the foo() call in the original code.
This is to be automated inside a tool, and I am looking for a compiler that can unparse C code to do this.
Example:
Input source file
void foo(int x){
// do something
}
int main(){
int a;
printf("doing something");
foo(a);
printf("doing something again");
foo(a);
return 0;
}
Desired output
void foo(int x){
// do something
}
int main(){
int a;
printf("doing something");
#ifdef MACRO_1
foo(a);
#endif
printf("doing something again");
#ifdef MACRO_2
foo(a);
#endif
return 0;
}
For SIMPLE source code, you can obviously do this with a simple script and some regexps in your favourite scripting language (perl, php, awk, python, etc). But it gets increasingly difficult if you decide to support, for example, function calls inside if statements, member function calls, etc. [and want to end up with output code that actually compiles to a correct program].
In that case, you need something that can read (and "understand") C or C++, produce some intermediate form that you can process, and then reissue the source code with modifications. Such code is far from easy to write, no matter where you start from. One solution may be to use Clang as a library. It has facilities to rewrite C or C++ code from its Abstract Syntax Tree (AST) form. This link shows an example of such a rewriter: http://eli.thegreenplace.net/2012/06/08/basic-source-to-source-transformation-with-clang
I'm not sure exactly what you want to do if you have code like:
if (x)
foo();
bar();
Clearly, just inserting #ifdef around the call to foo(); means that, whenever the macro is not defined, bar() becomes the body of the if and is called only when x is true, which is probably not what you wanted...
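The problem is easy to reproduce. In this small C++ demo (hypothetical function names, MACRO_1 deliberately left undefined), the naive rewrite turns bar() into the body of the if, while wrapping the guarded call in braces keeps the original control flow. A rewriting tool would therefore need to emit the braces itself whenever the call is the unbraced body of an if, while or for.

```cpp
// MACRO_1 is intentionally NOT defined, simulating a disabled call.
int bar_calls = 0;

void foo() {}
void bar() { ++bar_calls; }

// Naive rewrite: with MACRO_1 undefined this compiles as "if (x) bar();".
void naive(bool x) {
    if (x)
#ifdef MACRO_1
        foo();
#endif
    bar();
}

// Safer rewrite: the braces keep bar() outside the if, as in the original code.
void braced(bool x) {
    if (x) {
#ifdef MACRO_1
        foo();
#endif
    }
    bar();
}
```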
You could customize some free software compiler. If you use a recent GCC, you could customize it with MELT (a Lisp-like domain-specific language to extend gcc, g++, etc.).
You probably do not want to produce idiomatic C code. It would be much simpler to customize your compiler (e.g. GCC -or perhaps Clang/LLVM ...) to have the desired behavior.
Transforming some internal compiler representation (e.g. Gimple for GCC) is a bit simpler than outputting C code. It may still mean several weeks of work (because C and C++ are quite complex languages, and compilers have quite complex internal representations).
Notice that your question does not consider what happens when foo is called inside some macro (or inside some C++ template expansion, or perhaps even some inlined function). This shows why working on the intermediate representation(s) of your compiler is worthwhile.
BTW, you might be interested in Coccinelle, a free-software source-to-source transformation tool.
You could also, in principle, use Clang (to compile your C or C++ code to LLVM) and then llvm-cbe (an experimental LLVM-to-C backend).
If the code is structured so that the lines containing foo calls can simply be wrapped, and more complex expressions such as bar(), foo(a) need not be handled, you could use awk like this:
awk '/^[[:space:]]*foo\(/ { print "#ifdef MACRO_" NR; print; print "#endif"; next } 1' filename.c
This works as follows:
/^[[:space:]]*foo\(/ {          # handle lines that begin with foo(, optionally
                                # preceded by whitespace, specially by:
  print "#ifdef MACRO_" NR      # printing #ifdef MACRO_linenumber before
  print                         # the line itself,
  print "#endif"                # and #endif after the line.
  next
}
1                               # all other lines are printed unchanged.
Be aware that this is a dirty, dirty hack that does not attempt to parse the C code properly. There are a number of ways you can break this, among them such things as
if(something)
foo(a);
and
foo(
a
);
That would come out as
if(something)
#ifdef MACRO_2
foo(a);
#endif
and
#ifdef MACRO_1
foo(
#endif
a
);
respectively. It may work for your particular case, but it is not a general C-code handling tool.
If the goal is to disable what foo(int) does when some macro is undefined (or defined), the following approach may work better:
#include <stdio.h>
void foo(int x){
#ifdef MACRO_foo
// do something
#endif
}
int main(){
int a = 0;
printf("doing something");
foo(a);
printf("doing something again");
foo(a);
return 0;
}
That way, you exclude only the body of the function and leave the function calls in place throughout the program.
I think you are asking CIL to do things that CIL cannot do. Since it operates on preprocessed source code, it doesn't represent preprocessor directives, so you can't "put them into CIL representation" to be regenerated. You might be able to hack the CIL implementation itself to spit out your directives when it encountered your special circumstance, but it is hard to believe that such a hack would be general in any way.
You said you were looking for a "compiler that can unparse c code to do this". If you insist on "this" as involving specifically CIL, I think you are out of luck; there's only CIL itself to do this.
If you give up on CIL and are willing to consider a different tool, then I think I have an answer: a tool that will do CIL-like things, can retain preprocessor directives in the representation (and/or allow you to insert them according to custom rules), and can regenerate valid C source code text.
That tool is our DMS Software Reengineering Toolkit, a general-purpose program transformation engine, and its C Front End. DMS parses C code into ASTs and unparses them back to valid source code, including retaining comments.
It can be used to do source-to-source transformations using mixtures of procedure calls on its AST manipulation library, and/or surface-syntax source-to-source rewrites.
DMS will capture preprocessor directives in that AST (they are just "more syntax!") in most cases without issues; sometimes you need to modify the source code slightly (permanently) to make the preprocessor directives palatable. DMS provides symbol tables for C, as well as control and data flow analysis; these will need some revision to handle preprocessor conditionality.
To match what you are doing with CIL, you can ask DMS to do the preprocessing; now you end up with an AST that is preprocessor free. DMS's existing symbol tables, CF and DF machinery handle this case directly, now.
So you can carry out sophisticated operations on the AST using that additional information, in a way different from but equivalent to CIL. In addition, you can still modify the ASTs to insert preprocessor directives, which seems to be your key problem.
To achieve your specific effect of call-site specific conditionals, you can take advantage of DMS's surface syntax source-to-source transformation capability.
The following DMS transform does something like what you want:
rule wrap_function_call(i: Identifier, a:arguments ):statement -> statement
" \i(\a); "
->
" #ifdef \generate_macro_name\(\i\)
\i(\a);
#endif
"
if want_to_wrap(i);
This rule finds any syntax tree corresponding to a function call used as a statement, and wraps it in a conditional. (You didn't say what you wanted to do if the function call was part of an expression; that case requires a bit more transformation but could also be handled.) A custom helper function, generate_macro_name, manufactures the macro name using the source position information associated with the identifier AST node matched for the function name. The transformation is conditioned on another custom helper function, want_to_wrap, which inspects each matched name to determine whether it should be wrapped.
When done transforming the code, you call DMS's prettyprinter machinery to print the AST as source text.

Custom stacktrace implementation for ARM

I need a stacktrace in my program, which is written in C++ and runs on an ARM device. I can't find any reliable way to get a stacktrace, so I decided to write my own that is as simple as possible, just to get something like the stacktrace in gdb.
Here's an idea: write a macro that will push __FUNCTION__ and __PRETTY_FUNCTION__. There are several questions:
Suppose I have a macro like this:
#define STACKTRACE_ENTER_FUNC \
... lock mutex
... push info into the global list
... set scope-exit handler to delete info at function exit
... unlock mutex
Now I need to place this macro in every function in my code. But there are too many of them. Is there any better way to achieve the goal or should I really change every function to include this macro:
void foo()
{
STACKTRACE_ENTER_FUNC;
...
}
void bar()
{
STACKTRACE_ENTER_FUNC;
...
}
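The macro still has to appear in every function, but it can shrink to a single line if the scope-exit handling is done with RAII. A minimal sketch, assuming C++11 (for thread_local) and using hypothetical names (shadow_stack, StackFrameGuard, current_stacktrace); it swaps the question's mutex-protected global list for a per-thread stack, which needs no lock.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One shadow stack per thread, so no mutex is needed (a swap from the
// question's mutex-protected global list; requires C++11 thread_local).
inline std::vector<const char*>& shadow_stack() {
    thread_local std::vector<const char*> frames;
    return frames;
}

// RAII guard: pushes the name on entry and pops it when the scope exits,
// whether by return or by an exception propagating through the function.
class StackFrameGuard {
public:
    explicit StackFrameGuard(const char* name) { shadow_stack().push_back(name); }
    ~StackFrameGuard() { shadow_stack().pop_back(); }
    StackFrameGuard(const StackFrameGuard&) = delete;
    StackFrameGuard& operator=(const StackFrameGuard&) = delete;
};

// __func__ is standard C++; __PRETTY_FUNCTION__ works on GCC as well.
#define STACKTRACE_ENTER_FUNC StackFrameGuard stacktrace_guard_(__func__)

// Format the current thread's stack innermost-first, like gdb's "bt".
inline std::string current_stacktrace() {
    const std::vector<const char*>& frames = shadow_stack();
    std::string out;
    for (std::size_t depth = 0; depth < frames.size(); ++depth)
        out += "#" + std::to_string(depth) + " " + frames[frames.size() - 1 - depth] + "\n";
    return out;
}

// Usage, mirroring foo/bar from the question.
std::string captured;

void bar() {
    STACKTRACE_ENTER_FUNC;
    captured = current_stacktrace();
}

void foo() {
    STACKTRACE_ENTER_FUNC;
    bar();
}
```

If editing every function is impractical, GCC's -finstrument-functions option inserts calls to __cyg_profile_func_enter and __cyg_profile_func_exit at every function entry and exit; those hooks could maintain the same kind of stack, although they receive addresses rather than names.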
The next question: I can use __PRETTY_FUNCTION__ (we only use a fixed gcc version, and the stacktrace implementation is only for debug builds on a fixed platform, so there are no cross-platform or compiler issues). I can even parse it a bit to split the string into the function name and the argument names. But how can I print all function arguments without knowing much about them, such as their types or the number of arguments? Like:
int foo(char x, float y)
{
PRINT_ARGS("arg1", "arg2"); // Gives me the string: "arg1 = 'A', arg2 = 13.37"
...
}
int main()
{
foo('A', 13.37);
...
}
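Standard C++ has no portable way to enumerate a function's parameters, so a macro cannot work from the names alone as in PRINT_ARGS("arg1", "arg2"). If the parameters themselves are passed, though, the preprocessor can stringize their names and a variadic template can format values of any streamable type. A sketch assuming C++11 (variadic templates, lambdas); PRINT_ARGS, split_names, print_value and format_args are hypothetical names.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split the stringized argument list "x, y" into {"x", "y"}.
// Assumes plain names: no commas inside an individual argument.
inline std::vector<std::string> split_names(const std::string& joined) {
    std::vector<std::string> names(1);
    for (char c : joined) {
        if (c == ',') names.emplace_back();
        else if (c != ' ') names.back() += c;
    }
    return names;
}

// Quote chars the way gdb does; print everything else with operator<<.
template <typename T>
void print_value(std::ostringstream& out, const T& value) { out << value; }
inline void print_value(std::ostringstream& out, char value) { out << '\'' << value << '\''; }

// Recursion end: no arguments left.
inline void format_args(std::ostringstream&, const std::vector<std::string>&, std::size_t) {}

// Print "name = value" for the first argument, then recurse on the rest.
template <typename T, typename... Rest>
void format_args(std::ostringstream& out, const std::vector<std::string>& names,
                 std::size_t index, const T& first, const Rest&... rest) {
    if (index > 0) out << ", ";
    out << names[index] << " = ";
    print_value(out, first);
    format_args(out, names, index + 1, rest...);
}

// Hypothetical macro: takes the parameters themselves, stringizes their
// names, and returns the formatted line as a std::string.
#define PRINT_ARGS(...)                                               \
    ([&]() -> std::string {                                           \
        std::ostringstream out_;                                      \
        format_args(out_, split_names(#__VA_ARGS__), 0, __VA_ARGS__); \
        return out_.str();                                            \
    }())
```

Inside foo(char x, float y) from the question, PRINT_ARGS(x, y) returns "x = 'A', y = 13.37" for the call foo('A', 13.37), which can then be logged next to the __PRETTY_FUNCTION__ string. Returning a string rather than printing directly leaves the choice of log sink to the caller.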
P.S. If you know a better approach to get stack-trace in running program on ARMv6, please let me know (compiler: arm-openwrt-linux-uclibcgnueabi-gcc 4.7.3, libc: uClibc-0.9.33.2)
Thanks in advance.
The easier solution is to drop down to assembly; stack traces don't exist at the C++ level anyway.
From an assembly perspective, you use a map of function addresses (which any linker can generate). The current Instruction Pointer identifies the top frame, the return addresses identify the call stack. The tricky part is tail-call optimization, which is a bit philosophical (do you want the logical or the actual call stack?)
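The symbolization half of that approach is straightforward: given a table of function start addresses (for example parsed from the linker's map file), an instruction pointer or return address belongs to the function with the greatest start address not above it. The sketch below assumes the table is already loaded (SymbolMap, symbolize and symbolize_trace are hypothetical names). Obtaining the addresses from a running ARM process, by walking frame pointers or unwind tables, is platform-specific and not shown.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Function start address -> function name, e.g. parsed from a linker map file.
using SymbolMap = std::map<std::uintptr_t, std::string>;

// Name of the function containing `addr`: the entry with the greatest
// start address that is <= addr, or "??" if addr is below every function.
std::string symbolize(const SymbolMap& symbols, std::uintptr_t addr) {
    auto it = symbols.upper_bound(addr);  // first function starting after addr
    if (it == symbols.begin()) return "??";
    --it;
    return it->second;
}

// Turn a list of frames (current IP first, then return addresses) into names.
std::vector<std::string> symbolize_trace(const SymbolMap& symbols,
                                         const std::vector<std::uintptr_t>& frames) {
    std::vector<std::string> names;
    for (std::uintptr_t addr : frames) names.push_back(symbolize(symbols, addr));
    return names;
}
```

Without function sizes in the table, an address past the end of the last function is still attributed to that function, so a real tool would also record each function's end address.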

replacing all function calls with their definition in a C/C++ code

I wonder if there is some theory or tool available to transform a piece of code that contains function calls into code where every function call has been replaced by the body of the called function.
like
main()
{
fun();
}
fun()
{
int i;
fun2();
}
fun2()
{
int j;
}
into
main()
{
int i;
int j;
}
I know there is a lot to take care of, like local variable names, recursive calls, external function calls, etc.
I also know that it may not be useful at all, but does something like this exist, even in theory?
Should I call it an advanced preprocessor? :)
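In theory this is just recursive inlining, which compilers already perform internally. As a toy illustration, assuming a hypothetical representation where each function body is a list of statement strings and "call NAME" marks a call, the expansion below reproduces the fun/fun2 example: calls to known functions are replaced by their bodies, calls to unknown (external) functions are kept, and recursion is rejected because it cannot be fully expanded. Renaming locals, parameters and return values, which a real tool would have to handle, are deliberately left out.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical representation: function name -> list of statements,
// where "call NAME" is a call to another function.
using Program = std::map<std::string, std::vector<std::string>>;

// Recursively replace each "call NAME" with NAME's body. `active` holds the
// functions currently being expanded, so a cycle (recursion) is detected.
void expand(const Program& prog, const std::string& fn,
            std::set<std::string>& active, std::vector<std::string>& out) {
    if (!active.insert(fn).second)
        throw std::runtime_error("recursive call to " + fn + " cannot be inlined");
    const std::string prefix = "call ";
    for (const std::string& stmt : prog.at(fn)) {
        const bool is_call = stmt.compare(0, prefix.size(), prefix) == 0;
        if (is_call && prog.count(stmt.substr(prefix.size())) != 0)
            expand(prog, stmt.substr(prefix.size()), active, out);
        else
            out.push_back(stmt);  // ordinary statement or external call
    }
    active.erase(fn);
}

std::vector<std::string> inline_all(const Program& prog, const std::string& entry) {
    std::set<std::string> active;
    std::vector<std::string> out;
    expand(prog, entry, active, out);
    return out;
}
```

Expanding main for the example yields the two statements "int i;" and "int j;", matching the desired output; a real implementation would also need to rename i and j whenever inlining two bodies puts colliding names into one scope.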
The compiler can usually tell when this is a good idea, and already inlines functions automatically when it judges it worthwhile. You can also suggest that a function should be inlined by using the inline keyword before it (note that this still doesn't force it; the compiler may decide not to inline). It's generally not a good idea to do this manually, as modern compilers tend to figure out the best inlining decisions on their own. This article explains inline functions really well; I found it very helpful.
Edit 1:
There are several reasons why one might want to do the inlining you describe. If you feel that your code is divided into so many functions that it becomes less clear and overly verbose, you could try a refactoring tool, such as the one provided by the VAssist X Visual Studio plugin. Though this plugin doesn't really do what you suggest (I can't think of a tool that does), it can help you move functions/methods around with ease, allowing you to clean up your code.