Building up AST from a function with more than one returns for llvm - llvm

For example I have this function:
def max(a,b) {
if(a < b) return b;
if(a > b) return a;
}
I am curious how to parse this into an AST.
If I understand this well then the node of it's body should return a ReturnInst*.
But in my AST this body is contained by two nodes (as expressions), one for the first if and another one for the other one.
Is there some trick or the design is wrong to begin with?
Edit: I've just tough out a might-be-a-solution:
CreateAlloca at the begin of the body.
CreateStore and jump to the end label at every return.
At the end label return the var.
Is it a good idea? And how to jump/goto with llvm?

You can try the online demo: http://llvm.org/demo/ Type in the C or C++ for what you want to do and it will show you the LLVM output.

Your outlined solution with alloca+store+jump definitely works, if I'm understanding it correctly. And yes, the LLVM optimizers will handle it just fine. Alternatively, there isn't any restriction on the number of ret instructions in a given function.
Not sure what you're asking with generating a goto; if you've managed to write an if statement, it the same sort of br instruction you need at the end of one. In general, running "clang -S -emit-llvm" over some simple code is a good technique to see how to generate simple constructs. Also, "llc -march=cpp" is a good way to see how to build instructions in C++.

Related

Insert text into C++ code between functions

I have following requirement:
Adding text at the entry and exit point of any function.
Not altering the source code, beside inserting from above (so no pre-processor or anything)
For example:
void fn(param-list)
{
ENTRY_TEXT (param-list)
//some code
EXIT_TEXT
}
But not only in such a simple case, it'd also run with pre-processor directives!
Example:
void fn(param-list)
#ifdef __WIN__
{
ENTRY_TEXT (param-list)
//some windows code
EXIT_TEXT
}
#else
{
ENTRY_TEXT (param-list)
//some any-os code
if (condition)
{
return; //should become EXIT_TEXT
}
EXIT_TEXT
}
So my question is: Is there a proper way doing this?
I already tried some work with parsers used by compilers but since they all rely on running a pre-processor before parsing, they are useless to me.
Also some of the token generating parser, which do not need a pre-processor are somewhat useless because they generate a memory-mapping of tokens, which then leads to a complete new source code, instead of just inserting the text.
One thing I am working on is to try it with FLEX (or JFlex), if this is a valid option, I would appreciate some input on it. ;-)
EDIT:
To clarify a little bit: The purpose is to allow something like a stack trace.
I want to trace every function call, and in order to follow the call-hierachy, I need to place a macro at the entry-point of a function and at the exit point of a function.
This builds a function-call trace. :-)
EDIT2: Compiler-specific options are not quite suitable since we have many different compilers to use, and many that are propably not well supported by any tools out there.
Unfortunately, your idea is not only impractical (C++ is complex to parse), it's also doomed to fail.
The main issue you have is that exceptions will bypass your EXIT_TEXT macro entirely.
You have several solutions.
As has been noted, the first solution would be to use a platform dependent way of computing the stack trace. It can be somewhat imprecise, especially because of inlining: ie, small functions being inlined in their callers, they do not appear in the stack trace as no function call was generated at assembly level. On the other hand, it's widely available, does not require any surgery of the code and does not affect performance.
A second solution would be to only introduce something on entry and use RAII to do the exit work. Much better than your scheme as it automatically deals with multiple returns and exceptions, it suffers from the same issue: how to perform the insertion automatically. For this you will probably want to operate at the AST level, and modify the AST to introduce your little gem. You could do it with Clang (look up the c++11 migration tool for examples of rewrites at large) or with gcc (using plugins).
Finally, you also have manual annotations. While it may seem underpowered (and a lot of work), I would highlight that you do not leave logging to a tool... I see 3 advantages to doing it manually: you can avoid introducing this overhead in performance sensitive parts, you can retain only a "summary" of big arguments and you can customize the summary based on what's interesting for the current function.
I would suggest using LLVM libraries & Clang to get started.
You could also leverage the C++ language to simplify your process. If you just insert a small object into the code that is constructed on function scope entrance & rely on the fact that it will be destroyed on exit. That should massively simplify recording the 'exit' of the function.
This does not really answer you question, however, for your initial need, you may use the backtrace() function from execinfo.h (if you are using GCC).
How to generate a stacktrace when my gcc C++ app crashes

Parsing C++ to make some changes in the code

I would like to write a small tool that takes a C++ program (a single .cpp file), finds the "main" function and adds 2 function calls to it, one in the beginning and one in the end.
How can this be done? Can I use g++'s parsing mechanism (or any other parser)?
If you want to make it solid, use clang's libraries.
As suggested by some commenters, let me put forward my idea as an answer:
So basically, the idea is:
... original .cpp file ...
#include <yourHeader>
namespace {
SpecialClass specialClassInstance;
}
Where SpecialClass is something like:
class SpecialClass {
public:
SpecialClass() {
firstFunction();
}
~SpecialClass() {
secondFunction();
}
}
This way, you don't need to parse the C++ file. Since you are declaring a global, its constructor will run before main starts and its destructor will run after main returns.
The downside is that you don't get to know the relative order of when your global is constructed compared to others. So if you need to guarantee that firstFunction is called
before any other constructor elsewhere in the entire program, you're out of luck.
I've heard the GCC parser is both hard to use and even harder to get at without invoking the whole toolchain. I would try the clang C/C++ parser (libparse), and the tutorials linked in this question.
Adding a function at the beginning of main() and at the end of main() is a bad idea. What if someone calls return in the middle?.
A better idea is to instantiate a class at the beginning of main() and let that class destructor do the call function you want called at the end. This would ensure that that function always get called.
If you have control of your main program, you can hack a script to do this, and that's by far the easiet way. Simply make sure the insertion points are obvious (odd comments, required placement of tokens, you choose) and unique (including outlawing general coding practices if you have to, to ensure the uniqueness you need is real). Then a dumb string hacking tool to read the source, find the unique markers, and insert your desired calls will work fine.
If the souce of the main program comes from others sources, and you don't have control, then to do this well you need a full C++ program transformation engine. You don't want to build this yourself, as just the C++ parser is an enormous effort to get right. Others here have mentioned Clang and GCC as answers.
An alternative is our DMS Software Reengineering Toolkit with its C++ front end. DMS, using its C++ front end, can parse code (for a variety of C++ dialects), builds ASTs, carry out full name/type resolution to determine the meaning/definition/use of all symbols. It provides procedural and source-to-source transformations to enable changes to the AST, and can regenerate compilable source code complete with original comments.

In C++ is there any portable way of jumping to a computed offset?

I'm looking for a portable way of jumping to a computed offset in C++.
I know that GCC has a mechanism for doing this using goto as discussed here:
http://social.msdn.microsoft.com/forums/en-US/vclanguage/thread/ec7e52b5-0978-4123-9d29-9dc7d807c6b4
Sadly I don't think other compilers implement this.
Normally I wouldn't have reason to use goto in C++ but I found that it could be useful for optimizing an interpreted language (search for 'threaded interpreter' if you are interested in this).
I know that I can implement this using inline assembly language, but the problem then is I have to implement this for every platform the interpreter runs on.
So does anyone know if there is a portable way of doing this?
The solution might involve goto but I'm open to any other sort of hackery that you can think of ;)
UPDATE: Currently the interpreter uses a switch statement. I'm looking for techniques that improve on this and make the interpreter run faster. Specifically I'm trying to figure out a portable way of saying 'goto <next-byte-code-instruction>' where <next-byte-code-instruction> is a computed offset that can be stored in the byte code itself.
UPDATE: I found a related question here.
What opcode dispatch strategies are used in efficient interpreters?
switch allows to jump to predefined offsets
Array of function pointers/function objects
setjmp/longjmp
I think setjmp/longjmp is as close as you can get. Beyond that, the spec calls things like offsets in the instruction stream "implementation details", and you're stuck with platform-specific stuff like intrinsics and inline asm.
The other (really ugly) thing you could try is using a switch statement, which is typically implemented as a jump table of offsets. Ie,
int ip = 0;
top:
switch( ip )
{
case 0:
ip += do_whatever(); // returns an offset
goto top;
case 1:
ip += some_other_function();
goto top;
case 2:
ip += etc();
goto top;
// ad infinitum...
}
This is in the spirit of Bell's original article, and the gist is that the body of each case is a single VM "opcode" in the stream. But that seems really icky.
Aside from goto's and longjumping and other nonportable tricks, could you consider virtual functions or at least ponters to functions? You can have several singleton objects of types derived from single base class. Array of pointers has all those objects. Virtual function returns index of the next interpreter state. Base object holds all data ever needed by any derived object.
EDIT: Pointers to functions could be a little faster even, but a little messier too. There is a Guru of the Week article that explains it.
It looks like it is not possible to implement this in a portable way.
(Although I still welcome alternative answers!)
I found this blog post from someone who has already tried what I wanted to do. He uses the GCC approach and got a 33% speed improvement (stats are at the end of the post).
The solution is conditionally compiled under Win32 to use inline assembly to compute the address of the labels. But he reports that using inline assembly in this way is 3 times slower than normal! Ouch
http://abepralle.wordpress.com/2009/01/25/how-not-to-make-a-virtual-machine-label-based-threading/
Oh well, I didn't want to use inline assembly anyway.
I seriously wouldn't use any form of goto. It died a long time ago. If you must, however, perhaps you should just write a macro for this, and use a bunch of ifdefs to make it portable. This shouldn't be too hard, and it is the fastest solution.

Executing certain code for every method call in C++

I have a C++ class I want to inspect. So, I would like to all methods print their parameters and the return, just before getting out.
The latter looks somewhat easy. If I do return() for everything, a macro
#define return(a) cout << (a) << endl; return (a)
would do it (might be wrong) if I padronize all returns to parenthesized (or whatever this may be called). If I want to take this out, just comment out the define.
However, printing inputs seems more difficult. Is there a way I can do it, using C++ structures or with a workaroud hack?
A few options come to mind:
Use a debugger.
Use the decorator pattern, as Space_C0wb0y suggested. However, this could be a lot of manual typing, since you'd have to duplicate all of the methods in the decorated class and add logging yourself. Maybe you could automate the creation of the decorator object by running doxygen on your class and then parsing its output...
Use aspect-oriented programming. (Logging, which is what you're wanting to do, is a common application of AOP.) Wikipedia lists a few AOP implementations for C++: AspectC++, XWeaver, and FeatureC++.
However, printing inputs seems more
difficult. Is there a way I can do it,
using C++ structures or with a
workaroud hack?
No.
Update: I'm going to lose some terseness in my answer by suggesting that you can probably achieve what you need by applying Design by Contract.
It sounds like you want to use a debugging utility to me. That will allow you to see all of the parameters that you want.
If you don't mind inserting some code by hand, you can create a class that:
logs entry to the method in the constructor
provides a method to dump arbitrary parameters
provides a method to record status
logs exit with recorded status in the destructor
The usage would look something like:
unsigned long long
factorial(unsigned long long n) {
Inspector inspect("factorial", __FILE__, __LINE__);
inspect.parameter("n", n);
if (n < 2) {
return inspect.result(1);
}
return inspect.result(n * fact(n-1));
}
Of course, you can write macros to declare the inspector and inspect the parameters. If you are working with a compiler that supports variable argument list macros, then you can get the result to look like:
unsigned long long
factorial(unsigned long long n) {
INJECT_INSPECTOR(n);
if (n < 2) {
return INSPECT_RETURN(1);
}
return INSPECT_RETURN(n * fact(n-1));
}
I'm not sure if you can get a cleaner solution without going to something like an AOP environment or some custom code generation tool.
If your methods are all virtual, you could use the decorator-pattern to achieve that in a very elegant way.
EDIT: From your comment above (you want the output for statistics) I conclude that you should definitely use the decorator-pattern. It is intended for this kind of stuff.
I would just use a logging library (or some macros) and insert manual logging calls. Unless your class has an inordinate number of methods, it's probably faster to get going with than developing and debugging more sophisticated solution.

Is there a tool that enables me to insert one line of code into all functions and methods in a C++-source file?

It should turn this
int Yada (int yada)
{
return yada;
}
into this
int Yada (int yada)
{
SOME_HEIDEGGER_QUOTE;
return yada;
}
but for all (or at least a big bunch of) syntactically legal C/C++ - function and method constructs.
Maybe you've heard of some Perl library that will allow me to perform these kinds of operations in a view lines of code.
My goal is to add a tracer to an old, but big C++ project in order to be able to debug it without a debugger.
Try Aspect C++ (www.aspectc.org). You can define an Aspect that will pick up every method execution.
In fact, the quickstart has pretty much exactly what you are after defined as an example:
http://www.aspectc.org/fileadmin/documentation/ac-quickref.pdf
If you build using GCC and the -pg flag, GCC will automatically issue a call to the mcount() function at the start of every function. In this function you can then inspect the return address to figure out where you were called from. This approach is used by the linux kernel function tracer (CONFIG_FUNCTION_TRACER). Note that this function should be written in assembler, and be careful to preserve all registers!
Also, note that this should be passed only in the build phase, not link, or GCC will add in the profiling libraries that normally implement mcount.
I would suggest using the gcc flag "-finstrument-functions". Basically, it automatically calls a specific function ("__cyg_profile_func_enter") upon entry to each function, and another function is called ("__cyg_profile_func_exit") upon exit of the function. Each function is passed a pointer to the function being entered/exited, and the function which called that one.
You can turn instrumenting off on a per-function or per-file basis... see the docs for details.
The feature goes back at least as far as version 3.0.4 (from February 2002).
This is intended to support profiling, but it does not appear to have side effects like -pg does (which compiles code suitable for profiling).
This could work quite well for your problem (tracing execution of a large program), but, unfortunately, it isn't as general purpose as it would have been if you could specify a macro. On the plus side, you don't need to worry about remembering to add your new code into the beginning of all new functions that are written.
There is no such tool that I am aware of. In order to recognise the correct insertion point, the tool would have to include a complete C++ parser - regular expressions are not enough to accomplish this.
But as there are a number of FOSS C++ parsers out there, such a tool could certainly be written - a sort of intelligent sed for C++ code. The biggest problem would probably be designing the specification language for the insert/update/delete operation - regexes are obviously not the answer, though they should certainly be included in the language somehow.
People are always asking here for ideas for projects - how about this for one?
I use this regex,
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
to locate the functions and add extra lines of code.
With that regex I also get the function name (group 1) and the arguments (group 2).
Note: you must filter out names like, "while", "do", "for", "switch".
This can be easily done with a program transformation system.
The DMS Software Reengineering Toolkit is a general purpose program transformation system, and can be used with many languages (C#, COBOL, Java, EcmaScript, Fortran, ..) as well as specifically with C++.
DMS parses source code (using full langauge front end, in this case for C++),
builds Abstract Syntax Trees, and allows you to apply source-to-source patterns to transform your code from one C# program into another with whatever properties you wish. THe transformation rule to accomplish exactly the task you specified would be:
domain CSharp.
insert_trace():function->function
"\visibility \returntype \fnname(int \parametername)
{ \body } "
->
"\visibility \returntype \fnname(int \parametername)
{ Heidigger(\CppString\(\methodname\),
\CppString\(\parametername\),
\parametername);
\body } "
The quote marks (") are not C++ quote marks; rather, they are "domain quotes", and indicate that the content inside the quote marks is C++ syntax (because we said, "domain CSharp"). The \foo notations are meta syntax.
This rule matches the AST representing the function, and rewrites that AST into the traced form. The resulting AST is then prettyprinted back into source form, which you can compile. You probably need other rules to handle other combinations of arguments; in fact, you'd probably generalize the argument processing to produce (where practical) a string value for each scalar argument.
It should be clear you can do a lot more than just logging with this, and a lot more than just aspect-oriented programming, since you can express arbitrary transformations and not just before-after actions.