I am currently writing a program that sits on top of a C++ interpreter. The user inputs C++ commands at runtime, which are then passed into the interpreter. For certain patterns, I want to replace the command given with a modified form, so that I can provide additional functionality.
I want to replace anything of the form
A->Draw(B1, B2)
with
MyFunc(A, B1, B2).
My first thought was regular expressions, but that would be rather error-prone, as any of A, B1, or B2 could be arbitrary C++ expressions. As these expressions could themselves contain quoted strings or parentheses, it would be quite difficult to match all cases with a regular expression. In addition, there may be multiple, nested forms of this expression.
My next thought was to call clang as a subprocess, use "-ast-dump" to get the abstract syntax tree, modify that, then rebuild it into a command to be passed to the C++ interpreter. However, this would require keeping track of any environment changes, such as include files and forward declarations, in order to give clang enough information to parse the expression. As the interpreter does not expose this information, this seems infeasible as well.
The third thought was to use the C++ interpreter's own internal parsing to convert to an abstract syntax tree, then build from there. However, this interpreter does not expose the AST in any way that I was able to find.
Are there any suggestions as to how to proceed, either along one of the stated routes, or along a different route entirely?
What you want is a Program Transformation System.
These are tools that generally let you express changes to source code, written in source level patterns that essentially say:
if you see *this*, replace it by *that*
but operating on Abstract Syntax Trees so the matching and replacement process is
far more trustworthy than what you get with string hacking.
Such tools have to have parsers for the source language of interest.
The source language being C++ makes this fairly difficult.
Clang sort of qualifies; after all, it can parse C++. OP objects that
it cannot do so without all the environment context. To the extent
that OP is typing (well-formed) program fragments (statements, etc.)
into the interpreter, Clang may [I don't have much experience with it
myself] have trouble figuring out what kind of fragment it is (statement? expression? declaration? ...). Finally, Clang isn't really a PTS; its tree modification procedures are not source-to-source transforms. That matters for convenience but might not stop OP from using it; surface-syntax rewrite rules are convenient, but you can always substitute procedural tree hacking with more effort. When there are more than a few rules, this starts to matter a lot.
GCC with Melt sort of qualifies in the same way that Clang does.
I'm under the impression that Melt makes GCC at best a bit less
intolerable for this kind of work. YMMV.
Our DMS Software Reengineering Toolkit with its full C++14 [EDIT July 2018: C++17] front end absolutely qualifies. DMS has been used to carry out massive transformations
on large scale C++ code bases.
DMS can parse arbitrary (well-formed) fragments of C++ without being told in advance what the syntax category is, and return an AST of the proper grammar nonterminal type, using its pattern-parsing machinery. [You may end up with multiple parses, i.e. ambiguities, that you'll have to decide how to resolve; see Why can't C++ be parsed with a LR(1) parser? for more discussion] It can do this without resorting to "the environment" if you are willing to live without macro expansion while parsing, and insist that the preprocessor directives (they get parsed too) are nicely structured with respect to the code fragment (#if foo{#endif not allowed), but that's unlikely to be a real problem for interactively entered code fragments.
DMS then offers a complete procedural AST library for manipulating the parsed trees (search, inspect, modify, build, replace) and can then regenerate surface source code from the modified tree, giving OP text
to feed to the interpreter.
Where it shines in this case is OP can likely write most of his modifications directly as source-to-source syntax rules. For his
example, he can provide DMS with a rewrite rule (untested but pretty close to right):
rule replace_Draw(A:primary,B1:expression,B2:expression):
primary->primary
"\A->Draw(\B1, \B2)" -- pattern
rewrites to
"MyFunc(\A, \B1, \B2)"; -- replacement
and DMS will take any parsed AST containing the left hand side "...Draw..." pattern and replace that subtree with the right hand side, after substituting the matches for A, B1 and B2. The quote marks are metaquotes and are used to distinguish C++ text from rule-syntax text; the backslash is a metaescape used inside metaquotes to name metavariables. For more details of what you can say in the rule syntax, see DMS Rewrite Rules.
If OP provides a set of such rules, DMS can be asked to apply the entire set.
So I think this would work just fine for OP. It is a rather heavyweight mechanism to "add" to the package he wants to provide to a 3rd party; DMS and its C++ front end are hardly "small" programs. But then modern machines have lots of resources, so I think it's a question of how badly OP needs to do this.
Try modifying the headers to suppress the method; when you compile, you'll find the errors and will be able to replace all the calls.
Since you have a C++ interpreter (such as CERN's ROOT), I guess you must use the compiler to intercept all the Draw calls. An easy and clean way to do that is to declare the Draw method as private in the headers, using some defines:
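class ItemWithDrawMethod
{
    // ... other members ...
public:
#ifdef CATCHTHEMETHOD
private:  // with -DCATCHTHEMETHOD, every outside call to Draw fails to compile
#endif
    void Draw(int b1, int b2);  // placeholder parameter types
#ifdef CATCHTHEMETHOD
public:
#endif
    // ... remaining public members ...
};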
Then compile as:
gcc -DCATCHTHEMETHOD=1 yourfilein.cpp
If users want to input complex algorithms into the application, I suggest integrating a scripting language into the app, so that users can write code (functions/algorithms in a defined way) that the app executes in the interpreter to get the final results. Ex: Python, Perl, JS, etc.
Since you need C++-like code in the interpreter, ChaiScript (http://chaiscript.com/), a scripting language designed for embedding in C++ applications, would be a suggestion.
What happens when someone gets ahold of the Draw member function (auto draw = &A::Draw;) and then starts using draw? Presumably you'd want the same improved Draw-functionality to be called in this case too. Thus I think we can conclude that what you really want is to replace the Draw member function with a function of your own.
Since it seems you are not in a position to modify the class containing Draw directly, a solution could be to derive your own class from A and override Draw in there (for calls made through a pointer or reference to A to reach your version, Draw has to be virtual). Then your problem reduces to having your users use your new improved class.
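A minimal sketch of the derive-and-override idea, assuming Draw is virtual in the original class. Shape and MyShape are hypothetical stand-ins for the real class and your derived class, and the int parameters are placeholders for whatever Draw actually takes:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the third-party class. This only works if Draw
// is virtual; otherwise calls through a base pointer bypass the override.
class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string Draw(int b1, int b2) {
        return "Shape::Draw(" + std::to_string(b1) + ", " + std::to_string(b2) + ")";
    }
};

// Your derived class: adds the extra functionality, then delegates to the original.
class MyShape : public Shape {
public:
    std::string Draw(int b1, int b2) override {
        return "extra work, then " + Shape::Draw(b1, b2);
    }
};
```

Calling through a base pointer, p->Draw(1, 2), or through a member pointer, auto draw = &Shape::Draw; (p->*draw)(1, 2), both reach MyShape::Draw, because a pointer to a virtual member function still dispatches virtually. If Draw is not virtual, neither call would see the override.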
You may again consider the problem of automatically translating uses of class A to your new derived class, but this still seems pretty difficult without the help of a full C++ implementation. Perhaps there is a way to hide the old definition of A and present your replacement under that name instead, via clever use of header files, but I cannot determine whether that's the case from what you've told us.
Another possibility might be to use some dynamic linker hackery using LD_PRELOAD to replace the function Draw that gets called at runtime.
There may be a way to accomplish this mostly with regular expressions.
Since anything that appears after Draw( is already formatted correctly as parameters, you don't need to fully parse them for the purpose you have outlined.
Fundamentally, the part that matters is the "SYMBOL->Draw("
SYMBOL could be any expression that resolves to an object that overloads -> or to a pointer of a type that implements Draw(...). If you reduce this to two cases, you can short-cut the parsing.
For the first case, use a simple regular expression that matches any valid C++ symbol, something similar to "[A-Za-z_][A-Za-z0-9_.]*", followed by the literal expression "->Draw(". This will give you the portion that must be rewritten, since the code following this part is already formatted as valid C++ parameters.
The second case is for complex expressions that return an overloaded object or pointer. This requires a bit more effort, but a short parsing routine to walk backward through just a complex expression can be written surprisingly easily, since you don't have to support blocks (blocks in C++ cannot return objects, since lambda definitions do not call the lambda themselves, and actual nested code blocks {...} can't return anything directly inline that would apply here). Note that if the expression doesn't end in ) then it has to be a valid symbol in this context, so if you find a ) just match nested ) with ( and extract the symbol preceding the nested SYMBOL(...(...)...)->Draw() pattern. This may be possible with regular expressions, but should be fairly easy in normal code as well.
As soon as you have the symbol or expression, the replacement is trivial, going from
SYMBOL->Draw(...
to
YourFunction(SYMBOL, ...
without having to deal with the additional parameters to Draw().
As an added benefit, chained function calls are parsed for free with this model, since you can recursively iterate over the code such as
A->Draw(B...)->Draw(C...)
The first iteration identifies the first A->Draw( and rewrites the whole statement as
YourFunction(A, B...)->Draw(C...)
which then identifies the second ->Draw with an expression "YourFunction(A, ...)->" preceding it, and rewrites it as
YourFunction(YourFunction(A, B...), C...)
where B... and C... are well-formed C++ parameters, including nested calls.
Without knowing the C++ version that your interpreter supports, or the kind of code you will be rewriting, I really can't provide any sample code that is likely to be worthwhile.
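Still, a minimal sketch of the backward walk described above might look like the following. It is a heuristic under stated assumptions, not a parser: it looks for the exact text "->Draw(" (no spaces), treats identifier characters, '.', "->", "::" and balanced (...) / [...] groups as part of the object expression, and does not skip brackets inside string or character literals. The names rewriteDraw, exprStart and matchBackward are hypothetical:

```cpp
#include <cassert>
#include <cctype>
#include <string>

namespace {

// Identifier characters plus '.', as in the symbol regex above.
bool isSymbolChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '.';
}

// From the ')' or ']' at index `close`, walk backward to its matching opener.
// Returns std::string::npos if unbalanced. Brackets inside string or
// character literals are NOT skipped (a known limitation of this sketch).
std::size_t matchBackward(const std::string& s, std::size_t close) {
    const char closer = s[close];
    const char opener = (closer == ')') ? '(' : '[';
    int depth = 0;
    for (std::size_t i = close + 1; i-- > 0;) {
        if (s[i] == closer) {
            ++depth;
        } else if (s[i] == opener && --depth == 0) {
            return i;
        }
    }
    return std::string::npos;
}

// Index where the object expression that ends at `end` begins.
std::size_t exprStart(const std::string& s, std::size_t end) {
    std::size_t i = end;
    while (i > 0 && std::isspace(static_cast<unsigned char>(s[i - 1]))) --i;
    while (i > 0) {
        const char c = s[i - 1];
        if (c == ')' || c == ']') {
            const std::size_t open = matchBackward(s, i - 1);
            if (open == std::string::npos) break;
            i = open;  // skip the whole (...) or [...] group
        } else if (isSymbolChar(c)) {
            --i;
        } else if (i >= 2 && ((c == '>' && s[i - 2] == '-') ||
                              (c == ':' && s[i - 2] == ':'))) {
            i -= 2;    // "->" member access or "::" scope
        } else {
            break;
        }
    }
    return i;
}

}  // namespace

// Rewrites every "EXPR->Draw(ARGS)" as "func(EXPR, ARGS)". The leftmost
// occurrence is rewritten first and the search resumes at the rewritten call,
// so chained and nested Draw calls are handled by repetition.
std::string rewriteDraw(std::string s, const std::string& func) {
    assert(!func.empty());
    const std::string needle = "->Draw(";
    std::size_t from = 0;
    std::size_t pos;
    while ((pos = s.find(needle, from)) != std::string::npos) {
        const std::size_t start = exprStart(s, pos);
        std::string obj = s.substr(start, pos - start);
        while (!obj.empty() && std::isspace(static_cast<unsigned char>(obj.back()))) obj.pop_back();
        const std::size_t args = pos + needle.size();
        if (obj.empty()) {  // nothing recognisable before "->": leave it alone
            from = args;
            continue;
        }
        std::size_t k = args;
        while (k < s.size() && std::isspace(static_cast<unsigned char>(s[k]))) ++k;
        const bool noArgs = k < s.size() && s[k] == ')';
        s.replace(start, args - start, func + "(" + obj + (noArgs ? "" : ", "));
        from = start;
    }
    return s;
}
```

For example, rewriteDraw("A->Draw(B)->Draw(C)", "MyFunc") yields "MyFunc(MyFunc(A, B), C)", matching the chained example above, and a nested call such as A->Draw(B->Draw(c, d), e) becomes MyFunc(A, MyFunc(B, c, d), e).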
One way is to load user code as a DLL (something like plugins).
This way you don't need to recompile your actual application; only the user code is compiled, and your application loads it dynamically.
Let's say I have a program made of several "basic" algorithms on integral variables, such as:
if(a<b)
a += c;
Is there a tool that would allow me to automatically log all the changes made to the different variables at run time?
For instance, in that case it would write to a log file:
"condition passed because a=5 < b=10
a += 10; because c=10"
or some equivalent.
I am aware that I could manually log each operation but that would be much too complex.
Is there any tool that would allow me to do something like that? I don't care about refactoring / recompiling as long as it's not totally manual.
You can write your own integer class that overloads the operators accordingly (with automatic logging). If the class also provides implicit conversion (a constructor from int and a conversion operator to int), then you "only" need to change the types of variables and parameters to get automatic logging of values. But instead of names you could only log addresses (or something derived from them, like var20). With the help of a #define you could easily switch between raw ints (without logging) and your integer class with logging.
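A minimal sketch of such a class, assuming only operator< and operator+= are needed (enough for the if(a<b) a += c; example). LoggedInt, sink and the NO_INT_LOGGING switch are hypothetical names, and a sequential id (var0, var1, ...) stands in for the address-derived name:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Logging integer: each object gets a sequential id used as its "name".
class LoggedInt {
public:
    static std::ostream* sink;  // log destination; defaults to std::clog

    LoggedInt(int v = 0) : value_(v), id_(nextId_++) {}  // implicit from int
    LoggedInt(const LoggedInt& o) : value_(o.value_), id_(nextId_++) {}
    LoggedInt& operator=(const LoggedInt& o) {
        *sink << name() << " = " << o.value_ << '\n';
        value_ = o.value_;
        return *this;
    }
    operator int() const { return value_; }  // implicit to int

    LoggedInt& operator+=(int rhs) {
        *sink << name() << " += " << rhs << " (" << value_ << " -> " << value_ + rhs << ")\n";
        value_ += rhs;
        return *this;
    }

    friend bool operator<(const LoggedInt& a, const LoggedInt& b) {
        const bool r = a.value_ < b.value_;
        *sink << a.name() << '=' << a.value_ << " < " << b.name() << '=' << b.value_
              << (r ? " passed\n" : " failed\n");
        return r;
    }

private:
    std::string name() const { return "var" + std::to_string(id_); }
    int value_;
    int id_;
    static int nextId_;
};

std::ostream* LoggedInt::sink = &std::clog;
int LoggedInt::nextId_ = 0;

// One switch between plain ints and the logging class.
#ifdef NO_INT_LOGGING
using Int = int;
#else
using Int = LoggedInt;
#endif
```

Declaring the variables as Int a = 5, b = 10, c = 10; and running if (a < b) a += c; logs the comparison and the increment. Mixed expressions such as a < 10 would be ambiguous with this minimal version (the compiler can convert either operand), so a real class would need extra overloads taking int.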
To also get the names of the variables into the log, you would either have to rewrite the operations with macros like
if (LESS(a,b))
INC(a,c)
or have a parser that automatically transforms your code into something like this. But I am not aware of any existing tool providing this.
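A sketch of the macro route, assuming C++11 (for the lambda). LESS, INC and traceSink are hypothetical; the key ingredient is the preprocessor's # operator, which turns a macro argument into a string of its source text, so the variable names can be logged next to their values:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// Log destination for the macros (defaults to std::clog).
inline std::ostream*& traceSink() {
    static std::ostream* sink = &std::clog;
    return sink;
}

// #a / #b / #c stringize the arguments' source text.
// Caution: arguments are evaluated more than once, so avoid side effects.
#define LESS(a, b)                                                     \
    ([&]() {                                                           \
        const bool r_ = (a) < (b);                                     \
        *traceSink() << "condition " << (r_ ? "passed" : "failed")     \
                     << " because " << #a << '=' << (a) << " < " << #b \
                     << '=' << (b) << '\n';                            \
        return r_;                                                     \
    }())

#define INC(a, c)                                                      \
    do {                                                               \
        (a) += (c);                                                    \
        *traceSink() << #a << " += " << (c) << "; because " << #c      \
                     << '=' << (c) << '\n';                            \
    } while (0)
```

With int a = 5, b = 10, c = 10;, the statement if (LESS(a, b)) INC(a, c); logs "condition passed because a=5 < b=10" followed by "a += 10; because c=10". The do { ... } while (0) wrapper makes INC behave like a single statement, so it is used with a trailing semicolon.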
I have a hard time imagining that logging the complete execution of a program like this would be useful. A simple std::cout << "hello, world!\n"; would produce a mass of useless logs.
What do you actually need to do? If you want to debug code you should probably use a debugger to examine the program as it runs instead of using a printf-debugging-gone-horribly-wrong strategy. If you want a way to describe the complete execution for later examination/manipulation you could make sure the program behaves deterministically and just save the program input.
The right solution depends on the actual problem, but it's not likely that complete execution logging is the correct solution to anything.
A little something that could be borrowed from IDEs. So the idea would be to highlight function arguments (and maybe scoped variable names) inside function bodies. This is the default behaviour for some C:
Well, if I were to place the cursor inside func I would like to see the arguments foo and bar highlighted to follow the algorithm logic better. Notice that the similarly named foo in func2 wouldn't get highlighted. This luxury could be omitted, though...
Using locally scoped variables, I would also like to have locally initialized variables highlighted:
Finally to redemonstrate the luxury:
Not so trivial to write this. I used C to give a general idea. Really, I could make better use of this for Scheme/Clojure programming:
This should recognize let, loop, for, doseq bindings for instance.
My vimscript-fu isn't that strong; I suspect we would need to
Parse (non-regexply?) the arguments from the function definition under the cursor. This would be language specific of course. My priority would be Clojure.
define a syntax region to cover the given function/scope only
give the required syntax matches
As a function this could be mapped to a key (if very resource intensive) or CursorMoved if not so slow.
Okay, now. Has anyone written/found something like this? Do the vimscript gurus have an idea on how to actually start writing such a script?
Sorry about slight offtopicness and bad formatting. Feel free to edit/format. Or vote to close.
This is much harder than it sounds, and borderline-impossible with the vimscript API as it stands, because you don't just need to parse the file; if you want it to work well, you need to parse the file incrementally. That's why regular syntax files are limited to what you can do with regexes - when you change a few characters, vim can figure out what's changed in the syntax highlighting, without redoing the whole file.
The vim syntax highlighter is limited to dealing with regexes, but if you're hellbent on doing this, you can roll your own parser in vimscript and have it generate a buffer-local syntax that refers to tokens in the file by line and column, using the \%l and \%c atoms in a regex (e.g. \%12l\%5c). This would have to be rerun after every change. Unfortunately there's no autocmd for "file changed" (newer Vim versions do provide TextChanged and TextChangedI), but there is the CursorHold autocmd, which runs when you've been idle for a configurable duration.
One possible solution can be found here. It's not the best approach, because it highlights every occurrence in the whole file and you have to run the command every time (the second issue can probably be avoided; I don't know about the first). Give it a look, though.
I have a C++ class I want to inspect, so I would like all its methods to print their parameters and their return value just before returning.
The latter looks somewhat easy. If I write every return as return(...), a macro
#define return(a) do { cout << (a) << endl; return (a); } while (0)
would do it if I standardize all returns to the parenthesized form (it might be wrong: the expression is evaluated twice, and strictly speaking the standard does not allow defining a macro with the same name as a keyword). If I want to take this out, I just comment out the define.
However, printing inputs seems more difficult. Is there a way I can do it, using C++ structures or with a workaround hack?
A few options come to mind:
Use a debugger.
Use the decorator pattern, as Space_C0wb0y suggested. However, this could be a lot of manual typing, since you'd have to duplicate all of the methods in the decorated class and add logging yourself. Maybe you could automate the creation of the decorator object by running doxygen on your class and then parsing its output...
Use aspect-oriented programming. (Logging, which is what you're wanting to do, is a common application of AOP.) Wikipedia lists a few AOP implementations for C++: AspectC++, XWeaver, and FeatureC++.
However, printing inputs seems more difficult. Is there a way I can do it, using C++ structures or with a workaround hack?
No.
Update: I'm going to lose some terseness in my answer by suggesting that you can probably achieve what you need by applying Design by Contract.
It sounds to me like you want to use a debugger. That will allow you to see all of the parameters that you want.
If you don't mind inserting some code by hand, you can create a class that:
logs entry to the method in the constructor
provides a method to dump arbitrary parameters
provides a method to record status
logs exit with recorded status in the destructor
The usage would look something like:
unsigned long long
factorial(unsigned long long n) {
Inspector inspect("factorial", __FILE__, __LINE__);
inspect.parameter("n", n);
if (n < 2) {
return inspect.result(1);
}
return inspect.result(n * factorial(n-1));
}
Of course, you can write macros to declare the inspector and inspect the parameters. If you are working with a compiler that supports variable argument list macros, then you can get the result to look like:
unsigned long long
factorial(unsigned long long n) {
INJECT_INSPECTOR(n);
if (n < 2) {
return INSPECT_RETURN(1);
}
return INSPECT_RETURN(n * factorial(n-1));
}
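One possible Inspector and the two macros, as a sketch under assumptions: the log format, the static sink pointer and the member names (parameter, parameters, result) are hypothetical choices, and the fold expression in parameters requires C++17. The constructor logs entry, the destructor logs exit with the recorded result, and __func__ plus #__VA_ARGS__ supply the function and parameter names:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

class Inspector {
public:
    static std::ostream* sink;  // log destination; defaults to std::clog

    Inspector(std::string func, const char* file, int line) : func_(std::move(func)) {
        *sink << "enter " << func_ << " (" << file << ':' << line << ")\n";
    }
    ~Inspector() {  // runs after the return value has been computed
        *sink << "exit " << func_ << " -> " << (result_.empty() ? "<none>" : result_) << '\n';
    }
    Inspector(const Inspector&) = delete;
    Inspector& operator=(const Inspector&) = delete;

    template <typename T>
    void parameter(const char* name, const T& value) {
        *sink << "  " << name << " = " << value << '\n';
    }

    // Logs a stringized argument list and the values (C++17 fold expression).
    template <typename... Ts>
    void parameters(const char* names, const Ts&... values) {
        *sink << "  (" << names << ") = (";
        const char* sep = "";
        ((*sink << sep << values, sep = ", "), ...);
        *sink << ")\n";
    }

    // Records the value for the exit log and passes it through unchanged.
    template <typename T>
    T result(T value) {
        std::ostringstream text;
        text << value;
        result_ = text.str();
        return value;
    }

private:
    std::string func_;
    std::string result_;
};

std::ostream* Inspector::sink = &std::clog;

#define INJECT_INSPECTOR(...)                         \
    Inspector inspect_(__func__, __FILE__, __LINE__); \
    inspect_.parameters(#__VA_ARGS__, __VA_ARGS__)
#define INSPECT_RETURN(value) inspect_.result(value)

unsigned long long factorial(unsigned long long n) {
    INJECT_INSPECTOR(n);
    if (n < 2) {
        return INSPECT_RETURN(1);
    }
    return INSPECT_RETURN(n * factorial(n - 1));
}
```

Because the destructor runs after the return value has been computed, each exit line shows the value being returned; for factorial(3) the last line logged is "exit factorial -> 6".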
I'm not sure if you can get a cleaner solution without going to something like an AOP environment or some custom code generation tool.
If your methods are all virtual, you could use the decorator-pattern to achieve that in a very elegant way.
EDIT: From your comment above (you want the output for statistics) I conclude that you should definitely use the decorator-pattern. It is intended for this kind of stuff.
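A minimal sketch of the decorator approach, assuming the class exposes (or can be given) a virtual interface. Calculator, RealCalculator and LoggingCalculator are hypothetical names:

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <utility>

// Hypothetical interface; the decorator relies on the methods being virtual.
class Calculator {
public:
    virtual ~Calculator() = default;
    virtual int add(int a, int b) = 0;
};

class RealCalculator : public Calculator {
public:
    int add(int a, int b) override { return a + b; }
};

// Decorator: same interface, logs arguments and result, forwards the call.
class LoggingCalculator : public Calculator {
public:
    LoggingCalculator(std::unique_ptr<Calculator> inner, std::ostream& log)
        : inner_(std::move(inner)), log_(log) {}

    int add(int a, int b) override {
        log_ << "add(" << a << ", " << b << ")";
        const int r = inner_->add(a, b);
        log_ << " -> " << r << '\n';
        return r;
    }

private:
    std::unique_ptr<Calculator> inner_;
    std::ostream& log_;
};
```

Code that holds a Calculator& cannot tell whether it received the plain or the logging object. Each virtual method still needs its own forwarding override in the decorator, which is where the manual typing comes from.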
I would just use a logging library (or some macros) and insert manual logging calls. Unless your class has an inordinate number of methods, it's probably faster to get going with that than to develop and debug a more sophisticated solution.
Whenever I do a commit cycle in svn, I examine the diff when writing my comments. I thought it would be really nice to show the actual function that I made the modifications in when showing the diff.
I checked out this page, which mentioned that the -p option will show the C function that the change is in. When I tried using the -p option with some C++ code, however, it usually returns the access specifier (private, public, protected, etc), which isn't terribly handy.
I did notice that there is a -F option for diff that does the same as -p, but takes a user-specified regex. I was wondering: is there a simple regex to match a C++ function? It seems like that would be all that is necessary to get this to work.
I'd spend some time looking at this myself, but work is in crunch-mode and this seemed like something that a lot of people would find useful, so I figured I'd post it here.
EDIT: I'm not looking for a slam-dunk catch-all regex, just something that would find the nearest function definition above the area diff would show. The fact that it would be nowhere near perfect and somewhat buggy is okay with me. As long as it works right maybe 60% of the time, it would be a significant productivity improvement IMHO.
Is there a simple regex to match a C++ function? No.
Is there a (complex) regex to match a C++ function? Maybe; it could be possible to write one.
But I would say regular expressions are neither easily up to such a task (given you want some kind of exact match) nor the right tool for it.
Just think about cases like these. How would you handle them?
void (*function(int, void (*)(int)))(int);
int func1(int), func2(double); double func3(int);
The only real solution is to use a parser built with yacc/lex, which of course does nothing for your use case.
So the practical option is to hack together an incomplete regex that fits most function signatures in your code.
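If you go the incomplete-regex route, one heuristic is sketched below, tested with std::regex (ECMAScript syntax): a line that starts in column 0 with an identifier, continues with characters typical of return types and qualified names, contains a '(', and has no ';' after it. The pattern and the function name isLikelyFunctionHeader are assumptions, diff -F may expect a different regex dialect (check your diff's documentation), and false positives such as an unindented if ( line are possible:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Heuristic, deliberately imperfect: column-0 identifier, then name-ish
// characters (types, "::", templates, pointers, references, '~'), then '('
// with no ';' anywhere after it (which excludes plain declarations).
bool isLikelyFunctionHeader(const std::string& line) {
    static const std::regex pattern(R"re(^[A-Za-z_][A-Za-z0-9_:<>,*&~ ]*\([^;]*$)re");
    return std::regex_search(line, pattern);
}
```

It accepts lines like void Foo::bar(int x) and std::vector<int> Foo::items() const, and rejects public:, indented statements, and declarations ending in ;.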
If you're going to be applying this only to your commits I would recommend making a habit of adding a commit comment to the function, e.g:
void something ()
{
...
some thing = 1;
...
}
to
void something ()
// last change by me: a better value for thing
{
...
some thing = 2;
...
}
will display for you the function and your comment with the edits. As a bonus, other people will be able to understand what you're doing.
TortoiseSVN uses the following regexes for autocompletion support in its commit dialog for C++ files:
.h, .hpp, .hxx = ^\s*(?:class|struct)\s+([\w_]+)|\W([\w_]+)\(
.cpp, .c, .cxx = \W(([\w_]+)::([\w_]+))|\W([\w_]+)\(
I don't know how accurate they are, though.
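One way to get a feel for their accuracy is to run them through std::regex, which understands this syntax in its default ECMAScript mode. The sketch below (firstSourceName is a hypothetical helper) shows that the .cpp pattern finds Foo::bar in void Foo::bar(int x), but also picks f out of the ordinary statement x = f(1);, so these patterns seem aimed at collecting identifiers for autocompletion rather than at finding the enclosing function:

```cpp
#include <cassert>
#include <regex>
#include <string>

// The TortoiseSVN patterns quoted above, compiled with std::regex (ECMAScript).
const std::regex kHeaderPattern(R"re(^\s*(?:class|struct)\s+([\w_]+)|\W([\w_]+)\()re");
const std::regex kSourcePattern(R"re(\W(([\w_]+)::([\w_]+))|\W([\w_]+)\()re");

// First name the .cpp pattern extracts from a line: "Class::member" via
// group 1, or a bare identifier followed by '(' via group 4; "" if none.
std::string firstSourceName(const std::string& line) {
    std::smatch m;
    if (!std::regex_search(line, m, kSourcePattern)) return "";
    return m[1].matched ? m[1].str() : m[4].str();
}
```

The header pattern similarly extracts Widget from class Widget : public Base.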
I don't know of an option in SVN that will do this, and a regex-based solution will likely be one or more of the following:
a nightmare to code and maintain, with lots of special cases
incorrect, and missing several valid C++ functions
You need some sort of parser for this. It's technically possible to enumerate all of the possible regex cases, but writing a parser is the correct way to solve this. If you have time to roll your own solution, I'd check out ANTLR; it has several C/C++ grammars available on this page:
ANTLR Grammar Lists