In fact, I don't know how to phrase this very precisely.
Today, I browsed the following page:
http://siliconframework.org/docs/hello_world.html
I found the following syntax:
GET / _hello = [] () { return D(_message = "Hello world."); }
I guess "GET" could be a function that takes the lambda expression, but I cannot figure out what "/" and "_hello" mean here, or how they connect to something meaningful.
Also, what is that "_message = "?
BTW, most of my C++ knowledge predates C++11.
I googled quite a bit.
Could anyone kindly give an explanation?
This library uses what is known as an embedded Domain Specific Language, where it warps C++ and preprocessor syntax in ways that allow a seemingly different language to be just another part of a C++ program.
In short, magic.
The first bit of magic lies in:
iod_define_symbol(hello)
which is a macro that generates the identifier _hello of type _hello_t.
It also creates a _hello_t type which inherits from a CRTP helper called iod::symbol<_hello_t>.
_hello_t overloads various operators (including operator= and operator/) so that they don't behave the way you'd normally expect C++ objects to behave.
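To make the macro/CRTP idea concrete, here is a minimal sketch of how such a symbol might be generated. It is not the real iod code: `symbol`, `DEFINE_SYMBOL` and the `sym` namespace are made-up names, and it assumes C++17 (for inline variables). The macro creates a distinct type per symbol that remembers its own spelling as a string, plus one instance of that type.

```cpp
#include <cassert>
#include <string>

// Hypothetical CRTP base: each symbol type supplies its own name via Derived::name_str().
template <typename Derived>
struct symbol {
    std::string name() const { return Derived::name_str(); }
};

// Hypothetical macro in the spirit of iod_define_symbol(NAME): it declares a type
// _NAME_t that remembers the string "NAME", plus one instance called _NAME.
#define DEFINE_SYMBOL(NAME)                                   \
    namespace sym {                                           \
    struct _##NAME##_t : symbol<_##NAME##_t> {                \
        static const char* name_str() { return #NAME; }       \
    };                                                        \
    inline const _##NAME##_t _##NAME{};                       \
    }

DEFINE_SYMBOL(hello)
DEFINE_SYMBOL(message)
```

Putting the generated symbols in a namespace (here `sym`) sidesteps the rule that identifiers beginning with an underscore are reserved at global scope.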
GET / _hello = [] () { return D(_message = "Hello world."); }
So this calls:
operator=(
operator/( GET, _hello ),
/* lambda_goes_here */
);
Similarly, in the lambda:
D(_message = "Hello world.");
is
D( operator=(_message, "Hello world.") );
operator/ and operator= can do nearly anything.
In the D case, = doesn't do any assigning; instead, it builds a structure that basically says: the field called "message" is assigned the value "Hello world.".
_message knows it is called "message" because it was generated by the macro iod_define_symbol(message), which took the string "message", stored it with the type _message_t, and created the variable _message as an instance of that type.
D takes any number of such key/value pairs and bundles them together.
The lambda returns this bundle.
So [] () { return D(_message = "Hello world."); } is a lambda that returns a bundle of key-value pair attachments, written in a strange way.
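A stripped-down sketch of the `D(_message = "Hello world.")` trick, assuming C++17. `symbol`, `kv_pair`, `bundle` and this `D` are hypothetical stand-ins, not the library's real types: `operator=` on a symbol does no assignment at all, it just returns a (name, value) pair, and `D` collects any number of such pairs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One key/value pair: the field name and its value (kept as text for simplicity).
struct kv_pair {
    std::string key;
    std::string value;
};

// Hypothetical symbol: operator= does not assign, it builds a kv_pair.
struct symbol {
    const char* name;
    kv_pair operator=(const std::string& value) const { return {name, value}; }
};

namespace sym {
inline const symbol _message{"message"};
}

// A bundle of fields, in the order they were written.
using bundle = std::vector<kv_pair>;

// Hypothetical D(): takes any number of kv_pairs and bundles them.
template <typename... Pairs>
bundle D(Pairs... pairs) {
    return bundle{pairs...};
}

// The lambda from the question: it returns a bundle, written in the "strange" syntax.
const auto hello_handler = [] { return D(sym::_message = "Hello world."); };
```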
We then invoke operator= with GET/_hello on the left hand side.
GET is another global object with operator/ overloaded on it; I haven't tracked it down. Suppose it is of type iod::get_t (I made up that name: again, I haven't looked up what type it is, and it doesn't really matter).
Then iod::get_t::operator/(iod::symbol<T> const&) is overloaded to generate yet another helper type. This type gets the T's name (in this case "hello"), and waits for it to be assigned to by a lambda.
When assigned to, it doesn't do what you expect. Instead, it goes off and builds an association between "hello" and invoking that lambda, where that lambda is expected to return a set of key-value pairs generated by D.
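Here is a similarly hedged sketch of the `GET / _hello = lambda` step, assuming C++17. `get_t` (the made-up name from above), `pending_route`, `routes` and `register_routes` are all hypothetical: `operator/` turns GET plus a symbol into a helper object, and that helper's `operator=` stores the lambda under the path "/hello" instead of assigning anything. The handler returns a plain string rather than a `D(...)` bundle to keep the example short.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical route table: path -> handler.
inline std::map<std::string, std::function<std::string()>> routes;

struct symbol {
    const char* name;
};

// What GET / some_symbol evaluates to: it knows the name and waits for a handler.
struct pending_route {
    std::string path;

    // "Assigning" a handler registers it instead of assigning anything.
    template <typename Handler>
    void operator=(Handler handler) const {
        routes["/" + path] = handler;
    }
};

// Made-up type for the GET object.
struct get_t {
    pending_route operator/(const symbol& s) const { return {s.name}; }
};

inline const get_t GET{};
namespace sym {
inline const symbol _hello{"hello"};
}

void register_routes() {
    using namespace sym;
    // '/' binds tighter than '=', so this is (GET / _hello) = lambda.
    GET / _hello = [] { return std::string("Hello world."); };
}
```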
We then pass one or more such associations to http_api, which gathers up those bundles and builds the data required to run a web server with those queries and those responses, possibly including flags saying "I am going to be an http server".
sl::mhd_json_serve then takes that data, and a port number, and actually runs a web server.
All of this is a bunch of layers of abstraction to make some reflection easier. The generated structures carry both C++ identifiers and matching strings. Those strings are exposed at run time, and when the JSON serialization (or deserialization) code is generated, they are used as the keys for reading and writing the JSON values.
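A minimal sketch of that reflection idea, assuming C++17: each field carries a string twin of its name, so one generic function can emit JSON keys without hand-written per-struct code. `field`, `write_json_value` and `to_json` are hypothetical names, and no string escaping is done, so this is only a toy serializer.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical field: a value plus the string twin of its C++ name.
template <typename T>
struct field {
    const char* name;
    T value;
};

// Numbers are written as-is, strings are quoted (no escaping: toy code).
template <typename T>
void write_json_value(std::ostream& os, const T& v) { os << v; }
inline void write_json_value(std::ostream& os, const std::string& v) { os << '"' << v << '"'; }

// Generic serializer: every key comes from a field's stored name.
template <typename... Ts>
std::string to_json(const std::tuple<field<Ts>...>& fields) {
    std::ostringstream os;
    os << '{';
    bool first = true;
    std::apply(
        [&](const auto&... f) {
            ((os << (first ? "" : ",") << '"' << f.name << "\":",
              write_json_value(os, f.value), first = false), ...);
        },
        fields);
    os << '}';
    return os.str();
}
```

For example, a tuple holding `field<std::string>{"message", "Hello world."}` and `field<int>{"count", 3}` serializes to `{"message":"Hello world.","count":3}`.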
The macros merely exist to make writing the boilerplate easier.
If you want to learn more about what is going on here, techniques worth reading up on include "expression templates", "reflection", "CRTP", and embedded "Domain Specific Languages".
Some of the above contains minor "lies told to children"; in particular, the operator syntax doesn't work quite like I implied. (a/b is not equivalent to operator/(a,b), in that the second won't call a member operator/. What I intend to convey is that operators are just functions, not that the two syntaxes are interchangeable.)
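A small illustration of that caveat: when an operator is a member function, `a / b` means `a.operator/(b)`, and the free-function spelling `operator/(a, b)` does not compile; only non-member operators can be called that way. `Ratio` and `Meters` are made-up types.

```cpp
#include <cassert>

struct Ratio {
    int v;
    // Member operator: a / b is looked up as a.operator/(b).
    Ratio operator/(const Ratio& rhs) const { return {v / rhs.v}; }
};

struct Meters {
    double m;
};

// Non-member operator: x / y is looked up as operator/(x, y).
inline double operator/(Meters x, Meters y) { return x.m / y.m; }

// Ratio a{10}, b{2};
// operator/(a, b);  // error: no non-member operator/ accepts two Ratios
```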
#mattheiuG (the author of this framework) has shared these slides in a comment below this post that further explains D and the _message tokens and the framework.
It's not standard C++ syntax; it's framework-specific. The elements prefixed with an underscore (_hello, _message etc) are used with a symbol definition generator that runs and creates the necessary definitions prior to compilation.
There's some more information on it on the end of this page: http://siliconframework.org/docs/symbols.html. Qt does a similar thing with its moc tool.
I have a templated class SafeInt<T> (by Microsoft).
This class can, in theory, be used in place of a POD integer type and can detect any integer overflows during arithmetic operations.
For this class I wrote some custom templated overloaded arithmetic operator (+, -, *, /) functions whose arguments are both SafeInt<T> objects.
I typedef'd all my integer types to the SafeInt class type.
I want to search my codebase for instances of the said binary operators where both operands are of type SafeInt.
Some of the ways I could think of:
String search using regex and weed through the code to detect operator usage instances where both operands are SafeInt objects.
Write a clang tool and process the AST to do this searching (I am yet to learn how to write such a tool.)
Somehow add a counter to count the number of times the custom overloaded operator is instantiated. I spent a lot of time trying this, but it doesn't seem to work.
Can anyone suggest a better way?
Please let me know if I need to clarify anything.
Thanks.
Short answer
You can do this using the clang-query command:
$ clang-query \
-c='m cxxOperatorCallExpr(callee(functionDecl(hasName("operator+"))), hasArgument(0, expr(hasType(cxxRecordDecl(hasName("SafeInt"))))), hasArgument(1, expr(hasType(cxxRecordDecl(hasName("SafeInt"))))))' \
use-si.cc --
Match #1:
/home/scott/wrk/learn/clang/clang-query1/use-si.cc:10:3: note: "root" binds here
x + y; // reported
^~~~~
1 match.
What is clang-query?
clang-query is a utility intended to facilitate writing clang-tidy checks. In particular it understands the language of AST Matchers and can be used to interactively explore what is matched by a given match expression. However, as shown here, it can also be used non-interactively to look for arbitrary AST tree patterns.
The blog post Exploring Clang Tooling Part 2: Examining the Clang AST with clang-query by Stephen Kelly provides a nice introduction to using clang-query.
The clang-query program is included in the pre-built LLVM binaries, or it can be built from source as described in the AST Matchers Tutorial.
How does the above command work?
The -c argument provides a command to run non-interactively. With whitespace added, the command is:
m // Match (and report) every
cxxOperatorCallExpr( // operator function call
callee(functionDecl( // where the callee
hasName("operator+"))), // is "operator+", and
hasArgument(0, // where the first argument
expr(hasType(cxxRecordDecl( // is a class type
hasName("SafeInt"))))), // called "SafeInt",
hasArgument(1, // and the second argument
expr(hasType(cxxRecordDecl( // is also a class type
hasName("SafeInt")))))) // called "SafeInt".
The command line ends with use-si.cc --, meaning: analyze use-si.cc, and no extra compiler flags are needed for clang to interpret it.
The clang-query command line has the same basic structure as that of clang-tidy, including the ability to pass -p compile_commands.json to scan many files at once, possibly with different compiler options per file.
Example input
For completeness, the input I used to test my matcher is use-si.cc:
// use-si.cc
#include "SafeInt.hpp" // SafeInt
void f1()
{
SafeInt<int> x(2);
SafeInt<int> y(3);
x + y; // reported
x + 2; // not reported
2 + x; // not reported
}
where SafeInt.hpp comes from https://github.com/dcleblanc/SafeInt , the repo named on the Microsoft SafeInt page.
To do this right, you clearly have to be able to identify the individual uses of the operator that resolve to a specific operator definition. Fundamentally, you need what the front end of a C++ compiler does: parsing and name resolution (including overload resolution).
Obviously GCC and Clang have this basic capability. But you want to track/display all uses of the specific operator. You can probably bend Clang (or GCC, harder) to give you this information on a file-by-file basis.
Our DMS Software Reengineering Toolkit with its C++ Front End can be used for this, too.
DMS provides the generic parsing and symbol table support machinery; the C++ front end specializes DMS to handle C++ with full, accurate name resolution including overloads, for both GCC5 and MSVS2015. Its symbol table actually collects, for each declaration in a scope, the point of the declaration, and the list of uses of that declaration in terms of accurate source positions. The symbol scopes include an entry for each (overloaded) operator valid in the scope. You could just go to the desired symbol table entry and enumerate/count the list of references to get a raw count. There are standard APIs for this available via DMS.
The same kind of symbol scope/definition/uses information is used by our Java Source Browser to build an HTML-based JavaDoc-like display with full HTML linkages between symbol declarations and uses. So for any symbol declaration, you can easily see the uses.
The C++ front end has a similar HTMLizer that operates on C++ source code. It isn't as mature/pretty, but it is robust. It presently doesn't show all the uses of a declared symbol, but that would be a pretty straightforward change to make to it. (I don't have a publicly visible instance of it. Contact me through my bio and I can send you a sample).
I am currently writing a program that sits on top of a C++ interpreter. The user inputs C++ commands at runtime, which are then passed into the interpreter. For certain patterns, I want to replace the command given with a modified form, so that I can provide additional functionality.
I want to replace anything of the form
A->Draw(B1, B2)
with
MyFunc(A, B1, B2).
My first thought was regular expressions, but that would be rather error-prone, as any of A, B1, or B2 could be arbitrary C++ expressions. As these expressions could themselves contain quoted strings or parentheses, it would be quite difficult to match all cases with a regular expression. In addition, there may be multiple, nested forms of this expression.
My next thought was to call clang as a subprocess, use "-dump-ast" to get the abstract syntax tree, modify that, then rebuild it into a command to be passed to the C++ interpreter. However, this would require keeping track of any environment changes, such as include files and forward declarations, in order to give clang enough information to parse the expression. As the interpreter does not expose this information, this seems infeasible as well.
The third thought was to use the C++ interpreter's own internal parsing to convert to an abstract syntax tree, then build from there. However, this interpreter does not expose the ast in any way that I was able to find.
Are there any suggestions as to how to proceed, either along one of the stated routes, or along a different route entirely?
What you want is a Program Transformation System.
These are tools that generally let you express changes to source code, written in source level patterns that essentially say:
if you see *this*, replace it by *that*
but operating on Abstract Syntax Trees, so the matching and replacement process is far more trustworthy than what you get with string hacking.
Such tools have to have parsers for the source language of interest.
The source language being C++ makes this fairly difficult.
Clang sort of qualifies; after all, it can parse C++. OP objects that it cannot do so without all the environment context. To the extent that OP is typing (well-formed) program fragments (statements, etc.) into the interpreter, Clang may [I don't have much experience with it myself] have trouble determining what the fragment is (statement? expression? declaration? ...). Finally, Clang isn't really a PTS; its tree modification procedures are not source-to-source transforms. That matters for convenience but might not stop OP from using it; surface-syntax rewrite rules are convenient, but you can always substitute procedural tree hacking with more effort. When there are more than a few rules, this starts to matter a lot.
GCC with Melt sort of qualifies in the same way that Clang does. I'm under the impression that Melt makes GCC at best a bit less intolerable for this kind of work. YMMV.
Our DMS Software Reengineering Toolkit with its full C++14 [EDIT July 2018: C++17] front end absolutely qualifies. DMS has been used to carry out massive transformations on large-scale C++ code bases.
DMS can parse arbitrary (well-formed) fragments of C++ without being told in advance what the syntax category is, and return an AST of the proper grammar nonterminal type, using its pattern-parsing machinery. [You may end up with multiple parses, e.g. ambiguities, that you'll have to decide how to resolve; see Why can't C++ be parsed with a LR(1) parser? for more discussion.] It can do this without resorting to "the environment" if you are willing to live without macro expansion while parsing, and insist that the preprocessor directives (they get parsed too) are nicely structured with respect to the code fragment (#if foo{#endif not allowed), but that's unlikely to be a real problem for interactively entered code fragments.
DMS then offers a complete procedural AST library for manipulating the parsed trees (search, inspect, modify, build, replace) and can then regenerate surface source code from the modified tree, giving OP text to feed to the interpreter.
Where it shines in this case is that OP can likely write most of his modifications directly as source-to-source syntax rules. For his example, he can provide DMS with a rewrite rule (untested but pretty close to right):
rule replace_Draw(A:primary,B1:expression,B2:expression):
primary->primary
"\A->Draw(\B1, \B2)" -- pattern
rewrites to
"MyFunc(\A, \B1, \B2)"; -- replacement
and DMS will take any parsed AST containing the left hand side "...Draw..." pattern and replace that subtree with the right hand side, after substituting the matches for A, B1 and B2. The quote marks are metaquotes and are used to distinguish C++ text from rule-syntax text; the backslash is a metaescape used inside metaquotes to name metavariables. For more details of what you can say in the rule syntax, see DMS Rewrite Rules.
If OP provides a set of such rules, DMS can be asked to apply the entire set.
So I think this would work just fine for OP. It is a rather heavyweight mechanism to "add" to the package he wants to provide to a 3rd party; DMS and its C++ front end are hardly "small" programs. But then modern machines have lots of resources, so I think it's a question of how badly OP needs to do this.
Try modifying the headers to suppress the method; then, when compiling, you'll get an error at every call and will be able to replace them all.
Since you have a C++ interpreter (such as CERN's ROOT), I guess you must use the compiler to intercept all the calls to Draw. An easy and clean way to do that is to declare the Draw method as private in the headers, using some defines:
class ItemWithDrawMethod
{
    // ...
public:
#ifdef CATCHTHEMETHOD
private:
#endif
    void Draw(A, B);
#ifdef CATCHTHEMETHOD
public:
#endif
    // ...
};
Then compile with:
g++ -DCATCHTHEMETHOD=1 -fsyntax-only yourfilein.cpp
If users want to input complex algorithms into the application, I suggest integrating a scripting language into the app, so that users can write code (functions/algorithms in a defined way) that the app executes in the interpreter to get the final results. Examples: Python, Perl, JS, etc.
Since you need C++ in the interpreter http://chaiscript.com/ would be a suggestion.
What happens when someone gets ahold of the Draw member function (auto draw = &A::Draw;) and then starts using draw? Presumably you'd want the same improved Draw-functionality to be called in this case too. Thus I think we can conclude that what you really want is to replace the Draw member function with a function of your own.
Since it seems you are not in a position to modify the class containing Draw directly, a solution could be to derive your own class from A and override Draw in there. Then your problem reduces to having your users use your new improved class.
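A minimal sketch of the derive-and-override route, assuming Draw is declared `virtual` in the original class (if it is not, a same-named function in a derived class only hides it, and calls through a base pointer or `&Base::Draw` still reach the original). `Shape`, `MyShape` and the `int` parameters are placeholders for the real class and argument types. Because the call goes through virtual dispatch, even a saved member-function pointer like the `draw` example above ends up in the new version.

```cpp
#include <cassert>
#include <string>

// Placeholder for the class that owns Draw (not its real definition).
struct Shape {
    virtual ~Shape() = default;
    virtual std::string Draw(int b1, int b2) {
        return "Shape::Draw(" + std::to_string(b1) + "," + std::to_string(b2) + ")";
    }
};

// Your class: same signature, extra behaviour first, then the original.
struct MyShape : Shape {
    std::string Draw(int b1, int b2) override {
        return "extra;" + Shape::Draw(b1, b2);
    }
};
```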
You may again consider the problem of automatically translating uses of class A to your new derived class, but this still seems pretty difficult without the help of a full C++ implementation. Perhaps there is a way to hide the old definition of A and present your replacement under that name instead, via clever use of header files, but I cannot determine whether that's the case from what you've told us.
Another possibility might be to use some dynamic linker hackery using LD_PRELOAD to replace the function Draw that gets called at runtime.
There may be a way to accomplish this mostly with regular expressions.
Since anything that appears after Draw( is already formatted correctly as parameters, you don't need to fully parse them for the purpose you have outlined.
Fundamentally, the part that matters is the "SYMBOL->Draw("
SYMBOL could be any expression that resolves to an object that overloads -> or to a pointer of a type that implements Draw(...). If you reduce this to two cases, you can short-cut the parsing.
For the first case, use a simple regular expression that matches any valid C++ symbol, something similar to "[A-Za-z_][A-Za-z0-9_.]*", followed by the literal text "->Draw(". This gives you the portion that must be rewritten, since the code following it is already formatted as valid C++ parameters.
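A sketch of this first case with `std::regex`, assuming the receiver is a plain, possibly dot-qualified identifier; receivers such as `this->p`, `f(x)` or `arr[i]` belong to the second case and are not handled here. `rewrite_simple` is a hypothetical name and `MyFunc` is the target name from the question. A separate pattern covers `Draw()` with no arguments so that it doesn't become `MyFunc(A, )`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Simple case only: the receiver is an identifier, optionally with '.' member access.
std::string rewrite_simple(const std::string& code) {
    // RECEIVER->Draw()  becomes  MyFunc(RECEIVER)
    static const std::regex no_args(R"(([A-Za-z_][A-Za-z0-9_.]*)->Draw\(\s*\))");
    // RECEIVER->Draw(  becomes  MyFunc(RECEIVER,  (the arguments are left untouched)
    static const std::regex with_args(R"(([A-Za-z_][A-Za-z0-9_.]*)->Draw\()");
    std::string out = std::regex_replace(code, no_args, "MyFunc($1)");
    return std::regex_replace(out, with_args, "MyFunc($1, ");
}
```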
The second case is for complex expressions that return an overloaded object or pointer. This requires a bit more effort, but a short parsing routine to walk backward through just a complex expression can be written surprisingly easily, since you don't have to support blocks (blocks in C++ cannot return objects, since lambda definitions do not call the lambda themselves, and actual nested code blocks {...} can't return anything directly inline that would apply here). Note that if the expression doesn't end in ) then it has to be a valid symbol in this context, so if you find a ) just match nested ) with ( and extract the symbol preceding the nested SYMBOL(...(...)...)->Draw() pattern. This may be possible with regular expressions, but should be fairly easy in normal code as well.
As soon as you have the symbol or expression, the replacement is trivial, going from
SYMBOL->Draw(...
to
YourFunction(SYMBOL, ...
without having to deal with the additional parameters to Draw().
As an added benefit, chained function calls are parsed for free with this model, since you can recursively iterate over the code such as
A->Draw(B...)->Draw(C...)
The first iteration identifies the first A->Draw( and rewrites the whole statement as
YourFunction(A, B...)->Draw(C...)
which then identifies the second ->Draw with an expression "YourFunction(A, ...)->" preceding it, and rewrites it as
YourFunction(YourFunction(A, B...), C...)
where B... and C... are well-formed C++ parameters, including nested calls.
Without knowing the C++ version that your interpreter supports, or the kind of code you will be rewriting, I really can't provide any sample code that is likely to be worthwhile.
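For illustration only (not the answerer's code), here is a rough sketch of the backward-walking approach, assuming C++17 and input with no string literals, character literals or comments containing parentheses or `->Draw(` (those would need a real tokenizer). `rewrite`, `receiver_start` and `MyFunc` are illustrative names. It always rewrites the leftmost `->Draw(`, which is what makes chained calls come out nested.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Walk backwards from `end` (the position of "->Draw(") to the start of the receiver
// expression: identifiers, '.', '->', and balanced (...) / [...] groups.
std::size_t receiver_start(const std::string& code, std::size_t end) {
    std::size_t i = end;
    while (i > 0) {
        char c = code[i - 1];
        if (c == ')' || c == ']') {
            int depth = 0;  // skip back over a balanced group
            do {
                char d = code[i - 1];
                if (d == ')' || d == ']') ++depth;
                else if (d == '(' || d == '[') --depth;
                --i;
            } while (i > 0 && depth > 0);
        } else if (std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '.') {
            --i;
        } else if (c == '>' && i >= 2 && code[i - 2] == '-') {
            i -= 2;  // member access through a pointer, e.g. this->p
        } else {
            break;
        }
    }
    return i;
}

// Rewrite every RECEIVER->Draw(ARGS) as MyFunc(RECEIVER, ARGS), leftmost first.
std::string rewrite(std::string code) {
    const std::string needle = "->Draw(";
    std::size_t pos;
    while ((pos = code.find(needle)) != std::string::npos) {
        std::size_t start = receiver_start(code, pos);
        std::size_t args = pos + needle.size();  // just past the '('
        std::size_t next = code.find_first_not_of(" \t\n", args);
        bool no_args = next != std::string::npos && code[next] == ')';
        std::string head = "MyFunc(" + code.substr(start, pos - start) + (no_args ? "" : ", ");
        code.replace(start, args - start, head);
    }
    return code;
}
```

With this sketch, `A->Draw(x)->Draw(y)` becomes `MyFunc(MyFunc(A, x), y)`, matching the chained example above.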
One way is to load user code as a DLL (something like plugins).
This way, you don't need to recompile your actual application; only the user code is compiled, and your application loads it dynamically.
I created a syntax extension that allow the definition of a type as
type.yjson type_name {
/* type_declaration */
}
to be able to build a record value directly from a json file.
The syntax extension inserts a module and the functions necessary to do so.
So far, no problem. The syntax extension does exactly what I wanted.
I start having issues when I want to use "yjson" somewhere else in my code (e.g., as a function parameter).
Here is what I tried:
EXTEND Gram
str_item:
[
[ KEYWORD "type"; KEYWORD "."; "yjson"; tdl_raw = type_declaration ->
Here is the error I get when I use "yjson" as a function parameter:
[fun_binding] expected after [ipatt] (in [let_binding])
I don't really understand what happens here. It doesn't seem like the rule has been matched, so why do I get a parse error?
I do not fully understand P4's mechanism here, but [ [ "blahblah" -> ... makes blahblah a new keyword of the language, so you can no longer use blahblah as a function argument.
To see this, try preprocessing your pa_*.ml with camlp4of and see how "blahblah" is expanded to Gram.Skeyword "blahblah". It seems that this Skeyword _ is passed to Structure.using via Insert.insert in P4, and the string is registered as a new keyword.
To keep yjson usable as a normal variable, use id = LIDENT instead of "yjson" in your rule, then check id's content is "yjson" or not in your action.
If I can make a slightly off-topic remark, I think it's wrong to design a custom syntax for type-directed code generation, when there already exist two different syntaxes (one for type_conv and one for deriving), one of which (type-conv) is becoming a de facto standard.
type foo = {
...
} with json
If you pick a syntax for this, you should use this one unless you have very good reasons not to. In fact, type-conv itself is a helper utility to let you write your own type-directed code generators, so you may as well use type-conv directly for what you're trying to do.
(You probably know about Martin Jambon's Atdgen, which made a conscious choice not to use Camlp4; there is ongoing work by Alain Frisch to support annotations directly in the OCaml syntax, but that's not yet ready for consumption.)
Let's say I have a program made of several "basic" algorithms on integral variables, such as:
if(a<b)
a += c;
Is there a tool that would allow me to automatically log all the changes made to the different variables at run time?
For instance it would display in that case in a log file:
"condition passed because 5=a < b=10
a += 10; because c=10"
or some equivalent.
I am aware that I could manually log each operation but that would be much too complex.
Is there any tool that would allow me to do something like that? I don't care about refactoring / recompiling as long as it's not totally manual.
You can write your own integer class that overloads the operators accordingly (with automatic logging). If the class also provides implicit conversion (a constructor from int and a conversion operator to int), then you "only" need to change the types of variables and parameters to have your automatic logging of values. But instead of names you could only log addresses (or something derived from it like var20). With the help of a #define you could easily switch between raw ints (without logging) or your integer class with logging.
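A minimal sketch of such a class, assuming C++17. `LoggedInt`, `op_log`, `Int` and the `NO_OP_LOGGING` switch are made-up names. Only `+=` and `<` are instrumented here; any operator you don't overload silently falls back to plain `int` arithmetic through the conversion operator and is not logged. The extra `<` overloads for mixed `int` operands avoid an ambiguity between the class's `<` and the built-in one.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Where the log goes (a string stream, so it can be inspected afterwards).
inline std::ostringstream op_log;

// Hypothetical integer wrapper that logs the operations it overloads.
class LoggedInt {
public:
    LoggedInt(int v = 0) : value_(v) {}      // implicit conversion from int
    operator int() const { return value_; }  // implicit conversion to int

    LoggedInt& operator+=(int rhs) {
        op_log << "var@" << static_cast<const void*>(this) << ": " << value_
               << " += " << rhs << " -> " << value_ + rhs << '\n';
        value_ += rhs;
        return *this;
    }

    friend bool operator<(const LoggedInt& a, const LoggedInt& b) { return compare_and_log(a.value_, b.value_); }
    friend bool operator<(const LoggedInt& a, int b) { return compare_and_log(a.value_, b); }
    friend bool operator<(int a, const LoggedInt& b) { return compare_and_log(a, b.value_); }

private:
    static bool compare_and_log(int a, int b) {
        bool result = a < b;
        op_log << a << " < " << b << " is " << (result ? "true" : "false") << '\n';
        return result;
    }
    int value_;
};

// Switch between logging and raw ints with a single #define.
#ifdef NO_OP_LOGGING
using Int = int;
#else
using Int = LoggedInt;
#endif
```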
To also get the names of the variables into the log, you would have to either rewrite the operators with macros like
if (LESS(a,b))
INC(a,c)
or have a parser that automatically transforms your code into something like this. But I am not aware of any existing tool providing this.
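For the macro route, the preprocessor's `#` operator supplies the names. A minimal sketch, where `LESS`, `INC`, `log_less`, `log_inc` and `name_log` are hypothetical, that reproduces the log format from the question for plain `int` variables:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Where the messages go (a string stream, so they can be inspected).
inline std::ostringstream name_log;

// #x turns each argument's source text into a string, so names can be logged.
#define LESS(x, y) log_less(#x, (x), #y, (y))
#define INC(x, y) log_inc(#x, (x), #y, (y))

inline bool log_less(const char* x_name, int x, const char* y_name, int y) {
    bool passed = x < y;
    name_log << "condition " << (passed ? "passed" : "failed") << " because " << x << '='
             << x_name << " < " << y_name << '=' << y << '\n';
    return passed;
}

// Performs x += y and logs it in the question's format.
inline void log_inc(const char* x_name, int& x, const char* y_name, int y) {
    name_log << x_name << " += " << y << "; because " << y_name << '=' << y << '\n';
    x += y;
}
```

With `a = 5, b = 10, c = 10`, the statement `if (LESS(a, b)) INC(a, c);` logs `condition passed because 5=a < b=10` followed by `a += 10; because c=10`.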
I have a hard time imagining that logging the complete execution of a program like this would be useful. A simple std::cout << "hello, world!\n"; would produce a mass of useless logs.
What do you actually need to do? If you want to debug code you should probably use a debugger to examine the program as it runs instead of using a printf-debugging-gone-horribly-wrong strategy. If you want a way to describe the complete execution for later examination/manipulation you could make sure the program behaves deterministically and just save the program input.
The right solution depends on the actual problem, but it's not likely that complete execution logging is the correct solution to anything.
I have the following code in Lua:
ABC:test(X)
The test function is implemented in C++. My problem is this: I need to know the name of the variable passed as the parameter (in this case X). In C++ I only have access to the value of this variable, but I need to know its name.
Help please
Functions are not passed variables; they are passed values. Variables are just locations that store values.
When you say X somewhere in your Lua code, that means to get the value from the variable X (note: it's actually more complicated than that, but I won't get into that here).
So when you say test(X), you're saying, "Get the value from the variable X and pass that value as the first parameter to the function test."
What it seems like you want to do is change the contents of X, right? You want to have the test function modify X in some way. Well, you can't really do that directly in Lua. Nor should you.
See, in Lua, you can return values from functions. And you can return multiple values. Even from C++ code, you can return multiple values. So whatever it is you wanted to store in X can just be returned:
X = test(X)
This way, the caller of the function decides what to do with the value, not the function itself. If the caller wants to modify the variable, that's fine. If the caller wants to stick it somewhere else, that's also fine. Your function should not care one way or the other.
Also, this allows the user to do things like test(5). Here, there is no variable; you just pass a value directly. That's one reason why functions cannot modify the "variable" that is passed; because it doesn't have to be a variable. Only values are passed, so the user could simply pass a literal value rather than one stored in a variable.
In short: you can't do it, and you shouldn't want to.
The correct answer is that Lua doesn't really support this, but there is the debug interface. See this question for the solution you're looking for. If you can't get a call to debug to work directly from C++, then wrap your function call with a Lua function that first extracts the debug results and then calls your C++ function.
If what you're after is a string representation of the argument, then you're kind of stuck in lua.
I'm thinking something like in C:
assert( x==y );
Which generates a nice message on failure. In C this is done through macros.
Something like this (untested and probably broken).
#define assert(X) do { if (!(X)) { printf("ASSERTION FAILED: %s\n", #X); abort(); } } while (0)
Here #X means the string form of the argument. In the example above that is "x==y". Note that this is subtly different from a variable name: it's just the text the preprocessor sees when expanding the macro.
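The same preprocessor feature in a compilable form (C++ here, though `#X` behaves the same in C). `STRINGIFY`, `CHECK`, `check_impl` and the two globals are hypothetical; `CHECK` reports and counts failures instead of aborting so it can be exercised safely.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// #X turns the macro argument into a string literal containing its source text.
#define STRINGIFY(X) #X

// Like the assert macro above, but it reports and returns instead of aborting.
#define CHECK(X) check_impl((X), #X)

inline int failures = 0;
inline std::string last_failure;

inline bool check_impl(bool ok, const char* text) {
    if (!ok) {
        ++failures;
        last_failure = text;
        std::printf("CHECK FAILED: %s\n", text);
    }
    return ok;
}
```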
Unfortunately there's no such corresponding functionality in lua. For my lua testing libraries I end up passing the stringified version as part of the expression, so in lua my code looks something like this:
assert( x==y, "x==y")
There may be ways to make this work as assert("x==y") using some kind of string evaluation and closure mechanism, but it seemed too tricky to be worth doing.
EDIT:
While this doesn't appear to be possible in pure lua, there's a patched version that does seem to support macros: http://lua-users.org/wiki/LuaMacros . They even have an example of a nicer assert.