Following the Boost.Spirit compiler examples, I am migrating my Flex/Bison-based calculator-like grammar to Spirit. I want to add an include feature: #include<another_input.inp>. I have defined the include_statement grammar successfully. Should I follow the way error handling is done, i.e. on_success(include_statement, annotation_function(...)): for each successful match of include_statement, get the new input file name and call phrase_parse() again? Or should I push/pop an input stack, as in Flex/Bison?
Thanks.
Guessing, from the little information given here, that you meant to ask whether you can reuse the same grammar instance or whether it would be better to instantiate a new one to parse the includes: it depends.
You can do both.
When the grammar is stateless (hint: it usually is if you can use it const) there's no difference. Otherwise, prefer to instantiate a separate instance.
However:
the point is somewhat moot, since it appears you have already decided to parse the includes after parsing the main document (if I read your comment correctly);
there is always the danger of global state: even if the grammar object is const, you could still modify external state (e.g. using phx::ref in a semantic action), so this would be an issue regardless of whether you used separate grammar instances.
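To illustrate the stateless case, here is a minimal sketch (Qi; the grammar and the file-loading helper are hypothetical stand-ins) in which a single const grammar instance serves both the main document and any included file:

    #include <boost/spirit/include/qi.hpp>
    #include <fstream>
    #include <iterator>
    #include <string>

    namespace qi = boost::spirit::qi;

    // Hypothetical calculator grammar; stateless, so a single const
    // instance can safely be reused for every file.
    template <typename It>
    struct calc_grammar : qi::grammar<It, qi::space_type>
    {
        calc_grammar() : calc_grammar::base_type(start)
        {
            start = +qi::double_; // stand-in for the real calculator rules
        }
        qi::rule<It, qi::space_type> start;
    };

    bool parse_file(std::string const& name,
                    calc_grammar<std::string::const_iterator> const& g)
    {
        std::ifstream in(name);
        std::string src((std::istreambuf_iterator<char>(in)),
                        std::istreambuf_iterator<char>());
        auto f = src.cbegin(), l = src.cend();
        // Each include_statement match could call parse_file() again,
        // reusing the same const grammar instance g.
        return qi::phrase_parse(f, l, g, qi::space) && f == l;
    }

If the grammar carries state (e.g. member symbol tables mutated by semantic actions), instantiate a fresh copy per file instead of sharing g.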
Related
I'm writing a scanner/parser combination using Flex and Bison. If possible, I would like to avoid using C++-specific features of either program, but I nevertheless need to access a C++ library from the source file generated by Bison. At the moment I'm compiling the source file generated by Flex as a C program.
One thing I thought I might be able to do is to declare STL type members inside Bison's %union statement, e.g.:
%union {
    std::string str;
};
I quickly realized that this cannot work because this produces a union that is included by the Flex source file. I then thought I might just compile that with a C++ compiler as well but the statement above is already rejected when running bison:
error: expected specifier-qualifier-list before ‘std’
I don't really want to go through the trouble of copying and concatenating strings with C stdlib functions throughout my parser. What can I do in order to make the scanner return STL types to the parser?
EDIT: the linked duplicate does not really answer my question, the answers to that one only show how to compile both files using a C++ compiler which is not my issue.
You can certainly compile both your generated scanner and parser with C++, even if you use the default C skeletons (and I agree that the C++ skeletons are badly documented and excessively complicated). So there is nothing stopping you from using std::string inside your parser.
However, that won't let you put a std::string inside a union, because you can't just toss a class with a non-trivial destructor into a union. It's possible to work around this limitation by explicitly declaring the semantic type and providing explicit constructors and destructors, but it's going to be a fair amount of work and it may well not be worth it.
That still leaves you with a couple of options. One is to use a pointer to a std::string, which means that your scanner action has to do something like:
[[:alpha:]][[:alnum:]_]* yylval.strval = new std::string(yytext);
Another one is to just use C strings, leading to:
[[:alpha:]][[:alnum:]_]* yylval.strval = strdup(yytext);
In both cases, you'll end up having to manually manage the allocated memory; C++'s smart pointers won't help you because they also have non-trivial destructors, so they can't be easily shoe-horned into semantic unions either.
Since it appears that you're going to eventually make the token into a std::string, you might as well do it from the start using the first option above. Since most tokens are short and most C++ libraries now implement a short string optimization, new std::string(yytext) will frequently require only one memory allocation (and if it requires two, the library will transparently handle the second one).
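If you go the std::string* route, Bison's %destructor directive can at least reclaim values that get discarded during error recovery; a small sketch (the IDENT token name is hypothetical):

    %union {
        std::string* strval;
    }
    %token <strval> IDENT
    %destructor { delete $$; } <strval>

Values that your actions actually consume still have to be deleted in those actions; %destructor only covers tokens Bison throws away.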
I'm generating C++ code and run into issues when the model being generated from has properties clashing with C++ keywords. I'd prefer the model to stay language agnostic.
I've tried some #define int ReSeRvEd_int hacks local to the generated code, but it just feels wrong to reserve other symbols this way: the problem does not really go away, and in either case cross-referencing between the generated code and the model becomes more difficult.
Any suggestion how to suppress/hide keywords?
I can think of a couple of approaches:
Add a standard prefix or suffix to all generated tokens. So rather than properties named "steve" and "int" producing variables named steve and int respectively, they would produce prop_steve and prop_int (see the sketch after this list).
Force generated tokens to be capitalized.
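For the first approach, the generator-side mapping is trivial; a hypothetical sketch:

    #include <string>

    // Map a language-agnostic model property name to a C++ identifier.
    // The uniform prefix guarantees the result can never be a keyword.
    std::string to_cpp_identifier(const std::string& property_name)
    {
        return "prop_" + property_name; // "int" -> "prop_int", "steve" -> "prop_steve"
    }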
Two things that I would not do:
Try to make the parser okay with a property named int, as you seem to be trying to do above. In addition to violating the Principle of Least Astonishment, this is not legal.
Have a hardcoded remapping from, say, "int" to innt. Ugly, inconsistent, and (assuming the generated code interfaces with user-written code) forces the user to memorize the remappings.
I was reading the paper "An Object-Oriented Preprocessor Fit for C++":
http://www.informatik.uni-bremen.de/st/lehre/Arte-fakt/Seminar/papers/17/An%20Object-Oriented%20preprocessor%20fit%20for%20C++.pdf
It discusses three different types of macros.
text macros. // pretty much the same as C preprocessor
computational macros // text replaced as a result of computation
syntax macros. // text replaced by the syntax tree representing a linguistically consistent construct.
Can somebody please explain the last two types of macros in more detail?
It says that inline functions and templates are examples of computational macros. How?
Looking at the original Cheatham paper from 1966, which Willink and Muchnick's paper refers to, I'd summarize the different macro types like this:
Text macros do text replacements before scanning and parsing.
Syntactic macros are processed during scanning and parsing. Calling a syntax macro replaces the macro call with another piece of AST.
Computational macros can happen at any point after the AST has been built by the scanner and parser. The point is that we are no longer processing any text but instead manipulating the nodes of the AST, i.e., we are dealing with objects that might already have semantic information attached to them.
I'm no C++ internals expert, but I'd assume that inlining function calls and instantiating templates both involve manipulating the syntax tree before, while, and after it is annotated with the semantic information needed to compile it properly, since both seem to require knowing a lot (such as type information, or whether something is a good inlining candidate) that is not yet known during scanning and parsing.
By 2. it sounds like they mean that some computation is done at compile time, and the instructions executed at runtime only involve the result. I wouldn't say inline functions particularly represent this, but template metaprogramming does exactly this, and so does constexpr in C++11.
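For instance, a C++11 constexpr function is evaluated by the compiler, and only the result survives into the generated code:

    // Computed entirely at compile time; the runtime code only ever
    // sees the value 120.
    constexpr int factorial(int n)
    {
        return n <= 1 ? 1 : n * factorial(n - 1);
    }

    static_assert(factorial(5) == 120, "evaluated by the compiler");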
I think 3. could also be represented by the use of templates. A template does represent a syntax tree, and instantiating it involves taking the generic syntax tree, filling in the parameterized, unknown bits, and using the resulting syntax tree.
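A minimal illustration: the template below is in effect a parameterized syntax tree, and each instantiation substitutes a concrete type into it to produce a new tree:

    // The compiler builds a fresh function from the template for each T.
    template <typename T>
    T twice(T x) { return x + x; }

    int    a = twice(21);  // instantiates twice<int>
    double b = twice(1.5); // instantiates twice<double>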
Many (most) regular expression libraries for C++ allow for creating the expression from a string during runtime. Is anyone aware of any C++ parser generators that allow for feeding a grammar (preferably BNF) represented as a string into a generator at runtime? All the implementations I've found either require an explicit code generator to be run or require the grammar to be expressed via clever template meta-programming.
It should be pretty easy to build a recursive descent, backtracking parser that accepts a grammar as input. You can reduce all your rules to the following form (or act as if you have):
A = B C D ;
Parsing such a rule by recursive descent is easy: call a routine that corresponds to finding a B, then one that finds a C, then one that finds a D. Given you are doing a general parser, you can always call a "parse_next_sentential_form(x)" function, and pass the name of the desired form (terminal or nonterminal token) as x (e.g., "B", "C", "D").
In processing such a rule, the parser wants to produce an A, by finding a B, then C, then D. To find B (or C or D), you'd like to have an indexed set of rules in which all the left-hand sides are the same, so one can easily enumerate the B-producing rules, and recurse to process their content. If your parser gets a failure, it simply backtracks.
This won't be a lightning fast parser, but shouldn't be terrible if well implemented.
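As a rough illustration (all names hypothetical), the core of such a grammar-interpreting backtracker fits in a few dozen lines:

    #include <map>
    #include <string>
    #include <vector>

    using Rule    = std::vector<std::string>;         // one right-hand side: B C D
    using Grammar = std::multimap<std::string, Rule>; // LHS -> its alternatives

    // Try to derive `sym` starting at input[pos] (one token per element).
    // Returns the position after the match, or -1 on failure.
    int parse_symbol(const Grammar& g, const std::string& sym,
                     const std::vector<std::string>& input, int pos)
    {
        auto rules = g.equal_range(sym);
        if (rules.first == rules.second)  // nothing produces sym: a terminal
            return (pos < (int)input.size() && input[pos] == sym) ? pos + 1 : -1;

        for (auto it = rules.first; it != rules.second; ++it) { // each alternative
            int p = pos;                  // restart from the same spot: backtracking
            for (const auto& rhs : it->second) {
                p = parse_symbol(g, rhs, input, p); // find a B, then a C, then a D
                if (p < 0) break;
            }
            if (p >= 0) return p;
        }
        return -1;
    }

Note that a naive backtracker like this loops forever on left-recursive rules, so those would have to be rewritten first.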
One could also use an Earley parser, which parses by creating states of partially-processed rules.
If you wanted it to be fast, I suppose you could simply take the guts of Bison and make it into a library. Then if you have grammar text or grammar rules (different entry points into Bison), you could start it and have it produce its tables in memory (which it must do in some form). Don't spit them out; simply build an LR parsing engine that uses them. Voila, on-the-fly efficient parser generation.
You have to worry about ambiguities and the LALR(1)ness of your grammar if you do this; the previous two solutions work with any context free grammar.
I am not aware of an existing library for this. However, if performance and robustness are not critical, you can spin off bison or any other tool that generates C code (via popen(3) or similar), spin off gcc on the generated code, link the result into a shared library, and load the library via dlopen(3)/dlsym(3). On Windows: a DLL and LoadLibrary() instead.
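A hedged sketch of the POSIX loading half of that pipeline (the entry-point name is hypothetical, and error handling is mostly elided):

    #include <dlfcn.h>

    typedef int (*parse_fn)(const char* input);

    int run_generated_parser(const char* input)
    {
        // Assumes bison + gcc have already been spawned (e.g. via popen(3))
        // and produced ./libparser.so exporting a "parse_input" function.
        void* lib = dlopen("./libparser.so", RTLD_NOW);
        if (!lib) return -1;
        parse_fn parse = (parse_fn)dlsym(lib, "parse_input");
        int result = parse ? parse(input) : -1;
        dlclose(lib);
        return result;
    }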
The easiest option is to embed a scripting language or even a full-blown VM (e.g., Mono) and run your generated parsers on top of it. Lua has quite a powerful JIT compiler, decent metaprogramming capabilities, and several Packrat implementations ready to use, so that would probably be the least-effort route.
I just came across this http://cocom.sourceforge.net/ammunition++-13.html
The last one is an Earley Parser and it appears to take the grammar as a string.
One of the functions is:
Public function `parse_grammar':

    int parse_grammar (int strict_p, const char *description)

This is another function which tunes the parser to the given grammar. The grammar is given by the string `description'; the description is similar to a YACC one.
The actual code is at http://sourceforge.net/projects/cocom/
EDIT
A newer version is at https://github.com/vnmakarov/yaep
boost::spirit is a C++ parsing framework that can be used to construct parsers dynamically at runtime.
The task
I am trying to work out how best to add C++0x's override identifier to all existing methods that are already overrides in a large body of C++ code, without doing it manually.
(We have many, many hundreds of thousands of lines of code, and doing it manually would be a complete non-starter.)
Current idea
Our coding standards say that we should add the virtual keyword against all implicitly virtual methods in derived classes, even though strictly unnecessary (to aid comprehension).
So if I were to script the addition myself, I'd write a script that read all our headers, found all functions beginning with virtual, and inserted override before the following semicolon. Then I'd compile the code on a compiler that supports override and fix all the resulting errors in base classes.
But I'd really much rather not use this home-grown way, as:
it's obviously going to be tedious and error-prone.
not everyone has remembered, every time, to add the virtual keyword, so this method would miss out some existing overrides
Is there an existing tool?
So, is there already a tool that parses C++ code, detects existing methods that are overrides, and appends override to their declarations?
(I am aware of static analysis tools such as PC-lint that warn about functions that look like they should be overrides. What I'm after is something that would actually munge our code, so that future errors in overrides will be detected at compile time, rather than later on in static analysis.)
(In case anyone is tempted to point out that C++03 doesn't support 'override'... In practice, I'd be adding a macro, rather than the actual "override" identifier, to use our code on older compilers that don't support this feature. So after the identifier was added, I'd run a separate script to replace it with whatever macro we're going to use...)
Thanks in advance...
There is a tool under development by the LLVM project called "cpp11-migrate" which currently has the following features:
convert loops to range-based for loops
convert null pointer constants (like NULL or 0) to C++11 nullptr
replace the type specifier in variable declarations with the auto type specifier
add the override specifier to applicable member functions
This tool is documented here and should be released as part of clang 3.3.
However, you can download the source and build it yourself today.
Edit
Some more info:
Status of the C++11 Migrator - a blog post, dated 2013-04-15
cpp11-migrate User’s Manual
Edit 2: 2013-09-07
"cpp11-migrate" has been renamed to "clang-modernize". For windows users, it is now included in the new LLVM Snapshot Builds.
Edit 3: 2020-10-07
"clang-modernize" has bee renamed to "Clang-Tidy".
Our DMS Software Reengineering Toolkit with its C++11-capable C++ Front End can do this.
DMS is a general purpose program transformation system for arbitrary programming languages; the C++ front end allows it to process C++. DMS parses, builds ASTs and symbol tables that are accurate (this is hard to do for C++), provides support for querying properties of the AST nodes and trees, allows procedural and source-to-source transformations on the tree. After all changes are made, the modified tree can be regenerated with comments retained.
Your problem requires that you find derived virtual methods and change them. A DMS source-to-source transformation rule to do that would look something like:
source domain Cpp. -- tells DMS the following rules are for C++

rule insert_virtual_keyword (n:identifier, a: arguments, s: statements):
    method_declaration -> method_declaration =
    " void \n(\a) { \s } " -> " virtual void \n(\a) { \s }"
    if is_implicitly_virtual(n).
Such rules match against the syntax trees, so they can't mismatch to a comment, string, or whatever. The funny quotes are not C++ string quotes; they are meta-quotes to allow the rule language to know that what is inside them has to be treated as target language ("Cpp") syntax. The backslashes are escapes from the target language text, allowing matches to arbitrary structures e.g., \a indicates a need for an "a", which is defined to be the syntactic category "arguments".
You'd need more rules to handle cases where the function returns a non-void result, etc. but you shouldn't need a lot of them.
The fun part is implementing the predicate (returning TRUE or FALSE) controlling application of the transformation: is_implicitly_virtual. This predicate takes (an abstract syntax tree for) the method name n.
This predicate would consult the full C++ symbol table to determine what n really is. We already know it is a method from just its syntactic setting, but we want to know in what class context.
The symbol table provides the linkage between the method and class, and the symbol table information for the class tells us what the class inherits from, and for those classes, which methods they contain and how they are declared, eventually leading to the discovery (or not) that the parent class method is virtual. The code to do this has to be implemented as procedural code going against the C++ symbol table API. However, all the hard work is done; the symbol table is correct and contains references to all the other data needed. (If you don't have this information, you can't possibly decide algorithmically, and any code changes will likely be erroneous).
DMS has been used to carry out massive changes on C++ code in the past using program transformations. (Check the Papers page at the web site for C++ rearchitecting topics.)
(I'm not a C++ expert, merely the DMS architect, so if I have minor detail wrong, please forgive.)
I did something like this a few months ago with about 3 MB of code, and while you say that "doing it manually would be a complete non-starter," I think it is the only way. The reason is that you should be applying the override keyword to the prototypes that are intended to override base class methods, whereas any tool that adds it will put it on the prototypes that actually do override base class methods. The compiler already knows which methods those are, so adding the keyword there doesn't change anything. (Please note that I am not terribly familiar with the new standard and I am assuming the override keyword is optional. Visual Studio has supported override since at least VS2005.)
I used a search for "virtual" in the header files to find most of them and I still occasionally find another prototype that is missing the override keyword.
I found two bugs by going through that.
Eclipse CDT has a working C++ parser and semantic utilities. The latest version IIRC also has markers for overriding methods.
It wouldn't require much code to write a plug-in that builds on that and rewrites the code to contain the override tags where appropriate.
One option is to enable the suggest-override compiler warning (GCC's -Wsuggest-override) and then write a script that inserts the override keyword at the locations pointed to by the emitted warnings.