I can't see my error here. This rule parses some input fine, but not the last two samples. Could somebody please give me a hint?
The goal is a parser that can identify member property access and member function calls, including chained forms:
a()
a(para)
x.a()
x.a(para)
x.a(para).g(para).j()
x.y
x.y.z
x.y.z() <---fail
y.z.z(para) <--- fail
lvalue =
iter_pos >> name[_val = _1]
>> *(lit('(') > paralistopt > lit(')') >> iter_pos)[_val = construct<common_node>(type_cmd_fnc_call, LOCATION_NODE_ITER(_val, _2), key_this, construct<common_node>(_val), key_parameter, construct<std::vector<common_node> >(_1))]
>> *(lit('.') >> name_pure >> lit('(') > paralistopt > lit(')') >> iter_pos)[_val = construct<common_node>(type_cmd_fnc_call, LOCATION_NODE_ITER(_val, _3), key_this, construct<common_node>(_val), key_callname, construct<std::wstring>(_1), key_parameter, construct<std::vector<common_node> >(_2))]
>> *(lit('.') >> name_pure >> iter_pos)[_val = construct<common_node>(type_cmd_dot_call, LOCATION_NODE_ITER(_val, _2), key_this, construct<common_node>(_val), key_propname, construct<std::wstring>(_1))]
;
thank you
Markus
You provide very little information to go on. Let me humor you with my entry into this guessing game:
Let's assume you want to parse a simple "language" that merely allows member expressions and function invocations, but chained.
Now, your grammar says nothing about the parameters (though it's clear the param list can be empty), so let me go the next mile and assume that you want to accept the same kind of expressions there (so foo(a) is okay, but also bar(foo(a)) or bar(b.foo(a))).
Since you accept chaining of function calls, it appears that functions are first-class objects (and functions can return functions), so foo(a)(b, c, d) should be accepted as well.
You didn't mention it, but parameters often include literals (sqrt(9) comes to mind, or println("hello world")).
Other items:
you didn't say but likely you want to ignore whitespace in certain spots
from the iter_pos (ab)use it seems you're interested in tracking the original source location inside the resulting AST.
1. Define An AST
We should keep it simple as ever:
namespace Ast {
using Identifier = boost::iterator_range<It>;
struct MemberExpression;
struct FunctionCall;
using Expression = boost::variant<
double, // some literal types
std::string,
// non-literals
Identifier,
boost::recursive_wrapper<MemberExpression>,
boost::recursive_wrapper<FunctionCall>
>;
struct MemberExpression {
Expression object; // antecedent
Identifier member; // function or field
};
using Parameter = Expression;
using Parameters = std::vector<Parameter>;
struct FunctionCall {
Expression function; // could be a member function
Parameters parameters;
};
}
NOTE We're not going to focus on showing source locations, but already made one provision, storing identifiers as an iterator-range.
NOTE Fusion-adapting the only types not directly supported by Spirit:
BOOST_FUSION_ADAPT_STRUCT(Ast::MemberExpression, object, member)
BOOST_FUSION_ADAPT_STRUCT(Ast::FunctionCall, function, parameters)
We will find that we don't use these, because Semantic Actions are more convenient here.
2. A Matching Grammar
Grammar() : Grammar::base_type(start) {
using namespace qi;
start = skip(space) [expression];
identifier = raw [ (alpha|'_') >> *(alnum|'_') ];
parameters = -(expression % ',');
expression
= literal
| identifier >> *(
('.' >> identifier)
| ('(' >> parameters >> ')')
);
literal = double_ | string_;
string_ = '"' >> *('\\' >> char_ | ~char_('"')) >> '"';
BOOST_SPIRIT_DEBUG_NODES(
(identifier)(start)(parameters)(expression)(literal)(string_)
);
}
In this skeleton most rules benefit from automatic attribute propagation. The one that doesn't is expression:
qi::rule<It, Expression()> start;
using Skipper = qi::space_type;
qi::rule<It, Expression(), Skipper> expression, literal;
qi::rule<It, Parameters(), Skipper> parameters;
// lexemes
qi::rule<It, Identifier()> identifier;
qi::rule<It, std::string()> string_;
So, let's create some helpers for the semantic actions.
NOTE An important take-away here is to create your own higher-level building blocks instead of toiling away with boost::phoenix::construct<> etc.
Define two simple construction functions:
struct mme_f { MemberExpression operator()(Expression lhs, Identifier rhs) const { return { lhs, rhs }; } };
struct mfc_f { FunctionCall operator()(Expression f, Parameters params) const { return { f, params }; } };
phx::function<mme_f> make_member_expression;
phx::function<mfc_f> make_function_call;
Then use them:
expression
= literal [_val=_1]
| identifier [_val=_1] >> *(
('.' >> identifier) [ _val = make_member_expression(_val, _1)]
| ('(' >> parameters >> ')') [ _val = make_function_call(_val, _1) ]
);
That's all. We're ready to roll!
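To make the folding step concrete without any Spirit machinery, here is a minimal hand-written sketch of the same idea: start from an identifier, then let every `.name` or `(...)` suffix wrap whatever has been built so far. Everything in it (the `sketch` namespace, `Parser`, `roundtrip`, the `std::variant`-based AST) is a hypothetical stand-in for illustration, not the Qi grammar above, and literals are left out for brevity.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <string_view>
#include <variant>
#include <vector>

namespace sketch { // hypothetical names, standard library only

struct Member;
struct Call;
using Expr = std::variant<std::string, std::shared_ptr<Member>, std::shared_ptr<Call>>;
struct Member { Expr object; std::string member; };
struct Call   { Expr function; std::vector<Expr> args; };

std::string to_string(Expr const& e);
struct Printer {
    std::string operator()(std::string const& id) const { return id; }
    std::string operator()(std::shared_ptr<Member> const& m) const {
        return to_string(m->object) + "." + m->member;
    }
    std::string operator()(std::shared_ptr<Call> const& c) const {
        std::string out = to_string(c->function) + "(";
        for (std::size_t i = 0; i < c->args.size(); ++i) {
            if (i) out += ", ";
            out += to_string(c->args[i]);
        }
        return out + ")";
    }
};
std::string to_string(Expr const& e) { return std::visit(Printer{}, e); }

class Parser {
public:
    explicit Parser(std::string_view text) : text_(text) {}

    // expression = identifier >> *( '.' identifier | '(' parameters ')' )
    std::optional<Expr> expression() {
        auto name = identifier();
        if (!name) return std::nullopt;
        Expr value = *name;                      // plays the role of [_val = _1]
        for (;;) {
            if (eat('.')) {                      // member access wraps the value so far
                auto member = identifier();
                if (!member) return std::nullopt;
                value = std::make_shared<Member>(Member{value, *member});
            } else if (eat('(')) {               // a call wraps the value so far
                auto args = parameters();
                if (!args || !eat(')')) return std::nullopt;
                value = std::make_shared<Call>(Call{value, *args});
            } else {
                return value;
            }
        }
    }
    bool at_end() { skip(); return pos_ == text_.size(); }

private:
    // parameters = -(expression % ',')
    std::optional<std::vector<Expr>> parameters() {
        std::vector<Expr> args;
        skip();
        if (pos_ < text_.size() && text_[pos_] == ')') return args; // empty list
        do {
            auto arg = expression();
            if (!arg) return std::nullopt;
            args.push_back(*arg);
        } while (eat(','));
        return args;
    }
    std::optional<std::string> identifier() {
        skip();
        std::size_t start = pos_;
        if (pos_ < text_.size() && (alpha(text_[pos_]) || text_[pos_] == '_')) {
            ++pos_;
            while (pos_ < text_.size() && (alnum(text_[pos_]) || text_[pos_] == '_')) ++pos_;
        }
        if (pos_ == start) return std::nullopt;
        return std::string(text_.substr(start, pos_ - start));
    }
    bool eat(char c) {
        skip();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    void skip() { while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_; }
    static bool alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
    static bool alnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }

    std::string_view text_;
    std::size_t pos_ = 0;
};

// Parse the whole input and print the resulting tree back; empty on failure.
std::optional<std::string> roundtrip(std::string_view input) {
    Parser parser(input);
    auto expr = parser.expression();
    if (!expr || !parser.at_end()) return std::nullopt;
    return to_string(*expr);
}

} // namespace sketch
```

Because each suffix wraps the previous value, the outermost node of `x.y.z()` is the call and its `function` is the member access `x.y.z`, which is the shape the two failing samples need.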
3. DEMO
Live On Coliru
I created a test bed looking like this:
int main() {
using It = std::string::const_iterator;
Parser::Grammar<It> const g;
for (std::string const input : {
"a()", "a(para)", "x.a()", "x.a(para)", "x.a(para).g(para).j()", "x.y", "x.y.z",
"x.y.z()",
"y.z.z(para)",
// now let's add some funkyness that you didn't mention
"bar(foo(a))",
"bar(b.foo(a))",
"foo(a)(b, c, d)", // first class functions
"sqrt(9)",
"println(\"hello world\")",
"allocate(strlen(\"aaaaa\"))",
"3.14",
"object.rotate(180)",
"object.rotate(event.getAngle(), \"torque\")",
"app.mainwindow().find_child(\"InputBox\").font().size(12)",
"app.mainwindow().find_child(\"InputBox\").font(config().preferences.baseFont(style.PROPORTIONAL))"
}) {
std::cout << " =========== '" << input << "' ========================\n";
It f(input.begin()), l(input.end());
Ast::Expression parsed;
bool ok = parse(f, l, g, parsed);
if (ok) {
std::cout << "Parsed: " << parsed << "\n";
}
else
std::cout << "Parse failed\n";
if (f != l)
std::cout << "Remaining unparsed input: '" << std::string(f, l) << "'\n";
}
}
Incredible as it may appear, this already parses all the test cases and prints:
=========== 'a()' ========================
Parsed: a()
=========== 'a(para)' ========================
Parsed: a(para)
=========== 'x.a()' ========================
Parsed: x.a()
=========== 'x.a(para)' ========================
Parsed: x.a(para)
=========== 'x.a(para).g(para).j()' ========================
Parsed: x.a(para).g(para).j()
=========== 'x.y' ========================
Parsed: x.y
=========== 'x.y.z' ========================
Parsed: x.y.z
=========== 'x.y.z()' ========================
Parsed: x.y.z()
=========== 'y.z.z(para)' ========================
Parsed: y.z.z(para)
=========== 'bar(foo(a))' ========================
Parsed: bar(foo(a))
=========== 'bar(b.foo(a))' ========================
Parsed: bar(b.foo(a))
=========== 'foo(a)(b, c, d)' ========================
Parsed: foo(a)(b, c, d)
=========== 'sqrt(9)' ========================
Parsed: sqrt(9)
=========== 'println("hello world")' ========================
Parsed: println(hello world)
=========== 'allocate(strlen("aaaaa"))' ========================
Parsed: allocate(strlen(aaaaa))
=========== '3.14' ========================
Parsed: 3.14
=========== 'object.rotate(180)' ========================
Parsed: object.rotate(180)
=========== 'object.rotate(event.getAngle(), "torque")' ========================
Parsed: object.rotate(event.getAngle(), torque)
=========== 'app.mainwindow().find_child("InputBox").font().size(12)' ========================
Parsed: app.mainwindow().find_child(InputBox).font().size(12)
=========== 'app.mainwindow().find_child("InputBox").font(config().preferences.baseFont(style.PROPORTIONAL))' ========================
Parsed: app.mainwindow().find_child(InputBox).font(config().preferences.baseFont(style.PROPORTIONAL))
4. Too Good To Be True?
You're right. I cheated. I didn't show you this code required to debug print the parsed AST:
namespace Ast {
static inline std::ostream& operator<<(std::ostream& os, MemberExpression const& me) {
return os << me.object << "." << me.member;
}
static inline std::ostream& operator<<(std::ostream& os, FunctionCall const& fc) {
os << fc.function << "(";
bool first = true;
for (auto& p : fc.parameters) { if (!first) os << ", "; first = false; os << p; }
return os << ")";
}
}
It's only debug printing, as string literals aren't correctly roundtripped. But it's only 10 lines of code, that's a bonus.
5. The Full Monty: Source Locations
This had your interest, so let's show it working. Let's add a simple loop to print all locations of identifiers:
using IOManip::showpos;
for (auto& id : all_identifiers(parsed)) {
std::cout << " - " << id << " at " << showpos(id, input) << "\n";
}
Of course, this raises the question: what are showpos and all_identifiers?
namespace IOManip {
struct showpos_t {
boost::iterator_range<It> fragment;
std::string const& source;
friend std::ostream& operator<<(std::ostream& os, showpos_t const& manip) {
auto ofs = [&](It it) { return it - manip.source.begin(); };
return os << "[" << ofs(manip.fragment.begin()) << ".." << ofs(manip.fragment.end()) << ")";
}
};
showpos_t showpos(boost::iterator_range<It> fragment, std::string const& source) {
return {fragment, source};
}
}
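The offset arithmetic is the whole trick here: a fragment that still points into the original buffer can be turned back into a position by subtraction. A small standalone sketch of that idea, using `std::string_view` instead of `boost::iterator_range` (the names `offsets` and `show` are made up for this illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

namespace sketch { // hypothetical helpers, not part of the answer's code

// Half-open [begin, end) offsets of `fragment` within `source`.
// Assumes `fragment` is a view into `source`'s own character buffer.
std::pair<std::size_t, std::size_t> offsets(std::string_view fragment, std::string const& source) {
    std::size_t begin = static_cast<std::size_t>(fragment.data() - source.data());
    return {begin, begin + fragment.size()};
}

// Formats the offsets in the same "[begin..end)" style as the manipulator above.
std::string show(std::string_view fragment, std::string const& source) {
    auto [begin, end] = offsets(fragment, source);
    return "[" + std::to_string(begin) + ".." + std::to_string(end) + ")";
}

} // namespace sketch
```

This only works as long as the source string outlives the fragments and is not modified, which is also the condition for storing identifiers as iterator ranges in the AST.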
As for the identifier extraction:
std::vector<Identifier> all_identifiers(Expression const& expr) {
std::vector<Identifier> result;
struct Harvest {
using result_type = void;
std::back_insert_iterator<std::vector<Identifier> > out;
void operator()(Identifier const& id) { *out++ = id; }
void operator()(MemberExpression const& me) { apply_visitor(*this, me.object); *out++ = me.member; }
void operator()(FunctionCall const& fc) {
apply_visitor(*this, fc.function);
for (auto& p : fc.parameters) apply_visitor(*this, p);
}
// non-identifier expressions
void operator()(std::string const&) { }
void operator()(double) { }
} harvest { back_inserter(result) };
boost::apply_visitor(harvest, expr);
return result;
}
That's a tree visitor that harvests all identifiers recursively, inserting them into the back of a container.
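For readers more used to `std::variant` than `boost::variant`, the same harvesting pattern can be sketched with `std::visit`. This is an analogue under my own assumptions, not the code from the demo: the AST types, `sample()` and the use of `std::shared_ptr` for recursion are all hypothetical, and names are collected as plain strings rather than iterator ranges.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <variant>
#include <vector>

namespace sketch { // hypothetical std::variant analogue of the AST

struct Identifier { std::string name; };
struct Member;
struct Call;
using Expr = std::variant<double, Identifier, std::shared_ptr<Member>, std::shared_ptr<Call>>;
struct Member { Expr object; Identifier member; };
struct Call   { Expr function; std::vector<Expr> args; };

// Visitor that walks the tree and appends every identifier it meets.
struct Harvest {
    std::vector<std::string>& out;
    void operator()(double) const {}                        // literals carry no names
    void operator()(Identifier const& id) const { out.push_back(id.name); }
    void operator()(std::shared_ptr<Member> const& m) const {
        std::visit(*this, m->object);                       // antecedent first
        out.push_back(m->member.name);                      // then the member itself
    }
    void operator()(std::shared_ptr<Call> const& c) const {
        std::visit(*this, c->function);
        for (auto const& arg : c->args) std::visit(*this, arg);
    }
};

std::vector<std::string> all_identifiers(Expr const& expr) {
    std::vector<std::string> result;
    std::visit(Harvest{result}, expr);
    return result;
}

// Hand-built tree for: object.rotate(event.getAngle(), 180)
Expr sample() {
    Expr get_angle = std::make_shared<Call>(Call{
        std::make_shared<Member>(Member{Identifier{"event"}, Identifier{"getAngle"}}), {}});
    Expr rotate = std::make_shared<Member>(Member{Identifier{"object"}, Identifier{"rotate"}});
    return std::make_shared<Call>(Call{rotate, {get_angle, 180.0}});
}

} // namespace sketch
```

The identifiers come out in source order because each overload visits the left-hand part of a node before its own name or arguments.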
Live On Coliru
Where output looks like (excerpt):
=========== 'app.mainwindow().find_child("InputBox").font(config().preferences.baseFont(style.PROPORTIONAL))' ========================
Parsed: app.mainwindow().find_child(InputBox).font(config().preferences.baseFont(style.PROPORTIONAL))
- app at [0..3)
- mainwindow at [4..14)
- find_child at [17..27)
- font at [40..44)
- config at [45..51)
- preferences at [54..65)
- baseFont at [66..74)
- style at [75..80)
- PROPORTIONAL at [81..93)
Try changing
>> *(lit('.') >> name_pure >> lit('(') > paralistopt > lit(')'))
to
>> *(*(lit('.') >> name_pure) >> lit('(') > paralistopt > lit(')'))
Related
I have a quite simple grammar I try to implement using boost spirit x3, without success.
It does not compile, and due to all the templates and complex concepts used in the library (I know, it is rather a "header"), the compilation error message is way too long to be intelligible.
I tried to comment out parts of the code in order to narrow down the culprit, without success, as it comes down to several parts in which I don't see any error anyway.
Edit2: the first error message is indeed in push_front_impl.hpp, highlighting that:
::REQUESTED_PUSH_FRONT_SPECIALISATION_FOR_SEQUENCE_DOES_NOT_EXIST::*
I suspect the keyword auto or maybe the p2 statement with ulong_long...but with no faith.
Need the help of you guys...spirit's elites !
Below a minimal code snippet reproducing the compilation error.
Edit: using boost 1.70 and visual studio 2019 v16.1.6
#include <string>
#include <iostream>
#include "boost/spirit/home/x3.hpp"
#include "boost/spirit/include/support_istream_iterator.hpp"
int main(void)
{
std::string input = \
"\"nodes\":{ {\"type\":\"bb\", \"id\" : 123456567, \"label\" : \"0x12023049\"}," \
"{\"type\":\"bb\", \"id\" : 123123123, \"label\" : \"0x01223234\"}," \
"{\"type\":\"ib\", \"id\" : 223092343, \"label\" : \"0x03020343\"}}";
std::istringstream iss(input);
namespace x3 = boost::spirit::x3;
using x3::char_;
using x3::ulong_long;
using x3::lit;
auto q = lit('\"'); /* q => quote */
auto p1 = q >> lit("type") >> q >> lit(':') >> q >> (lit("bb") | lit("ib")) >> q;
auto p2 = q >> lit("id") >> q >> lit(':') >> ulong_long;
auto p3 = q >> lit("label") >> q >> lit(':') >> q >> (+x3::alpha) >> q;
auto node = lit('{') >> p1 >> lit(',') >> p2 >> lit(',') >> p3 >> lit('}');
auto nodes = q >> lit("nodes") >> q >> lit(':') >> lit('{') >> node % lit(',') >> lit('}');
boost::spirit::istream_iterator f(iss >> std::noskipws), l{};
bool b = x3::phrase_parse(f, l, nodes, x3::space);
return 0;
}
It is a known MPL limitation (Issue with X3 and MS VS2017, https://github.com/boostorg/spirit/issues/515) plus a bug/difference of implementation for MSVC/ICC compilers (https://github.com/boostorg/mpl/issues/43).
I rewrote the offending part without using MPL (https://github.com/boostorg/spirit/pull/607); it will be released in Boost 1.74. Until then you should be able to work around it with:
#define BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS
#define BOOST_MPL_LIMIT_VECTOR_SIZE 50
Alternatively you could wrap different parts of your grammar into rules, which will shorten the sequence parser chain.
Note that q >> lit("x") >> q >> lit(':') >> ... is probably not what you really want: with a skipper it will also allow " x ": to be parsed. If you do not want that, simply use lit("\"x\"") >> lit(':') >> ...
There's a chance that there might be a missing indirect include for your specific platform/version (if I had to guess it might be caused by using the istream iterator support header from Qi).
If that's not the issue, my attention is drawn to the where T = boost::mpl::aux::vector_tag<20> (/HT #Rup): the number 20 looks suspiciously like it might be some kind of limit.
We could try to find what trips the limit and see whether it can be raised, but I'll take the "unscientific" approach in the interest of helping you along with the parser.
Simplifying The Expressions
I see a lot (lot) of lit() nodes in your parser expressions that you don't need. I suspect all the quoted constructs need to be lexemes, and instead of painstakingly repeating the quote symbol all over the place, perhaps package it as follows:
auto q = [](auto p) { return x3::lexeme['"' >> x3::as_parser(p) >> '"']; };
auto type = q("type") >> ':' >> q(bb_ib);
auto id = q("id") >> ':' >> x3::ulong_long;
auto label = q("label") >> ':' >> q(+x3::alnum);
Notes:
I improved the naming so it's more natural to read:
auto node = '{' >> type >> ',' >> id >> ',' >> label >> '}';
I changed alpha to alnum so it would actually match your sample input
Hypothesis: The expressions are structurally simplified to be more hierarchical - the sequences consist of fewer >>-ed terms - the hope is that this removes a potential mpl::vector size limit.
There's one missing piece, bb_ib that I left out because it changes when you want to actually assign parsed values to attributes. Let's do that:
Attributes
struct Node {
enum Type { bb, ib } type;
uint64_t id;
std::string label;
};
As you can see I opted for an enum to represent type. The most natural way to parse that would be using symbols<>
struct bb_ib_sym : x3::symbols<Node::Type> {
bb_ib_sym() { this->add("bb", Node::bb)("ib", Node::ib); }
} bb_ib;
Now you can parse into a vector of Node:
Demo
Live On Coliru
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
struct Node {
enum Type { bb, ib } type;
uint64_t id;
std::string label;
};
namespace { // debug output
inline std::ostream& operator<<(std::ostream& os, Node::Type t) {
switch (t) {
case Node::bb: return os << "bb";
case Node::ib: return os << "ib";
}
return os << "?";
}
inline std::ostream& operator<<(std::ostream& os, Node const& n) {
return os << "Node{" << n.type << ", " << n.id << ", " << std::quoted(n.label) << "}";
}
}
// attribute propagation
BOOST_FUSION_ADAPT_STRUCT(Node, type, id, label)
int main() {
std::string input = R"("nodes": {
{
"type": "bb",
"id": 123456567,
"label": "0x12023049"
},
{
"type": "bb",
"id": 123123123,
"label": "0x01223234"
},
{
"type": "ib",
"id": 223092343,
"label": "0x03020343"
}
})";
namespace x3 = boost::spirit::x3;
struct bb_ib_sym : x3::symbols<Node::Type> {
bb_ib_sym() { this->add("bb", Node::bb)("ib", Node::ib); }
} bb_ib;
auto q = [](auto p) { return x3::lexeme['"' >> x3::as_parser(p) >> '"']; };
auto type = q("type") >> ':' >> q(bb_ib);
auto id = q("id") >> ':' >> x3::ulong_long;
auto label = q("label") >> ':' >> q(+x3::alnum);
auto node
= x3::rule<Node, Node> {"node"}
= '{' >> type >> ',' >> id >> ',' >> label >> '}';
auto nodes = q("nodes") >> ':' >> '{' >> node % ',' >> '}';
std::vector<Node> parsed;
auto f = begin(input);
auto l = end(input);
if (x3::phrase_parse(f, l, nodes, x3::space, parsed)) {
for (Node& node : parsed) {
std::cout << node << "\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f!=l) {
std::cout << "Remaining input: " << std::quoted(std::string(f, l)) << "\n";
}
}
Prints
Node{bb, 123456567, "0x12023049"}
Node{bb, 123123123, "0x01223234"}
Node{ib, 223092343, "0x03020343"}
This small sample grammar just parses these statements:
a <--special (but ok because rule in grammar)
a()
a.b <--special
a.b()
a.b().c <--special
a().b.c()
a().b <--special
All cases without () at the end are special and should be separate rules in Spirit. Only the rule marked "special case 1" is correct so far. How do I define a rule that captures all the other cases without () at the end?
lvalue_statement =
(
name >> +(
(lit('(') >> paralistopt >> lit(')')) [_normal_action_call]
| (lit('.') >> name) [_normal_action_dot]
)
| name [_special_action_] // special case 1
)
Another sample to explain what "special" means: you can see that the ROOT node should get the special AST node or action.
a.b -> SPECIAL_DOT(a,b)
a.b.c -> SPECIAL_DOT(a,NORMAL_DOT(b,c))
a(para).b.c -> SPECIAL_DOT(NORMAL_DOT(CALL(a,para),c)
I'm quite averse to so many semantic actions¹.
I also think that's not your problem.
In language terms, you'd expect a.b to be member dereference, a() to be invocation, and hence a.b() would be invocation of a.b after the member dereference.
In that sense, a.b is the normal case, because it doesn't do invocation. a.b() would be "more special" in the sense that it is the same PLUS invocation.
I'd phrase my expression grammar to reflect this:
lvalue = name >> *(
'.' >> name
| '(' >> paralistopt >> ')'
);
This parses everything. Now you might go with semantic actions or attribute propagation
Semantic Actions #1
auto lvalue = name [ action("normal") ] >> *(
'.' >> name [ action("member_access") ]
| ('(' >> paralistopt >> ')') [ action("call") ]
);
There you go. Let's come up with a generic action that logs stuff:
auto action = [](auto type) {
return [=](auto& ctx){
auto& attr = _attr(ctx);
using A = std::decay_t<decltype(attr)>;
std::cout << type << ":";
if constexpr(boost::fusion::traits::is_sequence<A>::value) {
std::cout << boost::fusion::as_vector(attr);
} else if constexpr(x3::traits::is_container<A>::value && not std::is_same_v<std::string, A>) {
std::string_view sep;
std::cout << "{";
for (auto& el : attr) { std::cout << sep << el; sep = ", "; }
std::cout << "}";
} else {
std::cout << attr;
}
std::cout << "\n";
};
};
Now we can parse all the samples (plus a few more):
Live On Coliru prints:
=== "a"
normal:a
Ok
=== "a()"
normal:a
call:{}
Ok
=== "a.b"
normal:a
member_access:b
Ok
=== "a.b()"
normal:a
member_access:b
call:{}
Ok
=== "a.b().c"
normal:a
member_access:b
call:{}
member_access:c
Ok
=== "a().b.c()"
normal:a
call:{}
member_access:b
member_access:c
call:{}
Ok
=== "a().b.c()"
normal:a
call:{}
member_access:b
member_access:c
call:{}
Ok
=== "a(q,r,s).b"
normal:a
call:{q, r, s}
member_access:b
Ok
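If it helps to see where each of those log lines comes from, the same event stream can be reproduced by a small hand-written recognizer for `name >> *('.' >> name | '(' >> -(name % ',') >> ')')`. This is a sketch with made-up names (`Tracer`, `trace`), standard library only. One deliberate difference from the X3 actions: it only returns the events when the whole input matched.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

namespace sketch { // hypothetical names for illustration

class Tracer {
public:
    explicit Tracer(std::string_view text) : text_(text) {}

    // Returns one event per matched grammar element, or nothing on failure.
    std::optional<std::vector<std::string>> run() {
        std::vector<std::string> events;
        auto first = name();
        if (!first) return std::nullopt;
        events.push_back("normal:" + *first);
        for (;;) {
            if (eat('.')) {
                auto member = name();
                if (!member) return std::nullopt;
                events.push_back("member_access:" + *member);
            } else if (eat('(')) {
                std::string args;                 // optional comma-separated names
                if (auto arg = name()) {
                    args = *arg;
                    while (eat(',')) {
                        auto next = name();
                        if (!next) return std::nullopt;
                        args += ", " + *next;
                    }
                }
                if (!eat(')')) return std::nullopt;
                events.push_back("call:{" + args + "}");
            } else {
                break;
            }
        }
        skip();
        if (pos_ != text_.size()) return std::nullopt; // trailing garbage
        return events;
    }

private:
    // name = alpha >> *(alnum | '_')
    std::optional<std::string> name() {
        skip();
        std::size_t start = pos_;
        if (pos_ < text_.size() && std::isalpha(static_cast<unsigned char>(text_[pos_]))) {
            ++pos_;
            while (pos_ < text_.size() &&
                   (std::isalnum(static_cast<unsigned char>(text_[pos_])) || text_[pos_] == '_'))
                ++pos_;
        }
        if (pos_ == start) return std::nullopt;
        return std::string(text_.substr(start, pos_ - start));
    }
    bool eat(char c) {
        skip();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    void skip() { while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_; }

    std::string_view text_;
    std::size_t pos_ = 0;
};

std::optional<std::vector<std::string>> trace(std::string_view input) {
    return Tracer(input).run();
}

} // namespace sketch
```

Note that no event depends on what comes later in the input: `a.b` and `a.b()` share the same first two events, and the call is just one more step in the chain.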
SA #2: Building an AST
Let's model the AST:
namespace Ast {
using name = std::string;
using params = std::vector<name>;
struct member_access;
struct call;
using lvalue = boost::variant<
name,
boost::recursive_wrapper<member_access>,
boost::recursive_wrapper<call>
>;
struct member_access { lvalue obj; name member; } ;
struct call { lvalue f; params args; } ;
}
Now we can replace the actions:
auto lvalue
= rule<struct lvalue_, Ast::lvalue> {"lvalue"}
= name [ ([](auto& ctx){ _val(ctx) = _attr(ctx); }) ] >> *(
'.' >> name [ ([](auto& ctx){ _val(ctx) = Ast::member_access{ _val(ctx), _attr(ctx) }; }) ]
| ('(' >> paralistopt >> ')') [ ([](auto& ctx){ _val(ctx) = Ast::call{ _val(ctx), _attr(ctx) }; }) ]
);
That's ugly - I don't recommend writing your code this way, but at least it drives home how few steps are involved.
Also adding some output operators:
namespace Ast { // debug output
static inline std::ostream& operator<<(std::ostream& os, Ast::member_access const& ma) {
return os << ma.obj << "." << ma.member;
}
static inline std::ostream& operator<<(std::ostream& os, Ast::call const& c) {
std::string_view sep;
os << c.f << "(";
for (auto& arg: c.args) { os << sep << arg; sep = ", "; }
return os << ")";
}
}
Now we can parse everything into a full AST: Live On Coliru, printing:
"a" -> a
"a()" -> a()
"a.b" -> a.b
"a.b()" -> a.b()
"a.b().c" -> a.b().c
"a().b.c()" -> a().b.c()
"a().b" -> a().b
"a(q,r,s).b" -> a(q, r, s).b
Automatic Propagation
Actually I sort of got stranded doing this. It took me too long to get it right and parse the associativity in a useful way, so I stopped trying. Let's instead summarize by cleaning up our second SA take:
Summary
Making the actions more readable:
auto passthrough =
[](auto& ctx) { _val(ctx) = _attr(ctx); };
template <typename T> auto binary_ =
[](auto& ctx) { _val(ctx) = T { _val(ctx), _attr(ctx) }; };
auto lvalue
= rule<struct lvalue_, Ast::lvalue> {"lvalue"}
= name [ passthrough ] >> *(
'.' >> name [ binary_<Ast::member_access> ]
| ('(' >> paralistopt >> ')') [ binary_<Ast::call> ]
);
Now there are a number of issues left:
You might want a more general expression grammar that doesn't just parse lvalue expressions (e.g. f(foo, 42) should probably parse, as should len("foo") + 17?).
To that end, the lvalue/rvalue distinction doesn't belong in the grammar: it's a semantic distinction mostly.
I happen to have created an extended parser that does all that, plus evaluation against proper LValues (while supporting general values). I'd suggest looking at the extended chat at this answer and the resulting code on GitHub: https://github.com/sehe/qi-extended-parser-evaluator .
Full Listing
Live On Coliru
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
namespace x3 = boost::spirit::x3;
namespace Ast {
using name = std::string;
using params = std::vector<name>;
struct member_access;
struct call;
using lvalue = boost::variant<
name,
boost::recursive_wrapper<member_access>,
boost::recursive_wrapper<call>
>;
struct member_access { lvalue obj; name member; } ;
struct call { lvalue f; params args; } ;
}
namespace Ast { // debug output
static inline std::ostream& operator<<(std::ostream& os, Ast::member_access const& ma) {
return os << ma.obj << "." << ma.member;
}
static inline std::ostream& operator<<(std::ostream& os, Ast::call const& c) {
std::string_view sep;
os << c.f << "(";
for (auto& arg: c.args) { os << sep << arg; sep = ", "; }
return os << ")";
}
}
namespace Parser {
using namespace x3;
auto name
= rule<struct string_, Ast::name> {"name"}
= lexeme[alpha >> *(alnum|char_("_"))];
auto paralistopt
= rule<struct params_, Ast::params> {"params"}
= -(name % ',');
auto passthrough =
[](auto& ctx) { _val(ctx) = _attr(ctx); };
template <typename T> auto binary_ =
[](auto& ctx) { _val(ctx) = T { _val(ctx), _attr(ctx) }; };
auto lvalue
= rule<struct lvalue_, Ast::lvalue> {"lvalue"}
= name [ passthrough ] >> *(
'.' >> name [ binary_<Ast::member_access> ]
| ('(' >> paralistopt >> ')') [ binary_<Ast::call> ]
);
auto start = skip(space) [ lvalue ];
}
int main() {
for (std::string const input: {
"a", // special (but ok because rule in grammer)
"a()",
"a.b", // special
"a.b()",
"a.b().c", // special
"a().b.c()",
"a().b", // special
"a(q,r,s).b",
})
{
std::cout << std::quoted(input) << " -> ";
auto f = begin(input), l = end(input);
Ast::lvalue parsed;
if (parse(f, l, Parser::start, parsed)) {
std::cout << parsed << "\n";
} else {
std::cout << "Failed\n";
}
if (f!=l) {
std::cout << " -- Remaining: " << std::quoted(std::string(f,l)) << "\n";
}
}
}
Prints
"a" -> a
"a()" -> a()
"a.b" -> a.b
"a.b()" -> a.b()
"a.b().c" -> a.b().c
"a().b.c()" -> a().b.c()
"a().b" -> a().b
"a(q,r,s).b" -> a(q, r, s).b
¹ (they lead to a mess in the presence of backtracking, see Boost Spirit: "Semantic actions are evil"?)
I am trying to model a parser for a subset of the C language, for a school project. However, I seem stuck in the process of generating recursive parsing rules for Boost.Spirit, as my rules either overflow the stack or simply do not pick up anything.
For example, I want to model the following syntax:
a ::= ... | A[a] | a1 op a2 | ...
There are some other subsets of syntax for this expression rule, but those are working without problems. For example, if I were to parse A[3*4], it should be read as a recursive parsing where A[...] (A[a] in the syntax) is the array accessor and 3*4 (a1 op a2 in the syntax) is the index.
I've tried defining the following rule objects in the grammar struct:
qi::rule<Iterator, Type(), Skipper> expr_arr;
qi::rule<Iterator, Type(), Skipper> expr_binary_arith;
qi::rule<Iterator, Type(), Skipper> expr_a;
And giving them the following grammar:
expr_arr %= qi::lexeme[identifier >> qi::omit['[']] >> expr_a >> qi::lexeme[qi::omit[']']];
expr_binary_arith %= expr_a >> op_binary_arith >> expr_a;
expr_a %= (expr_binary_arith | expr_arr);
where "op_binary_arith" is a qi::symbols<> object with the allowed operator symbols.
This compiles fine, but upon execution enters a supposedly endless loop, and the stack overflows. I've tried looking at the answer by Sehe in the following question: How to set max recursion in boost spirit.
However, I have been unsuccessful in setting a max recursion depth. Firstly, I failed to make it compile without errors for almost any of my attempts, but on the last attempt it built successfully, albeit with very unexpected results.
Can someone guide me in the right direction, as to how I should go about implementing this grammar correctly?
PEG grammars do not handle left-recursion well. In general you have to split out helper rules to write without left-recursion.
In your particular case, the goal production
a ::= ... | A[a] | a1 op a2 | ...
seems a little off. This would allow foo[bar] or foo + bar but not foo + bar[qux].
Usually, the choice between array element reference or plain identifier is at a lower level of precedence (often "simple expression").
Here's a tiny elaboration:
literal = number_literal | string_literal; // TODO expand?
expr_arr = identifier >> '[' >> (expr_a % ',') >> ']';
simple_expression = literal | expr_arr | identifier;
expr_binary_arith = simple_expression >> op_binary_arith >> expr_a;
expr_a = expr_binary_arith | simple_expression;
Now you can parse e.g.:
for (std::string const& input : {
"A[3*4]",
"A[F[3]]",
"A[8 + F[0x31]]",
"3 * \"foo\"",
})
{
std::cout << std::quoted(input) << " -> ";
It f=begin(input), l=end(input);
AST::Expr e;
if (parse(f,l,g,e)) {
std::cout << "Parsed: " << e << "\n";
} else {
std::cout << "Failed\n";
}
if (f!=l) {
std::cout << "Remaining: " << std::quoted(std::string(f,l)) << "\n";
}
}
Which prints Live On Coliru
"A[3*4]" -> Parsed: A[3*4]
"A[F[3]]" -> Parsed: A[F[3]]
"A[8 + F[0x31]]" -> Parsed: A[8+F[49]]
"3 * \"foo\"" -> Parsed: 3*"foo"
NOTE I deliberately left efficiency and operator precedence out of the picture for now.
These are talked about in detail in other answers:
Boost::Spirit Expression Parser
Implementing operator precedence with boost spirit
Boost::Spirit : Optimizing an expression parser
And many more
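Stripped of Spirit, the restructuring amounts to this: every rule consumes at least one token before it recurses, so the recursion always makes progress. A minimal hand-written sketch of that shape follows. It is an illustration under simplifying assumptions, not the demo grammar: names like `normalize` are made up, only decimal integers and identifiers are supported, it fails outright instead of backtracking, and (like the rules above) it ignores operator precedence.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

namespace sketch { // hypothetical names, standard library only

class Parser {
public:
    explicit Parser(std::string_view text) : text_(text) {}

    // expr_a = simple_expression >> -(op >> expr_a)
    // The recursion sits on the right: simple() always consumes input first.
    std::optional<std::string> expr() {
        auto lhs = simple();
        if (!lhs) return std::nullopt;
        skip();
        if (pos_ < text_.size() && std::string_view("+-*/").find(text_[pos_]) != std::string_view::npos) {
            char op = text_[pos_++];
            auto rhs = expr();
            if (!rhs) return std::nullopt;
            return *lhs + op + *rhs;
        }
        return lhs;
    }
    bool at_end() { skip(); return pos_ == text_.size(); }

private:
    // simple_expression = number | identifier >> -('[' >> (expr_a % ',') >> ']')
    std::optional<std::string> simple() {
        skip();
        std::size_t start = pos_;
        if (pos_ < text_.size() && digit(text_[pos_])) {
            while (pos_ < text_.size() && digit(text_[pos_])) ++pos_;
            return std::string(text_.substr(start, pos_ - start));
        }
        if (pos_ < text_.size() && alpha(text_[pos_])) {
            while (pos_ < text_.size() && alnum(text_[pos_])) ++pos_;
            std::string out(text_.substr(start, pos_ - start));
            if (!eat('[')) return out;            // plain identifier
            out += '[';
            do {                                  // one or more comma-separated indices
                auto index = expr();
                if (!index) return std::nullopt;
                if (out.back() != '[') out += ',';
                out += *index;
            } while (eat(','));
            if (!eat(']')) return std::nullopt;
            return out + ']';
        }
        return std::nullopt;
    }
    bool eat(char c) {
        skip();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    void skip() { while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_; }
    static bool digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
    static bool alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
    static bool alnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }

    std::string_view text_;
    std::size_t pos_ = 0;
};

// Parses the whole input and returns it with whitespace removed; empty on failure.
std::optional<std::string> normalize(std::string_view input) {
    Parser parser(input);
    auto result = parser.expr();
    if (!result || !parser.at_end()) return std::nullopt;
    return result;
}

} // namespace sketch
```

Had `expr()` called itself before consuming anything, as in the original `expr_binary_arith %= expr_a >> ...`, the first call would recurse forever without advancing, which is the stack overflow from the question.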
Full Demo Listing
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <experimental/iterator>
namespace qi = boost::spirit::qi;
namespace AST {
using Var = std::string;
struct String : std::string {
using std::string::string;
};
using Literal = boost::variant<String, intmax_t, double>;
enum class ArithOp {
addition, subtraction, division, multplication
};
struct IndexExpr;
struct BinOpExpr;
using Expr = boost::variant<
Literal,
Var,
boost::recursive_wrapper<IndexExpr>,
boost::recursive_wrapper<BinOpExpr>
>;
struct IndexExpr {
Expr expr;
std::vector<Expr> indices;
};
struct BinOpExpr {
Expr lhs, rhs;
ArithOp op;
};
std::ostream& operator<<(std::ostream& os, Literal const& lit) {
struct {
std::ostream& os;
void operator()(String const& s) const { os << std::quoted(s); }
void operator()(double d) const { os << d; }
void operator()(intmax_t i) const { os << i; }
} vis {os};
boost::apply_visitor(vis, lit);
return os;
}
std::ostream& operator<<(std::ostream& os, ArithOp const& op) {
switch(op) {
case ArithOp::addition: return os << '+';
case ArithOp::subtraction: return os << '-';
case ArithOp::division: return os << '/';
case ArithOp::multplication: return os << '*';
}
return os << '?';
}
std::ostream& operator<<(std::ostream& os, BinOpExpr const& e) {
return os << e.lhs << e.op << e.rhs;
}
std::ostream& operator<<(std::ostream& os, IndexExpr const& e) {
std::copy(
begin(e.indices),
end(e.indices),
std::experimental::make_ostream_joiner(os << e.expr << '[', ","));
return os << ']';
}
}
BOOST_FUSION_ADAPT_STRUCT(AST::IndexExpr, expr, indices)
BOOST_FUSION_ADAPT_STRUCT(AST::BinOpExpr, lhs, op, rhs)
template <typename Iterator, typename Skipper = qi::space_type>
struct G : qi::grammar<Iterator, AST::Expr()> {
G() : G::base_type(start) {
using namespace qi;
identifier = alpha >> *alnum;
number_literal =
qi::real_parser<double, qi::strict_real_policies<double> >{}
| "0x" >> qi::uint_parser<intmax_t, 16> {}
| qi::int_parser<intmax_t, 10> {}
;
string_literal = '"' >> *('\\' >> char_escape | ~char_('"')) >> '"';
literal = number_literal | string_literal; // TODO expand?
expr_arr = identifier >> '[' >> (expr_a % ',') >> ']';
simple_expression = literal | expr_arr | identifier;
expr_binary_arith = simple_expression >> op_binary_arith >> expr_a;
expr_a = expr_binary_arith | simple_expression;
start = skip(space) [expr_a];
BOOST_SPIRIT_DEBUG_NODES(
(start)
(expr_a)(expr_binary_arith)(simple_expression)(expr_arr)
(literal)(number_literal)(string_literal)
(identifier))
}
private:
struct escape_sym : qi::symbols<char, char> {
escape_sym() {
this->add
("b", '\b')
("f", '\f')
("r", '\r')
("n", '\n')
("t", '\t')
("\\", '\\')
;
}
} char_escape;
struct op_binary_arith_sym : qi::symbols<char, AST::ArithOp> {
op_binary_arith_sym() {
this->add
("+", AST::ArithOp::addition)
("-", AST::ArithOp::subtraction)
("/", AST::ArithOp::division)
("*", AST::ArithOp::multplication)
;
}
} op_binary_arith;
qi::rule<Iterator, AST::Expr()> start;
qi::rule<Iterator, AST::IndexExpr(), Skipper> expr_arr;
qi::rule<Iterator, AST::BinOpExpr(), Skipper> expr_binary_arith;
qi::rule<Iterator, AST::Expr(), Skipper> simple_expression, expr_a;
// implicit lexemes
qi::rule<Iterator, AST::Literal()> literal, string_literal, number_literal;
qi::rule<Iterator, AST::Var()> identifier;
};
int main() {
using It = std::string::const_iterator;
G<It> const g;
for (std::string const& input : {
"A[3*4]",
"A[F[3]]",
"A[8 + F[0x31]]",
"3 * \"foo\"",
})
{
std::cout << std::quoted(input) << " -> ";
It f=begin(input), l=end(input);
AST::Expr e;
if (parse(f,l,g,e)) {
std::cout << "Parsed: " << e << "\n";
} else {
std::cout << "Failed\n";
}
if (f!=l) {
std::cout << "Remaining: " << std::quoted(std::string(f,l)) << "\n";
}
}
}
The following Spirit X3 grammar for a simple robot command language generates compiler errors in Windows Visual Studio 17. For this project, I am required to compile with the warning level set to 4 (/W4) and to treat warnings as errors (/WX).
Warning C4127 conditional expression is constant SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\char\detail\cast_char.hpp 29
Error C2039 'insert': is not a member of 'boost::spirit::x3::unused_type' SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\core\detail\parse_into_container.hpp 259
Error C2039 'end': is not a member of 'boost::spirit::x3::unused_type' SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\core\detail\parse_into_container.hpp 259
Error C2039 'empty': is not a member of 'boost::spirit::x3::unused_type' SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\core\detail\parse_into_container.hpp 254
Error C2039 'begin': is not a member of 'boost::spirit::x3::unused_type' SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\core\detail\parse_into_container.hpp 259
Clearly, something is wrong with my grammar, but the error messages are completely unhelpful. I have found that if I remove the Kleene star in the last line of the grammar (*parameter to just parameter) the errors disappear, but then I get lots of warnings like this:
Warning C4459 declaration of 'digit' hides global declaration SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\support\numeric_utils\detail\extract_int.hpp 174
Warning C4127 conditional expression is constant SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\char\detail\cast_char.hpp 29
#include <string>
#include <vector>
#include <iostream>
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
//
// Grammar for simple command language
//
namespace scl
{
using boost::spirit::x3::char_;
using boost::spirit::x3::double_;
using boost::spirit::x3::int_;
using boost::spirit::x3::lexeme;
using boost::spirit::x3::lit;
using boost::spirit::x3::no_case;
auto valid_identifier_chars = char_ ("a-zA-Z_");
auto quoted_string = '"' >> *(lexeme [~char_ ('"')]) >> '"';
auto keyword_value_chars = char_ ("a-zA-Z0-9$_.");
auto qual = lexeme [!(no_case [lit ("no")]) >> +valid_identifier_chars] >> -('=' >> (quoted_string | int_ | double_ | +keyword_value_chars));
auto neg_qual = lexeme [no_case [lit ("no")] >> +valid_identifier_chars];
auto qualifier = lexeme ['/' >> (qual | neg_qual)];
auto verb = +valid_identifier_chars >> *qualifier;
auto parameter = +keyword_value_chars >> *qualifier;
auto command = verb >> *parameter;
}; // End namespace scl
using namespace std; // Must be after Boost stuff!
int
main ()
{
vector <string> input =
{
"show/out=\"somefile.txt\" motors/all cameras/full",
"start/speed=5 motors arm1 arm2/speed=2.5/track arm3",
"rotate camera1/notrack/axis=y/angle=45"
};
//
// Parse each of the strings in the input vector
//
for (string str : input)
{
auto b = str.begin ();
auto e = str.end ();
cout << "Parsing: " << str << endl;
x3::phrase_parse (b, e, scl::command, x3::space);
if (b != e)
{
cout << "Error, only parsed to position: " << b - str.begin () << endl;
}
} // End for
return 0;
} // End main
There is a regression since Boost 1.65 that causes problems with some rules that potentially propagate into container type attributes.
They dispatch to the wrong overload when instantiated without an actual bound attribute. When this happens there is a "mock" attribute type called unused_type. The errors you are seeing indicate that unused_type is being treated as if it were a concrete attribute type, and clearly that won't fly.
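To make the failure mode concrete, here is a deliberately simplified model of the dispatch problem, written without Boost. The names unused_type, append_generic and append are stand-ins invented for this sketch; they are not Spirit's actual internals, and the real code paths are considerably more involved.

```cpp
#include <cassert>
#include <string>

// Stand-in for the "no attribute bound" placeholder (hypothetical, simplified).
struct unused_type {};

// Generic path: treats the attribute as a container.
// Instantiating this with unused_type would not compile, failing with
// "'insert' is not a member of 'unused_type'", like the errors in the question.
template <typename Container>
void append_generic(Container& attr, char c) {
    attr.insert(attr.end(), c);
}

// Correct dispatch: real containers go through the generic path...
template <typename Container>
void append(Container& attr, char c) {
    append_generic(attr, c);
}

// ...while the placeholder gets its own overload that simply does nothing.
inline void append(unused_type&, char) {}

inline void demo_dispatch() {
    std::string s;
    append(s, 'a');   // stores into a real container
    unused_type u;
    append(u, 'a');   // compiles: the non-template no-op overload is preferred
    assert(s == "a");
}
```

If the no-op overload were missing (or not selected), the call with unused_type would instantiate append_generic and produce exactly the kind of "is not a member of" diagnostics shown above.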
The regression was fixed in https://github.com/boostorg/spirit/commit/ee4943d5891bdae0706fb616b908e3bf528e0dfa
You can see that it's a regression by compiling with Boost 1.64:
Boost 1.64 compiles it fine: GCC and Clang
Boost 1.65 breaks it: GCC and Clang again
Now, latest develop is supposed to fix it, but you can simply copy the patched file, even just the 7-line patch.
All of the above was already available when I linked the duplicate question, "How to make a recursive rule in boost spirit x3 in VS2017", which highlights the same regression.
Review
using namespace std; // Must be after Boost stuff!
Actually, it probably needs to be nowhere unless very locally scoped, where you can see the impact of any potential name collisions.
Consider encapsulating the skipper, since it's likely logically part of your grammar spec, not something to be overridden by the caller.
This is a bug:
auto quoted_string = '"' >> *(lexeme[~char_('"')]) >> '"';
You probably meant to assert the whole literal is lexeme, not individual characters (that's... moot because whitespace would never hit the parser anyways, because of the skipper).
auto quoted_string = lexeme['"' >> *~char_('"') >> '"'];
Likewise, you might have intended +keyword_value_chars to be a lexeme, because right now one=two three four would parse the "qualifier" one with a "keyword value" of twothreefour, not two¹
x3::space skips embedded newlines; if that's not the intent, use x3::blank
Since PEG grammars are parsed left-to-right greedy, you can order the qualifier production and do without the !(no_case["no"]) lookahead assertion. That not only removes duplication but also makes the grammar simpler and more efficient:
auto qual = lexeme[+valid_identifier_chars] >>
-('=' >> (quoted_string | int_ | double_ | +keyword_value_chars)); // TODO lexeme
auto neg_qual = lexeme[no_case["no"] >> +valid_identifier_chars];
auto qualifier = lexeme['/' >> (neg_qual | qual)];
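As a hand-written illustration of what the ordered alternative neg_qual | qual does, here is a small sketch without Spirit. QualName and parse_qualifier_name are hypothetical names for this example; the function only looks at the identifier that follows the '/' and ignores any =value part.

```cpp
#include <cassert>
#include <cctype>
#include <string>

struct QualName {
    bool negative;
    std::string identifier;
};

inline bool is_ident_char(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Mimics `neg_qual | qual` on the text that follows the '/'.
// Returns false when there is no identifier at all.
inline bool parse_qualifier_name(std::string const& text, QualName& out) {
    std::size_t n = 0;
    while (n < text.size() && is_ident_char(text[n])) ++n;
    if (n == 0) return false;

    // First alternative: case-insensitive "no" plus at least one more identifier char.
    bool const has_no_prefix =
        n > 2 &&
        std::tolower(static_cast<unsigned char>(text[0])) == 'n' &&
        std::tolower(static_cast<unsigned char>(text[1])) == 'o';

    // Second alternative (plain qualifier) is only used when the first one fails.
    out.negative = has_no_prefix;
    out.identifier = has_no_prefix ? text.substr(2, n - 2) : text.substr(0, n);
    return true;
}

inline void demo_qualifier_order() {
    QualName q;
    assert(parse_qualifier_name("notrack/axis=y", q) && q.negative && q.identifier == "track");
    assert(parse_qualifier_name("all", q) && !q.negative && q.identifier == "all");
    assert(parse_qualifier_name("no", q) && !q.negative && q.identifier == "no");
}
```

Note one consequence of this ordering, which the original lookahead version shares: any identifier that merely starts with "no" and has more characters (say, north) is read as a negated qualifier.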
¹ Note (Post-Scriptum) now that we notice qualifier is, itself, already a lexeme, there's no need to lexeme[] things inside (unless, of course they're reused in contexts with skippers).
However, this also gives rise to the question whether whitespace around the = operator should be accepted (currently, it is not), or whether qualifiers can be separated with whitespace (like id /a /b; currently they can).
Perhaps verb needed some lexeme[] as well (unless you really did want to parse "one two three" as a verb)
If the no prefix for negative qualifiers is case-insensitive, then maybe the identifier itself is, too? This could simplify the grammar
The ordering of int_ and double_ means that most doubles are mis-parsed as int before they could ever be recognized as double. Consider something more explicit, like x3::real_parser<double, x3::strict_real_policies<double>>{} | int_
If you're parsing quoted constructs, perhaps you want to recognize escapes too ('\"' and '\\' for example):
auto quoted_string = lexeme['"' >> *('\\' >> char_ | ~char_('"')) >> '"'];
If you have a need for "keyword values" consider listing known values in x3::symbols<>. This can also be used to parse directly into an enum type.
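The point about the ordering of int_ and double_ can be sketched without Spirit. The following uses std::strtol and std::strtod as rough stand-ins for the two parsers (they accept leading whitespace and some forms that Spirit's parsers do not), and it assumes that "strict" means the consumed text must contain a '.' or an exponent. Number, int_then_double and strict_double_then_int are names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

struct Number {
    bool is_double;
    double value;
    std::size_t consumed; // how many input characters were used
};

// "int | double": the integer branch wins on any input that starts with digits.
inline Number int_then_double(std::string const& s) {
    char* end = nullptr;
    long i = std::strtol(s.c_str(), &end, 10);
    if (end != s.c_str())
        return { false, static_cast<double>(i), static_cast<std::size_t>(end - s.c_str()) };
    double d = std::strtod(s.c_str(), &end);
    return { true, d, static_cast<std::size_t>(end - s.c_str()) };
}

// "strict double | int": the double branch only matches when the consumed
// text contains a '.' or an exponent; otherwise the int branch is tried.
inline Number strict_double_then_int(std::string const& s) {
    char* end = nullptr;
    double d = std::strtod(s.c_str(), &end);
    std::size_t n = static_cast<std::size_t>(end - s.c_str());
    if (n != 0 && s.substr(0, n).find_first_of(".eE") != std::string::npos)
        return { true, d, n };
    long i = std::strtol(s.c_str(), &end, 10);
    return { false, static_cast<double>(i), static_cast<std::size_t>(end - s.c_str()) };
}

inline void demo_number_order() {
    Number a = int_then_double("2.5");
    assert(!a.is_double && a.consumed == 1);          // stops after "2", ".5" is left over
    Number b = strict_double_then_int("2.5");
    assert(b.is_double && b.value == 2.5 && b.consumed == 3);
    Number c = strict_double_then_int("45");
    assert(!c.is_double && c.value == 45 && c.consumed == 2);
}
```

With the integer branch first, speed=2.5 would yield the int 2 and leave .5 unparsed; with the strict double first, both 2.5 and 45 come out as intended.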
Here's a version that parses into AST types and prints it back for demonstration purposes:
Live On Coliru
#include <boost/config/warning_disable.hpp>
#include <string>
#include <vector>
#include <boost/variant.hpp>
namespace Ast {
struct Keyword : std::string { // needs to be strong-typed to distinguish from quoted values
using std::string::string;
using std::string::operator=;
};
struct Nil {};
using Value = boost::variant<Nil, std::string, int, double, Keyword>;
struct Qualifier {
enum Kind { positive, negative } kind;
std::string identifier;
Value value;
};
struct Param {
Keyword keyword;
std::vector<Qualifier> qualifiers;
};
struct Command {
std::string verb;
std::vector<Qualifier> qualifiers;
std::vector<Param> params;
};
}
#include <boost/fusion/adapted/struct.hpp>
BOOST_FUSION_ADAPT_STRUCT(Ast::Qualifier, kind, identifier, value)
BOOST_FUSION_ADAPT_STRUCT(Ast::Param, keyword, qualifiers)
BOOST_FUSION_ADAPT_STRUCT(Ast::Command, verb, qualifiers, params)
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
namespace scl {
//
// Grammar for simple command language
//
using x3::char_;
using x3::int_;
using x3::lexeme;
using x3::no_case;
// lexeme tokens
auto keyword = x3::rule<struct _keyword, Ast::Keyword> { "keyword" }
= lexeme [ +char_("a-zA-Z0-9$_.") ];
auto identifier = lexeme [ +char_("a-zA-Z_") ];
auto quoted_string = lexeme['"' >> *('\\' >> x3::char_ | ~x3::char_('"')) >> '"'];
auto value
= quoted_string
| x3::real_parser<double, x3::strict_real_policies<double>>{}
| x3::int_
| keyword;
auto qual
= x3::attr(Ast::Qualifier::positive) >> identifier >> -('=' >> value);
auto neg_qual
= x3::attr(Ast::Qualifier::negative) >> lexeme[no_case["no"] >> identifier] >> x3::attr(Ast::Nil{}); // never a value
auto qualifier
= lexeme['/' >> (neg_qual | qual)];
auto verb
= identifier;
auto parameter = x3::rule<struct _parameter, Ast::Param> {"parameter"}
= keyword >> *qualifier;
auto command = x3::rule<struct _command, Ast::Command> {"command"}
= x3::skip(x3::space) [ verb >> *qualifier >> *parameter ];
} // End namespace scl
// For Demo, Debug: printing the Ast types back
#include <iostream>
#include <iomanip>
namespace Ast {
static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
struct {
std::ostream& _os;
void operator()(std::string const& s) const { _os << std::quoted(s); }
void operator()(int i) const { _os << i; }
void operator()(double d) const { _os << d; }
void operator()(Keyword const& kwv) const { _os << kwv; }
void operator()(Nil) const { }
} vis{os};
boost::apply_visitor(vis, v);
return os;
}
static inline std::ostream& operator<<(std::ostream& os, Qualifier const& q) {
os << "/" << (q.kind==Qualifier::negative?"no":"") << q.identifier;
if (q.value.which())
os << "=" << q.value;
return os;
}
static inline std::ostream& operator<<(std::ostream& os, std::vector<Qualifier> const& qualifiers) {
for (auto& qualifier : qualifiers)
os << qualifier;
return os;
}
static inline std::ostream& operator<<(std::ostream& os, Param const& p) {
return os << p.keyword << p.qualifiers;
}
static inline std::ostream& operator<<(std::ostream& os, Command const& cmd) {
os << cmd.verb << cmd.qualifiers;
for (auto& param : cmd.params) os << " " << param;
return os;
}
}
int main() {
for (std::string const str : {
"show/out=\"somefile.txt\" motors/all cameras/full",
"start/speed=5 motors arm1 arm2/speed=2.5/track arm3",
"rotate camera1/notrack/axis=y/angle=45",
})
{
auto b = str.begin(), e = str.end();
Ast::Command cmd;
bool ok = parse(b, e, scl::command, cmd);
std::cout << (ok?"OK":"FAIL") << '\t' << std::quoted(str) << '\n';
if (ok) {
std::cout << " -- Full AST: " << cmd << "\n";
std::cout << " -- Verb+Qualifiers: " << cmd.verb << cmd.qualifiers << "\n";
for (auto& param : cmd.params)
std::cout << " -- Param+Qualifiers: " << param << "\n";
}
if (b != e) {
std::cout << " -- Remaining unparsed: " << std::quoted(std::string(b,e)) << "\n";
}
}
}
Prints
OK "show/out=\"somefile.txt\" motors/all cameras/full"
-- Full AST: show/out="somefile.txt" motors/all cameras/full
-- Verb+Qualifiers: show/out="somefile.txt"
-- Param+Qualifiers: motors/all
-- Param+Qualifiers: cameras/full
OK "start/speed=5 motors arm1 arm2/speed=2.5/track arm3"
-- Full AST: start/speed=5 motors arm1 arm2/speed=2.5/track arm3
-- Verb+Qualifiers: start/speed=5
-- Param+Qualifiers: motors
-- Param+Qualifiers: arm1
-- Param+Qualifiers: arm2/speed=2.5/track
-- Param+Qualifiers: arm3
OK "rotate camera1/notrack/axis=y/angle=45"
-- Full AST: rotate camera1/notrack/axis=y/angle=45
-- Verb+Qualifiers: rotate
-- Param+Qualifiers: camera1/notrack/axis=y/angle=45
For completeness
Demo also Live On MSVC (Rextester); note that Rextester uses Boost 1.60.
Coliru uses Boost 1.66, but the problem doesn't manifest itself there because the parsers now have concrete attribute values bound to them.
I'm trying to write a parser for the following BNF rules using boost spirit
(Boost v1.64)
The rules are:
<numeric-literal> ::= integer
<type-name> ::= "in" | "out" | "in_out"
<array-type-spec> ::= <type-spec> "[" [<numeric-literal>] "]"
<tuple-type-spec> ::= "(" <type-spec> ("," <type-spec>)+ ")"
<type-spec> ::= <type-name> | <array-type-spec> | <tuple-type-spec>
Below is my attempt, using boost::make_recursive_variant
It seems to work ok on the string in
But it fails on in[2].
Where is my mistake?
What would be an elegant solution?
namespace Ast {
enum class TypeName { IN, OUT, INOUT};
using NumericLiteral = int;
using TypeSpec = boost::make_recursive_variant
<
TypeName,
std::pair<boost::recursive_variant_, NumericLiteral>,
std::vector < boost::recursive_variant_ >
>::type;
}
//grammar:
namespace myGrammar {
namespace qi = boost::spirit::qi;
template <typename Iterator = char const*,typename Signature = Ast::TypeSpec()>
struct myRules : qi::grammar < Iterator, Signature> {
myRules() : myRules::base_type(start) {
fillSymbols();
rNumericLiteral = qi::int_;
rTypeName = sTypeName;
rTypeSpec = rTypeName | (rTypeSpec >> '[' >> rNumericLiteral >> ']') | ('(' >> qi::repeat(2, qi::inf)[(rTypeSpec % ',')] >> ')');
start = qi::skip(qi::space)[rTypeSpec];
}
private:
using Skipper = qi::space_type;
qi::rule<Iterator, Ast::TypeSpec()> start;
qi::rule<Iterator, Ast::NumericLiteral(), Skipper> rNumericLiteral;
qi::rule<Iterator, Ast::TypeName(), Skipper> rTypeName;
qi::rule<Iterator, Ast::TypeSpec(), Skipper> rTypeSpec;
//symbols
qi::symbols<char, Ast::TypeName>sTypeName;
void fillSymbols()
{
using namespace Ast;
sTypeName.add
("in", TypeName::IN)
("out", TypeName::OUT)
("in_out", TypeName::INOUT);
}
};
}
There's a problem translating this grammar 1:1 to a PEG grammar since left-recursion leads to infinite recursion.
You can still trivially rearrange the rules so left-recursion doesn't occur, but you will have more trouble synthesizing the AST you want.
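One common way around this, sketched here by hand without Spirit, is to parse the non-recursive base first and then loop over the [n] suffixes, wrapping the node built so far at each step. The Node type and the parse_array_chain and show functions are invented for this illustration; the sketch accepts any identifier as a type name, handles no tuples and no whitespace, and is only meant to show how the left-nested shape can still be recovered.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>

// Either a leaf (type name) or an array wrapping an element type.
struct Node {
    std::string name;               // set for leaves
    std::shared_ptr<Node> element;  // set for arrays
    int size = 0;                   // array size, meaningful when element is set
};

// Parses  name ( '[' digits ']' )*  and builds a left-nested tree, which is
// the shape the left-recursive rule would have produced.
inline std::shared_ptr<Node> parse_array_chain(std::string const& in) {
    std::size_t pos = 0;
    while (pos < in.size() &&
           (std::isalpha(static_cast<unsigned char>(in[pos])) || in[pos] == '_'))
        ++pos;
    if (pos == 0) return nullptr;

    auto node = std::make_shared<Node>();
    node->name = in.substr(0, pos);

    while (pos < in.size() && in[pos] == '[') {
        std::size_t p = pos + 1;
        std::size_t const first_digit = p;
        int n = 0;
        while (p < in.size() && std::isdigit(static_cast<unsigned char>(in[p]))) {
            n = n * 10 + (in[p] - '0');
            ++p;
        }
        if (p == first_digit || p >= in.size() || in[p] != ']') return nullptr;

        auto outer = std::make_shared<Node>();
        outer->element = node;  // wrap what has been parsed so far
        outer->size = n;
        node = outer;
        pos = p + 1;
    }
    return pos == in.size() ? node : nullptr;
}

inline std::string show(Node const& n) {
    if (!n.element) return n.name;
    return show(*n.element) + "[" + std::to_string(n.size) + "]";
}

inline void demo_array_chain() {
    auto root = parse_array_chain("in[1][2]");
    assert(root && root->size == 2);                  // outermost dimension is the last one
    assert(root->element->size == 1);
    assert(root->element->element->name == "in");
    assert(show(*root) == "in[1][2]");
}
```

The loop replaces the recursion, so nothing recurses before consuming input, while the wrapping step keeps the nesting that the original BNF implies.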
Here's a halfway station that has half-decent test results:
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
/*
<numeric-literal> ::= integer
<type-name> ::= "in" | "out" | "in_out"
<array-type-spec> ::= <type-spec> "[" [<numeric-literal>] "]"
<tuple-type-spec> ::= "(" <type-spec> ("," <type-spec>)+ ")"
<type-spec> ::= <type-name> | <array-type-spec> | <tuple-type-spec>
*/
namespace Ast {
enum class TypeName { IN, OUT, INOUT };
static inline std::ostream& operator<<(std::ostream& os, TypeName tn) {
switch(tn) {
case TypeName::IN: return os << "IN";
case TypeName::OUT: return os << "OUT";
case TypeName::INOUT: return os << "INOUT";
}
return os << "?";
}
using NumericLiteral = int;
using TypeSpec = boost::make_recursive_variant<
TypeName,
std::pair<boost::recursive_variant_, NumericLiteral>,
std::vector<boost::recursive_variant_>
>::type;
using ArraySpec = std::pair<TypeSpec, NumericLiteral>;
using TupleSpec = std::vector<TypeSpec>;
}
// grammar:
namespace myGrammar {
namespace qi = boost::spirit::qi;
template <typename Iterator = char const *, typename Signature = Ast::TypeSpec()>
struct myRules : qi::grammar<Iterator, Signature> {
myRules() : myRules::base_type(start) {
rNumericLiteral = qi::int_;
rTypeName = sTypeName >> !qi::alpha;
rTupleSpec = '(' >> rTypeSpec >> +(',' >> rTypeSpec) >> ')';
rScalarSpec = rTypeName | rTupleSpec;
rArraySpec = rScalarSpec >> '[' >> rNumericLiteral >> ']';
rTypeSpec = rArraySpec | rScalarSpec;
start = qi::skip(qi::space)[rTypeSpec >> qi::eoi];
BOOST_SPIRIT_DEBUG_NODES((start)(rTypeSpec)(rTypeName)(rArraySpec)(rScalarSpec)(rTypeSpec)(rNumericLiteral))
}
private:
using Skipper = qi::space_type;
qi::rule<Iterator, Ast::TypeSpec()> start;
qi::rule<Iterator, Ast::NumericLiteral(), Skipper> rNumericLiteral;
qi::rule<Iterator, Ast::ArraySpec(), Skipper> rArraySpec;
qi::rule<Iterator, Ast::TypeSpec(), Skipper> rTypeSpec, rScalarSpec;
qi::rule<Iterator, Ast::TupleSpec(), Skipper> rTupleSpec;
// implicit lexeme
qi::rule<Iterator, Ast::TypeName()> rTypeName;
// symbols
struct TypeName_r : qi::symbols<char, Ast::TypeName> {
TypeName_r() {
using Ast::TypeName;
add ("in", TypeName::IN)
("out", TypeName::OUT)
("in_out", TypeName::INOUT);
}
} sTypeName;
};
}
static inline std::ostream& operator<<(std::ostream& os, Ast::TypeSpec tn) {
struct {
std::ostream& _os;
void operator()(Ast::TypeSpec const& ts) const {
apply_visitor(*this, ts);
}
void operator()(Ast::TypeName tn) const { _os << tn; }
void operator()(Ast::TupleSpec const& tss) const {
_os << "(";
for (auto const& ts: tss) {
(*this)(ts);
_os << ", ";
}
_os << ")";
}
void operator()(Ast::ArraySpec const& as) const {
(*this)(as.first);
_os << '[' << as.second << ']';
}
} const dumper{os};
dumper(tn);
return os;
}
int main() {
using It = std::string::const_iterator;
myGrammar::myRules<It> const parser;
std::string const test_ok[] = {
"in",
"out",
"in_out",
"(in, out)",
"(out, in)",
"(in, in, in, out, in_out)",
"in[13]",
"in[0]",
"in[-2]",
"in[1][2][3]",
"in[3][3][3]",
"(in[3][3][3], out, in_out[0])",
"(in[3][3][3], out, in_out[0])",
"(in, out)[13]",
"(in, out)[13][0]",
};
std::string const test_fail[] = {
"",
"i n",
"inout",
"()",
"(in)",
"(out)",
"(in_out)",
"IN",
};
auto expect = [&](std::string const& sample, bool expected) {
It f = sample.begin(), l = sample.end();
Ast::TypeSpec spec;
bool ok = parse(f, l, parser, spec);
std::cout << "Test passed:" << std::boolalpha << (expected == ok) << "\n";
if (expected || (expected != ok)) {
if (ok) {
std::cout << "Parsed: " << spec << "\n";
} else {
std::cout << "Parse failed\n";
}
}
if (f!=l) {
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
};
for (std::string const sample : test_ok) expect(sample, true);
for (std::string const sample : test_fail) expect(sample, false);
}
Prints
Test passed:true
Parsed: IN
Test passed:true
Parsed: OUT
Test passed:true
Parsed: INOUT
Test passed:true
Parsed: (IN, OUT, )
Test passed:true
Parsed: (OUT, IN, )
Test passed:true
Parsed: (IN, IN, IN, OUT, INOUT, )
Test passed:true
Parsed: IN[13]
Test passed:true
Parsed: IN[0]
Test passed:true
Parsed: IN[-2]
Test passed:false
Parse failed
Remaining unparsed: 'in[1][2][3]'
Test passed:false
Parse failed
Remaining unparsed: 'in[3][3][3]'
Test passed:false
Parse failed
Remaining unparsed: '(in[3][3][3], out, in_out[0])'
Test passed:false
Parse failed
Remaining unparsed: '(in[3][3][3], out, in_out[0])'
Test passed:true
Parsed: (IN, OUT, )[13]
Test passed:false
Parse failed
Remaining unparsed: '(in, out)[13][0]'
Test passed:true
Test passed:true
Remaining unparsed: 'i n'
Test passed:true
Remaining unparsed: 'inout'
Test passed:true
Remaining unparsed: '()'
Test passed:true
Remaining unparsed: '(in)'
Test passed:true
Remaining unparsed: '(out)'
Test passed:true
Remaining unparsed: '(in_out)'
Test passed:true
Remaining unparsed: 'IN'
As you can see most things get parsed correctly, except for chained array dimensions like in[1][2]. The trouble is that we resolved ambiguity by inducing a "precedence" in the rules:
rScalarSpec = rTypeName | rTupleSpec;
rArraySpec = rScalarSpec >> '[' >> rNumericLiteral >> ']';
rTypeSpec = rArraySpec | rScalarSpec;
This means we always try expecting an array dimension first, and only fall back to a scalar type-spec if we failed to find one. This is because, with the opposite order, any array-spec would be matched as a scalar-spec first, making it impossible to parse the array-dimension part.
To fix the multi-dimensional case, you could try asserting that [ doesn't follow the array-spec:
rArraySpec = rScalarSpec >> '[' >> rNumericLiteral >> ']' >> !qi::lit('[')
| rArraySpec >> '[' >> rNumericLiteral >> ']';
But -- BOOM -- we're back at left-recursion again (in case we enter the second branch, e.g. in[1][).
Back to the drawing board.
Two thoughts cross my mind.
I'd say it would be very beneficial to remove the distinction between scalar/array spec in the AST. If a scalar were to be treated as a zero-rank array that would just mean we could always parse an optional dimension into the same resulting AST type.
The other thought more or less continues down the road shown above and would require backtracking all the way down if a presumed scalar spec was followed by a '[' character. This would lead to bad worst case behaviour in cases like (very long spec)[1][1][1][1][1][1][1][1][1][1].
Let me implement the first idea outlined after a coffee break :)
Reworked AST
Here the TypeSpec always carries a (possibly empty) collection of dimensions:
namespace Ast {
enum class TypeName { IN, OUT, INOUT };
static inline std::ostream& operator<<(std::ostream& os, TypeName tn) {
switch(tn) {
case TypeName::IN: return os << "IN";
case TypeName::OUT: return os << "OUT";
case TypeName::INOUT: return os << "INOUT";
}
return os << "?";
}
struct TypeSpec;
using ScalarSpec = boost::make_recursive_variant<
TypeName,
std::vector<TypeSpec>
>::type;
struct TypeSpec {
ScalarSpec spec;
std::vector<unsigned> dim;
};
using TupleSpec = std::vector<TypeSpec>;
}
Note that we also improved things by making the dimensions unsigned. For completeness, the grammar will also check that a dimension is not 0. A number of "positive" test cases have moved to the "expected-to-fail" cases for this reason.
Now the grammar is a straightforward mimic of that:
rRank %= qi::uint_ [qi::_pass = (qi::_1 > 0)];
rTypeName = sTypeName;
rTupleSpec = '(' >> rTypeSpec >> +(',' >> rTypeSpec) >> ')';
rScalarSpec = rTypeName | rTupleSpec;
rTypeSpec = rScalarSpec >> *('[' >> rRank >> ']');
Note the semantic action using Phoenix to assert that the array dimension cannot be 0.
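For readers who want to see the same structure without Spirit, here is a hand-written recursive-descent sketch of the reworked grammar, including the "rank must be greater than 0" check. The TypeSpec layout is simplified (a name plus an optional tuple instead of a variant), the names Parser and parse_type_spec are made up for this example, and whitespace handling only approximates what the skipper does.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

struct TypeSpec {
    std::string name;             // "in", "out" or "in_out"; empty for tuples
    std::vector<TypeSpec> tuple;  // elements when this is a tuple
    std::vector<unsigned> dim;    // zero or more array dimensions
};

class Parser {
public:
    explicit Parser(std::string const& s) : in_(s) {}
    bool parse(TypeSpec& out) { return type_spec(out) && at_end(); }

private:
    std::string in_;
    std::size_t pos_ = 0;

    void skip() {
        while (pos_ < in_.size() && std::isspace(static_cast<unsigned char>(in_[pos_]))) ++pos_;
    }
    bool at_end() { skip(); return pos_ == in_.size(); }
    bool lit(char c) {
        skip();
        if (pos_ < in_.size() && in_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    bool type_name(TypeSpec& out) {
        skip();
        // Longest keyword first, so "in_out" is not cut short as "in".
        for (char const* kw : {"in_out", "out", "in"}) {
            std::string const k(kw);
            if (in_.compare(pos_, k.size(), k) == 0) { out.name = k; pos_ += k.size(); return true; }
        }
        return false;
    }
    bool rank(unsigned& out) {
        skip();
        std::size_t const start = pos_;
        unsigned n = 0;
        while (pos_ < in_.size() && std::isdigit(static_cast<unsigned char>(in_[pos_])))
            n = n * 10 + static_cast<unsigned>(in_[pos_++] - '0');
        if (pos_ == start || n == 0) { pos_ = start; return false; }  // rejects "", "-2" and "0"
        out = n;
        return true;
    }
    bool scalar(TypeSpec& out) {
        if (!lit('(')) return type_name(out);
        do {
            TypeSpec element;
            if (!type_spec(element)) return false;
            out.tuple.push_back(element);
        } while (lit(','));
        return out.tuple.size() >= 2 && lit(')');  // a tuple needs at least two elements
    }
    bool type_spec(TypeSpec& out) {
        if (!scalar(out)) return false;
        while (lit('[')) {                          // the `*('[' >> rRank >> ']')` part
            unsigned n = 0;
            if (!rank(n) || !lit(']')) return false;
            out.dim.push_back(n);
        }
        return true;
    }
};

inline bool parse_type_spec(std::string const& text, TypeSpec& out) {
    out = TypeSpec{};
    return Parser(text).parse(out);
}
```

Calling parse_type_spec("(in, out)[13][14]", spec) fills spec.tuple with two elements and spec.dim with 13 and 14, while inputs such as "in[0]", "in[-2]" or "(in)" are rejected, mirroring the test cases listed for the Spirit version.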
And here's the live demo showing all testcases passing:
FULL DEMO
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/adapted.hpp>
/*
<numeric-literal> ::= integer
<type-name> ::= "in" | "out" | "in_out"
<array-type-spec> ::= <type-spec> "[" [<numeric-literal>] "]"
<tuple-type-spec> ::= "(" <type-spec> ("," <type-spec>)+ ")"
<type-spec> ::= <type-name> | <array-type-spec> | <tuple-type-spec>
*/
namespace Ast {
enum class TypeName { IN, OUT, INOUT };
static inline std::ostream& operator<<(std::ostream& os, TypeName tn) {
switch(tn) {
case TypeName::IN: return os << "IN";
case TypeName::OUT: return os << "OUT";
case TypeName::INOUT: return os << "INOUT";
}
return os << "?";
}
struct TypeSpec;
using ScalarSpec = boost::make_recursive_variant<
TypeName,
std::vector<TypeSpec>
>::type;
struct TypeSpec {
ScalarSpec spec;
std::vector<unsigned> dim;
};
using TupleSpec = std::vector<TypeSpec>;
}
BOOST_FUSION_ADAPT_STRUCT(Ast::TypeSpec, spec, dim)
// grammar:
namespace myGrammar {
namespace qi = boost::spirit::qi;
template <typename Iterator = char const *, typename Signature = Ast::TypeSpec()>
struct myRules : qi::grammar<Iterator, Signature> {
myRules() : myRules::base_type(start) {
rRank %= qi::uint_ [qi::_pass = (qi::_1 > 0)];
rTypeName = sTypeName;
rTupleSpec = '(' >> rTypeSpec >> +(',' >> rTypeSpec) >> ')';
rScalarSpec = rTypeName | rTupleSpec;
rTypeSpec = rScalarSpec >> *('[' >> rRank >> ']');
start = qi::skip(qi::space)[rTypeSpec >> qi::eoi];
BOOST_SPIRIT_DEBUG_NODES((start)(rTypeSpec)(rTypeName)(rScalarSpec)(rTypeSpec)(rRank))
}
private:
using Skipper = qi::space_type;
qi::rule<Iterator, Ast::TypeSpec()> start;
qi::rule<Iterator, Ast::ScalarSpec(), Skipper> rScalarSpec;
qi::rule<Iterator, Ast::TypeSpec(), Skipper> rTypeSpec;
qi::rule<Iterator, Ast::TupleSpec(), Skipper> rTupleSpec;
// implicit lexeme
qi::rule<Iterator, Ast::TypeName()> rTypeName;
qi::rule<Iterator, unsigned()> rRank;
// symbols
struct TypeName_r : qi::symbols<char, Ast::TypeName> {
TypeName_r() {
using Ast::TypeName;
add ("in", TypeName::IN)
("out", TypeName::OUT)
("in_out", TypeName::INOUT);
}
} sTypeName;
};
}
static inline std::ostream& operator<<(std::ostream& os, Ast::TypeSpec tn) {
struct {
std::ostream& _os;
void operator()(Ast::ScalarSpec const& ts) const {
apply_visitor(*this, ts);
}
void operator()(Ast::TypeName tn) const { _os << tn; }
void operator()(Ast::TupleSpec const& tss) const {
_os << "(";
for (auto const& ts: tss) {
(*this)(ts);
_os << ", ";
}
_os << ")";
}
void operator()(Ast::TypeSpec const& as) const {
(*this)(as.spec);
for (auto rank : as.dim)
_os << '[' << rank << ']';
}
} const dumper{os};
dumper(tn);
return os;
}
int main() {
using It = std::string::const_iterator;
myGrammar::myRules<It> const parser;
std::string const test_ok[] = {
"in",
"out",
"in_out",
"(in, out)",
"(out, in)",
"(in, in, in, out, in_out)",
"in[13]",
"in[1][2][3]",
"in[3][3][3]",
"(in[3][3][3], out, in_out[1])",
"(in[3][3][3], out, in_out[1])",
"(in, out)[13]",
"(in, out)[13][14]",
};
std::string const test_fail[] = {
"",
"i n",
"inout",
"()",
"(in)",
"(out)",
"(in_out)",
"IN",
"in[0]",
"in[-2]",
"(in[3][3][3], out, in_out[0])",
"(in[3][3][3], out, in_out[0])",
};
auto expect = [&](std::string const& sample, bool expected) {
It f = sample.begin(), l = sample.end();
Ast::TypeSpec spec;
bool ok = parse(f, l, parser, spec);
std::cout << "Test passed:" << std::boolalpha << (expected == ok) << "\n";
if (expected || (expected != ok)) {
if (ok) {
std::cout << "Parsed: " << spec << "\n";
} else {
std::cout << "Parse failed\n";
}
}
if (f!=l) {
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
};
for (std::string const sample : test_ok) expect(sample, true);
for (std::string const sample : test_fail) expect(sample, false);
}
Prints
Test passed:true
Parsed: IN
Test passed:true
Parsed: OUT
Test passed:true
Parsed: INOUT
Test passed:true
Parsed: (IN, OUT, )
Test passed:true
Parsed: (OUT, IN, )
Test passed:true
Parsed: (IN, IN, IN, OUT, INOUT, )
Test passed:true
Parsed: IN[13]
Test passed:true
Parsed: IN[1][2][3]
Test passed:true
Parsed: IN[3][3][3]
Test passed:true
Parsed: (IN[3][3][3], OUT, INOUT[1], )
Test passed:true
Parsed: (IN[3][3][3], OUT, INOUT[1], )
Test passed:true
Parsed: (IN, OUT, )[13]
Test passed:true
Parsed: (IN, OUT, )[13][14]
Test passed:true
Test passed:true
Remaining unparsed: 'i n'
Test passed:true
Remaining unparsed: 'inout'
Test passed:true
Remaining unparsed: '()'
Test passed:true
Remaining unparsed: '(in)'
Test passed:true
Remaining unparsed: '(out)'
Test passed:true
Remaining unparsed: '(in_out)'
Test passed:true
Remaining unparsed: 'IN'
Test passed:true
Remaining unparsed: 'in[0]'
Test passed:true
Remaining unparsed: 'in[-2]'
Test passed:true
Remaining unparsed: '(in[3][3][3], out, in_out[0])'
Test passed:true
Remaining unparsed: '(in[3][3][3], out, in_out[0])'