This small sample grammar just parse this statements
a <--special (but ok because rule in grammer)
a()
a.b <--special
a.b()
a.b().c <--special
a().b.c()
a().b <--special
all cases with non () at the end are special and should be separate rules in spirit. Only the rule (special case 1) is correct so far. How to define a rule which capture all other cases without () at the end ?
lvalue_statement =
(
name >> +(
(lit('(') >> paralistopt >> lit(')')[_normal_action_call]
| (lit('.') >> name) [_normal_action_dot]
)
| name [_special_action_] // special case 1
)
another sample to explain what "special" means, you can see that the ROOT node should have the special AST Node or action
a.b -> SPECIAL_DOT(a,b)
a.b.c -> SPECIAL_DOT(a,NORMAL_DOT(b,c))
a(para).b.c -> SEPCIAL_DOT(NORMAL_DOT(CALL(a,para),c)
I'm quite averse of so many semantic actions¹.
I also think that's not your problem.
In language terms, you'd expect a.b to be member dereference, a() to be invocation, and hence a.b() would be invoation of a.b after the member dereference.
In that sense, a.b is the normal case, because it doesn't do invocation. a.b() would be "more special" in the sense that it is the same PLUS invocation.
I'd phrase my expression grammar to reflect this:
lvalue = name >> *(
'.' >> name
| '(' >> paralistopt >> ')'
);
This parses everything. Now you might go with semantic actions or attribute propagation
Semantic Actions #1
auto lvalue = name [ action("normal") ] >> *(
'.' >> name [ action("member_access") ]
| ('(' >> paralistopt >> ')') [ action("call") ]
);
There you go. Let's come up with a generic action that logs stuff:
auto action = [](auto type) {
return [=](auto& ctx){
auto& attr = _attr(ctx);
using A = std::decay_t<decltype(attr)>;
std::cout << type << ":";
if constexpr(boost::fusion::traits::is_sequence<A>::value) {
std::cout << boost::fusion::as_vector(attr);
} else if constexpr(x3::traits::is_container<A>::value && not std::is_same_v<std::string, A>) {
std::string_view sep;
std::cout << "{";
for (auto& el : attr) { std::cout << sep << el; sep = ", "; }
std::cout << "}";
} else {
std::cout << attr;
}
std::cout << "\n";
};
};
Now we can parse all the samples (plus a few more):
Live On Coliru prints:
=== "a"
normal:a
Ok
=== "a()"
normal:a
call:{}
Ok
=== "a.b"
normal:a
member_access:b
Ok
=== "a.b()"
normal:a
member_access:b
call:{}
Ok
=== "a.b().c"
normal:a
member_access:b
call:{}
member_access:c
Ok
=== "a().b.c()"
normal:a
call:{}
member_access:b
member_access:c
call:{}
Ok
=== "a().b.c()"
normal:a
call:{}
member_access:b
member_access:c
call:{}
Ok
=== "a(q,r,s).b"
normal:a
call:{q, r, s}
member_access:b
Ok
SA #2: Building an AST
Let's model the AST:
namespace Ast {
using name = std::string;
using params = std::vector<name>;
struct member_access;
struct call;
using lvalue = boost::variant<
name,
boost::recursive_wrapper<member_access>,
boost::recursive_wrapper<call>
>;
using params = std::vector<name>;
struct member_access { lvalue obj; name member; } ;
struct call { lvalue f; params args; } ;
}
Now we can replace the actions:
auto lvalue
= rule<struct lvalue_, Ast::lvalue> {"lvalue"}
= name [ ([](auto& ctx){ _val(ctx) = _attr(ctx); }) ] >> *(
'.' >> name [ ([](auto& ctx){ _val(ctx) = Ast::member_access{ _val(ctx), _attr(ctx) }; }) ]
| ('(' >> paralistopt >> ')') [ ([](auto& ctx){ _val(ctx) = Ast::call{ _val(ctx), _attr(ctx) }; }) ]
);
That's ugly - I don't recommend writing your code this way, but at least it drives home how few steps are involved.
Also adding some output operators:
namespace Ast { // debug output
static inline std::ostream& operator<<(std::ostream& os, Ast::member_access const& ma) {
return os << ma.obj << "." << ma.member;
}
static inline std::ostream& operator<<(std::ostream& os, Ast::call const& c) {
std::string_view sep;
os << c.f << "(";
for (auto& arg: c.args) { os << sep << arg; sep = ", "; }
return os << ")";
}
}
Now can parse everything with full AST: Live On Coliru, printing:
"a" -> a
"a()" -> a()
"a.b" -> a.b
"a.b()" -> a.b()
"a.b().c" -> a.b().c
"a().b.c()" -> a().b.c()
"a().b" -> a().b
"a(q,r,s).b" -> a(q, r, s).b
Automatic Propagation
Actually I sort of got stranded doing this. It took me too long to get it right and parse the associativity in a useful way, so I stopped trying. Let's instead summarize by cleaning up out second SA take:
Summary
Making the actions more readable:
auto passthrough =
[](auto& ctx) { _val(ctx) = _attr(ctx); };
template <typename T> auto binary_ =
[](auto& ctx) { _val(ctx) = T { _val(ctx), _attr(ctx) }; };
auto lvalue
= rule<struct lvalue_, Ast::lvalue> {"lvalue"}
= name [ passthrough ] >> *(
'.' >> name [ binary_<Ast::member_access> ]
| ('(' >> paralistopt >> ')') [ binary_<Ast::call> ]
);
Now there are a number of issues left:
You might want a more general expression grammar that doesn't just parse lvalue expressions (e.g. f(foo, 42) should probably parse, as should len("foo") + 17?).
To that end, the lvalue/rvalue distinction doesn't belong in the grammar: it's a semantic distinction mostly.
I happen to have created an extended parser that does all that + evaluation against proper LValues (while supporting general values). I'd suggest looking at the [extended chat][3] at this answer and the resulting code on github: https://github.com/sehe/qi-extended-parser-evaluator .
Full Listing
Live On Coliru
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
namespace x3 = boost::spirit::x3;
namespace Ast {
using name = std::string;
using params = std::vector<name>;
struct member_access;
struct call;
using lvalue = boost::variant<
name,
boost::recursive_wrapper<member_access>,
boost::recursive_wrapper<call>
>;
using params = std::vector<name>;
struct member_access { lvalue obj; name member; } ;
struct call { lvalue f; params args; } ;
}
namespace Ast { // debug output
static inline std::ostream& operator<<(std::ostream& os, Ast::member_access const& ma) {
return os << ma.obj << "." << ma.member;
}
static inline std::ostream& operator<<(std::ostream& os, Ast::call const& c) {
std::string_view sep;
os << c.f << "(";
for (auto& arg: c.args) { os << sep << arg; sep = ", "; }
return os << ")";
}
}
namespace Parser {
using namespace x3;
auto name
= rule<struct string_, Ast::name> {"name"}
= lexeme[alpha >> *(alnum|char_("_"))];
auto paralistopt
= rule<struct params_, Ast::params> {"params"}
= -(name % ',');
auto passthrough =
[](auto& ctx) { _val(ctx) = _attr(ctx); };
template <typename T> auto binary_ =
[](auto& ctx) { _val(ctx) = T { _val(ctx), _attr(ctx) }; };
auto lvalue
= rule<struct lvalue_, Ast::lvalue> {"lvalue"}
= name [ passthrough ] >> *(
'.' >> name [ binary_<Ast::member_access> ]
| ('(' >> paralistopt >> ')') [ binary_<Ast::call> ]
);
auto start = skip(space) [ lvalue ];
}
int main() {
for (std::string const input: {
"a", // special (but ok because rule in grammer)
"a()",
"a.b", // special
"a.b()",
"a.b().c", // special
"a().b.c()",
"a().b", // special
"a(q,r,s).b",
})
{
std::cout << std::quoted(input) << " -> ";
auto f = begin(input), l = end(input);
Ast::lvalue parsed;
if (parse(f, l, Parser::start, parsed)) {
std::cout << parsed << "\n";;
} else {
std::cout << "Failed\n";
}
if (f!=l) {
std::cout << " -- Remainig: " << std::quoted(std::string(f,l)) << "\n";
}
}
}
Prints
"a" -> a
"a()" -> a()
"a.b" -> a.b
"a.b()" -> a.b()
"a.b().c" -> a.b().c
"a().b.c()" -> a().b.c()
"a().b" -> a().b
"a(q,r,s).b" -> a(q, r, s).b
¹ (they lead to a mess in the presence of backtracking, see Boost Spirit: "Semantic actions are evil"?)
Related
I have a string (nested strings even) that are formatted like a C++ braced initializer list. I want to tokenize them one level at a time into a vector of strings.
So when I input "{one, two, three}" to the function should output a three element vector
"one",
"two",
"three"
To complicate this, it needs to support quoted tokens and preserve nested lists:
Input String: "{one, {2, \"three four\"}}, \"five, six\", {\"seven, eight\"}}"
Output is a four element vector:
"one",
"{2, \"three four\"}",
"five, six",
"{\"seven, eight\"}"
I've looked at a few other SO posts:
Using Boost Tokenizer escaped_list_separator with different parameters
Boost split not traversing inside of parenthesis or braces
And used those to start a solution, but this seems slightly too complicated for the tokenizer (because of the braces):
#include <boost/algorithm/string.hpp>
#include <boost/tokenizer.hpp>
std::vector<std::string> TokenizeBracedList(const std::string& x)
{
std::vector<std::string> tokens;
std::string separator1("");
std::string separator2(",\n\t\r");
std::string separator3("\"\'");
boost::escaped_list_separator<char> elements(separator1, separator2, separator3);
boost::tokenizer<boost::escaped_list_separator<char>> tokenizer(x, elements);
for(auto i = std::begin(tokenizer); i != std::end(tokenizer); ++i)
{
auto token = *i;
boost::algorithm::trim(token);
tokens.push_back(token);
}
return tokens;
}
With this, even in the trivial case, it doesn't strip the opening and closing braces.
Boost and C++17 are fair game for a solution.
Simple (Flat) Take
Defining a flat data structure like:
using token = std::string;
using tokens = std::vector<token>;
We can define an X3 parser like:
namespace Parser {
using namespace boost::spirit::x3;
rule<struct list_, token> item;
auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
auto bare = lexeme [ +(graph-','-'}') ];
auto list = '{' >> (item % ',') >> '}';
auto sublist = raw [ list ];
auto item_def = sublist | quoted | bare;
BOOST_SPIRIT_DEFINE(item)
}
Live On Wandbox
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
using token = std::string;
using tokens = std::vector<token>;
namespace x3 = boost::spirit::x3;
namespace Parser {
using namespace boost::spirit::x3;
rule<struct list_, token> item;
auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
auto bare = lexeme [ +(graph-','-'}') ];
auto list = '{' >> (item % ',') >> '}';
auto sublist = raw [ list ];
auto item_def = sublist | quoted | bare;
BOOST_SPIRIT_DEFINE(item)
}
int main() {
for (std::string const input : {
R"({one, "five, six"})",
R"({one, {2, "three four"}, "five, six", {"seven, eight"}})",
})
{
auto f = input.begin(), l = input.end();
std::vector<std::string> parsed;
bool ok = phrase_parse(f, l, Parser::list, x3::space, parsed);
if (ok) {
std::cout << "Parsed: " << parsed.size() << " elements\n";
for (auto& el : parsed) {
std::cout << " - " << std::quoted(el, '\'') << "\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: " << std::quoted(std::string{f, l}) << "\n";
}
}
Prints
Parsed: 2 elements
- 'one'
- 'five, six'
Parsed: 4 elements
- 'one'
- '{2, "three four"}'
- 'five, six'
- '{"seven, eight"}'
Nested Data
Changing the datastructure to be a bit more specific/realistic:
namespace ast {
using value = boost::make_recursive_variant<
double,
std::string,
std::vector<boost::recursive_variant_>
>::type;
using list = std::vector<value>;
}
Now we can change the grammar, as we no longer need to treat sublist as if it is a string:
namespace Parser {
using namespace boost::spirit::x3;
rule<struct item_, ast::value> item;
auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
auto bare = lexeme [ +(graph-','-'}') ];
auto list = x3::rule<struct list_, ast::list> {"list" }
= '{' >> (item % ',') >> '}';
auto item_def = list | double_ | quoted | bare;
BOOST_SPIRIT_DEFINE(item)
}
Everything "still works": Live On Wandbox
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
namespace ast {
using value = boost::make_recursive_variant<
double,
std::string,
std::vector<boost::recursive_variant_>
>::type;
using list = std::vector<value>;
}
namespace x3 = boost::spirit::x3;
namespace Parser {
using namespace boost::spirit::x3;
rule<struct item_, ast::value> item;
auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
auto bare = lexeme [ +(graph-','-'}') ];
auto list = x3::rule<struct list_, ast::list> {"list" }
= '{' >> (item % ',') >> '}';
auto item_def = list | double_ | quoted | bare;
BOOST_SPIRIT_DEFINE(item)
}
struct pretty_printer {
using result_type = void;
std::ostream& _os;
int _indent;
pretty_printer(std::ostream& os, int indent = 0) : _os(os), _indent(indent) {}
void operator()(ast::value const& v) { boost::apply_visitor(*this, v); }
void operator()(double v) { _os << v; }
void operator()(std::string s) { _os << std::quoted(s); }
void operator()(ast::list const& l) {
_os << "{\n";
_indent += 2;
for (auto& item : l) {
_os << std::setw(_indent) << "";
operator()(item);
_os << ",\n";
}
_indent -= 2;
_os << std::setw(_indent) << "" << "}";
}
};
int main() {
pretty_printer print{std::cout};
for (std::string const input : {
R"({one, "five, six"})",
R"({one, {2, "three four"}, "five, six", {"seven, eight"}})",
})
{
auto f = input.begin(), l = input.end();
ast::value parsed;
bool ok = phrase_parse(f, l, Parser::item, x3::space, parsed);
if (ok) {
std::cout << "Parsed: ";
print(parsed);
std::cout << "\n";
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: " << std::quoted(std::string{f, l}) << "\n";
}
}
Prints:
Parsed: {
"one",
"five, six",
}
Parsed: {
"one",
{
2,
"three four",
},
"five, six",
{
"seven, eight",
},
}
i can´t see my error here .. this rule parse some stuff ok but the last two samples not. Could somebody please give me a hint ..
Goal is a parser than can identify member property access and member function calls. Also chained in some way
a()
a(para)
x.a()
x.a(para)
x.a(para).g(para).j()
x.y
x.y.z
x.y.z() <---fail
y.z.z(para) <--- fail
lvalue =
iter_pos >> name[_val = _1]
>> *(lit('(') > paralistopt > lit(')') >> iter_pos)[_val = construct<common_node>(type_cmd_fnc_call, LOCATION_NODE_ITER(_val, _2), key_this, construct<common_node>(_val), key_parameter, construct<std::vector<common_node> >(_1))]
>> *(lit('.') >> name_pure >> lit('(') > paralistopt > lit(')') >> iter_pos)[_val = construct<common_node>(type_cmd_fnc_call, LOCATION_NODE_ITER(_val, _3), key_this, construct<common_node>(_val), key_callname, construct<std::wstring>(_1), key_parameter, construct<std::vector<common_node> >(_2))]
>> *(lit('.') >> name_pure >> iter_pos)[_val = construct<common_node>(type_cmd_dot_call, LOCATION_NODE_ITER(_val, _2), key_this, construct<common_node>(_val), key_propname, construct<std::wstring>(_1))]
;
thank you
Markus
You provide very little information to go at. Let me humor you with my entry into this guessing game:
Let's assume you want to parse a simple "language" that merely allows member expressions and function invocations, but chained.
Now, your grammar says nothing about the parameters (though it's clear the param list can be empty), so let me go the next mile and assume that you want to accept the same kind of expressions there (so foo(a) is okay, but also bar(foo(a)) or bar(b.foo(a))).
Since you accept chaining of function calls, it appears that functions are first-class objects (and functions can return functions), so foo(a)(b, c, d) should be accepted as well.
You didn't mention it, but parameters often include literals (sqrt(9) comes to mind, or println("hello world")).
Other items:
you didn't say but likely you want to ignore whitespace in certain spots
from the iter_pos (ab)use it seems you're interested in tracking the original source location inside the resulting AST.
1. Define An AST
We should keep it simple as ever:
namespace Ast {
using Identifier = boost::iterator_range<It>;
struct MemberExpression;
struct FunctionCall;
using Expression = boost::variant<
double, // some literal types
std::string,
// non-literals
Identifier,
boost::recursive_wrapper<MemberExpression>,
boost::recursive_wrapper<FunctionCall>
>;
struct MemberExpression {
Expression object; // antecedent
Identifier member; // function or field
};
using Parameter = Expression;
using Parameters = std::vector<Parameter>;
struct FunctionCall {
Expression function; // could be a member function
Parameters parameters;
};
}
NOTE We're not going to focus on showing source locations, but already made one provision, storing identifiers as an iterator-range.
NOTE Fusion-adapting the only types not directly supported by Spirit:
BOOST_FUSION_ADAPT_STRUCT(Ast::MemberExpression, object, member)
BOOST_FUSION_ADAPT_STRUCT(Ast::FunctionCall, function, parameters)
We will find that we don't use these, because Semantic Actions are more convenient here.
2. A Matching Grammar
Grammar() : Grammar::base_type(start) {
using namespace qi;
start = skip(space) [expression];
identifier = raw [ (alpha|'_') >> *(alnum|'_') ];
parameters = -(expression % ',');
expression
= literal
| identifier >> *(
('.' >> identifier)
| ('(' >> parameters >> ')')
);
literal = double_ | string_;
string_ = '"' >> *('\\' >> char_ | ~char_('"')) >> '"';
BOOST_SPIRIT_DEBUG_NODES(
(identifier)(start)(parameters)(expression)(literal)(string_)
);
}
In this skeleton most rules benefit from automatic attribute propagation. The one that doesn't is expression:
qi::rule<It, Expression()> start;
using Skipper = qi::space_type;
qi::rule<It, Expression(), Skipper> expression, literal;
qi::rule<It, Parameters(), Skipper> parameters;
// lexemes
qi::rule<It, Identifier()> identifier;
qi::rule<It, std::string()> string_;
So, let's create some helpers for the semantic actions.
NOTE An important take-away here is to create your own higher-level building blocks instead of toiling away with boost::phoenix::construct<> etc.
Define two simple construction functions:
struct mme_f { MemberExpression operator()(Expression lhs, Identifier rhs) const { return { lhs, rhs }; } };
struct mfc_f { FunctionCall operator()(Expression f, Parameters params) const { return { f, params }; } };
phx::function<mme_f> make_member_expression;
phx::function<mfc_f> make_function_call;
Then use them:
expression
= literal [_val=_1]
| identifier [_val=_1] >> *(
('.' >> identifier) [ _val = make_member_expression(_val, _1)]
| ('(' >> parameters >> ')') [ _val = make_function_call(_val, _1) ]
);
That's all. We're ready to roll!
3. DEMO
Live On Coliru
I created a test bed looking like this:
int main() {
using It = std::string::const_iterator;
Parser::Grammar<It> const g;
for (std::string const input : {
"a()", "a(para)", "x.a()", "x.a(para)", "x.a(para).g(para).j()", "x.y", "x.y.z",
"x.y.z()",
"y.z.z(para)",
// now let's add some funkyness that you didn't mention
"bar(foo(a))",
"bar(b.foo(a))",
"foo(a)(b, c, d)", // first class functions
"sqrt(9)",
"println(\"hello world\")",
"allocate(strlen(\"aaaaa\"))",
"3.14",
"object.rotate(180)",
"object.rotate(event.getAngle(), \"torque\")",
"app.mainwindow().find_child(\"InputBox\").font().size(12)",
"app.mainwindow().find_child(\"InputBox\").font(config().preferences.baseFont(style.PROPORTIONAL))"
}) {
std::cout << " =========== '" << input << "' ========================\n";
It f(input.begin()), l(input.end());
Ast::Expression parsed;
bool ok = parse(f, l, g, parsed);
if (ok) {
std::cout << "Parsed: " << parsed << "\n";
}
else
std::cout << "Parse failed\n";
if (f != l)
std::cout << "Remaining unparsed input: '" << std::string(f, l) << "'\n";
}
}
Incredible as it may appear, this already parses all the test cases and prints:
=========== 'a()' ========================
Parsed: a()
=========== 'a(para)' ========================
Parsed: a(para)
=========== 'x.a()' ========================
Parsed: x.a()
=========== 'x.a(para)' ========================
Parsed: x.a(para)
=========== 'x.a(para).g(para).j()' ========================
Parsed: x.a(para).g(para).j()
=========== 'x.y' ========================
Parsed: x.y
=========== 'x.y.z' ========================
Parsed: x.y.z
=========== 'x.y.z()' ========================
Parsed: x.y.z()
=========== 'y.z.z(para)' ========================
Parsed: y.z.z(para)
=========== 'bar(foo(a))' ========================
Parsed: bar(foo(a))
=========== 'bar(b.foo(a))' ========================
Parsed: bar(b.foo(a))
=========== 'foo(a)(b, c, d)' ========================
Parsed: foo(a)(b, c, d)
=========== 'sqrt(9)' ========================
Parsed: sqrt(9)
=========== 'println("hello world")' ========================
Parsed: println(hello world)
=========== 'allocate(strlen("aaaaa"))' ========================
Parsed: allocate(strlen(aaaaa))
=========== '3.14' ========================
Parsed: 3.14
=========== 'object.rotate(180)' ========================
Parsed: object.rotate(180)
=========== 'object.rotate(event.getAngle(), "torque")' ========================
Parsed: object.rotate(event.getAngle(), torque)
=========== 'app.mainwindow().find_child("InputBox").font().size(12)' ========================
Parsed: app.mainwindow().find_child(InputBox).font().size(12)
=========== 'app.mainwindow().find_child("InputBox").font(config().preferences.baseFont(style.PROPORTIONAL))' ========================
Parsed: app.mainwindow().find_child(InputBox).font(config().preferences.baseFont(style.PROPORTIONAL))
4. Too Good To Be True?
You're right. I cheated. I didn't show you this code required to debug print the parsed AST:
namespace Ast {
static inline std::ostream& operator<<(std::ostream& os, MemberExpression const& me) {
return os << me.object << "." << me.member;
}
static inline std::ostream& operator<<(std::ostream& os, FunctionCall const& fc) {
os << fc.function << "(";
bool first = true;
for (auto& p : fc.parameters) { if (!first) os << ", "; first = false; os << p; }
return os << ")";
}
}
It's only debug printing, as string literals aren't correctly roundtripped. But it's only 10 lines of code, that's a bonus.
5. The Full Monty: Source Locations
This had your interest, so let's show it working. Let's add a simple loop to print all locations of identifiers:
using IOManip::showpos;
for (auto& id : all_identifiers(parsed)) {
std::cout << " - " << id << " at " << showpos(id, input) << "\n";
}
Of course, this begs the question, what are showpos and all_identifiers?
namespace IOManip {
struct showpos_t {
boost::iterator_range<It> fragment;
std::string const& source;
friend std::ostream& operator<<(std::ostream& os, showpos_t const& manip) {
auto ofs = [&](It it) { return it - manip.source.begin(); };
return os << "[" << ofs(manip.fragment.begin()) << ".." << ofs(manip.fragment.end()) << ")";
}
};
showpos_t showpos(boost::iterator_range<It> fragment, std::string const& source) {
return {fragment, source};
}
}
As for the identifier extraction:
std::vector<Identifier> all_identifiers(Expression const& expr) {
std::vector<Identifier> result;
struct Harvest {
using result_type = void;
std::back_insert_iterator<std::vector<Identifier> > out;
void operator()(Identifier const& id) { *out++ = id; }
void operator()(MemberExpression const& me) { apply_visitor(*this, me.object); *out++ = me.member; }
void operator()(FunctionCall const& fc) {
apply_visitor(*this, fc.function);
for (auto& p : fc.parameters) apply_visitor(*this, p);
}
// non-identifier expressions
void operator()(std::string const&) { }
void operator()(double) { }
} harvest { back_inserter(result) };
boost::apply_visitor(harvest, expr);
return result;
}
That's a tree visitor that harvests all identifiers recursively, inserting them into the back of a container.
Live On Coliru
Where output looks like (excerpt):
=========== 'app.mainwindow().find_child("InputBox").font(config().preferences.baseFont(style.PROPORTIONAL))' ========================
Parsed: app.mainwindow().find_child(InputBox).font(config().preferences.baseFont(style.PROPORTIONAL))
- app at [0..3)
- mainwindow at [4..14)
- find_child at [17..27)
- font at [40..44)
- config at [45..51)
- preferences at [54..65)
- baseFont at [66..74)
- style at [75..80)
- PROPORTIONAL at [81..93)
Try changing
>> *(lit('.') >> name_pure >> lit('(') > paralistopt > lit(')'))
to
>> *(*(lit('.') >> name_pure) >> lit('(') > paralistopt > lit(')'))
I am playing with boost.spirit library and I cannot manage to report a simple error message from my semantic action.
// supported parameter types (int or quoted strings)
parameter = bsqi::int_ | bsqi::lexeme[L'"' > *(bsqi_coding::char_ - L'"') > L'"'];
parameter.name("parameter");
// comma separator list of parameters (or no parameters)
parameters = -(parameter % L',');
parameters.name("parameters");
// action with parameters
action = (Actions > L'(' > parameters > L')')[bsqi::_pass = boost::phoenix::bind(&ValidateAction, bsqi::_1, bsqi::_2)];
action.name("action");
The Actions is just a symbol table (boost::spirit::qi::symbols). The attribute of parameters is std::vector of boost::variant which describes the parameters types. I would like to produces a meaningful error message within semantic action ValidateAction with also indicating position within input what is wrong. If I just assign _pass to false, parsing ends but the error message is something like 'expecting ' and not that e.g. 2nd parameter has wrong type (expected int instead of string).
Somewhere I read that I can throw an exception from my semantic action, but the problem is that I didn't find whether and how I can access iterators from parsed values. For example I wanted to use expectation_failure exception so my error handler automatically is called, but I need to pass iterators to the exception which seems impossible.
Is there any nice way how to report semantic failures with more detailed information except returning just false?
I'd use filepos_iterator and just throw an exception, so you have complete control over the reporting.
Let me see what I can come up with in the remaining 15 minutes I have
Ok, took a little bit more time but think it's an instructive demo:
Live On Coliru
#include <boost/fusion/adapted.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
#include <boost/spirit/repository/include/qi_iter_pos.hpp>
#include <boost/lexical_cast.hpp>
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
namespace px = boost::phoenix;
namespace qi_coding = boost::spirit::ascii;
using It = boost::spirit::line_pos_iterator<std::string::const_iterator>;
namespace ast {
enum actionid { f_unary, f_binary };
enum param_type { int_param, string_param };
static inline std::ostream& operator<<(std::ostream& os, actionid id) {
switch(id) {
case f_unary: return os << "f_unary";
case f_binary: return os << "f_binary";
default: return os << "(unknown)";
} }
static inline std::ostream& operator<<(std::ostream& os, param_type t) {
switch(t) {
case int_param: return os << "integer";
case string_param: return os << "string";
default: return os << "(unknown)";
} }
using param_value = boost::variant<int, std::string>;
struct parameter {
It position;
param_value value;
friend std::ostream& operator<<(std::ostream& os, parameter const& p) { return os << p.value; }
};
using parameters = std::vector<parameter>;
struct action {
/*
*action() = default;
*template <typename Sequence> action(Sequence const& seq) { boost::fusion::copy(seq, *this); }
*/
actionid id;
parameters params;
};
}
namespace std {
static inline std::ostream& operator<<(std::ostream& os, ast::parameters const& v) {
std::copy(v.begin(), v.end(), std::ostream_iterator<ast::parameter>(os, " "));
return os;
}
}
BOOST_FUSION_ADAPT_STRUCT(ast::action, id, params)
BOOST_FUSION_ADAPT_STRUCT(ast::parameter, position, value)
struct BadAction : std::exception {
It _where;
std::string _what;
BadAction(It it, std::string msg) : _where(it), _what(std::move(msg)) {}
It where() const { return _where; }
char const* what() const noexcept { return _what.c_str(); }
};
struct ValidateAction {
std::map<ast::actionid, std::vector<ast::param_type> > const specs {
{ ast::f_unary, { ast::int_param } },
{ ast::f_binary, { ast::int_param, ast::string_param } },
};
ast::action operator()(It source, ast::action parsed) const {
auto check = [](ast::parameter const& p, ast::param_type expected_type) {
if (p.value.which() != expected_type) {
auto name = boost::lexical_cast<std::string>(expected_type);
throw BadAction(p.position, "Type mismatch (expecting " + name + ")");
}
};
int i;
try {
auto& formals = specs.at(parsed.id);
auto& actuals = parsed.params;
auto arity = formals.size();
for (i=0; i<arity; ++i)
check(actuals.at(i), formals.at(i));
if (actuals.size() > arity)
throw BadAction(actuals.at(arity).position, "Excess parameters");
} catch(std::out_of_range const&) {
throw BadAction(source, "Missing parameter #" + std::to_string(i+1));
}
return parsed;
}
};
template <typename It, typename Skipper = qi::space_type>
struct Parser : qi::grammar<It, ast::action(), Skipper> {
Parser() : Parser::base_type(start) {
using namespace qi;
parameter = qr::iter_pos >> (int_ | lexeme['"' >> *~qi_coding::char_('"') >> '"']);
parameters = -(parameter % ',');
action = actions_ >> '(' >> parameters >> ')';
start = (qr::iter_pos >> action) [ _val = validate_(_1, _2) ];
BOOST_SPIRIT_DEBUG_NODES((parameter)(parameters)(action))
}
private:
qi::rule<It, ast::action(), Skipper> start, action;
qi::rule<It, ast::parameters(), Skipper> parameters;
qi::rule<It, ast::parameter(), Skipper> parameter;
px::function<ValidateAction> validate_;
struct Actions : qi::symbols<char, ast::actionid> {
Actions() { this->add("f_unary", ast::f_unary)("f_binary", ast::f_binary); }
} actions_;
};
int main() {
for (std::string const input : {
// good
"f_unary( 0 )",
"f_binary ( 47, \"hello\")",
// errors
"f_binary ( 47, \"hello\") bogus",
"f_unary ( 47, \"hello\") ",
"f_binary ( 47, \r\n 7) ",
})
{
std::cout << "-----------------------\n";
Parser<It> p;
It f(input.begin()), l(input.end());
auto printErrorContext = [f,l](std::ostream& os, It where) {
auto line = get_current_line(f, where, l);
os << " line:" << get_line(where)
<< ", col:" << get_column(line.begin(), where) << "\n";
while (!line.empty() && std::strchr("\r\n", *line.begin()))
line.advance_begin(1);
std::cerr << line << "\n";
std::cerr << std::string(std::distance(line.begin(), where), ' ') << "^ --- here\n";
};
ast::action data;
try {
if (qi::phrase_parse(f, l, p > qi::eoi, qi::space, data)) {
std::cout << "Parsed: " << boost::fusion::as_vector(data) << "\n";
}
} catch(qi::expectation_failure<It> const& e) {
printErrorContext(std::cerr << "Expectation failed: " << e.what_, e.first);
} catch(BadAction const& ba) {
printErrorContext(std::cerr << "BadAction: " << ba.what(), ba.where());
}
if (f!=l) {
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}
}
Printing:
-----------------------
Parsed: (f_unary 0 )
-----------------------
Parsed: (f_binary 47 hello )
-----------------------
Expectation failed: <eoi> line:1, col:25
f_binary ( 47, "hello") bogus
^ --- here
Remaining unparsed: 'f_binary ( 47, "hello") bogus'
-----------------------
BadAction: Excess parameters line:1, col:15
f_unary ( 47, "hello")
^ --- here
Remaining unparsed: 'f_unary ( 47, "hello") '
-----------------------
BadAction: Type mismatch (expecting string) line:2, col:8
7)
^ --- here
Remaining unparsed: 'f_binary ( 47,
7) '
I am currentl adding expectation points to my grammar in X3.
Now I came accross an rule, which looks like this.
auto const id_string = +x3::char("A-Za-z0-9_);
auto const nested_identifier_def =
x3::lexeme[
*(id_string >> "::")
>> *(id_string >> ".")
>> id_string
];
I am wondering how I can add conditional expectation points to this rule.
Like "if there is a "::" then there musst follow an id_string" or "when there is a . then there musst follow an id_string"
and so on.
How can I achieve such a behaviour for such a rule?
I'd write it exactly the way you intend it:
auto const identifier
= lexeme [+char_("A-Za-z0-9_")];
auto const qualified_id
= identifier >> *("::" > identifier);
auto const simple_expression // only member expressions supported now
= qualified_id >> *('.' > identifier);
With a corresponding AST:
namespace AST {
using identifier = std::string;
struct qualified_id : std::vector<identifier> { using std::vector<identifier>::vector; };
struct simple_expression {
qualified_id lhs;
std::vector<identifier> rhs;
};
}
LIVE DEMO
Live On Coliru
#include <iostream>
#include <string>
#include <vector>
namespace AST {
using identifier = std::string;
struct qualified_id : std::vector<identifier> { using std::vector<identifier>::vector; };
struct simple_expression {
qualified_id lhs;
std::vector<identifier> rhs;
};
}
#include <boost/fusion/adapted.hpp>
BOOST_FUSION_ADAPT_STRUCT(AST::simple_expression, lhs, rhs)
#include <boost/spirit/home/x3.hpp>
namespace Parser {
using namespace boost::spirit::x3;
auto const identifier
= rule<struct identifier_, AST::identifier> {}
= lexeme [+char_("A-Za-z0-9_")];
auto const qualified_id
= rule<struct qualified_id_, AST::qualified_id> {}
= identifier >> *("::" > identifier);
auto const simple_expression // only member expressions supported now
= rule<struct simple_expression_, AST::simple_expression> {}
= qualified_id >> *('.' > identifier);
}
int main() {
using It = std::string::const_iterator;
for (std::string const input : { "foo", "foo::bar", "foo.member", "foo::bar.member.subobject" }) {
It f = input.begin(), l = input.end();
AST::simple_expression data;
bool ok = phrase_parse(f, l, Parser::simple_expression, boost::spirit::x3::space, data);
if (ok) {
std::cout << "Parse success: ";
for (auto& el : data.lhs) std::cout << "::" << el;
for (auto& el : data.rhs) std::cout << "." << el;
std::cout << "\n";
}
else {
std::cout << "Parse failure ('" << input << "')\n";
}
if (f != l)
std::cout << "Remaining unparsed input: '" << std::string(f, l) << "'\n";
}
}
Prints
Parse success: ::foo
Parse success: ::foo::bar
Parse success: ::foo.member
Parse success: ::foo::bar.member.subobject
I want to use boost spirit to parse an expression like
function1(arg1, arg2, function2(arg1, arg2, arg3),
function3(arg1,arg2))
and call corresponding c++ functions. What should be the grammar to parse above expression and call the corresponding c++ function by phoneix::bind()?
I have 2 types of functions to call
1) string functions;
wstring GetSubString(wstring stringToCut, int position, int length);
wstring GetStringToken(wstring stringToTokenize, wstring seperators,
int tokenNumber );
2) Functions that return integer;
int GetCount();
int GetId(wstring srcId, wstring srcType);
Second Answer (more pragmatic)
Here's a second take, for comparison:
Just in case you really didn't want to parse into an abstract syntax tree representation, but rather evaluate the functions on-the-fly during parsing, you can simplify the grammar.
It comes in at 92 lines as opposed to 209 lines in the first answer. It really depends on what you're implementing which approach is more suitable.
This shorter approach has some downsides:
less flexible (not reusable)
less robust (if functions have side effects, they will happen even if parsing fails halfway)
less extensible (the supported functions are hardwired into the grammar1)
Full code:
//#define BOOST_SPIRIT_DEBUG
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/phoenix/function.hpp>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
typedef boost::variant<int, std::string> value;
//////////////////////////////////////////////////
// Demo functions:
value AnswerToLTUAE() {
return 42;
}
value ReverseString(value const& input) {
auto& as_string = boost::get<std::string>(input);
return std::string(as_string.rbegin(), as_string.rend());
}
value Concatenate(value const& a, value const& b) {
std::ostringstream oss;
oss << a << b;
return oss.str();
}
BOOST_PHOENIX_ADAPT_FUNCTION_NULLARY(value, AnswerToLTUAE_, AnswerToLTUAE)
BOOST_PHOENIX_ADAPT_FUNCTION(value, ReverseString_, ReverseString, 1)
BOOST_PHOENIX_ADAPT_FUNCTION(value, Concatenate_, Concatenate, 2)
//////////////////////////////////////////////////
// Parser grammar
template <typename It, typename Skipper = qi::space_type>
struct parser : qi::grammar<It, value(), Skipper>
{
parser() : parser::base_type(expr_)
{
using namespace qi;
function_call_ =
(lit("AnswerToLTUAE") > '(' > ')')
[ _val = AnswerToLTUAE_() ]
| (lit("ReverseString") > '(' > expr_ > ')')
[ _val = ReverseString_(_1) ]
| (lit("Concatenate") > '(' > expr_ > ',' > expr_ > ')')
[ _val = Concatenate_(_1, _2) ]
;
string_ = as_string [
lexeme [ "'" >> *~char_("'") >> "'" ]
];
value_ = int_ | string_;
expr_ = function_call_ | value_;
on_error<fail> ( expr_, std::cout
<< phx::val("Error! Expecting ") << _4 << phx::val(" here: \"")
<< phx::construct<std::string>(_3, _2) << phx::val("\"\n"));
BOOST_SPIRIT_DEBUG_NODES((expr_)(function_call_)(value_)(string_))
}
private:
qi::rule<It, value(), Skipper> value_, function_call_, expr_, string_;
};
int main()
{
for (const std::string input: std::vector<std::string> {
"-99",
"'string'",
"AnswerToLTUAE()",
"ReverseString('string')",
"Concatenate('string', 987)",
"Concatenate('The Answer Is ', AnswerToLTUAE())",
})
{
auto f(std::begin(input)), l(std::end(input));
const static parser<decltype(f)> p;
value direct_eval;
bool ok = qi::phrase_parse(f,l,p,qi::space,direct_eval);
if (!ok)
std::cout << "invalid input\n";
else
{
std::cout << "input:\t" << input << "\n";
std::cout << "eval:\t" << direct_eval << "\n\n";
}
if (f!=l) std::cout << "unparsed: '" << std::string(f,l) << "'\n";
}
}
Note how, instead of using BOOST_PHOENIX_ADAPT_FUNCTION* we could have directly used boost::phoenix::bind.
The output is still the same:
input: -99
eval: -99
input: 'string'
eval: string
input: AnswerToLTUAE()
eval: 42
input: ReverseString('string')
eval: gnirts
input: Concatenate('string', 987)
eval: string987
input: Concatenate('The Answer Is ', AnswerToLTUAE())
eval: The Answer Is 42
1 This last downside is easily remedied by using the 'Nabialek Trick'
First Answer (complete)
I've gone and implemented a simple recursive expression grammar for functions having up-to-three parameters:
for (const std::string input: std::vector<std::string> {
"-99",
"'string'",
"AnswerToLTUAE()",
"ReverseString('string')",
"Concatenate('string', 987)",
"Concatenate('The Answer Is ', AnswerToLTUAE())",
})
{
auto f(std::begin(input)), l(std::end(input));
const static parser<decltype(f)> p;
expr parsed_script;
bool ok = qi::phrase_parse(f,l,p,qi::space,parsed_script);
if (!ok)
std::cout << "invalid input\n";
else
{
const static generator<boost::spirit::ostream_iterator> g;
std::cout << "input:\t" << input << "\n";
std::cout << "tree:\t" << karma::format(g, parsed_script) << "\n";
std::cout << "eval:\t" << evaluate(parsed_script) << "\n";
}
if (f!=l) std::cout << "unparsed: '" << std::string(f,l) << "'\n";
}
Which prints:
input: -99
tree: -99
eval: -99
input: 'string'
tree: 'string'
eval: string
input: AnswerToLTUAE()
tree: nullary_function_call()
eval: 42
input: ReverseString('string')
tree: unary_function_call('string')
eval: gnirts
input: Concatenate('string', 987)
tree: binary_function_call('string',987)
eval: string987
input: Concatenate('The Answer Is ', AnswerToLTUAE())
tree: binary_function_call('The Answer Is ',nullary_function_call())
eval: The Answer Is 42
Some notes:
I separated parsing from execution (which is always a good idea IMO)
I implemented function evaluation for zero, one or two parameters (this should be easy to extend)
Values are assumed to be integers or strings (should be easy to extend)
I added a karma generator to display the parsed expression (with a TODO marked in the comment)
I hope this helps:
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/variant/recursive_wrapper.hpp>
namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
namespace phx = boost::phoenix;
typedef boost::variant<int, std::string> value;
typedef boost::variant<value, boost::recursive_wrapper<struct function_call> > expr;
typedef std::function<value() > nullary_function_impl;
typedef std::function<value(value const&) > unary_function_impl;
typedef std::function<value(value const&, value const&)> binary_function_impl;
typedef boost::variant<nullary_function_impl, unary_function_impl, binary_function_impl> function_impl;
typedef qi::symbols<char, function_impl> function_table;
struct function_call
{
typedef std::vector<expr> arguments_t;
function_call() = default;
function_call(function_impl f, arguments_t const& arguments)
: f(f), arguments(arguments) { }
function_impl f;
arguments_t arguments;
};
BOOST_FUSION_ADAPT_STRUCT(function_call, (function_impl, f)(function_call::arguments_t, arguments))
#ifdef BOOST_SPIRIT_DEBUG
namespace std
{
static inline std::ostream& operator<<(std::ostream& os, nullary_function_impl const& f) { return os << "<nullary_function_impl>"; }
static inline std::ostream& operator<<(std::ostream& os, unary_function_impl const& f) { return os << "<unary_function_impl>"; }
static inline std::ostream& operator<<(std::ostream& os, binary_function_impl const& f) { return os << "<binary_function_impl>"; }
}
static inline std::ostream& operator<<(std::ostream& os, function_call const& call) { return os << call.f << "(" << call.arguments.size() << ")"; }
#endif
//////////////////////////////////////////////////
// Evaluation
value evaluate(const expr& e);
struct eval : boost::static_visitor<value>
{
eval() {}
value operator()(const value& v) const
{
return v;
}
value operator()(const function_call& call) const
{
return boost::apply_visitor(invoke(call.arguments), call.f);
}
private:
struct invoke : boost::static_visitor<value>
{
function_call::arguments_t const& _args;
invoke(function_call::arguments_t const& args) : _args(args) {}
value operator()(nullary_function_impl const& f) const {
return f();
}
value operator()(unary_function_impl const& f) const {
auto a = evaluate(_args.at(0));
return f(a);
}
value operator()(binary_function_impl const& f) const {
auto a = evaluate(_args.at(0));
auto b = evaluate(_args.at(1));
return f(a, b);
}
};
};
value evaluate(const expr& e)
{
return boost::apply_visitor(eval(), e);
}
//////////////////////////////////////////////////
// Demo functions:
value AnswerToLTUAE() {
return 42;
}
value ReverseString(value const& input) {
auto& as_string = boost::get<std::string>(input);
return std::string(as_string.rbegin(), as_string.rend());
}
value Concatenate(value const& a, value const& b) {
std::ostringstream oss;
oss << a << b;
return oss.str();
}
//////////////////////////////////////////////////
// Parser grammar
template <typename It, typename Skipper = qi::space_type>
struct parser : qi::grammar<It, expr(), Skipper>
{
parser() : parser::base_type(expr_)
{
using namespace qi;
n_ary_ops.add
("AnswerToLTUAE", nullary_function_impl{ &::AnswerToLTUAE })
("ReverseString", unary_function_impl { &::ReverseString })
("Concatenate" , binary_function_impl { &::Concatenate });
function_call_ = n_ary_ops > '(' > expr_list > ')';
string_ = qi::lexeme [ "'" >> *~qi::char_("'") >> "'" ];
value_ = qi::int_ | string_;
expr_list = -expr_ % ',';
expr_ = function_call_ | value_;
on_error<fail> ( expr_, std::cout
<< phx::val("Error! Expecting ") << _4 << phx::val(" here: \"")
<< phx::construct<std::string>(_3, _2) << phx::val("\"\n"));
BOOST_SPIRIT_DEBUG_NODES((expr_)(expr_list)(function_call_)(value_)(string_))
}
private:
function_table n_ary_ops;
template <typename Attr> using Rule = qi::rule<It, Attr(), Skipper>;
Rule<std::string> string_;
Rule<value> value_;
Rule<function_call> function_call_;
Rule<std::vector<expr>> expr_list;
Rule<expr> expr_;
};
//////////////////////////////////////////////////
// Output generator
template <typename It>
struct generator : karma::grammar<It, expr()>
{
generator() : generator::base_type(expr_)
{
using namespace karma;
nullary_ = eps << "nullary_function_call"; // TODO reverse lookup :)
unary_ = eps << "unary_function_call";
binary_ = eps << "binary_function_call";
function_ = nullary_ | unary_ | binary_;
function_call_ = function_ << expr_list;
expr_list = '(' << -(expr_ % ',') << ')';
value_ = karma::int_ | ("'" << karma::string << "'");
expr_ = function_call_ | value_;
}
private:
template <typename Attr> using Rule = karma::rule<It, Attr()>;
Rule<nullary_function_impl> nullary_;
Rule<unary_function_impl> unary_;
Rule<binary_function_impl> binary_;
Rule<function_impl> function_;
Rule<function_call> function_call_;
Rule<value> value_;
Rule<std::vector<expr>> expr_list;
Rule<expr> expr_;
};
int main()
{
for (const std::string input: std::vector<std::string> {
"-99",
"'string'",
"AnswerToLTUAE()",
"ReverseString('string')",
"Concatenate('string', 987)",
"Concatenate('The Answer Is ', AnswerToLTUAE())",
})
{
auto f(std::begin(input)), l(std::end(input));
const static parser<decltype(f)> p;
expr parsed_script;
bool ok = qi::phrase_parse(f,l,p,qi::space,parsed_script);
if (!ok)
std::cout << "invalid input\n";
else
{
const static generator<boost::spirit::ostream_iterator> g;
std::cout << "input:\t" << input << "\n";
std::cout << "tree:\t" << karma::format(g, parsed_script) << "\n";
std::cout << "eval:\t" << evaluate(parsed_script) << "\n\n";
}
if (f!=l) std::cout << "unparsed: '" << std::string(f,l) << "'\n";
}
}