Cleanest way to handle both quoted and unquoted strings in Spirit.X3

Cleanest way to handle both quoted and unquoted strings in Spirit.X3 - c++

Buon giorno,
I have to parse something such as:
foo: 123
"bar": 456
The quotes should be removed if they are here. I tried:
((+x3::alnum) | ('"' >> (+x3::alnum) >> '"'))
But the parser actions for this are of type variant<string, string> ; is there a way to make it so that the parser understands that those two are equivalent, and for my action to only get a single std::string as argument in its call?
edit: minimal repro (live on godbolt: https://gcc.godbolt.org/z/GcE8Pj4r5) :
#include <boost/spirit/home/x3.hpp>
using namespace boost::spirit;
// action handlers
struct handlers {
void create_member(const std::string& str) { }
};
// rules
static const x3::rule<struct id_obj_mem> obj_mem = "obj_mem";
#define EVENT(e) ([](auto& ctx) { x3::get<handlers>(ctx).e(x3::_attr(ctx)); })
static const auto obj_mem_def = ((
((+x3::alnum) | ('"' >> (+x3::alnum) >> '"'))
>> ':' >> x3::lit("123"))[EVENT(create_member)] % ',');
BOOST_SPIRIT_DEFINE(obj_mem)
// execution
int main()
{
handlers r;
std::string str = "foo: 123";
auto first = str.begin();
auto last = str.end();
bool res = phrase_parse(
first,
last,
boost::spirit::x3::with<handlers>(r)[obj_mem_def],
boost::spirit::x3::ascii::space);
}

I too consider this a kind of defect. X3 is definitely less "friendly" in terms of the synthesized attribute types. I guess it's just a tacit side-effect of being more core-language oriented, where attribute assignment is effectively done via default "visitor" actions.
Although I understand the value of keeping the magic to a minimum, and staying close to "pure C++", I vastly prefer the Qi way of synthesizing attributes here. I believe it has proven a hard problem to fix, as this problem has been coming/going in some iterations of X3.
I've long decided to basically fix it myself with variations of this idiom:
template <typename T> struct as_type {
auto operator()(auto p) const { return x3::rule<struct Tag, T>{} = p; }
};
static constexpr as_type<std::string> as_string{};
Now I'd write that as:
auto quoted = '"' >> +x3::alnum >> '"';
auto name = as_string(+x3::alnum | quoted);
auto prop = (name >> ':' >> "123")[EVENT(create_member)] % ',';
That will compile no problem:
Live On Coliru
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
namespace x3 = boost::spirit::x3;
struct handlers {
void create_member(std::string const& str) {
std::cerr << __FUNCTION__ << " " << std::quoted(str) << "\n";
}
};
namespace Parser {
#define EVENT(e) ([](auto& ctx) { get<handlers>(ctx).e(_attr(ctx)); })
template <typename T> struct as_type {
auto operator()(auto p) const { return x3::rule<struct Tag, T>{} = p; }
};
static constexpr as_type<std::string> as_string{};
auto quoted = '"' >> +x3::alnum >> '"';
auto name = as_string(+x3::alnum | quoted);
auto prop = (name >> ':' >> "123")[EVENT(create_member)] % ',';
auto grammar = x3::skip(x3::space)[prop];
} // namespace Parser
int main() {
handlers r;
std::string const str = "foo: 123";
auto first = str.begin(), last = str.end();
bool res = parse(first, last, x3::with<handlers>(r)[Parser::grammar]);
return res ? 1 : 0;
}
Prints
create_member "foo"
Interesting Links
Spirit X3, How to get attribute type to match rule type?
Combining rules at runtime and returning rules
spirit x3 cannot propagate attributes of type optional<vector>
etc.

Related

boost x3 grammar for structs with multiple constructors

Trying to figure out how to parse structs that have multiple constructors or overloaded constructors. For example in this case, a range struct that contains either a range or a singleton case where the start/end of the range is equal.
case 1: look like
"start-stop"
case 2:
"start"
For the range case
auto range_constraint = x3::rule<struct test_struct, MyRange>{} = (x3::int_ >> x3::lit("-") >> x3::int_);
works
but
auto range_constraint = x3::rule<struct test_struct, MyRange>{} = x3::int_ | (x3::int_ >> x3::lit("-") >> x3::int_);
unsurprisingly, won't match the signature and fails to compile.
Not sure what the fix is?
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
struct MyRange
{
size_t start;
size_t end;
// little bit weird because should be end+1, but w/e
explicit MyRange(size_t start, size_t end = 0) : start(start), end(end == 0 ? start : end)
{
}
};
BOOST_FUSION_ADAPT_STRUCT(MyRange, start, end)
// BOOST_FUSION_ADAPT_STRUCT(MyRange, start)
//
int main()
{
auto range_constraint = x3::rule<struct test_struct, MyRange>{} = (x3::int_ >> x3::lit("-") >> x3::int_);
// auto range_constraint = x3::rule<struct test_struct, MyRange>{} = x3::int_ | (x3::int_ >> x3::lit("-") >> x3::int_);
for (std::string input :
{"1-2", "1","1-" ,"garbage"})
{
auto success = x3::phrase_parse(input.begin(), input.end(),
// Begin grammar
range_constraint,
// End grammar
x3::ascii::space);
std::cout << "`" << input << "`"
<< "-> " << success<<std::endl;
}
return 0;
}

It's important to realize that sequence adaptation by definition uses default construction with subsequent sequence element assignment.
Another issue is branch ordering in PEG grammars. int_ will always success where int_ >> '‑' >> int_ would so you would never match the range version.
Finally, to parse size_t usually prefer uint_/uint_parser<size_t> :)
Things That Don't Work
There are several ways to skin this cat. For one, there's BOOST_FUSION_ADAPT_STRUCT_NAMED, which would allow you to do
BOOST_FUSION_ADAPT_STRUCT_NAMED(MyRange, Range, start, end)
BOOST_FUSION_ADAPT_STRUCT_NAMED(MyRange, SingletonRange, start)
So one pretty elaborate would seem to spell it out:
auto range = x3::rule<struct _, Range>{} = uint_ >> '-' >> uint_;
auto singleton = x3::rule<struct _, SingletonRange>{} = uint_;
auto rule = x3::rule<struct _, MyRange>{} = range | singleton;
TIL that this doesn't even compile, apparently Qi was differently: Live On Coliru
X3 requires the attribute to be default-constructible whereas Qi would attempt to bind to the passed-in attribute reference first.
Even in the Qi version you can see that the fact Fusion sequences will be default-contructed-then-memberwise-assigned leads to results you didn't expect or want:
`1-2` -> true
-- [1,NIL)
`1` -> true
-- [1,NIL)
`1-` -> true
-- [1,NIL)
`garbage` -> false
What Works
Instead of doing the complicated things, do the simple thing. Anytime you see an optional value you can usually provide a default value. Alternatively you can not use Sequence adaptation at all, and go straight to semantic actions.
Semantic Actions
The simplest way would be to have specific branches:
auto assign1 = [](auto& ctx) { _val(ctx) = MyRange(_attr(ctx)); };
auto assign2 = [](auto& ctx) { _val(ctx) = MyRange(at_c<0>(_attr(ctx)), at_c<1>(_attr(ctx))); };
auto rule = x3::rule<void, MyRange>{} =
(uint_ >> '-' >> uint_)[assign2] | uint_[assign1];
Slighty more advanced, but more efficient:
auto assign1 = [](auto& ctx) { _val(ctx) = MyRange(_attr(ctx)); };
auto assign2 = [](auto& ctx) { _val(ctx) = MyRange(_val(ctx).start, _attr(ctx)); };
auto rule = x3::rule<void, MyRange>{} = uint_[assign1] >> -('-' >> uint_[assign2]);
Lastly, we can move towards defaulting the optional end:
auto rule = x3::rule<void, MyRange>{} =
(uint_ >> ('-' >> uint_ | x3::attr(MyRange::unspecified))) //
[assign];
Now the semantic action will have to deal with the variant end type:
auto assign = [](auto& ctx) {
auto start = at_c<0>(_attr(ctx));
_val(ctx) = apply_visitor( //
[=](auto end) { return MyRange(start, end); }, //
at_c<1>(_attr(ctx)));
};
Also Live On Coliru
Simplify?
I'd consider modeling the range explicitly as having an optional end:
struct MyRange {
MyRange() = default;
MyRange(size_t s, boost::optional<size_t> e = {}) : start(s), end(e) {
assert(!e || *e >= s);
}
size_t size() const { return end? *end - start : 1; }
bool empty() const { return size() == 0; }
size_t start = 0;
boost::optional<size_t> end = 0;
};
Now you can directly use the optional to construct:
auto assign = [](auto& ctx) {
_val(ctx) = MyRange(at_c<0>(_attr(ctx)), at_c<1>(_attr(ctx)));
};
auto rule = x3::rule<void, MyRange>{} = (uint_ >> -('-' >> uint_))[assign];
Actually, here we can go back to using adapted sequences, although with different semantics:
Live On Coliru
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
namespace x3 = boost::spirit::x3;
struct MyRange {
size_t start = 0;
boost::optional<size_t> end = 0;
};
static inline std::ostream& operator<<(std::ostream& os, MyRange const& mr) {
if (mr.end)
return os << "[" << mr.start << "," << *mr.end << ")";
else
return os << "[" << mr.start << ",)";
}
BOOST_FUSION_ADAPT_STRUCT(MyRange, start, end)
int main() {
x3::uint_parser<size_t> uint_;
auto rule = x3::rule<void, MyRange>{} = uint_ >> -('-' >> uint_);
for (std::string const input : {"1-2", "1", "1-", "garbage"}) {
MyRange into;
auto success = phrase_parse(input.begin(), input.end(), rule, x3::space, into);
std::cout << quoted(input, '`') << " -> " << std::boolalpha << success
<< std::endl;
if (success) {
std::cout << " -- " << into << "\n";
}
}
}
Summarizing
I hope these strategies give you all the things you needed. Pay close attention to the semantics of your range. Specifically, I never payed any attention to difference between "1" and "1-". You might want one to be [1,2) and the other to be [1,inf), both to be equivalent, or the second one might even be considered invalid?
Stepping back even further, I'd suggest that maybe you just needed
using Bound = std::optional<size_t>;
using MyRange = std::pair<Bound, Bound>;
Which you could parse directly with:
auto boundary = -x3::uint_parser<size_t>{};
auto rule = x3::rule<void, MyRange>{} = boundary >> '-' >> boundary;
It would allow for more inputs:
for (std::string const input : {"-2", "1-2", "1", "1-", "garbage"}) {
MyRange into;
auto success = phrase_parse(input.begin(), input.end(), rule, x3::space, into);
std::cout << quoted(input, '`') << " -> " << std::boolalpha << success
<< std::endl;
if (success) {
std::cout << " -- " << into << "\n";
}
}
Prints: Live On Coliru
`-2` -> true
-- [,2)
`1-2` -> true
-- [1,2)
`1` -> false
`1-` -> true
-- [1,)
`garbage` -> false

Make boost::spirit::symbol parser non greedy

I'd like to make a keyword parser that matches i.e. int, but does not match int in integer with eger left over. I use x3::symbols to get automatically get the parsed keyword represented as an enum value.
Minimal example:
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/utility/error_reporting.hpp>
namespace x3 = boost::spirit::x3;
enum class TypeKeyword { Int, Float, Bool };
struct TypeKeywordSymbolTable : x3::symbols<TypeKeyword> {
TypeKeywordSymbolTable()
{
add("float", TypeKeyword::Float)
("int", TypeKeyword::Int)
("bool", TypeKeyword::Bool);
}
};
const TypeKeywordSymbolTable type_keyword_symbol_table;
struct TypeKeywordRID {};
using TypeKeywordRule = x3::rule<TypeKeywordRID, TypeKeyword>;
const TypeKeywordRule type_keyword = "type_keyword";
const auto type_keyword_def = type_keyword_symbol_table;
BOOST_SPIRIT_DEFINE(type_keyword);
using Iterator = std::string_view::const_iterator;
/* Thrown when the parser has failed to parse the whole input stream. Contains
* the part of the input stream that has not been parsed. */
class LeftoverError : public std::runtime_error {
public:
LeftoverError(Iterator begin, Iterator end)
: std::runtime_error(std::string(begin, end))
{}
std::string_view get_leftover_data() const noexcept { return what(); }
};
template<typename Rule>
typename Rule::attribute_type parse(std::string_view input, const Rule& rule)
{
Iterator begin = input.begin();
Iterator end = input.end();
using ExpectationFailure = boost::spirit::x3::expectation_failure<Iterator>;
typename Rule::attribute_type result;
try {
bool r = x3::phrase_parse(begin, end, rule, x3::space, result);
if (r && begin == end) {
return result;
} else { // Occurs when the whole input stream has not been consumed.
throw LeftoverError(begin, end);
}
} catch (const ExpectationFailure& exc) {
throw LeftoverError(exc.where(), end);
}
}
int main()
{
// TypeKeyword::Bool is parsed and "ean" is leftover, but failed parse with
// "boolean" leftover is desired.
parse("boolean", type_keyword);
// TypeKeyword::Int is parsed and "eger" is leftover, but failed parse with
// "integer" leftover is desired.
parse("integer", type_keyword);
// TypeKeyword::Int is parsed successfully and this is the desired behavior.
parse("int", type_keyword);
}
Basicly, I want integer not to be recognized as a keyword with additional eger left to parse.

I morphed the test cases into self-describing expectations:
Live On Compiler Explorer
Prints:
FAIL "boolean" -> TypeKeyword::Bool (expected Leftover:"boolean")
FAIL "integer" -> TypeKeyword::Int (expected Leftover:"integer")
OK "int" -> TypeKeyword::Int
Now, the simplest, naive approach would be to make sure you parse till the eoi, by simply changing
auto actual = parse(input, Parser::type_keyword);
To
auto actual = parse(input, Parser::type_keyword >> x3::eoi);
And indeed the tests pass: Live
OK "boolean" -> Leftover:"boolean"
OK "integer" -> Leftover:"integer"
OK "int" -> TypeKeyword::Int
However, this fits the tests, but not the goal. Let's imagine a more involved grammar, where type identifier; is to be parsed:
auto identifier
= x3::rule<struct id_, Ast::Identifier> {"identifier"}
= x3::lexeme[x3::char_("a-zA-Z_") >> *x3::char_("a-zA-Z_0-9")];
auto type_keyword
= x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
= type_;
auto declaration
= x3::rule<struct decl_, Ast::Declaration>{"declaration"}
= type_keyword >> identifier >> ';';
I'll leave the details for Compiler Explorer:
OK "boolean b;" -> Leftover:"boolean b;"
OK "integer i;" -> Leftover:"integer i;"
OK "int j;" -> (TypeKeyword::Int j)
Looks good. But what if we add some interesting tests:
{"flo at f;", LeftoverError("flo at f;")},
{"boolean;", LeftoverError("boolean;")},
It prints (Live)
OK "boolean b;" -> Leftover:"boolean b;"
OK "integer i;" -> Leftover:"integer i;"
OK "int j;" -> (TypeKeyword::Int j)
FAIL "boolean;" -> (TypeKeyword::Bool ean) (expected Leftover:"boolean;")
So, the test cases were lacking. Your prose description is actually closer:
I'd like to make a keyword parser that matches i.e. int, but does not match int in integer with eger left over
That correctly implies you want to check the lexeme inside the type_keyword rule. A naive try might be checking that no identifier character follows the type keyword:
auto type_keyword
= x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
= type_ >> !identchar;
Where identchar was factored out of identifier like so:
auto identchar = x3::char_("a-zA-Z_0-9");
auto identifier
= x3::rule<struct id_, Ast::Identifier> {"identifier"}
= x3::lexeme[x3::char_("a-zA-Z_") >> *identchar];
However, this doesn't work. Can you see why (peeking allowed: https://godbolt.org/z/jb4zfhfWb)?
Our latest devious test case now passes (yay), but int j; is now rejected! If you think about it, it only makes sense, because you have spaced skipped.
The essential word I used a moment ago was lexeme: you want to treat some units as lexemes (and whitespace stops the lexeme. Or rather, whitespace isn't automatically skipped inside the lexeme¹). So, a fix would be:
auto type_keyword
// ...
= x3::lexeme[type_ >> !identchar];
(Note how I sneakily already included that on the identifier rule earlier)
Lo and behold (Live):
OK "boolean b;" -> Leftover:"boolean b;"
OK "integer i;" -> Leftover:"integer i;"
OK "int j;" -> (TypeKeyword::Int j)
OK "boolean;" -> Leftover:"boolean;"
Summarizing
This topic is a frequently recurring one, and it requires a solid understanding of skippers, lexemes first and foremost. Here are some other posts for inspiration:
Stop X3 symbols from matching substrings
parsing identifiers except keywords
Boost Spirit x3: parse delimited string Where I introduce a more general helper you might find useful:
auto kw = [](auto p) {
return x3::lexeme [ x3::as_parser(p) >> !x3::char_("a-zA-Z0-9_") ];
};
Stop X3 symbols from matching substrings
Dynamically switching symbol tables in x3
Good luck!
Complete Listing
Anti-Bitrot, the final listing:
#include <boost/fusion/adapted.hpp>
#include <boost/fusion/include/as_vector.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
namespace x3 = boost::spirit::x3;
namespace Ast {
enum class TypeKeyword { Int, Float, Bool };
static std::ostream& operator<<(std::ostream& os, TypeKeyword tk) {
switch (tk) {
case TypeKeyword::Int: return os << "TypeKeyword::Int";
case TypeKeyword::Float: return os << "TypeKeyword::Float";
case TypeKeyword::Bool: return os << "TypeKeyword::Bool";
};
return os << "?";
}
using Identifier = std::string;
struct Declaration {
TypeKeyword type;
Identifier id;
bool operator==(Declaration const&) const = default;
};
} // namespace Ast
BOOST_FUSION_ADAPT_STRUCT(Ast::Declaration, type, id)
namespace Ast{
static std::ostream& operator<<(std::ostream& os, Ast::Declaration const& d) {
return os << boost::lexical_cast<std::string>(boost::fusion::as_vector(d));
}
} // namespace Ast
namespace Parser {
struct Type : x3::symbols<Ast::TypeKeyword> {
Type() {
add("float", Ast::TypeKeyword::Float) //
("int", Ast::TypeKeyword::Int) //
("bool", Ast::TypeKeyword::Bool); //
}
} const static type_;
auto identchar = x3::char_("a-zA-Z_0-9");
auto identifier
= x3::rule<struct id_, Ast::Identifier> {"identifier"}
= x3::lexeme[x3::char_("a-zA-Z_") >> *identchar];
auto type_keyword
= x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
= x3::lexeme[type_ >> !identchar];
auto declaration
= x3::rule<struct decl_, Ast::Declaration>{"declaration"}
= type_keyword >> identifier >> ';';
} // namespace Parser
struct LeftoverError : std::runtime_error {
using std::runtime_error::runtime_error;
friend std::ostream& operator<<(std::ostream& os, LeftoverError const& e) {
return os << (std::string("Leftover:\"") + e.what() + "\"");
}
bool operator==(LeftoverError const& other) const {
return std::string_view(what()) == other.what();
}
};
template<typename T>
using Maybe = boost::variant<T, LeftoverError>;
template <typename Rule,
typename Attr = typename x3::traits::attribute_of<Rule, x3::unused_type>::type,
typename R = Maybe<Attr>>
R parse(std::string_view input, Rule const& rule) {
Attr result;
auto f = input.begin(), l = input.end();
return x3::phrase_parse(f, l, rule, x3::space, result)
? R(std::move(result))
: LeftoverError({f, l});
}
int main()
{
using namespace Ast;
struct {
std::string_view input;
Maybe<Declaration> expected;
} cases[] = {
{"boolean b;", LeftoverError("boolean b;")},
{"integer i;", LeftoverError("integer i;")},
{"int j;", Declaration{TypeKeyword::Int, "j"}},
{"boolean;", LeftoverError("boolean;")},
};
for (auto [input, expected] : cases) {
auto actual = parse(input, Parser::declaration >> x3::eoi);
bool ok = expected == actual;
std::cout << std::left << std::setw(6) << (ok ? "OK" : "FAIL")
<< std::setw(12) << std::quoted(input) << " -> "
<< std::setw(20) << actual;
if (not ok)
std::cout << " (expected " << expected << ")";
std::cout << "\n";
}
}
¹ see Boost spirit skipper issues

Boost Spirit X3: Collapsing one-element lists

Say I have a (simplified) recursive grammar like this:
OrExpr := AndExpr % "or"
AndExpr := Term % "and"
Term := ParenExpr | String
ParenExpr := '(' >> OrExpr >> ')'
String := lexeme['"' >> *(char_ - '"') >> '"']
So this works, but the problem is that it will wrap everything in multiple layers of expression. For example, the string "hello" and ("world" or "planet" or "globe") would parse as OrExpr(AndExpr("hello", OrExpr(AndExpr("world"), AndExpr("planet"), AndExpr("globe")))) (playing fast and loose with the syntax, but hopefully you understand). What I'd like is for the one-element nodes to be collapsed into their parent, so it would end up as AndExpr("hello", OrExpr("world", "parent", "globe"))
This can be solved with actions and using a state machine that only constructs the outer object if there's more than one child inside it. But I'm wondering if there's a way to fix this problem without using parser actions?
EDIT: Almost minimal example
Coliru
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
namespace burningmime::setmatch::ast
{
// an expression node (either an AND or an OR)
struct Expr;
// child of an expression -- either another expression, or a terminal
struct Node : x3::variant<std::string, x3::forward_ast<Expr>>
{
using base_type::base_type;
using base_type::operator=;
};
// tags for expression type
enum OPER
{
OPER_AND = 1,
OPER_OR = 2
};
// see above
struct Expr
{
OPER op;
std::vector<Node> children;
};
// for debugging purposes; this will print all the expressions
struct AstPrinter
{
void operator()(const Expr& node) const
{
std::cout << (node.op == OPER_AND ? "And(" : "Or(");
bool first = true;
for(const auto& child : node.children)
{
if(!first) std::cout << ", ";
first = false;
boost::apply_visitor(*this, child);
}
std::cout << ")";
}
void operator()(const std::string& node) const
{
std::cout << node;
}
};
}
// these need to be at top-level scope
// basically this adds compile-time type information, so the parser knows where to put various attributes
BOOST_FUSION_ADAPT_STRUCT(burningmime::setmatch::ast::Expr, op, children)
#define DECLARE_RULE(NAME, TYPE) static const x3::rule<class NAME, TYPE> NAME = #NAME;
#define KEYWORD(X) static const auto kw_##X = x3::no_case[#X];
#define DEFINE_RULE(NAME, GRAMMAR) \
static const auto NAME##_def = GRAMMAR; \
BOOST_SPIRIT_DEFINE(NAME)
namespace burningmime::setmatch::parser
{
// we need to pre-declare the rules so they can be used recursively
DECLARE_RULE(Phrase, std::string)
DECLARE_RULE(Term, ast::Node)
DECLARE_RULE(AndExpr, ast::Expr)
DECLARE_RULE(OrExpr, ast::Expr)
DECLARE_RULE(ParenExpr, ast::Expr)
// keywords
KEYWORD(and)
KEYWORD(or)
static const auto lparen = x3::lit('(');
static const auto rparen = x3::lit(')');
// helper parsers
static const auto keywords = kw_and | kw_or | lparen | rparen;
static const auto word = x3::lexeme[+(x3::char_ - x3::ascii::space - lparen - rparen)];
static const auto bareWord = word - keywords;
static const auto quotedString = x3::lexeme[x3::char_('"') >> *(x3::char_ - '"') >> x3::char_('"')];
DEFINE_RULE(Phrase, quotedString | bareWord)
DEFINE_RULE(Term, ParenExpr | Phrase)
DEFINE_RULE(ParenExpr, lparen >> OrExpr >> rparen)
DEFINE_RULE(AndExpr, x3::attr(ast::OPER_AND) >> (Term % kw_and))
DEFINE_RULE(OrExpr, x3::attr(ast::OPER_OR) >> (AndExpr % kw_or))
}
namespace burningmime::setmatch
{
void parseRuleFluent(const char* buf)
{
ast::Expr root;
auto start = buf, end = start + strlen(buf);
bool success = x3::phrase_parse(start, end, parser::OrExpr, x3::ascii::space, root);
if(!success || start != end)
throw std::runtime_error(std::string("Could not parse rule: ") + buf);
printf("Result of parsing: %s\n=========================\n", start);
ast::Node root2(root);
boost::apply_visitor(ast::AstPrinter(), root2);
}
}
int main()
{
burningmime::setmatch::parseRuleFluent(R"#("hello" and ("world" or "planet" or "globe"))#");
}

#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
namespace burningmime::setmatch::ast
{
// an expression node (either an AND or an OR)
struct Expr;
// child of an expression -- either another expression, or a terminal
struct Node : x3::variant<std::string, x3::forward_ast<Expr>>
{
using base_type::base_type;
using base_type::operator=;
};
// tags for expression type
enum OPER
{
OPER_AND = 1,
OPER_OR = 2
};
// see above
struct Expr
{
OPER op;
std::vector<Node> children;
};
// for debugging purposes; this will print all the expressions
struct AstPrinter
{
void operator()(const Expr& node) const
{
std::cout << (node.op == OPER_AND ? "And(" : "Or(");
bool first = true;
for(const auto& child : node.children)
{
if(!first) std::cout << ", ";
first = false;
boost::apply_visitor(*this, child);
}
std::cout << ")";
}
void operator()(const std::string& node) const
{
std::cout << node;
}
};
}
// these need to be at top-level scope
// basically this adds compile-time type information, so the parser knows where to put various attributes
BOOST_FUSION_ADAPT_STRUCT(burningmime::setmatch::ast::Expr, op, children)
#define DECLARE_RULE(NAME, TYPE) static const x3::rule<class NAME##_r, TYPE> NAME = #NAME;
#define KEYWORD(X) static const auto kw_##X = x3::no_case[#X];
#define DEFINE_RULE(NAME, GRAMMAR) \
static const auto NAME##_def = GRAMMAR; \
BOOST_SPIRIT_DEFINE(NAME)
namespace burningmime::setmatch::parser
{
// we need to pre-declare the rules so they can be used recursively
DECLARE_RULE(Phrase, std::string)
DECLARE_RULE(Term, ast::Node)
DECLARE_RULE(AndExpr, ast::Node)
DECLARE_RULE(OrExpr, ast::Node)
DECLARE_RULE(ParenExpr, ast::Node)
// keywords
KEYWORD(and)
KEYWORD(or)
static const auto lparen = x3::lit('(');
static const auto rparen = x3::lit(')');
// helper parsers
static const auto keywords = kw_and | kw_or | lparen | rparen;
static const auto word = x3::lexeme[+(x3::char_ - x3::ascii::space - lparen - rparen)];
static const auto bareWord = word - keywords;
static const auto quotedString = x3::lexeme[x3::char_('"') >> *(x3::char_ - '"') >> x3::char_('"')];
DEFINE_RULE(Phrase, quotedString | bareWord)
DEFINE_RULE(Term, ParenExpr | Phrase)
DEFINE_RULE(ParenExpr, lparen >> OrExpr >> rparen)
template <ast::OPER Op>
struct make_node
{
template <typename Context >
void operator()(Context const& ctx) const
{
if (_attr(ctx).size() == 1)
_val(ctx) = std::move(_attr(ctx)[0]);
else
_val(ctx) = ast::Expr{ Op, std::move(_attr(ctx)) };
}
};
DEFINE_RULE(AndExpr, (Term % kw_and)[make_node<ast::OPER_AND>{}])
DEFINE_RULE(OrExpr, (AndExpr % kw_or)[make_node<ast::OPER_OR>{}])
}
namespace burningmime::setmatch
{
void parseRuleFluent(const char* buf)
{
ast::Node root;
auto start = buf, end = start + strlen(buf);
bool success = x3::phrase_parse(start, end, parser::OrExpr, x3::ascii::space, root);
if (!success || start != end)
throw std::runtime_error(std::string("Could not parse rule: ") + buf);
printf("Result of parsing: %s\n=========================\n", start);
boost::apply_visitor(ast::AstPrinter(), root);
}
}
int main()
{
burningmime::setmatch::parseRuleFluent(R"#("hello" and ("world" or "planet" or "globe"))#");
}
https://wandbox.org/permlink/kMSHOHG0pgwGr0zv
Output:
Result of parsing:
=========================
And("hello", Or("world", "planet", "globe"))

parsing chemical formula with mixtures of elements

I would like to use boost::spirit in order to extract the stoichiometry of compounds made of several elements from a brute formula. Within a given compound, my parser should be able to distinguish three kind of chemical element patterns:
natural element made of a mixture of isotopes in natural abundance
pure isotope
mixture of isotopes in non-natural abundance
Those patterns are then used to parse such following compounds:
"C" --> natural carbon made of C[12] and C[13] in natural abundance
"CH4" --> methane made of natural carbon and hydrogen
"C2H{H[1](0.8)H[2](0.2)}6" --> ethane made of natural C and non-natural H made of 80% of hydrogen and 20% of deuterium
"U[235]" --> pure uranium 235
Obviously, the chemical element patterns can be in any order (e.g. CH[1]4 and H[1]4C ...) and frequencies.
I wrote my parser which is quite close to do the job but I still face one problem.
Here is my code:
template <typename Iterator>
struct ChemicalFormulaParser : qi::grammar<Iterator,isotopesMixture(),qi::locals<isotopesMixture,double>>
{
ChemicalFormulaParser(): ChemicalFormulaParser::base_type(_start)
{
namespace phx = boost::phoenix;
// Semantic action for handling the case of pure isotope
phx::function<PureIsotopeBuilder> const build_pure_isotope = PureIsotopeBuilder();
// Semantic action for handling the case of pure isotope mixture
phx::function<IsotopesMixtureBuilder> const build_isotopes_mixture = IsotopesMixtureBuilder();
// Semantic action for handling the case of natural element
phx::function<NaturalElementBuilder> const build_natural_element = NaturalElementBuilder();
phx::function<UpdateElement> const update_element = UpdateElement();
// XML database that store all the isotopes of the periodical table
ChemicalDatabaseManager<Isotope>* imgr=ChemicalDatabaseManager<Isotope>::Instance();
const auto& isotopeDatabase=imgr->getDatabase();
// Loop over the database to the spirit symbols for the isotopes names (e.g. H[1],C[14]) and the elements (e.g. H,C)
for (const auto& isotope : isotopeDatabase) {
_isotopeNames.add(isotope.second.getName(),isotope.second.getName());
_elementSymbols.add(isotope.second.getProperty<std::string>("symbol"),isotope.second.getProperty<std::string>("symbol"));
}
_mixtureToken = "{" >> +(_isotopeNames >> "(" >> qi::double_ >> ")") >> "}";
_isotopesMixtureToken = (_elementSymbols[qi::_a=qi::_1] >> _mixtureToken[qi::_b=qi::_1])[qi::_pass=build_isotopes_mixture(qi::_val,qi::_a,qi::_b)];
_pureIsotopeToken = (_isotopeNames[qi::_a=qi::_1])[qi::_pass=build_pure_isotope(qi::_val,qi::_a)];
_naturalElementToken = (_elementSymbols[qi::_a=qi::_1])[qi::_pass=build_natural_element(qi::_val,qi::_a)];
_start = +( ( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken)[qi::_a=qi::_1] >>
(qi::double_|qi::attr(1.0))[qi::_b=qi::_1])[qi::_pass=update_element(qi::_val,qi::_a,qi::_b)] );
}
//! Defines the rule for matching a prefix
qi::symbols<char,std::string> _isotopeNames;
qi::symbols<char,std::string> _elementSymbols;
qi::rule<Iterator,isotopesMixture()> _mixtureToken;
qi::rule<Iterator,isotopesMixture(),qi::locals<std::string,isotopesMixture>> _isotopesMixtureToken;
qi::rule<Iterator,isotopesMixture(),qi::locals<std::string>> _pureIsotopeToken;
qi::rule<Iterator,isotopesMixture(),qi::locals<std::string>> _naturalElementToken;
qi::rule<Iterator,isotopesMixture(),qi::locals<isotopesMixture,double>> _start;
};
Basically each separate element pattern can be parsed properly with their respective semantic action which produces as ouput a map between the isotopes that builds the compound and their corresponding stoichiometry. The problem starts when parsing the following compound:
CH{H[1](0.9)H[2](0.4)}
In such case the semantic action build_isotopes_mixture return false because 0.9+0.4 is non sense for a sum of ratio. Hence I would have expected and wanted my parser to fail for this compound. However, because of the _start rule which uses alternative operator for the three kind of chemical element pattern, the parser manages to parse it by 1) throwing away the {H[1](0.9)H[2](0.4)} part 2) keeping the preceding H 3) parsing it using the _naturalElementToken. Is my grammar not clear enough for being expressed as a parser ? How to use the alternative operator in such a way that, when an occurrence has been found but gave a false when running the semantic action, the parser stops ?

How to use the alternative operator in such a way that, when an occurrence has been found but gave a false when running the semantic action, the parser stops ?
In general, you achieve this by adding an expectation point to prevent backtracking.
In this case you are actually "conflating" several tasks:
matching input
interpreting matched input
validating matched input
Spirit excels at matching input, has
great facilities when it comes to interpreting (mostly in the sense of AST creation). However, things get "nasty" with validating on the fly.
An advice I often repeat is to consider separating the concerns whenever possible. I'd consider
building a direct AST representation of the input first,
transforming/normalizing/expanding/canonicalizing to a more convenient or meaningful domain representation
do final validations on the result
This gives you the most expressive code while keeping it highly maintainable.
Because I don't understand the problem domain well enough and the code sample is not nearly complete enough to induce it, I will not try to give a full sample of what I have in mind. Instead I'll try my best at sketching the expectation point approach I mentioned at the outset.
Mock Up Sample To Compile
This took the most time. (Consider doing the leg work for the people who are going to help you)
Live On Coliru
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <map>
namespace qi = boost::spirit::qi;
struct DummyBuilder {
using result_type = bool;
template <typename... Ts>
bool operator()(Ts&&...) const { return true; }
};
struct PureIsotopeBuilder : DummyBuilder { };
struct IsotopesMixtureBuilder : DummyBuilder { };
struct NaturalElementBuilder : DummyBuilder { };
struct UpdateElement : DummyBuilder { };
struct Isotope {
std::string getName() const { return _name; }
Isotope(std::string const& name = "unnamed", std::string const& symbol = "?") : _name(name), _symbol(symbol) { }
template <typename T> std::string getProperty(std::string const& name) const {
if (name == "symbol")
return _symbol;
throw std::domain_error("no such property (" + name + ")");
}
private:
std::string _name, _symbol;
};
using MixComponent = std::pair<Isotope, double>;
using isotopesMixture = std::list<MixComponent>;
template <typename Isotope>
struct ChemicalDatabaseManager {
static ChemicalDatabaseManager* Instance() {
static ChemicalDatabaseManager s_instance;
return &s_instance;
}
auto& getDatabase() { return _db; }
private:
std::map<int, Isotope> _db {
{ 1, { "H[1]", "H" } },
{ 2, { "H[2]", "H" } },
{ 3, { "Carbon", "C" } },
{ 4, { "U[235]", "U" } },
};
};
template <typename Iterator>
struct ChemicalFormulaParser : qi::grammar<Iterator, isotopesMixture(), qi::locals<isotopesMixture, double> >
{
ChemicalFormulaParser(): ChemicalFormulaParser::base_type(_start)
{
using namespace qi;
namespace phx = boost::phoenix;
phx::function<PureIsotopeBuilder> build_pure_isotope; // Semantic action for handling the case of pure isotope
phx::function<IsotopesMixtureBuilder> build_isotopes_mixture; // Semantic action for handling the case of pure isotope mixture
phx::function<NaturalElementBuilder> build_natural_element; // Semantic action for handling the case of natural element
phx::function<UpdateElement> update_element;
// XML database that store all the isotopes of the periodical table
ChemicalDatabaseManager<Isotope>* imgr = ChemicalDatabaseManager<Isotope>::Instance();
const auto& isotopeDatabase=imgr->getDatabase();
// Loop over the database to the spirit symbols for the isotopes names (e.g. H[1],C[14]) and the elements (e.g. H,C)
for (const auto& isotope : isotopeDatabase) {
_isotopeNames.add(isotope.second.getName(),isotope.second.getName());
_elementSymbols.add(isotope.second.template getProperty<std::string>("symbol"),isotope.second.template getProperty<std::string>("symbol"));
}
_mixtureToken = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}";
_isotopesMixtureToken = (_elementSymbols[_a=_1] >> _mixtureToken[_b=_1])[_pass=build_isotopes_mixture(_val,_a,_b)];
_pureIsotopeToken = (_isotopeNames[_a=_1])[_pass=build_pure_isotope(_val,_a)];
_naturalElementToken = (_elementSymbols[_a=_1])[_pass=build_natural_element(_val,_a)];
_start = +( ( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken)[_a=_1] >>
(double_|attr(1.0))[_b=_1]) [_pass=update_element(_val,_a,_b)] );
}
private:
//! Defines the rule for matching a prefix
qi::symbols<char, std::string> _isotopeNames;
qi::symbols<char, std::string> _elementSymbols;
qi::rule<Iterator, isotopesMixture()> _mixtureToken;
qi::rule<Iterator, isotopesMixture(), qi::locals<std::string, isotopesMixture> > _isotopesMixtureToken;
qi::rule<Iterator, isotopesMixture(), qi::locals<std::string> > _pureIsotopeToken;
qi::rule<Iterator, isotopesMixture(), qi::locals<std::string> > _naturalElementToken;
qi::rule<Iterator, isotopesMixture(), qi::locals<isotopesMixture, double> > _start;
};
int main() {
using It = std::string::const_iterator;
ChemicalFormulaParser<It> parser;
for (std::string const input : {
"C", // --> natural carbon made of C[12] and C[13] in natural abundance
"CH4", // --> methane made of natural carbon and hydrogen
"C2H{H[1](0.8)H[2](0.2)}6", // --> ethane made of natural C and non-natural H made of 80% of hydrogen and 20% of deuterium
"C2H{H[1](0.9)H[2](0.2)}6", // --> invalid mixture (total is 110%?)
"U[235]", // --> pure uranium 235
})
{
std::cout << " ============= '" << input << "' ===========\n";
It f = input.begin(), l = input.end();
isotopesMixture mixture;
bool ok = qi::parse(f, l, parser, mixture);
if (ok)
std::cout << "Parsed successfully\n";
else
std::cout << "Parse failure\n";
if (f != l)
std::cout << "Remaining input unparsed: '" << std::string(f, l) << "'\n";
}
}
Which, as given, just prints
============= 'C' ===========
Parsed successfully
============= 'CH4' ===========
Parsed successfully
============= 'C2H{H[1](0.8)H[2](0.2)}6' ===========
Parsed successfully
============= 'C2H{H[1](0.9)H[2](0.2)}6' ===========
Parsed successfully
============= 'U[235]' ===========
Parsed successfully
General remarks:
no need for the locals, just use the regular placeholders:
_mixtureToken = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}";
_isotopesMixtureToken = (_elementSymbols >> _mixtureToken) [ _pass=build_isotopes_mixture(_val, _1, _2) ];
_pureIsotopeToken = _isotopeNames [ _pass=build_pure_isotope(_val, _1) ];
_naturalElementToken = _elementSymbols [ _pass=build_natural_element(_val, _1) ];
_start = +(
( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken) >>
(double_|attr(1.0)) ) [ _pass=update_element(_val, _1, _2) ]
);
// ....
qi::rule<Iterator, isotopesMixture()> _mixtureToken;
qi::rule<Iterator, isotopesMixture()> _isotopesMixtureToken;
qi::rule<Iterator, isotopesMixture()> _pureIsotopeToken;
qi::rule<Iterator, isotopesMixture()> _naturalElementToken;
qi::rule<Iterator, isotopesMixture()> _start;
you will want to handle conflicts between names/symbols (possibly just by prioritizing one or the other)
conforming compilers will require the template qualifier (unless I totally mis-guessed your datastructure, in which case I don't know what the template argument to ChemicalDatabaseManager was supposed to mean).
Hint, MSVC is not a standards-conforming compiler
Live On Coliru
Expectation Point Sketch
Assuming that the "weights" need to add up to 100% inside the _mixtureToken rule, we can either make build_isotopes_micture "not dummy" and add the validation:
struct IsotopesMixtureBuilder {
bool operator()(isotopesMixture&/* output*/, std::string const&/* elementSymbol*/, isotopesMixture const& mixture) const {
using namespace boost::adaptors;
// validate weights total only
return std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
}
};
However, as you note, it will thwart things by backtracking. Instead you might /assert/ that any complete mixture add up to 100%:
_mixtureToken = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}" > eps(validate_weight_total(_val));
With something like
struct ValidateWeightTotal {
bool operator()(isotopesMixture const& mixture) const {
using namespace boost::adaptors;
bool ok = std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
return ok;
// or perhaps just :
return ok? ok : throw InconsistentsWeights {};
}
struct InconsistentsWeights : virtual std::runtime_error {
InconsistentsWeights() : std::runtime_error("InconsistentsWeights") {}
};
};
Live On Coliru
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/range/adaptors.hpp>
#include <boost/range/numeric.hpp>
#include <map>
namespace qi = boost::spirit::qi;
struct DummyBuilder {
using result_type = bool;
template <typename... Ts>
bool operator()(Ts&&...) const { return true; }
};
struct PureIsotopeBuilder : DummyBuilder { };
struct NaturalElementBuilder : DummyBuilder { };
struct UpdateElement : DummyBuilder { };
struct Isotope {
std::string getName() const { return _name; }
Isotope(std::string const& name = "unnamed", std::string const& symbol = "?") : _name(name), _symbol(symbol) { }
template <typename T> std::string getProperty(std::string const& name) const {
if (name == "symbol")
return _symbol;
throw std::domain_error("no such property (" + name + ")");
}
private:
std::string _name, _symbol;
};
using MixComponent = std::pair<Isotope, double>;
using isotopesMixture = std::list<MixComponent>;
struct IsotopesMixtureBuilder {
bool operator()(isotopesMixture&/* output*/, std::string const&/* elementSymbol*/, isotopesMixture const& mixture) const {
using namespace boost::adaptors;
// validate weights total only
return std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
}
};
struct ValidateWeightTotal {
bool operator()(isotopesMixture const& mixture) const {
using namespace boost::adaptors;
bool ok = std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
return ok;
// or perhaps just :
return ok? ok : throw InconsistentsWeights {};
}
struct InconsistentsWeights : virtual std::runtime_error {
InconsistentsWeights() : std::runtime_error("InconsistentsWeights") {}
};
};
template <typename Isotope>
struct ChemicalDatabaseManager {
static ChemicalDatabaseManager* Instance() {
static ChemicalDatabaseManager s_instance;
return &s_instance;
}
auto& getDatabase() { return _db; }
private:
std::map<int, Isotope> _db {
{ 1, { "H[1]", "H" } },
{ 2, { "H[2]", "H" } },
{ 3, { "Carbon", "C" } },
{ 4, { "U[235]", "U" } },
};
};
template <typename Iterator>
struct ChemicalFormulaParser : qi::grammar<Iterator, isotopesMixture()>
{
ChemicalFormulaParser(): ChemicalFormulaParser::base_type(_start)
{
using namespace qi;
namespace phx = boost::phoenix;
phx::function<PureIsotopeBuilder> build_pure_isotope; // Semantic action for handling the case of pure isotope
phx::function<IsotopesMixtureBuilder> build_isotopes_mixture; // Semantic action for handling the case of pure isotope mixture
phx::function<NaturalElementBuilder> build_natural_element; // Semantic action for handling the case of natural element
phx::function<UpdateElement> update_element;
phx::function<ValidateWeightTotal> validate_weight_total;
// XML database that store all the isotopes of the periodical table
ChemicalDatabaseManager<Isotope>* imgr = ChemicalDatabaseManager<Isotope>::Instance();
const auto& isotopeDatabase=imgr->getDatabase();
// Loop over the database to the spirit symbols for the isotopes names (e.g. H[1],C[14]) and the elements (e.g. H,C)
for (const auto& isotope : isotopeDatabase) {
_isotopeNames.add(isotope.second.getName(),isotope.second.getName());
_elementSymbols.add(isotope.second.template getProperty<std::string>("symbol"), isotope.second.template getProperty<std::string>("symbol"));
}
_mixtureToken = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}" > eps(validate_weight_total(_val));
_isotopesMixtureToken = (_elementSymbols >> _mixtureToken) [ _pass=build_isotopes_mixture(_val, _1, _2) ];
_pureIsotopeToken = _isotopeNames [ _pass=build_pure_isotope(_val, _1) ];
_naturalElementToken = _elementSymbols [ _pass=build_natural_element(_val, _1) ];
_start = +(
( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken) >>
(double_|attr(1.0)) ) [ _pass=update_element(_val, _1, _2) ]
);
}
private:
//! Defines the rule for matching a prefix
qi::symbols<char, std::string> _isotopeNames;
qi::symbols<char, std::string> _elementSymbols;
qi::rule<Iterator, isotopesMixture()> _mixtureToken;
qi::rule<Iterator, isotopesMixture()> _isotopesMixtureToken;
qi::rule<Iterator, isotopesMixture()> _pureIsotopeToken;
qi::rule<Iterator, isotopesMixture()> _naturalElementToken;
qi::rule<Iterator, isotopesMixture()> _start;
};
int main() {
using It = std::string::const_iterator;
ChemicalFormulaParser<It> parser;
for (std::string const input : {
"C", // --> natural carbon made of C[12] and C[13] in natural abundance
"CH4", // --> methane made of natural carbon and hydrogen
"C2H{H[1](0.8)H[2](0.2)}6", // --> ethane made of natural C and non-natural H made of 80% of hydrogen and 20% of deuterium
"C2H{H[1](0.9)H[2](0.2)}6", // --> invalid mixture (total is 110%?)
"U[235]", // --> pure uranium 235
}) try
{
std::cout << " ============= '" << input << "' ===========\n";
It f = input.begin(), l = input.end();
isotopesMixture mixture;
bool ok = qi::parse(f, l, parser, mixture);
if (ok)
std::cout << "Parsed successfully\n";
else
std::cout << "Parse failure\n";
if (f != l)
std::cout << "Remaining input unparsed: '" << std::string(f, l) << "'\n";
} catch(std::exception const& e) {
std::cout << "Caught exception '" << e.what() << "'\n";
}
}
Prints
============= 'C' ===========
Parsed successfully
============= 'CH4' ===========
Parsed successfully
============= 'C2H{H[1](0.8)H[2](0.2)}6' ===========
Parsed successfully
============= 'C2H{H[1](0.9)H[2](0.2)}6' ===========
Caught exception 'boost::spirit::qi::expectation_failure'
============= 'U[235]' ===========
Parsed successfully

Using lexer token attributes in grammar rules with Lex and Qi from Boost.Spirit

Let's consider following code:
#include <boost/phoenix.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace phoenix = boost::phoenix;
struct operation
{
enum type
{
add,
sub,
mul,
div
};
};
template<typename Lexer>
class expression_lexer
: public lex::lexer<Lexer>
{
public:
typedef lex::token_def<operation::type> operator_token_type;
typedef lex::token_def<double> value_token_type;
typedef lex::token_def<std::string> variable_token_type;
typedef lex::token_def<lex::omit> parenthesis_token_type;
typedef std::pair<parenthesis_token_type, parenthesis_token_type> parenthesis_token_pair_type;
typedef lex::token_def<lex::omit> whitespace_token_type;
expression_lexer()
: operator_add('+'),
operator_sub('-'),
operator_mul("[x*]"),
operator_div("[:/]"),
value("\\d+(\\.\\d+)?"),
variable("%(\\w+)"),
parenthesis({
std::make_pair(parenthesis_token_type('('), parenthesis_token_type(')')),
std::make_pair(parenthesis_token_type('['), parenthesis_token_type(']'))
}),
whitespace("[ \\t]+")
{
this->self
+= operator_add [lex::_val = operation::add]
| operator_sub [lex::_val = operation::sub]
| operator_mul [lex::_val = operation::mul]
| operator_div [lex::_val = operation::div]
| value
| variable [lex::_val = phoenix::construct<std::string>(lex::_start + 1, lex::_end)]
| whitespace [lex::_pass = lex::pass_flags::pass_ignore]
;
std::for_each(parenthesis.cbegin(), parenthesis.cend(),
[&](parenthesis_token_pair_type const& token_pair)
{
this->self += token_pair.first | token_pair.second;
}
);
}
operator_token_type operator_add;
operator_token_type operator_sub;
operator_token_type operator_mul;
operator_token_type operator_div;
value_token_type value;
variable_token_type variable;
std::vector<parenthesis_token_pair_type> parenthesis;
whitespace_token_type whitespace;
};
template<typename Iterator>
class expression_grammar
: public qi::grammar<Iterator>
{
public:
template<typename Tokens>
explicit expression_grammar(Tokens const& tokens)
: expression_grammar::base_type(start)
{
start %= expression >> qi::eoi;
expression %= sum_operand >> -(sum_operator >> expression);
sum_operator %= tokens.operator_add | tokens.operator_sub;
sum_operand %= fac_operand >> -(fac_operator >> sum_operand);
fac_operator %= tokens.operator_mul | tokens.operator_div;
if(!tokens.parenthesis.empty())
fac_operand %= parenthesised | terminal;
else
fac_operand %= terminal;
terminal %= tokens.value | tokens.variable;
if(!tokens.parenthesis.empty())
{
parenthesised %= tokens.parenthesis.front().first >> expression >> tokens.parenthesis.front().second;
std::for_each(tokens.parenthesis.cbegin() + 1, tokens.parenthesis.cend(),
[&](typename Tokens::parenthesis_token_pair_type const& token_pair)
{
parenthesised %= parenthesised.copy() | (token_pair.first >> expression >> token_pair.second);
}
);
}
}
private:
qi::rule<Iterator> start;
qi::rule<Iterator> expression;
qi::rule<Iterator> sum_operand;
qi::rule<Iterator> sum_operator;
qi::rule<Iterator> fac_operand;
qi::rule<Iterator> fac_operator;
qi::rule<Iterator> terminal;
qi::rule<Iterator> parenthesised;
};
int main()
{
typedef lex::lexertl::token<std::string::const_iterator, boost::mpl::vector<operation::type, double, std::string>> token_type;
typedef expression_lexer<lex::lexertl::actor_lexer<token_type>> expression_lexer_type;
typedef expression_lexer_type::iterator_type expression_lexer_iterator_type;
typedef expression_grammar<expression_lexer_iterator_type> expression_grammar_type;
expression_lexer_type lexer;
expression_grammar_type grammar(lexer);
while(std::cin)
{
std::string line;
std::getline(std::cin, line);
std::string::const_iterator first = line.begin();
std::string::const_iterator const last = line.end();
bool const result = lex::tokenize_and_parse(first, last, lexer, grammar);
if(!result)
std::cout << "Parsing failed! Reminder: >" << std::string(first, last) << "<" << std::endl;
else
{
if(first != last)
std::cout << "Parsing succeeded! Reminder: >" << std::string(first, last) << "<" << std::endl;
else
std::cout << "Parsing succeeded!" << std::endl;
}
}
}
It is a simple parser for arithmetic expressions with values and variables. It is build using expression_lexer for extracting tokens, and then with expression_grammar to parse the tokens.
Use of lexer for such a small case might seem an overkill and probably is one. But that is the cost of simplified example. Also note that use of lexer allows to easily define tokens with regular expression while that allows to easily define them by external code (and user provided configuration in particular). With the example provided it would be no issue at all to read definition of tokens from an external config file and for example allow user to change variables from %name to $name.
The code seems to be working fine (checked on Visual Studio 2013 with Boost 1.61).
The expression_lexer has attributes attached to tokens. I guess they work since they compile. But I don't really know how to check.
Ultimately I would like the grammar to build me an std::vector with reversed polish notation of the expression. (Where every element would be a boost::variant over either operator::type or double or std::string.)
The problem is however that I failed to use token attributes in my expression_grammar. For example if you try to change sum_operator following way:
qi::rule<Iterator, operation::type ()> sum_operator;
you will get compilation error. I expected this to work since operation::type is the attribute for both operator_add and operator_sub and so also for their alternative. And still it doesn't compile. Judging from the error in assign_to_attribute_from_iterators it seems that parser tries to build the attribute value directly from input stream range. Which means it ignores the [lex::_val = operation::add] I specified in my lexer.
Changing that to
qi::rule<Iterator, operation::type (operation::type)> sum_operator;
didn't help either.
Also I tried changing definition to
sum_operator %= (tokens.operator_add | tokens.operator_sub) [qi::_val = qi::_1];
didn't help either.
How to work around that? I know I could use symbols from Qi. But I want to have the lexer to make it easy to configure regexes for the tokens. I could also extend the assign_to_attribute_from_iterators as described in the documentation but this kind of double the work. I guess I could also skip the attributes on lexer and just have them on grammar. But this again doesn't work well with flexibility on variable token (in my actual case there is slightly more logic there so that it is configurable also which part of the token forms actual name of the variable - while here it is fixed to just skip the first character). Anything else?
Also a side question - maybe anyone knows. Is there a way to get to capture groups of the regular expression of the token from tokens action? So that instead of having
variable [lex::_val = phoenix::construct<std::string>(lex::_start + 1, lex::_end)]
instead I would be able to make a string from the capture group and so easily handle formats like $var$.
Edited! I have improved whitespace skipping along conclusions from Whitespace skipper when using Boost.Spirit Qi and Lex. It is a simplification that does not affect questions asked here.

Ok, here's my take on the RPN 'requirement'. I heavily favor natural (automatic) attribute propagation over semantic actions (see Boost Spirit: "Semantic actions are evil"?)
I consider the other options (uglifying) optimizations. You might do them if you're happy with the overall design and don't mind making it harder to maintain :)
Live On Coliru
Beyond the sample from my comment that you've already studied, I added that RPN transformation step:
namespace RPN {
using cell = boost::variant<AST::operation, AST::value, AST::variable>;
using rpn_stack = std::vector<cell>;
struct transform : boost::static_visitor<> {
void operator()(rpn_stack& stack, AST::expression const& e) const {
boost::apply_visitor(boost::bind(*this, boost::ref(stack), ::_1), e);
}
void operator()(rpn_stack& stack, AST::bin_expr const& e) const {
(*this)(stack, e.lhs);
(*this)(stack, e.rhs);
stack.push_back(e.op);
}
void operator()(rpn_stack& stack, AST::value const& v) const { stack.push_back(v); }
void operator()(rpn_stack& stack, AST::variable const& v) const { stack.push_back(v); }
};
}
That's all! Use it like so, e.g.:
RPN::transform compiler;
RPN::rpn_stack program;
compiler(program, expr);
for (auto& instr : program) {
std::cout << instr << " ";
}
Which makes the output:
Parsing success: (3 + (8 * 9))
3 8 9 * +
Full Listing
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/phoenix.hpp>
#include <boost/bind.hpp>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace phoenix = boost::phoenix;
struct operation
{
enum type
{
add,
sub,
mul,
div
};
friend std::ostream& operator<<(std::ostream& os, type op) {
switch (op) {
case type::add: return os << "+";
case type::sub: return os << "-";
case type::mul: return os << "*";
case type::div: return os << "/";
}
return os << "<" << static_cast<int>(op) << ">";
}
};
template<typename Lexer>
class expression_lexer
: public lex::lexer<Lexer>
{
public:
//typedef lex::token_def<operation::type> operator_token_type;
typedef lex::token_def<lex::omit> operator_token_type;
typedef lex::token_def<double> value_token_type;
typedef lex::token_def<std::string> variable_token_type;
typedef lex::token_def<lex::omit> parenthesis_token_type;
typedef std::pair<parenthesis_token_type, parenthesis_token_type> parenthesis_token_pair_type;
typedef lex::token_def<lex::omit> whitespace_token_type;
expression_lexer()
: operator_add('+'),
operator_sub('-'),
operator_mul("[x*]"),
operator_div("[:/]"),
value("\\d+(\\.\\d+)?"),
variable("%(\\w+)"),
parenthesis({
std::make_pair(parenthesis_token_type('('), parenthesis_token_type(')')),
std::make_pair(parenthesis_token_type('['), parenthesis_token_type(']'))
}),
whitespace("[ \\t]+")
{
this->self
+= operator_add [lex::_val = operation::add]
| operator_sub [lex::_val = operation::sub]
| operator_mul [lex::_val = operation::mul]
| operator_div [lex::_val = operation::div]
| value
| variable [lex::_val = phoenix::construct<std::string>(lex::_start + 1, lex::_end)]
| whitespace [lex::_pass = lex::pass_flags::pass_ignore]
;
std::for_each(parenthesis.cbegin(), parenthesis.cend(),
[&](parenthesis_token_pair_type const& token_pair)
{
this->self += token_pair.first | token_pair.second;
}
);
}
operator_token_type operator_add;
operator_token_type operator_sub;
operator_token_type operator_mul;
operator_token_type operator_div;
value_token_type value;
variable_token_type variable;
std::vector<parenthesis_token_pair_type> parenthesis;
whitespace_token_type whitespace;
};
namespace AST {
using operation = operation::type;
using value = double;
using variable = std::string;
struct bin_expr;
using expression = boost::variant<value, variable, boost::recursive_wrapper<bin_expr> >;
struct bin_expr {
expression lhs, rhs;
operation op;
friend std::ostream& operator<<(std::ostream& os, bin_expr const& be) {
return os << "(" << be.lhs << " " << be.op << " " << be.rhs << ")";
}
};
}
BOOST_FUSION_ADAPT_STRUCT(AST::bin_expr, lhs, op, rhs)
template<typename Iterator>
class expression_grammar : public qi::grammar<Iterator, AST::expression()>
{
public:
template<typename Tokens>
explicit expression_grammar(Tokens const& tokens)
: expression_grammar::base_type(start)
{
start = expression >> qi::eoi;
bin_sum_expr = sum_operand >> sum_operator >> expression;
bin_fac_expr = fac_operand >> fac_operator >> sum_operand;
expression = bin_sum_expr | sum_operand;
sum_operand = bin_fac_expr | fac_operand;
sum_operator = tokens.operator_add >> qi::attr(AST::operation::add) | tokens.operator_sub >> qi::attr(AST::operation::sub);
fac_operator = tokens.operator_mul >> qi::attr(AST::operation::mul) | tokens.operator_div >> qi::attr(AST::operation::div);
if(tokens.parenthesis.empty()) {
fac_operand = terminal;
}
else {
fac_operand = parenthesised | terminal;
parenthesised = tokens.parenthesis.front().first >> expression >> tokens.parenthesis.front().second;
std::for_each(tokens.parenthesis.cbegin() + 1, tokens.parenthesis.cend(),
[&](typename Tokens::parenthesis_token_pair_type const& token_pair)
{
parenthesised = parenthesised.copy() | (token_pair.first >> expression >> token_pair.second);
});
}
terminal = tokens.value | tokens.variable;
BOOST_SPIRIT_DEBUG_NODES(
(start) (expression) (bin_sum_expr) (bin_fac_expr)
(fac_operand) (terminal) (parenthesised) (sum_operand)
(sum_operator) (fac_operator)
);
}
private:
qi::rule<Iterator, AST::expression()> start;
qi::rule<Iterator, AST::expression()> expression;
qi::rule<Iterator, AST::expression()> sum_operand;
qi::rule<Iterator, AST::expression()> fac_operand;
qi::rule<Iterator, AST::expression()> terminal;
qi::rule<Iterator, AST::expression()> parenthesised;
qi::rule<Iterator, int()> sum_operator;
qi::rule<Iterator, int()> fac_operator;
// extra rules to help with AST creation
qi::rule<Iterator, AST::bin_expr()> bin_sum_expr;
qi::rule<Iterator, AST::bin_expr()> bin_fac_expr;
};
namespace RPN {
using cell = boost::variant<AST::operation, AST::value, AST::variable>;
using rpn_stack = std::vector<cell>;
struct transform : boost::static_visitor<> {
void operator()(rpn_stack& stack, AST::expression const& e) const {
boost::apply_visitor(boost::bind(*this, boost::ref(stack), ::_1), e);
}
void operator()(rpn_stack& stack, AST::bin_expr const& e) const {
(*this)(stack, e.lhs);
(*this)(stack, e.rhs);
stack.push_back(e.op);
}
void operator()(rpn_stack& stack, AST::value const& v) const { stack.push_back(v); }
void operator()(rpn_stack& stack, AST::variable const& v) const { stack.push_back(v); }
};
}
int main()
{
typedef lex::lexertl::token<std::string::const_iterator, boost::mpl::vector<operation::type, double, std::string>> token_type;
typedef expression_lexer<lex::lexertl::actor_lexer<token_type>> expression_lexer_type;
typedef expression_lexer_type::iterator_type expression_lexer_iterator_type;
typedef expression_grammar<expression_lexer_iterator_type> expression_grammar_type;
expression_lexer_type lexer;
expression_grammar_type grammar(lexer);
RPN::transform compiler;
std::string line;
while(std::getline(std::cin, line) && !line.empty())
{
std::string::const_iterator first = line.begin();
std::string::const_iterator const last = line.end();
AST::expression expr;
bool const result = lex::tokenize_and_parse(first, last, lexer, grammar, expr);
if(!result)
std::cout << "Parsing failed!\n";
else
{
std::cout << "Parsing success: " << expr << "\n";
RPN::rpn_stack program;
compiler(program, expr);
for (auto& instr : program) {
std::cout << instr << " ";
}
}
if(first != last)
std::cout << "Remainder: >" << std::string(first, last) << "<\n";
}
}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Cleanest way to handle both quoted and unquoted strings in Spirit.X3 - c++

Related

boost x3 grammar for structs with multiple constructors

Make boost::spirit::symbol parser non greedy

Boost Spirit X3: Collapsing one-element lists

parsing chemical formula with mixtures of elements

Using lexer token attributes in grammar rules with Lex and Qi from Boost.Spirit

Categories

Resources