boost x3 grammar for structs with multiple constructors - c++

I'm trying to figure out how to parse structs that have multiple (overloaded) constructors. In this case it's a range struct that holds either a real range or a singleton, where the start and end of the range are equal.
Case 1 looks like:
"start-stop"
case 2:
"start"
For the range case
auto range_constraint = x3::rule<struct test_struct, MyRange>{} = (x3::int_ >> x3::lit("-") >> x3::int_);
works
but
auto range_constraint = x3::rule<struct test_struct, MyRange>{} = x3::int_ | (x3::int_ >> x3::lit("-") >> x3::int_);
unsurprisingly, won't match the signature and fails to compile.
Not sure what the fix is?
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
struct MyRange
{
size_t start;
size_t end;
// little bit weird because should be end+1, but w/e
explicit MyRange(size_t start, size_t end = 0) : start(start), end(end == 0 ? start : end)
{
}
};
BOOST_FUSION_ADAPT_STRUCT(MyRange, start, end)
// BOOST_FUSION_ADAPT_STRUCT(MyRange, start)
//
int main()
{
auto range_constraint = x3::rule<struct test_struct, MyRange>{} = (x3::int_ >> x3::lit("-") >> x3::int_);
// auto range_constraint = x3::rule<struct test_struct, MyRange>{} = x3::int_ | (x3::int_ >> x3::lit("-") >> x3::int_);
for (std::string input :
{"1-2", "1","1-" ,"garbage"})
{
auto success = x3::phrase_parse(input.begin(), input.end(),
// Begin grammar
range_constraint,
// End grammar
x3::ascii::space);
std::cout << "`" << input << "`"
<< "-> " << success<<std::endl;
}
return 0;
}

It's important to realize that sequence adaptation by definition uses default construction with subsequent sequence element assignment.
Another issue is branch ordering in PEG grammars. int_ will always succeed wherever int_ >> '-' >> int_ would, so with int_ listed first you would never match the range version.
Finally, to parse size_t usually prefer uint_/uint_parser<size_t> :)
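To make the branch-ordering point concrete, here is a hand-rolled sketch in plain C++ (no Spirit involved). The helpers parse_uint, parse_pair, parse_single, single_first and range_first are hypothetical names made up for this illustration, and overflow is ignored for brevity.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical helper: consume one run of digits from the front of `in`.
// Overflow is not checked; this is only a sketch.
std::optional<std::size_t> parse_uint(std::string_view& in) {
    std::size_t value = 0, n = 0;
    while (n < in.size() && std::isdigit(static_cast<unsigned char>(in[n])))
        value = value * 10 + static_cast<std::size_t>(in[n++] - '0');
    if (n == 0) return std::nullopt;
    in.remove_prefix(n);
    return value;
}

using Range = std::pair<std::size_t, std::size_t>;

// Range alternative: uint '-' uint. Restores the input on failure.
std::optional<Range> parse_pair(std::string_view& in) {
    auto const save = in;
    auto a = parse_uint(in);
    if (a && !in.empty() && in.front() == '-') {
        in.remove_prefix(1);
        if (auto b = parse_uint(in)) return Range{*a, *b};
    }
    in = save;
    return std::nullopt;
}

// Singleton alternative: a lone uint means start == end.
std::optional<Range> parse_single(std::string_view& in) {
    if (auto a = parse_uint(in)) return Range{*a, *a};
    return std::nullopt;
}

// Ordered choice: the first alternative that matches wins.
std::optional<Range> single_first(std::string_view& in) {
    if (auto r = parse_single(in)) return r;
    return parse_pair(in);  // never reached when the input starts with digits
}

std::optional<Range> range_first(std::string_view& in) {
    if (auto r = parse_pair(in)) return r;
    return parse_single(in);
}
```

With the input "1-2", single_first stops after the 1 and leaves "-2" unconsumed, while range_first consumes everything. Putting the longer alternative first is the usual fix.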
Things That Don't Work
There are several ways to skin this cat. For one, there's BOOST_FUSION_ADAPT_STRUCT_NAMED, which would allow you to do
BOOST_FUSION_ADAPT_STRUCT_NAMED(MyRange, Range, start, end)
BOOST_FUSION_ADAPT_STRUCT_NAMED(MyRange, SingletonRange, start)
So one pretty elaborate approach would seem to spell it out:
auto range = x3::rule<struct _, Range>{} = uint_ >> '-' >> uint_;
auto singleton = x3::rule<struct _, SingletonRange>{} = uint_;
auto rule = x3::rule<struct _, MyRange>{} = range | singleton;
TIL that this doesn't even compile; apparently Qi behaved differently: Live On Coliru
X3 requires the attribute to be default-constructible, whereas Qi would attempt to bind to the passed-in attribute reference first.
Even in the Qi version you can see that Fusion sequences being default-constructed and then memberwise-assigned leads to results you didn't expect or want:
`1-2` -> true
-- [1,NIL)
`1` -> true
-- [1,NIL)
`1-` -> true
-- [1,NIL)
`garbage` -> false
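The reason constructor logic gets bypassed can be shown without any Spirit at all. The sketch below uses a simplified stand-in struct called Range (not the MyRange from the question) plus two hypothetical functions, via_constructor and via_memberwise, to contrast the two ways of filling in the value.

```cpp
#include <cstddef>
#include <optional>

struct Range {
    std::size_t start = 0;
    std::optional<std::size_t> end;  // empty plays the role of NIL

    Range() = default;
    // Constructor invariant: a missing end means a singleton range.
    explicit Range(std::size_t s, std::optional<std::size_t> e = std::nullopt)
        : start(s), end(e ? *e : s) {}
};

// Roughly what a semantic action does: go through the constructor.
Range via_constructor(std::size_t s) { return Range(s); }

// Roughly what sequence adaptation does: default-construct, then assign
// each parsed element to its member. The constructor logic never runs.
Range via_memberwise(std::size_t s) {
    Range r;
    r.start = s;
    return r;
}
```

via_constructor(1) ends up with end == 1, whereas via_memberwise(1) leaves end empty, because nothing ever ran the defaulting logic.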
What Works
Instead of doing the complicated things, do the simple thing. Anytime you see an optional value you can usually provide a default value. Alternatively you can not use Sequence adaptation at all, and go straight to semantic actions.
Semantic Actions
The simplest way would be to have specific branches:
auto assign1 = [](auto& ctx) { _val(ctx) = MyRange(_attr(ctx)); };
auto assign2 = [](auto& ctx) { _val(ctx) = MyRange(at_c<0>(_attr(ctx)), at_c<1>(_attr(ctx))); };
auto rule = x3::rule<void, MyRange>{} =
(uint_ >> '-' >> uint_)[assign2] | uint_[assign1];
Slightly more advanced, but more efficient:
auto assign1 = [](auto& ctx) { _val(ctx) = MyRange(_attr(ctx)); };
auto assign2 = [](auto& ctx) { _val(ctx) = MyRange(_val(ctx).start, _attr(ctx)); };
auto rule = x3::rule<void, MyRange>{} = uint_[assign1] >> -('-' >> uint_[assign2]);
Lastly, we can move towards defaulting the optional end:
auto rule = x3::rule<void, MyRange>{} =
(uint_ >> ('-' >> uint_ | x3::attr(MyRange::unspecified))) //
[assign];
Now the semantic action will have to deal with the variant end type:
auto assign = [](auto& ctx) {
auto start = at_c<0>(_attr(ctx));
_val(ctx) = apply_visitor( //
[=](auto end) { return MyRange(start, end); }, //
at_c<1>(_attr(ctx)));
};
Also Live On Coliru
Simplify?
I'd consider modeling the range explicitly as having an optional end:
struct MyRange {
MyRange() = default;
MyRange(size_t s, boost::optional<size_t> e = {}) : start(s), end(e) {
assert(!e || *e >= s);
}
size_t size() const { return end? *end - start : 1; }
bool empty() const { return size() == 0; }
size_t start = 0;
boost::optional<size_t> end = 0;
};
Now you can directly use the optional to construct:
auto assign = [](auto& ctx) {
_val(ctx) = MyRange(at_c<0>(_attr(ctx)), at_c<1>(_attr(ctx)));
};
auto rule = x3::rule<void, MyRange>{} = (uint_ >> -('-' >> uint_))[assign];
Actually, here we can go back to using adapted sequences, although with different semantics:
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
namespace x3 = boost::spirit::x3;
struct MyRange {
size_t start = 0;
boost::optional<size_t> end = 0;
};
static inline std::ostream& operator<<(std::ostream& os, MyRange const& mr) {
if (mr.end)
return os << "[" << mr.start << "," << *mr.end << ")";
else
return os << "[" << mr.start << ",)";
}
BOOST_FUSION_ADAPT_STRUCT(MyRange, start, end)
int main() {
x3::uint_parser<size_t> uint_;
auto rule = x3::rule<void, MyRange>{} = uint_ >> -('-' >> uint_);
for (std::string const input : {"1-2", "1", "1-", "garbage"}) {
MyRange into;
auto success = phrase_parse(input.begin(), input.end(), rule, x3::space, into);
std::cout << quoted(input, '`') << " -> " << std::boolalpha << success
<< std::endl;
if (success) {
std::cout << " -- " << into << "\n";
}
}
}
Summarizing
I hope these strategies give you all the things you needed. Pay close attention to the semantics of your range. Specifically, I never paid any attention to the difference between "1" and "1-". You might want one to be [1,2) and the other to be [1,inf), both to be equivalent, or the second one might even be considered invalid?
Stepping back even further, I'd suggest that maybe you just needed
using Bound = std::optional<size_t>;
using MyRange = std::pair<Bound, Bound>;
Which you could parse directly with:
auto boundary = -x3::uint_parser<size_t>{};
auto rule = x3::rule<void, MyRange>{} = boundary >> '-' >> boundary;
It would allow for more inputs:
for (std::string const input : {"-2", "1-2", "1", "1-", "garbage"}) {
MyRange into;
auto success = phrase_parse(input.begin(), input.end(), rule, x3::space, into);
std::cout << quoted(input, '`') << " -> " << std::boolalpha << success
<< std::endl;
if (success) {
std::cout << " -- " << into << "\n";
}
}
Prints: Live On Coliru
`-2` -> true
-- [,2)
`1-2` -> true
-- [1,2)
`1` -> false
`1-` -> true
-- [1,)
`garbage` -> false
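For comparison, the same acceptance table can be reproduced with a hand-written parser in plain C++17. This is only a sketch to make the semantics explicit: parse_bound and parse_range are hypothetical helpers, there is no whitespace skipping, and unlike the loop above it insists that the whole input is consumed.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>
#include <utility>

using Bound   = std::optional<std::size_t>;
using MyRange = std::pair<Bound, Bound>;

// Optional unsigned integer: consumes digits if present, otherwise
// leaves the input untouched and yields an empty Bound.
Bound parse_bound(std::string_view& in) {
    std::size_t value = 0;
    auto [ptr, ec] = std::from_chars(in.data(), in.data() + in.size(), value);
    if (ec != std::errc{}) return std::nullopt;
    in.remove_prefix(static_cast<std::size_t>(ptr - in.data()));
    return value;
}

// Mirrors `boundary >> '-' >> boundary`, then requires end of input.
std::optional<MyRange> parse_range(std::string_view in) {
    Bound lo = parse_bound(in);
    if (in.empty() || in.front() != '-') return std::nullopt;
    in.remove_prefix(1);
    Bound hi = parse_bound(in);
    if (!in.empty()) return std::nullopt;  // trailing junk is rejected
    return MyRange{lo, hi};
}
```

The dash is mandatory in this grammar, which is why "1" is rejected while "-2" and "1-" are accepted with one bound left empty.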

Related

Cleanest way to handle both quoted and unquoted strings in Spirit.X3

Good morning,
I have to parse something such as:
foo: 123
"bar": 456
The quotes should be removed if they are present. I tried:
((+x3::alnum) | ('"' >> (+x3::alnum) >> '"'))
But the parser actions for this are of type variant<string, string> ; is there a way to make it so that the parser understands that those two are equivalent, and for my action to only get a single std::string as argument in its call?
edit: minimal repro (live on godbolt: https://gcc.godbolt.org/z/GcE8Pj4r5) :
#include <boost/spirit/home/x3.hpp>
using namespace boost::spirit;
// action handlers
struct handlers {
void create_member(const std::string& str) { }
};
// rules
static const x3::rule<struct id_obj_mem> obj_mem = "obj_mem";
#define EVENT(e) ([](auto& ctx) { x3::get<handlers>(ctx).e(x3::_attr(ctx)); })
static const auto obj_mem_def = ((
((+x3::alnum) | ('"' >> (+x3::alnum) >> '"'))
>> ':' >> x3::lit("123"))[EVENT(create_member)] % ',');
BOOST_SPIRIT_DEFINE(obj_mem)
// execution
int main()
{
handlers r;
std::string str = "foo: 123";
auto first = str.begin();
auto last = str.end();
bool res = phrase_parse(
first,
last,
boost::spirit::x3::with<handlers>(r)[obj_mem_def],
boost::spirit::x3::ascii::space);
}
I too consider this a kind of defect. X3 is definitely less "friendly" in terms of the synthesized attribute types. I guess it's just a tacit side-effect of being more core-language oriented, where attribute assignment is effectively done via default "visitor" actions.
Although I understand the value of keeping the magic to a minimum, and staying close to "pure C++", I vastly prefer the Qi way of synthesizing attributes here. I believe it has proven a hard problem to fix, as this problem has been coming/going in some iterations of X3.
I've long decided to basically fix it myself with variations of this idiom:
template <typename T> struct as_type {
auto operator()(auto p) const { return x3::rule<struct Tag, T>{} = p; }
};
static constexpr as_type<std::string> as_string{};
Now I'd write that as:
auto quoted = '"' >> +x3::alnum >> '"';
auto name = as_string(+x3::alnum | quoted);
auto prop = (name >> ':' >> "123")[EVENT(create_member)] % ',';
That will compile no problem:
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
namespace x3 = boost::spirit::x3;
struct handlers {
void create_member(std::string const& str) {
std::cerr << __FUNCTION__ << " " << std::quoted(str) << "\n";
}
};
namespace Parser {
#define EVENT(e) ([](auto& ctx) { get<handlers>(ctx).e(_attr(ctx)); })
template <typename T> struct as_type {
auto operator()(auto p) const { return x3::rule<struct Tag, T>{} = p; }
};
static constexpr as_type<std::string> as_string{};
auto quoted = '"' >> +x3::alnum >> '"';
auto name = as_string(+x3::alnum | quoted);
auto prop = (name >> ':' >> "123")[EVENT(create_member)] % ',';
auto grammar = x3::skip(x3::space)[prop];
} // namespace Parser
int main() {
handlers r;
std::string const str = "foo: 123";
auto first = str.begin(), last = str.end();
bool res = parse(first, last, x3::with<handlers>(r)[Parser::grammar]);
return res ? 1 : 0;
}
Prints
create_member "foo"
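To spell out what the coerced name rule is meant to produce, here is a hand-rolled equivalent in plain C++ (a sketch, not Spirit code). parse_alnums and parse_name are hypothetical helpers. Both branches return the same std::string type, so the caller never sees a variant.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Consume one or more alphanumerics, in the spirit of +x3::alnum.
std::optional<std::string> parse_alnums(std::string_view& in) {
    std::size_t n = 0;
    while (n < in.size() && std::isalnum(static_cast<unsigned char>(in[n])))
        ++n;
    if (n == 0) return std::nullopt;
    std::string out(in.substr(0, n));
    in.remove_prefix(n);
    return out;
}

// name = alnums | '"' alnums '"'; the quotes are dropped from the result.
std::optional<std::string> parse_name(std::string_view& in) {
    if (auto bare = parse_alnums(in)) return bare;
    auto const save = in;
    if (!in.empty() && in.front() == '"') {
        in.remove_prefix(1);
        auto inner = parse_alnums(in);
        if (inner && !in.empty() && in.front() == '"') {
            in.remove_prefix(1);
            return inner;
        }
    }
    in = save;  // backtrack so the caller can try something else
    return std::nullopt;
}
```

Feeding it foo or "bar" (with the quotes) yields foo and bar respectively, and an unterminated quote fails without consuming input.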
Interesting Links
Spirit X3, How to get attribute type to match rule type?
Combining rules at runtime and returning rules
spirit x3 cannot propagate attributes of type optional<vector>
etc.

Spirit.X3: passing local data to a parser

The examples in the Boost.Spirit documentation seem to fall in two cases:
1/ Define a parser in a function: semantic actions can access local variables and data as they are local lambdas. Like push_back here: https://www.boost.org/doc/libs/master/libs/spirit/doc/x3/html/spirit_x3/tutorials/number_list___stuffing_numbers_into_a_std__vector.html
2/ Define a parser in a namespace, like here: https://www.boost.org/doc/libs/1_69_0/libs/spirit/doc/x3/html/spirit_x3/tutorials/minimal.html
which seems to be necessary to be able to invoke BOOST_SPIRIT_DEFINE.
My question is: how to combine both (properly, without globals) ? My dream API would be to pass some argument to phrase_parse and then do some x3::_arg(ctx) but I couldn't find anything like this.
Here is for instance my parser: for now the actions are writing to std::cerr. What if I wanted to write to a custom std::ostream& instead, that would be passed to the parse function?
using namespace boost::spirit;
using namespace boost::spirit::x3;
rule<struct id_action> action = "action";
rule<struct id_array> array = "array";
rule<struct id_empty_array> empty_array = "empty_array";
rule<struct id_atom> atom = "atom";
rule<struct id_sequence> sequence = "sequence";
rule<struct id_root> root = "root";
auto access_index_array = [] (const auto& ctx) { std::cerr << "access_array: " << x3::_attr(ctx) << "\n" ;};
auto access_empty_array = [] (const auto& ctx) { std::cerr << "access_empty_array\n" ;};
auto access_named_member = [] (const auto& ctx) { std::cerr << "access_named_member: " << x3::_attr(ctx) << "\n" ;};
auto start_action = [] (const auto& ctx) { std::cerr << "start action\n" ;};
auto finish_action = [] (const auto& ctx) { std::cerr << "finish action\n" ;};
auto create_array = [] (const auto& ctx) { std::cerr << "create_array\n" ;};
const auto action_def = +(lit('.')[start_action]
>> -((+alnum)[access_named_member])
>> *(('[' >> x3::int_ >> ']')[access_index_array] | lit("[]")[access_empty_array]));
const auto sequence_def = (action[finish_action] % '|');
const auto array_def = ('[' >> sequence >> ']')[create_array];
const auto root_def = array | action;
BOOST_SPIRIT_DEFINE(action)
BOOST_SPIRIT_DEFINE(array)
BOOST_SPIRIT_DEFINE(sequence)
BOOST_SPIRIT_DEFINE(root)
bool parse(std::string_view str)
{
using ascii::space;
auto first = str.begin();
auto last = str.end();
bool r = phrase_parse(
first, last,
parser::array_def | parser::sequence_def,
ascii::space
);
if (first != last)
return false;
return r;
}
About the approaches:
1/ Yes, this is viable for small, contained parsers. Typically only used in a single TU, and exposed via non-generic interface.
2/ This is the approach for (much) larger grammars, that you might wish to spread across TUs, and/or are instantiated across several TU's generically.
Note that you do NOT need BOOST_SPIRIT_DEFINE unless you have recursive rules or want to split declaration from definition. [The latter becomes pretty complicated, and I recommend against it for X3.]
The Question
My question is: how to combine both (properly, without globals) ?
You can't combine something with namespace-level declarations if one of the requirements is "without globals".
My dream API would be to pass some argument to phrase_parse and then do some x3::_arg(ctx) but I couldn't find anything like this.
I don't know what you think x3::_arg(ctx) would do, in that particular dream :)
Here is for instance my parser: for now the actions are writing to std::cerr. What if I wanted to write to a custom std::ostream& instead, that would be passed to the parse function?
Now that's a concrete question. I'd say: use the context.
You could make it so that x3::get<ostream>(ctx) returns the stream:
struct ostream{};
auto access_index_array = [] (const auto& ctx) { x3::get<ostream>(ctx) << "access_array: " << x3::_attr(ctx) << "\n" ;};
auto access_empty_array = [] (const auto& ctx) { x3::get<ostream>(ctx) << "access_empty_array\n" ;};
auto access_named_member = [] (const auto& ctx) { x3::get<ostream>(ctx) << "access_named_member: " << x3::_attr(ctx) << "\n" ;};
auto start_action = [] (const auto& ctx) { x3::get<ostream>(ctx) << "start action\n" ;};
auto finish_action = [] (const auto& ctx) { x3::get<ostream>(ctx) << "finish action\n" ;};
auto create_array = [] (const auto& ctx) { x3::get<ostream>(ctx) << "create_array\n";};
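As a mental model only (this is not how X3 actually implements its context), you can picture the context as a reference looked up by a tag type. In the toy below, everything in namespace toy, including tagged_context, with, get and ostream_tag, is made up for illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace toy {

// Toy model of a tagged context; X3's real context type is more general.
template <typename Tag, typename T>
struct tagged_context {
    T& value;
};

// Bind a reference to a tag.
template <typename Tag, typename T>
tagged_context<Tag, T> with(T& value) { return {value}; }

// Look the reference up again by its tag.
template <typename Tag, typename T>
T& get(tagged_context<Tag, T> const& ctx) { return ctx.value; }

struct ostream_tag {};  // hypothetical tag, plays the role of parser::ostream

// An "action" that only knows the tag, not which stream is bound to it.
template <typename Context>
void start_action(Context const& ctx) {
    toy::get<ostream_tag>(ctx) << "start action\n";
}

// Bind a string stream instead of std::cerr and run the action once.
inline std::string demo() {
    std::ostringstream sink;
    auto ctx = toy::with<ostream_tag, std::ostream>(sink);
    start_action(ctx);
    return sink.str();
}

}  // namespace toy
```

The action never names a concrete stream, so the caller decides at parse time whether output goes to std::cerr, a file or a string buffer.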
Now you need to put the tagged param in the context during parsing:
bool r = phrase_parse(
f, l,
x3::with<parser::ostream>(std::cerr)[parser::array_def | parser::sequence_def],
x3::space);
Live Demo: http://coliru.stacked-crooked.com/a/a26c8eb0af6370b9
Prints
start action
access_named_member: a
finish action
start action
access_named_member: b
start action
start action
access_array: 2
start action
access_named_member: foo
start action
access_empty_array
finish action
start action
access_named_member: c
finish action
create_array
true
Intermixed with the standard X3 debug output:
<sequence>
<try>.a|.b..[2].foo.[]|.c</try>
<action>
<try>.a|.b..[2].foo.[]|.c</try>
<success>|.b..[2].foo.[]|.c]</success>
</action>
<action>
<try>.b..[2].foo.[]|.c]</try>
<success>|.c]</success>
</action>
<action>
<try>.c]</try>
<success>]</success>
</action>
<success>]</success>
</sequence>
But Wait #1 - Event Handlers
It looks like you're parsing something similar to JSON Pointer or jq syntax. In the case that you wanted to provide a callback-interface (SAX-events), why not bind the callback interface instead of the actions:
struct handlers {
using N = x3::unused_type;
virtual void index(int) {}
virtual void index(N) {}
virtual void property(std::string) {}
virtual void start(N) {}
virtual void finish(N) {}
virtual void create_array(N) {}
};
#define EVENT(e) ([](auto& ctx) { x3::get<handlers>(ctx).e(x3::_attr(ctx)); })
const auto action_def =
+(x3::lit('.')[EVENT(start)] >> -((+x3::alnum)[EVENT(property)]) >>
*(('[' >> x3::int_ >> ']')[EVENT(index)] | x3::lit("[]")[EVENT(index)]));
const auto sequence_def = action[EVENT(finish)] % '|';
const auto array_def = ('[' >> sequence >> ']')[EVENT(create_array)];
const auto root_def = array | action;
Now you can implement all handlers neatly in one interface:
struct default_handlers : parser::handlers {
std::ostream& os;
default_handlers(std::ostream& os) : os(os) {}
void index(int i) override { os << "access_array: " << i << "\n"; };
void index(N) override { os << "access_empty_array\n" ; };
void property(std::string n) override { os << "access_named_member: " << n << "\n" ; };
void start(N) override { os << "start action\n" ; };
void finish(N) override { os << "finish action\n" ; };
void create_array(N) override { os << "create_array\n"; };
};
auto f = str.begin(), l = str.end();
bool r = phrase_parse(f, l,
x3::with<parser::handlers>(default_handlers{std::cout}) //
[parser::array_def | parser::sequence_def],
x3::space);
See it Live On Coliru once again:
start action
access_named_member: a
finish action
start action
access_named_member: b
start action
start action
access_array: 2
start action
access_named_member: foo
start action
access_empty_array
finish action
start action
access_named_member: c
finish action
create_array
true
But Wait #2 - No Actions
The natural way to expose attributes would be to build an AST. See also Boost Spirit: "Semantic actions are evil"?
Without further ado:
namespace AST {
using Id = std::string;
using Index = int;
struct Member {
std::optional<Id> name;
};
struct Indexer {
std::optional<int> index;
};
struct Action {
Member member;
std::vector<Indexer> indexers;
};
using Actions = std::vector<Action>;
using Sequence = std::vector<Actions>;
struct ArrayCtor {
Sequence actions;
};
using Root = boost::variant<ArrayCtor, Actions>;
}
Of course, I'm making some assumptions. The rules can be much simplified:
namespace parser {
template <typename> struct Tag {};
#define AS(T, p) (x3::rule<Tag<AST::T>, AST::T>{#T} = p)
auto id = AS(Id, +x3::alnum);
auto member = AS(Member, x3::lit('.') >> -id);
auto indexer = AS(Indexer,'[' >> -x3::int_ >> ']');
auto action = AS(Action, member >> *indexer);
auto actions = AS(Actions, +action);
auto sequence = AS(Sequence, actions % '|');
auto array = AS(ArrayCtor, '[' >> -sequence >> ']'); // covers empty array
auto root = AS(Root, array | actions);
} // namespace parser
And the parsing function returns the AST:
AST::Root parse(std::string_view str) {
auto f = str.begin(), l = str.end();
AST::Root parsed;
phrase_parse(f, l, x3::expect[parser::root >> x3::eoi], x3::space, parsed);
return parsed;
}
(Note that it now throws x3::expectation_failure if the input is invalid or not completely parsed.)
int main() {
std::cout << parse("[.a|.b..[2].foo.[]|.c]");
}
Now prints:
[.a|.b./*none*/./*none*/[2].foo./*none*/[/*none*/]|.c]
See it Live On Coliru
//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <ostream>
#include <optional>
namespace x3 = boost::spirit::x3;
namespace AST {
using Id = std::string;
using Index = int;
struct Member {
std::optional<Id> name;
};
struct Indexer {
std::optional<int> index;
};
struct Action {
Member member;
std::vector<Indexer> indexers;
};
using Actions = std::vector<Action>;
using Sequence = std::vector<Actions>;
struct ArrayCtor {
Sequence actions;
};
using Root = boost::variant<ArrayCtor, Actions>;
}
BOOST_FUSION_ADAPT_STRUCT(AST::Member, name)
BOOST_FUSION_ADAPT_STRUCT(AST::Indexer, index)
BOOST_FUSION_ADAPT_STRUCT(AST::Action, member, indexers)
BOOST_FUSION_ADAPT_STRUCT(AST::ArrayCtor, actions)
namespace parser {
template <typename> struct Tag {};
#define AS(T, p) (x3::rule<Tag<AST::T>, AST::T>{#T} = p)
auto id = AS(Id, +x3::alnum);
auto member = AS(Member, x3::lit('.') >> -id);
auto indexer = AS(Indexer,'[' >> -x3::int_ >> ']');
auto action = AS(Action, member >> *indexer);
auto actions = AS(Actions, +action);
auto sequence = AS(Sequence, actions % '|');
auto array = AS(ArrayCtor, '[' >> -sequence >> ']'); // covers empty array
auto root = AS(Root, array | actions);
} // namespace parser
AST::Root parse(std::string_view str) {
auto f = str.begin(), l = str.end();
AST::Root parsed;
phrase_parse(f, l, x3::expect[parser::root >> x3::eoi], x3::space, parsed);
return parsed;
}
// for debug output
#include <iostream>
#include <iomanip>
namespace AST {
static std::ostream& operator<<(std::ostream& os, Member const& m) {
return os << "." << m.name.value_or("/*none*/");
}
static std::ostream& operator<<(std::ostream& os, Indexer const& i) {
if (i.index)
return os << "[" << *i.index << "]";
else
return os << "[/*none*/]";
}
static std::ostream& operator<<(std::ostream& os, Action const& a) {
os << a.member;
for (auto& i : a.indexers)
os << i;
return os;
}
static std::ostream& operator<<(std::ostream& os, Actions const& aa) {
for (auto& a : aa)
os << a;
return os;
}
static std::ostream& operator<<(std::ostream& os, Sequence const& s) {
bool first = true;
for (auto& a : s)
os << (std::exchange(first, false) ? "" : "|") << a;
return os;
}
static std::ostream& operator<<(std::ostream& os, ArrayCtor const& ac) {
return os << "[" << ac.actions << "]";
}
}
int main() {
std::cout << parse("[.a|.b..[2].foo.[]|.c]");
}

Parsing characters into an std::map<char,int> using boost::qi

I am trying to parse a sequence of characters separated by "," into a std::map<char,int> of pairs, where the key is the character and the value is just a running count of the parsed characters.
For example, if the input is
a,b,c
The map should contain the pairs:
(a,1) , (b,2) , (c,3)
Here's the code I am using :
namespace myparser
{
std::map<int, std::string> mapping;
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;
int i = 0;
template <typename Iterator>
bool parse_numbers(Iterator first, Iterator last, std::map<char,int>& v)
{
using qi::double_;
using qi::char_;
using qi::phrase_parse;
using qi::_1;
using ascii::space;
using phoenix::push_back;
bool r = phrase_parse(first, last,
// Begin grammar
(
char_[v.insert(std::make_pair(_1,0)]
>> *(',' >> char_[v.insert(std::make_pair(_1,0)])
)
,
// End grammar
space);
if (first != last) // fail if we did not get a full match
return false;
return r;
}
//]
}
Then I try to print the pair in main like this:
int main() {
std::string str;
while (getline(std::cin, str))
{
if (str.empty() || str[0] == 'q' || str[0] == 'Q')
break;
std::map<char,int> v;
std::map<std::string, int>::iterator it = v.begin();
if (myparser::parse_numbers(str.begin(), str.end(), v))
{
std::cout << "-------------------------\n";
std::cout << "Parsing succeeded\n";
std::cout << str << " Parses OK: " << std::endl;
while (it != v.end())
{
// Accessing KEY from element pointed by it.
std::string word = it->first;
// Accessing VALUE from element pointed by it.
int count = it->second;
std::cout << word << " :: " << count << std::endl;
// Increment the Iterator to point to next entry
it++;
}
std::cout << "\n-------------------------\n";
}
else
{
std::cout << "-------------------------\n";
std::cout << "Parsing failed\n";
std::cout << "-------------------------\n";
}
}
return 0;
}
I am a beginner and I don't know how to fix this code. I also want to use strings instead of characters, so that I can enter a sequence of strings separated by "," and store them in a map similar to the one mentioned above. I would appreciate any help!
You cannot use Phoenix place holders outside Phoenix deferred actors. E.g. the type of std::make_pair(qi::_1, 0) is std::pair<boost::phoenix::actor<boost::phoenix::argument<0>>, int>.
Nothing interoperates with such a thing. Certainly not std::map<>::insert.
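The eager-versus-deferred distinction can be illustrated without Phoenix. In this sketch, sketch::placeholder and sketch::_1 are hypothetical stand-ins for the real Phoenix placeholder, and make_insert_action is a made-up helper: the eager call only captures the placeholder object itself, while the deferred version builds a callable that receives the parsed character later.

```cpp
#include <map>
#include <type_traits>
#include <utility>

namespace sketch {

struct placeholder {};               // hypothetical stand-in for qi::_1
inline constexpr placeholder _1{};

// Eager: make_pair runs right here and simply stores the placeholder object.
using eager_t = decltype(std::make_pair(_1, 0));
static_assert(std::is_same_v<eager_t, std::pair<placeholder, int>>,
              "no char in sight, so there is nothing map::insert could use");

// Deferred: build a callable now, run the insert later with the real value.
inline auto make_insert_action(std::map<char, int>& m) {
    return [&m](char parsed) { m.insert(std::make_pair(parsed, 0)); };
}

}  // namespace sketch
```

A Phoenix actor plays the role of the returned lambda here: an object that describes the work and is only invoked once the parser has an attribute to pass in.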
What you'd need to do is wrap all the operations in semantic actions as Phoenix actors.
#include <boost/phoenix.hpp>
namespace px = boost::phoenix;
Then you can:
#include <boost/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
namespace myparser {
using Map = std::map<char, int>;
template <typename Iterator>
bool parse_numbers(Iterator first, Iterator last, Map& m) {
auto action = px::insert(px::ref(m), px::end(px::ref(m)),
px::construct<std::pair<char, int>>(qi::_1, 0));
bool r = qi::phrase_parse( //
first, last,
// Begin grammar
qi::char_[action] >> *(',' >> qi::char_[action]),
// End grammar
qi::space);
return r && first == last;
}
} // namespace myparser
See it Live
Easy peasy. Right.
I spent half an hour on that thing debugging why it wouldn't work. Why is this so hard?
It's because someone invented a whole meta-DSL to write "normal C++" but with deferred execution. Back when that happened it was pretty neat, but it is the mother of all leaky abstractions, with razor-sharp edges.
So, what's new? Using C++11 you could:
template <typename Iterator>
bool parse_numbers(Iterator first, Iterator last, Map& m) {
struct action_f {
Map& m_;
void operator()(char ch) const { m_.emplace(ch, 0); }
};
px::function<action_f> action{{m}};
bool r = qi::phrase_parse( //
first, last,
// Begin grammar
qi::char_[action(qi::_1)] >> *(',' >> qi::char_[action(qi::_1)]),
// End grammar
qi::space);
return r && first == last;
}
Or using C++17:
template <typename Iterator>
bool parse_numbers(Iterator first, Iterator last, Map& m) {
px::function action{[&m](char ch) { m.emplace(ch, 0); }};
bool r = qi::phrase_parse( //
first, last,
// Begin grammar
qi::char_[action(qi::_1)] >> *(',' >> qi::char_[action(qi::_1)]),
// End grammar
qi::space);
return r && first == last;
}
On a tangent, you probably wanted to count things, so, maybe use
px::function action{[&m](char ch) { m[ch] += 1; }};
By this time, you could switch to Spirit X3 (which requires C++14):
#include <boost/spirit/home/x3.hpp>
#include <map>
namespace x3 = boost::spirit::x3;
namespace myparser {
using Map = std::map<char, int>;
template <typename Iterator>
bool parse_numbers(Iterator first, Iterator last, Map& m) {
auto action = [&m](auto& ctx) { m[_attr(ctx)] += 1; };
return x3::phrase_parse( //
first, last,
// Begin grammar
x3::char_[action] >> *(',' >> x3::char_[action]) >> x3::eoi,
// End grammar
x3::space);
}
} // namespace myparser
Now finally, let's simplify. p >> *(',' >> p) is just a clumsy way of saying p % ',':
template <typename Iterator>
bool parse_numbers(Iterator first, Iterator last, Map& m) {
auto action = [&m](auto& ctx) { m[_attr(ctx)] += 1; };
return x3::phrase_parse( //
first, last, //
x3::char_[action] % ',', //
x3::space);
}
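If the equivalence between p >> *(',' >> p) and p % ',' is not obvious, a hand-rolled version of the list operator may help. This is a plain C++ sketch with a hypothetical parse_list helper, it does no whitespace skipping, and the item parser is any callable that takes a std::string_view& and returns bool.

```cpp
#include <map>
#include <string_view>

// Hand-rolled `item % sep`: one item, then zero or more (sep item) pairs.
template <typename Item>
bool parse_list(std::string_view& in, char sep, Item item) {
    if (!item(in)) return false;                // the leading p
    for (;;) {                                  // *(sep >> p)
        auto const save = in;
        if (in.empty() || in.front() != sep) break;
        in.remove_prefix(1);
        if (!item(in)) { in = save; break; }    // undo a dangling separator
    }
    return true;
}

// Count single characters separated by commas, like the actions above.
std::map<char, int> count_chars(std::string_view in) {
    std::map<char, int> m;
    auto item = [&m](std::string_view& s) {
        if (s.empty() || s.front() == ',') return false;
        m[s.front()] += 1;
        s.remove_prefix(1);
        return true;
    };
    parse_list(in, ',', item);
    return m;
}
```

Note that a trailing separator is left unconsumed rather than causing a failure, which is the same reason the X3 versions need eoi or an iterator check to reject such input.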
And you wanted words, not characters:
#include <boost/spirit/home/x3.hpp>
#include <map>
namespace x3 = boost::spirit::x3;
namespace myparser {
using Map = std::map<std::string, int>;
template <typename Iterator>
bool parse_numbers(Iterator first, Iterator last, Map& m) {
auto action = [&m](auto& ctx) { m[_attr(ctx)] += 1; };
auto word_ = (*~x3::char_(','))[action];
return phrase_parse(first, last, word_ % ',', x3::space);
}
} // namespace myparser
#include <iomanip>
#include <iostream>
int main() {
for (std::string const str : {"foo,c++ is strange,bar,qux,foo,c++ is strange ,cuz"}) {
std::map<std::string, int> m;
std::cout << "Parsing " << std::quoted(str) << std::endl;
if (myparser::parse_numbers(str.begin(), str.end(), m)) {
std::cout << m.size() << " words:\n";
for (auto& [word,count]: m)
std::cout << " - " << std::quoted(word) << " :: " << count << std::endl;
} else {
std::cerr << "Parsing failed\n";
}
}
}
Prints
Parsing "foo,c++ is strange,bar,qux,foo,c++ is strange ,cuz"
5 words:
- "bar" :: 1
- "c++isstrange" :: 2
- "cuz" :: 1
- "foo" :: 2
- "qux" :: 1
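Note the behaviour of the x3::space skipper (like qi::space and qi::ascii::space above): whitespace is skipped everywhere, even inside words, which is why "c++ is strange" ends up stored as "c++isstrange".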
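If interior whitespace should survive, you'd have to stop the skipper from acting inside a word. As a point of comparison, here is a plain C++ sketch (not Spirit) of the behaviour one might actually want: split on commas, trim each word at both ends, keep the spaces in between. trim and count_words are hypothetical helpers.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <string_view>

// Trim spaces and tabs at both ends but keep interior whitespace intact.
std::string_view trim(std::string_view s) {
    auto const first = s.find_first_not_of(" \t");
    if (first == std::string_view::npos) return {};
    auto const last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Split on ',' and count each trimmed word. Empty words are counted too,
// similar to how *~char_(',') also matches an empty word.
std::map<std::string, int> count_words(std::string_view in) {
    std::map<std::string, int> m;
    for (;;) {
        auto const comma = in.find(',');
        m[std::string(trim(in.substr(0, comma)))] += 1;
        if (comma == std::string_view::npos) break;
        in.remove_prefix(comma + 1);
    }
    return m;
}
```

On the sample input this still yields five distinct words, but the key is "c++ is strange" with its spaces preserved.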

Make boost::spirit::symbol parser non greedy

I'd like to make a keyword parser that matches e.g. int, but does not match int in integer with eger left over. I use x3::symbols to automatically get the parsed keyword represented as an enum value.
Minimal example:
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/utility/error_reporting.hpp>
namespace x3 = boost::spirit::x3;
enum class TypeKeyword { Int, Float, Bool };
struct TypeKeywordSymbolTable : x3::symbols<TypeKeyword> {
TypeKeywordSymbolTable()
{
add("float", TypeKeyword::Float)
("int", TypeKeyword::Int)
("bool", TypeKeyword::Bool);
}
};
const TypeKeywordSymbolTable type_keyword_symbol_table;
struct TypeKeywordRID {};
using TypeKeywordRule = x3::rule<TypeKeywordRID, TypeKeyword>;
const TypeKeywordRule type_keyword = "type_keyword";
const auto type_keyword_def = type_keyword_symbol_table;
BOOST_SPIRIT_DEFINE(type_keyword);
using Iterator = std::string_view::const_iterator;
/* Thrown when the parser has failed to parse the whole input stream. Contains
* the part of the input stream that has not been parsed. */
class LeftoverError : public std::runtime_error {
public:
LeftoverError(Iterator begin, Iterator end)
: std::runtime_error(std::string(begin, end))
{}
std::string_view get_leftover_data() const noexcept { return what(); }
};
template<typename Rule>
typename Rule::attribute_type parse(std::string_view input, const Rule& rule)
{
Iterator begin = input.begin();
Iterator end = input.end();
using ExpectationFailure = boost::spirit::x3::expectation_failure<Iterator>;
typename Rule::attribute_type result;
try {
bool r = x3::phrase_parse(begin, end, rule, x3::space, result);
if (r && begin == end) {
return result;
} else { // Occurs when the whole input stream has not been consumed.
throw LeftoverError(begin, end);
}
} catch (const ExpectationFailure& exc) {
throw LeftoverError(exc.where(), end);
}
}
int main()
{
// TypeKeyword::Bool is parsed and "ean" is leftover, but failed parse with
// "boolean" leftover is desired.
parse("boolean", type_keyword);
// TypeKeyword::Int is parsed and "eger" is leftover, but failed parse with
// "integer" leftover is desired.
parse("integer", type_keyword);
// TypeKeyword::Int is parsed successfully and this is the desired behavior.
parse("int", type_keyword);
}
Basically, I want integer not to be recognized as a keyword with additional eger left to parse.
I morphed the test cases into self-describing expectations:
Live On Compiler Explorer
Prints:
FAIL "boolean" -> TypeKeyword::Bool (expected Leftover:"boolean")
FAIL "integer" -> TypeKeyword::Int (expected Leftover:"integer")
OK "int" -> TypeKeyword::Int
Now, the simplest, naive approach would be to make sure you parse till the eoi, by simply changing
auto actual = parse(input, Parser::type_keyword);
To
auto actual = parse(input, Parser::type_keyword >> x3::eoi);
And indeed the tests pass: Live
OK "boolean" -> Leftover:"boolean"
OK "integer" -> Leftover:"integer"
OK "int" -> TypeKeyword::Int
However, this fits the tests, but not the goal. Let's imagine a more involved grammar, where type identifier; is to be parsed:
auto identifier
= x3::rule<struct id_, Ast::Identifier> {"identifier"}
= x3::lexeme[x3::char_("a-zA-Z_") >> *x3::char_("a-zA-Z_0-9")];
auto type_keyword
= x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
= type_;
auto declaration
= x3::rule<struct decl_, Ast::Declaration>{"declaration"}
= type_keyword >> identifier >> ';';
I'll leave the details for Compiler Explorer:
OK "boolean b;" -> Leftover:"boolean b;"
OK "integer i;" -> Leftover:"integer i;"
OK "int j;" -> (TypeKeyword::Int j)
Looks good. But what if we add some interesting tests:
{"flo at f;", LeftoverError("flo at f;")},
{"boolean;", LeftoverError("boolean;")},
It prints (Live)
OK "boolean b;" -> Leftover:"boolean b;"
OK "integer i;" -> Leftover:"integer i;"
OK "int j;" -> (TypeKeyword::Int j)
FAIL "boolean;" -> (TypeKeyword::Bool ean) (expected Leftover:"boolean;")
So, the test cases were lacking. Your prose description is actually closer:
I'd like to make a keyword parser that matches e.g. int, but does not match int in integer with eger left over
That correctly implies you want to check the lexeme inside the type_keyword rule. A naive try might be checking that no identifier character follows the type keyword:
auto type_keyword
= x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
= type_ >> !identchar;
Where identchar was factored out of identifier like so:
auto identchar = x3::char_("a-zA-Z_0-9");
auto identifier
= x3::rule<struct id_, Ast::Identifier> {"identifier"}
= x3::lexeme[x3::char_("a-zA-Z_") >> *identchar];
However, this doesn't work. Can you see why (peeking allowed: https://godbolt.org/z/jb4zfhfWb)?
Our latest devious test case now passes (yay), but int j; is now rejected! If you think about it, that only makes sense: the skipper has already skipped the space after int by the time !identchar is checked, so the lookahead sees the j.
The essential word I used a moment ago was lexeme: you want to treat some units as lexemes (whitespace stops the lexeme; or rather, whitespace isn't automatically skipped inside the lexeme¹). So, a fix would be:
auto type_keyword
// ...
= x3::lexeme[type_ >> !identchar];
(Note how I sneakily already included that on the identifier rule earlier)
Lo and behold (Live):
OK "boolean b;" -> Leftover:"boolean b;"
OK "integer i;" -> Leftover:"integer i;"
OK "int j;" -> (TypeKeyword::Int j)
OK "boolean;" -> Leftover:"boolean;"
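To make the skipper interaction concrete without pulling in Boost, here is a minimal hand-rolled sketch. It is not X3 code: match_keyword, is_identchar and the skip_before_lookahead flag are hypothetical names that only imitate what happens to the "no identifier character follows" check when whitespace is, or is not, skipped before it.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: the same character class as identchar in the answer.
inline bool is_identchar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Tries to match `keyword` at the start of `input`.
// skip_before_lookahead == true imitates a skipper running between the
// keyword and the negative lookahead (the non-lexeme case); false imitates
// lexeme behaviour, where the lookahead inspects the very next character.
// Returns the number of characters consumed, or nullopt when rejected.
inline std::optional<std::size_t> match_keyword(std::string_view input,
                                                std::string_view keyword,
                                                bool skip_before_lookahead) {
    if (input.substr(0, keyword.size()) != keyword)
        return std::nullopt;                 // keyword text is not a prefix

    std::size_t look = keyword.size();       // position the lookahead inspects
    if (skip_before_lookahead) {
        while (look < input.size() &&
               std::isspace(static_cast<unsigned char>(input[look])))
            ++look;                          // whitespace skipped before the check
    }
    if (look < input.size() && is_identchar(input[look]))
        return std::nullopt;                 // negative lookahead fails
    return keyword.size();                   // only the keyword itself is consumed
}
```

Under these assumptions, match_keyword("int j;", "int", true) is rejected because the lookahead lands on j, whereas passing false accepts "int j;" and still rejects "integer i;" and "boolean;".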
Summarizing
This topic is a frequently recurring one, and it requires, first and foremost, a solid understanding of skippers and lexemes. Here are some other posts for inspiration:
Stop X3 symbols from matching substrings
parsing identifiers except keywords
Boost Spirit x3: parse delimited string, where I introduce a more general helper you might find useful:
auto kw = [](auto p) {
return x3::lexeme [ x3::as_parser(p) >> !x3::char_("a-zA-Z0-9_") ];
};
Dynamically switching symbol tables in x3
Good luck!
Complete Listing
Anti-Bitrot, the final listing:
#include <boost/fusion/adapted.hpp>
#include <boost/fusion/include/as_vector.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
#include <stdexcept>
#include <string>
#include <string_view>
namespace x3 = boost::spirit::x3;
namespace Ast {
enum class TypeKeyword { Int, Float, Bool };
static std::ostream& operator<<(std::ostream& os, TypeKeyword tk) {
switch (tk) {
case TypeKeyword::Int: return os << "TypeKeyword::Int";
case TypeKeyword::Float: return os << "TypeKeyword::Float";
case TypeKeyword::Bool: return os << "TypeKeyword::Bool";
}
return os << "?";
}
using Identifier = std::string;
struct Declaration {
TypeKeyword type;
Identifier id;
bool operator==(Declaration const&) const = default;
};
} // namespace Ast
BOOST_FUSION_ADAPT_STRUCT(Ast::Declaration, type, id)
namespace Ast{
static std::ostream& operator<<(std::ostream& os, Ast::Declaration const& d) {
return os << boost::lexical_cast<std::string>(boost::fusion::as_vector(d));
}
} // namespace Ast
namespace Parser {
struct Type : x3::symbols<Ast::TypeKeyword> {
Type() {
add("float", Ast::TypeKeyword::Float) //
("int", Ast::TypeKeyword::Int) //
("bool", Ast::TypeKeyword::Bool); //
}
} const static type_;
auto identchar = x3::char_("a-zA-Z_0-9");
auto identifier
= x3::rule<struct id_, Ast::Identifier> {"identifier"}
= x3::lexeme[x3::char_("a-zA-Z_") >> *identchar];
auto type_keyword
= x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
= x3::lexeme[type_ >> !identchar];
auto declaration
= x3::rule<struct decl_, Ast::Declaration>{"declaration"}
= type_keyword >> identifier >> ';';
} // namespace Parser
struct LeftoverError : std::runtime_error {
using std::runtime_error::runtime_error;
friend std::ostream& operator<<(std::ostream& os, LeftoverError const& e) {
return os << (std::string("Leftover:\"") + e.what() + "\"");
}
bool operator==(LeftoverError const& other) const {
return std::string_view(what()) == other.what();
}
};
template<typename T>
using Maybe = boost::variant<T, LeftoverError>;
template <typename Rule,
typename Attr = typename x3::traits::attribute_of<Rule, x3::unused_type>::type,
typename R = Maybe<Attr>>
R parse(std::string_view input, Rule const& rule) {
Attr result;
auto f = input.begin(), l = input.end();
return x3::phrase_parse(f, l, rule, x3::space, result)
? R(std::move(result))
: LeftoverError({f, l});
}
int main()
{
using namespace Ast;
struct {
std::string_view input;
Maybe<Declaration> expected;
} cases[] = {
{"boolean b;", LeftoverError("boolean b;")},
{"integer i;", LeftoverError("integer i;")},
{"int j;", Declaration{TypeKeyword::Int, "j"}},
{"boolean;", LeftoverError("boolean;")},
};
for (auto [input, expected] : cases) {
auto actual = parse(input, Parser::declaration >> x3::eoi);
bool ok = expected == actual;
std::cout << std::left << std::setw(6) << (ok ? "OK" : "FAIL")
<< std::setw(12) << std::quoted(input) << " -> "
<< std::setw(20) << actual;
if (not ok)
std::cout << " (expected " << expected << ")";
std::cout << "\n";
}
}
¹ see Boost spirit skipper issues

Boost.Spirit X3 compile time explodes with recursive rule

The following program takes 10s to compile. When I change the parenProcess rule below to '(' >> process >> ')' the compiler spends CPU but does not seem to finish. (I tried making a smaller reproducible program -- by removing rules between the process and parenProcess, but then the compile time no longer exploded).
How do I fix the compile (time) when embedding process instead?
(Minor other question: is there a nicer way to make rule 'x' and 'xActual'?)
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace wccs_parser {
namespace x3 = boost::spirit::x3;
namespace ascii = x3::ascii;
using x3::long_;
using x3::ulong_;
using x3::lexeme;
//--- Ast structures
struct AstChannel {
std::string label;
bool complement;
};
struct AstAction {
AstChannel channel;
uint32_t weight;
};
struct AstRenaming {
std::string from;
std::string to;
};
struct AstNullProcess;
struct AstActionPrefixProcess;
struct AstChoiceProcess;
struct AstCompositionProcess;
struct AstRestrictionProcess;
struct AstRenamingProcess;
struct AstConstantProcess;
using AstAnyProcess = x3::variant<
x3::forward_ast<AstNullProcess>,
x3::forward_ast<AstActionPrefixProcess>,
x3::forward_ast<AstChoiceProcess>,
x3::forward_ast<AstCompositionProcess>,
x3::forward_ast<AstRestrictionProcess>,
x3::forward_ast<AstRenamingProcess>,
x3::forward_ast<AstConstantProcess>
>;
struct AstNullProcess {};
struct AstActionPrefixProcess {
AstAction action;
AstAnyProcess subProcess;
};
struct AstChoiceProcess {
std::vector<AstAnyProcess> subProcesses;
};
struct AstCompositionProcess {
std::vector<AstAnyProcess> subProcesses;
};
struct AstRestrictionProcess {
AstAnyProcess subProcess;
std::vector<std::string> labels;
};
struct AstRenamingProcess {
AstAnyProcess subProcess;
std::vector<AstRenaming> renamings;
};
struct AstConstantProcess {
std::string processName;
};
} // End namespace
BOOST_FUSION_ADAPT_STRUCT(wccs_parser::AstChannel, label, complement)
BOOST_FUSION_ADAPT_STRUCT(wccs_parser::AstAction, channel, weight)
BOOST_FUSION_ADAPT_STRUCT(wccs_parser::AstRenaming, from, to)
BOOST_FUSION_ADAPT_STRUCT(wccs_parser::AstActionPrefixProcess, action, subProcess)
BOOST_FUSION_ADAPT_STRUCT(wccs_parser::AstChoiceProcess, subProcesses)
BOOST_FUSION_ADAPT_STRUCT(wccs_parser::AstCompositionProcess, subProcesses)
BOOST_FUSION_ADAPT_STRUCT(wccs_parser::AstRestrictionProcess, subProcess, labels)
BOOST_FUSION_ADAPT_STRUCT(wccs_parser::AstRenamingProcess, subProcess, renamings)
BOOST_FUSION_ADAPT_STRUCT(wccs_parser::AstConstantProcess, processName)
namespace wccs_parser {
//--- Rules
auto const constantName = x3::rule<struct constantRule, std::string> {"constantName"} =
x3::lexeme[ascii::upper >> *(ascii::alnum)];
auto const label = x3::rule<struct labelRule, std::string> {"label"} =
x3::lexeme[ascii::lower >> *(ascii::alnum)];
auto const channel = x3::rule<struct channelRule, AstChannel> {"channel"} =
label >> x3::matches['!'];
auto const action = x3::rule<struct actionRule, AstAction> {"action"} =
'<' >> channel >> ',' >> ulong_ >> '>';
auto renamingPair = x3::rule<struct renamingPairRule, AstRenaming> {"renamingPair"} =
label > "=>" > label;
x3::rule<struct processRule, AstAnyProcess> process{"process"};
auto const nullProcess = x3::rule<struct nullProcessRule, AstNullProcess> {"nullProcess"} = '0' >> x3::attr(AstNullProcess());
auto const constant = x3::rule<struct constantRule, AstConstantProcess> {"constant"} = constantName;
/// HERE:
auto const parenProcess = '(' > nullProcess > ')';
auto const primitive = x3::rule<struct primitiveRule, AstAnyProcess> {"primitive"} =
parenProcess
| nullProcess
| constant;
auto const restrictionActual = x3::rule<struct restrictionActual, AstRestrictionProcess> {"restrictionActual"} =
primitive >> '\\' >> '{' >> label % ',' >> '}';
auto const restriction = x3::rule<struct restrictionRule, AstAnyProcess> {"restriction"} =
primitive >> !x3::lit('\\')
| restrictionActual;
auto const renamingActual = x3::rule<struct renamingActualRule, AstRenamingProcess> {"renamingActual"} =
restriction >> '[' >> renamingPair % ',' >> ']';
auto const renaming = x3::rule<struct renamingRule, AstAnyProcess> {"renaming"} =
restriction >> !x3::lit('[')
| renamingActual;
x3::rule<struct actionPrefixingRule, AstAnyProcess> actionPrefix{"actionPrefix"};
auto const actionPrefixActual = x3::rule<struct actionPrefixActualRule, AstActionPrefixProcess> {"actionPrefixActual"} =
action > ('.' > actionPrefix);
auto const actionPrefix_def =
actionPrefixActual
| renaming;
BOOST_SPIRIT_DEFINE(actionPrefix)
auto const compositionActual = x3::rule<struct choiceActualrule, AstCompositionProcess> {"compositionActual"} =
actionPrefix % '|';
auto const composition = x3::rule<struct compositionRule, AstAnyProcess> {"composition"} =
actionPrefix >> !x3::lit('|')
| compositionActual;
auto const choiceActual = x3::rule<struct choiceActualrule, AstChoiceProcess> {"choiceActual"} =
composition % '+';
auto const choice = x3::rule<struct choiceRule, AstAnyProcess> {"choice"} =
composition >> !x3::lit('+')
| choiceActual;
auto const process_def = choice;
BOOST_SPIRIT_DEFINE(process)
auto const entry = x3::skip(ascii::space) [process];
} //End namespace
int main() {
std::string str("0 + (0)");
wccs_parser::AstAnyProcess root;
auto iter = str.begin();
auto end = str.end();
bool r = parse(iter, end, wccs_parser::entry, root);
if (r) {
std::cout << str << std::endl << std::endl << " Parses OK: " << std::endl;
}
else {
std::cout << "Parsing failed\n";
}
if (iter != end) std::cout << "Partial match" << std::endl;
return 0;
}
This is a known problem. CppEvans (?) on the mailing list claims to have a workaround on a branch, but that branch is far behind and the changes very intrusive, so I can't vet it/vouch for it.
So, the right recourse would be to post on the mailing list in a bid to get the main developer(s) involved, and to raise awareness of this show-stopping issue.
Regardless, without changing the behaviour of your code, you can use a shorthand:
template <typename T> auto rule = [](const char* name = typeid(T).name()) {
struct _{};
return x3::rule<_, T> {name};
};
template <typename T> auto as = [](auto p) { return rule<T>() = p; };
This will make it much more convenient to write the repetitive Ast coercions:
auto constantName = as<std::string>(x3::lexeme[ascii::upper >> *(ascii::alnum)]);
auto label = as<std::string>(x3::lexeme[ascii::lower >> *(ascii::alnum)]);
auto channel = as<AstChannel>(label >> x3::matches['!']);
auto action = as<AstAction>('<' >> channel >> ',' >> x3::ulong_ >> '>');
auto renamingPair = as<AstRenaming>(label > "=>" > label);
auto nullProcess = as<AstNullProcess>(x3::omit['0']);
auto constant = as<AstConstantProcess>(constantName);
auto parenProcess = '(' > nullProcess > ')';
auto primitive = rule<AstAnyProcess> ("primitive")
= parenProcess
| nullProcess
| constant;
auto restrictionActual = as<AstRestrictionProcess>(primitive >> '\\' >> '{' >> label % ',' >> '}');
auto restriction = rule<AstAnyProcess> ("restriction")
= primitive >> !x3::lit('\\')
| restrictionActual
;
auto renamingActual = as<AstRenamingProcess>(restriction >> '[' >> renamingPair % ',' >> ']');
auto renaming = rule<AstAnyProcess> ("renaming")
= restriction >> !x3::lit('[')
| renamingActual
;
auto actionPrefixActual = as<AstActionPrefixProcess>(action > ('.' > actionPrefix));
auto actionPrefix_def = actionPrefixActual | renaming;
auto compositionActual = as<AstCompositionProcess>(actionPrefix % '|');
auto composition = rule<AstAnyProcess> ("composition")
= actionPrefix >> !x3::lit('|')
| compositionActual
;
auto choiceActual = as<AstChoiceProcess>(composition % '+');
auto choice = rule<AstAnyProcess> ("choice")
= composition >> !x3::lit('+')
| choiceActual
;
auto process_def = choice;
BOOST_SPIRIT_DEFINE(actionPrefix, process)
auto const entry = x3::skip(ascii::space) [process];
The program still runs with the same output.
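As a side note on the rule<T> shorthand, the following Boost-free sketch shows how a struct declared inside a variable-template lambda yields one tag type per attribute type. Rule, make_rule and same_tag are hypothetical stand-ins, not X3 types; the sketch demonstrates only the type identity, not any consequence inside X3.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for x3::rule<ID, Attribute>; illustration only.
template <typename Tag, typename Attr>
struct Rule {
    using tag = Tag;
    using attribute_type = Attr;
    const char* name;
};

// Same shape as the shorthand: each instantiation of the variable template
// has its own closure type, and therefore its own local Tag struct.
template <typename T>
auto make_rule = [](const char* name = "unnamed") {
    struct Tag {};
    return Rule<Tag, T>{name};
};

// True when two rule objects carry the same tag type.
template <typename A, typename B>
constexpr bool same_tag(A const&, B const&) {
    return std::is_same_v<typename A::tag, typename B::tag>;
}
```

In this stand-in, make_rule<int>("a") and make_rule<std::string>("b") get different tags, while two rules made for the same T share one tag regardless of name. That is worth keeping in mind wherever a tag is expected to identify a single rule.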