I have a quite simple grammar I try to implement using boost spirit x3, without success.
It does not compile, and due to all the templates and complex concepts used in the library (I know, it is rather a "header"), the compilation error message is way too long to be intelligible.
I tried to comment part of the code in order narrow down the culprit, without success as it comes down to several parts, for which I don't see any error anyway.
Edit2: the first error message is in indeed in push_front_impl.hpp highlighting that:
::REQUESTED_PUSH_FRONT_SPECIALISATION_FOR_SEQUENCE_DOES_NOT_EXIST::*
I suspect the keyword auto or maybe the p2 statement with ulong_long...but with no faith.
Need the help of you guys...spirit's elites !
Below a minimal code snippet reproducing the compilation error.
Edit: using boost 1.70 and visual studio 2019 v16.1.6
#include <string>
#include <iostream>
#include "boost/spirit/home/x3.hpp"
#include "boost/spirit/include/support_istream_iterator.hpp"
int main(void)
{
std::string input = \
"\"nodes\":{ {\"type\":\"bb\", \"id\" : 123456567, \"label\" : \"0x12023049\"}," \
"{\"type\":\"bb\", \"id\" : 123123123, \"label\" : \"0x01223234\"}," \
"{\"type\":\"ib\", \"id\" : 223092343, \"label\" : \"0x03020343\"}}";
std::istringstream iss(input);
namespace x3 = boost::spirit::x3;
using x3::char_;
using x3::ulong_long;
using x3::lit;
auto q = lit('\"'); /* q => quote */
auto p1 = q >> lit("type") >> q >> lit(':') >> q >> (lit("bb") | lit("ib")) >> q;
auto p2 = q >> lit("id") >> q >> lit(':') >> ulong_long;
auto p3 = q >> lit("label") >> q >> lit(':') >> q >> (+x3::alpha) >> q;
auto node = lit('{') >> p1 >> lit(',') >> p2 >> lit(',') >> p3 >> lit('}');
auto nodes = q >> lit("nodes") >> q >> lit(':') >> lit('{') >> node % lit(',') >> lit('}');
boost::spirit::istream_iterator f(iss >> std::noskipws), l{};
bool b = x3::phrase_parse(f, l, nodes, x3::space);
return 0;
}
It is an known MPL limitation (Issue with X3 and MS VS2017, https://github.com/boostorg/spirit/issues/515) + bug/difference of implementation for MSVC/ICC compilers (https://github.com/boostorg/mpl/issues/43).
I rewrote an offending part without using MPL (https://github.com/boostorg/spirit/pull/607), it will be released in Boost 1.74, until then you should be able to workaround with:
#define BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS
#define BOOST_MPL_LIMIT_VECTOR_SIZE 50
Alternatively you could wrap different parts of your grammar into rules, what will reduce sequence parser chain.
Note that q >> lit("x") >> q >> lit(':') >> ... probably is not what you really want, it (with a skipper) will allow " x ": to be parsed. If you do not want that use simply lit("\"x\"") >> lit(':') >> ...
There's a chance that there might be a missing indirect include for your specific platform/version (if I had to guess it might be caused by using the istream iterator support header from Qi).
If that's not the issue, my attention is drawn by the where T = boost::mpl::aux::vector_tag<20> (/HT #Rup - number 20 seems suspiciously like it might be some kind of limit.
Either we can find what trips the limit and see if we can raise it, but I'll do the "unscientific" approach in the interest of helping you along with the parser.
Simplifying The Expressions
I see a lot (lot) of lit() nodes in your parser expressions that you don't need. I suspect all the quoted constructs need to be lexemes, and instead of painstakingly repeating the quote symbol all over the place, perhaps package it as follows:
auto q = [](auto p) { return x3::lexeme['"' >> x3::as_parser(p) >> '"']; };
auto type = q("type") >> ':' >> q(bb_ib);
auto id = q("id") >> ':' >> x3::ulong_long;
auto label = q("label") >> ':' >> q(+x3::alnum);
Notes:
I improved the naming so it's more natural to read:
auto node = '{' >> type >> ',' >> id >> ',' >> label >> '}';
I changed alpha to alnum so it would actually match your sample input
Hypothesis: The expressions are structurally simplified to be more hierarchical - the sequences consist of fewer >>-ed terms - the hope is that this removes a potential mpl::vector size limit.
There's one missing piece, bb_ib that I left out because it changes when you want to actually assign parsed values to attributes. Let's do that:
Attributes
struct Node {
enum Type { bb, ib } type;
uint64_t id;
std::string label;
};
As you can see I opted for an enum to represent type. The most natural way to parse that would be using symbols<>
struct bb_ib_sym : x3::symbols<Node::Type> {
bb_ib_sym() { this->add("bb", Node::bb)("ib", Node::ib); }
} bb_ib;
Now you can parse into a vector of Node:
Demo
Live On Coliru
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
struct Node {
enum Type { bb, ib } type;
uint64_t id;
std::string label;
};
namespace { // debug output
inline std::ostream& operator<<(std::ostream& os, Node::Type t) {
switch (t) {
case Node::bb: return os << "bb";
case Node::ib: return os << "ib";
}
return os << "?";
}
inline std::ostream& operator<<(std::ostream& os, Node const& n) {
return os << "Node{" << n.type << ", " << n.id << ", " << std::quoted(n.label) << "}";
}
}
// attribute propagation
BOOST_FUSION_ADAPT_STRUCT(Node, type, id, label)
int main() {
std::string input = R"("nodes": {
{
"type": "bb",
"id": 123456567,
"label": "0x12023049"
},
{
"type": "bb",
"id": 123123123,
"label": "0x01223234"
},
{
"type": "ib",
"id": 223092343,
"label": "0x03020343"
}
})";
namespace x3 = boost::spirit::x3;
struct bb_ib_sym : x3::symbols<Node::Type> {
bb_ib_sym() { this->add("bb", Node::bb)("ib", Node::ib); }
} bb_ib;
auto q = [](auto p) { return x3::lexeme['"' >> x3::as_parser(p) >> '"']; };
auto type = q("type") >> ':' >> q(bb_ib);
auto id = q("id") >> ':' >> x3::ulong_long;
auto label = q("label") >> ':' >> q(+x3::alnum);
auto node
= x3::rule<Node, Node> {"node"}
= '{' >> type >> ',' >> id >> ',' >> label >> '}';
auto nodes = q("nodes") >> ':' >> '{' >> node % ',' >> '}';
std::vector<Node> parsed;
auto f = begin(input);
auto l = end(input);
if (x3::phrase_parse(f, l, nodes, x3::space, parsed)) {
for (Node& node : parsed) {
std::cout << node << "\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f!=l) {
std::cout << "Remaining input: " << std::quoted(std::string(f, l)) << "\n";
}
}
Prints
Node{bb, 123456567, "0x12023049"}
Node{bb, 123123123, "0x01223234"}
Node{ib, 223092343, "0x03020343"}
Related
I want to parse header columns of a text file. The column names should be allowed to be quoted and any case of letters. Currently I am using the following grammar:
#include <string>
#include <iostream>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
template <typename Iterator, typename Skipper>
struct Grammar : qi::grammar<Iterator, void(), Skipper>
{
static constexpr char colsep = '|';
Grammar() : Grammar::base_type(header)
{
using namespace qi;
using ascii::char_;
#define COL(name) (no_case[name] | ('"' >> no_case[name] >> '"'))
header = (COL("columna") | COL("column_a")) >> colsep >>
(COL("columnb") | COL("column_b")) >> colsep >>
(COL("columnc") | COL("column_c")) >> eol >> eoi;
#undef COL
}
qi::rule<Iterator, void(), Skipper> header;
};
int main()
{
const std::string s{"columnA|column_B|column_c\n"};
auto begin(std::begin(s)), end(std::end(s));
Grammar<std::string::const_iterator, qi::blank_type> p;
bool ok = qi::phrase_parse(begin, end, p, qi::blank);
if (ok && begin == end)
std::cout << "Header ok" << std::endl;
else if (ok && begin != end)
std::cout << "Remaining unparsed: '" << std::string(begin, end) << "'" << std::endl;
else
std::cout << "Parse failed" << std::endl;
return 0;
}
Is this possible without the use of a macro? Further I would like to ignore any underscores at all. Can this be achieved with a custom skipper? In the end it would be ideal if one could write:
header = col("columna") >> colsep >> col("columnb") >> colsep >> column("columnc") >> eol >> eoi;
where col would be an appropriate grammar or rule.
#sehe how can I fix this grammar to support "\"Column_A\"" as well? 6 hours ago
By this time you should probably have realized that there's two different things going on here.
Separate Yo Concerns
On the one hand you have a grammar (that allows |-separated columns like columna or "Column_A").
On the other hand you have semantic analysis (the phase where you check that the parsed contents match certain criteria).
The thing that is making your life hard is trying to conflate the two. Now, don't get me wrong, there could be (very rare) circumstances where fusing those responsibilities together is absolutely required - but I feel that would always be an optimization. If you need that, Spirit is not your thing, and you're much more likely to be served with a handwritten parser.
Parsing
So let's get brain-dead simple about the grammar:
static auto headers = (quoted|bare) % '|' > (eol|eoi);
The bare and quoted rules can be pretty much the same as before:
static auto quoted = lexeme['"' >> *('\\' >> char_ | "\"\"" >> attr('"') | ~char_('"')) >> '"'];
static auto bare = *(graph - '|');
As you can see this will implicitly take care of quoting and escaping as well whitespace skipping outside lexemes. When applied simply, it will result in a clean list of column names:
std::string const s = "\"columnA\"|column_B| column_c \n";
std::vector<std::string> headers;
bool ok = phrase_parse(begin(s), end(s), Grammar::headers, x3::blank, headers);
std::cout << "Parse " << (ok?"ok":"invalid") << std::endl;
if (ok) for(auto& col : headers) {
std::cout << std::quoted(col) << "\n";
}
Prints Live On Coliru
Parse ok
"columnA"
"column_B"
"column_c"
INTERMEZZO: Coding Style
Let's structure our code so that the separation of concerns is reflected. Our parsing code might use X3, but our validation code doesn't need to be in the same translation unit (cpp file).
Have a header defining some basic types:
#include <string>
#include <vector>
using Header = std::string;
using Headers = std::vector<Header>;
Define the operations we want to perform on them:
Headers parse_headers(std::string const& input);
bool header_match(Header const& actual, Header const& expected);
bool headers_match(Headers const& actual, Headers const& expected);
Now, main can be rewritten as just:
auto headers = parse_headers("\"columnA\"|column_B| column_c \n");
for(auto& col : headers) {
std::cout << std::quoted(col) << "\n";
}
bool valid = headers_match(headers, {"columna","columnb","columnc"});
std::cout << "Validation " << (valid?"passed":"failed") << "\n";
And e.g. a parse_headers.cpp could contain:
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
namespace Grammar {
using namespace x3;
static auto quoted = lexeme['"' >> *('\\' >> char_ | "\"\"" >> attr('"') | ~char_('"')) >> '"'];
static auto bare = *(graph - '|');
static auto headers = (quoted|bare) % '|' > (eol|eoi);
}
Headers parse_headers(std::string const& input) {
Headers output;
if (phrase_parse(begin(input), end(input), Grammar::headers, x3::blank, output))
return output;
return {}; // or throw, if you prefer
}
Validating
This is what is known as "semantic checks". You take the vector of strings and check them according to your logic:
#include <boost/range/adaptors.hpp>
#include <boost/algorithm/string.hpp>
bool header_match(Header const& actual, Header const& expected) {
using namespace boost::adaptors;
auto significant = [](unsigned char ch) {
return ch != '_' && std::isgraph(ch);
};
return boost::algorithm::iequals(actual | filtered(significant), expected);
}
bool headers_match(Headers const& actual, Headers const& expected) {
return boost::equal(actual, expected, header_match);
}
That's all. All the power of algorithms and modern C++ at your disposal, no need to fight with constraints due to parsing context.
Full Demo
The above, Live On Wandbox
Both parts got significantly simpler:
your parser doesn't have to deal with quirky comparison logic
your comparison logic doesn't have to deal with grammar concerns (quotes, escapes, delimiters and whitespace)
The following Spirit x3 grammar for a simple robot command language generates compiler errors in Windows Visual Studio 17. For this project, I am required to compile with the warning level to 4 (/W4) and treat warnings as errors (/WX).
Warning C4127 conditional expression is
constant SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\char\detail\cast_char.hpp 29
Error C2039 'insert': is not a member of
'boost::spirit::x3::unused_type' SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\core\detail\parse_into_container.hpp 259 Error C2039 'end': is not a member of
'boost::spirit::x3::unused_type' SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\core\detail\parse_into_container.hpp 259 Error C2039 'empty': is not a member of
'boost::spirit::x3::unused_type' SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\core\detail\parse_into_container.hpp 254 Error C2039 'begin': is not a member of
'boost::spirit::x3::unused_type' SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\core\detail\parse_into_container.hpp 259
Clearly, something is wrong with my grammar, but the error messages are completely unhelpful. I have found that if I remove the Kleene star in the last line of the grammar (*parameter to just parameter) the errors disappear, but then I get lots of warnings like this:
Warning C4459 declaration of 'digit' hides global
declaration SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\support\numeric_utils\detail\extract_int.hpp 174
Warning C4127 conditional expression is constant SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\char\detail\cast_char.hpp 29
#include <string>
#include <iostream>
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
//
// Grammar for simple command language
//
namespace scl
{
using boost::spirit::x3::char_;
using boost::spirit::x3::double_;
using boost::spirit::x3::int_;
using boost::spirit::x3::lexeme;
using boost::spirit::x3::lit;
using boost::spirit::x3::no_case;
auto valid_identifier_chars = char_ ("a-zA-Z_");
auto quoted_string = '"' >> *(lexeme [~char_ ('"')]) >> '"';
auto keyword_value_chars = char_ ("a-zA-Z0-9$_.");
auto qual = lexeme [!(no_case [lit ("no")]) >> +valid_identifier_chars] >> -('=' >> (quoted_string | int_ | double_ | +keyword_value_chars));
auto neg_qual = lexeme [no_case [lit ("no")] >> +valid_identifier_chars];
auto qualifier = lexeme ['/' >> (qual | neg_qual)];
auto verb = +valid_identifier_chars >> *qualifier;
auto parameter = +keyword_value_chars >> *qualifier;
auto command = verb >> *parameter;
}; // End namespace scl
using namespace std; // Must be after Boost stuff!
int
main ()
{
vector <string> input =
{
"show/out=\"somefile.txt\" motors/all cameras/full",
"start/speed=5 motors arm1 arm2/speed=2.5/track arm3",
"rotate camera1/notrack/axis=y/angle=45"
};
//
// Parse each of the strings in the input vector
//
for (string str : input)
{
auto b = str.begin ();
auto e = str.end ();
cout << "Parsing: " << str << endl;
x3::phrase_parse (b, e, scl::command, x3::space);
if (b != e)
{
cout << "Error, only parsed to position: " << b - str.begin () << endl;
}
} // End for
return 0;
} // End main
There is a regression since Boost 1.65 that causes problems with some rules that potentially propagate into container type attributes.
They dispatch to the wrong overload when instantiated without an actual bound attribute. When this happens there is a "mock" attribute type called unused_type. The errors you are seeing indicate that unused_type is being treated as if it were a concrete attribute type, and clearly that won't fly.
The regression was fixed in https://github.com/boostorg/spirit/commit/ee4943d5891bdae0706fb616b908e3bf528e0dfa
You can see that it's a regression by compiling with Boost 1.64:
Boost 1.64 compiles it fine GCC
and Clang
Boost 1.65 breaks it GCC and Clang again
Now, latest develop is supposed to fix it, but you can simply copy the patched file, even just the 7-line patch.
All of the above was already available when I linked the duplicate question How to make a recursive rule in boost spirit x3 in VS2017, which highlights the same regression
Review
using namespace std; // Must be after Boost stuff!
Actually, it probably needs to be nowhere unless very locally scoped, where you can see the impact of any potential name colisions.
Consider encapsulating the skipper, since it's likely logically part of your grammar spec, not something to be overridden by the caller.
This is a bug:
auto quoted_string = '"' >> *(lexeme[~char_('"')]) >> '"';
You probably meant to assert the whole literal is lexeme, not individual characters (that's... moot because whitespace would never hit the parser anyways, because of the skipper).
auto quoted_string = lexeme['"' >> *~char_('"') >> '"'];
Likewise, you might have intended +keyword_value_chars to be lexeme, because right now one=two three four would parse the "qualifier" one with a "keyword value" of onethreefour, not one three four¹
x3::space skips embedded newlines, if that's not the intent, use x3::blank
Since PEG grammars are parsed left-to-right greedy, you can order the qualifier production and do without the !(no_case["no"]) lookahead assertion. That not only removes duplication but also makes the grammar simpler and more efficient:
auto qual = lexeme[+valid_identifier_chars] >>
-('=' >> (quoted_string | int_ | double_ | +keyword_value_chars)); // TODO lexeme
auto neg_qual = lexeme[no_case["no"] >> +valid_identifier_chars];
auto qualifier = lexeme['/' >> (neg_qual | qual)];
¹ Note (Post-Scriptum) now that we notice qualifier is, itself, already a lexeme, there's no need to lexeme[] things inside (unless, of course they're reused in contexts with skippers).
However, this also gives rise to the question whether whitespace around the = operator should be accepted (currently, it is not), or whether qualifiers can be separated with whitespace (like id /a /b; currently they can).
Perhaps verb needed some lexemes[] as well (unless you really did want to parse "one two three" as a verb)
If no prefix for negative qualifiers, then maybe the identifier itself is, too? This could simplify the grammar
The ordering of int_ and double_ makes it so that most doubles are mis-parsed as int before they could ever be recognized. Consider something more explicit like x3::strict_real_policies<double>>{} | int_
If you're parsing quoted constructs, perhaps you want to recognize escapes too ('\"' and '\\' for example):
auto quoted_string = lexeme['"' >> *('\\' >> char_ | ~char_('"')) >> '"'];
If you have a need for "keyword values" consider listing known values in x3::symbols<>. This can also be used to parse directly into an enum type.
Here's a version that parses into AST types and prints it back for demonstration purposes:
Live On Coliru
#include <boost/config/warning_disable.hpp>
#include <string>
#include <vector>
#include <boost/variant.hpp>
namespace Ast {
struct Keyword : std::string { // needs to be strong-typed to distinguish from quoted values
using std::string::string;
using std::string::operator=;
};
struct Nil {};
using Value = boost::variant<Nil, std::string, int, double, Keyword>;
struct Qualifier {
enum Kind { positive, negative } kind;
std::string identifier;
Value value;
};
struct Param {
Keyword keyword;
std::vector<Qualifier> qualifiers;
};
struct Command {
std::string verb;
std::vector<Qualifier> qualifiers;
std::vector<Param> params;
};
}
#include <boost/fusion/adapted/struct.hpp>
BOOST_FUSION_ADAPT_STRUCT(Ast::Qualifier, kind, identifier, value)
BOOST_FUSION_ADAPT_STRUCT(Ast::Param, keyword, qualifiers)
BOOST_FUSION_ADAPT_STRUCT(Ast::Command, verb, qualifiers, params)
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
namespace scl {
//
// Grammar for simple command language
//
using x3::char_;
using x3::int_;
using x3::lexeme;
using x3::no_case;
// lexeme tokens
auto keyword = x3::rule<struct _keyword, Ast::Keyword> { "keyword" }
= lexeme [ +char_("a-zA-Z0-9$_.") ];
auto identifier = lexeme [ +char_("a-zA-Z_") ];
auto quoted_string = lexeme['"' >> *('\\' >> x3::char_ | ~x3::char_('"')) >> '"'];
auto value
= quoted_string
| x3::real_parser<double, x3::strict_real_policies<double>>{}
| x3::int_
| keyword;
auto qual
= x3::attr(Ast::Qualifier::positive) >> identifier >> -('=' >> value);
auto neg_qual
= x3::attr(Ast::Qualifier::negative) >> lexeme[no_case["no"] >> identifier] >> x3::attr(Ast::Nil{}); // never a value
auto qualifier
= lexeme['/' >> (neg_qual | qual)];
auto verb
= identifier;
auto parameter = x3::rule<struct _parameter, Ast::Param> {"parameter"}
= keyword >> *qualifier;
auto command = x3::rule<struct _command, Ast::Command> {"command"}
= x3::skip(x3::space) [ verb >> *qualifier >> *parameter ];
} // End namespace scl
// For Demo, Debug: printing the Ast types back
#include <iostream>
#include <iomanip>
namespace Ast {
static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
struct {
std::ostream& _os;
void operator()(std::string const& s) const { _os << std::quoted(s); }
void operator()(int i) const { _os << i; }
void operator()(double d) const { _os << d; }
void operator()(Keyword const& kwv) const { _os << kwv; }
void operator()(Nil) const { }
} vis{os};
boost::apply_visitor(vis, v);
return os;
}
static inline std::ostream& operator<<(std::ostream& os, Qualifier const& q) {
os << "/" << (q.kind==Qualifier::negative?"no":"") << q.identifier;
if (q.value.which())
os << "=" << q.value;
return os;
}
static inline std::ostream& operator<<(std::ostream& os, std::vector<Qualifier> const& qualifiers) {
for (auto& qualifier : qualifiers)
os << qualifier;
return os;
}
static inline std::ostream& operator<<(std::ostream& os, Param const& p) {
return os << p.keyword << p.qualifiers;
}
static inline std::ostream& operator<<(std::ostream& os, Command const& cmd) {
os << cmd.verb << cmd.qualifiers;
for (auto& param : cmd.params) os << " " << param;
return os;
}
}
int main() {
for (std::string const str : {
"show/out=\"somefile.txt\" motors/all cameras/full",
"start/speed=5 motors arm1 arm2/speed=2.5/track arm3",
"rotate camera1/notrack/axis=y/angle=45",
})
{
auto b = str.begin(), e = str.end();
Ast::Command cmd;
bool ok = parse(b, e, scl::command, cmd);
std::cout << (ok?"OK":"FAIL") << '\t' << std::quoted(str) << '\n';
if (ok) {
std::cout << " -- Full AST: " << cmd << "\n";
std::cout << " -- Verb+Qualifiers: " << cmd.verb << cmd.qualifiers << "\n";
for (auto& param : cmd.params)
std::cout << " -- Param+Qualifiers: " << param << "\n";
}
if (b != e) {
std::cout << " -- Remaining unparsed: " << std::quoted(std::string(b,e)) << "\n";
}
}
}
Prints
OK "show/out=\"somefile.txt\" motors/all cameras/full"
-- Full AST: show/out="somefile.txt" motors/all cameras/full
-- Verb+Qualifiers: show/out="somefile.txt"
-- Param+Qualifiers: motors/all
-- Param+Qualifiers: cameras/full
OK "start/speed=5 motors arm1 arm2/speed=2.5/track arm3"
-- Full AST: start/speed=5 motors arm1 arm2/speed=2.5/track arm3
-- Verb+Qualifiers: start/speed=5
-- Param+Qualifiers: motors
-- Param+Qualifiers: arm1
-- Param+Qualifiers: arm2/speed=2.5/track
-- Param+Qualifiers: arm3
OK "rotate camera1/notrack/axis=y/angle=45"
-- Full AST: rotate camera1/notrack/axis=y/angle=45
-- Verb+Qualifiers: rotate
-- Param+Qualifiers: camera1/notrack/axis=y/angle=45
For completeness
Demo also Live On MSVC (Rextester) - note that RexTester uses Boost 1.60
Coliru uses Boost 1.66 but the problem doesn't manifest itself because now, there are concrete attribute values bound to parsers
Think about a preprocessor which will read the raw text (no significant white space or tokens).
There are 3 rules.
resolve_para_entry should solve the Argument inside a call. The top-level text is returned as string.
resolve_para should resolve the whole Parameter list and put all the top-level Parameter in a string list.
resolve is the entry
On the way I track the iterator and get the text portion
Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont take the Parser to step outside ..
Rules:
resolve_para_entry = +(
(iter_pos >> lit('(') >> (resolve_para_entry | eps) >> lit(')') >> iter_pos) [_val= phoenix::bind(&appendString, _val, _1,_3)]
| (!lit(',') >> !lit(')') >> !lit('(') >> (wide::char_ | wide::space)) [_val = phoenix::bind(&appendChar, _val, _1)]
);
resolve_para = (lit('(') >> lit(')'))[_val = std::vector<std::wstring>()] // empty para -> old style
| (lit('(') >> resolve_para_entry >> *(lit(',') >> resolve_para_entry) > lit(')'))[_val = phoenix::bind(&appendStringList, _val, _1, _2)]
| eps;
;
resolve = (iter_pos >> name_valid >> iter_pos >> resolve_para >> iter_pos);
In the end doesn't seem very elegant. Maybe there is a better way to parse such stuff without skipper
Indeed this should be a lot simpler.
First off, I fail to see why the absense of a skipper is at all relevant.
Second, exposing the raw input is best done using qi::raw[] instead of dancing with iter_pos and clumsy semantic actions¹.
Among the other observations I see:
negating a charset is done with ~, so e.g. ~char_(",()")
(p|eps) would be better spelled -p
(lit('(') >> lit(')')) could be just "()" (after all, there's no skipper, right)
p >> *(',' >> p) is equivalent to p % ','
With the above, resolve_para simplifies to this:
resolve_para = '(' >> -(resolve_para_entry % ',') >> ')';
resolve_para_entry seems weird, to me. It appears that any nested parentheses are simply swallowed. Why not actually parse a recursive grammar so you detect syntax errors?
Here's my take on it:
Define An AST
I prefer to make this the first step because it helps me think about the parser productions:
namespace Ast {
using ArgList = std::list<std::string>;
struct Resolve {
std::string name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
Creating The Grammar Rules
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, std::string()> arg, identifier;
And their definitions:
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
Notes:
No more semantic actions
No more eps
No more iter_pos
I've opted to make arglist not-optional. If you really wanted that, change it back:
resolve = identifier >> -arglist;
But in our sample it will generate a lot of noisy output.
Of course your entry point (start) will be different. I just did the simplest thing that could possibly work, using another handy parser directive from the Spirit Repository (like iter_pos that you were already using): seek[]
The hold is there for this reason: boost::spirit::qi duplicate parsing on the output - You might not need it in your actual parser.
Live On Coliru
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
namespace Ast {
using ArgList = std::list<std::string>;
struct Resolve {
std::string name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
Parser() : Parser::base_type(start) {
using namespace qi;
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
}
private:
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, std::string()> arg, identifier;
};
#include <iostream>
int main() {
using It = std::string::const_iterator;
std::string const samples = R"--(
Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
It f = samples.begin(), l = samples.end();
Ast::Resolves data;
if (parse(f, l, Parser<It>{}, data)) {
std::cout << "Parsed " << data.size() << " resolves\n";
} else {
std::cout << "Parsing failed\n";
}
for (auto& resolve: data) {
std::cout << " - " << resolve.name << "\n (\n";
for (auto& arg : resolve.arglist) {
std::cout << " " << arg << "\n";
}
std::cout << " )\n";
}
}
Prints
Parsed 6 resolves
- sometext
(
para
)
- sometext
(
para1
para2
)
- sometext
(
call(a)
)
- call
(
a
)
- call
(
a
b
)
- lit
(
'
'
)
More Ideas
That last output shows you a problem with your current grammar: lit(',') should obviously not be seen as a call with two parameters.
I recently did an answer on extracting (nested) function calls with parameters which does things more neatly:
Boost spirit parse rule is not applied
or this one boost spirit reporting semantic error
BONUS
Bonus version that uses string_view and also shows exact line/column information of all extracted words.
Note that it still doesn't require any phoenix or semantic actions. Instead it simply defines the necesary trait to assign to boost::string_view from an iterator range.
Live On Coliru
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
namespace Ast {
using Source = boost::string_view;
using ArgList = std::list<Source>;
struct Resolve {
Source name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static void call(It f, It l, boost::string_view& attr) {
attr = boost::string_view { f.base(), size_t(std::distance(f.base(),l.base())) };
}
};
} } }
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
Parser() : Parser::base_type(start) {
using namespace qi;
identifier = raw [ char_("a-zA-Z_") >> *char_("a-zA-Z0-9_") ];
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
}
private:
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, Ast::Source()> arg, identifier;
};
#include <iostream>
struct Annotator {
using Ref = boost::string_view;
struct Manip {
Ref fragment, context;
friend std::ostream& operator<<(std::ostream& os, Manip const& m) {
return os << "[" << m.fragment << " at line:" << m.line() << " col:" << m.column() << "]";
}
size_t line() const {
return 1 + std::count(context.begin(), fragment.begin(), '\n');
}
size_t column() const {
return 1 + (fragment.begin() - start_of_line().begin());
}
Ref start_of_line() const {
return context.substr(context.substr(0, fragment.begin()-context.begin()).find_last_of('\n') + 1);
}
};
Ref context;
Manip operator()(Ref what) const { return {what, context}; }
};
int main() {
using It = std::string::const_iterator;
std::string const samples = R"--(Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
It f = samples.begin(), l = samples.end();
Ast::Resolves data;
if (parse(f, l, Parser<It>{}, data)) {
std::cout << "Parsed " << data.size() << " resolves\n";
} else {
std::cout << "Parsing failed\n";
}
Annotator annotate{samples};
for (auto& resolve: data) {
std::cout << " - " << annotate(resolve.name) << "\n (\n";
for (auto& arg : resolve.arglist) {
std::cout << " " << annotate(arg) << "\n";
}
std::cout << " )\n";
}
}
Prints
Parsed 6 resolves
- [sometext at line:3 col:1]
(
[para at line:3 col:10]
)
- [sometext at line:4 col:1]
(
[para1 at line:4 col:10]
[para2 at line:4 col:16]
)
- [sometext at line:5 col:1]
(
[call(a) at line:5 col:10]
)
- [call at line:5 col:34]
(
[a at line:5 col:39]
)
- [call at line:6 col:10]
(
[a at line:6 col:15]
[b at line:6 col:17]
)
- [lit at line:6 col:62]
(
[' at line:6 col:66]
[' at line:6 col:68]
)
¹ Boost Spirit: "Semantic actions are evil"?
I'm basing my app off this example and getting the exact same results. For some reason, the contents of the input string are all parsed into the fusion struct 'comments', and nothing is parsed into the fusion struct 'numbers'. So not sure where I'm going wrong here.
namespace client {
namespace ast {
struct number {
int num1;
int num2;
};
struct comment {
std::string text;
bool dummy;
};
struct input {
std::vector<comment> comments;
std::vector<number> numbers;
};
}
}
BOOST_FUSION_ADAPT_STRUCT(client::ast::comment, text, dummy)
BOOST_FUSION_ADAPT_STRUCT(client::ast::number, num1, num2)
BOOST_FUSION_ADAPT_STRUCT(client::ast::input, comments, numbers)
namespace client {
namespace parser {
namespace x3 = boost::spirit::x3;
using namespace x3;
x3::attr_gen dummy;
typedef std::string::const_iterator It;
using namespace x3;
auto const comment = *(char_ - eol) >> dummy(false);
auto const number = int_ >> int_;
auto lines = [](auto p) { return *(p >> eol); };
auto const input =
lines(comment) >>
lines(number);
}
}
int main()
{
namespace x3 = boost::spirit::x3;
std::string const iss("any char string here\n1 2\n");
auto iter = iss.begin(), eof = iss.end();
client::ast::input types;
bool ok = parse(iter, eof, client::parser::input, types);
if (iter != eof) {
std::cout << "Remaining unparsed: '" << std::string(iter, eof) << "'\n";
}
std::cout << "Parsed: " << (100.0 * std::distance(iss.begin(), iter) / iss.size()) << "%\n";
std::cout << "ok = " << ok << std::endl;
// This range loop prints all contents if input.
for (auto& item : types.comments) { std::cout << "comment: " << boost::fusion::as_deque(item) << "\n"; }
// This loop prints nothing.
for (auto& item : types.numbers) { std::cout << "number: " << boost::fusion::as_deque(item) << "\n"; }
}
My larger application does the same with a large input file and several more AST's, yet it would seem all my examples are consumed by the comment parser.
Here's the complete running example.
http://coliru.stacked-crooked.com/a/f983b26d673305a0
Thoughts?
You took the grammar idea from my answer here: X3, how to populate a more complex AST?
There it worked because the line formats are not ambiguous. In fact the "variant" approach you had required special attention, and I noted that in this bullet:
departments need to be ordered before teams, or you get "team" matched instead of departments
The same kind of ambiguity exists in your grammar. *(char_ - eol) matches "1 2" just fine, so obviously it is added as a comment. You will have to disambiguate the grammar or somehow force the switch to "parse number lines now" mode.
If you wholly don't care what precedes the number lines, just use x3::seek [ lines(number) ].
Can you help me understand the difference between the a % b parser and its expanded a >> *(b >> a) form in Boost.Spirit? Even though the reference manual states that they are equivalent,
The list operator, a % b, is a binary operator that matches a list of one or more repetitions of a separated by occurrences of b. This is equivalent to a >> *(b >> a).
the following program produces different results depending on which is used:
#include <iostream>
#include <string>
#include <vector>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
struct Record {
int id;
std::vector<int> values;
};
BOOST_FUSION_ADAPT_STRUCT(Record,
(int, id)
(std::vector<int>, values)
)
int main() {
namespace qi = boost::spirit::qi;
const auto str = std::string{"1: 2, 3, 4"};
const auto rule1 = qi::int_ >> ':' >> (qi::int_ % ',') >> qi::eoi;
const auto rule2 = qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_)) >> qi::eoi;
Record record1;
if (qi::phrase_parse(str.begin(), str.end(), rule1, qi::space, record1)) {
std::cout << record1.id << ": ";
for (const auto& value : record1.values) { std::cout << value << ", "; }
std::cout << '\n';
} else {
std::cerr << "syntax error\n";
}
Record record2;
if (qi::phrase_parse(str.begin(), str.end(), rule2, qi::space, record2)) {
std::cout << record2.id << ": ";
for (const auto& value : record2.values) { std::cout << value << ", "; }
std::cout << '\n';
} else {
std::cerr << "syntax error\n";
}
}
Live on Coliru
1: 2, 3, 4,
1: 2,
rule1 and rule2 are different only in that rule1 uses the list operator ((qi::int_ % ',')) and rule2 uses its expanded form ((qi::int_ >> *(',' >> qi::int_))). However, rule1 produced 1: 2, 3, 4, (as expected) and rule2 produced 1: 2,. I cannot understand the result of rule2: 1) why is it different from that of rule1 and 2) why were 3 and 4 not included in record2.values even though phrase_parse returned true somehow?
Update X3 version added
First off, you fallen into a deep trap here:
Qi rules don't work with auto. Use qi::copy or just used qi::rule<>. Your program has undefined behaviour and indeed it crashed for me (valgrind pointed out where the dangling references originated).
So, first off:
const auto rule = qi::copy(qi::int_ >> ':' >> (qi::int_ % ',') >> qi::eoi);
Now, when you delete the redundancy in the program, you get:
Reproducing the problem
Live On Coliru
int main() {
test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
}
Printing
1: 2, 3, 4,
1: 2,
The cause and the fix
What happened to 3, 4 which was successfully parsed?
Well, the attribute propagation rules indicate that qi::int_ >> *(',' >> qi::int_) exposes a tuple<int, vector<int> >. In a bid to magically DoTheRightThing(TM) Spirit accidentally misfires and "assigngs" the int into the attribute reference, ignoring the remaining vector<int>.
If you want to make container attributes parse as "an atomic group", use qi::as<>:
test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));
Here as<> acts as a barrier for the attribute compatibility heuristics and the grammar knows what you meant:
Live On Coliru
#include <iostream>
#include <string>
#include <vector>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
struct Record {
int id;
using values_t = std::vector<int>;
values_t values;
};
BOOST_FUSION_ADAPT_STRUCT(Record, id, values)
namespace qi = boost::spirit::qi;
template <typename T>
void test(T const& rule) {
const std::string str = "1: 2, 3, 4";
Record record;
if (qi::phrase_parse(str.begin(), str.end(), rule >> qi::eoi, qi::space, record)) {
std::cout << record.id << ": ";
for (const auto& value : record.values) { std::cout << value << ", "; }
std::cout << '\n';
} else {
std::cerr << "syntax error\n";
}
}
int main() {
test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));
}
Prints
1: 2, 3, 4,
1: 2,
1: 2, 3, 4,
Because it's time to get people started with X3 (the new version of Spirit), and because I like to challenge msyelf to do the corresponding tasks in Spirit X3, here is the Spirit X3 version.
There's no problem with auto in X3.
The "broken" case also behaves much better, triggering this static assertion:
// If you got an error here, then you are trying to pass
// a fusion sequence with the wrong number of elements
// as that expected by the (sequence) parser.
static_assert(
fusion::result_of::size<Attribute>::value == (l_size + r_size)
, "Attribute does not have the expected size."
);
That's nice, right?
The workaround seems a bit less readable:
test(int_ >> ':' >> (rule<struct _, Record::values_t>{} = (int_ >> *(',' >> int_))));
But it would be trivial to write your own as<> "directive" (or just a function), if you wanted:
namespace {
template <typename T>
struct as_type {
template <typename Expr>
auto operator[](Expr&& expr) const {
return x3::rule<struct _, T>{"as"} = x3::as_parser(std::forward<Expr>(expr));
}
};
template <typename T> static const as_type<T> as = {};
}
DEMO
Live On Coliru
#include <iostream>
#include <string>
#include <vector>
#include <boost/fusion/adapted/std_tuple.hpp>
#include <boost/spirit/home/x3.hpp>
struct Record {
int id;
using values_t = std::vector<int>;
values_t values;
};
namespace x3 = boost::spirit::x3;
template <typename T>
void test(T const& rule) {
const std::string str = "1: 2, 3, 4";
Record record;
auto attr = std::tie(record.id, record.values);
if (x3::phrase_parse(str.begin(), str.end(), rule >> x3::eoi, x3::space, attr)) {
std::cout << record.id << ": ";
for (const auto& value : record.values) { std::cout << value << ", "; }
std::cout << '\n';
} else {
std::cerr << "syntax error\n";
}
}
namespace {
template <typename T>
struct as_type {
template <typename Expr>
auto operator[](Expr&& expr) const {
return x3::rule<struct _, T>{"as"} = x3::as_parser(std::forward<Expr>(expr));
}
};
template <typename T> static const as_type<T> as = {};
}
int main() {
using namespace x3;
test(int_ >> ':' >> (int_ % ','));
//test(int_ >> ':' >> (int_ >> *(',' >> int_))); // COMPILER asserts "Attribute does not have the expected size."
// "clumsy" x3 style workaround
test(int_ >> ':' >> (rule<struct _, Record::values_t>{} = (int_ >> *(',' >> int_))));
// using an ad-hoc `as<>` implementation:
test(int_ >> ':' >> as<Record::values_t>[int_ >> *(',' >> int_)]);
}
Prints
1: 2, 3, 4,
1: 2, 3, 4,
1: 2, 3, 4,