Unhelpful compiler errors in x3 grammar - c++

The following Spirit X3 grammar for a simple robot command language generates compiler errors in Visual Studio 2017 on Windows. For this project, I am required to compile with the warning level set to 4 (/W4) and to treat warnings as errors (/WX).
Warning C4127 conditional expression is constant SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\char\detail\cast_char.hpp 29
Error C2039 'insert': is not a member of 'boost::spirit::x3::unused_type' SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\core\detail\parse_into_container.hpp 259
Error C2039 'end': is not a member of 'boost::spirit::x3::unused_type' SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\core\detail\parse_into_container.hpp 259
Error C2039 'empty': is not a member of 'boost::spirit::x3::unused_type' SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\core\detail\parse_into_container.hpp 254
Error C2039 'begin': is not a member of 'boost::spirit::x3::unused_type' SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\core\detail\parse_into_container.hpp 259
Clearly, something is wrong with my grammar, but the error messages are completely unhelpful. I have found that if I remove the Kleene star in the last line of the grammar (changing *parameter to just parameter), the errors disappear, but then I get lots of warnings like these:
Warning C4459 declaration of 'digit' hides global declaration SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\support\numeric_utils\detail\extract_int.hpp 174
Warning C4127 conditional expression is constant SpiritTest e:\data\boost\boost_1_65_1\boost\spirit\home\x3\char\detail\cast_char.hpp 29
#include <string>
#include <vector>
#include <iostream>
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
//
// Grammar for simple command language
//
namespace scl
{
using boost::spirit::x3::char_;
using boost::spirit::x3::double_;
using boost::spirit::x3::int_;
using boost::spirit::x3::lexeme;
using boost::spirit::x3::lit;
using boost::spirit::x3::no_case;
auto valid_identifier_chars = char_ ("a-zA-Z_");
auto quoted_string = '"' >> *(lexeme [~char_ ('"')]) >> '"';
auto keyword_value_chars = char_ ("a-zA-Z0-9$_.");
auto qual = lexeme [!(no_case [lit ("no")]) >> +valid_identifier_chars] >> -('=' >> (quoted_string | int_ | double_ | +keyword_value_chars));
auto neg_qual = lexeme [no_case [lit ("no")] >> +valid_identifier_chars];
auto qualifier = lexeme ['/' >> (qual | neg_qual)];
auto verb = +valid_identifier_chars >> *qualifier;
auto parameter = +keyword_value_chars >> *qualifier;
auto command = verb >> *parameter;
}; // End namespace scl
using namespace std; // Must be after Boost stuff!
int
main ()
{
vector <string> input =
{
"show/out=\"somefile.txt\" motors/all cameras/full",
"start/speed=5 motors arm1 arm2/speed=2.5/track arm3",
"rotate camera1/notrack/axis=y/angle=45"
};
//
// Parse each of the strings in the input vector
//
for (string str : input)
{
auto b = str.begin ();
auto e = str.end ();
cout << "Parsing: " << str << endl;
x3::phrase_parse (b, e, scl::command, x3::space);
if (b != e)
{
cout << "Error, only parsed to position: " << b - str.begin () << endl;
}
} // End for
return 0;
} // End main

There is a regression since Boost 1.65 that causes problems with some rules that potentially propagate into container type attributes.
They dispatch to the wrong overload when instantiated without an actual bound attribute. When this happens there is a "mock" attribute type called unused_type. The errors you are seeing indicate that unused_type is being treated as if it were a concrete attribute type, and clearly that won't fly.
The regression was fixed in https://github.com/boostorg/spirit/commit/ee4943d5891bdae0706fb616b908e3bf528e0dfa
You can see that it's a regression by compiling with Boost 1.64:
Boost 1.64 compiles it fine (GCC and Clang)
Boost 1.65 breaks it (GCC and Clang again)
The latest develop branch is supposed to fix it, but you can also simply copy the patched file, or even apply just the 7-line patch.
All of the above was already available when I linked the duplicate question "How to make a recursive rule in boost spirit x3 in VS2017", which highlights the same regression.
Review
using namespace std; // Must be after Boost stuff!
Actually, it probably needs to be nowhere, unless very locally scoped, where you can see the impact of any potential name collisions.
Consider encapsulating the skipper, since it's likely logically part of your grammar spec, not something to be overridden by the caller.
This is a bug:
auto quoted_string = '"' >> *(lexeme[~char_('"')]) >> '"';
You probably meant to assert the whole literal is lexeme, not individual characters (that's... moot because whitespace would never hit the parser anyways, because of the skipper).
auto quoted_string = lexeme['"' >> *~char_('"') >> '"'];
Likewise, you might have intended +keyword_value_chars to be a lexeme, because right now one=two three four would parse the "qualifier" one with a "keyword value" of twothreefour, not two three four¹
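To see why the missing lexeme[] matters, here is a toy model in plain C++ (not Spirit; plus_chars and is_keyword_char are hypothetical helpers) of +char_("a-zA-Z0-9$_.") run with and without a space-skipper between characters:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Character class from the grammar: char_("a-zA-Z0-9$_.")
bool is_keyword_char(char ch) {
    unsigned char u = static_cast<unsigned char>(ch);
    return std::isalnum(u) || ch == '$' || ch == '_' || ch == '.';
}

// Toy model of +keyword_value_chars (hypothetical, not Spirit's implementation).
// skip_inside == true  : spaces are skipped before every character (phrase-level parsing).
// skip_inside == false : like lexeme[], the run stops at the first non-matching character.
std::string plus_chars(std::string const& in, std::size_t& pos, bool skip_inside) {
    std::string out;
    for (;;) {
        std::size_t p = pos;
        if (skip_inside)
            while (p < in.size() && in[p] == ' ') ++p;   // pre-skip, like a skipper would
        if (p >= in.size() || !is_keyword_char(in[p]))
            break;                                        // no match: this model leaves pos as is
        out += in[p];
        pos = p + 1;
    }
    return out;
}
```

With skipping enabled, the three words after = are glued into a single value; the lexeme-style call stops at the first space.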
x3::space skips embedded newlines; if that's not the intent, use x3::blank.
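For the standard character encoding, x3::space and x3::blank roughly correspond to the C library classifications std::isspace and std::isblank. A quick check (the wrapper names are hypothetical) shows the difference that matters here:

```cpp
#include <cctype>

// Roughly what x3::space accepts: includes '\n', '\r', '\v', '\f'
bool skipped_by_space(char ch) { return std::isspace(static_cast<unsigned char>(ch)) != 0; }

// Roughly what x3::blank accepts: only ' ' and '\t' in the "C" locale
bool skipped_by_blank(char ch) { return std::isblank(static_cast<unsigned char>(ch)) != 0; }
```

Only the space-style skipper lets a command silently continue onto the next line.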
Since PEG grammars are parsed left-to-right and greedily, you can reorder the alternatives in the qualifier production and do without the !(no_case["no"]) lookahead assertion. That not only removes duplication but also makes the grammar simpler and more efficient:
auto qual = lexeme[+valid_identifier_chars] >>
-('=' >> (quoted_string | int_ | double_ | +keyword_value_chars)); // TODO lexeme
auto neg_qual = lexeme[no_case["no"] >> +valid_identifier_chars];
auto qualifier = lexeme['/' >> (neg_qual | qual)];
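PEG alternatives are tried strictly in order, and the first one that succeeds wins. The following plain C++ model of the qualifier body (hypothetical names, no Spirit involved, and simplified to whole-token matching) shows why neg_qual has to come first:

```cpp
#include <cctype>
#include <optional>
#include <string>

struct Qual {
    bool negative;
    std::string id;
};

// identifier characters: char_("a-zA-Z_")
bool is_id_char(char ch) {
    return std::isalpha(static_cast<unsigned char>(ch)) || ch == '_';
}

bool all_id_chars(std::string const& s) {
    if (s.empty()) return false;                  // '+' needs at least one character
    for (char ch : s)
        if (!is_id_char(ch)) return false;
    return true;
}

// neg_qual: no_case["no"] >> +identifier-chars
std::optional<Qual> neg_qual(std::string const& s) {
    bool has_no = s.size() >= 2 &&
                  std::tolower(static_cast<unsigned char>(s[0])) == 'n' &&
                  std::tolower(static_cast<unsigned char>(s[1])) == 'o';
    if (!has_no || !all_id_chars(s.substr(2))) return std::nullopt;
    return Qual{true, s.substr(2)};
}

// qual: +identifier-chars (the optional "=value" part is left out)
std::optional<Qual> qual(std::string const& s) {
    if (!all_id_chars(s)) return std::nullopt;
    return Qual{false, s};
}

// Ordered choice: the first alternative that succeeds wins; the rest are never tried.
std::optional<Qual> neg_first(std::string const& s) {
    if (auto q = neg_qual(s)) return q;
    return qual(s);
}

std::optional<Qual> pos_first(std::string const& s) {
    if (auto q = qual(s)) return q;
    return neg_qual(s);
}
```

With qual first, notrack is swallowed as a positive identifier and neg_qual never gets a chance. The flip side, in this model as in the grammar above, is that any identifier that merely starts with "no" is read as a negation.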
¹ Note (post-scriptum): now that we notice qualifier is itself already a lexeme, there's no need to lexeme[] things inside it (unless, of course, they're reused in contexts with skippers).
However, this also gives rise to the question whether whitespace around the = operator should be accepted (currently, it is not), or whether qualifiers can be separated with whitespace (like id /a /b; currently they can).
Perhaps verb needed some lexeme[] as well (unless you really did want to parse "one two three" as a verb).
If "no" is a prefix for negative qualifiers, then maybe the identifier itself is, too? This could simplify the grammar.
The ordering of int_ and double_ means that most doubles are mis-parsed as int before they could ever be recognized. Consider something more explicit, like x3::real_parser<double, x3::strict_real_policies<double>>{} | int_.
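The ordering problem can be reproduced without Spirit. This sketch uses std::strtol/std::strtod as stand-ins for int_ and double_ (the Num struct and function names are hypothetical); the "strict" variant only accepts a double when its text contains a '.', which is what the strict real policy is meant to enforce:

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <variant>

// A parsed number and how many characters it consumed.
struct Num {
    std::variant<int, double> value;
    std::size_t consumed;
};

// Models int_ | double_ : the integer branch succeeds on "2.5" after consuming only "2".
Num int_first(std::string const& s) {
    char* end = nullptr;
    long i = std::strtol(s.c_str(), &end, 10);
    if (end != s.c_str())
        return {static_cast<int>(i), static_cast<std::size_t>(end - s.c_str())};
    double d = std::strtod(s.c_str(), &end);
    return {d, static_cast<std::size_t>(end - s.c_str())};
}

// Models strict_real | int_ : accept a double only if its text contains a '.',
// so plain integers fall through to the integer branch.
Num strict_real_first(std::string const& s) {
    char* end = nullptr;
    double d = std::strtod(s.c_str(), &end);
    std::size_t n = static_cast<std::size_t>(end - s.c_str());
    if (s.substr(0, n).find('.') != std::string::npos)
        return {d, n};
    long i = std::strtol(s.c_str(), &end, 10);
    return {static_cast<int>(i), static_cast<std::size_t>(end - s.c_str())};
}
```

In the int-first model, "2.5" yields the integer 2 and leaves ".5" unparsed, which is exactly the failure mode described above.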
If you're parsing quoted constructs, perhaps you want to recognize escapes too ('\"' and '\\' for example):
auto quoted_string = lexeme['"' >> *('\\' >> char_ | ~char_('"')) >> '"'];
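To spell out what that rule accepts, here is a hand-written C++ equivalent (parse_quoted is a hypothetical helper for illustration, not part of Spirit): a backslash makes the next character literal, and any other character except a quote is kept.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Mirrors '"' >> *('\\' >> char_ | ~char_('"')) >> '"'
// Returns the unescaped contents and advances pos past the closing quote,
// or returns nullopt (leaving pos untouched) if there is no complete quoted string.
std::optional<std::string> parse_quoted(std::string const& in, std::size_t& pos) {
    std::size_t p = pos;
    if (p >= in.size() || in[p] != '"') return std::nullopt;
    ++p;
    std::string out;
    while (p < in.size()) {
        char ch = in[p];
        if (ch == '\\' && p + 1 < in.size()) {  // '\\' >> char_ : keep the next char verbatim
            out += in[p + 1];
            p += 2;
        } else if (ch != '"') {                 // ~char_('"')
            out += ch;
            ++p;
        } else {                                // closing quote
            pos = p + 1;
            return out;
        }
    }
    return std::nullopt;                        // ran out of input before the closing quote
}
```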
If you have a need for "keyword values" consider listing known values in x3::symbols<>. This can also be used to parse directly into an enum type.
Here's a version that parses into AST types and prints it back for demonstration purposes:
Live On Coliru
#include <boost/config/warning_disable.hpp>
#include <string>
#include <vector>
#include <boost/variant.hpp>
namespace Ast {
struct Keyword : std::string { // needs to be strong-typed to distinguish from quoted values
using std::string::string;
using std::string::operator=;
};
struct Nil {};
using Value = boost::variant<Nil, std::string, int, double, Keyword>;
struct Qualifier {
enum Kind { positive, negative } kind;
std::string identifier;
Value value;
};
struct Param {
Keyword keyword;
std::vector<Qualifier> qualifiers;
};
struct Command {
std::string verb;
std::vector<Qualifier> qualifiers;
std::vector<Param> params;
};
}
#include <boost/fusion/adapted/struct.hpp>
BOOST_FUSION_ADAPT_STRUCT(Ast::Qualifier, kind, identifier, value)
BOOST_FUSION_ADAPT_STRUCT(Ast::Param, keyword, qualifiers)
BOOST_FUSION_ADAPT_STRUCT(Ast::Command, verb, qualifiers, params)
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
namespace scl {
//
// Grammar for simple command language
//
using x3::char_;
using x3::int_;
using x3::lexeme;
using x3::no_case;
// lexeme tokens
auto keyword = x3::rule<struct _keyword, Ast::Keyword> { "keyword" }
= lexeme [ +char_("a-zA-Z0-9$_.") ];
auto identifier = lexeme [ +char_("a-zA-Z_") ];
auto quoted_string = lexeme['"' >> *('\\' >> x3::char_ | ~x3::char_('"')) >> '"'];
auto value
= quoted_string
| x3::real_parser<double, x3::strict_real_policies<double>>{}
| x3::int_
| keyword;
auto qual
= x3::attr(Ast::Qualifier::positive) >> identifier >> -('=' >> value);
auto neg_qual
= x3::attr(Ast::Qualifier::negative) >> lexeme[no_case["no"] >> identifier] >> x3::attr(Ast::Nil{}); // never a value
auto qualifier
= lexeme['/' >> (neg_qual | qual)];
auto verb
= identifier;
auto parameter = x3::rule<struct _parameter, Ast::Param> {"parameter"}
= keyword >> *qualifier;
auto command = x3::rule<struct _command, Ast::Command> {"command"}
= x3::skip(x3::space) [ verb >> *qualifier >> *parameter ];
} // End namespace scl
// For Demo, Debug: printing the Ast types back
#include <iostream>
#include <iomanip>
namespace Ast {
static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
struct {
std::ostream& _os;
void operator()(std::string const& s) const { _os << std::quoted(s); }
void operator()(int i) const { _os << i; }
void operator()(double d) const { _os << d; }
void operator()(Keyword const& kwv) const { _os << kwv; }
void operator()(Nil) const { }
} vis{os};
boost::apply_visitor(vis, v);
return os;
}
static inline std::ostream& operator<<(std::ostream& os, Qualifier const& q) {
os << "/" << (q.kind==Qualifier::negative?"no":"") << q.identifier;
if (q.value.which())
os << "=" << q.value;
return os;
}
static inline std::ostream& operator<<(std::ostream& os, std::vector<Qualifier> const& qualifiers) {
for (auto& qualifier : qualifiers)
os << qualifier;
return os;
}
static inline std::ostream& operator<<(std::ostream& os, Param const& p) {
return os << p.keyword << p.qualifiers;
}
static inline std::ostream& operator<<(std::ostream& os, Command const& cmd) {
os << cmd.verb << cmd.qualifiers;
for (auto& param : cmd.params) os << " " << param;
return os;
}
}
int main() {
for (std::string const str : {
"show/out=\"somefile.txt\" motors/all cameras/full",
"start/speed=5 motors arm1 arm2/speed=2.5/track arm3",
"rotate camera1/notrack/axis=y/angle=45",
})
{
auto b = str.begin(), e = str.end();
Ast::Command cmd;
bool ok = parse(b, e, scl::command, cmd);
std::cout << (ok?"OK":"FAIL") << '\t' << std::quoted(str) << '\n';
if (ok) {
std::cout << " -- Full AST: " << cmd << "\n";
std::cout << " -- Verb+Qualifiers: " << cmd.verb << cmd.qualifiers << "\n";
for (auto& param : cmd.params)
std::cout << " -- Param+Qualifiers: " << param << "\n";
}
if (b != e) {
std::cout << " -- Remaining unparsed: " << std::quoted(std::string(b,e)) << "\n";
}
}
}
Prints
OK "show/out=\"somefile.txt\" motors/all cameras/full"
-- Full AST: show/out="somefile.txt" motors/all cameras/full
-- Verb+Qualifiers: show/out="somefile.txt"
-- Param+Qualifiers: motors/all
-- Param+Qualifiers: cameras/full
OK "start/speed=5 motors arm1 arm2/speed=2.5/track arm3"
-- Full AST: start/speed=5 motors arm1 arm2/speed=2.5/track arm3
-- Verb+Qualifiers: start/speed=5
-- Param+Qualifiers: motors
-- Param+Qualifiers: arm1
-- Param+Qualifiers: arm2/speed=2.5/track
-- Param+Qualifiers: arm3
OK "rotate camera1/notrack/axis=y/angle=45"
-- Full AST: rotate camera1/notrack/axis=y/angle=45
-- Verb+Qualifiers: rotate
-- Param+Qualifiers: camera1/notrack/axis=y/angle=45
For completeness
Demo also Live On MSVC (Rextester); note that Rextester uses Boost 1.60.
Coliru uses Boost 1.66, but the problem doesn't manifest itself there because there are now concrete attribute values bound to the parsers.

Related

Parsing to different types of values in boost::spirit and apply casting to negative numbers

I am trying to solve an issue with positive and negative values in Boost Spirit.
The parser should use unsigned numbers (positive) 99% of the time.
The program reads a string that defines variables from 1 to 32 bits wide, which should then be read from another stream (for question context, not shown in the example), but there is a special case where the variable "D_REF" may be a 16-bit signed number (2's complement).
The program encodes all checks as unsigned values in a std::vector, so I need to encode that value as unsigned too, but first I need to cast it to unsigned short and then store it in the unsigned int member of the struct.
This requirement comes from a later step, where a data stream is read, values are extracted from it as unsigned, and the parsed comparisons are applied to them.
I know this request may look weird, but it is a must for a current project. Can anyone help me with this?
Godbolt link: https://godbolt.org/z/8j615Mecx
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
namespace engine
{
struct Check
{
std::string variable;
unsigned int number;
};
using Checks = std::vector<Check>;
}
BOOST_FUSION_ADAPT_STRUCT(engine::Check, variable, number)
namespace engine
{
namespace qi = boost::spirit::qi;
template <typename It>
class Parser : public qi::grammar<It, Checks()>
{
private:
qi::rule<It, Check(), qi::blank_type> equal1, equal2;
qi::rule<It, Checks()> start;
public:
Parser() : Parser::base_type(start)
{
using namespace qi;
//equal1 = as_string["MSG33.D_REF"] >> "==" >> int_[static_cast<unsigned short>(_1)];// This is the idea...
equal1 = as_string["MSG33.D_REF"] >> "==" >> int_;// This may contain negative numbers, but they are only 16 bits long, so they must be cast to "unsigned short" and not to "unsigned int"
equal2 = +(alnum | char_("._")) >> "==" >> uint_;
start = skip(blank)[(equal1 | equal2) % "&&"] > eoi;
}
};
Checks parse(const std::string& str)
{
using It = std::string::const_iterator;
static const Parser<It> parser;
Checks checks;
It first = str.begin(), last = str.end();
if (!qi::parse(first, last, parser, checks))
return {};
return checks;
}
}
int main()
{
auto checks1 = engine::parse("MSG33.ANYTHING == 25");// Normal case. All the checks are done with positive variable values
auto checks2 = engine::parse("MSG33.D_REF == 25");// Special case extended from the normal case: checks a positive/negative variable with a positive value.
auto checks3 = engine::parse("MSG33.D_REF == -25");// Special case: checks a negative value. D_REF should be encoded as 16-bit 2's complement (unsigned), but it is converted to 32-bit unsigned
std::cout << std::hex << "Obtained: " << checks3.front().number << std::endl << "Wished: " << static_cast<unsigned short>(checks3.front().number);// It displays 0xffffffe7, but I need 0xffe7. Possibly a semantic action to force the conversion before vector insertion?
}
First: A word of caution
Automatic attribute propagation already does exactly what you need. That's pretty much what you'd expect since it compiles.
Your problem really has nothing to do with the parsing at all. It has to do with how you interpret the correctly parsed negative number, correctly converted to the integer type you chose (unsigned int).
Indeed, if you want to treat an unsigned int value as a short (signed or unsigned), you have to coerce it, or use a bitmask to clear the high bits: c.number & 0xffff.
Storing 0xffe7 inside the unsigned int is of course possible, but it is technically just an INCORRECT 2's complement encoding. Experience tells me it will lead to error-prone code.
If I were to go for a design like this, I'd choose an integer representation type that is expressly NOT an arithmetic type. Something like
struct Number {
_implementation_defined_ storage;
uint32_t as_uint32() const { return /*some implementation logic on storage*/; }
int16_t as_int16() const { return /*some other implementation logic on storage*/; }
// etc.
};
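One way to fill in that sketch, assuming the storage is simply the 32-bit value as parsed (the class layout and member names here are illustrative, not from the original):

```cpp
#include <cstdint>

// A deliberately non-arithmetic wrapper: callers must say how they want to read the bits.
class Number {
  public:
    explicit Number(std::uint32_t raw) : storage_(raw) {}

    std::uint32_t as_uint32() const { return storage_; }

    // Interpret the low 16 bits as a 2's complement value, without relying on
    // implementation-defined narrowing from unsigned to signed.
    std::int16_t as_int16() const {
        std::uint32_t low = storage_ & 0xFFFFu;
        long v = low >= 0x8000u ? static_cast<long>(low) - 0x10000L
                                : static_cast<long>(low);
        return static_cast<std::int16_t>(v);
    }

  private:
    std::uint32_t storage_;
};
```

Both 0xffffffe7 and 0xffe7 read back as -25 through as_int16(), so callers no longer depend on which bit pattern was stored.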
In the land of parsed AST's, I'd prefer
template <typename V>
struct VarCheck {
std::string name;
V number;
};
using Check = boost::variant<VarCheck<uint32_t>, VarCheck<int16_t>>;
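A self-contained rendering of that idea, using std::variant (C++17) in place of boost::variant. The template is named VarCheck so it does not clash with the Check alias; describe is a hypothetical helper for demonstration:

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <variant>
#include <vector>

template <typename V>
struct VarCheck {
    std::string name;
    V number;
};

// std::variant used here instead of boost::variant
using Check  = std::variant<VarCheck<std::uint32_t>, VarCheck<std::int16_t>>;
using Checks = std::vector<Check>;

// Each alternative keeps its own static type, so a signed value stays signed.
std::string describe(Check const& c) {
    return std::visit([](auto const& vc) {
        std::ostringstream os;
        os << vc.name << " == " << vc.number;
        return os.str();
    }, c);
}
```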
With that out of the way, let's see some answers to your question:
Using static cast in the semantic action
You can force the issue using Boost Phoenix: Live On Coliru
assign_d_ref %= qi::string("MSG33.D_REF") >> "==" >>
qi::int_[_1 = boost::phoenix::static_cast_<uint16_t>(_1)];
IMO, a slightly better approach¹ is to have a parser that parses uint16_t in the first place: Live On Coliru
qi::int_parser<uint16_t> uint16_;
assign_d_ref = qi::string("MSG33.D_REF") >> "==" >> uint16_;
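The arithmetic behind both snippets is a detour through a 16-bit unsigned type before widening to 32 bits. In plain C++ (no Spirit; the function names are hypothetical), the two conversion paths give different bit patterns for -25:

```cpp
#include <cstdint>

// Plain propagation of an int into the unsigned int member: conversion modulo 2^32.
std::uint32_t direct(int v) { return static_cast<std::uint32_t>(v); }

// Wrapping to 16 bits first (modulo 2^16), then zero-extending to 32 bits.
// This is the effect the uint16_t detour is meant to have on the stored value.
std::uint32_t via_u16(int v) { return static_cast<std::uint16_t>(v); }
```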
Other Improvements
I'd also improve the expressiveness some more using e.g.:
qi::symbols<char> s16_vars;
s16_vars += "MSG33.D_REF", "MSG34.D_REF";
assign_s16 = qi::raw[s16_vars] >> "==" >> uint16_;
This generalizes to any number of signed 16-bit variables.
qi::rule<It, std::string()> name;
name = +(qi::alnum | qi::char_("._"));
This fixes the missing lexeme[] around the name (by declaring the rule without a skipper²).
assign_u32 = name >> "==" >> qi::uint_;
assign = assign_s16 | assign_u32;
start = qi::skip(qi::blank)[assign % "&&" > qi::eoi];
Apart from the readability, it fixes the edge case where blanks are immediately before end-of-input.
See the combined result Live On Coliru
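#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <utility> // std::exchange; note that main() uses a C++20 range-for initializer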
namespace engine {
struct Check {
std::string variable;
uint32_t number;
friend std::ostream& operator<<(std::ostream& os, Check const& c) {
auto f = os.flags();
os << "{" << std::quoted(c.variable) << " == " //
<< std::hex << std::showbase << c.number << "}";
os.flags(f); // restore the caller's formatting flags
return os;
}
};
using Checks = std::vector<Check>;
} // namespace engine
BOOST_FUSION_ADAPT_STRUCT(engine::Check, variable, number)
namespace engine {
namespace qi = boost::spirit::qi;
template <typename It> class Parser : public qi::grammar<It, Checks()> {
public:
Parser() : Parser::base_type(start) {
using namespace qi::labels;
s16_vars += "MSG33.D_REF", "MSG34.D_REF";
name = +(qi::alnum | qi::char_("._"));
assign_s16 = qi::raw[s16_vars] >> "==" >> uint16_;
assign_u32 = name >> "==" >> qi::uint_;
assign = assign_s16 | assign_u32;
start = qi::skip(qi::blank)[assign % "&&" > qi::eoi];
BOOST_SPIRIT_DEBUG_NODES((start)(assign)(assign_u32)(assign_s16)(name))
}
private:
qi::int_parser<uint16_t> uint16_;
qi::symbols<char> s16_vars;
qi::rule<It, Check(), qi::blank_type> assign, assign_s16, assign_u32;
qi::rule<It, Checks()> start;
// lexeme:
qi::rule<It, std::string()> name;
};
Checks parse(const std::string& str) {
using It = std::string::const_iterator;
static const Parser<It> parser;
Checks checks;
It first = str.begin(), last = str.end();
if (!qi::parse(first, last, parser, checks))
return {};
return checks;
}
} // namespace engine
int main() {
for (auto sep = ""; auto& c : engine::parse(
"MSG33.ANYTHING == 25 && MSG33.D_REF == 25 && MSG33.D_REF == -25"))
std::cout << std::exchange(sep, " && ") << c;
std::cout << "\n";
}
Printing (like all samples above):
{"MSG33.ANYTHING" == 0x19} && {"MSG33.D_REF" == 0x19} && {"MSG33.D_REF" == 0xffe7}
BONUS: Variant Style
Because you might be interested, here's a version using the variant AST:
Live On Coliru
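#include <boost/core/demangle.hpp>
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <typeinfo>
#include <utility> // std::exchange; main() uses a C++20 range-for initializer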
namespace engine {
template <typename T>
struct VarCheck {
std::string variable;
T number;
friend std::ostream& operator<<(std::ostream& os, VarCheck const& c) {
auto f = os.flags();
os << " {" << std::quoted(c.variable) << " == " << std::hex
<< std::showbase << c.number << ":"
<< boost::core::demangle(typeid(T).name()) << "}";
os.flags(f); // restore the caller's formatting flags
return os;
}
};
using S16Var = VarCheck<int16_t>;
using U32Var = VarCheck<uint32_t>;
using Check = boost::variant<U32Var, S16Var>;
using Checks = std::vector<Check>;
} // namespace engine
// BOOST_FUSION_ADAPT_STRUCT(engine::S16Var, variable, number)
// BOOST_FUSION_ADAPT_STRUCT(engine::U32Var, variable, number)
// Or, generically: https://www.boost.org/doc/libs/1_80_0/libs/fusion/doc/html/fusion/adapted/adapt_tpl_struct.html
BOOST_FUSION_ADAPT_TPL_STRUCT((T), (engine::VarCheck)(T), variable, number)
namespace engine {
namespace qi = boost::spirit::qi;
template <typename It> class Parser : public qi::grammar<It, Checks()> {
public:
Parser() : Parser::base_type(start) {
using namespace qi::labels;
s16_vars += "MSG33.D_REF", "MSG34.D_REF";
name = +(qi::alnum | qi::char_("._"));
assign_s16 = qi::raw[s16_vars] >> "==" >> uint16_;
assign_u32 = name >> "==" >> qi::uint_;
assign = assign_s16 | assign_u32;
start = qi::skip(qi::blank)[assign % "&&" > qi::eoi];
BOOST_SPIRIT_DEBUG_NODES((start)(assign)(assign_u32)(assign_s16)(name))
}
private:
qi::int_parser<uint16_t> uint16_;
qi::symbols<char> s16_vars;
qi::rule<It, Check(), qi::blank_type> assign;
qi::rule<It, U32Var(), qi::blank_type> assign_u32;
qi::rule<It, S16Var(), qi::blank_type> assign_s16;
qi::rule<It, Checks()> start;
// lexeme:
qi::rule<It, std::string()> name;
};
Checks parse(const std::string& str) {
using It = std::string::const_iterator;
static const Parser<It> parser;
Checks checks;
It first = str.begin(), last = str.end();
if (!qi::parse(first, last, parser, checks))
return {};
return checks;
}
} // namespace engine
int main() {
for (auto sep = "";
auto& c : engine::parse("MSG33.ANYTHING == 25 && MSG33.D_REF == 25 && "
"MSG33.D_REF == -25")) {
std::cout << std::exchange(sep, "\n && ") << c;
}
std::cout << "\n";
}
I've extended the output with the static type information for visibility:
{"MSG33.ANYTHING" == 0x19:unsigned int}
&& {"MSG33.D_REF" == 0x19:short}
&& {"MSG33.D_REF" == 0xffe7:short}
It's easy to generalize to more variable types here:
using S16Var = VarCheck<int16_t>;
using U32Var = VarCheck<uint32_t>;
using DblVar = VarCheck<double>;
using StrVar = VarCheck<std::string>;
using Check = boost::variant<U32Var, S16Var, DblVar, StrVar>;
See it Live On Coliru, with the output
{"MSG33.ANYTHING" == 0x19:unsigned int}
&& {"MSG33.D_REF" == 0x19:short}
&& {"SEHE.DBL_1" == 4.2e+10:double}
&& {"SEHE.DBL_2" == -inf:double}
&& {"SEHE.STR_42" == Life The Universe and everything:std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >}
&& {"SEHE.STR_300" == Three hundred:std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >}
&& {"MSG33.D_REF" == 0xffe7:short}
¹ E.g. Boost Spirit: "Semantic actions are evil"?
² Boost spirit skipper issues

Unintelligible compilation error for a simple X3 grammar

I have a fairly simple grammar that I'm trying to implement using Boost Spirit X3, without success.
It does not compile, and due to all the templates and complex concepts used in the library (I know, it is a header-only library), the compilation error message is way too long to be intelligible.
I tried commenting out parts of the code in order to narrow down the culprit, without success, as it comes down to several parts in which I don't see any error anyway.
Edit 2: the first error message is indeed in push_front_impl.hpp, highlighting:
::REQUESTED_PUSH_FRONT_SPECIALISATION_FOR_SEQUENCE_DOES_NOT_EXIST::*
I suspect the keyword auto, or maybe the p2 statement with ulong_long... but I'm not confident.
I need your help, Spirit elites!
Below a minimal code snippet reproducing the compilation error.
Edit: using boost 1.70 and visual studio 2019 v16.1.6
#include <string>
#include <iostream>
#include <sstream>
#include "boost/spirit/home/x3.hpp"
#include "boost/spirit/include/support_istream_iterator.hpp"
int main(void)
{
    std::string input = \
        "\"nodes\":{ {\"type\":\"bb\", \"id\" : 123456567, \"label\" : \"0x12023049\"}," \
        "{\"type\":\"bb\", \"id\" : 123123123, \"label\" : \"0x01223234\"}," \
        "{\"type\":\"ib\", \"id\" : 223092343, \"label\" : \"0x03020343\"}}";
    std::istringstream iss(input);
    namespace x3 = boost::spirit::x3;
    using x3::char_;
    using x3::ulong_long;
    using x3::lit;

    auto q = lit('\"'); /* q => quote */

    auto p1 = q >> lit("type") >> q >> lit(':') >> q >> (lit("bb") | lit("ib")) >> q;
    auto p2 = q >> lit("id") >> q >> lit(':') >> ulong_long;
    auto p3 = q >> lit("label") >> q >> lit(':') >> q >> (+x3::alpha) >> q;
    auto node = lit('{') >> p1 >> lit(',') >> p2 >> lit(',') >> p3 >> lit('}');
    auto nodes = q >> lit("nodes") >> q >> lit(':') >> lit('{') >> node % lit(',') >> lit('}');

    boost::spirit::istream_iterator f(iss >> std::noskipws), l{};
    bool b = x3::phrase_parse(f, l, nodes, x3::space);

    return 0;
}
It is a known MPL limitation (Issue with X3 and MS VS2017, https://github.com/boostorg/spirit/issues/515) plus a bug/implementation difference in the MSVC/ICC compilers (https://github.com/boostorg/mpl/issues/43).
I rewrote the offending part without using MPL (https://github.com/boostorg/spirit/pull/607); it will be released in Boost 1.74. Until then, you should be able to work around it with:
#define BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS
#define BOOST_MPL_LIMIT_VECTOR_SIZE 50
Alternatively, you could wrap different parts of your grammar into rules, which will shorten the sequence parser chains.
Note that q >> lit("x") >> q >> lit(':') >> ... is probably not what you really want: with a skipper, it will allow " x ": to be parsed. If you do not want that, simply use lit("\"x\"") >> lit(':') >> ...
There's a chance that there might be a missing indirect include for your specific platform/version (if I had to guess it might be caused by using the istream iterator support header from Qi).
If that's not the issue, my attention is drawn to the where T = boost::mpl::aux::vector_tag<20> part (h/t @Rup): the number 20 seems suspiciously like it might be some kind of limit.
We could find what trips the limit and see if we can raise it, but I'll take the "unscientific" approach in the interest of helping you along with the parser.
Simplifying The Expressions
I see a lot (lot) of lit() nodes in your parser expressions that you don't need. I suspect all the quoted constructs need to be lexemes, and instead of painstakingly repeating the quote symbol all over the place, perhaps package it as follows:
auto q = [](auto p) { return x3::lexeme['"' >> x3::as_parser(p) >> '"']; };
auto type = q("type") >> ':' >> q(bb_ib);
auto id = q("id") >> ':' >> x3::ulong_long;
auto label = q("label") >> ':' >> q(+x3::alnum);
Notes:
I improved the naming so it's more natural to read:
auto node = '{' >> type >> ',' >> id >> ',' >> label >> '}';
I changed alpha to alnum so it would actually match your sample input
Hypothesis: The expressions are structurally simplified to be more hierarchical - the sequences consist of fewer >>-ed terms - the hope is that this removes a potential mpl::vector size limit.
There's one missing piece, bb_ib, that I left out because it depends on how you want to actually assign parsed values to attributes. Let's do that:
Attributes
struct Node {
enum Type { bb, ib } type;
uint64_t id;
std::string label;
};
As you can see I opted for an enum to represent type. The most natural way to parse that would be using symbols<>
struct bb_ib_sym : x3::symbols<Node::Type> {
bb_ib_sym() { this->add("bb", Node::bb)("ib", Node::ib); }
} bb_ib;
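Conceptually, x3::symbols<> looks up the longest known key at the current input position and yields the associated value. A rough standard-library analogue (match_type is a hypothetical helper, and Node is trimmed down to its enum) looks like this:

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string_view>

struct Node {
    enum Type { bb, ib } type;
};

// Rough analogue of symbols<Node::Type>: find the longest key that prefixes the input.
std::optional<Node::Type> match_type(std::string_view in, std::size_t& consumed) {
    static const std::map<std::string_view, Node::Type> table{{"bb", Node::bb}, {"ib", Node::ib}};
    std::optional<Node::Type> best;
    consumed = 0;
    for (auto const& [key, value] : table) {
        if (key.size() > consumed && in.substr(0, key.size()) == key) {
            best = value;
            consumed = key.size();
        }
    }
    return best;
}
```

The real symbols<> parser does this with a dedicated lookup structure and plugs the value straight into attribute propagation, which is why it is the natural fit for enum fields.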
Now you can parse into a vector of Node:
Demo
Live On Coliru
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <cstdint>
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
struct Node {
enum Type { bb, ib } type;
uint64_t id;
std::string label;
};
namespace { // debug output
inline std::ostream& operator<<(std::ostream& os, Node::Type t) {
switch (t) {
case Node::bb: return os << "bb";
case Node::ib: return os << "ib";
}
return os << "?";
}
inline std::ostream& operator<<(std::ostream& os, Node const& n) {
return os << "Node{" << n.type << ", " << n.id << ", " << std::quoted(n.label) << "}";
}
}
// attribute propagation
BOOST_FUSION_ADAPT_STRUCT(Node, type, id, label)
int main() {
std::string input = R"("nodes": {
{
"type": "bb",
"id": 123456567,
"label": "0x12023049"
},
{
"type": "bb",
"id": 123123123,
"label": "0x01223234"
},
{
"type": "ib",
"id": 223092343,
"label": "0x03020343"
}
})";
namespace x3 = boost::spirit::x3;
struct bb_ib_sym : x3::symbols<Node::Type> {
bb_ib_sym() { this->add("bb", Node::bb)("ib", Node::ib); }
} bb_ib;
auto q = [](auto p) { return x3::lexeme['"' >> x3::as_parser(p) >> '"']; };
auto type = q("type") >> ':' >> q(bb_ib);
auto id = q("id") >> ':' >> x3::ulong_long;
auto label = q("label") >> ':' >> q(+x3::alnum);
auto node
= x3::rule<Node, Node> {"node"}
= '{' >> type >> ',' >> id >> ',' >> label >> '}';
auto nodes = q("nodes") >> ':' >> '{' >> node % ',' >> '}';
std::vector<Node> parsed;
auto f = begin(input);
auto l = end(input);
if (x3::phrase_parse(f, l, nodes, x3::space, parsed)) {
for (Node& node : parsed) {
std::cout << node << "\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f!=l) {
std::cout << "Remaining input: " << std::quoted(std::string(f, l)) << "\n";
}
}
Prints
Node{bb, 123456567, "0x12023049"}
Node{bb, 123123123, "0x01223234"}
Node{ib, 223092343, "0x03020343"}

boost spirit grammar for parsing header columns

I want to parse header columns of a text file. The column names should be allowed to be quoted and any case of letters. Currently I am using the following grammar:
#include <string>
#include <iostream>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
template <typename Iterator, typename Skipper>
struct Grammar : qi::grammar<Iterator, void(), Skipper>
{
static constexpr char colsep = '|';
Grammar() : Grammar::base_type(header)
{
using namespace qi;
using ascii::char_;
#define COL(name) (no_case[name] | ('"' >> no_case[name] >> '"'))
header = (COL("columna") | COL("column_a")) >> colsep >>
(COL("columnb") | COL("column_b")) >> colsep >>
(COL("columnc") | COL("column_c")) >> eol >> eoi;
#undef COL
}
qi::rule<Iterator, void(), Skipper> header;
};
int main()
{
const std::string s{"columnA|column_B|column_c\n"};
auto begin(std::begin(s)), end(std::end(s));
Grammar<std::string::const_iterator, qi::blank_type> p;
bool ok = qi::phrase_parse(begin, end, p, qi::blank);
if (ok && begin == end)
std::cout << "Header ok" << std::endl;
else if (ok && begin != end)
std::cout << "Remaining unparsed: '" << std::string(begin, end) << "'" << std::endl;
else
std::cout << "Parse failed" << std::endl;
return 0;
}
Is this possible without using a macro? Furthermore, I would like to ignore underscores altogether. Can this be achieved with a custom skipper? In the end, it would be ideal if one could write:
header = col("columna") >> colsep >> col("columnb") >> colsep >> col("columnc") >> eol >> eoi;
where col would be an appropriate grammar or rule.
@sehe how can I fix this grammar to support "\"Column_A\"" as well?
By this time you should probably have realized that there are two different things going on here.
Separate Yo Concerns
On the one hand you have a grammar (that allows |-separated columns like columna or "Column_A").
On the other hand you have semantic analysis (the phase where you check that the parsed contents match certain criteria).
The thing that is making your life hard is trying to conflate the two. Now, don't get me wrong, there could be (very rare) circumstances where fusing those responsibilities together is absolutely required, but I feel that would always be an optimization. If you need that, Spirit is not your thing, and you're much more likely to be better served by a handwritten parser.
Parsing
So let's get brain-dead simple about the grammar:
static auto headers = (quoted|bare) % '|' > (eol|eoi);
The bare and quoted rules can be pretty much the same as before:
static auto quoted = lexeme['"' >> *('\\' >> char_ | "\"\"" >> attr('"') | ~char_('"')) >> '"'];
static auto bare = *(graph - '|');
As you can see, this implicitly takes care of quoting and escaping, as well as whitespace skipping outside lexemes. Applied simply, it results in a clean list of column names:
std::string const s = "\"columnA\"|column_B| column_c \n";
std::vector<std::string> headers;
bool ok = phrase_parse(begin(s), end(s), Grammar::headers, x3::blank, headers);
std::cout << "Parse " << (ok?"ok":"invalid") << std::endl;
if (ok) for(auto& col : headers) {
std::cout << std::quoted(col) << "\n";
}
Prints Live On Coliru
Parse ok
"columnA"
"column_B"
"column_c"
INTERMEZZO: Coding Style
Let's structure our code so that the separation of concerns is reflected. Our parsing code might use X3, but our validation code doesn't need to be in the same translation unit (cpp file).
Have a header defining some basic types:
#include <string>
#include <vector>
using Header = std::string;
using Headers = std::vector<Header>;
Define the operations we want to perform on them:
Headers parse_headers(std::string const& input);
bool header_match(Header const& actual, Header const& expected);
bool headers_match(Headers const& actual, Headers const& expected);
Now, main can be rewritten as just:
auto headers = parse_headers("\"columnA\"|column_B| column_c \n");
for(auto& col : headers) {
std::cout << std::quoted(col) << "\n";
}
bool valid = headers_match(headers, {"columna","columnb","columnc"});
std::cout << "Validation " << (valid?"passed":"failed") << "\n";
And e.g. a parse_headers.cpp could contain:
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
namespace Grammar {
using namespace x3;
static auto quoted = lexeme['"' >> *('\\' >> char_ | "\"\"" >> attr('"') | ~char_('"')) >> '"'];
static auto bare = *(graph - '|');
static auto headers = (quoted|bare) % '|' > (eol|eoi);
}
Headers parse_headers(std::string const& input) {
Headers output;
if (phrase_parse(begin(input), end(input), Grammar::headers, x3::blank, output))
return output;
return {}; // or throw, if you prefer
}
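For comparison, here is a rough hand-written analog of what Grammar::headers does under the x3::blank skipper, using only the standard library. It is a sketch, not the X3 implementation: split_headers is a hypothetical name, and it returns std::nullopt where the X3 version would fail or throw an expectation failure. Like the bare rule under a skipper, it skips blanks between the characters of an unquoted column rather than keeping them.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical analog of  (quoted|bare) % '|' > (eol|eoi)  with a blank skipper.
std::optional<std::vector<std::string>> split_headers(std::string const& in) {
    std::size_t i = 0;
    std::size_t const n = in.size();
    auto skip_blanks = [&] { while (i < n && (in[i] == ' ' || in[i] == '\t')) ++i; };

    std::vector<std::string> columns;
    for (;;) {
        skip_blanks();
        std::string col;
        if (i < n && in[i] == '"') {                  // quoted: no skipping inside (lexeme)
            ++i;
            for (;;) {
                if (i >= n) return std::nullopt;      // unterminated quote
                if (in[i] == '\\' && i + 1 < n) { col += in[i + 1]; i += 2; }   // \x -> x
                else if (in.compare(i, 2, "\"\"") == 0) { col += '"'; i += 2; } // "" -> "
                else if (in[i] == '"') { ++i; break; }                          // closing quote
                else col += in[i++];
            }
        } else {                                      // bare: printable chars except '|'
            for (;;) {
                skip_blanks();                        // the skipper runs before each char
                if (i < n && in[i] != '|' && std::isgraph(static_cast<unsigned char>(in[i])))
                    col += in[i++];
                else
                    break;
            }
        }
        columns.push_back(col);
        skip_blanks();
        if (i < n && in[i] == '|') { ++i; continue; } // list separator
        break;
    }
    // Expectation: end-of-line or end-of-input must follow the last column.
    if (i == n || in[i] == '\n' || in[i] == '\r') return columns;
    return std::nullopt;
}
```

Even this small grammar needs a fair amount of index bookkeeping when written by hand, which is what the declarative rules above spare you.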
Validating
This is what is known as "semantic checks". You take the vector of strings and check them according to your logic:
#include <cctype> // std::isgraph
#include <boost/range/adaptors.hpp>
#include <boost/range/algorithm/equal.hpp> // boost::equal
#include <boost/algorithm/string.hpp>
bool header_match(Header const& actual, Header const& expected) {
using namespace boost::adaptors;
auto significant = [](unsigned char ch) {
return ch != '_' && std::isgraph(ch);
};
return boost::algorithm::iequals(actual | filtered(significant), expected);
}
bool headers_match(Headers const& actual, Headers const& expected) {
return boost::equal(actual, expected, header_match);
}
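If Boost.Range is not available, the same semantic check can be sketched with the standard library alone. This is an assumed equivalent of header_match/headers_match above, not the author's code: it keeps the significant characters of actual (no '_', printable only), lowercases them, and compares them with the lowercased expected name. As with the iequals call, expected itself is not filtered.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

using Header  = std::string;
using Headers = std::vector<Header>;

// Lowercase helper; the unsigned char cast keeps std::tolower well-defined.
static char lower(char ch) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

// Drop '_' and non-printable characters, then compare case-insensitively.
bool header_match(Header const& actual, Header const& expected) {
    std::string significant;
    for (char ch : actual) {
        if (ch != '_' && std::isgraph(static_cast<unsigned char>(ch)))
            significant += lower(ch);
    }
    return significant.size() == expected.size() &&
           std::equal(significant.begin(), significant.end(), expected.begin(),
                      [](char a, char b) { return a == lower(b); });
}

// Same number of columns, and every column matches its expected name.
bool headers_match(Headers const& actual, Headers const& expected) {
    return actual.size() == expected.size() &&
           std::equal(actual.begin(), actual.end(), expected.begin(), header_match);
}
```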
That's all. All the power of algorithms and modern C++ is at your disposal, with no need to fight constraints imposed by the parsing context.
Full Demo
The above, Live On Wandbox
Both parts got significantly simpler:
your parser doesn't have to deal with quirky comparison logic
your comparison logic doesn't have to deal with grammar concerns (quotes, escapes, delimiters and whitespace)

boost spirit parsing with no skipper

Think about a preprocessor which will read the raw text (no significant white space or tokens).
There are 3 rules.
resolve_para_entry should resolve a single argument inside a call. The top-level text is returned as a string.
resolve_para should resolve the whole parameter list and put all the top-level parameters in a string list.
resolve is the entry point.
Along the way I track the iterator positions to extract the text portions.
Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seems that the !lit(',') won't make the parser step outside...
Rules:
resolve_para_entry = +(
(iter_pos >> lit('(') >> (resolve_para_entry | eps) >> lit(')') >> iter_pos) [_val= phoenix::bind(&appendString, _val, _1,_3)]
| (!lit(',') >> !lit(')') >> !lit('(') >> (wide::char_ | wide::space)) [_val = phoenix::bind(&appendChar, _val, _1)]
);
resolve_para = (lit('(') >> lit(')'))[_val = std::vector<std::wstring>()] // empty para -> old style
| (lit('(') >> resolve_para_entry >> *(lit(',') >> resolve_para_entry) > lit(')'))[_val = phoenix::bind(&appendStringList, _val, _1, _2)]
| eps;
;
resolve = (iter_pos >> name_valid >> iter_pos >> resolve_para >> iter_pos);
In the end, this doesn't seem very elegant. Maybe there is a better way to parse such input without a skipper?
Indeed this should be a lot simpler.
First off, I fail to see why the absence of a skipper is at all relevant.
Second, exposing the raw input is best done using qi::raw[] instead of dancing with iter_pos and clumsy semantic actions¹.
Some other observations:
negating a charset is done with ~, so e.g. ~char_(",()")
(p|eps) would be better spelled -p
(lit('(') >> lit(')')) could be just "()" (after all, there's no skipper, right)
p >> *(',' >> p) is equivalent to p % ','
With the above, resolve_para simplifies to this:
resolve_para = '(' >> -(resolve_para_entry % ',') >> ')';
resolve_para_entry seems weird to me. It appears that any nested parentheses are simply swallowed. Why not actually parse a recursive grammar so you detect syntax errors?
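To make the point about nesting concrete without Spirit, here is a hypothetical hand-written splitter (split_args is an illustrative name) that returns the raw top-level arguments of a parenthesized list and rejects unbalanced input. It tracks nesting depth, so call(a,b) stays a single argument as the question's fourth sample expects. This is a swapped-in technique, not a translation of the arg rule defined further on, whose nested alternative '(' >> -arg >> ')' admits only one inner argument.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Given text starting at '(', return the raw top-level arguments up to the
// matching ')'. Commas inside nested parentheses do not split; unbalanced
// input yields std::nullopt.
std::optional<std::vector<std::string_view>> split_args(std::string_view s) {
    if (s.empty() || s.front() != '(') return std::nullopt;
    std::vector<std::string_view> args;
    int depth = 0;
    std::size_t start = 1;                       // first char after the outer '('
    for (std::size_t i = 0; i < s.size(); ++i) {
        char const c = s[i];
        if (c == '(') {
            ++depth;
        } else if (c == ')') {
            if (--depth == 0) {                  // matching close of the outer '('
                if (i > start || !args.empty())  // "()" means: no arguments
                    args.push_back(s.substr(start, i - start));
                return args;
            }
        } else if (c == ',' && depth == 1) {     // top-level separator only
            args.push_back(s.substr(start, i - start));
            start = i + 1;
        }
    }
    return std::nullopt;                         // ran out of input: unbalanced
}
```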
Here's my take on it:
Define An AST
I prefer to make this the first step because it helps me think about the parser productions:
namespace Ast {
using ArgList = std::list<std::string>;
struct Resolve {
std::string name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
Creating The Grammar Rules
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, std::string()> arg, identifier;
And their definitions:
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
Notes:
No more semantic actions
No more eps
No more iter_pos
I've opted to make arglist non-optional. If you really want it to be optional, change it back:
resolve = identifier >> -arglist;
But in our sample it will generate a lot of noisy output.
Of course your entry point (start) will be different. I just did the simplest thing that could possibly work, using another handy parser directive from the Spirit Repository (like iter_pos that you were already using): seek[]
The hold[] is there for the reason explained in boost::spirit::qi duplicate parsing on the output. You might not need it in your actual parser.
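The effect of *qr::seek[hold[resolve]] can be approximated by a plain loop: try the sub-parser at the current position; on success record the result and resume after the match, on failure advance one character. This sketch assumes a simplified try_call (a hypothetical helper: identifier immediately followed by balanced parentheses) and uses std::string_view. Since try_call only ever returns complete results, nothing partial is appended on failure, which is the guarantee hold[] provides in the Qi version.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

struct Call {
    std::string_view name;  // identifier before '('
    std::string_view args;  // raw text between the outer parentheses
};

// Hypothetical sub-parser: identifier followed by balanced "(...)".
// Returns the result and the number of characters consumed.
std::optional<std::pair<Call, std::size_t>> try_call(std::string_view s) {
    auto is_first = [](unsigned char c) { return std::isalpha(c) || c == '_'; };
    auto is_rest  = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    if (s.empty() || !is_first(s[0])) return std::nullopt;
    std::size_t i = 1;
    while (i < s.size() && is_rest(s[i])) ++i;
    if (i == s.size() || s[i] != '(') return std::nullopt;
    std::size_t const open = i;
    int depth = 0;
    for (; i < s.size(); ++i) {
        if (s[i] == '(') ++depth;
        else if (s[i] == ')' && --depth == 0)
            return std::make_pair(Call{s.substr(0, open), s.substr(open + 1, i - open - 1)}, i + 1);
    }
    return std::nullopt;  // unbalanced: no partial result escapes
}

// Analog of *seek[hold[p]]: try p at each position; keep successes and
// continue after them, otherwise skip one character.
std::vector<Call> seek_all(std::string_view s) {
    std::vector<Call> out;
    std::size_t pos = 0;
    while (pos < s.size()) {
        if (auto r = try_call(s.substr(pos))) {
            out.push_back(r->first);
            pos += r->second;
        } else {
            ++pos;
        }
    }
    return out;
}
```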
Live On Coliru
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <list>
#include <string>
#include <vector>
namespace Ast {
using ArgList = std::list<std::string>;
struct Resolve {
std::string name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
Parser() : Parser::base_type(start) {
using namespace qi;
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
}
private:
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, std::string()> arg, identifier;
};
#include <iostream>
int main() {
using It = std::string::const_iterator;
std::string const samples = R"--(
Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
It f = samples.begin(), l = samples.end();
Ast::Resolves data;
if (parse(f, l, Parser<It>{}, data)) {
std::cout << "Parsed " << data.size() << " resolves\n";
} else {
std::cout << "Parsing failed\n";
}
for (auto& resolve: data) {
std::cout << " - " << resolve.name << "\n (\n";
for (auto& arg : resolve.arglist) {
std::cout << " " << arg << "\n";
}
std::cout << " )\n";
}
}
Prints
Parsed 6 resolves
- sometext
(
para
)
- sometext
(
para1
para2
)
- sometext
(
call(a)
)
- call
(
a
)
- call
(
a
b
)
- lit
(
'
'
)
More Ideas
That last output shows you a problem with your current grammar: lit(',') should obviously not be seen as a call with two parameters.
I recently wrote an answer on extracting (nested) function calls with parameters that does things more neatly:
Boost spirit parse rule is not applied
or this one: boost spirit reporting semantic error
BONUS
Bonus version that uses string_view and also shows exact line/column information of all extracted words.
Note that it still doesn't require any Phoenix or semantic actions. Instead, it simply defines the necessary trait to assign to boost::string_view from an iterator range.
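The trait in the bonus listing calls f.base() on std::string iterators. That works with libstdc++'s wrapped iterators, but .base() is not part of the standard iterator interface. A more portable way to view a contiguous iterator range is to take the address of the first element, taking care never to dereference the end iterator. This sketch uses C++17 std::string_view in place of boost::string_view; view_of is a hypothetical name, and it is only valid for contiguous iterators such as std::string::const_iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>

// View [f, l) of a contiguous character sequence without copying and without
// a non-standard .base() member. The end iterator is never dereferenced.
template <typename It>
std::string_view view_of(It f, It l) {
    if (f == l) return {};
    return std::string_view(&*f, static_cast<std::size_t>(std::distance(f, l)));
}
```

The same idea could replace the .base() calls inside the trait's call() function.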
Live On Coliru
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
#include <algorithm> // std::count in Annotator
#include <list>
#include <vector>
namespace Ast {
using Source = boost::string_view;
using ArgList = std::list<Source>;
struct Resolve {
Source name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
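static void call(It f, It l, boost::string_view& attr) {
// NOTE: .base() relies on the standard library wrapping string iterators (libstdc++ does); it is not standard
attr = boost::string_view { f.base(), size_t(std::distance(f.base(),l.base())) };
}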
};
} } }
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
Parser() : Parser::base_type(start) {
using namespace qi;
identifier = raw [ char_("a-zA-Z_") >> *char_("a-zA-Z0-9_") ];
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
}
private:
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, Ast::Source()> arg, identifier;
};
#include <iostream>
struct Annotator {
using Ref = boost::string_view;
struct Manip {
Ref fragment, context;
friend std::ostream& operator<<(std::ostream& os, Manip const& m) {
return os << "[" << m.fragment << " at line:" << m.line() << " col:" << m.column() << "]";
}
size_t line() const {
return 1 + std::count(context.begin(), fragment.begin(), '\n');
}
size_t column() const {
return 1 + (fragment.begin() - start_of_line().begin());
}
Ref start_of_line() const {
return context.substr(context.substr(0, fragment.begin()-context.begin()).find_last_of('\n') + 1);
}
};
Ref context;
Manip operator()(Ref what) const { return {what, context}; }
};
int main() {
using It = std::string::const_iterator;
std::string const samples = R"--(Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
It f = samples.begin(), l = samples.end();
Ast::Resolves data;
if (parse(f, l, Parser<It>{}, data)) {
std::cout << "Parsed " << data.size() << " resolves\n";
} else {
std::cout << "Parsing failed\n";
}
Annotator annotate{samples};
for (auto& resolve: data) {
std::cout << " - " << annotate(resolve.name) << "\n (\n";
for (auto& arg : resolve.arglist) {
std::cout << " " << annotate(arg) << "\n";
}
std::cout << " )\n";
}
}
Prints
Parsed 6 resolves
- [sometext at line:3 col:1]
(
[para at line:3 col:10]
)
- [sometext at line:4 col:1]
(
[para1 at line:4 col:10]
[para2 at line:4 col:16]
)
- [sometext at line:5 col:1]
(
[call(a) at line:5 col:10]
)
- [call at line:5 col:34]
(
[a at line:5 col:39]
)
- [call at line:6 col:10]
(
[a at line:6 col:15]
[b at line:6 col:17]
)
- [lit at line:6 col:62]
(
[' at line:6 col:66]
[' at line:6 col:68]
)
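The line/column arithmetic in Annotator can be checked in isolation. This sketch restates it with std::string_view (an assumption; the answer uses boost::string_view): the line is one plus the number of newlines before the fragment, and the column is one plus the distance from the last newline. As in Annotator, fragment must point into context, and columns count bytes, so with UTF-8 source a character like → advances the column by three.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

struct Position {
    std::size_t line, column;  // both 1-based; column counts bytes
};

// Same arithmetic as Annotator::line() and Annotator::column().
// Precondition: fragment points into context.
Position locate(std::string_view context, std::string_view fragment) {
    auto const offset = static_cast<std::size_t>(fragment.data() - context.data());
    std::string_view const before = context.substr(0, offset);
    std::size_t const line =
        1 + static_cast<std::size_t>(std::count(before.begin(), before.end(), '\n'));
    std::size_t const last_nl = before.find_last_of('\n');   // npos on the first line
    std::size_t const line_start = (last_nl == std::string_view::npos) ? 0 : last_nl + 1;
    return {line, 1 + offset - line_start};
}
```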
¹ Boost Spirit: "Semantic actions are evil"?

Boost spirit parse rule is not applied

I can't see my error here. This rule parses some inputs fine, but not the last two samples. Could somebody please give me a hint?
The goal is a parser that can identify member property access and member function calls, including chained ones:
a()
a(para)
x.a()
x.a(para)
x.a(para).g(para).j()
x.y
x.y.z
x.y.z() <---fail
y.z.z(para) <--- fail
lvalue =
iter_pos >> name[_val = _1]
>> *(lit('(') > paralistopt > lit(')') >> iter_pos)[_val = construct<common_node>(type_cmd_fnc_call, LOCATION_NODE_ITER(_val, _2), key_this, construct<common_node>(_val), key_parameter, construct<std::vector<common_node> >(_1))]
>> *(lit('.') >> name_pure >> lit('(') > paralistopt > lit(')') >> iter_pos)[_val = construct<common_node>(type_cmd_fnc_call, LOCATION_NODE_ITER(_val, _3), key_this, construct<common_node>(_val), key_callname, construct<std::wstring>(_1), key_parameter, construct<std::vector<common_node> >(_2))]
>> *(lit('.') >> name_pure >> iter_pos)[_val = construct<common_node>(type_cmd_dot_call, LOCATION_NODE_ITER(_val, _2), key_this, construct<common_node>(_val), key_propname, construct<std::wstring>(_1))]
;
thank you
Markus
You provide very little information to go on. Let me humor you with my entry into this guessing game:
Let's assume you want to parse a simple "language" that merely allows member expressions and function invocations, but chained.
Now, your grammar says nothing about the parameters (though it's clear the param list can be empty), so let me go the extra mile and assume that you want to accept the same kind of expressions there (so foo(a) is okay, but also bar(foo(a)) or bar(b.foo(a))).
Since you accept chaining of function calls, it appears that functions are first-class objects (and functions can return functions), so foo(a)(b, c, d) should be accepted as well.
You didn't mention it, but parameters often include literals (sqrt(9) comes to mind, or println("hello world")).
Other items:
you didn't say, but you likely want to ignore whitespace in certain spots
from the iter_pos (ab)use it seems you're interested in tracking the original source location inside the resulting AST.
1. Define An AST
As ever, we should keep it simple:
namespace Ast {
using Identifier = boost::iterator_range<It>;
struct MemberExpression;
struct FunctionCall;
using Expression = boost::variant<
double, // some literal types
std::string,
// non-literals
Identifier,
boost::recursive_wrapper<MemberExpression>,
boost::recursive_wrapper<FunctionCall>
>;
struct MemberExpression {
Expression object; // antecedent
Identifier member; // function or field
};
using Parameter = Expression;
using Parameters = std::vector<Parameter>;
struct FunctionCall {
Expression function; // could be a member function
Parameters parameters;
};
}
NOTE We're not going to focus on showing source locations, but we already made one provision: storing identifiers as an iterator range.
NOTE Fusion-adapting the only types not directly supported by Spirit:
BOOST_FUSION_ADAPT_STRUCT(Ast::MemberExpression, object, member)
BOOST_FUSION_ADAPT_STRUCT(Ast::FunctionCall, function, parameters)
We will find that we don't use these, because Semantic Actions are more convenient here.
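For readers on C++17, roughly the same AST shape can be sketched with std::variant. This is an alternative to the Boost.Variant version above, not a drop-in: std::variant cannot hold incomplete types, so std::shared_ptr stands in for boost::recursive_wrapper (sharing rather than copying subtrees), and Identifier is a small struct around std::string (instead of an iterator range) so it does not collide with the std::string literal alternative. render is a hypothetical debug printer in the spirit of the operator<< overloads in section 4.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <variant>
#include <vector>

struct Identifier { std::string name; };  // distinct from std::string literals
struct MemberExpression;
struct FunctionCall;

using Expression = std::variant<
    double, std::string,                // literal types
    Identifier,                         // non-literals
    std::shared_ptr<MemberExpression>,  // shared_ptr plays the role of recursive_wrapper
    std::shared_ptr<FunctionCall>>;

struct MemberExpression {
    Expression object;  // antecedent
    Identifier member;  // function or field
};

struct FunctionCall {
    Expression function;                // could be a member function
    std::vector<Expression> parameters;
};

std::string render(Expression const& e);

// Debug rendering only: string literals are not re-quoted.
struct Render {
    std::string operator()(double d) const { std::ostringstream os; os << d; return os.str(); }
    std::string operator()(std::string const& s) const { return s; }
    std::string operator()(Identifier const& id) const { return id.name; }
    std::string operator()(std::shared_ptr<MemberExpression> const& me) const {
        return render(me->object) + "." + me->member.name;
    }
    std::string operator()(std::shared_ptr<FunctionCall> const& fc) const {
        std::string out = render(fc->function) + "(";
        for (std::size_t i = 0; i < fc->parameters.size(); ++i)
            out += (i ? ", " : "") + render(fc->parameters[i]);
        return out + ")";
    }
};

std::string render(Expression const& e) { return std::visit(Render{}, e); }
```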
2. A Matching Grammar
Grammar() : Grammar::base_type(start) {
using namespace qi;
start = skip(space) [expression];
identifier = raw [ (alpha|'_') >> *(alnum|'_') ];
parameters = -(expression % ',');
expression
= literal
| identifier >> *(
('.' >> identifier)
| ('(' >> parameters >> ')')
);
literal = double_ | string_;
string_ = '"' >> *('\\' >> char_ | ~char_('"')) >> '"';
BOOST_SPIRIT_DEBUG_NODES(
(identifier)(start)(parameters)(expression)(literal)(string_)
);
}
In this skeleton most rules benefit from automatic attribute propagation. The one that doesn't is expression:
qi::rule<It, Expression()> start;
using Skipper = qi::space_type;
qi::rule<It, Expression(), Skipper> expression, literal;
qi::rule<It, Parameters(), Skipper> parameters;
// lexemes
qi::rule<It, Identifier()> identifier;
qi::rule<It, std::string()> string_;
So, let's create some helpers for the semantic actions.
NOTE An important take-away here is to create your own higher-level building blocks instead of toiling away with boost::phoenix::construct<> etc.
Define two simple construction functions:
struct mme_f { MemberExpression operator()(Expression lhs, Identifier rhs) const { return { lhs, rhs }; } };
struct mfc_f { FunctionCall operator()(Expression f, Parameters params) const { return { f, params }; } };
phx::function<mme_f> make_member_expression;
phx::function<mfc_f> make_function_call;
Then use them:
expression
= literal [_val=_1]
| identifier [_val=_1] >> *(
('.' >> identifier) [ _val = make_member_expression(_val, _1)]
| ('(' >> parameters >> ')') [ _val = make_function_call(_val, _1) ]
);
That's all. We're ready to roll!
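The semantic actions fold left: each '.' member or '(...)' call wraps the expression built so far in _val. To see that structure without Spirit, here is a simplified hand-written recursive-descent analog of the expression rule (ExprParser is a hypothetical name). It renders results as nested member(...)/call(...) terms instead of building the AST, backtracks a failed postfix step the way a failed Kleene iteration does, and its numeric literals are a simplification of double_ (digits and dots only).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// expression = literal | identifier >> *('.' >> identifier | '(' >> parameters >> ')')
// parameters = -(expression % ','), whitespace skipped between tokens.
class ExprParser {
public:
    explicit ExprParser(std::string_view in) : s_(in) {}

    // Whole input must be one expression; std::nullopt otherwise.
    std::optional<std::string> parse_all() {
        auto e = expression();
        skip();
        if (!e || i_ != s_.size()) return std::nullopt;
        return e;
    }

private:
    std::string_view s_;
    std::size_t i_ = 0;

    static bool is_ident_start(char c) { return std::isalpha(static_cast<unsigned char>(c)) || c == '_'; }
    static bool is_ident_char(char c)  { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; }
    static bool is_digit(char c)       { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    void skip() { while (i_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[i_]))) ++i_; }
    bool lit(char c) {                                   // skip, then match one character
        skip();
        if (i_ < s_.size() && s_[i_] == c) { ++i_; return true; }
        return false;
    }

    std::optional<std::string> identifier() {
        skip();
        std::size_t const b = i_;
        if (i_ >= s_.size() || !is_ident_start(s_[i_])) return std::nullopt;
        while (i_ < s_.size() && is_ident_char(s_[i_])) ++i_;
        return std::string(s_.substr(b, i_ - b));
    }

    std::optional<std::string> literal() {
        skip();
        std::size_t const b = i_;
        if (i_ < s_.size() && s_[i_] == '"') {            // "..." with \x escapes
            std::string text;
            for (++i_; i_ < s_.size() && s_[i_] != '"'; ++i_) {
                if (s_[i_] == '\\' && i_ + 1 < s_.size()) ++i_;
                text += s_[i_];
            }
            if (i_ == s_.size()) { i_ = b; return std::nullopt; }  // unterminated
            ++i_;                                                  // closing quote
            return "\"" + text + "\"";
        }
        if (i_ < s_.size() && is_digit(s_[i_])) {         // simplified number
            while (i_ < s_.size() && (is_digit(s_[i_]) || s_[i_] == '.')) ++i_;
            return std::string(s_.substr(b, i_ - b));
        }
        return std::nullopt;
    }

    std::optional<std::string> expression() {
        if (auto l = literal()) return l;
        auto val = identifier();
        if (!val) return std::nullopt;
        for (;;) {
            std::size_t const save = i_;
            if (lit('.')) {
                auto member = identifier();
                if (!member) { i_ = save; break; }        // failed iteration: backtrack
                val = "member(" + *val + "," + *member + ")";
            } else if (lit('(')) {
                std::string args;
                if (auto first = expression()) {          // -(expression % ',')
                    args = *first;
                    for (;;) {
                        std::size_t const before_comma = i_;
                        if (!lit(',')) break;
                        auto next = expression();
                        if (!next) { i_ = before_comma; break; }
                        args += "," + *next;
                    }
                }
                if (!lit(')')) { i_ = save; break; }
                val = "call(" + *val + (args.empty() ? std::string() : "," + args) + ")";
            } else {
                break;
            }
        }
        return val;
    }
};
```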
3. DEMO
Live On Coliru
I created a test bed looking like this:
int main() {
using It = std::string::const_iterator;
Parser::Grammar<It> const g;
for (std::string const input : {
"a()", "a(para)", "x.a()", "x.a(para)", "x.a(para).g(para).j()", "x.y", "x.y.z",
"x.y.z()",
"y.z.z(para)",
// now let's add some funkyness that you didn't mention
"bar(foo(a))",
"bar(b.foo(a))",
"foo(a)(b, c, d)", // first class functions
"sqrt(9)",
"println(\"hello world\")",
"allocate(strlen(\"aaaaa\"))",
"3.14",
"object.rotate(180)",
"object.rotate(event.getAngle(), \"torque\")",
"app.mainwindow().find_child(\"InputBox\").font().size(12)",
"app.mainwindow().find_child(\"InputBox\").font(config().preferences.baseFont(style.PROPORTIONAL))"
}) {
std::cout << " =========== '" << input << "' ========================\n";
It f(input.begin()), l(input.end());
Ast::Expression parsed;
bool ok = parse(f, l, g, parsed);
if (ok) {
std::cout << "Parsed: " << parsed << "\n";
}
else
std::cout << "Parse failed\n";
if (f != l)
std::cout << "Remaining unparsed input: '" << std::string(f, l) << "'\n";
}
}
Incredible as it may appear, this already parses all the test cases and prints:
=========== 'a()' ========================
Parsed: a()
=========== 'a(para)' ========================
Parsed: a(para)
=========== 'x.a()' ========================
Parsed: x.a()
=========== 'x.a(para)' ========================
Parsed: x.a(para)
=========== 'x.a(para).g(para).j()' ========================
Parsed: x.a(para).g(para).j()
=========== 'x.y' ========================
Parsed: x.y
=========== 'x.y.z' ========================
Parsed: x.y.z
=========== 'x.y.z()' ========================
Parsed: x.y.z()
=========== 'y.z.z(para)' ========================
Parsed: y.z.z(para)
=========== 'bar(foo(a))' ========================
Parsed: bar(foo(a))
=========== 'bar(b.foo(a))' ========================
Parsed: bar(b.foo(a))
=========== 'foo(a)(b, c, d)' ========================
Parsed: foo(a)(b, c, d)
=========== 'sqrt(9)' ========================
Parsed: sqrt(9)
=========== 'println("hello world")' ========================
Parsed: println(hello world)
=========== 'allocate(strlen("aaaaa"))' ========================
Parsed: allocate(strlen(aaaaa))
=========== '3.14' ========================
Parsed: 3.14
=========== 'object.rotate(180)' ========================
Parsed: object.rotate(180)
=========== 'object.rotate(event.getAngle(), "torque")' ========================
Parsed: object.rotate(event.getAngle(), torque)
=========== 'app.mainwindow().find_child("InputBox").font().size(12)' ========================
Parsed: app.mainwindow().find_child(InputBox).font().size(12)
=========== 'app.mainwindow().find_child("InputBox").font(config().preferences.baseFont(style.PROPORTIONAL))' ========================
Parsed: app.mainwindow().find_child(InputBox).font(config().preferences.baseFont(style.PROPORTIONAL))
4. Too Good To Be True?
You're right. I cheated. I didn't show you the code required to debug-print the parsed AST:
namespace Ast {
static inline std::ostream& operator<<(std::ostream& os, MemberExpression const& me) {
return os << me.object << "." << me.member;
}
static inline std::ostream& operator<<(std::ostream& os, FunctionCall const& fc) {
os << fc.function << "(";
bool first = true;
for (auto& p : fc.parameters) { if (!first) os << ", "; first = false; os << p; }
return os << ")";
}
}
It's only debug printing, as string literals aren't correctly round-tripped. But at only 10 lines of code, that's a bonus.
5. The Full Monty: Source Locations
Source locations seemed to interest you, so let's show them working. Let's add a simple loop to print the locations of all identifiers:
using IOManip::showpos;
for (auto& id : all_identifiers(parsed)) {
std::cout << " - " << id << " at " << showpos(id, input) << "\n";
}
Of course, this raises the question: what are showpos and all_identifiers?
namespace IOManip {
struct showpos_t {
boost::iterator_range<It> fragment;
std::string const& source;
friend std::ostream& operator<<(std::ostream& os, showpos_t const& manip) {
auto ofs = [&](It it) { return it - manip.source.begin(); };
return os << "[" << ofs(manip.fragment.begin()) << ".." << ofs(manip.fragment.end()) << ")";
}
};
showpos_t showpos(boost::iterator_range<It> fragment, std::string const& source) {
return {fragment, source};
}
}
As for the identifier extraction:
std::vector<Identifier> all_identifiers(Expression const& expr) {
std::vector<Identifier> result;
struct Harvest {
using result_type = void;
std::back_insert_iterator<std::vector<Identifier> > out;
void operator()(Identifier const& id) { *out++ = id; }
void operator()(MemberExpression const& me) { apply_visitor(*this, me.object); *out++ = me.member; }
void operator()(FunctionCall const& fc) {
apply_visitor(*this, fc.function);
for (auto& p : fc.parameters) apply_visitor(*this, p);
}
// non-identifier expressions
void operator()(std::string const&) { }
void operator()(double) { }
} harvest { back_inserter(result) };
boost::apply_visitor(harvest, expr);
return result;
}
That's a tree visitor that harvests all identifiers recursively, inserting them into the back of a container.
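The harvesting visitor translates directly to std::visit. In this sketch (an assumption, not the answer's code), identifiers store [begin, end) byte offsets instead of an iterator range, so results can be compared with the half-open intervals printed by showpos, and std::shared_ptr again stands in for boost::recursive_wrapper. The traversal order matches Harvest: object before member, function before its parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <variant>
#include <vector>

struct Identifier { std::size_t begin, end; };  // half-open offsets into the source
struct MemberExpression;
struct FunctionCall;

using Expression = std::variant<double, std::string, Identifier,
                                std::shared_ptr<MemberExpression>,
                                std::shared_ptr<FunctionCall>>;

struct MemberExpression { Expression object; Identifier member; };
struct FunctionCall { Expression function; std::vector<Expression> parameters; };

// Depth-first harvest in source order.
struct Harvest {
    std::vector<Identifier>& out;
    void operator()(Identifier const& id) const { out.push_back(id); }
    void operator()(std::shared_ptr<MemberExpression> const& me) const {
        std::visit(*this, me->object);
        out.push_back(me->member);
    }
    void operator()(std::shared_ptr<FunctionCall> const& fc) const {
        std::visit(*this, fc->function);
        for (auto const& p : fc->parameters) std::visit(*this, p);
    }
    void operator()(double) const {}             // literals hold no identifiers
    void operator()(std::string const&) const {}
};

std::vector<Identifier> all_identifiers(Expression const& e) {
    std::vector<Identifier> result;
    std::visit(Harvest{result}, e);
    return result;
}
```

For the input object.rotate(event.getAngle()), the four identifiers would be reported at [0..6), [7..13), [14..19) and [20..28).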
Live On Coliru
Where output looks like (excerpt):
=========== 'app.mainwindow().find_child("InputBox").font(config().preferences.baseFont(style.PROPORTIONAL))' ========================
Parsed: app.mainwindow().find_child(InputBox).font(config().preferences.baseFont(style.PROPORTIONAL))
- app at [0..3)
- mainwindow at [4..14)
- find_child at [17..27)
- font at [40..44)
- config at [45..51)
- preferences at [54..65)
- baseFont at [66..74)
- style at [75..80)
- PROPORTIONAL at [81..93)
Try changing
>> *(lit('.') >> name_pure >> lit('(') > paralistopt > lit(')'))
to
>> *(*(lit('.') >> name_pure) >> lit('(') > paralistopt > lit(')'))