Why does qi::skip fail with tokens from the lexer? - c++

I'm using boost::spirit lex and qi to parse some source code.
I already skip whitespace in the lexer. What I would like to do is switch comment skipping on or off depending on the context in the parser.
Here is a basic demo. See the comments in Grammar::Grammar() for my problem:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <cstring>
#include <iostream>
#include <string>
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
typedef lex::lexertl::token<char const*, boost::mpl::vector<std::string>, boost::mpl::false_ > token_type;
typedef lex::lexertl::actor_lexer<token_type> lexer_type;
struct TokenId
{
enum type
{
INVALID_TOKEN_ID = lex::min_token_id,
COMMENT
};
};
struct Lexer : lex::lexer<lexer_type>
{
public:
lex::token_def<std::string> comment;
lex::token_def<std::string> identifier;
lex::token_def<std::string> lineFeed;
lex::token_def<std::string> space;
Lexer()
{
comment = "\\/\\*.*?\\*\\/|\\/\\/[^\\r\\n]*";
identifier = "[A-Za-z_][A-Za-z0-9_]*";
space = "[\\x20\\t\\f\\v]+";
lineFeed = "(\\r\\n)|\\r|\\n";
this->self = space[lex::_pass = lex::pass_flags::pass_ignore];
this->self += lineFeed[lex::_pass = lex::pass_flags::pass_ignore];
this->self.add
(comment, TokenId::COMMENT)
(identifier)
(';')
;
}
};
typedef Lexer::iterator_type Iterator;
void traceComment(const std::string& content)
{
std::cout << " comment: " << content << std::endl;
}
class Grammar : public qi::grammar<Iterator>
{
typedef token_type skipped_t;
qi::rule<Iterator, qi::unused_type, qi::unused_type> m_start;
qi::rule<Iterator, qi::unused_type, qi::unused_type, skipped_t> m_variable;
qi::rule<Iterator, std::string(), qi::unused_type> m_comment;
public:
Lexer lx;
public:
Grammar() :
Grammar::base_type(m_start)
{
// This does not work (comments are not skipped in m_variable)
m_start = *(
m_comment[phx::bind(&traceComment, qi::_1)]
| qi::skip(qi::token(TokenId::COMMENT))[m_variable]
);
m_variable = lx.identifier >> lx.identifier >> ';';
m_comment = qi::token(TokenId::COMMENT);
/** But this works:
m_start = *(
m_comment[phx::bind(&traceComment, qi::_1)]
| m_variable
);
m_variable = qi::skip(qi::token(TokenId::COMMENT))[lx.identifier >> lx.identifier >> ';'];
m_comment = qi::token(TokenId::COMMENT);
*/
}
};
void test(const char* code)
{
std::cout << code << std::endl;
Grammar parser;
const char* begin = code;
const char* end = code + strlen(code);
tokenize_and_parse(begin, end, parser.lx, parser);
if (begin == end)
std::cout << "-- OK --" << std::endl;
else
std::cout << "-- FAILED --" << std::endl;
std::cout << std::endl;
}
int main(int argc, char* argv[])
{
test("/* kept */ int foo;");
test("int /* ignored */ foo;");
test("int foo /* ignored */;");
test("int foo; // kept");
}
The output is:
/* kept */ int foo;
comment: /* kept */
-- OK --
int /* ignored */ foo;
-- FAILED --
int foo /* ignored */;
-- FAILED --
int foo; // kept
comment: // kept
-- OK --
Is there any issue with skipped_t?

The behavior you are describing is what I would expect from my experience.
When you write
my_rule = qi::skip(ws) [ foo >> lit(',') >> bar >> lit('=') >> baz ];
this is essentially the same as writing
my_rule = *ws >> foo >> *ws >> lit(',') >> *ws >> bar >> *ws >> lit('=') >> *ws >> baz;
(assuming that ws is a rule with no attribute. If it has an attribute in your grammar, that attribute is ignored, as if using qi::omit.)
Notably, the skipper does not get propagated inside the foo rule. So foo, bar, and baz can still be whitespace-sensitive in the above. What the skip directive does is make the grammar not care about leading whitespace in this rule, or about whitespace around the ',' and '=' in this rule.
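To make the expansion concrete without pulling in Spirit, here is a hand-written sketch of what the directive amounts to. The helper names (skip_ws, word, foo_comma_bar) are invented for illustration and are not part of qi. The only point is that whitespace is skipped before each element of the outer sequence, while the inner "rule" still has to match its characters contiguously.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: consume leading whitespace (plays the role of *ws).
void skip_ws(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos])))
        ++pos;
}

// Hypothetical "inner rule" without a skipper: the word must appear
// contiguously, no whitespace is tolerated between its characters.
bool word(const std::string& s, std::size_t& pos, const std::string& w) {
    assert(pos <= s.size());
    if (s.compare(pos, w.size(), w) != 0)
        return false;
    pos += w.size();
    return true;
}

// Mimics  skip(ws)[ word("foo") >> ',' >> word("bar") ]  by hand:
// a pre-skip happens before every element of the sequence, nowhere else.
bool foo_comma_bar(const std::string& s) {
    std::size_t pos = 0;
    skip_ws(s, pos);
    if (!word(s, pos, "foo")) return false;
    skip_ws(s, pos);
    if (pos >= s.size() || s[pos] != ',') return false;
    ++pos;
    skip_ws(s, pos);
    if (!word(s, pos, "bar")) return false;
    return pos == s.size(); // require the whole input to be consumed
}
```

With this sketch, "  foo ,  bar" is accepted, while "f oo,bar" is rejected because the inner matcher never skips.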
More info here: http://boost-spirit.com/home/2010/02/24/parsing-skippers-and-skipping-parsers/
Edit:
Also, I don't think the skipped_t is doing what you think it is there.
When you use a custom skipper, most straightforwardly you specify an actual instance of a parser as the skip parser for that rule. When you use a type instead of an object, e.g. qi::blank_type as the skipper template argument of a rule, that is a shorthand: the tag type qi::blank_type has been linked via prior template declarations to the qi::blank parser, and qi knows that when it sees qi::blank_type in certain places it should instantiate a qi::blank parser object.
I don't see any evidence that you've actually set up that machinery; you've just typedef'ed skipped_t to token_type. What you should do if you want this to work that way (if it's even possible, I don't know) is read about qi customization points and instead declare qi::skipped_t as an empty struct which is linked via some template boilerplate to the rule m_comment, which is presumably what you actually want to be skipping. (If you skip all tokens of all types, then you can't possibly match anything, so that wouldn't make sense, and I'm not sure what your intention was in making token_type the skipper.)
My guess is that when qi saw that typedef token_type in your parameter list, it either ignored it or interpreted it as part of the return value of the rule or something like this; I'm not sure exactly what it would do.

Related

Parsing to different types of values in boost::spirit and apply casting to negative numbers

I am trying to solve an issue with positive and negative values in Boost Spirit.
The parser should use unsigned numbers (positive) 99% of the time.
The program works by reading a string that defines variables of 1 to 32 bits that should be read from another stream (for question context, not shown in the example), but there is a special case where the variable "D_REF" may be a 16-bit signed number (2's complement).
The program encodes all checks as unsigned values in a std::vector, so I need to encode that value as unsigned too, but first I need to cast it to an unsigned short, and only then store it in the struct's unsigned int member.
This need comes from a later step in which a data stream is read, values are extracted from it as unsigned, and the parsed comparisons are applied to them.
I know this request may look weird, but it is a must for a current project, so can anyone help me with this?
Godbolt link: https://godbolt.org/z/8j615Mecx
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
namespace engine
{
struct Check
{
std::string variable;
unsigned int number;
};
using Checks = std::vector<Check>;
}
BOOST_FUSION_ADAPT_STRUCT(engine::Check, variable, number)
namespace engine
{
namespace qi = boost::spirit::qi;
template <typename It>
class Parser : public qi::grammar<It, Checks()>
{
private:
qi::rule<It, Check(), qi::blank_type> equal1, equal2;
qi::rule<It, Checks()> start;
public:
Parser() : Parser::base_type(start)
{
using namespace qi;
//equal1 = as_string["MSG33.D_REF"] >> "==" >> int_[static_cast<unsigned short>(_1)];// This is the idea...
equal1 = as_string["MSG33.D_REF"] >> "==" >> int_; // This may contain negative numbers, but they are only 16 bits long, so they must be cast to "unsigned short" and not to "unsigned int"
equal2 = +(alnum | char_("._")) >> "==" >> uint_;
start = skip(blank)[(equal1 | equal2) % "&&"] > eoi;
}
};
Checks parse(const std::string& str)
{
using It = std::string::const_iterator;
static const Parser<It> parser;
Checks checks;
It first = str.begin(), last = str.end();
if (!qi::parse(first, last, parser, checks))
return {};
return checks;
}
}
int main()
{
auto checks1 = engine::parse("MSG33.ANYTHING == 25"); // Normal case. All the checks are done with positive variable values
auto checks2 = engine::parse("MSG33.D_REF == 25"); // Special case extended from the normal case. Checks a positive/negative variable against a positive value.
auto checks3 = engine::parse("MSG33.D_REF == -25"); // Special case. Checks a negative value. D_REF should be encoded as a 16-bit 2's complement value, but it is converted to 32-bit unsigned
std::cout << std::hex << "Obtained: " << checks3.front().number << std::endl << "Wished: " << static_cast<unsigned short>(checks3.front().number); // It displays 0xffffffe7, but I need 0xffe7. Possible semantic action to force conversion prior to vector insertion???
}
First: A word of caution
Automatic attribute propagation already does exactly what you need. That's pretty much what you'd expect since it compiles.
Your problem really has nothing to do with the parsing at all. It has to do with how you interpret the correctly parsed negative number, correctly converted to the integer type you chose (unsigned int).
Indeed, if you want to treat an unsigned int value as a short (signed or unsigned), you have to coerce it, or use a bitmask to clear the high bits: c.number & 0xffff.
Storing 0xffe7 inside the unsigned int is of course possible. But it is technically just INCORRECT 2's complement encoding. Experience tells me it will lead to error-prone code.
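As a small standard-library illustration of the coercion and the bitmask mentioned above (independent of Spirit), the following sketch shows both directions. The function names low16, encode16 and decode16 are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low 16 bits of a 32-bit pattern (the bitmask approach).
std::uint32_t low16(std::uint32_t bits) { return bits & 0xffffu; }

// Coerce a parsed signed value to its 16-bit two's complement pattern.
// Conversion to an unsigned type is modular, so this is well defined.
std::uint32_t encode16(int parsed) { return static_cast<std::uint16_t>(parsed); }

// Read a stored 16-bit pattern back as a signed value without relying on
// implementation-defined narrowing conversions.
int decode16(std::uint32_t stored) {
    const int v = static_cast<int>(stored & 0xffffu);
    const int result = v >= 0x8000 ? v - 0x10000 : v;
    assert(result >= -32768 && result <= 32767);
    return result;
}
```

Here encode16(-25) and low16 applied to the 32-bit pattern 0xffffffe7 both give 0xffe7, and decode16(0xffe7) gives back -25. Whoever reads the stored value still has to know that it is meant as a 16-bit quantity, which is exactly the error-prone part.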
If I were to go for a design like this, I'd choose an integer representation type that is expressly NOT an arithmetic type. Something like
struct Number {
_implementation_defined_ storage;
uint32_t as_uint32() const { return /*some implementation logic on storage*/; }
int16_t as_int16() const { return /*some other implementation logic on storage*/; }
// etc.
};
In the land of parsed ASTs, I'd prefer
template <typename V>
struct VarCheck {
std::string name;
V number;
};
using Check = boost::variant<VarCheck<uint32_t>, VarCheck<int16_t>>;
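To show how such a variant-based check might be consumed, here is a minimal sketch. It swaps in std::variant (C++17) for boost::variant so that it needs no Boost headers, and the names VarCheck, make_s16 and value_of are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>

template <typename V>
struct VarCheck {
    std::string name;
    V number;
};

using Check = std::variant<VarCheck<std::uint32_t>, VarCheck<std::int16_t>>;

// Hypothetical factory for the signed 16-bit case; the range check documents
// the assumption that such values fit in 16 bits.
Check make_s16(std::string name, int parsed) {
    assert(parsed >= INT16_MIN && parsed <= INT16_MAX);
    return VarCheck<std::int16_t>{std::move(name), static_cast<std::int16_t>(parsed)};
}

// Widen whichever alternative is stored to a common signed type, so the
// comparison logic no longer has to guess how many bits are meaningful.
std::int64_t value_of(const Check& c) {
    return std::visit(
        [](const auto& v) { return static_cast<std::int64_t>(v.number); }, c);
}
```

The static type now records whether a value is a signed 16-bit or an unsigned 32-bit quantity, so -25 stays -25 instead of turning into a bit pattern that needs out-of-band interpretation.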
With that out of the way, let's see some answers to your question:
Using static cast in the semantic action
You can force the issue using Boost Phoenix: Live On Coliru
assign_d_ref %= qi::string("MSG33.D_REF") >> "==" >>
qi::int_[_1 = boost::phoenix::static_cast_<uint16_t>(_1)];
IMO, a slightly better approach¹ is to have a parser that parses uint16_t in the first place: Live On Coliru
qi::int_parser<uint16_t> uint16_;
assign_d_ref = qi::string("MSG33.D_REF") >> "==" >> uint16_;
Other Improvements
I'd also improve the expressiveness some more using e.g.:
qi::symbols<char> s16_vars;
s16_vars += "MSG33.D_REF", "MSG34.D_REF";
assign_s16 = qi::raw[s16_vars] >> "==" >> uint16_;
To generalize for signed 16 bit variables.
qi::rule<It, std::string()> name;
name = +(qi::alnum | qi::char_("._"));
This fixes the missing lexeme[] around the name (by declaring the rule without skipper²).
assign_u32 = name >> "==" >> qi::uint_;
assign = assign_s16 | assign_u32;
start = qi::skip(qi::blank)[assign % "&&" > qi::eoi];
Apart from the readability, it fixes the edge case where blanks are immediately before end-of-input.
See the combined result Live On Coliru
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <utility>
#include <vector>
namespace engine {
struct Check {
std::string variable;
uint32_t number;
friend std::ostream& operator<<(std::ostream& os, Check const& c) {
auto f = os.flags();
os << "{" << std::quoted(c.variable) << " == " //
<< std::hex << std::showbase << c.number << "}";
os.setf(f);
return os;
}
};
using Checks = std::vector<Check>;
} // namespace engine
BOOST_FUSION_ADAPT_STRUCT(engine::Check, variable, number)
namespace engine {
namespace qi = boost::spirit::qi;
template <typename It> class Parser : public qi::grammar<It, Checks()> {
public:
Parser() : Parser::base_type(start) {
using namespace qi::labels;
s16_vars += "MSG33.D_REF", "MSG34.D_REF";
name = +(qi::alnum | qi::char_("._"));
assign_s16 = qi::raw[s16_vars] >> "==" >> uint16_;
assign_u32 = name >> "==" >> qi::uint_;
assign = assign_s16 | assign_u32;
start = qi::skip(qi::blank)[assign % "&&" > qi::eoi];
BOOST_SPIRIT_DEBUG_NODES((start)(assign)(assign_u32)(assign_s16)(name))
}
private:
qi::int_parser<uint16_t> uint16_;
qi::symbols<char> s16_vars;
qi::rule<It, Check(), qi::blank_type> assign, assign_s16, assign_u32;
qi::rule<It, Checks()> start;
// lexeme:
qi::rule<It, std::string()> name;
};
Checks parse(const std::string& str) {
using It = std::string::const_iterator;
static const Parser<It> parser;
Checks checks;
It first = str.begin(), last = str.end();
if (!qi::parse(first, last, parser, checks))
return {};
return checks;
}
} // namespace engine
int main() {
for (auto sep = ""; auto& c : engine::parse(
"MSG33.ANYTHING == 25 && MSG33.D_REF == 25 && MSG33.D_REF == -25"))
std::cout << std::exchange(sep, " && ") << c;
std::cout << "\n";
}
Printing (like all samples above):
{"MSG33.ANYTHING" == 0x19} && {"MSG33.D_REF" == 0x19} && {"MSG33.D_REF" == 0xffe7}
BONUS: Variant Style
Because you might be interested, here's a version using the variant AST:
Live On Coliru
#include <boost/core/demangle.hpp>
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <utility>
#include <vector>
namespace engine {
template <typename T>
struct VarCheck {
std::string variable;
T number;
friend std::ostream& operator<<(std::ostream& os, VarCheck const& c) {
auto f = os.flags();
os << " {" << std::quoted(c.variable) << " == " << std::hex
<< std::showbase << c.number << ":"
<< boost::core::demangle(typeid(T).name()) << "}";
os.setf(f);
return os;
}
};
using S16Var = VarCheck<int16_t>;
using U32Var = VarCheck<uint32_t>;
using Check = boost::variant<U32Var, S16Var>;
using Checks = std::vector<Check>;
} // namespace engine
// BOOST_FUSION_ADAPT_STRUCT(engine::S16Var, variable, number)
// BOOST_FUSION_ADAPT_STRUCT(engine::U32Var, variable, number)
// Or, generically: https://www.boost.org/doc/libs/1_80_0/libs/fusion/doc/html/fusion/adapted/adapt_tpl_struct.html
BOOST_FUSION_ADAPT_TPL_STRUCT((T), (engine::VarCheck)(T), variable, number)
namespace engine {
namespace qi = boost::spirit::qi;
template <typename It> class Parser : public qi::grammar<It, Checks()> {
public:
Parser() : Parser::base_type(start) {
using namespace qi::labels;
s16_vars += "MSG33.D_REF", "MSG34.D_REF";
name = +(qi::alnum | qi::char_("._"));
assign_s16 = qi::raw[s16_vars] >> "==" >> uint16_;
assign_u32 = name >> "==" >> qi::uint_;
assign = assign_s16 | assign_u32;
start = qi::skip(qi::blank)[assign % "&&" > qi::eoi];
BOOST_SPIRIT_DEBUG_NODES((start)(assign)(assign_u32)(assign_s16)(name))
}
private:
qi::int_parser<uint16_t> uint16_;
qi::symbols<char> s16_vars;
qi::rule<It, Check(), qi::blank_type> assign;
qi::rule<It, U32Var(), qi::blank_type> assign_u32;
qi::rule<It, S16Var(), qi::blank_type> assign_s16;
qi::rule<It, Checks()> start;
// lexeme:
qi::rule<It, std::string()> name;
};
Checks parse(const std::string& str) {
using It = std::string::const_iterator;
static const Parser<It> parser;
Checks checks;
It first = str.begin(), last = str.end();
if (!qi::parse(first, last, parser, checks))
return {};
return checks;
}
} // namespace engine
int main() {
for (auto sep = "";
auto& c : engine::parse("MSG33.ANYTHING == 25 && MSG33.D_REF == 25 && "
"MSG33.D_REF == -25")) {
std::cout << std::exchange(sep, "\n && ") << c;
}
std::cout << "\n";
}
I've extended the output with the static type information for visibility:
{"MSG33.ANYTHING" == 0x19:unsigned int}
&& {"MSG33.D_REF" == 0x19:short}
&& {"MSG33.D_REF" == 0xffe7:short}
It's easy to generalize to more variable types here:
using S16Var = VarCheck<int16_t>;
using U32Var = VarCheck<uint32_t>;
using DblVar = VarCheck<double>;
using StrVar = VarCheck<std::string>;
using Check = boost::variant<U32Var, S16Var, DblVar, StrVar>;
See it Live On Coliru, with the output
{"MSG33.ANYTHING" == 0x19:unsigned int}
&& {"MSG33.D_REF" == 0x19:short}
&& {"SEHE.DBL_1" == 4.2e+10:double}
&& {"SEHE.DBL_2" == -inf:double}
&& {"SEHE.STR_42" == Life The Universe and everything:std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >}
&& {"SEHE.STR_300" == Three hundred:std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >}
&& {"MSG33.D_REF" == 0xffe7:short}
¹ E.g. Boost Spirit: "Semantic actions are evil"?
² Boost spirit skipper issues

boost spirit parsing with no skipper

Think about a preprocessor which reads raw text (no significant whitespace or tokens).
There are 3 rules.
resolve_para_entry should resolve a single argument inside a call. The top-level text is returned as a string.
resolve_para should resolve the whole parameter list and put all the top-level parameters into a string list.
resolve is the entry point.
On the way I track the iterator and get the text portion
Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seems that the "!lit(',')" won't make the parser step outside ..
Rules:
resolve_para_entry = +(
(iter_pos >> lit('(') >> (resolve_para_entry | eps) >> lit(')') >> iter_pos) [_val= phoenix::bind(&appendString, _val, _1,_3)]
| (!lit(',') >> !lit(')') >> !lit('(') >> (wide::char_ | wide::space)) [_val = phoenix::bind(&appendChar, _val, _1)]
);
resolve_para = (lit('(') >> lit(')'))[_val = std::vector<std::wstring>()] // empty para -> old style
| (lit('(') >> resolve_para_entry >> *(lit(',') >> resolve_para_entry) > lit(')'))[_val = phoenix::bind(&appendStringList, _val, _1, _2)]
| eps;
resolve = (iter_pos >> name_valid >> iter_pos >> resolve_para >> iter_pos);
In the end this doesn't seem very elegant. Maybe there is a better way to parse such input without a skipper?
Indeed this should be a lot simpler.
First off, I fail to see why the absence of a skipper is at all relevant.
Second, exposing the raw input is best done using qi::raw[] instead of dancing with iter_pos and clumsy semantic actions¹.
Among the other observations I see:
negating a charset is done with ~, so e.g. ~char_(",()")
(p|eps) would be better spelled -p
(lit('(') >> lit(')')) could be just "()" (after all, there's no skipper, right)
p >> *(',' >> p) is equivalent to p % ','
With the above, resolve_para simplifies to this:
resolve_para = '(' >> -(resolve_para_entry % ',') >> ')';
resolve_para_entry seems weird, to me. It appears that any nested parentheses are simply swallowed. Why not actually parse a recursive grammar so you detect syntax errors?
Here's my take on it:
Define An AST
I prefer to make this the first step because it helps me think about the parser productions:
namespace Ast {
using ArgList = std::list<std::string>;
struct Resolve {
std::string name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
Creating The Grammar Rules
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, std::string()> arg, identifier;
And their definitions:
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
Notes:
No more semantic actions
No more eps
No more iter_pos
I've opted to make arglist not-optional. If you really wanted that, change it back:
resolve = identifier >> -arglist;
But in our sample it will generate a lot of noisy output.
Of course your entry point (start) will be different. I just did the simplest thing that could possibly work, using another handy parser directive from the Spirit Repository (like iter_pos that you were already using): seek[]
The hold is there for this reason: boost::spirit::qi duplicate parsing on the output - You might not need it in your actual parser.
Live On Coliru
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <list>
#include <string>
#include <vector>
namespace Ast {
using ArgList = std::list<std::string>;
struct Resolve {
std::string name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
Parser() : Parser::base_type(start) {
using namespace qi;
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
}
private:
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, std::string()> arg, identifier;
};
#include <iostream>
int main() {
using It = std::string::const_iterator;
std::string const samples = R"--(
Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
It f = samples.begin(), l = samples.end();
Ast::Resolves data;
if (parse(f, l, Parser<It>{}, data)) {
std::cout << "Parsed " << data.size() << " resolves\n";
} else {
std::cout << "Parsing failed\n";
}
for (auto& resolve: data) {
std::cout << " - " << resolve.name << "\n (\n";
for (auto& arg : resolve.arglist) {
std::cout << " " << arg << "\n";
}
std::cout << " )\n";
}
}
Prints
Parsed 6 resolves
- sometext
(
para
)
- sometext
(
para1
para2
)
- sometext
(
call(a)
)
- call
(
a
)
- call
(
a
b
)
- lit
(
'
'
)
More Ideas
That last output shows you a problem with your current grammar: lit(',') should obviously not be seen as a call with two parameters.
I recently did an answer on extracting (nested) function calls with parameters which does things more neatly:
Boost spirit parse rule is not applied
or this one boost spirit reporting semantic error
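For comparison, the top-level argument split that the question asks for can also be sketched by hand, without Spirit, by tracking parenthesis depth. This is a different technique from the grammar above, not a translation of it, and the function name top_level_args is invented for the example. It assumes the input looks like name(arg, arg, ...) and does no trimming or validation of the name.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Splits the text between the outermost '(' and its matching ')' into
// top-level arguments. Nested parentheses (and commas inside them) are kept
// verbatim. Throws std::invalid_argument if the parentheses are unbalanced.
std::vector<std::string> top_level_args(const std::string& call) {
    std::vector<std::string> args;
    const std::size_t open = call.find('(');
    if (open == std::string::npos)
        return args; // no argument list at all

    int depth = 0;
    std::string current;
    for (std::size_t i = open; i < call.size(); ++i) {
        const char c = call[i];
        if (c == '(') {
            if (depth++ > 0) current += c; // nested paren: part of the argument
        } else if (c == ')') {
            assert(depth > 0); // we started at '(' and return once depth hits 0
            if (--depth == 0) {
                if (!current.empty() || !args.empty()) args.push_back(current);
                return args;
            }
            current += c;
        } else if (c == ',' && depth == 1) {
            args.push_back(current); // a comma at depth 1 separates arguments
            current.clear();
        } else {
            current += c;
        }
    }
    throw std::invalid_argument("unbalanced parentheses");
}
```

With this sketch, sometext(call(a,b)) yields the single argument call(a,b), and sometext() yields an empty list.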
BONUS
Bonus version that uses string_view and also shows exact line/column information of all extracted words.
Note that it still doesn't require any phoenix or semantic actions. Instead it simply defines the necessary trait to assign to boost::string_view from an iterator range.
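The line/column bookkeeping itself does not depend on Spirit. A minimal sketch of the idea, using a plain std::string and a byte offset instead of string_view, could look like this (the function name line_and_column is an assumption for this example):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Returns the 1-based (line, column) of the character at `offset` in `text`.
// The line is one more than the number of newlines before the offset; the
// column is the distance from the start of that line, plus one.
std::pair<std::size_t, std::size_t> line_and_column(const std::string& text,
                                                    std::size_t offset) {
    assert(offset <= text.size());
    const auto upto = text.begin() + static_cast<std::ptrdiff_t>(offset);
    const std::size_t line =
        1 + static_cast<std::size_t>(std::count(text.begin(), upto, '\n'));
    // Last newline strictly before `offset`, if any.
    const std::size_t last_newline =
        offset == 0 ? std::string::npos : text.rfind('\n', offset - 1);
    const std::size_t line_start =
        last_newline == std::string::npos ? 0 : last_newline + 1;
    return {line, offset - line_start + 1};
}
```

Columns here count bytes, not characters, so multi-byte characters earlier on a line shift the reported column.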
Live On Coliru
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
#include <algorithm>
#include <list>
#include <string>
#include <vector>
namespace Ast {
using Source = boost::string_view;
using ArgList = std::list<Source>;
struct Resolve {
Source name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static void call(It f, It l, boost::string_view& attr) {
attr = boost::string_view { f.base(), size_t(std::distance(f.base(),l.base())) };
}
};
} } }
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
Parser() : Parser::base_type(start) {
using namespace qi;
identifier = raw [ char_("a-zA-Z_") >> *char_("a-zA-Z0-9_") ];
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
}
private:
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, Ast::Source()> arg, identifier;
};
#include <iostream>
struct Annotator {
using Ref = boost::string_view;
struct Manip {
Ref fragment, context;
friend std::ostream& operator<<(std::ostream& os, Manip const& m) {
return os << "[" << m.fragment << " at line:" << m.line() << " col:" << m.column() << "]";
}
size_t line() const {
return 1 + std::count(context.begin(), fragment.begin(), '\n');
}
size_t column() const {
return 1 + (fragment.begin() - start_of_line().begin());
}
Ref start_of_line() const {
return context.substr(context.substr(0, fragment.begin()-context.begin()).find_last_of('\n') + 1);
}
};
Ref context;
Manip operator()(Ref what) const { return {what, context}; }
};
int main() {
using It = std::string::const_iterator;
std::string const samples = R"--(Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
It f = samples.begin(), l = samples.end();
Ast::Resolves data;
if (parse(f, l, Parser<It>{}, data)) {
std::cout << "Parsed " << data.size() << " resolves\n";
} else {
std::cout << "Parsing failed\n";
}
Annotator annotate{samples};
for (auto& resolve: data) {
std::cout << " - " << annotate(resolve.name) << "\n (\n";
for (auto& arg : resolve.arglist) {
std::cout << " " << annotate(arg) << "\n";
}
std::cout << " )\n";
}
}
Prints
Parsed 6 resolves
- [sometext at line:3 col:1]
(
[para at line:3 col:10]
)
- [sometext at line:4 col:1]
(
[para1 at line:4 col:10]
[para2 at line:4 col:16]
)
- [sometext at line:5 col:1]
(
[call(a) at line:5 col:10]
)
- [call at line:5 col:34]
(
[a at line:5 col:39]
)
- [call at line:6 col:10]
(
[a at line:6 col:15]
[b at line:6 col:17]
)
- [lit at line:6 col:62]
(
[' at line:6 col:66]
[' at line:6 col:68]
)
¹ Boost Spirit: "Semantic actions are evil"?

Boost Spirit template specialization failure

Below is a very compact version of a grammar I'm trying to write using boost::spirit::qi.
Environment: VS2013, x86, Boost 1.64
When #including the header file, the compiler complains about the line
rBlock = "{" >> +(rInvocation) >> "}";
with a very long log (I've only copied the beginning and the end):
more than one partial specialization matches the template argument list
...
...
see reference to function template instantiation
'boost::spirit::qi::rule
&boost::spirit::qi::rule::operator =>(const Expr &)' being compiled
Where is my mistake?
The header file:
//mygrammar.h
#pragma once
#include <boost/spirit/include/qi.hpp>
namespace myNS
{
typedef std::string Identifier;
typedef ::boost::spirit::qi::rule <const char*, Identifier()> myIdentifierRule;
typedef ::boost::variant<char, int> Expression;
typedef ::boost::spirit::qi::rule <const char*, Expression()> myExpressionRule;
struct IdntifierEqArgument
{
Identifier ident;
Expression arg;
};
typedef ::boost::variant < IdntifierEqArgument, Expression > Argument;
typedef ::boost::spirit::qi::rule <const char*, Argument()> myArgumentRule;
typedef ::std::vector<Argument> ArgumentList;
typedef ::boost::spirit::qi::rule <const char*, myNS::ArgumentList()> myArgumentListRule;
struct Invocation
{
Identifier identifier;
::boost::optional<ArgumentList> args;
};
typedef ::boost::spirit::qi::rule <const char*, Invocation()> myInvocationRule;
typedef ::std::vector<Invocation> Block;
typedef ::boost::spirit::qi::rule <const char*, myNS::Block()> myBlockRule;
}
BOOST_FUSION_ADAPT_STRUCT(
myNS::IdntifierEqArgument,
(auto, ident)
(auto, arg)
);
BOOST_FUSION_ADAPT_STRUCT(
myNS::Invocation,
(auto, identifier)
(auto, args)
);
namespace myNS
{
struct myRules
{
myIdentifierRule rIdentifier;
myExpressionRule rExpression;
myArgumentRule rArgument;
myArgumentListRule rArgumentList;
myInvocationRule rInvocation;
myBlockRule rBlock;
myRules()
{
using namespace ::boost::spirit;
using namespace ::boost::spirit::qi;
rIdentifier = as_string[((qi::alpha | '_') >> *(qi::alnum | '_'))];
rExpression = char_ | int_;
rArgument = (rIdentifier >> "=" >> rExpression) | rExpression;
rArgumentList = rArgument >> *("," >> rArgument);
rInvocation = rIdentifier >> "(" >> -rArgumentList >> ")";
rBlock = "{" >> +(rInvocation) >> "}";
}
};
}
I'm not exactly sure where the issue is triggered, but it clearly is a symptom of too many ambiguities in the attribute forwarding rules.
Conceptually this could be triggered by your attribute types having similar/compatible layouts. In language-theory terms, you're looking at a mismatch between C++'s nominative type system and the approximation of structural typing in the attribute propagation system. But enough theory :)
I don't think attr_cast<> will save you here as it probably uses the same mechanics and heuristics under the hood.
It drew my attention that making the ArgumentList optional is ... not very useful (as an empty list already accurately reflects the absence of arguments).
So I tried simplifying the rules:
rArgumentList = -(rArgument % ',');
rInvocation = rIdentifier >> '(' >> rArgumentList >> ')';
And the declared attribute type can be simply ArgumentList instead of boost::optional<ArgumentList>.
This turns out to remove the ambiguity when propagating into the vector<Invocation>, so ... you're saved.
If this feels "accidental" to you, it should! What would I do if this hadn't removed the ambiguity "by chance"? I'd have created a semantic action to propagate the Invocation by simpler mechanics. There's a good chance that fusion::push_back(_val, _1) or similar would have worked.
See also Boost Spirit: "Semantic actions are evil"?
Review And Demo
In the cleaned up review here I present a few fixes/improvements and a test run that dumps the parsed AST.
Separate the AST from the parser (you don't want to use qi in the AST types. You specifically do not want using namespace directives in the face of generic template libraries)
Do not use auto in the adapt macros. That's not a feature. Instead, since you can ostensibly use C++11, use the C++11 (decltype) based macros
BOOST_FUSION_ADAPT_STRUCT(myAST::IdntifierEqArgument, ident,arg);
BOOST_FUSION_ADAPT_STRUCT(myAST::Invocation, identifier,args);
AST is leading (also, prefer c++11 for clarity):
namespace myAST {
using Identifier = std::string;
using Expression = boost::variant<char, int>;
struct IdntifierEqArgument {
Identifier ident;
Expression arg;
};
using Argument = boost::variant<IdntifierEqArgument, Expression>;
using ArgumentList = std::vector<Argument>;
struct Invocation {
Identifier identifier;
ArgumentList args;
};
using Block = std::vector<Invocation>;
}
It's nice to have the definitions separate
Regarding the parser,
I'd prefer the qi::grammar convention. Also,
You didn't declare any of the rules with a skipper. I "guessed" from context that whitespace is insignificant outside of the rules for Expression and Identifier.
Expression ate every char_, so also would eat ')' or even '3'. I noticed this only when testing and after debugging with:
//#define BOOST_SPIRIT_DEBUG
BOOST_SPIRIT_DEBUG_NODES((start)(rBlock)(rInvocation)(rIdentifier)(rArgumentList)(rArgument)(rExpression))
I highly recommend using these facilities
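The char_ problem is a consequence of ordered choice: in a | b, the second alternative is only tried when the first one fails. A hand-written sketch of that behaviour, not using Spirit, is shown below. The Expr struct is a simplified stand-in for boost::variant<char, int>, the helper names are invented, and the integer matcher handles unsigned digits only (unlike qi::int_, which also accepts a sign).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct Expr {             // hypothetical stand-in for boost::variant<char, int>
    bool is_int;
    char c;
    int i;
    std::size_t consumed; // 0 means "no match"
};

// any_char == true behaves like qi::char_ (accepts any character),
// any_char == false behaves like qi::alpha (letters only).
Expr try_char(const std::string& s, bool any_char) {
    if (s.empty()) return {false, 0, 0, 0};
    const unsigned char ch = static_cast<unsigned char>(s[0]);
    if (!any_char && !std::isalpha(ch)) return {false, 0, 0, 0};
    return {false, s[0], 0, 1};
}

// Matches one or more decimal digits at the start of the input.
Expr try_int(const std::string& s) {
    std::size_t n = 0;
    int value = 0;
    while (n < s.size() && std::isdigit(static_cast<unsigned char>(s[n]))) {
        value = value * 10 + (s[n] - '0');
        ++n;
    }
    return {n > 0, 0, value, n};
}

// Ordered choice: the integer alternative is only tried if the character
// alternative did not match.
Expr parse_expression(const std::string& s, bool any_char) {
    const Expr first = try_char(s, any_char);
    const Expr result = first.consumed ? first : try_int(s);
    assert(result.consumed <= s.size());
    return result;
}
```

With any_char set, the input 42 is consumed as the single character '4' and the integer branch is never reached; restricting the first alternative to letters lets 42 come out as an integer and makes ')' fail to match at all.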
All in all the parser comes down to
namespace myNS {
namespace qi = boost::spirit::qi;
template <typename Iterator = char const*>
struct myRules : qi::grammar<Iterator, myAST::Block()> {
myRules() : myRules::base_type(start) {
rIdentifier = qi::raw [(qi::alpha | '_') >> *(qi::alnum | '_')];
rExpression = qi::alpha | qi::int_;
rArgument = (rIdentifier >> '=' >> rExpression) | rExpression;
rArgumentList = -(rArgument % ',');
rInvocation = rIdentifier >> '(' >> rArgumentList >> ')';
rBlock = '{' >> +rInvocation >> '}';
start = qi::skip(qi::space) [ rBlock ];
BOOST_SPIRIT_DEBUG_NODES((start)(rBlock)(rInvocation)(rIdentifier)(rArgumentList)(rArgument)(rExpression))
}
private:
qi::rule<Iterator, myAST::Block()> start;
using Skipper = qi::space_type;
qi::rule<Iterator, myAST::Argument(), Skipper> rArgument;
qi::rule<Iterator, myAST::ArgumentList(), Skipper> rArgumentList;
qi::rule<Iterator, myAST::Invocation(), Skipper> rInvocation;
qi::rule<Iterator, myAST::Block(), Skipper> rBlock;
// implicit lexemes
qi::rule<Iterator, myAST::Identifier()> rIdentifier;
qi::rule<Iterator, myAST::Expression()> rExpression;
};
}
Adding a test driver
int main() {
std::string const input = R"(
{
foo()
bar(a, b, 42)
qux(someThing_awful01 = 9)
}
)";
auto f = input.data(), l = f + input.size();
myAST::Block block;
bool ok = parse(f, l, myNS::myRules<>{}, block);
if (ok) {
std::cout << "Parse success\n";
for (auto& invocation : block) {
std::cout << invocation.identifier << "(";
for (auto& arg : invocation.args) std::cout << arg << ",";
std::cout << ")\n";
}
}
else
std::cout << "Parse failed\n";
if (f!=l)
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
Complete Demo
See it Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace myAST {
using Identifier = std::string;
using Expression = boost::variant<char, int>;
struct IdntifierEqArgument {
Identifier ident;
Expression arg;
};
using Argument = boost::variant<IdntifierEqArgument, Expression>;
using ArgumentList = std::vector<Argument>;
struct Invocation {
Identifier identifier;
ArgumentList args;
};
using Block = std::vector<Invocation>;
// for debug printing
static inline std::ostream& operator<<(std::ostream& os, myAST::IdntifierEqArgument const& named) {
return os << named.ident << "=" << named.arg;
}
}
BOOST_FUSION_ADAPT_STRUCT(myAST::IdntifierEqArgument, ident,arg);
BOOST_FUSION_ADAPT_STRUCT(myAST::Invocation, identifier,args);
namespace myNS {
namespace qi = boost::spirit::qi;
template <typename Iterator = char const*>
struct myRules : qi::grammar<Iterator, myAST::Block()> {
myRules() : myRules::base_type(start) {
rIdentifier = qi::raw [(qi::alpha | '_') >> *(qi::alnum | '_')];
rExpression = qi::alpha | qi::int_;
rArgument = (rIdentifier >> '=' >> rExpression) | rExpression;
rArgumentList = -(rArgument % ',');
rInvocation = rIdentifier >> '(' >> rArgumentList >> ')';
rBlock = '{' >> +rInvocation >> '}';
start = qi::skip(qi::space) [ rBlock ];
BOOST_SPIRIT_DEBUG_NODES((start)(rBlock)(rInvocation)(rIdentifier)(rArgumentList)(rArgument)(rExpression))
}
private:
qi::rule<Iterator, myAST::Block()> start;
using Skipper = qi::space_type;
qi::rule<Iterator, myAST::Argument(), Skipper> rArgument;
qi::rule<Iterator, myAST::ArgumentList(), Skipper> rArgumentList;
qi::rule<Iterator, myAST::Invocation(), Skipper> rInvocation;
qi::rule<Iterator, myAST::Block(), Skipper> rBlock;
// implicit lexemes
qi::rule<Iterator, myAST::Identifier()> rIdentifier;
qi::rule<Iterator, myAST::Expression()> rExpression;
};
}
int main() {
std::string const input = R"(
{
foo()
bar(a, b, 42)
qux(someThing_awful01 = 9)
}
)";
auto f = input.data(), l = f + input.size();
myAST::Block block;
bool ok = parse(f, l, myNS::myRules<>{}, block);
if (ok) {
std::cout << "Parse success\n";
for (auto& invocation : block) {
std::cout << invocation.identifier << "(";
for (auto& arg : invocation.args) std::cout << arg << ",";
std::cout << ")\n";
}
}
else
std::cout << "Parse failed\n";
if (f!=l)
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
Prints output
Parse success
foo()
bar(a,b,42,)
qux(someThing_awful01=9,)
Remaining unparsed input: '
'

Parsing recursive structure on boost::spirit

I want to parse a structure like "text { < > }". The Spirit documentation contains a similar AST example.
For parsing a string like this
<tag1>text1<tag2>text2</tag1></tag2>
this code works:
templ = (tree | text) [_val = _1];
start_tag = '<'
>> !lit('/')
>> lexeme[+(char_- '>') [_val += _1]]
>>'>';
end_tag = "</"
>> string(_r1)
>> '>';
tree = start_tag [at_c<1>(_val) = _1]
>> *templ [push_back(at_c<0>(_val), _1) ]
>> end_tag(at_c<1>(_val) )
;
For parsing a string like this
<tag<tag>some_text>
this code does not work:
templ = (tree | text) [_val = _1];
tree = '<'
>> *templ [push_back(at_c<0>(_val), _1) ]
>> '>'
;
templ parses into a structure with a recursive_wrapper inside:
namespace client {
struct tmp;
typedef boost::variant <
boost::recursive_wrapper<tmp>,
std::string
> tmp_node;
struct tmp {
std::vector<tmp_node> content;
std::string text;
};
}
BOOST_FUSION_ADAPT_STRUCT(
client::tmp,
(std::vector<client::tmp_node>, content)
(std::string,text)
)
Can anyone explain why this happens? Or does anyone know of similar parsers written with boost::spirit?
Just guessing you didn't actually want to parse XML at all, but rather some kind of mixed-content markup language for hierarchical text, I'd do
simple = +~qi::char_("><");
nested = '<' >> *soup >> '>';
soup = nested|simple;
With the AST/rules defined as
typedef boost::make_recursive_variant<
boost::variant<std::string, std::vector<boost::recursive_variant_> >
>::type tag_soup;
qi::rule<It, std::string()> simple;
qi::rule<It, std::vector<tag_soup>()> nested;
qi::rule<It, tag_soup()> soup;
See it Live On Coliru:
//// #define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/variant/recursive_variant.hpp>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
namespace client
{
typedef boost::make_recursive_variant<
boost::variant<std::string, std::vector<boost::recursive_variant_> >
>::type tag_soup;
namespace qi = boost::spirit::qi;
template <typename It>
struct parser : qi::grammar<It, tag_soup()>
{
parser() : parser::base_type(soup)
{
simple = +~qi::char_("><");
nested = '<' >> *soup >> '>';
soup = nested|simple;
BOOST_SPIRIT_DEBUG_NODES((simple)(nested)(soup))
}
private:
qi::rule<It, std::string()> simple;
qi::rule<It, std::vector<tag_soup>()> nested;
qi::rule<It, tag_soup()> soup;
};
}
namespace boost { // leverage ADL on variant<>
static std::ostream& operator<<(std::ostream& os, std::vector<client::tag_soup> const& soup)
{
os << "<";
std::copy(soup.begin(), soup.end(), std::ostream_iterator<client::tag_soup>(os));
return os << ">";
}
}
int main(int argc, char **argv)
{
if (argc < 2) {
std::cerr << "Error: No input file provided.\n";
return 1;
}
std::ifstream in(argv[1]);
std::string const storage(std::istreambuf_iterator<char>(in), {}); // We will read the contents here.
if (!(in || in.eof())) {
std::cerr << "Error: Could not read from input file\n";
return 1;
}
static const client::parser<std::string::const_iterator> p;
client::tag_soup ast; // Our tree
bool ok = parse(storage.begin(), storage.end(), p, ast);
if (ok) std::cout << "Parsing succeeded\nData: " << ast << "\n";
else std::cout << "Parsing failed\n";
return ok? 0 : 1;
}
If you define BOOST_SPIRIT_DEBUG you'll get verbose output of the parsing process.
For the input
<some text with nested <tags <etc...> >more text>
prints
Parsing succeeded
Data: <some text with nested <tags <etc...> >more text>
Note that the output is printed from the variant, not the original text.

Spirit Qi sequence parsing issues

I'm having some issues writing a parser with Spirit::Qi 2.4.
I have a series of key-value pairs to parse in the following format: <key name>=<value>.
The key name can be [a-zA-Z0-9] and is always followed by an = sign, with no white-space between the key name and the = sign. The key name is also always preceded by at least one space.
The value can be almost any C expression (spaces are possible as well), with the exception of expressions containing the = char and code blocks { }.
At the end of the sequence of key-value pairs there's a { sign.
I'm struggling a lot with writing a parser for this expression. Since the key name is always preceded by at least one space, is followed by =, and contains no spaces, I defined it as
KeyName %= [+char_("a-zA-Z0-9_") >> lit("=")] ;
The value can be almost anything, but it cannot contain = or { chars, so I defined it as:
Value %= +(char_ - char_("{=")) ;
I thought about using look-aheads like this to catch the value:
ValueExpression
%= (
Value
>> *space
>> &(KeyName | lit("{"))
)
;
But it doesn't work, for some reason (it seems that ValueExpression greedily goes up to the = sign and "doesn't know" what to do from there). I have limited knowledge of LL parsers, so I'm not really sure what's going on here. Is there any other way I could tackle this kind of sequence?
Here's example series:
EXP1=FunctionCall(A, B, C) TEST="Example String" \
AnotherArg=__FILENAME__ - 'BlahBlah' EXP2= a+ b+* {
Additional info: since this is a part of a much larger grammar I can't really solve this problem any other way than by a Spirit.Qi parser (like splitting by '=' and doing some custom parsing or something similar).
Edit:
I've created minimum working example here: http://ideone.com/kgYD8
(compiled under VS 2012 with boost 1.50, but should be fine on older setups as well).
I'd suggest you have a look at the article Parsing a List of Key-Value Pairs Using Spirit.Qi.
I've greatly simplified your code, while
adding attribute handling
removing phoenix semantic actions
adding debugging of rules
Here it is, without further ado:
#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <map>
#include <string>
namespace qi = boost::spirit::qi;
namespace fusion = boost::fusion;
typedef std::map<std::string, std::string> data_t;
template <typename It, typename Skipper>
struct grammar : qi::grammar<It, data_t(), Skipper>
{
grammar() : grammar::base_type(Sequence)
{
using namespace qi;
KeyName = +char_("a-zA-Z0-9_") >> '=';
Value = qi::no_skip [+(~char_("={") - KeyName)];
Sequence = +(KeyName > Value);
BOOST_SPIRIT_DEBUG_NODE(KeyName);
BOOST_SPIRIT_DEBUG_NODE(Value);
BOOST_SPIRIT_DEBUG_NODE(Sequence);
}
private:
qi::rule<It, data_t(), Skipper> Sequence;
qi::rule<It, std::string()> KeyName; // no skipper, removes need for qi::lexeme
qi::rule<It, std::string(), Skipper> Value;
};
template <typename Iterator>
data_t parse (Iterator begin, Iterator end)
{
grammar<Iterator, qi::space_type> p;
data_t data;
if (qi::phrase_parse(begin, end, p, qi::space, data)) {
std::cout << "parse ok\n";
if (begin!=end) {
std::cout << "remaining: " << std::string(begin,end) << '\n';
}
} else {
std::cout << "failed: " << std::string(begin,end) << '\n';
}
return data;
}
int main ()
{
std::string test(" ARG=Test still in first ARG ARG2=Zombie cat EXP2=FunctionCall(A, B C) {" );
auto data = parse(test.begin(), test.end());
for (auto& e : data)
std::cout << e.first << "=" << e.second << '\n';
}
Output will be:
parse ok
remaining: {
ARG=Test still in first ARG
ARG2=Zombie cat
EXP2=FunctionCall(A, B C)
If you really wanted '{' to be part of the last value, change this line:
Value = qi::no_skip [+(char_ - KeyName)];