I want to parse the header columns of a text file. Column names may be quoted and may use any letter case. Currently I am using the following grammar:
#include <string>
#include <iostream>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
template <typename Iterator, typename Skipper>
struct Grammar : qi::grammar<Iterator, void(), Skipper>
{
static constexpr char colsep = '|';
Grammar() : Grammar::base_type(header)
{
using namespace qi;
using ascii::char_;
#define COL(name) (no_case[name] | ('"' >> no_case[name] >> '"'))
header = (COL("columna") | COL("column_a")) >> colsep >>
(COL("columnb") | COL("column_b")) >> colsep >>
(COL("columnc") | COL("column_c")) >> eol >> eoi;
#undef COL
}
qi::rule<Iterator, void(), Skipper> header;
};
int main()
{
const std::string s{"columnA|column_B|column_c\n"};
auto begin(std::begin(s)), end(std::end(s));
Grammar<std::string::const_iterator, qi::blank_type> p;
bool ok = qi::phrase_parse(begin, end, p, qi::blank);
if (ok && begin == end)
std::cout << "Header ok" << std::endl;
else if (ok && begin != end)
std::cout << "Remaining unparsed: '" << std::string(begin, end) << "'" << std::endl;
else
std::cout << "Parse failed" << std::endl;
return 0;
}
Is this possible without a macro? I would also like to ignore underscores entirely. Can this be achieved with a custom skipper? Ideally one could write:
header = col("columna") >> colsep >> col("columnb") >> colsep >> col("columnc") >> eol >> eoi;
where col would be an appropriate grammar or rule.
@sehe How can I fix this grammar to support "\"Column_A\"" as well?
By this time you should probably have realized that there are two different things going on here.
Separate Yo Concerns
On the one hand you have a grammar (that allows |-separated columns like columna or "Column_A").
On the other hand you have semantic analysis (the phase where you check that the parsed contents match certain criteria).
The thing that is making your life hard is trying to conflate the two. Now, don't get me wrong, there could be (very rare) circumstances where fusing those responsibilities together is absolutely required - but I feel that would always be an optimization. If you need that, Spirit is not your thing, and you're much more likely to be better served by a handwritten parser.
Parsing
So let's get brain-dead simple about the grammar:
static auto headers = (quoted|bare) % '|' > (eol|eoi);
The bare and quoted rules can be pretty much the same as before:
static auto quoted = lexeme['"' >> *('\\' >> char_ | "\"\"" >> attr('"') | ~char_('"')) >> '"'];
static auto bare = *(graph - '|');
As you can see, this will implicitly take care of quoting and escaping as well as whitespace skipping outside lexemes. When applied simply, it will result in a clean list of column names:
std::string const s = "\"columnA\"|column_B| column_c \n";
std::vector<std::string> headers;
bool ok = phrase_parse(begin(s), end(s), Grammar::headers, x3::blank, headers);
std::cout << "Parse " << (ok?"ok":"invalid") << std::endl;
if (ok) for(auto& col : headers) {
std::cout << std::quoted(col) << "\n";
}
Prints (Live On Coliru):
Parse ok
"columnA"
"column_B"
"column_c"
INTERMEZZO: Coding Style
Let's structure our code so that the separation of concerns is reflected. Our parsing code might use X3, but our validation code doesn't need to be in the same translation unit (cpp file).
Have a header defining some basic types:
#include <string>
#include <vector>
using Header = std::string;
using Headers = std::vector<Header>;
Define the operations we want to perform on them:
Headers parse_headers(std::string const& input);
bool header_match(Header const& actual, Header const& expected);
bool headers_match(Headers const& actual, Headers const& expected);
Now, main can be rewritten as just:
auto headers = parse_headers("\"columnA\"|column_B| column_c \n");
for(auto& col : headers) {
std::cout << std::quoted(col) << "\n";
}
bool valid = headers_match(headers, {"columna","columnb","columnc"});
std::cout << "Validation " << (valid?"passed":"failed") << "\n";
And e.g. a parse_headers.cpp could contain:
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
namespace Grammar {
using namespace x3;
static auto quoted = lexeme['"' >> *('\\' >> char_ | "\"\"" >> attr('"') | ~char_('"')) >> '"'];
static auto bare = *(graph - '|');
static auto headers = (quoted|bare) % '|' > (eol|eoi);
}
Headers parse_headers(std::string const& input) {
Headers output;
if (phrase_parse(begin(input), end(input), Grammar::headers, x3::blank, output))
return output;
return {}; // or throw, if you prefer
}
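To make explicit what the three-line grammar accepts, here is a rough hand-written equivalent using only the standard library. This is a sketch, not the answer's code: split_headers is a hypothetical name, and unlike the Spirit grammar it rejects an unterminated quote outright instead of trying another alternative (that is my choice, made for clarity).

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for Grammar::headers: (quoted|bare) % '|' > (eol|eoi),
// with blanks (space/tab) skipped outside quoted names.
std::optional<std::vector<std::string>> split_headers(std::string const& line) {
    std::vector<std::string> out;
    std::size_t i = 0;
    std::size_t const n = line.size();
    auto skip_blanks = [&] {
        while (i < n && (line[i] == ' ' || line[i] == '\t')) ++i;
    };

    for (;;) {
        skip_blanks();
        std::string col;
        if (i < n && line[i] == '"') {                  // quoted name
            ++i;
            bool closed = false;
            while (i < n && !closed) {
                char const c = line[i++];
                if (c == '\\' && i < n) {               // backslash escape: keep next char
                    col += line[i++];
                } else if (c == '"' && i < n && line[i] == '"') {
                    col += '"';                         // doubled quote: one literal quote
                    ++i;
                } else if (c == '"') {
                    closed = true;                      // closing quote
                } else {
                    col += c;
                }
            }
            if (!closed) return std::nullopt;           // assumption: reject unterminated quotes
        } else {                                        // bare name: graphic chars except '|'
            while (i < n && line[i] != '|' &&
                   std::isgraph(static_cast<unsigned char>(line[i])))
                col += line[i++];
        }
        out.push_back(col);
        skip_blanks();
        if (i < n && line[i] == '|') { ++i; continue; } // another column follows
        break;
    }
    // The list must be followed by a line break or the end of the input.
    if (i == n || line[i] == '\n' || line[i] == '\r') return out;
    return std::nullopt;
}
```

Calling split_headers("\"columnA\"|column_B| column_c \n") yields the same three names as the Spirit version. The point is not to replace the grammar, only to show how little it actually has to know: quotes, escapes, the separator and blanks.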
Validating
This is what is known as "semantic checks". You take the vector of strings and check them according to your logic:
#include <boost/range/adaptors.hpp>
#include <boost/algorithm/string.hpp>
bool header_match(Header const& actual, Header const& expected) {
using namespace boost::adaptors;
auto significant = [](unsigned char ch) {
return ch != '_' && std::isgraph(ch);
};
return boost::algorithm::iequals(actual | filtered(significant), expected);
}
bool headers_match(Headers const& actual, Headers const& expected) {
return boost::equal(actual, expected, header_match);
}
That's all. All the power of algorithms and modern C++ at your disposal, no need to fight with constraints due to parsing context.
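If you would rather not pull in Boost Range and Boost String Algo for the check, the same idea can be sketched with the standard library alone. The names normalized, header_match_std and headers_match_std are made up for this illustration, and note one deliberate difference: this version normalizes both sides, whereas the code above filters only the actual header.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Drop underscores and non-graphic characters, lower-case everything else.
std::string normalized(std::string const& name) {
    std::string out;
    for (unsigned char ch : name)
        if (ch != '_' && std::isgraph(ch))
            out += static_cast<char>(std::tolower(ch));
    return out;
}

// Case-insensitive comparison that ignores underscores (assumption: on both sides).
bool header_match_std(std::string const& actual, std::string const& expected) {
    return normalized(actual) == normalized(expected);
}

// Element-wise comparison; also fails when the number of columns differs.
bool headers_match_std(std::vector<std::string> const& actual,
                       std::vector<std::string> const& expected) {
    return std::equal(actual.begin(), actual.end(),
                      expected.begin(), expected.end(), header_match_std);
}
```

With this, headers_match_std({"columnA", "column_B", "column_c"}, {"columna", "columnb", "columnc"}) is true, and a missing or extra column makes it false.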
Full Demo
The above, Live On Wandbox
Both parts got significantly simpler:
your parser doesn't have to deal with quirky comparison logic
your comparison logic doesn't have to deal with grammar concerns (quotes, escapes, delimiters and whitespace)
I need a comma delimited output from a struct with optionals. For example, if I have this struct:
struct MyStruct
{
boost::optional<std::string> one;
boost::optional<int> two;
boost::optional<float> three;
};
I would like output such as { "string", 1, 3.0 }, { "string" } or { 1, 3.0 }, and so on.
Now, I have code like this:
struct MyStruct
{
boost::optional<std::string> one;
boost::optional<int> two;
boost::optional<float> three;
};
BOOST_FUSION_ADAPT_STRUCT
(MyStruct,
one,
two,
three)
template<typename Iterator>
struct MyKarmaGrammar : boost::spirit::karma::grammar<Iterator, MyStruct()>
{
MyKarmaGrammar() : MyKarmaGrammar::base_type(request_)
{
using namespace std::literals::string_literals;
namespace karma = boost::spirit::karma;
using karma::int_;
using karma::double_;
using karma::string;
using karma::lit;
using karma::_r1;
key_ = '"' << string(_r1) << '"';
str_prop_ = key_(_r1) << ':'
<< string
;
int_prop_ = key_(_r1) << ':'
<< int_
;
dbl_prop_ = key_(_r1) << ':'
<< double_
;
//REQUEST
request_ = '{'
<< -str_prop_("one"s) <<
-int_prop_("two"s) <<
-dbl_prop_("three"s)
<< '}'
;
}
private:
//GENERAL RULES
boost::spirit::karma::rule<Iterator, void(std::string)> key_;
boost::spirit::karma::rule<Iterator, double(std::string)> dbl_prop_;
boost::spirit::karma::rule<Iterator, int(std::string)> int_prop_;
boost::spirit::karma::rule<Iterator, std::string(std::string)> str_prop_;
//REQUEST
boost::spirit::karma::rule<Iterator, MyStruct()> request_;
};
int main()
{
using namespace std::literals::string_literals;
MyStruct request = {std::string("one"), 2, 3.1};
std::string generated;
std::back_insert_iterator<std::string> sink(generated);
MyKarmaGrammar<std::back_insert_iterator<std::string>> serializer;
boost::spirit::karma::generate(sink, serializer, request);
std::cout << generated << std::endl;
}
This works but I need a comma delimited output. I tried with a grammar like:
request_ = '{'
<< (str_prop_("one"s) |
int_prop_("two"s) |
dbl_prop_("three"s)) % ','
<< '}'
;
But I receive this compile error:
/usr/include/boost/spirit/home/support/container.hpp:194:52: error: no type named ‘const_iterator’ in ‘struct MyStruct’
typedef typename Container::const_iterator type;
thanks!
Your struct is not a container, therefore list-operator% will not work. The documentation states it expects the attribute to be a container type.
So, just as in the Qi counterpart, where I showed you how to create a conditional delim production:
delim = (&qi::lit('}')) | ',';
You'd need something similar here. However, everything about it is reversed. Instead of "detecting" the end of the input sequence from the presence of a }, we need to track the absence of a preceding field, that is, "not having output a field since the opening brace yet".
That's a bit trickier since the required state cannot come from the same source as the input. We'll use a member of the grammar for simplicity here¹:
private:
bool _is_first_field;
Now, when we generate the opening brace, we want to initialize that to true:
auto _f = px::ref(_is_first_field); // short-hand
request_ %= lit('{') [ _f = true ]
Note: Use of %= instead of = tells Spirit that we want automatic attribute propagation to happen, in spite of the presence of a Semantic Action ([ _f = true ]).
Now, we need to generate the delimiter:
delim = eps(_f) | ", ";
Simple. Usage is also simple, except we'll want to conditionally reset the _f:
auto reset = boost::proto::deep_copy(eps [ _f = false ]);
str_prop_ %= (delim << key_(_r1) << string << reset) | "";
int_prop_ %= (delim << key_(_r1) << int_ << reset) | "";
dbl_prop_ %= (delim << key_(_r1) << double_ << reset) | "";
A very subtle point here is that I changed the declared rule attribute types from T to optional<T>. This allows Karma to do the magic of failing the value generator if it's empty (boost::none), thereby skipping the reset!
ka::rule<Iterator, boost::optional<double>(std::string)> dbl_prop_;
ka::rule<Iterator, boost::optional<int>(std::string)> int_prop_;
ka::rule<Iterator, boost::optional<std::string>(std::string)> str_prop_;
Now, let's put together some testcases:
Test Cases
Live On Coliru
#include <array>
#include <iostream>
#include <boost/optional/optional_io.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <string>
struct MyStruct {
boost::optional<std::string> one;
boost::optional<int> two;
boost::optional<double> three;
};
BOOST_FUSION_ADAPT_STRUCT(MyStruct, one, two, three)
namespace ka = boost::spirit::karma;
namespace px = boost::phoenix;
template<typename Iterator>
struct MyKarmaGrammar : ka::grammar<Iterator, MyStruct()> {
MyKarmaGrammar() : MyKarmaGrammar::base_type(request_) {
using namespace std::literals::string_literals;
using ka::int_;
using ka::double_;
using ka::string;
using ka::lit;
using ka::eps;
using ka::_r1;
auto _f = px::ref(_is_first_field);
auto reset = boost::proto::deep_copy(eps [ _f = false ]);
key_ = '"' << string(_r1) << "\":";
delim = eps(_f) | ", ";
str_prop_ %= (delim << key_(_r1) << string << reset) | "";
int_prop_ %= (delim << key_(_r1) << int_ << reset) | "";
dbl_prop_ %= (delim << key_(_r1) << double_ << reset) | "";
//REQUEST
request_ %= lit('{') [ _f = true ]
<< str_prop_("one"s) <<
int_prop_("two"s) <<
dbl_prop_("three"s)
<< '}';
}
private:
bool _is_first_field = true;
//GENERAL RULES
ka::rule<Iterator, void(std::string)> key_;
ka::rule<Iterator, boost::optional<double>(std::string)> dbl_prop_;
ka::rule<Iterator, boost::optional<int>(std::string)> int_prop_;
ka::rule<Iterator, boost::optional<std::string>(std::string)> str_prop_;
ka::rule<Iterator> delim;
//REQUEST
ka::rule<Iterator, MyStruct()> request_;
};
template <typename T> std::array<boost::optional<T>, 2> option(T const& v) {
return { { v, boost::none } };
}
int main() {
using namespace std::literals::string_literals;
for (auto a : option("one"s))
for (auto b : option(2))
for (auto c : option(3.1))
for (auto request : { MyStruct { a, b, c } }) {
std::string generated;
std::back_insert_iterator<std::string> sink(generated);
MyKarmaGrammar<std::back_insert_iterator<std::string>> serializer;
ka::generate(sink, serializer, request);
std::cout << boost::fusion::as_vector(request) << ":\t" << generated << "\n";
}
}
Printing:
( one 2 3.1): {"one":one, "two":2, "three":3.1}
( one 2 --): {"one":one, "two":2}
( one -- 3.1): {"one":one, "three":3.1}
( one -- --): {"one":one}
(-- 2 3.1): {"two":2, "three":3.1}
(-- 2 --): {"two":2}
(-- -- 3.1): {"three":3.1}
(-- -- --): {}
¹ Note that this limits re-entrant use of the generator, and makes it non-const, etc. karma::locals are the true answer to that, at the cost of a little more complexity.
While fiddling around to understand some strange behaviour of my parsers, I finally found that qi % does not behave exactly as I expected.
First issue: In the verbose documentation a % b is described as a shortcut for a >> *(b >> a). But it actually is not. This holds only if you accept that the b's are discarded.
Say simple_id was any parser. Then actually
simple_id % lit(";")
is the same as
simple_id % some_sophisticated_attribute_emitting_parser_expression
because the right hand side expression will be discarded in any case (i.e. does not contribute to any attributes). In detail: The first expression behaves exactly as (for example):
simple_id % string(";")
So string() is semantically equivalent to lit() under certain constraints, namely when both are right-hand operands of %. Here is my first question: Do you consider this to be a bug? Or is it a feature? I discussed that on the mailing list and got the answer that it's a feature, because this behaviour is documented (if you go into the very details of the doc). If you do so you find they are right.
I want to be a user of this library. I found that things go easily with qi at the higher levels of a grammar. But if you get down to bits and bytes and iterator positions, life gets hard. At some point I decided not to trust it any longer and to dig into the qi code.
It took me just a few minutes to track down my issue inside qi. Once I had the responsible code (list.hpp) on the screen, it was obvious to me that qi % has another issue. These are the exact semantics of qi %:
a % b <- a >> *(b >> a) >> -(b)
In words: It accepts a trailing b (and consumes it) even if it is not followed by an a. This is definitely not documented. Just for fun I looked into the X3 implementation of %. The bug has been migrated and occurs there as well.
Here are my questions: Is my analysis correct? If so, what parser library do you use? Can you recommend one? If I am wrong, where did I fail?
I post these questions because I am not the only one struggling. I hope the infos provided here are helpful.
Below is a self-contained working example demonstrating the issue(s) and the solution for both problems. If you run the example, have a look at the second test in particular. It shows the % consuming the trailing ; (which it should not, I think).
My env: MSVC 2015, Target: Win32 Console, Boost 1.6.1
///////////////////////////////////////////////////////////////////////////
// This is a self-contained demo which compiles with MSVC 2015 to Win32
// console. Therefore it should compile with any modern compiler. :)
//
//
// This demo implements a new qi operator != which does the same as %
// does but without eating up the delimiters (unless they are non-output
// i.e. lit).
//
// The implementation also shows how to fix a bug which makes the current
// qi % operator eat a trailing b. The current implementation accepts
// a >> *(b >> a) >> -(b).
//
//
// I utilize the not_equal_to proto::tag for the alternative % operation
// See the simple rules to compare both operators.
///////////////////////////////////////////////////////////////////////////
//#define BOOST_SPIRIT_DEBUG
#include <iostream>
#include <string>
#include <map>
#include <boost/spirit/repository/include/qi_confix.hpp>
#include <boost/spirit/include/qi.hpp>
// Change the result type to test containers etc.
// You may need to provide an << ostream operator to have output work
using result_type = std::string;
using iterator_type = std::string::const_iterator;
namespace qi = boost::spirit::qi;
namespace mpl = boost::mpl;
namespace proto = boost::proto;
namespace maxence { namespace parser {
///////////////////////////////////////////////////////////////////////////////
// The skipper grammar (just skip this section while reading ;)
///////////////////////////////////////////////////////////////////////////////
template <typename Iterator>
struct skipper : qi::grammar<Iterator>
{
skipper() : skipper::base_type(start)
{
qi::char_type char_;
using boost::spirit::eol;
using boost::spirit::repository::confix;
ascii::space_type space;
start =
space // tab/space/cr/lf
| confix("/*", "*/")[*(char_ - "*/")] // C-style comments
| confix("//", eol)[*(char_ - eol)] // C++-style comments
;
}
qi::rule<Iterator> start;
};
}}
namespace boost { namespace spirit {
///////////////////////////////////////////////////////////////////////////
// Enablers
///////////////////////////////////////////////////////////////////////////
template <>
struct use_operator<qi::domain, proto::tag::not_equal_to> // enables p != d
: mpl::true_ {};
}}
namespace ascii = boost::spirit::ascii;
namespace boost { namespace spirit { namespace qi
{
template <typename Left, typename Right>
struct list_ex : binary_parser<list_ex<Left, Right> >
{
typedef Left left_type;
typedef Right right_type;
template <typename Context, typename Iterator>
struct attribute
{
// Build a std::vector from the LHS's attribute. Note
// that build_std_vector may return unused_type if the
// subject's attribute is an unused_type.
typedef typename
traits::build_std_vector<
typename traits::
attribute_of<Left, Context, Iterator>::type
>::type
type;
};
list_ex(Left const& left_, Right const& right_)
: left(left_), right(right_) {}
/////////////////////////////////////////////////////////////////////////
// code from qi % operator
//
// Note: The original qi code accepts a >> *(b >> a) >> -(b)
// That means a trailing delimiter gets consumed
//
// template <typename F>
// bool parse_container(F f) const
// {
// // in order to succeed we need to match at least one element
// if (f(left)) return false;
// typename F::iterator_type save = f.f.first;
//
// // The while clause below is wrong
// // To correct that (not eat trailing delimiters) it should read:
// // while (!(!right.parse(f.f.first, f.f.last, f.f.context, f.f.skipper, unused) && f(left)))
//
// while (right.parse(f.f.first, f.f.last, f.f.context, f.f.skipper, unused) <--- issue!
// && !f(left))
// {
// save = f.f.first;
// }
//
// f.f.first = save;
// return true;
//
/////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////
// replacement to allow operator not to "eat up" the "delimiter"
//
template <typename F>
bool parse_container(F f) const
{
// in order to succeed we need to match at least one element
if (f(left)) return false;
while (!(f(right) && f(left)));
return true;
}
//
/////////////////////////////////////////////////////////////////////////
template <typename Iterator, typename Context
, typename Skipper, typename Attribute>
bool parse(Iterator& first, Iterator const& last
, Context& context, Skipper const& skipper
, Attribute& attr_) const
{
typedef detail::fail_function<Iterator, Context, Skipper>
fail_function;
// ensure the attribute is actually a container type
traits::make_container(attr_);
Iterator iter = first;
fail_function f(iter, last, context, skipper);
if (!parse_container(detail::make_pass_container(f, attr_)))
return false;
first = f.first;
return true;
}
template <typename Context>
info what(Context& context) const
{
return info("list_ex",
std::make_pair(left.what(context), right.what(context)));
}
Left left;
Right right;
};
///////////////////////////////////////////////////////////////////////////
// Parser generators: make_xxx function (objects)
///////////////////////////////////////////////////////////////////////////
template <typename Elements, typename Modifiers>
struct make_composite<proto::tag::not_equal_to, Elements, Modifiers>
: make_binary_composite<Elements, list_ex>
{};
}}}
namespace boost { namespace spirit { namespace traits {
///////////////////////////////////////////////////////////////////////////
template <typename Left, typename Right>
struct has_semantic_action<qi::list_ex<Left, Right> >
: binary_has_semantic_action<Left, Right> {};
///////////////////////////////////////////////////////////////////////////
template <typename Left, typename Right, typename Attribute
, typename Context, typename Iterator>
struct handles_container<qi::list_ex<Left, Right>, Attribute, Context
, Iterator>
: mpl::true_ {};
}}}
using rule_type = qi::rule <iterator_type, result_type(), maxence::parser::skipper<iterator_type>>;
namespace maxence { namespace parser {
template <typename Iterator>
struct ident : qi::grammar < Iterator, result_type() , skipper<Iterator >>
{
ident();
rule_type not_equal_to, modulus, not_used;
};
// we actually don't need the start rule (see below)
template <typename Iterator>
ident<Iterator>::ident() : ident::base_type(not_equal_to)
{
not_equal_to = (qi::alpha | '_') >> *(qi::alnum | '_') != qi::char_(";");
modulus = (qi::alpha | '_') >> *(qi::alnum | '_') % qi::char_(";");
modulus.name("qi modulus operator");
BOOST_SPIRIT_DEBUG_NODES(
(not_equal_to)
)
}
}}
int main()
{
namespace parser = maxence::parser;
using rule_map_type = std::map<std::string, rule_type&>;
using rule_iterator_type = std::map<std::string, rule_type&>::const_iterator;
using ss_map_type = std::map<std::string, std::string>;
using ss_iterator_type = ss_map_type::const_iterator;
parser::ident<iterator_type> ident;
parser::skipper<iterator_type> skipper;
ss_map_type parser_input =
{
{ "; delimited list without trailing delimiter \n(expected result: success, EOI reached)", "willy; anton" },
{ "; delimited list with trailing delimiter \n(expected result: success, EOI not reached)", "willy; anton;" }
};
rule_map_type rules =
{
{ "E1", ident.not_equal_to },
{ "E2", ident.modulus }
};
for (ss_iterator_type input = parser_input.begin(); input != parser_input.end(); input++) {
for (rule_iterator_type example = rules.begin(); example != rules.end(); example++) {
std::string to_parse = input->second;
::result_type result;
std::string parser_name = (example->second).name();
std::cout << "--------------------------------------------" << std::endl;
std::cout << "Description: " << input->first << std::endl;
std::cout << "Parser [" << parser_name << "] parsing [" << to_parse << "]" << std::endl;
auto b(to_parse.begin()), e(to_parse.end());
bool success = qi::phrase_parse(b, e, (example)->second, skipper, result);
// --- test for parser success
if (success) std::cout << "Parser succeeded. Result: " << result << std::endl;
else std::cout << " Parser failed. " << std::endl;
//--- test for EOI
if (b == e) {
std::cout << "EOI reached.";
} else {
std::cout << "Failure: EOI not reached. Remaining: [";
while (b != e) std::cout << *b++; std::cout << "]";
}
std::cout << std::endl << "--------------------------------------------" << std::endl;
}
}
return 0;
}
Extension: Because of the comments I extend my post:
My != operator is different from the % operator. The != operator would add all the 'delimiters' found to the result vector (a != qi::char_(";,")). Introducing my proposal into % would discard useful functionality.
Maybe there is a justification to introduce an additional operator. I think I should use another operator for that, != hurts my eyes. Anyway, the != operator has nice applications also. For example:
settings_list = name != expression;
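To picture the difference in what ends up in the result, here is a plain C++ model with no Spirit involved. It only mimics the attribute side described above (delimiters discarded versus delimiters kept); split_list and its parameters are hypothetical, and it says nothing about backtracking or skippers.

```cpp
#include <string>
#include <vector>

// Hypothetical model, not Spirit code: `delims` lists the single-character
// delimiters; `keep_delims` selects between the two behaviours.
std::vector<std::string> split_list(std::string const& in, std::string const& delims,
                                    bool keep_delims) {
    // Strip blanks around an element, roughly what a skipper would do.
    auto trimmed = [](std::string const& s) {
        auto const first = s.find_first_not_of(" \t");
        if (first == std::string::npos) return std::string{};
        auto const last = s.find_last_not_of(" \t");
        return s.substr(first, last - first + 1);
    };
    std::vector<std::string> out;
    std::string current;
    for (char const c : in) {
        if (delims.find(c) == std::string::npos) { current += c; continue; }
        out.push_back(trimmed(current));
        if (keep_delims) out.push_back(std::string(1, c));  // the "!=" flavour
        current.clear();
    }
    out.push_back(trimmed(current));
    return out;
}
```

For "willy; anton, frank" with delimiters ";," the discarding flavour gives three names, while the keeping flavour gives five entries, with ";" and "," sitting between the names.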
I was wrong: % does not eat trailing 'delimiters', even though my code example above seemed to demonstrate that it does. Anyway, I stripped the example down to focus on that issue only. Now I know that the missing ; are sitting happily somewhere in the Caribbean having a Caipirinha. Better than being eaten. :)
The example below eats the trailing 'delimiter', because it's not really trailing. The issue was my test string. The Kleene star has a zero match after the last ;. Therefore it gets eaten which is correct behaviour.
I learned much about qi during this 'trip'. More than from the docs. Most important lesson learned: Shape your test cases carefully. I did a quick copy&paste from some example without thought. That introduced the problems.
#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
using iterator_type = std::string::const_iterator;
using result_type = std::string;
template <typename Parser>
void parse(const std::string message, const std::string& input, const Parser& parser)
{
iterator_type iter = input.begin(), end = input.end();
std::vector<result_type> parsed_result;
std::cout << "-------------------------\n";
std::cout << message << "\n";
std::cout << "Parsing: \"" << input << "\"\n";
bool result = qi::phrase_parse(iter, end, parser, qi::space, parsed_result);
if (result)
{
std::cout << "Parser succeeded.\n";
std::cout << "Parsed " << parsed_result.size() << " elements:";
for (const auto& str : parsed_result)
std::cout << "[" << str << "]";
std::cout << std::endl;
}
else
{
std::cout << "Something failed. Unparsed: \"" << std::string(iter, end) << "\"" << std::endl;
}
if (iter == end) {
std::cout << "EOI reached." << std::endl;
}
else {
std::cout << "EOI not reached. Unparsed: \"" << std::string(iter, end) << "\"" << std::endl;
}
std::cout << "-------------------------\n";
}
int main()
{
auto r1 = (*(qi::alpha | '_')) % qi::char_(";");
auto r2 = qi::as_string[*(qi::alpha | '_')] % qi::char_(";");
parse("% eating the trailing 'delimiter'",
"willy; anton; 1234", r1);
parse("% eating the trailing 'delimiter' (limited as_string edition)",
"willy; anton; 1234", r2);
return 0;
}
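The effect is easier to see in a hand-rolled model of the list loop. What follows is a sketch under my own assumptions, not the code from list.hpp: parse_list is a hypothetical function, the delimiter is fixed to ';', whitespace is skipped before every token, and min_len = 0 plays the role of the Kleene star while min_len = 1 plays the role of +.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct ListResult {
    std::vector<std::string> items;  // matched elements
    std::size_t consumed = 0;        // number of input characters consumed
};

// Hypothetical model of `element % ';'` with a whitespace skipper.
ListResult parse_list(std::string const& in, std::size_t min_len) {
    std::size_t pos = 0;
    auto skip_space = [&] {
        while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
    };
    // Element: a run of letters/underscores; fails only if shorter than min_len.
    auto element = [&](std::string& word) {
        std::size_t const start = pos;
        skip_space();
        word.clear();
        while (pos < in.size() &&
               (std::isalpha(static_cast<unsigned char>(in[pos])) || in[pos] == '_'))
            word += in[pos++];
        if (word.size() >= min_len) return true;
        pos = start;                    // no match: give the input back
        return false;
    };

    ListResult result;
    std::string word;
    if (!element(word)) return result;  // at least one element is required
    result.items.push_back(word);
    for (;;) {
        std::size_t const save = pos;   // position after the last complete element
        skip_space();
        if (pos == in.size() || in[pos] != ';') { pos = save; break; }
        ++pos;                          // tentatively consume the delimiter
        if (!element(word)) { pos = save; break; }  // no element: un-consume it
        result.items.push_back(word);
    }
    result.consumed = pos;
    return result;
}
```

For "willy; anton; 1234", the min_len = 1 variant stops after "anton" and leaves "; 1234" unparsed, because the delimiter is given back when no element follows. The min_len = 0 variant "finds" an empty third element after the last ;, so the delimiter legitimately belongs to the list and only "1234" remains.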
Here are the answers to all of the questions.
(1) My analysis was incorrect. The % operator does not eat trailing 'delimiters'. The real problem was that the parsing rule was a Kleene star rule. This rule did not find an identifier after the last 'delimiter', but it still matched, with zero characters. So it is perfectly OK that % consumes the 'delimiter'.
(2) I am not looking for a qi alternative currently.
(3) The current implementation of % does not 'discard' the b of a % b. If indeed you have
simple_id % some_sophisticated_attribute_emitting_parser_expression
then the sophisticated thingy (which may be dynamic, like char_("+-*/")) must match for % to continue. My proposed change to % would break this feature.
To have %= (see below) operate like % you'd have to use (a %= qi::omit[b]). This mimics a % b almost completely. The difference remains that %= intentionally eats the 'trailing delimiter'. There is an example for that in the code below.
Therefore %= can not be taken as a superset of %.
Whether qi should be extended by an operator that provides the functionality I requested is a discussion I do not want to promote. Regarding parser functionality, qi is easily extensible, so you can produce additional parsers to your liking.
That compilers are allergic to qi 2.x with auto is another, more complex topic. I never thought that I, of all people, with my MSVC 2015 environment, would ever be on the non-crashing side of life.
Anyway, I owe you something for insisting so much, so stupidly. The code below provides an implementation of a %= operator (modulus_assign) for qi. It is implemented as list2, living in the mxc::qitoo namespace. I marked the start and end of the header in case somebody finds it valuable and wants to use it.
The main function is a showcase demonstrating the commonalities and differences between the two operators, and showing once more that the Kleene star is a wild creature.
#include <iostream>
#include <string>
#include <vector>
///////////////////////////
// start: header list2.hpp
///////////////////////////
#pragma once
#include <boost/spirit/include/qi.hpp>
namespace boost {
namespace spirit {
///////////////////////////////////////////////////////////////////////////
// Enablers
///////////////////////////////////////////////////////////////////////////
template <>
struct use_operator<qi::domain, proto::tag::modulus_assign> // enables p %= d
: mpl::true_ {};
}
}
namespace mxc {
namespace qitoo {
namespace spirit = boost::spirit;
namespace qi = spirit::qi;
template <typename Left, typename Right>
struct list2 : qi::binary_parser<list2<Left, Right> >
{
typedef Left left_type;
typedef Right right_type;
template <typename Context, typename Iterator>
struct attribute
{
// Build a std::vector from the LHS's and RHS's attribute. Note
// that build_std_vector may return unused_type if the
// subject's attribute is an unused_type.
typedef typename
spirit::traits::build_std_vector<
typename spirit::traits::attribute_of<Left, Context, Iterator>::type>::type type;
};
list2(Left const& left_, Right const& right_) : left(left_), right(right_) {}
template <typename F>
bool parse_container(F f) const
{
typename F::iterator_type save = f.f.first;
// we need a first left match at least
if (f(left)) return false;
// if right does not match rewind iterator and fail
if (f(right)) {
f.f.first = save;
return false;
}
// easy going
while (!f(left) && !f(right))
{
save = f.f.first;
}
f.f.first = save;
return true;
}
template <typename Iterator, typename Context, typename Skipper, typename Attribute>
bool parse(Iterator& first, Iterator const& last, Context& context, Skipper const& skipper, Attribute& attr_) const
{
typedef qi::detail::fail_function<Iterator, Context, Skipper>
fail_function;
// ensure the attribute is actually a container type
spirit::traits::make_container(attr_);
Iterator iter = first;
fail_function f(iter, last, context, skipper);
if (!parse_container(qi::detail::make_pass_container(f, attr_)))
return false;
first = f.first;
return true;
}
template <typename Context>
qi::info what(Context& context) const
{
return qi::info("list2",
std::make_pair(left.what(context), right.what(context)));
}
Left left;
Right right;
};
}
}
namespace boost {
namespace spirit {
namespace qi {
///////////////////////////////////////////////////////////////////////////
// Parser generators: make_xxx function (objects)
///////////////////////////////////////////////////////////////////////////
template <typename Elements, typename Modifiers>
struct make_composite<proto::tag::modulus_assign, Elements, Modifiers>
: make_binary_composite<Elements, mxc::qitoo::list2>
{};
}
namespace traits
{
///////////////////////////////////////////////////////////////////////////
template <typename Left, typename Right>
struct has_semantic_action<mxc::qitoo::list2<Left, Right> >
: binary_has_semantic_action<Left, Right> {};
///////////////////////////////////////////////////////////////////////////
template <typename Left, typename Right, typename Attribute
, typename Context, typename Iterator>
struct handles_container<mxc::qitoo::list2<Left, Right>, Attribute, Context
, Iterator>
: mpl::true_ {};
}
}
}
///////////////////////////
// end: header list2.hpp
///////////////////////////
namespace qi = boost::spirit::qi;
namespace qitoo = mxc::qitoo;
using iterator_type = std::string::const_iterator;
using result_type = std::string;
template <typename Parser>
void parse(const std::string message, const std::string& input, const std::string& rule, const Parser& parser)
{
iterator_type iter = input.begin(), end = input.end();
std::vector<result_type> parsed_result;
std::cout << "-------------------------\n";
std::cout << message << "\n";
std::cout << "Rule: " << rule << std::endl;
std::cout << "Parsing: \"" << input << "\"\n";
bool result = qi::phrase_parse(iter, end, parser, qi::space, parsed_result);
if (result)
{
std::cout << "Parser succeeded.\n";
std::cout << "Parsed " << parsed_result.size() << " elements:";
for (const auto& str : parsed_result)
std::cout << "[" << str << "]";
std::cout << std::endl;
}
else
{
std::cout << "Parser failed" << std::endl;
}
if (iter == end) {
std::cout << "EOI reached." << std::endl;
}
else {
std::cout << "EOI not reached. Unparsed: \"" << std::string(iter, end) << "\"" << std::endl;
}
std::cout << "-------------------------\n";
}
int main()
{
parse("Modulus Operator (%), list with several different 'delimiters' "
, "willy; anton; frank, joel, 1234"
, "(+(qi::alpha | qi::char_('_'))) % qi::char_(\";,\"))"
, (+(qi::alpha | qi::char_('_'))) % qi::char_(";,"));
parse("Modulus-Assign Operator (%=), list with several different 'delimiters' "
, "willy; anton; frank, joel, 1234"
, "(+(qi::alpha | qi::char_('_'))) %= qi::char_(\";,\"))"
, (+(qi::alpha | qi::char_('_'))) %= qi::char_(";,"));
parse("Modulus-Assign Operator (%), list with several different 'delimiters' "
, "willy; anton; frank, joel, 1234"
, "((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) % qi::char_(\";,\"))"
, ((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) % qi::char_(";,"));
parse("Modulus-Assign Operator (%=), list with several different 'delimiters' "
, "willy; anton; frank, joel, 1234"
, "((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) %= qi::char_(\";,\"))"
, ((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) %= qi::char_(";,"));
std::cout << std::endl << "Note that %= exposes the trailing 'delimiter', which it must do to enable this usage:" << std::endl;
parse("Modulus-Assign Operator (%=), list with several different 'delimiters'\n using omit to mimic %"
, "willy; anton; frank, joel, 1234"
, "+(qi::alpha | qi::char_('_')) %= qi::omit[qi::char_(\";,\")]"
, +(qi::alpha | qi::char_('_')) %= qi::omit[qi::char_(";,")]);
parse("Modulus-Assign Operator (%=), list of assignments (x = digits;)\nBe careful with the Kleene star, Eugene!"
, "x = 5; y = 7; z = 10; = 7;"
, "*(qi::alpha | qi::char_('_')) %= (qi::lit(\"=\") >> +qi::digit >> qi::lit(';'))"
, *(qi::alpha | qi::char_('_')) %= (qi::lit("=") >> +qi::digit >> qi::lit(';')));
parse("Modulus-Assign Operator (%=), list of assignments (*bio hazard edition*)\nBe careful with the Kleene star, Eugene!"
, "x = 5; y = 7; z = 10; = 7;"
, "*(qi::alpha | qi::char_('_')) %= (qi::lit(\"=\") >> +qi::digit >> qi::lit(';'))"
, *(qi::alpha | qi::char_('_')) %= (qi::lit("=") >> +qi::digit >> qi::lit(';')));
parse("Modulus-Assign Operator (%=), list of assignments (x = digits;)\nBe careful with the Kleene star, Eugene!"
, "x = 5; y = 7; z = 10; = 7;"
, "+(qi::alpha | qi::char_('_')) %= (qi::lit(\"=\") >> +qi::digit >> qi::lit(';'))"
, +(qi::alpha | qi::char_('_')) %= (qi::lit("=") >> +qi::digit >> qi::lit(';')));
return 0;
}
In my Boost Spirit grammar I would like to have a rule that does this:
rule<...> noCaseLit = no_case[ lit( "KEYWORD" ) ];
but for a custom keyword so that I can do this:
... >> noCaseLit( "SomeSpecialKeyword" ) >> ... >> noCaseLit( "OtherSpecialKeyword1" )
Is this possible with Boost Spirit rules and if so how?
P.S. I use the case insensitive thing as an example, what I'm after is rule parameterization in general.
Edits:
Through the link provided by 'sehe' in the comments I was able to come close to what I wanted but I'm not quite there yet.
/* Defining the noCaseLit rule */
rule<Iterator, string(string)> noCaseLit = no_case[lit(_r1)];
/* Using the noCaseLit rule */
rule<...> someRule = ... >> noCaseLit(phx::val("SomeSpecialKeyword")) >> ...
I haven't yet figured out a way to automatically convert the literal string to the Phoenix value so that I can use the rule like this:
rule<...> someRule = ... >> noCaseLit("SomeSpecialKeyword") >> ...
The easiest way is to simply create a function that returns your rule/parser. In the example near the end of this page you can find a way to declare the return value of your function. (The same here in a commented example).
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace ascii = boost::spirit::ascii;
namespace qi = boost::spirit::qi;
typedef boost::proto::result_of::deep_copy<
BOOST_TYPEOF(ascii::no_case[qi::lit(std::string())])
>::type nocaselit_return_type;
nocaselit_return_type nocaselit(const std::string& keyword)
{
return boost::proto::deep_copy(ascii::no_case[qi::lit(keyword)]);
}
//C++11 VERSION EASIER TO MODIFY (AND DOESN'T REQUIRE THE TYPEDEF)
//auto nocaselit(const std::string& keyword) -> decltype(boost::proto::deep_copy(ascii::no_case[qi::lit(keyword)]))
//{
// return boost::proto::deep_copy(ascii::no_case[qi::lit(keyword)]);
//}
int main()
{
std::string test1="MyKeYWoRD";
std::string::const_iterator iter=test1.begin();
std::string::const_iterator end=test1.end();
if(qi::parse(iter,end,nocaselit("mYkEywOrd"))&& (iter==end))
std::cout << "Parse 1 Successful" << std::endl;
else
std::cout << "Parse 1 Failed. Remaining: " << std::string(iter,end) << std::endl;
qi::rule<std::string::const_iterator,ascii::space_type> myrule =
*(
( nocaselit("double") >> ':' >> qi::double_ )
| ( nocaselit("keyword") >> '-' >> *(qi::char_ - '.') >> '.')
);
std::string test2=" DOUBLE : 3.5 KEYWORD-whatever.Double :2.5";
iter=test2.begin();
end=test2.end();
if(qi::phrase_parse(iter,end,myrule,ascii::space)&& (iter==end))
std::cout << "Parse 2 Successful" << std::endl;
else
std::cout << "Parse 2 Failed. Remaining: " << std::string(iter,end) << std::endl;
return 0;
}
I have a very simple path construct that I am trying to parse with boost spirit.lex.
We have the following grammar:
token := [a-z]+
path := (token : path) | (token)
So we're just talking about colon separated lower-case ASCII strings here.
I have three examples "xyz", "abc:xyz", "abc:xyz:".
The first two should be deemed valid. The third one, which has a trailing colon, should not be deemed valid. Unfortunately the parser I have recognizes all three as being valid. The grammar should not allow an empty token, but apparently spirit is doing just that. What am I missing to get the third one rejected?
Also, if you read the code below, in comments there is another version of the parser that demands that all paths end with semi-colons. I can get appropriate behavior when I activate those lines, (i.e. rejection of "abc:xyz:;"), but this is not really what I want.
Anyone have any ideas?
Thanks.
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>
#include <vector>
using namespace boost::spirit;
using boost::phoenix::val;
template<typename Lexer>
struct PathTokens : boost::spirit::lex::lexer<Lexer>
{
PathTokens()
{
identifier = "[a-z]+";
separator = ":";
this->self.add
(identifier)
(separator)
(';')
;
}
boost::spirit::lex::token_def<std::string> identifier, separator;
};
template <typename Iterator>
struct PathGrammar
: boost::spirit::qi::grammar<Iterator>
{
template <typename TokenDef>
PathGrammar(TokenDef const& tok)
: PathGrammar::base_type(path)
{
using boost::spirit::_val;
path
=
(token >> tok.separator >> path)[std::cerr << _1 << "\n"]
|
//(token >> ';')[std::cerr << _1 << "\n"]
(token)[std::cerr << _1 << "\n"]
;
token
= (tok.identifier) [_val=_1]
;
}
boost::spirit::qi::rule<Iterator> path;
boost::spirit::qi::rule<Iterator, std::string()> token;
};
int main()
{
typedef std::string::iterator BaseIteratorType;
typedef boost::spirit::lex::lexertl::token<BaseIteratorType, boost::mpl::vector<std::string> > TokenType;
typedef boost::spirit::lex::lexertl::lexer<TokenType> LexerType;
typedef PathTokens<LexerType>::iterator_type TokensIterator;
typedef std::vector<std::string> Tests;
Tests paths;
paths.push_back("abc");
paths.push_back("abc:xyz");
paths.push_back("abc:xyz:");
/*
paths.clear();
paths.push_back("abc;");
paths.push_back("abc:xyz;");
paths.push_back("abc:xyz:;");
*/
for ( Tests::iterator iter = paths.begin(); iter != paths.end(); ++iter )
{
std::string str = *iter;
std::cerr << "*****" << str << "*****\n";
PathTokens<LexerType> tokens;
PathGrammar<TokensIterator> grammar(tokens);
BaseIteratorType first = str.begin();
BaseIteratorType last = str.end();
bool r = boost::spirit::lex::tokenize_and_parse(first, last, tokens, grammar);
std::cerr << r << " " << (first==last) << "\n";
}
}
In addition to what llonesmiz already said, here's a trick using qi::eoi that I sometimes use:
path = (
(token >> tok.separator >> path) [std::cerr << _1 << "\n"]
| token [std::cerr << _1 << "\n"]
) >> eoi;
This makes the grammar require eoi (end-of-input) at the end of a successful match. This leads to the desired result:
http://liveworkspace.org/code/23a7adb11889bbb2825097d7c553f71d
*****abc*****
abc
1 1
*****abc:xyz*****
xyz
abc
1 1
*****abc:xyz:*****
xyz
abc
0 1
The problem lies in the meaning of first and last after your call to tokenize_and_parse. first==last only tells you whether your string has been completely tokenized; it says nothing about whether the grammar consumed all of the resulting tokens. If you isolate the parsing like this, you obtain the expected result:
PathTokens<LexerType> tokens;
PathGrammar<TokensIterator> grammar(tokens);
BaseIteratorType first = str.begin();
BaseIteratorType last = str.end();
LexerType::iterator_type lexfirst = tokens.begin(first,last);
LexerType::iterator_type lexlast = tokens.end();
bool r = parse(lexfirst, lexlast, grammar);
std::cerr << r << " " << (lexfirst==lexlast) << "\n";
This is what I finally ended up with. It uses the suggestions from both @sehe and @llonesmiz. Note the conversion to std::wstring and the use of actions in the grammar definition, neither of which was present in the original post.
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/bind.hpp>
#include <iostream>
#include <string>
#include <vector>
//
// This example uses boost spirit to parse a simple
// colon-delimited grammar.
//
// The grammar we want to recognize is:
// identifier := [a-z]+
// separator := ':'
// path := (identifier separator path) | identifier
//
// From the boost spirit perspective this example shows
// a few things I found hard to come by when building my
// first parser.
// 1. How to flag an incomplete token at the end of input
// as an error. (use of boost::spirit::eoi)
// 2. How to bind an action on an instance of an object
// that is taken as input to the parser.
// 3. Use of std::wstring.
// 4. Use of the lexer iterator.
//
// This using directive will cause issues with boost::bind
// when referencing placeholders such as _1.
// using namespace boost::spirit;
//! A class that tokenizes our input.
template<typename Lexer>
struct Tokens : boost::spirit::lex::lexer<Lexer>
{
Tokens()
{
identifier = L"[a-z]+";
separator = L":";
this->self.add
(identifier)
(separator)
;
}
boost::spirit::lex::token_def<std::wstring, wchar_t> identifier, separator;
};
//! This class provides a callback that echoes strings to stderr.
struct Echo
{
void echo(boost::fusion::vector<std::wstring> const& t) const
{
using namespace boost::fusion;
std::wcerr << at_c<0>(t) << "\n";
}
};
//! The definition of our grammar, as described above.
template <typename Iterator>
struct Grammar : boost::spirit::qi::grammar<Iterator>
{
template <typename TokenDef>
Grammar(TokenDef const& tok, Echo const& e)
: Grammar::base_type(path)
{
using boost::spirit::_val;
path
=
((token >> tok.separator >> path)[boost::bind(&Echo::echo, &e, ::_1)]
|
(token)[boost::bind(&Echo::echo, &e, ::_1)]
) >> boost::spirit::eoi; // Look for end of input.
token
= (tok.identifier) [_val=boost::spirit::qi::_1]
;
}
boost::spirit::qi::rule<Iterator> path;
boost::spirit::qi::rule<Iterator, std::wstring()> token;
};
int main()
{
// A set of typedefs to make things a little clearer. This stuff is
// well described in the boost spirit documentation/examples.
typedef std::wstring::iterator BaseIteratorType;
typedef boost::spirit::lex::lexertl::token<BaseIteratorType, boost::mpl::vector<std::wstring> > TokenType;
typedef boost::spirit::lex::lexertl::lexer<TokenType> LexerType;
typedef Tokens<LexerType>::iterator_type TokensIterator;
typedef LexerType::iterator_type LexerIterator;
// Define some paths to parse.
typedef std::vector<std::wstring> Tests;
Tests paths;
paths.push_back(L"abc");
paths.push_back(L"abc:xyz");
paths.push_back(L"abc:xyz:");
paths.push_back(L":");
// Parse 'em.
for ( Tests::iterator iter = paths.begin(); iter != paths.end(); ++iter )
{
std::wstring str = *iter;
std::wcerr << L"*****" << str << L"*****\n";
Echo e;
Tokens<LexerType> tokens;
Grammar<TokensIterator> grammar(tokens, e);
BaseIteratorType first = str.begin();
BaseIteratorType last = str.end();
// Have the lexer consume our string.
LexerIterator lexFirst = tokens.begin(first, last);
LexerIterator lexLast = tokens.end();
// Have the parser consume the output of the lexer.
bool r = boost::spirit::qi::parse(lexFirst, lexLast, grammar);
// Print the status and whether or not all output of the lexer
// was processed.
std::wcerr << r << L" " << (lexFirst==lexLast) << L"\n";
}
}