Boost.Spirit X3 Alternative Operator - c++

I have the following code:
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
struct printer {
template <typename int_type>
void operator()(std::vector<int_type> &vec) const {
std::cout << "vec(" << sizeof(int_type) << "): { ";
for( auto const &elem : vec ){
std::cout << elem << ", ";
}
std::cout << "}\n";
}
};
template <typename Iterator>
void parse_int_list(Iterator first, Iterator last) {
namespace x3 = boost::spirit::x3;
x3::variant<std::vector<uint32_t>, std::vector<uint64_t>> vecs;
x3::parse( first, last,
(x3::uint32 % '|') | (x3::uint64 % '|'), vecs );
boost::apply_visitor(printer{}, vecs);
}
I expected this to first try parsing input into a 32 bit uint vector, then if that failed into a 64 bit uint vector. This works great if the first integer in the list matches a type that is large enough for anything else in the list. I.e.,
std::string ints32 = "1|2|3";
parse_int_list(begin(ints32), end(ints32));
// prints vec(4): { 1, 2, 3, }
std::string ints64 = "10000000000|20000000000|30000000000";
parse_int_list(begin(ints64), end(ints64));
// prints vec(8): { 10000000000, 20000000000, 30000000000, }
However it does not work when the first number is a 32 bit and a later number is a 64 bit.
std::string ints_mixed = "1|20000000000|30000000000";
parse_int_list(begin(ints_mixed), end(ints_mixed));
// prints vec(4): { 1, }
The return value of x3::parse indicates a parse failure. But according to my reading of the documentation, it should try the second alternative if it can't parse the input with the first.
Any pointers on how I'm reading this incorrectly, and how the alternative parser actually works?
Edit: After seeing the responses, I realized that x3::parse was actually returning a parse success. I was checking that it had parsed the entire stream, first == last, to determine success, as demonstrated in the documentation. However, this hid the fact that, due to the greedy nature of the Kleene star and the lack of anchoring to the end of the stream, it was able to successfully parse just a portion of the input. Thanks all.

The issue here is that "1" is a valid input for the (x3::uint32 % '|') parser, so the first branch of the alternative passes, consuming only the 1.
The cleanest way for you to fix this would be to have a list of alternatives instead of an alternative of lists.
i.e.:
(x3::uint32 | x3::uint64) % '|'
However, that would mean you would have to parse in a different structure.
vector<x3::variant<uint32_t,uint64_t>> vecs;
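To make the "list of alternatives" idea concrete, here is a minimal hand-rolled sketch in plain C++. It is not Spirit code and not how X3 is implemented; the names Number, parse_bounded, parse_number and parse_numbers are made up for illustration, and it assumes that the 32-bit branch rejects an out-of-range number outright. The point it tries to show is that the choice between the narrow and the wide type is made per element, so one large number no longer ends the list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One parsed element plus the byte width of the alternative that matched.
struct Number {
    std::uint64_t value;
    int width;  // 4 for the uint32 branch, 8 for the uint64 branch
};

// Parse decimal digits into a value no larger than `max`.
// Fails on overflow or on missing digits, without moving `pos`.
bool parse_bounded(const std::string& s, std::size_t& pos,
                   std::uint64_t max, std::uint64_t& out) {
    std::size_t p = pos;
    std::uint64_t value = 0;
    while (p < s.size() && s[p] >= '0' && s[p] <= '9') {
        const std::uint64_t digit = static_cast<std::uint64_t>(s[p] - '0');
        if (value > (max - digit) / 10) return false;  // would exceed max
        value = value * 10 + digit;
        ++p;
    }
    if (p == pos) return false;  // no digits at all
    out = value;
    pos = p;
    return true;
}

// Model of `uint32 | uint64` for ONE element: try the narrow type first.
bool parse_number(const std::string& s, std::size_t& pos, Number& out) {
    std::uint64_t v = 0;
    if (parse_bounded(s, pos, UINT32_MAX, v)) { out = Number{v, 4}; return true; }
    if (parse_bounded(s, pos, UINT64_MAX, v)) { out = Number{v, 8}; return true; }
    return false;
}

// Model of `(uint32 | uint64) % '|'`.
std::vector<Number> parse_numbers(const std::string& s, std::size_t& pos) {
    std::vector<Number> out;
    Number n{};
    if (!parse_number(s, pos, n)) return out;
    out.push_back(n);
    while (pos < s.size() && s[pos] == '|') {
        std::size_t next = pos + 1;
        if (!parse_number(s, next, n)) break;  // leave the '|' unconsumed
        out.push_back(n);
        pos = next;
    }
    return out;
}

// Small self-check: a mixed list is consumed completely.
void check_numbers() {
    std::size_t pos = 0;
    const std::vector<Number> v = parse_numbers("1|20000000000|3", pos);
    assert(v.size() == 3 && pos == 15);
    assert(v[0].width == 4 && v[1].width == 8 && v[2].width == 4);
}
```

The trade-off is the one mentioned above: the result is a single list whose elements carry their own type, rather than one homogeneous vector.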
Edit:
Alternatively, if you do not intend to use this parser as a sub-parser, you can force an end-of-input in each branch.
(x3::uint32 % '|' >> x3::eoi) | (x3::uint64 % '|' >> x3::eoi)
This would force the first branch to fail if it does not reach the end of the stream, dropping into the alternative.
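The difference between the two spellings can be modelled without Spirit. The following is a rough sketch in plain C++, not X3's implementation: parse_uint, parse_list, Result and parse_alternative are hypothetical names, and the require_eoi flag stands in for appending x3::eoi to each branch. It assumes a numeric parser that fails on overflow and a list parser that succeeds as soon as one element has matched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Parse one unsigned decimal number that fits in T; fail on overflow.
template <typename T>
bool parse_uint(const std::string& s, std::size_t& pos, T& out) {
    std::size_t p = pos;
    std::uint64_t value = 0;
    const std::uint64_t max = std::numeric_limits<T>::max();
    while (p < s.size() && s[p] >= '0' && s[p] <= '9') {
        const std::uint64_t digit = static_cast<std::uint64_t>(s[p] - '0');
        if (value > (max - digit) / 10) return false;  // would not fit in T
        value = value * 10 + digit;
        ++p;
    }
    if (p == pos) return false;  // no digits at all
    out = static_cast<T>(value);
    pos = p;
    return true;
}

// Model of `element % '|'`: greedy, succeeds after one element, and on a
// failed element rewinds to just before the preceding '|'.
template <typename T>
bool parse_list(const std::string& s, std::size_t& pos, std::vector<T>& out) {
    T value{};
    if (!parse_uint(s, pos, value)) return false;
    out.push_back(value);
    while (pos < s.size() && s[pos] == '|') {
        std::size_t next = pos + 1;
        if (!parse_uint(s, next, value)) break;  // leave the '|' unconsumed
        out.push_back(value);
        pos = next;
    }
    return true;
}

struct Result {
    bool ok;
    int width;             // sizeof the element type of the branch that matched
    std::size_t consumed;  // number of input characters consumed
    std::size_t count;     // number of parsed elements
};

// Model of `(uint32 % '|') | (uint64 % '|')`; with require_eoi each branch
// must also reach the end of the input, like `>> x3::eoi`.
Result parse_alternative(const std::string& s, bool require_eoi) {
    {
        std::size_t pos = 0;
        std::vector<std::uint32_t> v;
        if (parse_list(s, pos, v) && (!require_eoi || pos == s.size()))
            return Result{true, 4, pos, v.size()};
    }
    {
        std::size_t pos = 0;
        std::vector<std::uint64_t> v;
        if (parse_list(s, pos, v) && (!require_eoi || pos == s.size()))
            return Result{true, 8, pos, v.size()};
    }
    return Result{false, 0, 0, 0};
}

// Self-check using the mixed input from the question.
void check_alternative() {
    const std::string mixed = "1|20000000000|30000000000";
    const Result loose = parse_alternative(mixed, false);
    assert(loose.ok && loose.width == 4 && loose.consumed == 1 && loose.count == 1);
    const Result anchored = parse_alternative(mixed, true);
    assert(anchored.ok && anchored.width == 8 && anchored.count == 3);
    assert(anchored.consumed == mixed.size());
}
```

In this model the unanchored version reports success after consuming a single character, which is why the second branch is never tried; anchoring each branch turns that partial match into a failure of the first branch.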

As Frank commented, the issue is that the Kleene list operator is greedy: it accepts as many elements as will match and considers that a "match".
If you want it to reject input if "some elements have not been parsed", make it so:
parse(first, last, x3::uint32 % '|' >> x3::eoi | x3::uint64 % '|' >> x3::eoi, vecs);
Demo
Live On Coliru
#include <boost/spirit/home/x3.hpp>
#include <iostream>
struct printer {
template <typename int_type> void operator()(std::vector<int_type> &vec) const {
std::cout << "vec(" << sizeof(int_type) << "): { ";
for (auto const &elem : vec) {
std::cout << elem << ", ";
}
std::cout << "}\n";
}
};
template <typename Iterator> void parse_int_list(Iterator first, Iterator last) {
namespace x3 = boost::spirit::x3;
boost::variant<std::vector<uint32_t>, std::vector<uint64_t> > vecs;
parse(first, last, x3::uint32 % '|' >> x3::eoi | x3::uint64 % '|' >> x3::eoi, vecs);
apply_visitor(printer{}, vecs);
}
int main() {
for (std::string const input : {
"1|2|3",
"4294967295",
"4294967296",
"4294967295|4294967296",
}) {
parse_int_list(input.begin(), input.end());
}
}
Prints
vec(4): { 1, 2, 3, }
vec(4): { 4294967295, }
vec(8): { 4294967296, }
vec(8): { 4294967295, 4294967296, }

Related

Boost Spirit X3: skip parser that would do nothing

I'm familiarizing myself with Boost Spirit X3. The question I want to ask is how to state that you don't want to use a skip parser at all.
Consider a simple example of parsing comma-separated sequence of integers:
#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/home/x3.hpp>
int main()
{
using namespace boost::spirit::x3;
const std::string input{"2,4,5"};
const auto parser = int_ % ',';
std::vector<int> numbers;
auto start = input.cbegin();
auto r = phrase_parse(start, input.end(), parser, space, numbers);
if(r && start == input.cend())
{
// success
for(const auto &item: numbers)
std::cout << item << std::endl;
return 0;
}
std::cerr << "Input was not parsed successfully" << std::endl;
return 1;
}
This works totally fine. However, I would like to forbid spaces in between (i.e. "2, 4,5" should not parse successfully).
I tried using eps as the skip parser in phrase_parse, but as you can guess, the program ended up in an infinite loop because eps matches the empty string.
The solution I found is to use the no_skip directive (https://www.boost.org/doc/libs/1_75_0/libs/spirit/doc/html/spirit/qi/reference/directive/no_skip.html). So the parser now becomes:
const auto parser = no_skip[int_ % ','];
This works fine, but I don't find it to be an elegant solution (especially providing "space" parser in phrase_parse when I want no whitespace skips). Are there no skip parsers that would simply do nothing? Am I missing something?
Thanks for Your time. Looking forward to any replies.
You can use either no_skip[] or lexeme[]. They're almost identical, except for pre-skip (Boost Spirit lexeme vs no_skip).
Are there no skip parsers that would simply do nothing? Am I missing something?
A wild guess, but you might be missing the parse API that doesn't accept a skipper in the first place
Live On Coliru
#include <iostream>
#include <iomanip>
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
int main() {
std::string const input{ "2,4,5" };
auto f = begin(input), l = end(input);
const auto parser = x3::int_ % ',';
std::vector<int> numbers;
auto r = parse(f, l, parser, numbers);
if (r) {
// success
for (const auto& item : numbers)
std::cout << item << std::endl;
} else {
std::cerr << "Input was not parsed successfully" << std::endl;
return 1;
}
if (f!=l) {
std::cout << "Remaining input " << std::quoted(std::string(f,l)) << "\n";
return 2;
}
}
Prints
2
4
5
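To illustrate what the skipper changes, here is a small hand-written model in plain C++. It is a sketch, not Spirit: pre_skip, parse_int and parse_int_list are hypothetical helpers, the skip_spaces flag stands in for choosing phrase_parse with a space skipper over plain parse, and overflow and signs are ignored for brevity. It models int_ % ',' and reports how far the input was consumed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a skipper: advance past whitespace if enabled.
void pre_skip(const std::string& s, std::size_t& pos, bool skip_spaces) {
    while (skip_spaces && pos < s.size() &&
           std::isspace(static_cast<unsigned char>(s[pos])))
        ++pos;
}

// Parse a non-negative integer, with optional pre-skip (no overflow check).
bool parse_int(const std::string& s, std::size_t& pos, bool skip_spaces, int& out) {
    std::size_t p = pos;
    pre_skip(s, p, skip_spaces);
    const std::size_t start = p;
    int value = 0;
    while (p < s.size() && std::isdigit(static_cast<unsigned char>(s[p]))) {
        value = value * 10 + (s[p] - '0');
        ++p;
    }
    if (p == start) return false;  // no digits at this position
    out = value;
    pos = p;
    return true;
}

// Model of `int_ % ','`: returns the numbers and leaves `pos` at the first
// character that was not consumed.
std::vector<int> parse_int_list(const std::string& s, std::size_t& pos, bool skip_spaces) {
    std::vector<int> out;
    int value = 0;
    if (!parse_int(s, pos, skip_spaces, value)) return out;
    out.push_back(value);
    for (;;) {
        std::size_t p = pos;
        pre_skip(s, p, skip_spaces);
        if (p >= s.size() || s[p] != ',') break;
        ++p;
        if (!parse_int(s, p, skip_spaces, value)) break;  // ',' stays unconsumed
        out.push_back(value);
        pos = p;
    }
    return out;
}

// Self-check: the same input with and without skipping.
void check_skipping() {
    std::size_t pos = 0;
    assert(parse_int_list("2, 4,5", pos, true).size() == 3 && pos == 6);
    pos = 0;
    assert(parse_int_list("2, 4,5", pos, false).size() == 1 && pos == 1);
}
```

Without skipping, the model stops after the first number and leaves ", 4,5" unconsumed, which is the situation the remaining-input check in the demo above is meant to catch.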

Unintelligible compilation error for a simple X3 grammar

I have a quite simple grammar that I am trying to implement using Boost Spirit X3, without success.
It does not compile, and due to all the templates and complex concepts used in the library (I know, it is really just headers), the compilation error message is far too long to be intelligible.
I tried commenting out parts of the code in order to narrow down the culprit, without success: it comes down to several parts, in none of which I can see an error anyway.
Edit2: the first error message is indeed in push_front_impl.hpp, highlighting that:
::REQUESTED_PUSH_FRONT_SPECIALISATION_FOR_SEQUENCE_DOES_NOT_EXIST::*
I suspect the auto keyword or maybe the p2 statement with ulong_long... but without much conviction.
I need your help, Spirit experts!
Below is a minimal code snippet reproducing the compilation error.
Edit: using boost 1.70 and visual studio 2019 v16.1.6
#include <string>
#include <iostream>
#include "boost/spirit/home/x3.hpp"
#include "boost/spirit/include/support_istream_iterator.hpp"
int main(void)
{
       std::string input = \
             "\"nodes\":{ {\"type\":\"bb\", \"id\" : 123456567, \"label\" : \"0x12023049\"}," \
                         "{\"type\":\"bb\", \"id\" : 123123123, \"label\" : \"0x01223234\"}," \
                         "{\"type\":\"ib\", \"id\" : 223092343, \"label\" : \"0x03020343\"}}";
       std::istringstream iss(input);
       namespace x3 = boost::spirit::x3;
       using x3::char_;
       using x3::ulong_long;
       using x3::lit;
 
       auto q = lit('\"'); /* q => quote */
 
       auto p1 = q >> lit("type") >> q >> lit(':') >> q >> (lit("bb") | lit("ib")) >> q;
       auto p2 = q >> lit("id") >> q >> lit(':') >> ulong_long;
       auto p3 = q >> lit("label") >> q >> lit(':') >> q >> (+x3::alpha) >> q;
       auto node =  lit('{') >> p1 >> lit(',') >> p2 >> lit(',') >> p3 >> lit('}');
       auto nodes = q >> lit("nodes") >> q >> lit(':') >> lit('{') >> node % lit(',') >> lit('}');
 
       boost::spirit::istream_iterator f(iss >> std::noskipws), l{};
       bool b = x3::phrase_parse(f, l, nodes, x3::space);
 
       return 0;
}
It is a known MPL limitation (Issue with X3 and MS VS2017, https://github.com/boostorg/spirit/issues/515) plus a bug/difference of implementation for MSVC/ICC compilers (https://github.com/boostorg/mpl/issues/43).
I rewrote the offending part without using MPL (https://github.com/boostorg/spirit/pull/607); it will be released in Boost 1.74. Until then you should be able to work around it with:
#define BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS
#define BOOST_MPL_LIMIT_VECTOR_SIZE 50
Alternatively, you could wrap different parts of your grammar into rules, which will shorten the sequence parser chain.
Note that q >> lit("x") >> q >> lit(':') >> ... is probably not what you really want: with a skipper, it will allow " x ": to be parsed. If you do not want that, simply use lit("\"x\"") >> lit(':') >> ...
There's a chance that there might be a missing indirect include for your specific platform/version (if I had to guess it might be caused by using the istream iterator support header from Qi).
If that's not the issue, my attention is drawn by the where T = boost::mpl::aux::vector_tag<20> (hat tip to Rup): the number 20 looks suspiciously like it might be some kind of limit.
We could find what trips the limit and see if we can raise it, but I'll take the "unscientific" approach in the interest of helping you along with the parser.
Simplifying The Expressions
I see a lot (lot) of lit() nodes in your parser expressions that you don't need. I suspect all the quoted constructs need to be lexemes, and instead of painstakingly repeating the quote symbol all over the place, perhaps package it as follows:
auto q = [](auto p) { return x3::lexeme['"' >> x3::as_parser(p) >> '"']; };
auto type = q("type") >> ':' >> q(bb_ib);
auto id = q("id") >> ':' >> x3::ulong_long;
auto label = q("label") >> ':' >> q(+x3::alnum);
Notes:
I improved the naming so it's more natural to read:
auto node = '{' >> type >> ',' >> id >> ',' >> label >> '}';
I changed alpha to alnum so it would actually match your sample input
Hypothesis: The expressions are structurally simplified to be more hierarchical - the sequences consist of fewer >>-ed terms - the hope is that this removes a potential mpl::vector size limit.
There's one missing piece, bb_ib that I left out because it changes when you want to actually assign parsed values to attributes. Let's do that:
Attributes
struct Node {
enum Type { bb, ib } type;
uint64_t id;
std::string label;
};
As you can see I opted for an enum to represent type. The most natural way to parse that would be using symbols<>
struct bb_ib_sym : x3::symbols<Node::Type> {
bb_ib_sym() { this->add("bb", Node::bb)("ib", Node::ib); }
} bb_ib;
Now you can parse into a vector of Node:
Demo
Live On Coliru
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
struct Node {
enum Type { bb, ib } type;
uint64_t id;
std::string label;
};
namespace { // debug output
inline std::ostream& operator<<(std::ostream& os, Node::Type t) {
switch (t) {
case Node::bb: return os << "bb";
case Node::ib: return os << "ib";
}
return os << "?";
}
inline std::ostream& operator<<(std::ostream& os, Node const& n) {
return os << "Node{" << n.type << ", " << n.id << ", " << std::quoted(n.label) << "}";
}
}
// attribute propagation
BOOST_FUSION_ADAPT_STRUCT(Node, type, id, label)
int main() {
std::string input = R"("nodes": {
{
"type": "bb",
"id": 123456567,
"label": "0x12023049"
},
{
"type": "bb",
"id": 123123123,
"label": "0x01223234"
},
{
"type": "ib",
"id": 223092343,
"label": "0x03020343"
}
})";
namespace x3 = boost::spirit::x3;
struct bb_ib_sym : x3::symbols<Node::Type> {
bb_ib_sym() { this->add("bb", Node::bb)("ib", Node::ib); }
} bb_ib;
auto q = [](auto p) { return x3::lexeme['"' >> x3::as_parser(p) >> '"']; };
auto type = q("type") >> ':' >> q(bb_ib);
auto id = q("id") >> ':' >> x3::ulong_long;
auto label = q("label") >> ':' >> q(+x3::alnum);
auto node
= x3::rule<Node, Node> {"node"}
= '{' >> type >> ',' >> id >> ',' >> label >> '}';
auto nodes = q("nodes") >> ':' >> '{' >> node % ',' >> '}';
std::vector<Node> parsed;
auto f = begin(input);
auto l = end(input);
if (x3::phrase_parse(f, l, nodes, x3::space, parsed)) {
for (Node& node : parsed) {
std::cout << node << "\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f!=l) {
std::cout << "Remaining input: " << std::quoted(std::string(f, l)) << "\n";
}
}
Prints
Node{bb, 123456567, "0x12023049"}
Node{bb, 123123123, "0x01223234"}
Node{ib, 223092343, "0x03020343"}

X3: Is this parser, *(char - eol), consuming any and all lines?

I'm basing my app off this example and getting the exact same results. For some reason, the contents of the input string are all parsed into the fusion struct 'comments', and nothing is parsed into the fusion struct 'numbers'. So not sure where I'm going wrong here.
namespace client {
namespace ast {
struct number {
int num1;
int num2;
};
struct comment {
std::string text;
bool dummy;
};
struct input {
std::vector<comment> comments;
std::vector<number> numbers;
};
}
}
BOOST_FUSION_ADAPT_STRUCT(client::ast::comment, text, dummy)
BOOST_FUSION_ADAPT_STRUCT(client::ast::number, num1, num2)
BOOST_FUSION_ADAPT_STRUCT(client::ast::input, comments, numbers)
namespace client {
namespace parser {
namespace x3 = boost::spirit::x3;
using namespace x3;
x3::attr_gen dummy;
typedef std::string::const_iterator It;
using namespace x3;
auto const comment = *(char_ - eol) >> dummy(false);
auto const number = int_ >> int_;
auto lines = [](auto p) { return *(p >> eol); };
auto const input =
lines(comment) >>
lines(number);
}
}
int main()
{
namespace x3 = boost::spirit::x3;
std::string const iss("any char string here\n1 2\n");
auto iter = iss.begin(), eof = iss.end();
client::ast::input types;
bool ok = parse(iter, eof, client::parser::input, types);
if (iter != eof) {
std::cout << "Remaining unparsed: '" << std::string(iter, eof) << "'\n";
}
std::cout << "Parsed: " << (100.0 * std::distance(iss.begin(), iter) / iss.size()) << "%\n";
std::cout << "ok = " << ok << std::endl;
// This range loop prints all contents of input.
for (auto& item : types.comments) { std::cout << "comment: " << boost::fusion::as_deque(item) << "\n"; }
// This loop prints nothing.
for (auto& item : types.numbers) { std::cout << "number: " << boost::fusion::as_deque(item) << "\n"; }
}
My larger application does the same with a large input file and several more ASTs, yet it seems all my examples are consumed by the comment parser.
Here's the complete running example.
http://coliru.stacked-crooked.com/a/f983b26d673305a0
Thoughts?
You took the grammar idea from my answer here: X3, how to populate a more complex AST?
There it worked because the line formats are not ambiguous. In fact the "variant" approach you had required special attention, and I noted that in this bullet:
departments need to be ordered before teams, or you get "team" matched instead of departments
The same kind of ambiguity exists in your grammar. *(char_ - eol) matches "1 2" just fine, so obviously it is added as a comment. You will have to disambiguate the grammar or somehow force the switch to "parse number lines now" mode.
If you wholly don't care what precedes the number lines, just use x3::seek [ lines(number) ].
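The ambiguity can be reproduced with a small line-based model in plain C++. This is only a sketch and does not use Spirit: Input, as_number_line, split_lines and parse_input are hypothetical names, and the exclude_numbers flag is one assumed way of disambiguating (a comment line is any line that does not parse as a number line), not something taken from the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Input {
    std::vector<std::string> comments;
    std::vector<std::pair<int, int>> numbers;
};

// True if the whole line is exactly two whitespace-separated integers.
bool as_number_line(const std::string& line, std::pair<int, int>& out) {
    std::istringstream in(line);
    int a = 0, b = 0;
    if (!(in >> a >> b)) return false;
    std::string rest;
    if (in >> rest) return false;  // trailing junk after the two integers
    out = std::make_pair(a, b);
    return true;
}

// Split the text into lines (each line in the sample ends with '\n').
std::vector<std::string> split_lines(const std::string& text) {
    std::vector<std::string> lines;
    std::istringstream in(text);
    for (std::string line; std::getline(in, line);) lines.push_back(line);
    return lines;
}

// Model of lines(comment) >> lines(number). When exclude_numbers is false the
// comment matcher accepts any line, like *(char_ - eol).
Input parse_input(const std::string& text, bool exclude_numbers) {
    Input result;
    const std::vector<std::string> lines = split_lines(text);
    std::size_t i = 0;
    std::pair<int, int> n;
    // Phase 1: greedy comment lines.
    for (; i < lines.size(); ++i) {
        if (exclude_numbers && as_number_line(lines[i], n)) break;
        result.comments.push_back(lines[i]);
    }
    // Phase 2: number lines, fed only with whatever phase 1 left over.
    for (; i < lines.size() && as_number_line(lines[i], n); ++i)
        result.numbers.push_back(n);
    return result;
}

// Self-check with the input string from the question.
void check_input() {
    const std::string text = "any char string here\n1 2\n";
    const Input greedy = parse_input(text, false);
    assert(greedy.comments.size() == 2 && greedy.numbers.empty());
    const Input fixed = parse_input(text, true);
    assert(fixed.comments.size() == 1 && fixed.numbers.size() == 1);
}
```

With the unrestricted comment matcher, the first phase swallows "1 2" as well, so the number phase starts at the end of the input and collects nothing.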

qi % operator consumes (1) delimiter attributes and (2) accepts a trailing delimiter

While fiddling around to understand some strange behaviour of my parsers, I finally found that qi % does not behave exactly as I expected.
First issue: In the verbose documentation, a % b is described as a shortcut for a >> *(b >> a). But it actually is not: this holds only if you accept the b's being discarded.
Say simple_id was any parser. Then actually
simple_id % lit(";")
is the same as
simple_id % some_sophisticated_attribute_emitting_parser_expression
because the right hand side expression will be discarded in any case (i.e. does not contribute to any attributes). In detail: The first expression behaves exactly as (for example):
simple_id % string(";")
So string() is semantically equivalent to lit() if certain constraints hold, i.e. both live in the domain of being rh-operands of %. Here is my first question: Do you consider this to be a bug? Or is it a feature? I discussed that on the mailing list and got the answer that it's a feature, because this behaviour is documented (if you go into the very details of the doc). If you do so you find they are right.
I want to be a user of this library. I found that things go easily with qi at the higher levels of a grammar. But once you get down to bits and bytes and iterator positions, life gets hard. At some point I decided to stop trusting it and to dig into the qi code.
It took me just a few minutes to track down my issue inside qi. Once I had the responsible code (list.hpp) on the screen, it was obvious to me that qi % has another issue. These are the exact semantics of qi %:
a % b <- a >> *(b >> a) >> -(b)
In words: It accepts a trailing b (and consumes it) even if it is not followed by an a. This is definitely not documented. Just for fun I looked into the X3 implementation of %. The bug has been migrated and occurs there as well.
Here are my questions: Is my analysis correct? If so, what parser library do you use? Can you recommend one? If I am wrong, where did I fail?
I post these questions because I am not the only one struggling. I hope the infos provided here are helpful.
Below is a self-contained working example demonstrating the issue(s) and the solution for both problems. If you run the example, have a look at the second test in particular. It shows % consuming the trailing ; (which it should not, I think).
My env: MSVC 2015, Target: Win32 Console, Boost 1.6.1
///////////////////////////////////////////////////////////////////////////
// This is a self-contained demo which compiles with MSVC 2015 to Win32
// console. Therefore it should compile with any modern compiler. :)
//
//
// This demo implements a new qi operator != which does the same as %
// does but without eating up the delimiters (unless they are non-output
// i.e. lit).
//
// The implementation also shows how to fix a bug which makes the current
// qi % operator eat a trailing b. The current implementation accepts
// a >> *(b >> a) >> -(b).
//
//
// I utilize the not_equal_to proto::tag for the alternative % operation
// See the simple rules to compare both operators.
///////////////////////////////////////////////////////////////////////////
//#define BOOST_SPIRIT_DEBUG
#include <iostream>
#include <map>
#include <string>
#include <boost/spirit/repository/include/qi_confix.hpp>
#include <boost/spirit/include/qi.hpp>
// Change the result type to test containers etc.
// You may need to provide an << ostream operator to have output work
using result_type = std::string;
using iterator_type = std::string::const_iterator;
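namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii; // must be visible before the skipper uses ascii::space_type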
namespace mpl = boost::mpl;
namespace proto = boost::proto;
namespace maxence { namespace parser {
///////////////////////////////////////////////////////////////////////////////
// The skipper grammar (just skip this section while reading ;)
///////////////////////////////////////////////////////////////////////////////
template <typename Iterator>
struct skipper : qi::grammar<Iterator>
{
skipper() : skipper::base_type(start)
{
qi::char_type char_;
using boost::spirit::eol;
using boost::spirit::repository::confix;
ascii::space_type space;
start =
space // tab/space/cr/lf
| confix("/*", "*/")[*(char_ - "*/")] // C-style comments
| confix("//", eol)[*(char_ - eol)] // C++-style comments
;
}
qi::rule<Iterator> start;
};
}}
namespace boost { namespace spirit {
///////////////////////////////////////////////////////////////////////////
// Enablers
///////////////////////////////////////////////////////////////////////////
template <>
struct use_operator<qi::domain, proto::tag::not_equal_to> // enables p != d
: mpl::true_ {};
}}
namespace ascii = boost::spirit::ascii;
namespace boost { namespace spirit { namespace qi
{
template <typename Left, typename Right>
struct list_ex : binary_parser<list_ex<Left, Right> >
{
typedef Left left_type;
typedef Right right_type;
template <typename Context, typename Iterator>
struct attribute
{
// Build a std::vector from the LHS's attribute. Note
// that build_std_vector may return unused_type if the
// subject's attribute is an unused_type.
typedef typename
traits::build_std_vector<
typename traits::
attribute_of<Left, Context, Iterator>::type
>::type
type;
};
list_ex(Left const& left_, Right const& right_)
: left(left_), right(right_) {}
/////////////////////////////////////////////////////////////////////////
// code from qi % operator
//
// Note: The original qi code accepts a >> *(b >> a) >> -(b)
// That means a trailing delimiter gets consumed
//
// template <typename F>
// bool parse_container(F f) const
// {
// // in order to succeed we need to match at least one element
// if (f(left)) return false;
// typename F::iterator_type save = f.f.first;
//
// // The while clause below is wrong
// // To correct that (not eat trailing delimiters) it should read:
// // while (!(!right.parse(f.f.first, f.f.last, f.f.context, f.f.skipper, unused) && f(left)))
//
// while (right.parse(f.f.first, f.f.last, f.f.context, f.f.skipper, unused) <--- issue!
// && !f(left))
// {
// save = f.f.first;
// }
//
// f.f.first = save;
// return true;
//
/////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////
// replacement to allow operator not to "eat up" the "delimiter"
//
template <typename F>
bool parse_container(F f) const
{
// in order to succeed we need to match at least one element
if (f(left)) return false;
while (!(f(right) && f(left)));
return true;
}
//
/////////////////////////////////////////////////////////////////////////
template <typename Iterator, typename Context
, typename Skipper, typename Attribute>
bool parse(Iterator& first, Iterator const& last
, Context& context, Skipper const& skipper
, Attribute& attr_) const
{
typedef detail::fail_function<Iterator, Context, Skipper>
fail_function;
// ensure the attribute is actually a container type
traits::make_container(attr_);
Iterator iter = first;
fail_function f(iter, last, context, skipper);
if (!parse_container(detail::make_pass_container(f, attr_)))
return false;
first = f.first;
return true;
}
template <typename Context>
info what(Context& context) const
{
return info("list_ex",
std::make_pair(left.what(context), right.what(context)));
}
Left left;
Right right;
};
///////////////////////////////////////////////////////////////////////////
// Parser generators: make_xxx function (objects)
///////////////////////////////////////////////////////////////////////////
template <typename Elements, typename Modifiers>
struct make_composite<proto::tag::not_equal_to, Elements, Modifiers>
: make_binary_composite<Elements, list_ex>
{};
}}}
namespace boost { namespace spirit { namespace traits {
///////////////////////////////////////////////////////////////////////////
template <typename Left, typename Right>
struct has_semantic_action<qi::list_ex<Left, Right> >
: binary_has_semantic_action<Left, Right> {};
///////////////////////////////////////////////////////////////////////////
template <typename Left, typename Right, typename Attribute
, typename Context, typename Iterator>
struct handles_container<qi::list_ex<Left, Right>, Attribute, Context
, Iterator>
: mpl::true_ {};
}}}
using rule_type = qi::rule <iterator_type, result_type(), maxence::parser::skipper<iterator_type>>;
namespace maxence { namespace parser {
template <typename Iterator>
struct ident : qi::grammar < Iterator, result_type() , skipper<Iterator >>
{
ident();
rule_type not_equal_to, modulus, not_used;
};
// we actually don't need the start rule (see below)
template <typename Iterator>
ident<Iterator>::ident() : ident::base_type(not_equal_to)
{
not_equal_to = (qi::alpha | '_') >> *(qi::alnum | '_') != qi::char_(";");
modulus = (qi::alpha | '_') >> *(qi::alnum | '_') % qi::char_(";");
modulus.name("qi modulus operator");
BOOST_SPIRIT_DEBUG_NODES(
(not_equal_to)
)
}
}}
int main()
{
namespace parser = maxence::parser;
using rule_map_type = std::map<std::string, rule_type&>;
using rule_iterator_type = std::map<std::string, rule_type&>::const_iterator;
using ss_map_type = std::map<std::string, std::string>;
using ss_iterator_type = ss_map_type::const_iterator;
parser::ident<iterator_type> ident;
parser::skipper<iterator_type> skipper;
ss_map_type parser_input =
{
{ "; delimited list without trailing delimiter \n(expected result: success, EOI reached)", "willy; anton" },
{ "; delimited list with trailing delimiter \n(expected result: success, EOI not reached)", "willy; anton;" }
};
rule_map_type rules =
{
{ "E1", ident.not_equal_to },
{ "E2", ident.modulus }
};
for (ss_iterator_type input = parser_input.begin(); input != parser_input.end(); input++) {
for (rule_iterator_type example = rules.begin(); example != rules.end(); example++) {
std::string to_parse = input->second;
::result_type result;
std::string parser_name = (example->second).name();
std::cout << "--------------------------------------------" << std::endl;
std::cout << "Description: " << input->first << std::endl;
std::cout << "Parser [" << parser_name << "] parsing [" << to_parse << "]" << std::endl;
auto b(to_parse.begin()), e(to_parse.end());
bool success = qi::phrase_parse(b, e, (example)->second, skipper, result);
// --- test for parser success
if (success) std::cout << "Parser succeeded. Result: " << result << std::endl;
else std::cout << " Parser failed. " << std::endl;
//--- test for EOI
if (b == e) {
std::cout << "EOI reached.";
} else {
std::cout << "Failure: EOI not reached. Remaining: [";
while (b != e) std::cout << *b++; std::cout << "]";
}
std::cout << std::endl << "--------------------------------------------" << std::endl;
}
}
return 0;
}
Extension: Because of the comments I extend my post:
My != operator is different from the % operator . The != operator would add all the 'delimiters' found to the result vector. (a != qi::char_(";,")). To introduce my proposal to % would discard useful functionality.
Maybe there is a justification to introduce an additional operator. I think I should use another operator for that, != hurts my eyes. Anyway, the != operator has nice applications also. For example:
settings_list = name != expression;
I believed it was wrong to say that % does not eat trailing 'delimiters', and my code example above seemed to demonstrate that. Anyway, I stripped the example down to focus on that issue only. Now I know that the missing ; are sitting happily somewhere in the Caribbean having a Caipirinha. Better than being eaten. :)
The example below eats the trailing 'delimiter' because it's not really trailing. The issue was my test string. The Kleene star has a zero match after the last ;. Therefore the ; gets eaten, which is correct behaviour.
I learned much about qi during this 'trip'. More than from the docs. Most important lesson learned: Shape your test cases carefully. I did a quick copy&paste from some example without thought. That introduced the problems.
#include <iostream>
#include <map>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
using iterator_type = std::string::const_iterator;
using result_type = std::string;
template <typename Parser>
void parse(const std::string message, const std::string& input, const Parser& parser)
{
iterator_type iter = input.begin(), end = input.end();
std::vector<result_type> parsed_result;
std::cout << "-------------------------\n";
std::cout << message << "\n";
std::cout << "Parsing: \"" << input << "\"\n";
bool result = qi::phrase_parse(iter, end, parser, qi::space, parsed_result);
if (result)
{
std::cout << "Parser succeeded.\n";
std::cout << "Parsed " << parsed_result.size() << " elements:";
for (const auto& str : parsed_result)
std::cout << "[" << str << "]";
std::cout << std::endl;
}
else
{
std::cout << "Something failed. Unparsed: \"" << std::string(iter, end) << "\"" << std::endl;
}
if (iter == end) {
std::cout << "EOI reached." << std::endl;
}
else {
std::cout << "EOI not reached. Unparsed: \"" << std::string(iter, end) << "\"" << std::endl;
}
std::cout << "-------------------------\n";
}
int main()
{
auto r1 = (*(qi::alpha | '_')) % qi::char_(";");
auto r2 = qi::as_string[*(qi::alpha | '_')] % qi::char_(";");
parse("% eating the trailing delimiter 'delimiter'",
"willy; anton; 1234", r1);
parse("% eating the trailing 'delimiter' (limited as_string edition)'",
"willy; anton; 1234", r2);
return 0;
}
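The zero-width match described above can be reproduced with a small model in plain C++. This is a sketch rather than qi itself: parse_word and parse_list are hypothetical helpers, no skipper is modelled (so the input has no spaces), and min_len selects between an element parser that may match nothing (like *alpha) and one that needs at least one letter (like +alpha).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Match a run of letters; min_len = 0 models *alpha, min_len = 1 models +alpha.
bool parse_word(const std::string& s, std::size_t& pos, std::size_t min_len,
                std::string& out) {
    std::size_t p = pos;
    while (p < s.size() && std::isalpha(static_cast<unsigned char>(s[p]))) ++p;
    if (p - pos < min_len) return false;
    out = s.substr(pos, p - pos);
    pos = p;
    return true;
}

// Model of `word % ';'`: a delimiter is consumed only if an element matches
// after it; otherwise the position is rewound to before the delimiter.
std::vector<std::string> parse_list(const std::string& s, std::size_t& pos,
                                    std::size_t min_len) {
    std::vector<std::string> out;
    std::string word;
    if (!parse_word(s, pos, min_len, word)) return out;
    out.push_back(word);
    while (pos < s.size() && s[pos] == ';') {
        std::size_t next = pos + 1;
        if (!parse_word(s, next, min_len, word)) break;  // ';' stays unconsumed
        out.push_back(word);
        pos = next;
    }
    return out;
}

// Self-check: the same input with both element parsers.
void check_trailing() {
    const std::string input = "willy;anton;1234";
    std::size_t pos = 0;
    // *alpha matches the empty string after the last ';', so the ';' is eaten.
    assert(parse_list(input, pos, 0).size() == 3 && pos == 12);
    pos = 0;
    // +alpha fails after the last ';', so the ';' is left in the input.
    assert(parse_list(input, pos, 1).size() == 2 && pos == 11);
}
```

In the first case the third element is an empty string, which is why the delimiter counts as followed by an element and is consumed.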
Here are the answers to all of the questions.
(1) My analysis was incorrect. The % operator does not eat trailing 'delimiters'. The real problem was that the parsing rule was a Kleene star rule. This rule did not find an identifier after the last 'delimiter', but it still matched, with zero characters. So it is perfectly OK that % consumes the 'delimiter'.
(2) I am not looking for a qi alternative currently.
(3) The current implementation of % does not 'discard' the b of a % b. If indeed you have
simple_id % some_sophisticated_attribute_emitting_parser_expression
then the sophisticated thingy (which may be dynamic, like char_("+-*/")) must match for % to continue. My proposed change to % would break this feature.
To have %= (see below) operate like % you'd have to use (a %= qi::omit[b]). This mimics a % b almost completely. The difference remains that %= intentionally eats the 'trailing delimiter'. There is an example for that in the code below.
Therefore %= can not be taken as a superset of %.
Whether qi should be extended by an operator which provides the functionality I requested is a discussion I do not want to start. As far as parser functionality goes, qi is easily extensible, so you can produce additional parsers to your liking.
That compilers are allergic to qi 2.x with auto is another, more complex topic. I never thought that I, of all people, with my MSVC 2015 environment, would ever be on the non-crashing side of life.
Anyway, I owe you something for having insisted so much and so stubbornly. The code below provides an implementation of a %= operator (modulus_assign) for qi. It is implemented as list2, living in the mxc::qitoo namespace. I marked the header start and end in case somebody finds it valuable and wants to use it.
The main function is a showcase demonstrating the commonalities and differences between the two operators, and showing once more that the Kleene star is a wild creature.
#include <iostream>
#include <map>
///////////////////////////
// start: header list2.hpp
///////////////////////////
#pragma once
#include <boost/spirit/include/qi.hpp>
namespace boost {
namespace spirit {
///////////////////////////////////////////////////////////////////////////
// Enablers
///////////////////////////////////////////////////////////////////////////
template <>
struct use_operator<qi::domain, proto::tag::modulus_assign> // enables p %= d
: mpl::true_ {};
}
}
namespace mxc {
namespace qitoo {
namespace spirit = boost::spirit;
namespace qi = spirit::qi;
template <typename Left, typename Right>
struct list2 : qi::binary_parser<list2<Left, Right> >
{
typedef Left left_type;
typedef Right right_type;
template <typename Context, typename Iterator>
struct attribute
{
// Build a std::vector from the LHS's and RHS's attribute. Note
// that build_std_vector may return unused_type if the
// subject's attribute is an unused_type.
typedef typename
spirit::traits::build_std_vector<
typename spirit::traits::attribute_of<Left, Context, Iterator>::type>::type type;
};
list2(Left const& left_, Right const& right_) : left(left_), right(right_) {}
template <typename F>
bool parse_container(F f) const
{
typename F::iterator_type save = f.f.first;
// we need a first left match at least
if (f(left)) return false;
// if right does not match rewind iterator and fail
if (f(right)) {
f.f.first = save;
return false;
}
// easy going
while (!f(left) && !f(right))
{
save = f.f.first;
}
f.f.first = save;
return true;
}
template <typename Iterator, typename Context, typename Skipper, typename Attribute>
bool parse(Iterator& first, Iterator const& last, Context& context, Skipper const& skipper, Attribute& attr_) const
{
typedef qi::detail::fail_function<Iterator, Context, Skipper>
fail_function;
// ensure the attribute is actually a container type
spirit::traits::make_container(attr_);
Iterator iter = first;
fail_function f(iter, last, context, skipper);
if (!parse_container(qi::detail::make_pass_container(f, attr_)))
return false;
first = f.first;
return true;
}
template <typename Context>
qi::info what(Context& context) const
{
return qi::info("list2",
std::make_pair(left.what(context), right.what(context)));
}
Left left;
Right right;
};
}
}
namespace boost {
namespace spirit {
namespace qi {
///////////////////////////////////////////////////////////////////////////
// Parser generators: make_xxx function (objects)
///////////////////////////////////////////////////////////////////////////
template <typename Elements, typename Modifiers>
struct make_composite<proto::tag::modulus_assign, Elements, Modifiers>
: make_binary_composite<Elements, mxc::qitoo::list2>
{};
}
namespace traits
{
///////////////////////////////////////////////////////////////////////////
template <typename Left, typename Right>
struct has_semantic_action<mxc::qitoo::list2<Left, Right> >
: binary_has_semantic_action<Left, Right> {};
///////////////////////////////////////////////////////////////////////////
template <typename Left, typename Right, typename Attribute
, typename Context, typename Iterator>
struct handles_container<mxc::qitoo::list2<Left, Right>, Attribute, Context
, Iterator>
: mpl::true_ {};
}
}
}
///////////////////////////
// end: header list2.hpp
///////////////////////////
namespace qi = boost::spirit::qi;
namespace qitoo = mxc::qitoo;
using iterator_type = std::string::const_iterator;
using result_type = std::string;
template <typename Parser>
void parse(const std::string message, const std::string& input, const std::string& rule, const Parser& parser)
{
iterator_type iter = input.begin(), end = input.end();
std::vector<result_type> parsed_result;
std::cout << "-------------------------\n";
std::cout << message << "\n";
std::cout << "Rule: " << rule << std::endl;
std::cout << "Parsing: \"" << input << "\"\n";
bool result = qi::phrase_parse(iter, end, parser, qi::space, parsed_result);
if (result)
{
std::cout << "Parser succeeded.\n";
std::cout << "Parsed " << parsed_result.size() << " elements:";
for (const auto& str : parsed_result)
std::cout << "[" << str << "]";
std::cout << std::endl;
}
else
{
std::cout << "Parser failed" << std::endl;
}
if (iter == end) {
std::cout << "EOI reached." << std::endl;
}
else {
std::cout << "EOI not reached. Unparsed: \"" << std::string(iter, end) << "\"" << std::endl;
}
std::cout << "-------------------------\n";
}
int main()
{
parse("Modulus Operator (%), list with several different 'delimiters' "
, "willy; anton; frank, joel, 1234"
, "(+(qi::alpha | qi::char_('_'))) % qi::char_(\";,\")"
, (+(qi::alpha | qi::char_('_'))) % qi::char_(";,"));
parse("Modulus-Assign Operator (%=), list with several different 'delimiters' "
, "willy; anton; frank, joel, 1234"
, "(+(qi::alpha | qi::char_('_'))) %= qi::char_(\";,\")"
, (+(qi::alpha | qi::char_('_'))) %= qi::char_(";,"));
parse("Modulus Operator (%), list with several different 'delimiters' "
, "willy; anton; frank, joel, 1234"
, "((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) % qi::char_(\";,\")"
, ((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) % qi::char_(";,"));
parse("Modulus-Assign Operator (%=), list with several different 'delimiters' "
, "willy; anton; frank, joel, 1234"
, "((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) %= qi::char_(\";,\")"
, ((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) %= qi::char_(";,"));
std::cout << std::endl << "Note that %= exposes the trailing 'delimiter', and it has to in order to enable this usage:" << std::endl;
parse("Modulus-Assign Operator (%=), list with several different 'delimiters'\n using omit to mimic %"
, "willy; anton; frank, joel, 1234"
, "+(qi::alpha | qi::char_('_')) %= qi::omit[qi::char_(\";,\")]"
, +(qi::alpha | qi::char_('_')) %= qi::omit[qi::char_(";,")]);
parse("Modulus Operator (%), list of assignments (x = digits;)\nBe careful with the Kleene star, Eugene!"
, "x = 5; y = 7; z = 10; = 7;"
, "*(qi::alpha | qi::char_('_')) % (qi::lit(\"=\") >> +qi::digit >> qi::lit(';'))"
, *(qi::alpha | qi::char_('_')) % (qi::lit("=") >> +qi::digit >> qi::lit(';')));
parse("Modulus-Assign Operator (%=), list of assignments (*bio hazard edition*)\nBe careful with the Kleene star, Eugene!"
, "x = 5; y = 7; z = 10; = 7;"
, "*(qi::alpha | qi::char_('_')) %= (qi::lit(\"=\") >> +qi::digit >> qi::lit(';'))"
, *(qi::alpha | qi::char_('_')) %= (qi::lit("=") >> +qi::digit >> qi::lit(';')));
parse("Modulus-Assign Operator (%=), list of assignments (x = digits;)\nBe careful with the Kleene star, Eugene!"
, "x = 5; y = 7; z = 10; = 7;"
, "+(qi::alpha | qi::char_('_')) %= (qi::lit(\"=\") >> +qi::digit >> qi::lit(';'))"
, +(qi::alpha | qi::char_('_')) %= (qi::lit("=") >> +qi::digit >> qi::lit(';')));
return 0;
}
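The difference between % and the %= implemented above can also be modelled without Spirit. What follows is only a rough sketch under simplifying assumptions: there is no skipper, elements are runs of letters, delimiters are ';' or ',', and the names word, delim, list_like, list2_like and kFail are made up for illustration. Each function returns the number of characters consumed, or kFail if the parse fails.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical failure marker for this sketch (not a Spirit concept).
const std::size_t kFail = std::string::npos;

// Matches one or more letters starting at pos; advances pos on success.
static bool word(const std::string& s, std::size_t& pos) {
    std::size_t p = pos;
    while (p < s.size() && std::isalpha(static_cast<unsigned char>(s[p]))) ++p;
    if (p == pos) return false;
    pos = p;
    return true;
}

// Matches a single ';' or ',' at pos; advances pos on success.
static bool delim(const std::string& s, std::size_t& pos) {
    if (pos < s.size() && (s[pos] == ';' || s[pos] == ',')) { ++pos; return true; }
    return false;
}

// Models a % b, i.e. a (b a)*: a trailing delimiter is left unconsumed.
std::size_t list_like(const std::string& s) {
    std::size_t pos = 0;
    if (!word(s, pos)) return kFail;
    std::size_t save = pos;
    while (delim(s, pos) && word(s, pos)) save = pos;
    return save;  // rewind to the end of the last complete element
}

// Models list2 (a %= b), i.e. (a b)+: every element needs its delimiter.
std::size_t list2_like(const std::string& s) {
    std::size_t pos = 0;
    if (!word(s, pos) || !delim(s, pos)) return kFail;
    std::size_t save = pos;
    while (word(s, pos) && delim(s, pos)) save = pos;
    return save;  // rewind to just after the last matched delimiter
}
```

For the input "a;b,c;" the first function stops before the final ';' while the second consumes it; for "a;b,c" the second one stops after "a;b," because the last element has no delimiter.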

How to deal with last comma, when making comma separated string? [duplicate]

Possible Duplicates:
Don't print space after last number
Printing lists with commas C++
#include <vector>
#include <iostream>
#include <sstream>
#include <boost/foreach.hpp>
using namespace std;
int main()
{
vector<int> VecInts;
VecInts.push_back(1);
VecInts.push_back(2);
VecInts.push_back(3);
VecInts.push_back(4);
VecInts.push_back(5);
stringstream ss;
BOOST_FOREACH(int i, VecInts)
{
ss << i << ",";
}
cout << ss.str();
return 0;
}
This prints out: 1,2,3,4,5,
However I want: 1,2,3,4,5
How can I achieve that in an elegant way?
I see there is some confusion about what I mean by "elegant": e.g. no "if" clause in my loop slowing things down. Imagine 100.000 entries in the vector! If that is all you have to offer, I'd rather remove the last comma after I have gone through the loop.
How about this:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <string>
#include <sstream>
int main()
{
std::vector<int> v;
v.push_back(1);
v.push_back(2);
v.push_back(3);
v.push_back(4);
v.push_back(5);
std::ostringstream ss;
if(!v.empty()) {
std::copy(v.begin(), std::prev(v.end()), std::ostream_iterator<int>(ss, ", "));
ss << v.back();
}
std::cout << ss.str() << "\n";
}
No need to add extra variables, and it doesn't even depend on Boost! Actually, in addition to meeting the "no additional variable in the loop" requirement, one could say that there is not even a loop :)
Detecting the one before last is always tricky, detecting the first is very easy.
bool first = true;
stringstream ss;
BOOST_FOREACH(int i, VecInts)
{
if (!first) { ss << ","; }
first = false;
ss << i;
}
Use Karma from Boost.Spirit, which has a reputation for being fast.
#include <iostream>
#include <vector>
#include <boost/spirit/include/karma.hpp>
int main()
{
std::vector<int> v;
v.push_back(1);
v.push_back(2);
v.push_back(3);
using namespace boost::spirit::karma;
std::cout << format(int_ % ',', v) << std::endl;
}
Try:
if (ss.tellp ())
{
ss << ",";
}
ss << i;
Alternatively, if the "if" is making you worried:
const char *comma = ""; // string literals are const; plain char* is ill-formed since C++11
BOOST_FOREACH(int i, VecInts)
{
ss << comma << i;
comma = ",";
}
Personally, I like a solution that does not cause potential memory allocations (because the string grows larger than needed). An extra if within the loop body should be tractable thanks to branch target buffering, but I would do it like this:
#include <vector>
#include <iostream>
int main () {
using std::cout;
typedef std::vector<int>::iterator iterator;
std::vector<int> ints;
ints.push_back(5);
ints.push_back(1);
ints.push_back(4);
ints.push_back(2);
ints.push_back(3);
if (!ints.empty()) {
iterator it = ints.begin();
const iterator end = ints.end();
cout << *it;
for (++it; it!=end; ++it) {
cout << ", " << *it;
}
cout << std::endl;
}
}
Alternatively, BYORA (bring your own re-usable algorithm):
// Follow the signature of std::getline. Allows us to stay completely
// type agnostic.
template <typename Stream, typename Iter, typename Infix>
inline Stream& infix (Stream &os, Iter from, Iter to, Infix infix_) {
if (from == to) return os;
os << *from;
for (++from; from!=to; ++from) {
os << infix_ << *from;
}
return os;
}
template <typename Stream, typename Iter>
inline Stream& comma_seperated (Stream &os, Iter from, Iter to) {
return infix (os, from, to, ", ");
}
so that
...
comma_seperated(cout, ints.begin(), ints.end()) << std::endl;
infix(cout, ints.begin(), ints.end(), "-") << std::endl;
infix(cout, ints.begin(), ints.end(), "> <") << std::endl;
...
output:
5, 1, 4, 2, 3
5-1-4-2-3
5> <1> <4> <2> <3
The neat thing is it works for every output stream, any container that has forward iterators, with any infix, and with any infix type (interesting e.g. when you use wide strings).
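As a small illustration of the wide-string remark, here is a sketch that feeds a std::list of std::wstring through the same algorithm into a std::wostringstream. The infix template is repeated only so that the snippet compiles on its own; join_wide and the " | " separator are hypothetical choices made for this example.

```cpp
#include <list>
#include <sstream>
#include <string>

// Same algorithm as above, repeated so this snippet is self-contained.
template <typename Stream, typename Iter, typename Infix>
Stream& infix(Stream& os, Iter from, Iter to, Infix infix_) {
    if (from == to) return os;
    os << *from;
    for (++from; from != to; ++from) {
        os << infix_ << *from;
    }
    return os;
}

// Hypothetical helper: joins wide strings held in a non-vector container.
// Infix is deduced as const wchar_t*, which a wide stream can print directly.
std::wstring join_wide(const std::list<std::wstring>& items) {
    std::wostringstream os;
    infix(os, items.begin(), items.end(), L" | ");
    return os.str();
}
```

Nothing in infix itself had to change: only the stream type, the container and the literal prefix differ from the narrow version.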
I like moving the test outside the loop.
It only needs to be done once. So do it first.
Like this:
if (!VecInts.empty())
{
ss << VecInts[0];
// start at the second element; the first one was written above
for (vector<int>::const_iterator loop = VecInts.begin() + 1; loop != VecInts.end(); ++loop)
{
ss << "," << *loop;
}
}
You can either trim the string at the end, or use a plain for loop instead of foreach and skip the separator on the last iteration.
Well, if you format into a stringstream anyway, you can just trim the resulting string by one character:
cout << ss.str().substr(0, ss.str().size() - 1);
If the string is empty, then the second argument is size() - 1 computed on an unsigned zero, which wraps around to std::string::npos (i.e. "everything"), so it does not crash. If the string is non-empty, it always ends with a comma.
But if you write to an output stream directly, I never found anything better than the first flag.
That is, unless you want to use join from the Boost String Algorithms library.
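To make the trimming argument concrete, here is a minimal sketch of the substr approach wrapped in a function. The name drop_last is made up for this example; the comments spell out why the empty-string case is safe.

```cpp
#include <string>

// Hypothetical helper: returns s without its final character
// (the trailing comma in the use case discussed here).
std::string drop_last(const std::string& s) {
    // For an empty string, s.size() - 1 is computed on an unsigned zero and
    // wraps around to std::string::npos. substr(0, npos) on an empty string
    // is well defined and simply yields another empty string.
    return s.substr(0, s.size() - 1);
}
```

Note that this removes whatever the last character is, so it relies on the loop having appended a separator after every element.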
This would work
stringstream ss;
BOOST_FOREACH(int const& i, VecInts)
{
if(&i != &VecInts[0])
ss << ", ";
ss << i;
}
I suspect that by "elegant" you mean "without introducing a new variable". But I think I would just do it the "non-elegant" way if I couldn't find anything else. It's still clear:
stringstream ss;
bool comma = false;
BOOST_FOREACH(int i, VecInts)
{
if(comma)
ss << ", ";
ss << i;
comma = true;
}
Imagine 100.000 entries in the vector! If that is all you have to offer, I'd rather remove the last comma after I have gone through the loop.
You are saying that as if printing ss << i is one machine instruction. Come on, executing that expression will execute lots of if's and loops inside. Your if will be nothing compared to that.
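If you would rather measure than take this on faith, a rough timing sketch might look like the following. It assumes C++11 (for std::chrono and auto); join_trim, join_flag and time_ms are hypothetical names, and the numbers depend entirely on your compiler, flags and machine, so no result is claimed here.

```cpp
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Variant 1: always append a comma, then remove the last one afterwards.
std::string join_trim(const std::vector<int>& v) {
    std::ostringstream ss;
    for (std::size_t i = 0; i < v.size(); ++i) ss << v[i] << ',';
    std::string s = ss.str();
    if (!s.empty()) s.erase(s.size() - 1);
    return s;
}

// Variant 2: a "first" flag, i.e. one extra branch per iteration.
std::string join_flag(const std::vector<int>& v) {
    std::ostringstream ss;
    bool first = true;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (!first) ss << ',';
        first = false;
        ss << v[i];
    }
    return ss.str();
}

// Returns the wall-clock time in milliseconds of a single call to f(v).
template <typename F>
double time_ms(F f, const std::vector<int>& v) {
    auto start = std::chrono::steady_clock::now();
    std::string s = f(v);
    auto stop = std::chrono::steady_clock::now();
    (void)s;  // the result is only needed to force the work to happen
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_ms(join_trim, v) and time_ms(join_flag, v) on a vector with 100.000 entries, repeated a few times, gives a first impression of how much the branch matters next to the formatting work.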
cout << ss.str()<<"\b" <<" ";
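You can append a "\b" backspace followed by a space.
On a terminal this overwrites the extra ",". The comma is still part of the stream's contents, though, so it remains visible if the output is redirected to a file.
For example: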
int main()
{
cout<<"Hi";
cout<<'\b'; //Cursor moves 1 position backwards
cout<<" "; //Overwrites letter 'i' with space
}
So the output would be
H