Understanding the List Operator (%) in Boost.Spirit - c++

Can you help me understand the difference between the a % b parser and its expanded a >> *(b >> a) form in Boost.Spirit? Even though the reference manual states that they are equivalent,
The list operator, a % b, is a binary operator that matches a list of one or more repetitions of a separated by occurrences of b. This is equivalent to a >> *(b >> a).
the following program produces different results depending on which is used:
#include <iostream>
#include <string>
#include <vector>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
struct Record {
int id;
std::vector<int> values;
};
BOOST_FUSION_ADAPT_STRUCT(Record,
(int, id)
(std::vector<int>, values)
)
int main() {
namespace qi = boost::spirit::qi;
const auto str = std::string{"1: 2, 3, 4"};
const auto rule1 = qi::int_ >> ':' >> (qi::int_ % ',') >> qi::eoi;
const auto rule2 = qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_)) >> qi::eoi;
Record record1;
if (qi::phrase_parse(str.begin(), str.end(), rule1, qi::space, record1)) {
std::cout << record1.id << ": ";
for (const auto& value : record1.values) { std::cout << value << ", "; }
std::cout << '\n';
} else {
std::cerr << "syntax error\n";
}
Record record2;
if (qi::phrase_parse(str.begin(), str.end(), rule2, qi::space, record2)) {
std::cout << record2.id << ": ";
for (const auto& value : record2.values) { std::cout << value << ", "; }
std::cout << '\n';
} else {
std::cerr << "syntax error\n";
}
}
Live on Coliru
1: 2, 3, 4,
1: 2,
rule1 and rule2 are different only in that rule1 uses the list operator ((qi::int_ % ',')) and rule2 uses its expanded form ((qi::int_ >> *(',' >> qi::int_))). However, rule1 produced 1: 2, 3, 4, (as expected) and rule2 produced 1: 2,. I cannot understand the result of rule2: 1) why is it different from that of rule1 and 2) why were 3 and 4 not included in record2.values even though phrase_parse returned true somehow?

Update X3 version added
First off, you fallen into a deep trap here:
Qi rules don't work with auto. Use qi::copy or just used qi::rule<>. Your program has undefined behaviour and indeed it crashed for me (valgrind pointed out where the dangling references originated).
So, first off:
const auto rule = qi::copy(qi::int_ >> ':' >> (qi::int_ % ',') >> qi::eoi);
Now, when you delete the redundancy in the program, you get:
Reproducing the problem
Live On Coliru
int main() {
test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
}
Printing
1: 2, 3, 4,
1: 2,
The cause and the fix
What happened to 3, 4 which was successfully parsed?
Well, the attribute propagation rules indicate that qi::int_ >> *(',' >> qi::int_) exposes a tuple<int, vector<int> >. In a bid to magically DoTheRightThing(TM) Spirit accidentally misfires and "assigngs" the int into the attribute reference, ignoring the remaining vector<int>.
If you want to make container attributes parse as "an atomic group", use qi::as<>:
test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));
Here as<> acts as a barrier for the attribute compatibility heuristics and the grammar knows what you meant:
Live On Coliru
#include <iostream>
#include <string>
#include <vector>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
struct Record {
int id;
using values_t = std::vector<int>;
values_t values;
};
BOOST_FUSION_ADAPT_STRUCT(Record, id, values)
namespace qi = boost::spirit::qi;
template <typename T>
void test(T const& rule) {
const std::string str = "1: 2, 3, 4";
Record record;
if (qi::phrase_parse(str.begin(), str.end(), rule >> qi::eoi, qi::space, record)) {
std::cout << record.id << ": ";
for (const auto& value : record.values) { std::cout << value << ", "; }
std::cout << '\n';
} else {
std::cerr << "syntax error\n";
}
}
int main() {
test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));
}
Prints
1: 2, 3, 4,
1: 2,
1: 2, 3, 4,

Because it's time to get people started with X3 (the new version of Spirit), and because I like to challenge msyelf to do the corresponding tasks in Spirit X3, here is the Spirit X3 version.
There's no problem with auto in X3.
The "broken" case also behaves much better, triggering this static assertion:
// If you got an error here, then you are trying to pass
// a fusion sequence with the wrong number of elements
// as that expected by the (sequence) parser.
static_assert(
fusion::result_of::size<Attribute>::value == (l_size + r_size)
, "Attribute does not have the expected size."
);
That's nice, right?
The workaround seems a bit less readable:
test(int_ >> ':' >> (rule<struct _, Record::values_t>{} = (int_ >> *(',' >> int_))));
But it would be trivial to write your own as<> "directive" (or just a function), if you wanted:
namespace {
template <typename T>
struct as_type {
template <typename Expr>
auto operator[](Expr&& expr) const {
return x3::rule<struct _, T>{"as"} = x3::as_parser(std::forward<Expr>(expr));
}
};
template <typename T> static const as_type<T> as = {};
}
DEMO
Live On Coliru
#include <iostream>
#include <string>
#include <vector>
#include <boost/fusion/adapted/std_tuple.hpp>
#include <boost/spirit/home/x3.hpp>
struct Record {
int id;
using values_t = std::vector<int>;
values_t values;
};
namespace x3 = boost::spirit::x3;
template <typename T>
void test(T const& rule) {
const std::string str = "1: 2, 3, 4";
Record record;
auto attr = std::tie(record.id, record.values);
if (x3::phrase_parse(str.begin(), str.end(), rule >> x3::eoi, x3::space, attr)) {
std::cout << record.id << ": ";
for (const auto& value : record.values) { std::cout << value << ", "; }
std::cout << '\n';
} else {
std::cerr << "syntax error\n";
}
}
namespace {
template <typename T>
struct as_type {
template <typename Expr>
auto operator[](Expr&& expr) const {
return x3::rule<struct _, T>{"as"} = x3::as_parser(std::forward<Expr>(expr));
}
};
template <typename T> static const as_type<T> as = {};
}
int main() {
using namespace x3;
test(int_ >> ':' >> (int_ % ','));
//test(int_ >> ':' >> (int_ >> *(',' >> int_))); // COMPILER asserts "Attribute does not have the expected size."
// "clumsy" x3 style workaround
test(int_ >> ':' >> (rule<struct _, Record::values_t>{} = (int_ >> *(',' >> int_))));
// using an ad-hoc `as<>` implementation:
test(int_ >> ':' >> as<Record::values_t>[int_ >> *(',' >> int_)]);
}
Prints
1: 2, 3, 4,
1: 2, 3, 4,
1: 2, 3, 4,

Related

Boost X3: Can a variant member be avoided in disjunctions?

I'd like to parse string | (string, int) and store it in a structure that defaults the int component to some value. The attribute of such a construction in X3 is a variant<string, tuple<string, int>>. I was thinking I could have a struct that takes either a string or a (string, int) to automagically be populated:
struct bar
{
bar (std::string x = "", int y = 0) : baz1 {x}, baz2 {y} {}
std::string baz1;
int baz2;
};
BOOST_FUSION_ADAPT_STRUCT (disj::ast::bar, baz1, baz2)
and then simply have:
const x3::rule<class bar, ast::bar> bar = "bar";
using x3::int_;
using x3::ascii::alnum;
auto const bar_def = (+(alnum) | ('(' >> +(alnum) >> ',' >> int_ >> ')')) >> ';';
BOOST_SPIRIT_DEFINE(bar);
However this does not work:
/usr/include/boost/spirit/home/x3/core/detail/parse_into_container.hpp:139:59: error: static assertion failed: Expecting a single element fusion sequence
139 | static_assert(traits::has_size<Attribute, 1>::value,
Setting baz2 to an optional does not help. One way to solve this is to have a variant field or inherit from that type:
struct string_int {
std::string s;
int i;
};
struct foo {
boost::variant<std::string, string_int> var;
};
BOOST_FUSION_ADAPT_STRUCT (disj::ast::string_int, s, i)
BOOST_FUSION_ADAPT_STRUCT (disj::ast::foo, var)
(For some reason, I have to use boost::variant instead of x3::variant for operator<< to work; also, using std::pair or tuple for string_int does not work, but boost::fusion::deque does.) One can then equip foo somehow to get the string and integer.
Question: What is the proper, clean way to do this in X3? Is there a more natural way than this second option and equipping foo with accessors?
Live On Coliru
Sadly the wording in the x3 section is exceedingly sparse and allows it (contrast the Qi section). A quick test confirms it:
Live On Coliru
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
template <typename Expr>
std::string inspect(Expr const& expr) {
using A = typename x3::traits::attribute_of<Expr, x3::unused_type>::type;
return boost::core::demangle(typeid(A).name());
}
int main()
{
std::cout << inspect(x3::double_ | x3::int_) << "\n"; // variant expected
std::cout << inspect(x3::int_ | "bla" >> x3::int_) << "\n"; // variant "understandable"
std::cout << inspect(x3::int_ | x3::int_) << "\n"; // variant suprising:
}
Prints
boost::variant<double, int>
boost::variant<int, int>
boost::variant<int, int>
All Hope Is Not Lost
In your specific case you could trick the system:
auto const bar_def = //
(+x3::alnum >> x3::attr(-1) //
| '(' >> +x3::alnum >> ',' >> x3::int_ >> ')' //
) >> ';';
Note how we "inject" an int value for the first branch. That satisfies the attribute propagation gods:
Live On Coliru
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <iomanip>
namespace x3 = boost::spirit::x3;
namespace disj::ast {
struct bar {
std::string x;
int y;
};
using boost::fusion::operator<<;
} // namespace disj::ast
BOOST_FUSION_ADAPT_STRUCT(disj::ast::bar, x, y)
namespace disj::parser {
const x3::rule<class bar, ast::bar> bar = "bar";
auto const bar_def = //
(+x3::alnum >> x3::attr(-1) //
| '(' >> +x3::alnum >> ',' >> x3::int_ >> ')' //
) >> ';';
BOOST_SPIRIT_DEFINE(bar)
}
namespace disj {
void run_tests() {
for (std::string const input : {
"",
";",
"bla;",
"bla, 42;",
"(bla, 42);",
}) {
ast::bar val;
auto f = begin(input), l = end(input);
std::cout << "\n" << quoted(input) << " -> ";
if (phrase_parse(f, l, parser::bar, x3::space, val)) {
std::cout << "Parsed: " << val << "\n";
} else {
std::cout << "Failed\n";
}
if (f!=l) {
std::cout << " -- Remaining " << quoted(std::string_view(f, l)) << "\n";
}
}
}
}
int main()
{
disj::run_tests();
}
Prints
"" -> Failed
";" -> Failed
-- Remaining ";"
"bla;" -> Parsed: (bla -1)
"bla, 42;" -> Failed
-- Remaining "bla, 42;"
"(bla, 42);" -> Parsed: (bla 42)
¹ just today

Unintelligible compilation error for a simple X3 grammar

I have a quite simple grammar I try to implement using boost spirit x3, without success.
It does not compile, and due to all the templates and complex concepts used in the library (I know, it is rather a "header"), the compilation error message is way too long to be intelligible.
I tried to comment part of the code in order narrow down the culprit, without success as it comes down to several parts, for which I don't see any error anyway.
Edit2: the first error message is in indeed in push_front_impl.hpp highlighting that:
::REQUESTED_PUSH_FRONT_SPECIALISATION_FOR_SEQUENCE_DOES_NOT_EXIST::*
I suspect the keyword auto or maybe the p2 statement with ulong_long...but with no faith.
Need the help of you guys...spirit's elites !
Below a minimal code snippet reproducing the compilation error.
Edit: using boost 1.70 and visual studio 2019 v16.1.6
#include <string>
#include <iostream>
#include "boost/spirit/home/x3.hpp"
#include "boost/spirit/include/support_istream_iterator.hpp"
int main(void)
{
       std::string input = \
             "\"nodes\":{ {\"type\":\"bb\", \"id\" : 123456567, \"label\" : \"0x12023049\"}," \
                         "{\"type\":\"bb\", \"id\" : 123123123, \"label\" : \"0x01223234\"}," \
                         "{\"type\":\"ib\", \"id\" : 223092343, \"label\" : \"0x03020343\"}}";
       std::istringstream iss(input);
       namespace x3 = boost::spirit::x3;
       using x3::char_;
       using x3::ulong_long;
       using x3::lit;
 
       auto q = lit('\"'); /* q => quote */
 
       auto p1 = q >> lit("type") >> q >> lit(':') >> q >> (lit("bb") | lit("ib")) >> q;
       auto p2 = q >> lit("id") >> q >> lit(':') >> ulong_long;
       auto p3 = q >> lit("label") >> q >> lit(':') >> q >> (+x3::alpha) >> q;
       auto node =  lit('{') >> p1 >> lit(',') >> p2 >> lit(',') >> p3 >> lit('}');
       auto nodes = q >> lit("nodes") >> q >> lit(':') >> lit('{') >> node % lit(',') >> lit('}');
 
       boost::spirit::istream_iterator f(iss >> std::noskipws), l{};
       bool b = x3::phrase_parse(f, l, nodes, x3::space);
 
       return 0;
}
It is an known MPL limitation (Issue with X3 and MS VS2017, https://github.com/boostorg/spirit/issues/515) + bug/difference of implementation for MSVC/ICC compilers (https://github.com/boostorg/mpl/issues/43).
I rewrote an offending part without using MPL (https://github.com/boostorg/spirit/pull/607), it will be released in Boost 1.74, until then you should be able to workaround with:
#define BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS
#define BOOST_MPL_LIMIT_VECTOR_SIZE 50
Alternatively you could wrap different parts of your grammar into rules, what will reduce sequence parser chain.
Note that q >> lit("x") >> q >> lit(':') >> ... probably is not what you really want, it (with a skipper) will allow " x ": to be parsed. If you do not want that use simply lit("\"x\"") >> lit(':') >> ...
There's a chance that there might be a missing indirect include for your specific platform/version (if I had to guess it might be caused by using the istream iterator support header from Qi).
If that's not the issue, my attention is drawn by the where T = boost::mpl::aux::vector_tag<20> (/HT #Rup - number 20 seems suspiciously like it might be some kind of limit.
Either we can find what trips the limit and see if we can raise it, but I'll do the "unscientific" approach in the interest of helping you along with the parser.
Simplifying The Expressions
I see a lot (lot) of lit() nodes in your parser expressions that you don't need. I suspect all the quoted constructs need to be lexemes, and instead of painstakingly repeating the quote symbol all over the place, perhaps package it as follows:
auto q = [](auto p) { return x3::lexeme['"' >> x3::as_parser(p) >> '"']; };
auto type = q("type") >> ':' >> q(bb_ib);
auto id = q("id") >> ':' >> x3::ulong_long;
auto label = q("label") >> ':' >> q(+x3::alnum);
Notes:
I improved the naming so it's more natural to read:
auto node = '{' >> type >> ',' >> id >> ',' >> label >> '}';
I changed alpha to alnum so it would actually match your sample input
Hypothesis: The expressions are structurally simplified to be more hierarchical - the sequences consist of fewer >>-ed terms - the hope is that this removes a potential mpl::vector size limit.
There's one missing piece, bb_ib that I left out because it changes when you want to actually assign parsed values to attributes. Let's do that:
Attributes
struct Node {
enum Type { bb, ib } type;
uint64_t id;
std::string label;
};
As you can see I opted for an enum to represent type. The most natural way to parse that would be using symbols<>
struct bb_ib_sym : x3::symbols<Node::Type> {
bb_ib_sym() { this->add("bb", Node::bb)("ib", Node::ib); }
} bb_ib;
Now you can parse into a vector of Node:
Demo
Live On Coliru
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
struct Node {
enum Type { bb, ib } type;
uint64_t id;
std::string label;
};
namespace { // debug output
inline std::ostream& operator<<(std::ostream& os, Node::Type t) {
switch (t) {
case Node::bb: return os << "bb";
case Node::ib: return os << "ib";
}
return os << "?";
}
inline std::ostream& operator<<(std::ostream& os, Node const& n) {
return os << "Node{" << n.type << ", " << n.id << ", " << std::quoted(n.label) << "}";
}
}
// attribute propagation
BOOST_FUSION_ADAPT_STRUCT(Node, type, id, label)
int main() {
std::string input = R"("nodes": {
{
"type": "bb",
"id": 123456567,
"label": "0x12023049"
},
{
"type": "bb",
"id": 123123123,
"label": "0x01223234"
},
{
"type": "ib",
"id": 223092343,
"label": "0x03020343"
}
})";
namespace x3 = boost::spirit::x3;
struct bb_ib_sym : x3::symbols<Node::Type> {
bb_ib_sym() { this->add("bb", Node::bb)("ib", Node::ib); }
} bb_ib;
auto q = [](auto p) { return x3::lexeme['"' >> x3::as_parser(p) >> '"']; };
auto type = q("type") >> ':' >> q(bb_ib);
auto id = q("id") >> ':' >> x3::ulong_long;
auto label = q("label") >> ':' >> q(+x3::alnum);
auto node
= x3::rule<Node, Node> {"node"}
= '{' >> type >> ',' >> id >> ',' >> label >> '}';
auto nodes = q("nodes") >> ':' >> '{' >> node % ',' >> '}';
std::vector<Node> parsed;
auto f = begin(input);
auto l = end(input);
if (x3::phrase_parse(f, l, nodes, x3::space, parsed)) {
for (Node& node : parsed) {
std::cout << node << "\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f!=l) {
std::cout << "Remaining input: " << std::quoted(std::string(f, l)) << "\n";
}
}
Prints
Node{bb, 123456567, "0x12023049"}
Node{bb, 123123123, "0x01223234"}
Node{ib, 223092343, "0x03020343"}

How to set max recursion in boost spirit

Using boost::spirit, if I have a recursive rule to parse parentheses
rule<std::string::iterator, std::string()> term;
term %= string("(") >> *term >> string(")");
how do I limit the maximum amount of recursion? For example, if I try to parse a million nested parentheses, I get a segfault because the stack size has been exceeded. To be concrete, here is a complete sample.
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
int main(void)
{
using namespace boost::spirit;
using namespace boost::spirit::qi;
const size_t string_size = 1000000;
std::string str;
str.resize(string_size);
for (size_t s=0; s<str.size()/2; ++s)
{
str[s]='(';
str[str.size() - s -1] = ')';
}
rule<std::string::iterator, std::string()> term;
term %= string("(") >> *term >> string(")");
std::string h;
parse(str.begin(), str.end(), term, h);
}
I compiled it with the command
g++ simple.cxx -o simple -std=c++11
It works fine if I set string_size to 1000 instead of 1000000.
Keep track of the depth in a qi::local<> or a phx::ref().
In this case an inherited attribute can take the role of the qi::local quite naturally:
qi::rule<std::string::const_iterator, std::string(size_t depth)> term;
qi::_r1_type _depth;
term %=
qi::eps(_depth < 32) >>
qi::string("(") >> *term(_depth + 1) >> qi::string(")");
term will now fail when depth exceeds 32.
Full Sample
Live On Coliru
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
int main(void) {
for (size_t n : { 2, 4, 8, 16, 32, 64 }) {
auto const str = [&n] {
std::string str;
str.reserve(n);
while (n--) { str.insert(str.begin(), '('); str.append(1, ')'); }
return str;
}();
std::cout << "Input length " << str.length() << "\n";
qi::rule<std::string::const_iterator, std::string(size_t depth)> term;
qi::_r1_type _depth;
term %=
qi::eps(_depth < 32) >>
qi::string("(") >> *term(_depth + 1) >> qi::string(")");
std::string h;
auto f = str.begin(), l = str.end();
bool ok = qi::parse(f, l, term(0u), h);
if (ok)
std::cout << "Ok: " << h << "\n";
else
std::cout << "Fail\n";
if (f != l)
std::cout << "Remaining unparsed: '" << std::string(f, std::min(f + 40, l)) << "'...\n";
}
}
Output:
Input length 4
Ok: (())
Input length 8
Ok: (((())))
Input length 16
Ok: (((((((())))))))
Input length 32
Ok: (((((((((((((((())))))))))))))))
Input length 64
Ok: (((((((((((((((((((((((((((((((())))))))))))))))))))))))))))))))
Input length 128
Fail
Remaining unparsed: '(((((((((((((((((((((((((((((((((((((((('...

Using the auto_ expression in boost::spirit with std::vectors

I'm pretty new to boost::spirit. I would like to parse a string of comma separated objects into an std::vector (similarly as in the tutorials). The string could be of different types (known at compile time): integers, like "1,2,3", strings "Apple, Orange, Banana", etc. etc.
I would like to have a unified interface for all types.
If I parse a single element I can use the auto_ expression.
Is it possible to have a similar interface with vectors?
Can I define a rule that, given a template parameter, can actually parse this vector?
Here is a simple sample code (which does not compile due to the last call to phrase_parse):
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <iostream>
#include <vector>
#include <boost/spirit/include/qi_auto.hpp>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;
using qi::auto_;
using qi::phrase_parse;
using ascii::space;
using phoenix::push_back;
int main()
{
std::string line1 = "3";
std::string line2 = "1, 2, 3";
int v;
std::vector<int> vector;
typedef std::string::iterator stringIterator;
stringIterator first = line1.begin();
stringIterator last = line1.end();
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
bool r1 = qi::phrase_parse( first,
last,
qi::auto_,
ascii::space,
v );
first = line2.begin();
last = line2.end();
//The following call is wrong!
bool r2 = qi::phrase_parse( first,
last,
// Begin grammar
(
qi::auto_[push_back(phoenix::ref(vector), qi::_1)]
>> *(',' >> qi::auto_[push_back(phoenix::ref(vector),qi::_1)])
),
// End grammar
ascii::space,
vector);
return 0;
}
UPDATE
I found a solution, in the case the size of the vector is known before parsing. On the other hand I cannot use the syntax *( ',' >> qi::auto_ ).
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main()
{
std::string s = "1, 2, 3";
std::vector<int> vector;
//This works
qi::phrase_parse(s.begin(), s.end(), qi::auto_ >> ',' >> qi::auto_ >> ',' >> qi::auto_ , qi::blank, vector);
//This does not compile
qi::phrase_parse(s.begin(), s.end(), qi::auto_ >> *( ',' >> qi::auto_ ) , qi::blank, vector);
for(int i = 0; i < vector.size() ; i++)
std::cout << i << ": " << vector[i] << std::endl;
return 0;
}
Moreover using auto_, I cannot parse a a string. Is it possible to define e template function, where the grammar can be deduced by the template parameter?
template< typename T >
void MyParse(std::string& line, std::vector<T> vec)
{
qi::phrase_parse( line.begin(),
line.end(),
/*
How do I define a grammar based on T
such as:
double_ >> *( ',' >> double_ ) for T = double
+qi::alnum >> *( ',' >> +qi::alnum ) for T = std::string
*/,
qi::blank,
vec);
}
auto_ has support for container attributes out of the box:
Live On Coliru
std::istringstream iss("1 2 3 4 5; 6 7 8 9;");
iss.unsetf(std::ios::skipws);
std::vector<int> i;
std::vector<double> d;
if (iss >> qi::phrase_match(qi::auto_ >> ";" >> qi::auto_, qi::space, i, d))
{
for (auto e:i) std::cout << "int: " << e << "\n";
for (auto e:d) std::cout << "double: " << e << "\n";
}
Prints
int: 1
int: 2
int: 3
int: 4
int: 5
double: 6
double: 7
double: 8
double: 9
So you could basically write your template function by using ',' as the skipper. I'd prefer the operator% variant though.
Simple Take
template<typename Container>
void MyParse(std::string const& line, Container& container)
{
auto f(line.begin()), l(line.end());
bool ok = qi::phrase_parse(
f, l,
qi::auto_ % ',', qi::blank, container);
if (!ok || (f!=l))
throw "parser error: '" + std::string(f,l) + "'"; // FIXME
}
Variant 2
template<typename Container>
void MyParse(std::string const& line, Container& container)
{
auto f(line.begin()), l(line.end());
bool ok = qi::phrase_parse(
f, l,
qi::auto_, qi::blank | ',', container);
if (!ok || (f!=l))
throw "parser error: '" + std::string(f,l) + "'"; // FIXME
}
Solving the string case (and others):
If the element type is not 'deducible' by Spirit (anything could be parsed into a string), just take an optional parser/grammar that knows how to parse the element type?
template<typename Container, typename ElementParser = qi::auto_type>
void MyParse(std::string const& line, Container& container, ElementParser const& elementParser = ElementParser())
{
auto f(line.begin()), l(line.end());
bool ok = qi::phrase_parse(
f, l,
elementParser % ",", qi::blank, container);
if (!ok || (f!=l))
throw "parser error: '" + std::string(f,l) + "'"; // FIXME
}
Now, it parses strings just fine:
std::vector<int> i;
std::set<std::string> s;
MyParse("1,22,33,44,15", i);
MyParse("1,22,33,44,15", s, *~qi::char_(","));
for(auto e:i) std::cout << "i: " << e << "\n";
for(auto e:s) std::cout << "s: " << e << "\n";
Prints
i: 1
i: 22
i: 33
i: 44
i: 15
s: 1
s: 15
s: 22
s: 33
s: 44
Full Listing
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <iostream>
namespace qi = boost::spirit::qi;
template<typename Container, typename ElementParser = qi::auto_type>
void MyParse(std::string const& line, Container& container, ElementParser const& elementParser = ElementParser())
{
auto f(line.begin()), l(line.end());
bool ok = qi::phrase_parse(
f, l,
elementParser % ",", qi::blank, container);
if (!ok || (f!=l))
throw "parser error: '" + std::string(f,l) + "'"; // FIXME
}
#include <set>
int main()
{
std::vector<int> i;
std::set<std::string> s;
MyParse("1,22,33,44,15", i);
MyParse("1,22,33,44,15", s, *~qi::char_(","));
for(auto e:i) std::cout << "i: " << e << "\n";
for(auto e:s) std::cout << "s: " << e << "\n";
}

Parsing a number of named sets of other named sets

So I want to write a... well... not-so-simple parser with boost::spirit::qi. I know the bare basics of boost spirit, having gotten acquainted with it for the first time in the past couple of hours.
Basically I need to parse this:
# comment
# other comment
set "Myset A"
{
figure "AF 1"
{
i 0 0 0
i 1 2 5
i 1 1 1
f 3.1 45.11 5.3
i 3 1 5
f 1.1 2.33 5.166
}
figure "AF 2"
{
i 25 5 1
i 3 1 3
}
}
# comment
set "Myset B"
{
figure "BF 1"
{
f 23.1 4.3 5.11
}
}
set "Myset C"
{
include "Myset A" # includes all figures from Myset A
figure "CF"
{
i 1 1 1
f 3.11 5.33 3
}
}
Into this:
struct int_point { int x, y, z; };
struct float_point { float x, y, z; };
struct figure
{
string name;
vector<int_point> int_points;
vector<float_point> float_points;
};
struct figure_set
{
string name;
vector<figure> figures
};
vector<figure_set> figure_sets; // fill with the data of the input
Now, obviously having somebody write it for me would be too much, but can you please provide some tips on what to read and how to structure the grammar and parsers for this task?
And also... it may be the case that boost::spirit is not the best library I could use for the task. If so, which one is?
EDIT:
Here's where I've gotten so far. But I'm not yet sure how to go on: http://liveworkspace.org/code/212c31dfc0b6fbdf6c462d8d931c0e9f
I am able to read a single figure but, I don't yet have an idea how to parse a set of figures.
Here's my take on it
I believe the rule that will have been the blocker for you would be
figure = eps >> "figure"
>> name [ at_c<0>(_val) = _1 ] >> '{' >>
*(
ipoints [ push_back(at_c<1>(_val), _1) ]
| fpoints [ push_back(at_c<2>(_val), _1) ]
) >> '}';
This is actually a symptom of the fact that you parse inter-mixed i and f lines into separate containers.
See below for an alternative.
Here's my full code: test.cpp
//#define BOOST_SPIRIT_DEBUG // before including Spirit
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <fstream>
namespace Format
{
struct int_point { int x, y, z; };
struct float_point { float x, y, z; };
struct figure
{
std::string name;
std::vector<int_point> int_points;
std::vector<float_point> float_points;
friend std::ostream& operator<<(std::ostream& os, figure const& o);
};
struct figure_set
{
std::string name;
std::set<std::string> includes;
std::vector<figure> figures;
friend std::ostream& operator<<(std::ostream& os, figure_set const& o);
};
typedef std::vector<figure_set> file_data;
}
BOOST_FUSION_ADAPT_STRUCT(Format::int_point,
(int, x)(int, y)(int, z))
BOOST_FUSION_ADAPT_STRUCT(Format::float_point,
(float, x)(float, y)(float, z))
BOOST_FUSION_ADAPT_STRUCT(Format::figure,
(std::string, name)
(std::vector<Format::int_point>, int_points)
(std::vector<Format::float_point>, float_points))
BOOST_FUSION_ADAPT_STRUCT(Format::figure_set,
(std::string, name)
(std::set<std::string>, includes)
(std::vector<Format::figure>, figures))
namespace Format
{
std::ostream& operator<<(std::ostream& os, figure const& o)
{
using namespace boost::spirit::karma;
return os << format_delimited(
"\n figure" << no_delimit [ '"' << string << '"' ] << "\n {"
<< *("\n i" << int_ << int_ << int_)
<< *("\n f" << float_ << float_ << float_)
<< "\n }"
, ' ', o);
}
std::ostream& operator<<(std::ostream& os, figure_set const& o)
{
using namespace boost::spirit::karma;
return os << format_delimited(
"\nset" << no_delimit [ '"' << string << '"' ] << "\n{"
<< *("\n include " << no_delimit [ '"' << string << '"' ])
<< *stream
<< "\n}"
, ' ', o);
}
}
namespace /*anon*/
{
namespace phx=boost::phoenix;
namespace qi =boost::spirit::qi;
template <typename Iterator> struct skipper
: public qi::grammar<Iterator>
{
skipper() : skipper::base_type(start, "skipper")
{
using namespace qi;
comment = '#' >> *(char_ - eol) >> (eol|eoi);
start = comment | qi::space;
BOOST_SPIRIT_DEBUG_NODE(start);
BOOST_SPIRIT_DEBUG_NODE(comment);
}
private:
qi::rule<Iterator> start, comment;
};
template <typename Iterator> struct parser
: public qi::grammar<Iterator, Format::file_data(), skipper<Iterator> >
{
parser() : parser::base_type(start, "parser")
{
using namespace qi;
using phx::push_back;
using phx::at_c;
name = eps >> lexeme [ '"' >> *~char_('"') >> '"' ];
include = eps >> "include" >> name;
ipoints = eps >> "i" >> int_ >> int_ >> int_;
fpoints = eps >> "f" >> float_ >> float_ >> float_;
figure = eps >> "figure"
>> name [ at_c<0>(_val) = _1 ] >> '{' >>
*(
ipoints [ push_back(at_c<1>(_val), _1) ]
| fpoints [ push_back(at_c<2>(_val), _1) ]
) >> '}';
set = eps >> "set" >> name >> '{' >> *include >> *figure >> '}';
start = *set;
}
private:
qi::rule<Iterator, std::string() , skipper<Iterator> > name, include;
qi::rule<Iterator, Format::int_point() , skipper<Iterator> > ipoints;
qi::rule<Iterator, Format::float_point(), skipper<Iterator> > fpoints;
qi::rule<Iterator, Format::figure() , skipper<Iterator> > figure;
qi::rule<Iterator, Format::figure_set() , skipper<Iterator> > set;
qi::rule<Iterator, Format::file_data() , skipper<Iterator> > start;
};
}
namespace Parser {
bool parsefile(const std::string& spec, Format::file_data& data)
{
std::ifstream in(spec.c_str());
in.unsetf(std::ios::skipws);
std::string v;
v.reserve(4096);
v.insert(v.end(), std::istreambuf_iterator<char>(in.rdbuf()), std::istreambuf_iterator<char>());
if (!in)
return false;
typedef char const * iterator_type;
iterator_type first = &v[0];
iterator_type last = first+v.size();
try
{
parser<iterator_type> p;
skipper<iterator_type> s;
bool r = qi::phrase_parse(first, last, p, s, data);
r = r && (first == last);
if (!r)
std::cerr << spec << ": parsing failed at: \"" << std::string(first, last) << "\"\n";
return r;
}
catch (const qi::expectation_failure<char const *>& e)
{
std::cerr << "FIXME: expected " << e.what_ << ", got '" << std::string(e.first, e.last) << "'" << std::endl;
return false;
}
}
}
int main()
{
Format::file_data data;
bool ok = Parser::parsefile("input.txt", data);
std::cerr << "Parse " << (ok?"success":"failed") << std::endl;
std::cout << "# figure sets exported automatically by karma\n\n";
for (auto& set : data)
std::cout << set;
}
It outputs the parsed data as a verification: output.txt
Parse success
# figure sets exported automatically by karma
set "Myset A"
{
figure "AF 1"
{
i 0 0 0
i 1 2 5
i 1 1 1
i 3 1 5
f 3.1 45.11 5.3
f 1.1 2.33 5.166
}
figure "AF 2"
{
i 25 5 1
i 3 1 3
}
}
set "Myset B"
{
figure "BF 1"
{
f 23.1 4.3 5.11
}
}
set "Myset C"
{
include "Myset A"
figure "CF"
{
i 1 1 1
f 3.11 5.33 3.0
}
}
You will note that
the order of the point lines are changed (all int_points precede all float_points)
also, non-significant digits are added, e.g. in the last line 3.0 instead of 3 to show that the type if float.
you had 'forgotten' (?) about the includes in your question
Alternative
Have something that keeps the actual point lines in original order:
typedef boost::variant<int_point, float_point> if_point;
struct figure
{
std::string name;
std::vector<if_point> if_points;
}
Now the rules become simply:
name = eps >> lexeme [ '"' >> *~char_('"') >> '"' ];
include = eps >> "include" >> name;
ipoints = eps >> "i" >> int_ >> int_ >> int_;
fpoints = eps >> "f" >> float_ >> float_ >> float_;
figure = eps >> "figure" >> name >> '{' >> *(ipoints | fpoints) >> '}';
set = eps >> "set" >> name >> '{' >> *include >> *figure >> '}';
start = *set;
Note the elegance in
figure = eps >> "figure" >> name >> '{' >> *(ipoints | fpoints) >> '}';
And the output stays in the exact order of the input: output.txt
Once again, full demo code (on github only): test.cpp
Bonus update
Finally, I made my first proper Karma grammar to output the results:
name = no_delimit ['"' << string << '"'];
include = "include" << name;
ipoints = "\n i" << int_ << int_ << int_;
fpoints = "\n f" << float_ << float_ << float_;
figure = "figure" << name << "\n {" << *(ipoints | fpoints) << "\n }";
set = "set" << name << "\n{"
<< *("\n " << include)
<< *("\n " << figure) << "\n}";
start = "# figure sets exported automatically by karma\n\n"
<< set % eol;
That was actually considerably more comfortable than I had expected. See it in the lastest version of the fully updated gist: test.hpp