Parsing nested key value pairs in Boost Spirit - c++

I am having trouble writing what I think should be a simple parser using Boost::Spirit. (I'm using Spirit instead of just using string functions as this is partly a learning exercise for me).
Data
The data to parse takes the form of key value pairs, where a value can itself be a key value pair. Keys are alphanumeric (with underscores and no digit as first character); values are alphanumeric plus .-_ - the values can be dates in the format DD-MMM-YYYY e.g. 01-Jan-2015 and floating point numbers like 3.1415 in addition to plain old alphanumeric strings. Keys and values are separated with =; pairs are separated with ;; structured values are delimited with {...}. At the moment I am erasing all spaces from the user input before passing it to Spirit.
Example input:
Key1 = Value1; Key2 = { NestedKey1=Alan; NestedKey2 = 43.1232; }; Key3 = 15-Jul-1974 ;
I would then strip all spaces to give
Key1=Value1;Key2={NestedKey1=Alan;NestedKey2=43.1232;};Key3=15-Jul-1974;
and then I actually pass it to Spirit.
Problem
What I currently have works just dandy when values are simply values. When I start encoding structured values in the input then Spirit stops after the first structured value. A workaround if there is only one structured value is to put it at the end of the input... but I will need two or more structured values on occasion.
The code
The below compiles in VS2013 and illustrates the errors:
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/pair.hpp>
#include <boost/fusion/adapted.hpp>
#include <map>
#include <string>
#include <iostream>
typedef std::map<std::string, std::string> ARGTYPE;
#define BOOST_SPIRIT_DEBUG
namespace qi = boost::spirit::qi;
namespace fusion = boost::fusion;
template < typename It, typename Skipper>
struct NestedGrammar : qi::grammar < It, ARGTYPE(), Skipper >
{
NestedGrammar() : NestedGrammar::base_type(Sequence)
{
using namespace qi;
KeyName = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z0-9_");
Value = +qi::char_("-.a-zA-Z_0-9");
Pair = KeyName >> -(
'=' >> ('{' >> raw[Sequence] >> '}' | Value)
);
Sequence = Pair >> *((qi::lit(';') | '&') >> Pair);
BOOST_SPIRIT_DEBUG_NODE(KeyName);
BOOST_SPIRIT_DEBUG_NODE(Value);
BOOST_SPIRIT_DEBUG_NODE(Pair);
BOOST_SPIRIT_DEBUG_NODE(Sequence);
}
private:
qi::rule<It, ARGTYPE(), Skipper> Sequence;
qi::rule<It, std::string()> KeyName;
qi::rule<It, std::string(), Skipper> Value;
qi::rule<It, std::pair < std::string, std::string>(), Skipper> Pair;
};
template <typename Iterator>
ARGTYPE Parse2(Iterator begin, Iterator end)
{
NestedGrammar<Iterator, qi::space_type> p;
ARGTYPE data;
qi::phrase_parse(begin, end, p, qi::space, data);
return data;
}
// ARGTYPE is std::map<std::string,std::string>
void NestedParse(std::string Input, ARGTYPE& Output)
{
Input.erase(std::remove_if(Input.begin(), Input.end(), isspace), Input.end());
Output = Parse2(Input.begin(), Input.end());
}
int main(int argc, char** argv)
{
std::string Example1, Example2, Example3;
ARGTYPE Out;
Example1 = "Key1=Value1 ; Key2 = 01-Jan-2015; Key3 = 2.7181; Key4 = Johnny";
Example2 = "Key1 = Value1; Key2 = {InnerK1 = one; IK2 = 11-Nov-2011;};";
Example3 = "K1 = V1; K2 = {IK1=IV1; IK2=IV2;}; K3=V3; K4 = {JK1=JV1; JK2=JV2;};";
NestedParse(Example1, Out);
for (ARGTYPE::iterator i = Out.begin(); i != Out.end(); i++)
std::cout << i->first << "|" << i->second << std::endl;
std::cout << "=====" << std::endl;
/* get the following, as expected:
Key1|Value1
Key2|01-Jan-2015
Key3|2.7181
Key4|Johnny
*/
NestedParse(Example2, Out);
for (ARGTYPE::iterator i = Out.begin(); i != Out.end(); i++)
std::cout << i->first << "|" << i->second << std::endl;
std::cout << "=====" << std::endl;
/* get the following, as expected:
Key1|Value1
key2|InnerK1=one;IK2=11-Nov-2011
*/
NestedParse(Example3, Out);
for (ARGTYPE::iterator i = Out.begin(); i != Out.end(); i++)
std::cout << i->first << "|" << i->second << std::endl;
/* Only get the first two lines of the expected output:
K1|V1
K2|IK1=IV1;IK2=IV2
K3|V3
K4|JK1=JV1;JK2=JV2
*/
return 0;
}
I'm not sure if the problem is down to my ignorance of BNF, my ignorance of Spirit, or perhaps my ignorance of both at this point.
Any help appreciated. I've read e.g. Spirit Qi sequence parsing issues and links therein but I still can't figure out what I am doing wrong.

Indeed this precisely a simple grammar that Spirit excels at.
Moreover there is absolutely no need to skip whitespace up front: Spirit has skippers built in for the purpose.
To your explicit question, though:
The Sequence rule is overcomplicated. You could just use the list operator (%):
Sequence = Pair % char_(";&");
Now your problem is that you end the sequence with a ; that isn't expected, so both Sequence and Value fail the parse eventually. This isn't very clear unless you #define BOOST_SPIRIT_DEBUG¹ and inspect the debug output.
So to fix it use:
Sequence = Pair % char_(";&") >> -omit[char_(";&")];
Fix Live On Coliru (or with debug info)
Prints:
Key1|Value1
Key2|01-Jan-2015
Key3|2.7181
Key4|Johnny
=====
Key1|Value1
Key2|InnerK1=one;IK2=11-Nov-2011;
=====
K1|V1
K2|IK1=IV1;IK2=IV2;
K3|V3
K4|JK1=JV1;JK2=JV2;
Bonus Cleanup
Actually, that was simple. Just remove the redundant line removing whitespace. The skipper was already qi::space.
(Note though that the skipper doesn't apply to your Value rule, so values cannot contain whitespace but the parser will not silently skip it either; I suppose this is likely what you want. Just be aware of it).
Recursive AST
You would actually want to have a recursive AST, instead of parsing into a flat map.
Boost recursive variants make this a breeze:
namespace ast {
typedef boost::make_recursive_variant<std::string, std::map<std::string, boost::recursive_variant_> >::type Value;
typedef std::map<std::string, Value> Sequence;
}
To make this work you just change the declared attribute types of the rules:
qi::rule<It, ast::Sequence(), Skipper> Sequence;
qi::rule<It, std::pair<std::string, ast::Value>(), Skipper> Pair;
qi::rule<It, std::string(), Skipper> String;
qi::rule<It, std::string()> KeyName;
The rules themselves don't even have to change at all. You will need to write a little visitor to stream the AST:
static inline std::ostream& operator<<(std::ostream& os, ast::Value const& value) {
struct vis : boost::static_visitor<> {
vis(std::ostream& os, std::string indent = "") : _os(os), _indent(indent) {}
void operator()(std::map<std::string, ast::Value> const& map) const {
_os << "map {\n";
for (auto& entry : map) {
_os << _indent << " " << entry.first << '|';
boost::apply_visitor(vis(_os, _indent+" "), entry.second);
_os << "\n";
}
_os << _indent << "}\n";
}
void operator()(std::string const& s) const {
_os << s;
}
private:
std::ostream& _os;
std::string _indent;
};
boost::apply_visitor(vis(os), value);
return os;
}
Now it prints:
map {
Key1|Value1
Key2|01-Jan-2015
Key3|2.7181
Key4|Johnny
}
=====
map {
Key1|Value1
Key2|InnerK1 = one; IK2 = 11-Nov-2011;
}
=====
map {
K1|V1
K2|IK1=IV1; IK2=IV2;
K3|V3
K4|JK1=JV1; JK2=JV2;
}
Of course, the clincher is when you change raw[Sequence] to just Sequence now:
map {
Key1|Value1
Key2|01-Jan-2015
Key3|2.7181
Key4|Johnny
}
=====
map {
Key1|Value1
Key2|map {
IK2|11-Nov-2011
InnerK1|one
}
}
=====
map {
K1|V1
K2|map {
IK1|IV1
IK2|IV2
}
K3|V3
K4|map {
JK1|JV1
JK2|JV2
}
}
Full Demo Code
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/variant.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <iostream>
#include <string>
#include <map>
namespace ast {
typedef boost::make_recursive_variant<std::string, std::map<std::string, boost::recursive_variant_> >::type Value;
typedef std::map<std::string, Value> Sequence;
}
namespace qi = boost::spirit::qi;
template <typename It, typename Skipper>
struct NestedGrammar : qi::grammar <It, ast::Sequence(), Skipper>
{
NestedGrammar() : NestedGrammar::base_type(Sequence)
{
using namespace qi;
KeyName = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z0-9_");
String = +qi::char_("-.a-zA-Z_0-9");
Pair = KeyName >> -(
'=' >> ('{' >> Sequence >> '}' | String)
);
Sequence = Pair % char_(";&") >> -omit[char_(";&")];
BOOST_SPIRIT_DEBUG_NODES((KeyName) (String) (Pair) (Sequence))
}
private:
qi::rule<It, ast::Sequence(), Skipper> Sequence;
qi::rule<It, std::pair<std::string, ast::Value>(), Skipper> Pair;
qi::rule<It, std::string(), Skipper> String;
qi::rule<It, std::string()> KeyName;
};
template <typename Iterator>
ast::Sequence DoParse(Iterator begin, Iterator end)
{
NestedGrammar<Iterator, qi::space_type> p;
ast::Sequence data;
qi::phrase_parse(begin, end, p, qi::space, data);
return data;
}
static inline std::ostream& operator<<(std::ostream& os, ast::Value const& value) {
struct vis : boost::static_visitor<> {
vis(std::ostream& os, std::string indent = "") : _os(os), _indent(indent) {}
void operator()(std::map<std::string, ast::Value> const& map) const {
_os << "map {\n";
for (auto& entry : map) {
_os << _indent << " " << entry.first << '|';
boost::apply_visitor(vis(_os, _indent+" "), entry.second);
_os << "\n";
}
_os << _indent << "}\n";
}
void operator()(std::string const& s) const {
_os << s;
}
private:
std::ostream& _os;
std::string _indent;
};
boost::apply_visitor(vis(os), value);
return os;
}
int main()
{
std::string const Example1 = "Key1=Value1 ; Key2 = 01-Jan-2015; Key3 = 2.7181; Key4 = Johnny";
std::string const Example2 = "Key1 = Value1; Key2 = {InnerK1 = one; IK2 = 11-Nov-2011;};";
std::string const Example3 = "K1 = V1; K2 = {IK1=IV1; IK2=IV2;}; K3=V3; K4 = {JK1=JV1; JK2=JV2;};";
std::cout << DoParse(Example1.begin(), Example1.end()) << "\n";
std::cout << DoParse(Example2.begin(), Example2.end()) << "\n";
std::cout << DoParse(Example3.begin(), Example3.end()) << "\n";
}
¹ You "had" it, but not in the right place! It should go before any Boost includes.

Related

Boost.Spirit: porting string pairs from Qi to X3

I have the following working Qi code:
struct query_grammar
: public boost::spirit::qi::grammar<Iterator, string_map<std::string>()>
{
query_grammar() : query_grammar::base_type(query)
{
query = pair >> *(boost::spirit::qi::lit('&') >> pair);
pair = +qchar >> -(boost::spirit::qi::lit('=') >> +qchar);
qchar = ~boost::spirit::qi::char_("&=");
}
boost::spirit::qi::rule<Iterator, std::map<std::string,std::string>()> query;
boost::spirit::qi::rule<Iterator, std::map<std::string,std::string>::value_type()> pair;
boost::spirit::qi::rule<Iterator, char()> qchar;
};
I tried porting it to x3:
namespace x3 = boost::spirit::x3;
const x3::rule<class query_char_, char> query_char_ = "query_char";
const x3::rule<class string_pair_, std::map<std::string,std::string>::value_type> string_pair_ = "string_pair";
const x3::rule<class string_map_, std::map<std::string,std::string>> string_map_ = "string_map";
const auto query_char__def = ~boost::spirit::x3::char_("&=");
const auto string_pair__def = +query_char_ >> -(boost::spirit::x3::lit('=') >> +query_char_);
const auto string_map__def = string_pair_ >> *(boost::spirit::x3::lit('&') >> string_pair_);
BOOST_SPIRIT_DEFINE(string_map_)
BOOST_SPIRIT_DEFINE(string_pair_)
BOOST_SPIRIT_DEFINE(query_char_)
but I am getting the following error when trying to parse a string with string_map_ :
/usr/include/boost/spirit/home/x3/support/traits/move_to.hpp:209: erreur : no matching function for call to move_to(const char*&, const char*&, std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >&, boost::mpl::identity<boost::spirit::x3::traits::plain_attribute>::type)
detail::move_to(first, last, dest, typename attribute_category<Dest>::type());
~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I saw this answer: Parsing pair of strings fails. Bad spirit x3 grammar and tried to make my string_pair raw but to no avail.
Edit:
this example code from the spirit examples does not compile either so I guess the problem is a bit deeper:
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
int main()
{
std::string input( "cosmic pizza " );
auto iter = input.begin();
auto end_iter = input.end();
std::pair<std::string, std::string> result;
x3::parse( iter, end_iter, *(~x3::char_(' ')) >> ' ' >> *x3::char_, result);
}
Qi Fixes
First off, I had to fix the rule declaration with the Qi variant before it could work:
qi::rule<Iterator, std::pair<std::string,std::string>()> pair;
For the simple reason that value_type has pair<key_type const, mapped_type> which is never assignable.
Here's a Qi SSCCE:
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <map>
namespace qi = boost::spirit::qi;
template <typename T> using string_map = std::map<T, T>;
template <typename Iterator>
struct query_grammar : public qi::grammar<Iterator, string_map<std::string>()>
{
query_grammar() : query_grammar::base_type(query)
{
qchar = ~qi::char_("&=");
pair = +qchar >> -(qi::lit('=') >> +qchar);
query = pair >> *(qi::lit('&') >> pair);
}
private:
qi::rule<Iterator, std::map<std::string,std::string>()> query;
qi::rule<Iterator, std::pair<std::string,std::string>()> pair;
qi::rule<Iterator, char()> qchar;
};
int main() {
using It = std::string::const_iterator;
for (std::string const input : { "foo=bar&baz=boo" })
{
std::cout << "======= " << input << "\n";
It f = input.begin(), l = input.end();
string_map<std::string> sm;
if (parse(f, l, query_grammar<It>{}, sm)) {
std::cout << "Parsed " << sm.size() << " pairs\n";
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}
Prints
======= foo=bar&baz=boo
Parsed 2 pairs
Qi Improvements
The following simpler grammar seems better:
Live On Coliru
template <typename Iterator, typename T = std::string>
struct query_grammar : public qi::grammar<Iterator, string_map<T>()>
{
query_grammar() : query_grammar::base_type(query) {
using namespace qi;
pair = +~char_("&=") >> '=' >> *~char_("&");
query = pair % '&';
}
private:
qi::rule<Iterator, std::pair<T,T>()> pair;
qi::rule<Iterator, std::map<T,T>()> query;
};
It accepts empty values (e.g. &q=&x=) and values containing additional =: &q=7==8&rt=bool. It could be significantly more efficient (untested).
X3 version
Without looking at your code, I translated it directly into an X3 version:
Live On Coliru
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <iostream>
#include <map>
namespace x3 = boost::spirit::x3;
template <typename T> using string_map = std::map<T, T>;
namespace grammar {
using namespace x3;
auto pair = +~char_("&=") >> '=' >> *~char_("&");
auto query = pair % '&';
}
int main() {
using It = std::string::const_iterator;
for (std::string const input : { "foo=bar&baz=boo" })
{
std::cout << "======= " << input << "\n";
It f = input.begin(), l = input.end();
string_map<std::string> sm;
if (parse(f, l, grammar::query, sm)) {
std::cout << "Parsed " << sm.size() << " pairs\n";
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}
Which, obviously ( --- ) prints
======= foo=bar&baz=boo
Parsed 2 pairs
X3 Improvements
You should probably want to coerce the attribute types for the rules because automatic attribute propagation can have surprising heuristics.
namespace grammar {
template <typename T = std::string> auto& query() {
using namespace x3;
static const auto s_pair
= rule<struct pair_, std::pair<T, T> > {"pair"}
= +~char_("&=") >> -('=' >> *~char_("&"));
static const auto s_query
= rule<struct query_, std::map<T, T> > {"query"}
= s_pair % '&';
return s_query;
};
}
See it Live On Coliru
What Went wrong?
The X3 version suffered the same problem with const key type in std::map<>::value_type

Treat escaped newline as line continuation

Here is an example of the syntax -- two groups of items:
I_name m_name parameter1=value parameter2=value
I_name m_name parameter1=value \
parameter2=value
My question is how to define the skip-type.
It is not just space_type but space_type minus newline.
But newline followed by backslash is a skip-type.
E.g.
I define name like that:
qi::rule<Iterator, std::string(), ascii::space_type> m_sName;
m_sName %= qi::lexeme[ascii::alpha >> *ascii::alnum];
This is obviously not correct, as the space_type must include newline-backslash.
The following grammar works for me.
*("\\\n" | ~qi::char_('\n')) % '\n'
It will ignore any newline after the backslash. And the following is a simple test.
#include <vector>
#include <string>
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#define BOOST_TEST_MODULE example
#include <boost/test/unit_test.hpp>
typedef std::vector<std::string> Lines;
inline auto ParseLines(std::string const& str) {
Lines lines;
namespace qi = boost::spirit::qi;
if (qi::parse(
str.begin(), str.end(),
*("\\\n" | ~qi::char_('\n')) % '\n',
lines)) {
return lines;
}
else {
throw std::invalid_argument("Parse error at ParseLines");
}
}
BOOST_AUTO_TEST_CASE(TestParseLines) {
std::string const str =
"I_name m_name parameter1=value parameter2=value\n"
"I_name m_name parameter1 = value \\\n"
"parameter2 = value";
Lines const expected{
"I_name m_name parameter1=value parameter2=value",
"I_name m_name parameter1 = value parameter2 = value"
};
BOOST_TEST(ParseLines(str) == expected);
}
You should use "-std=c++14 -lboost_unit_test_framework" for compilation. Anyway, it is easy to convert the code for c++03.
qi::blank is exactly that. It's qi::space without newlines.
You can do this too: ("\\\n" | qi::blank)
To be able to declare a rule with such a skipper, define a skipper grammar:
template <typename It>
struct my_skipper : qi::grammar<It> {
my_skipper() : my_skipper::base_type(start) {}
qi::rule<It> start = ("\\\n" | qi::blank);
};
Full Demo
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapted.hpp>
#include <map>
namespace qi = boost::spirit::qi;
namespace ast {
struct record {
std::string iname, mname;
std::map<std::string, std::string> params;
};
using records = std::vector<record>;
}
BOOST_FUSION_ADAPT_STRUCT(ast::record, iname, mname, params)
template <typename It>
struct my_parser : qi::grammar<It, ast::records()> {
using Skipper = qi::rule<It>;
my_parser() : my_parser::base_type(start) {
skipper = ("\\\n" | qi::blank);
name = +qi::graph;
key = +(qi::graph - '=');
param = key >> '=' >> name;
record = name >> name >> *param;
records = *(record >> +qi::eol);
start = qi::skip(qi::copy(skipper)) [ records ];
}
private:
Skipper skipper;
qi::rule<It, ast::records(), Skipper> records;
qi::rule<It, ast::record(), Skipper> record;
qi::rule<It, ast::records()> start;
qi::rule<It, std::pair<std::string, std::string>()> param;
qi::rule<It, std::string()> name, key;
};
int main() {
#if 1
using It = boost::spirit::istream_iterator;
It f(std::cin >> std::noskipws), l;
#else
using It = std::string::const_iterator;
std::string const input = "something here a=1\n";
It f = input.begin(), l = input.end();
#endif
ast::records data;
bool ok = qi::parse(f, l, my_parser<It>(), data);
if (ok) {
std::cout << "Parsed:\n";
for (auto& r : data) {
std::cout << "\t" << r.iname << " " << r.mname;
for (auto& p : r.params)
std::cout << " [" << p.first << ": " << p.second << "]";
std::cout << "\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f!=l)
std::cout << "Remaining input: '" << std::string(f,l) << "'\n";
}
Prints (for the input in your question):
Parsed:
I_name m_name [parameter1: value] [parameter2: value]
I_name m_name [parameter1: value] [parameter2: value]

Additional symbols in spirit parser output

We try parse simple number/text(in text present numbers, so we must split input sequence, into 2 elements type(TEXT and NUMBER) vector) grammar where number can be in follow format:
+10.90
10.90
10
+10
-10
So we write grammar:
struct CMyTag
{
TagTypes tagName;
std::string tagData;
std::vector<CMyTag> tagChild;
};
BOOST_FUSION_ADAPT_STRUCT(::CMyTag, (TagTypes, tagName) (std::string, tagData) (std::vector<CMyTag>, tagChild))
template <typename Iterator>
struct TextWithNumbers_grammar : qi::grammar<Iterator, std::vector<CMyTag>()>
{
TextWithNumbers_grammar() :
TextWithNumbers_grammar::base_type(line)
{
line = +(numbertag | texttag);
number = qi::lexeme[-(qi::lit('+') | '-') >> +qi::digit >> *(qi::char_('.') >> +qi::digit)];
numbertag = qi::attr(NUMBER) >> number;
text = +(~qi::digit - (qi::char_("+-") >> qi::digit));
texttag = qi::attr(TEXT) >> text;
}
qi::rule<Iterator, std::string()> number, text;
qi::rule<Iterator, CMyTag()> numbertag, texttag;
qi::rule<Iterator, std::vector<CMyTag>()> line;
};
Everything work fine, but if we try to parse this line:
wernwl kjwnwenrlwe +10.90+ klwnfkwenwf
We got 3 elements vector as expected, but last element in this vector will be with text(CMyTag.tagData):
++ klwnfkwenwf
Additional symbol "+" added.
We also try to rewrite grammar to simple skip number rule:
text = qi::skip(number)[+~qi::digit];
But parser died with segmentation fault exception
Attribute values are not rolled back on backtracking. In practice this is only visible with container attributes (such as vector<> or string).
In this case, the numbertag rule is parsed first and parses the + sign. Then, the number rule fails, and the already-matched + is left in the input.
I don't know exactly what you're trying to do, but it looks like you just want:
line = +(numbertag | texttag);
numbertag = attr(NUMBER) >> raw[double_];
texttag = attr(TEXT) >> raw[+(char_ - double_)];
For the input "wernwl kjwnwenrlwe +10.90e3++ klwnfkwenwf" it prints
Parse success: 5 elements
TEXT 'wernwl kjwnwenrlwe '
NUMBER '+10.90'
TEXT 'e'
NUMBER '3'
TEXT '++ klwnfkwenwf'
Live Demo
Live On Coliru
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
enum TagTypes { NUMBER, TEXT, };
struct CMyTag {
TagTypes tagName;
std::string tagData;
};
BOOST_FUSION_ADAPT_STRUCT(::CMyTag, (TagTypes, tagName) (std::string, tagData))
template <typename Iterator>
struct TextWithNumbers_grammar : qi::grammar<Iterator, std::vector<CMyTag>()>
{
TextWithNumbers_grammar() : TextWithNumbers_grammar::base_type(line)
{
using namespace qi;
line = +(numbertag | texttag);
numbertag = attr(NUMBER) >> raw[number];
texttag = attr(TEXT) >> raw[+(char_ - number)];
}
private:
template <typename T>
struct simple_real_policies : boost::spirit::qi::real_policies<T>
{
template <typename It> // No exponent
static bool parse_exp(It&, It const&) { return false; }
template <typename It, typename Attribute> // No exponent
static bool parse_exp_n(It&, It const&, Attribute&) { return false; }
};
qi::real_parser<double, simple_real_policies<double> > number;
qi::rule<Iterator, CMyTag()> numbertag, texttag;
qi::rule<Iterator, std::vector<CMyTag>()> line;
};
int main() {
std::string const input = "wernwl kjwnwenrlwe +10.90e3++ klwnfkwenwf";
using It = std::string::const_iterator;
It f = input.begin(), l = input.end();
std::vector<CMyTag> data;
TextWithNumbers_grammar<It> g;
if (qi::parse(f, l, g, data)) {
std::cout << "Parse success: " << data.size() << " elements\n";
for (auto& s : data) {
std::cout << (s.tagName == NUMBER?"NUMBER":"TEXT")
<< "\t'" << s.tagData << "'\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f!=l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}

Iterative update of abstract syntax tree with boost spirit

I have a working boost spirit parser and was thinking if it is possible to do iterative update of an abstract syntax tree with boost spirit?
I have a struct similar to:
struct ast;
typedef boost::variant< boost::recursive_wrapper<ast> > node;
struct ast
{
std::vector<int> value;
std::vector<node> children;
};
Which is being parsed by use of:
bool r = phrase_parse(begin, end, grammar, space, ast);
Would it be possible to do iterative update of abstract syntax tree with boost spirit? I have not found any documentation on this, but I was thinking if the parsers semantic actions could push_back on an already existing AST. Has anyone tried this?
This would allow for parsing like this:
bool r = phrase_parse(begin, end, grammar, space, ast); //initial parsing
//the second parse will be called at a later state given some event/timer/io/something
bool r = phrase_parse(begin, end, grammar, space, ast); //additional parsing which will update the already existing AST
How would you know which nodes to merge? Or would you always add ("graft") at the root level? In that case, why don't you just parse another and merge moving the elements into the existing ast?
ast& operator+=(ast&& other) {
std::move(other.value.begin(), other.value.end(), back_inserter(value));
std::move(other.children.begin(), other.children.end(), back_inserter(children));
return *this;
}
Demo Time
Let's devise the simplest grammar I can think of for this AST:
start = '{' >> -(int_ % ',') >> ';' >> -(start % ',') >> '}';
Note I didn't even make the ; optional. Oh well. Samples. Exercises for readers. ☡ You know the drill.
We implement the trivial function ast parse(It f, It l), and then we can simply merge the asts:
int main() {
ast merged;
for(std::string const& input : {
"{1 ,2 ,3 ;{4 ;{9 , 8 ;}},{5 ,6 ;}}",
"{10,20,30;{40;{90, 80;}},{50,60;}}",
})
{
merged += parse(input.begin(), input.end());
std::cout << "merged + " << input << " --> " << merged << "\n";
}
}
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
struct ast;
//typedef boost::make_recursive_variant<boost::recursive_wrapper<ast> >::type node;
typedef boost::variant<boost::recursive_wrapper<ast> > node;
struct ast {
std::vector<int> value;
std::vector<node> children;
ast& operator+=(ast&& other) {
std::move(other.value.begin(), other.value.end(), back_inserter(value));
std::move(other.children.begin(), other.children.end(), back_inserter(children));
return *this;
}
};
BOOST_FUSION_ADAPT_STRUCT(ast,
(std::vector<int>,value)
(std::vector<node>,children)
)
template <typename It, typename Skipper = qi::space_type>
struct grammar : qi::grammar<It, ast(), Skipper>
{
grammar() : grammar::base_type(start) {
using namespace qi;
start = '{' >> -(int_ % ',') >> ';' >> -(start % ',') >> '}';
BOOST_SPIRIT_DEBUG_NODES((start));
}
private:
qi::rule<It, ast(), Skipper> start;
};
// for output:
static inline std::ostream& operator<<(std::ostream& os, ast const& v) {
using namespace karma;
rule<boost::spirit::ostream_iterator, ast()> r;
r = '{' << -(int_ % ',') << ';' << -((r|eps) % ',') << '}';
return os << format(r, v);
}
template <typename It> ast parse(It f, It l)
{
ast parsed;
static grammar<It> g;
bool ok = qi::phrase_parse(f,l,g,qi::space,parsed);
if (!ok || (f!=l)) {
std::cout << "Parse failure\n";
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
exit(255);
}
return parsed;
}
int main() {
ast merged;
for(std::string const& input : {
"{1 ,2 ,3 ;{4 ;{9 , 8 ;}},{5 ,6 ;}}",
"{10,20,30;{40;{90, 80;}},{50,60;}}",
})
{
merged += parse(input.begin(), input.end());
std::cout << "merged + " << input << " --> " << merged << "\n";
}
}
Of course, it prints:
merged + {1 ,2 ,3 ;{4 ;{9 , 8 ;}},{5 ,6 ;}} --> {1,2,3;{4;{9,8;}},{5,6;}}
merged + {10,20,30;{40;{90, 80;}},{50,60;}} --> {1,2,3,10,20,30;{4;{9,8;}},{5,6;},{40;{90,80;}},{50,60;}}
UPDATE
In this - trivial - example, you can just bind the collections to the attributes in the parse call. The same thing will happen without the operator+= call needed to move the elements, because the rules are written to automatically append to the bound container attribute.
CAVEAT: A distinct disadvantage of modifying the target value in-place is what happens if parsing fails. In the version the merged value will then be "undefined" (has received partial information from the failed parse).
So if you want to parse inputs "atomically", the first, more explicit approach is a better fit.
So the following is a slightly shorter way to write the same:
Live On Coliru
// #define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
struct ast;
//typedef boost::make_recursive_variant<boost::recursive_wrapper<ast> >::type node;
typedef boost::variant<boost::recursive_wrapper<ast> > node;
struct ast {
std::vector<int> value;
std::vector<node> children;
};
BOOST_FUSION_ADAPT_STRUCT(ast,
(std::vector<int>,value)
(std::vector<node>,children)
)
template <typename It, typename Skipper = qi::space_type>
struct grammar : qi::grammar<It, ast(), Skipper>
{
grammar() : grammar::base_type(start) {
using namespace qi;
start = '{' >> -(int_ % ',') >> ';' >> -(start % ',') >> '}';
BOOST_SPIRIT_DEBUG_NODES((start));
}
private:
qi::rule<It, ast(), Skipper> start;
};
// for output:
static inline std::ostream& operator<<(std::ostream& os, ast const& v) {
using namespace karma;
rule<boost::spirit::ostream_iterator, ast()> r;
r = '{' << -(int_ % ',') << ';' << -((r|eps) % ',') << '}';
return os << format(r, v);
}
template <typename It> void parse(It f, It l, ast& into)
{
static grammar<It> g;
bool ok = qi::phrase_parse(f,l,g,qi::space,into);
if (!ok || (f!=l)) {
std::cout << "Parse failure\n";
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
exit(255);
}
}
int main() {
ast merged;
for(std::string const& input : {
"{1 ,2 ,3 ;{4 ;{9 , 8 ;}},{5 ,6 ;}}",
"{10,20,30;{40;{90, 80;}},{50,60;}}",
})
{
parse(input.begin(), input.end(), merged);
std::cout << "merged + " << input << " --> " << merged << "\n";
}
}
Still prints

Boost spirit revert parsing

I want to parse a file containing the following structure:
some
garbage *&%
section1 {
section_content
}
section2 {
section_content
}
The rule parsing section_name1 { ... } section_name2 { ... } is already defined:
section_name_rule = lexeme[+char_("A-Za-z0-9_")];
section = section_name_rule > lit("{") > /*some complicated things*/... > lit("}");
sections %= +section;
So I need to skip any garbage until the sections rule is met.
Is there any way to accomplish this? I have tried seek[sections], but it seems not to work.
EDIT:
I localized the reason why seek is not working: if I use follows operator(>>), then it works. If expectation parser is used (>), then it throws an exception. Here is a sample code:
#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
using boost::phoenix::push_back;
struct section_t {
std::string name, contents;
friend std::ostream& operator<<(std::ostream& os, section_t const& s) { return os << "section_t[" << s.name << "] {" << s.contents << "}"; }
};
BOOST_FUSION_ADAPT_STRUCT(section_t, (std::string, name)(std::string, contents))
typedef std::vector<section_t> sections_t;
template <typename It, typename Skipper = qi::space_type>
struct grammar : qi::grammar<It, sections_t(), Skipper>
{
grammar() : grammar::base_type(start) {
using namespace qi;
using boost::spirit::repository::qi::seek;
section_name_rule = lexeme[+char_("A-Za-z0-9_")];
//Replacing '>>'s with '>'s throws an exception, while this works as expected!!
section = section_name_rule
>>
lit("{") >> lexeme[*~char_('}')] >> lit("}");
start = seek [ hold[section[push_back(qi::_val, qi::_1)]] ]
>> *(section[push_back(qi::_val, qi::_1)]);
}
private:
qi::rule<It, sections_t(), Skipper> start;
qi::rule<It, section_t(), Skipper> section;
qi::rule<It, std::string(), Skipper> section_name_rule;
};
int main() {
typedef std::string::const_iterator iter;
std::string storage("sdfsdf\n sd:fgdfg section1 {dummy } section2 {dummy } section3 {dummy }");
iter f(storage.begin()), l(storage.end());
sections_t sections;
if (qi::phrase_parse(f, l, grammar<iter>(), qi::space, sections))
{
for(auto& s : sections)
std::cout << "Parsed: " << s << "\n";
}
if (f != l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
So in the real example my entire grammar is constructed with expectation operators. Do I have to change everything to make the "seek" work, or is there any other way (let's say, seek a simple "{", and revert one section_name_rule back)??
Here's a demonstration, using Hamlet for inspiration: Live On Coliru
start = *seek [ no_skip[eol] >> hold [section] ];
Notes:
drop the expectation points
optimize by requiring start of line before section name
Example input:
some
garbage *&%
section1 {
Claudius: ...But now, my cousin Hamlet, and my son —
Hamlet: A little more than kin, and less than kind.
}
WE CAN DO MOAR GARBAGE
section2 {
Claudius: How is it that the clouds still hang on you?
Hamlet: Not so my lord; I am too much i' the sun
}
Output:
Parsed: section_t[section1] {Claudius: ...But now, my cousin Hamlet, and my son —
Hamlet: A little more than kin, and less than kind.
}
Parsed: section_t[section2] {Claudius: How is it that the clouds still hang on you?
Hamlet: Not so my lord; I am too much i' the sun
}
Reference Listing
// #define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
namespace qi = boost::spirit::qi;
struct section_t {
std::string name, contents;
friend std::ostream& operator<<(std::ostream& os, section_t const& s) { return os << "section_t[" << s.name << "] {" << s.contents << "}"; }
};
BOOST_FUSION_ADAPT_STRUCT(section_t, (std::string, name)(std::string, contents))
typedef std::vector<section_t> sections_t;
template <typename It, typename Skipper = qi::space_type>
struct grammar : qi::grammar<It, sections_t(), Skipper>
{
grammar() : grammar::base_type(start) {
using namespace qi;
using boost::spirit::repository::qi::seek;
section_name_rule = lexeme[+char_("A-Za-z0-9_")];
section = section_name_rule >> '{' >> lexeme[*~char_('}')] >> '}';
start = *seek [ no_skip[eol] >> hold [section] ];
BOOST_SPIRIT_DEBUG_NODES((start)(section)(section_name_rule))
}
private:
qi::rule<It, sections_t(), Skipper> start;
qi::rule<It, section_t(), Skipper> section;
qi::rule<It, std::string(), Skipper> section_name_rule;
};
int main() {
using It = boost::spirit::istream_iterator;
It f(std::cin >> std::noskipws), l;
sections_t sections;
if (qi::phrase_parse(f, l, grammar<It>(), qi::space, sections))
{
for(auto& s : sections)
std::cout << "Parsed: " << s << "\n";
}
if (f != l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}