reading out a stringstream and insert it in different vectores - c++

(I'm sorry if I ask this question wrong, this is my first time I write in a forum)
When I started programming at my SFML - Game, I had a very old book, wich was very C-like (eg. recommendation of atoi();).
Now I got a new C++(including C++11) book, and I want to rewrite the old lines wih newer Code.
I saved the Tiles in a file stored like this:
[0-0,15-1|22,44] [0-1|0]
[4-0,10-1,3-1|0] [0-5,5-5|0]
That means:
[...] desribes a Tile
0-0 etc. is the xy position on the Texturesheet
22 etc. is the event that will be triggered.
the amount of events and sf::Vector2i shouldn't be set constantly.
The Tiles are separately taken out from another class, which manages the entire Tilemap.
Now my problem: I have no idea how i should push the numbers from the strinstream right in two vectores?
My code:
class Tile{
private:
std::deque<sf::Sprite> tile;
std::deque<int> event;
public:
Tile(sf::Texture& texture, std::deque<sf::Vector2i>&& ctor_texturerects, std::deque<int>&& ctor_events);//This one is working fine
Tile(sf::Texture& texture, std::stringstream&& ctor_stream/*reads the Tile*/){
std::deque<sf::Vector2i> temp_texturerects;
std::deque<int>temp_events;
/*TODO: filter the stringstream and push them into the containers*/
Tile::Tile(texture,std::move(temp_texturerect),std::move(temp_events));
}
I'd be also very happy if you could give me another solution, like changing sf::Vector2i to a better solution or giving me a better stream and class concept
Thanks in advance
Xeno Ceph
Edit:
I made a little workaround:
(I changed the inputstream to a normal string)
But the code doesn't look good
There mujst be an easier solution
Tile:: Tile(sf::Texture& texture, std::string&& ctor_string){
std::deque<sf::Vector2i> temp_texturerects;
std::deque<int> temp_events;
std::stringstream strstr;
for(int i=0; i<ctor_string.size(); ++i){
while(ctor_string[i]!='|'){
while(ctor_string[i] != ','){
strstr << ctor_string[i];
}
sf::Vector2i v2i;
strstr >> v2i.x >> v2i.y;
temp_texturerects.push_front(v2i);
strstr.str("");
}
while(ctor_string[i]!=']'){
while(ctor_string[i] != ','){
strstr << ctor_string[i];
}
int integer;
strstr >> integer;
temp_events.push_front(integer);
strstr.str("");
}
}
Tile::Tile(texture, std::move(temp_texturerects), std::move(temp_events));
}
Has anybody a better solution?

If I understand your question correctly, you have some strings of the form
[0-0,15-1|22,44] [0-1|0]
[4-0,10-1,3-1|0] [0-5,5-5|0]
and you want to extract 2 types of data - positions (e.g. 0-0) and events (e.g. 22).
Your question is how to extract this data cleanly, discarding the [ and ] characters, etc.
One great way to approach this is to use the getline function that operates on stringstreams, which inherit from std::istream (http://www.cplusplus.com/reference/string/string/getline/). It can take custom delimiters, not just the newline character. So you can use '[', '|' and ']' as different delimiting characters and parse them in a logical order.
For example, since your string is just a collection of tiles, you can split it up into a number of functions - ParseTile, ParsePositions and ParseEvents, something like the following:
void Tile::ParseInput(stringstream&& ctor_string) {
//extract input, tile by tile
while(!ctor_string.eof()) {
string tile;
//you can treat each tile as though it is on a separate line by
//specifying the ']' character as the delimiter for the "line"
getline(ctor_string, tile, ']');
tile += "]"; //add back the ']' character that was discarded from the getline
//the string "tile" should now contain a single tile [...], which we can further process using ParseTile
ParseTile(tile);
}
}
The ParseTile function:
void Tile::ParseTile(string&& tile) {
//input string tile is e.g. " [0-0, 15-1|22,44]"
string discard; //string to hold parts of tile string that should be thrown away
string positions; //string to hold list of positions, separated by ','
string events; //string to hold events, separated by ','
//create stringstream from input
stringstream tilestream(tile);
//tilestream is e.g. "[0-0,15-1|22,44]"
getline(tilestream, discard, '['); //gets everything until '['
//now, discard is " [" and tilestream is "0-0,15-1|22,44]"
getline(tilestream, positions, '|');
//now, positions is "0-0,15-1" and tilestream is "22,44]"
getline(tilestream, events,']');
//now, events is "22,44" and tilestream is empty
ParsePositions(positions);
ParseEvents(events);
}
You can write your own ParsePositions and ParseEvents functions which basically will be a more getline calls using a ',' as the delimiting character (just loop until the string ends).

I suggest either writing a proper parser manually (not unlike the other answer proposes) or to use a proper parsing framework, like Boost Spirit.
The advantages of the latter is that you get debuggability, composability, attributes etc. "for free". Here's the simplest example I could think of:
struct TileData
{
std::deque<sf::Vector2i> texturerects;
std::deque<int> events;
};
typedef std::deque<TileData> TileDatas;
template <typename It>
struct parser : qi::grammar<It, TileDatas(), qi::space_type>
{
parser() : parser::base_type(start)
{
using namespace qi;
v2i = (int_ >> '-' >> int_)
[ _val = phx::construct<sf::Vector2i>(_1, _2) ];
tiledata =
(v2i % ',') >> '|' >>
(int_ % ',');
start = *('[' >> tiledata >> ']');
}
private:
qi::rule<It, sf::Vector2i(), qi::space_type> v2i;
qi::rule<It, TileData(), qi::space_type> tiledata;
qi::rule<It, TileDatas(), qi::space_type> start;
};
Adding a bit of code to test this, see it live on http://liveworkspace.org/code/3WM0My$1, output:
Parsed: TileData {
texturerects: deque<N2sf8Vector2iE> {v2i(0, 0), v2i(15, 1), }
events: deque<i> {22, 44, }
}
Parsed: TileData {
texturerects: deque<N2sf8Vector2iE> {v2i(0, 1), }
events: deque<i> {0, }
}
Parsed: TileData {
texturerects: deque<N2sf8Vector2iE> {v2i(4, 0), v2i(10, 1), v2i(3, 1), }
events: deque<i> {0, }
}
Parsed: TileData {
texturerects: deque<N2sf8Vector2iE> {v2i(0, 5), v2i(5, 5), }
events: deque<i> {0, }
}
Full code:
#define BOOST_SPIRIT_USE_PHOENIX_V3
// #define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;
// liveworkspace.org doesn't have SFML
namespace sf { struct Vector2i { int x, y; Vector2i(int ax=0, int ay=0) : x(ax), y(ay) {} }; }
struct TileData
{
std::deque<sf::Vector2i> texturerects;
std::deque<int> events;
};
BOOST_FUSION_ADAPT_STRUCT(TileData,
(std::deque<sf::Vector2i>, texturerects)
(std::deque<int>, events))
typedef std::deque<TileData> TileDatas;
template <typename It>
struct parser : qi::grammar<It, TileDatas(), qi::space_type>
{
parser() : parser::base_type(start)
{
using namespace qi;
v2i = (int_ >> '-' >> int_)
[ _val = phx::construct<sf::Vector2i>(_1, _2) ];
tiledata =
(v2i % ',') >> '|' >>
(int_ % ',');
start = *('[' >> tiledata >> ']');
}
private:
qi::rule<It, sf::Vector2i(), qi::space_type> v2i;
qi::rule<It, TileData(), qi::space_type> tiledata;
qi::rule<It, TileDatas(), qi::space_type> start;
};
typedef boost::spirit::istream_iterator It;
std::ostream& operator<<(std::ostream& os, sf::Vector2i const &v) { return os << "v2i(" << v.x << ", " << v.y << ")"; }
template <typename T> std::ostream& operator<<(std::ostream& os, std::deque<T> const &d) {
os << "deque<" << typeid(T).name() << "> {";
for (auto& t : d) os << t << ", ";
return os << "}";
}
std::ostream& operator<<(std::ostream& os, TileData const& ttd) {
return os << "TileData {\n"
"\ttexturerects: " << ttd.texturerects << "\n"
"\tevents: " << ttd.events << "\n}";
}
int main()
{
parser<It> p;
std::istringstream iss(
"[0-0,15-1|22,44] [0-1|0]\n"
"[4-0,10-1,3-1|0] [0-5,5-5|0]");
It f(iss), l;
TileDatas data;
if (qi::phrase_parse(f,l,p,qi::space,data))
{
for (auto& tile : data)
{
std::cout << "Parsed: " << tile << "\n";
}
}
if (f != l)
{
std::cout << "Remaining unparsed: '" << std::string(f, l) << "'\n";
}
}

Related

Create grammar for retrieving vector of strings

I have a file containing data on the form:
fractal mand1 {
;lkkj;kj;
}
fractal mand2 {
if (...) {
blablah;
}
}
fractal julia1 {
a = ss;
}
I want to extract the name of data containers, so I want to retrieve a vector containing in the specific case mand1, mand2, julia1.
I've read the sample about parsing a number list into a vector, but I want to maintain the grammar in a separate file.
I've create a struct representing the grammar, and then I use it in order to parse the string containing data. I would expect an output like
mand1
mand2
julia1
Instead I obtain
mand1 {
;lkkj;kj;
}
fractal mand2 {
if (...) {
blablah;
}
}
fractal julia1 {
a = ss;
}
My parser recognizes the first fractal term but then it parses the rest of the file as single string item instead that parse it as I want.
What I'm doing wrong?
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <vector>
#include <iostream>
using boost::spirit::ascii::space;
using boost::spirit::ascii::space_type;
using boost::spirit::qi::phrase_parse;
using boost::spirit::qi::lit;
using boost::spirit::qi::lexeme;
using boost::spirit::qi::skip;
using boost::spirit::ascii::char_;
using boost::spirit::ascii::no_case;
using boost::spirit::qi::rule;
typedef std::string::const_iterator sit;
template <typename Iterator>
struct FractalListParser : boost::spirit::qi::grammar<Iterator, std::vector<std::string>(), boost::spirit::ascii::space_type> {
FractalListParser() : FractalListParser::base_type(start) {
no_quoted_string %= *(lexeme[+(char_ - '"')]);
start %= *(no_case[lit("fractal")] >> no_quoted_string >> '{' >> *(skip[*(char_)]) >> '}');
}
rule<Iterator, std::string(), space_type> no_quoted_string;
rule<Iterator, std::vector<std::string>(), space_type> start;
};
int main() {
const std::string fractalListFile(R"(
fractal mand1 {
;lkkj;kj;
}
fractal mand2 {
if (...) {
blablah;
}
}
fractal julia1 {
a = ss;
}
)");
std::cout << "Read Test:" << std::endl;
FractalListParser<sit> parser;
std::vector<std::string> data;
bool r = phrase_parse(fractalListFile.begin(), fractalListFile.end(), parser, space, data);
for (auto& i : data) std::cout << i << std::endl;
return 0;
}
If you use error handling, you'll find that the parse failed, and nothing got effectively parsed:
Live On Coliru
Output:
Read Test:
Parse success:
----
mand1 {
;lkkj;kj;
}
fractal mand2 {
if (...) {
blablah;
}
}
fractal julia1 {
a = ss;
}
Remaining unparsed input: 'fractal mand1 {
;lkkj;kj;
}
fractal mand2 {
if (...) {
blablah;
}
}
fractal julia1 {
a = ss;
}
'
What was the problem?
You probably want to ignore the "body" (between {}). Therefore I suppose you actually wanted to omit the attribute:
>> '{' >> *(omit[*(char_)]) >> '}'
rather than skip(*char_).
The expression *char_ is greedy, and will always match to the end of input... You probably wanted to limit the charset:
in the "name" *~char_("\"{") to avoid "eating" all of the body as well. To avoid matching spaces use graph (e.g. +graph - '"'). In case you want to parse "identifiers" be explicit e.g.
alpha > *(alnum | char_('_'))
in the body *~char_('}') or *(char_ - '}') (the latter being less efficient).
The nesting of optional quantifiers is not productive:
*(omit[*(char_)])
Will just have very slow worst-case runtime (because *char_ could be empty, and *(omit[*(char_)]) could also be empty). Say what you mean instead:
omit[*char_]
The simplest way to have a lexeme is to drop the skipper from the rule declaration (see also Boost spirit skipper issues)
Program logic:
Since your sample contains nested blocks (mand2 for example), you need to treat the blocks recursively in order to avoid calling the first } the end of the outer block:
block = '{' >> -block % (+~char_("{}")) >> '}';
Loose hints:
use BOOST_SPIRIT_DEBUG to find out where parsing is rejected/matched. E.g. after refactoring the rules a bit:
we got the output (On Coliru):
Read Test:
<start>
<try>fractal mand1 {\n </try>
<no_quoted_string>
<try>mand1 {\n ;lkkj;kj</try>
<success> {\n ;lkkj;kj;\n}\n\n</success>
<attributes>[[m, a, n, d, 1]]</attributes>
</no_quoted_string>
<body>
<try>{\n ;lkkj;kj;\n}\n\nf</try>
<fail/>
</body>
<success>fractal mand1 {\n </success>
<attributes>[[]]</attributes>
</start>
Parse success:
Remaining unparsed input: 'fractal mand1 {
;lkkj;kj;
}
fractal mand2 {
if (...) {
blablah;
}
}
fractal julia1 {
a = ss;
}
'
That output helped me spot that I actually forgot the - '}' part in the body rule... :)
No need for %= when there are no semantic actions involved in that rule definition (docs)
you probably want to make sure fractal is actually a separate word, so you don't match fractalset multi { .... }
Demo Program
With these in place we can have a working demo:
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <iostream>
namespace qi = boost::spirit::qi;
template <typename Iterator>
struct FractalListParser : qi::grammar<Iterator, std::vector<std::string>(), qi::space_type> {
FractalListParser() : FractalListParser::base_type(start) {
using namespace qi;
identifier = alpha > *(alnum | char_('_'));
block = '{' >> -block % +~char_("{}") >> '}';
start = *(
no_case["fractal"] >> identifier >> block
);
BOOST_SPIRIT_DEBUG_NODES((start)(block)(identifier))
}
qi::rule<Iterator, std::vector<std::string>(), qi::space_type> start;
// lexemes (just drop the skipper)
qi::rule<Iterator, std::string()> identifier;
qi::rule<Iterator> block; // leaving out the attribute means implicit `omit[]`
};
int main() {
using It = boost::spirit::istream_iterator;
It f(std::cin >> std::noskipws), l;
std::cout << "Read Test:" << std::endl;
FractalListParser<It> parser;
std::vector<std::string> data;
bool r = qi::phrase_parse(f, l, parser, qi::space, data);
if (r) {
std::cout << "Parse success:\n";
for (auto& i : data)
std::cout << "----\n" << i << "\n";
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
Prints:
Read Test:
Parse success:
----
mand1
----
mand2
----
julia1

Additional symbols in spirit parser output

We try parse simple number/text(in text present numbers, so we must split input sequence, into 2 elements type(TEXT and NUMBER) vector) grammar where number can be in follow format:
+10.90
10.90
10
+10
-10
So we write grammar:
struct CMyTag
{
TagTypes tagName;
std::string tagData;
std::vector<CMyTag> tagChild;
};
BOOST_FUSION_ADAPT_STRUCT(::CMyTag, (TagTypes, tagName) (std::string, tagData) (std::vector<CMyTag>, tagChild))
template <typename Iterator>
struct TextWithNumbers_grammar : qi::grammar<Iterator, std::vector<CMyTag>()>
{
TextWithNumbers_grammar() :
TextWithNumbers_grammar::base_type(line)
{
line = +(numbertag | texttag);
number = qi::lexeme[-(qi::lit('+') | '-') >> +qi::digit >> *(qi::char_('.') >> +qi::digit)];
numbertag = qi::attr(NUMBER) >> number;
text = +(~qi::digit - (qi::char_("+-") >> qi::digit));
texttag = qi::attr(TEXT) >> text;
}
qi::rule<Iterator, std::string()> number, text;
qi::rule<Iterator, CMyTag()> numbertag, texttag;
qi::rule<Iterator, std::vector<CMyTag>()> line;
};
Everything work fine, but if we try to parse this line:
wernwl kjwnwenrlwe +10.90+ klwnfkwenwf
We got 3 elements vector as expected, but last element in this vector will be with text(CMyTag.tagData):
++ klwnfkwenwf
Additional symbol "+" added.
We also try to rewrite grammar to simple skip number rule:
text = qi::skip(number)[+~qi::digit];
But parser died with segmentation fault exception
Attribute values are not rolled back on backtracking. In practice this is only visible with container attributes (such as vector<> or string).
In this case, the numbertag rule is parsed first and parses the + sign. Then, the number rule fails, and the already-matched + is left in the input.
I don't know exactly what you're trying to do, but it looks like you just want:
line = +(numbertag | texttag);
numbertag = attr(NUMBER) >> raw[double_];
texttag = attr(TEXT) >> raw[+(char_ - double_)];
For the input "wernwl kjwnwenrlwe +10.90e3++ klwnfkwenwf" it prints
Parse success: 5 elements
TEXT 'wernwl kjwnwenrlwe '
NUMBER '+10.90'
TEXT 'e'
NUMBER '3'
TEXT '++ klwnfkwenwf'
Live Demo
Live On Coliru
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
enum TagTypes { NUMBER, TEXT, };
struct CMyTag {
TagTypes tagName;
std::string tagData;
};
BOOST_FUSION_ADAPT_STRUCT(::CMyTag, (TagTypes, tagName) (std::string, tagData))
template <typename Iterator>
struct TextWithNumbers_grammar : qi::grammar<Iterator, std::vector<CMyTag>()>
{
TextWithNumbers_grammar() : TextWithNumbers_grammar::base_type(line)
{
using namespace qi;
line = +(numbertag | texttag);
numbertag = attr(NUMBER) >> raw[number];
texttag = attr(TEXT) >> raw[+(char_ - number)];
}
private:
template <typename T>
struct simple_real_policies : boost::spirit::qi::real_policies<T>
{
template <typename It> // No exponent
static bool parse_exp(It&, It const&) { return false; }
template <typename It, typename Attribute> // No exponent
static bool parse_exp_n(It&, It const&, Attribute&) { return false; }
};
qi::real_parser<double, simple_real_policies<double> > number;
qi::rule<Iterator, CMyTag()> numbertag, texttag;
qi::rule<Iterator, std::vector<CMyTag>()> line;
};
int main() {
std::string const input = "wernwl kjwnwenrlwe +10.90e3++ klwnfkwenwf";
using It = std::string::const_iterator;
It f = input.begin(), l = input.end();
std::vector<CMyTag> data;
TextWithNumbers_grammar<It> g;
if (qi::parse(f, l, g, data)) {
std::cout << "Parse success: " << data.size() << " elements\n";
for (auto& s : data) {
std::cout << (s.tagName == NUMBER?"NUMBER":"TEXT")
<< "\t'" << s.tagData << "'\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f!=l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}

Parsing nested key value pairs in Boost Spirit

I am having trouble writing what I think should be a simple parser using Boost::Spirit. (I'm using Spirit instead of just using string functions as this is partly a learning exercise for me).
Data
The data to parse takes the form of key value pairs, where a value can itself be a key value pair. Keys are alphanumeric (with underscores and no digit as first character); values are alphanumeric plus .-_ - the values can be dates in the format DD-MMM-YYYY e.g. 01-Jan-2015 and floating point numbers like 3.1415 in addition to plain old alphanumeric strings. Keys and values are separated with =; pairs are separated with ;; structured values are delimited with {...}. At the moment I am erasing all spaces from the user input before passing it to Spirit.
Example input:
Key1 = Value1; Key2 = { NestedKey1=Alan; NestedKey2 = 43.1232; }; Key3 = 15-Jul-1974 ;
I would then strip all spaces to give
Key1=Value1;Key2={NestedKey1=Alan;NestedKey2=43.1232;};Key3=15-Jul-1974;
and then I actually pass it to Spirit.
Problem
What I currently have works just dandy when values are simply values. When I start encoding structured values in the input then Spirit stops after the first structured value. A workaround if there is only one structured value is to put it at the end of the input... but I will need two or more structured values on occasion.
The code
The below compiles in VS2013 and illustrates the errors:
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/pair.hpp>
#include <boost/fusion/adapted.hpp>
#include <map>
#include <string>
#include <iostream>
typedef std::map<std::string, std::string> ARGTYPE;
#define BOOST_SPIRIT_DEBUG
namespace qi = boost::spirit::qi;
namespace fusion = boost::fusion;
template < typename It, typename Skipper>
struct NestedGrammar : qi::grammar < It, ARGTYPE(), Skipper >
{
NestedGrammar() : NestedGrammar::base_type(Sequence)
{
using namespace qi;
KeyName = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z0-9_");
Value = +qi::char_("-.a-zA-Z_0-9");
Pair = KeyName >> -(
'=' >> ('{' >> raw[Sequence] >> '}' | Value)
);
Sequence = Pair >> *((qi::lit(';') | '&') >> Pair);
BOOST_SPIRIT_DEBUG_NODE(KeyName);
BOOST_SPIRIT_DEBUG_NODE(Value);
BOOST_SPIRIT_DEBUG_NODE(Pair);
BOOST_SPIRIT_DEBUG_NODE(Sequence);
}
private:
qi::rule<It, ARGTYPE(), Skipper> Sequence;
qi::rule<It, std::string()> KeyName;
qi::rule<It, std::string(), Skipper> Value;
qi::rule<It, std::pair < std::string, std::string>(), Skipper> Pair;
};
template <typename Iterator>
ARGTYPE Parse2(Iterator begin, Iterator end)
{
NestedGrammar<Iterator, qi::space_type> p;
ARGTYPE data;
qi::phrase_parse(begin, end, p, qi::space, data);
return data;
}
// ARGTYPE is std::map<std::string,std::string>
void NestedParse(std::string Input, ARGTYPE& Output)
{
Input.erase(std::remove_if(Input.begin(), Input.end(), isspace), Input.end());
Output = Parse2(Input.begin(), Input.end());
}
int main(int argc, char** argv)
{
std::string Example1, Example2, Example3;
ARGTYPE Out;
Example1 = "Key1=Value1 ; Key2 = 01-Jan-2015; Key3 = 2.7181; Key4 = Johnny";
Example2 = "Key1 = Value1; Key2 = {InnerK1 = one; IK2 = 11-Nov-2011;};";
Example3 = "K1 = V1; K2 = {IK1=IV1; IK2=IV2;}; K3=V3; K4 = {JK1=JV1; JK2=JV2;};";
NestedParse(Example1, Out);
for (ARGTYPE::iterator i = Out.begin(); i != Out.end(); i++)
std::cout << i->first << "|" << i->second << std::endl;
std::cout << "=====" << std::endl;
/* get the following, as expected:
Key1|Value1
Key2|01-Jan-2015
Key3|2.7181
Key4|Johnny
*/
NestedParse(Example2, Out);
for (ARGTYPE::iterator i = Out.begin(); i != Out.end(); i++)
std::cout << i->first << "|" << i->second << std::endl;
std::cout << "=====" << std::endl;
/* get the following, as expected:
Key1|Value1
key2|InnerK1=one;IK2=11-Nov-2011
*/
NestedParse(Example3, Out);
for (ARGTYPE::iterator i = Out.begin(); i != Out.end(); i++)
std::cout << i->first << "|" << i->second << std::endl;
/* Only get the first two lines of the expected output:
K1|V1
K2|IK1=IV1;IK2=IV2
K3|V3
K4|JK1=JV1;JK2=JV2
*/
return 0;
}
I'm not sure if the problem is down to my ignorance of BNF, my ignorance of Spirit, or perhaps my ignorance of both at this point.
Any help appreciated. I've read e.g. Spirit Qi sequence parsing issues and links therein but I still can't figure out what I am doing wrong.
Indeed this precisely a simple grammar that Spirit excels at.
Moreover there is absolutely no need to skip whitespace up front: Spirit has skippers built in for the purpose.
To your explicit question, though:
The Sequence rule is overcomplicated. You could just use the list operator (%):
Sequence = Pair % char_(";&");
Now your problem is that you end the sequence with a ; that isn't expected, so both Sequence and Value fail the parse eventually. This isn't very clear unless you #define BOOST_SPIRIT_DEBUG¹ and inspect the debug output.
So to fix it use:
Sequence = Pair % char_(";&") >> -omit[char_(";&")];
Fix Live On Coliru (or with debug info)
Prints:
Key1|Value1
Key2|01-Jan-2015
Key3|2.7181
Key4|Johnny
=====
Key1|Value1
Key2|InnerK1=one;IK2=11-Nov-2011;
=====
K1|V1
K2|IK1=IV1;IK2=IV2;
K3|V3
K4|JK1=JV1;JK2=JV2;
Bonus Cleanup
Actually, that was simple. Just remove the redundant line removing whitespace. The skipper was already qi::space.
(Note though that the skipper doesn't apply to your Value rule, so values cannot contain whitespace but the parser will not silently skip it either; I suppose this is likely what you want. Just be aware of it).
Recursive AST
You would actually want to have a recursive AST, instead of parsing into a flat map.
Boost recursive variants make this a breeze:
namespace ast {
typedef boost::make_recursive_variant<std::string, std::map<std::string, boost::recursive_variant_> >::type Value;
typedef std::map<std::string, Value> Sequence;
}
To make this work you just change the declared attribute types of the rules:
qi::rule<It, ast::Sequence(), Skipper> Sequence;
qi::rule<It, std::pair<std::string, ast::Value>(), Skipper> Pair;
qi::rule<It, std::string(), Skipper> String;
qi::rule<It, std::string()> KeyName;
The rules themselves don't even have to change at all. You will need to write a little visitor to stream the AST:
static inline std::ostream& operator<<(std::ostream& os, ast::Value const& value) {
struct vis : boost::static_visitor<> {
vis(std::ostream& os, std::string indent = "") : _os(os), _indent(indent) {}
void operator()(std::map<std::string, ast::Value> const& map) const {
_os << "map {\n";
for (auto& entry : map) {
_os << _indent << " " << entry.first << '|';
boost::apply_visitor(vis(_os, _indent+" "), entry.second);
_os << "\n";
}
_os << _indent << "}\n";
}
void operator()(std::string const& s) const {
_os << s;
}
private:
std::ostream& _os;
std::string _indent;
};
boost::apply_visitor(vis(os), value);
return os;
}
Now it prints:
map {
Key1|Value1
Key2|01-Jan-2015
Key3|2.7181
Key4|Johnny
}
=====
map {
Key1|Value1
Key2|InnerK1 = one; IK2 = 11-Nov-2011;
}
=====
map {
K1|V1
K2|IK1=IV1; IK2=IV2;
K3|V3
K4|JK1=JV1; JK2=JV2;
}
Of course, the clincher is when you change raw[Sequence] to just Sequence now:
map {
Key1|Value1
Key2|01-Jan-2015
Key3|2.7181
Key4|Johnny
}
=====
map {
Key1|Value1
Key2|map {
IK2|11-Nov-2011
InnerK1|one
}
}
=====
map {
K1|V1
K2|map {
IK1|IV1
IK2|IV2
}
K3|V3
K4|map {
JK1|JV1
JK2|JV2
}
}
Full Demo Code
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/variant.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <iostream>
#include <string>
#include <map>
namespace ast {
typedef boost::make_recursive_variant<std::string, std::map<std::string, boost::recursive_variant_> >::type Value;
typedef std::map<std::string, Value> Sequence;
}
namespace qi = boost::spirit::qi;
template <typename It, typename Skipper>
struct NestedGrammar : qi::grammar <It, ast::Sequence(), Skipper>
{
NestedGrammar() : NestedGrammar::base_type(Sequence)
{
using namespace qi;
KeyName = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z0-9_");
String = +qi::char_("-.a-zA-Z_0-9");
Pair = KeyName >> -(
'=' >> ('{' >> Sequence >> '}' | String)
);
Sequence = Pair % char_(";&") >> -omit[char_(";&")];
BOOST_SPIRIT_DEBUG_NODES((KeyName) (String) (Pair) (Sequence))
}
private:
qi::rule<It, ast::Sequence(), Skipper> Sequence;
qi::rule<It, std::pair<std::string, ast::Value>(), Skipper> Pair;
qi::rule<It, std::string(), Skipper> String;
qi::rule<It, std::string()> KeyName;
};
template <typename Iterator>
ast::Sequence DoParse(Iterator begin, Iterator end)
{
NestedGrammar<Iterator, qi::space_type> p;
ast::Sequence data;
qi::phrase_parse(begin, end, p, qi::space, data);
return data;
}
static inline std::ostream& operator<<(std::ostream& os, ast::Value const& value) {
struct vis : boost::static_visitor<> {
vis(std::ostream& os, std::string indent = "") : _os(os), _indent(indent) {}
void operator()(std::map<std::string, ast::Value> const& map) const {
_os << "map {\n";
for (auto& entry : map) {
_os << _indent << " " << entry.first << '|';
boost::apply_visitor(vis(_os, _indent+" "), entry.second);
_os << "\n";
}
_os << _indent << "}\n";
}
void operator()(std::string const& s) const {
_os << s;
}
private:
std::ostream& _os;
std::string _indent;
};
boost::apply_visitor(vis(os), value);
return os;
}
int main()
{
std::string const Example1 = "Key1=Value1 ; Key2 = 01-Jan-2015; Key3 = 2.7181; Key4 = Johnny";
std::string const Example2 = "Key1 = Value1; Key2 = {InnerK1 = one; IK2 = 11-Nov-2011;};";
std::string const Example3 = "K1 = V1; K2 = {IK1=IV1; IK2=IV2;}; K3=V3; K4 = {JK1=JV1; JK2=JV2;};";
std::cout << DoParse(Example1.begin(), Example1.end()) << "\n";
std::cout << DoParse(Example2.begin(), Example2.end()) << "\n";
std::cout << DoParse(Example3.begin(), Example3.end()) << "\n";
}
¹ You "had" it, but not in the right place! It should go before any Boost includes.

Is Boost skip parser the right approach?

After some delay I'm now again trying to parse some ASCII text file
surrounded by some binary characters.
Parsing text file with binary envelope using boost Spririt
However I'm now struggling if a skip parser is the right approach?
The grammar of the file (it's a JEDEC file) is quite simple:
Each data field in the file starts with a single letter and ends with an asterisk. The data field can contain spaces and carriage return.
After the asterisk spaces and carriage return might follow too before the
next field identifier.
This is what I used to start building a parser for such a file:
phrase_parse(first, last,
// First char in File
char_('\x02') >>
// Data field
*((print[cout << _1] | graph[cout << _1]) - char_('*')) >>
// End of data followed by 4 digit hexnumber. How to limit?
char_('\x03') >> *xdigit,
// Skip asterisks
char_('*') );
Unfortunately I don't get any output from this one. Does someone have an idea what might be wrong?
Sample file:
<STX>
JEDEC file generated by John Doe*
DM SIGNETICS(PHILIPS)*
DD GAL16R8*
QP20*
QV0*
G0*F0*
L00000 1110101111100110111101101110111100111111*
CDEAD*
<ETX>BEEF
and this is what I want to achive:
Start: JEDEC file generated by John Doe
D: M SIGNETICS(PHILIPS)
D: D GAL16R8
Q: P20
Q: V0
G: 0
F: 0
L: 00000 1110101111100110111101101110111100111111
C: DEAD
End: BEEF
I would suggest you want to use a skipper at the toplevel rule only. And use it to skip the insignificant whitespace.
You don't use a skipper for the asterisks because you do not want to ignore them. If they're ignored, your rules cannot act upon them.
Furthermore the inner rules should not use the space skipper for the simple reason that whitespace and linefeeds are valid field data in JEDEC.
So, the upshot of all this would be:
value = *(ascii::char_("\x20-\x7e\r\n") - '*') >> '*';
field = ascii::graph >> value;
start = STX >> value >> *field >> ETX >> xmit_checksum;
Where the rules would be declared with the respective skippers:
qi::uint_parser<uint16_t, 16, 4, 4> xmit_checksum;
qi::rule<It, ascii::space_type> start;
qi::rule<It> field, value; // no skippers - they are lexemes
Take-away: Split your grammar up in rules. Be happier for it.
Processing the results
Your sample needlessly mixes responsibilities for parsing and "printing".
I'd suggest not using semantic actions here (Boost Spirit: "Semantic actions are evil"?).
Instead, declare appropriate attribute types:
struct JEDEC {
std::string caption;
struct field {
char id;
std::string value;
};
std::vector<field> fields;
uint16_t checksum;
};
And declare them in your rules:
qi::rule<It, ast::JEDEC(), ascii::space_type> start;
qi::rule<It, ast::JEDEC::field()> field;
qi::rule<It, std::string()> value;
qi::uint_parser<uint16_t, 16, 4, 4> xmit_checksum;
Now, nothing needs to be changed in your grammar, and you can print the desired output with:
inline static std::ostream& operator<<(std::ostream& os, JEDEC const& jedec) {
os << "Start: " << jedec.caption << "\n";
for(auto& f : jedec.fields)
os << f.id << ": " << f.value << "\n";
auto saved = os.rdstate();
os << "End: " << std::hex << std::setw(4) << std::setfill('0') << jedec.checksum;
os.setstate(saved);
return os;
}
LIVE DEMO
Here's a demo program that ties it together using the sample input from your question:
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
namespace qi = boost::spirit::qi;
namespace ascii = qi::ascii;
namespace ast {
struct JEDEC {
std::string caption;
struct field {
char id;
std::string value;
};
std::vector<field> fields;
uint16_t checksum;
};
inline static std::ostream& operator<<(std::ostream& os, JEDEC const& jedec) {
os << "Start: " << jedec.caption << "\n";
for(auto& f : jedec.fields)
os << f.id << ": " << f.value << "\n";
auto saved = os.rdstate();
os << "End: " << std::hex << std::setw(4) << std::setfill('0') << std::uppercase << jedec.checksum;
os.setstate(saved);
return os;
}
}
BOOST_FUSION_ADAPT_STRUCT(ast::JEDEC::field,
(char, id)(std::string, value))
BOOST_FUSION_ADAPT_STRUCT(ast::JEDEC,
(std::string, caption)
(std::vector<ast::JEDEC::field>, fields)
(uint16_t, checksum))
template <typename It>
struct JedecGrammar : qi::grammar<It, ast::JEDEC(), ascii::space_type>
{
JedecGrammar() : JedecGrammar::base_type(start) {
const char STX = '\x02';
const char ETX = '\x03';
value = *(ascii::char_("\x20-\x7e\r\n") - '*') >> '*';
field = ascii::graph >> value;
start = STX >> value >> *field >> ETX >> xmit_checksum;
BOOST_SPIRIT_DEBUG_NODES((start)(field)(value))
}
private:
qi::rule<It, ast::JEDEC(), ascii::space_type> start;
qi::rule<It, ast::JEDEC::field()> field;
qi::rule<It, std::string()> value;
qi::uint_parser<uint16_t, 16, 4, 4> xmit_checksum;
};
int main() {
typedef boost::spirit::istream_iterator It;
It first(std::cin>>std::noskipws), last;
JedecGrammar<It> g;
ast::JEDEC jedec;
bool ok = phrase_parse(first, last, g, ascii::space, jedec);
if (ok)
{
std::cout << "Parse success\n";
std::cout << jedec;
}
else
std::cout << "Parse failed\n";
if (first != last)
std::cout << "Remaining input unparsed: '" << std::string(first, last) << "'\n";
}
Output:
Start: JEDEC file generated by John Doe
D: M SIGNETICS(PHILIPS)
D: D GAL16R8
Q: P20
Q: V0
G: 0
F: 0
L: 00000 1110101111100110111101101110111100111111
C: DEAD
End: BEEF
Take-away: See your dentist twice a year.

Iterative update of abstract syntax tree with boost spirit

I have a working boost spirit parser and was thinking if it is possible to do iterative update of an abstract syntax tree with boost spirit?
I have a struct similar to:
struct ast;
typedef boost::variant< boost::recursive_wrapper<ast> > node;
struct ast
{
std::vector<int> value;
std::vector<node> children;
};
Which is being parsed by use of:
bool r = phrase_parse(begin, end, grammar, space, ast);
Would it be possible to do iterative update of abstract syntax tree with boost spirit? I have not found any documentation on this, but I was thinking if the parsers semantic actions could push_back on an already existing AST. Has anyone tried this?
This would allow for parsing like this:
bool r = phrase_parse(begin, end, grammar, space, ast); //initial parsing
//the second parse will be called at a later state given some event/timer/io/something
bool r = phrase_parse(begin, end, grammar, space, ast); //additional parsing which will update the already existing AST
How would you know which nodes to merge? Or would you always add ("graft") at the root level? In that case, why don't you just parse another and merge moving the elements into the existing ast?
ast& operator+=(ast&& other) {
std::move(other.value.begin(), other.value.end(), back_inserter(value));
std::move(other.children.begin(), other.children.end(), back_inserter(children));
return *this;
}
Demo Time
Let's devise the simplest grammar I can think of for this AST:
start = '{' >> -(int_ % ',') >> ';' >> -(start % ',') >> '}';
Note I didn't even make the ; optional. Oh well. Samples. Exercises for readers. ☡ You know the drill.
We implement the trivial function ast parse(It f, It l), and then we can simply merge the asts:
int main() {
ast merged;
for(std::string const& input : {
"{1 ,2 ,3 ;{4 ;{9 , 8 ;}},{5 ,6 ;}}",
"{10,20,30;{40;{90, 80;}},{50,60;}}",
})
{
merged += parse(input.begin(), input.end());
std::cout << "merged + " << input << " --> " << merged << "\n";
}
}
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
struct ast;
//typedef boost::make_recursive_variant<boost::recursive_wrapper<ast> >::type node;
typedef boost::variant<boost::recursive_wrapper<ast> > node;
struct ast {
std::vector<int> value;
std::vector<node> children;
ast& operator+=(ast&& other) {
std::move(other.value.begin(), other.value.end(), back_inserter(value));
std::move(other.children.begin(), other.children.end(), back_inserter(children));
return *this;
}
};
BOOST_FUSION_ADAPT_STRUCT(ast,
(std::vector<int>,value)
(std::vector<node>,children)
)
template <typename It, typename Skipper = qi::space_type>
struct grammar : qi::grammar<It, ast(), Skipper>
{
grammar() : grammar::base_type(start) {
using namespace qi;
start = '{' >> -(int_ % ',') >> ';' >> -(start % ',') >> '}';
BOOST_SPIRIT_DEBUG_NODES((start));
}
private:
qi::rule<It, ast(), Skipper> start;
};
// for output:
static inline std::ostream& operator<<(std::ostream& os, ast const& v) {
using namespace karma;
rule<boost::spirit::ostream_iterator, ast()> r;
r = '{' << -(int_ % ',') << ';' << -((r|eps) % ',') << '}';
return os << format(r, v);
}
template <typename It> ast parse(It f, It l)
{
ast parsed;
static grammar<It> g;
bool ok = qi::phrase_parse(f,l,g,qi::space,parsed);
if (!ok || (f!=l)) {
std::cout << "Parse failure\n";
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
exit(255);
}
return parsed;
}
int main() {
ast merged;
for(std::string const& input : {
"{1 ,2 ,3 ;{4 ;{9 , 8 ;}},{5 ,6 ;}}",
"{10,20,30;{40;{90, 80;}},{50,60;}}",
})
{
merged += parse(input.begin(), input.end());
std::cout << "merged + " << input << " --> " << merged << "\n";
}
}
Of course, it prints:
merged + {1 ,2 ,3 ;{4 ;{9 , 8 ;}},{5 ,6 ;}} --> {1,2,3;{4;{9,8;}},{5,6;}}
merged + {10,20,30;{40;{90, 80;}},{50,60;}} --> {1,2,3,10,20,30;{4;{9,8;}},{5,6;},{40;{90,80;}},{50,60;}}
UPDATE
In this - trivial - example, you can just bind the collections to the attributes in the parse call. The same thing will happen without the operator+= call needed to move the elements, because the rules are written to automatically append to the bound container attribute.
CAVEAT: A distinct disadvantage of modifying the target value in-place is what happens if parsing fails. In the version the merged value will then be "undefined" (has received partial information from the failed parse).
So if you want to parse inputs "atomically", the first, more explicit approach is a better fit.
So the following is a slightly shorter way to write the same:
Live On Coliru
// #define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
struct ast;
//typedef boost::make_recursive_variant<boost::recursive_wrapper<ast> >::type node;
typedef boost::variant<boost::recursive_wrapper<ast> > node;
struct ast {
std::vector<int> value;
std::vector<node> children;
};
BOOST_FUSION_ADAPT_STRUCT(ast,
(std::vector<int>,value)
(std::vector<node>,children)
)
template <typename It, typename Skipper = qi::space_type>
struct grammar : qi::grammar<It, ast(), Skipper>
{
grammar() : grammar::base_type(start) {
using namespace qi;
start = '{' >> -(int_ % ',') >> ';' >> -(start % ',') >> '}';
BOOST_SPIRIT_DEBUG_NODES((start));
}
private:
qi::rule<It, ast(), Skipper> start;
};
// for output:
static inline std::ostream& operator<<(std::ostream& os, ast const& v) {
using namespace karma;
rule<boost::spirit::ostream_iterator, ast()> r;
r = '{' << -(int_ % ',') << ';' << -((r|eps) % ',') << '}';
return os << format(r, v);
}
template <typename It> void parse(It f, It l, ast& into)
{
static grammar<It> g;
bool ok = qi::phrase_parse(f,l,g,qi::space,into);
if (!ok || (f!=l)) {
std::cout << "Parse failure\n";
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
exit(255);
}
}
int main() {
ast merged;
for(std::string const& input : {
"{1 ,2 ,3 ;{4 ;{9 , 8 ;}},{5 ,6 ;}}",
"{10,20,30;{40;{90, 80;}},{50,60;}}",
})
{
parse(input.begin(), input.end(), merged);
std::cout << "merged + " << input << " --> " << merged << "\n";
}
}
Still prints