Boost.Spirit Grammar. Attributes and _val Questions - c++

I'm attempting to create a Boost::Spirit grammar class that can read a fairly simple grammar.
start = roster;
roster = *student;
student = int >> string;
The goal of the code is to create a tree of command objects based on an input file that is being parsed. The Iterator that this grammar is instantiated with is the Spirit file iterator.
Basically, what I am having trouble with is moving and using the synthesized attributes of each rule. What I need to do is create a tree of objects based on this data, and the only functions that create said objects require the parent object to be known at that time. I'm using the command pattern to delay creation until I have parsed all the data and can build the tree correctly. The way I have implemented this so far is that each command contains a vector of other commands. When a command is executed, it requires only the parent object, and it creates and attaches the child object accordingly. The new object then executes each of the commands in its own vector, passing itself as the parent. This creates the tree structure I need with the data intact.
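To make the deferred-construction scheme described above concrete, here is a minimal standard-library sketch. The names Node, Command, add, execute and build_example are hypothetical stand-ins chosen for illustration; they are not the T3_Command classes from the question, and the real tree node type is assumed to be something richer than a labelled node.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real tree node type.
struct Node {
    std::string label;
    std::vector<std::unique_ptr<Node>> children;
    explicit Node(std::string l) : label(std::move(l)) {}
};

// A command only records what to build; nothing is created until execute() runs.
class Command {
public:
    explicit Command(std::string type, std::string value = "")
        : type_(std::move(type)), value_(std::move(value)) {}

    // Queue a child command; it runs after this command has created its own node.
    Command& add(Command child) {
        children_.push_back(std::move(child));
        return *this;
    }

    // Create this command's node under `parent`, then run the queued
    // children with the freshly created node as their parent.
    void execute(Node& parent) const {
        std::string label = value_.empty() ? type_ : type_ + " " + value_;
        parent.children.push_back(std::make_unique<Node>(label));
        Node& self = *parent.children.back();
        for (const Command& c : children_) c.execute(self);
    }

private:
    std::string type_;
    std::string value_;
    std::vector<Command> children_;
};

// Builds the command tree for the single student "23 Bryan" and executes it.
Node build_example() {
    Command student("Student");
    student.add(Command("Int", "23")).add(Command("String", "Bryan"));
    Command roster("Roster");
    roster.add(std::move(student));

    Node root("root");
    roster.execute(root);
    assert(root.children.size() == 1);  // exactly one Roster node was attached
    return root;
}
```

The point of the sketch is that the parser only ever needs to produce Command values; the parent is supplied later, when execute is called on the root command.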
The Issue:
The Issue I am having is how to build the commands when the data is parsed, and how to load them into the appropriate vector. I've tried 3 different ways so far.
1. I tried to change the attribute of each rule to a std::vector and parse the attributes in as commands one at a time. The issue with this is that it produces nested std::vector data (a vector of vectors), which I couldn't work with.
2. I tried using the boost::phoenix placeholder _val as a surrogate for the command being created. I was proud of this solution and a bit upset that it didn't work. I overloaded the += operator for all commands so that when A and B are both commands, A += B pushes B into A's command vector. _val isn't a Command, so the compiler didn't like this. I couldn't tinker it into a workable state. This was the cleanest solution, and I would love for it to work if at all possible.
3. The code in its current form has me attempting to bind the actions together. If I had a member function pointer applied to _val and passed it the created command, it would push the command back. Again, _val isn't actually a Command, so that didn't work out.
I'm going to post this wall of code, it's the grammar I've written cleaned up a bit, as well as the point where it is invoked.
template <typename Iterator>
struct roster_grammar : qi::grammar<Iterator, qi::space_type, T3_Command()>
{
//roster_grammar constructor
roster_grammar() :
roster_grammar::base_type(start_)
{
using qi::lit;
using qi::int_;
using qi::char_;
using qi::lexeme;
start_ = student[boost::bind(&T3_Command::add_command, qi::_val, _1)];
//I removed the roster for the time being to simplify the grammar
//it still containes my second solution that I detailed above. This
//would be my ideal solution if it could work this way.
//roster = *(student[qi::_val += _1]);
student =
qi::eps [ boost::bind(&T3_Command::set_identity, qi::_val, "Student") ]
>>
int_age [ boost::bind(&T3_Command::add_command, qi::_val, _1) ]
>>
string_name [ boost::bind(&T3_Command::add_command, qi::_val, _1) ];
int_age =
int_ [ boost::bind(&Command_Factory::create_int_comm, &cmd_creator, "Age", _1) ];
string_name =
string_p [ boost::bind(&Command_Factory::create_string_comm, &cmd_creator, "Name", _1) ];
//The string parser. Returns type std::string
string_p %= +qi::alnum;
}
qi::rule<Iterator, qi::space_type, T3_Model_Command()> roster;
qi::rule<Iterator, qi::space_type, T3_Atom_Command()> student;
qi::rule<Iterator, qi::space_type, T3_Int_Command()> int_age;
qi::rule<Iterator, qi::space_type, T3_String_Command()> string_name;
qi::rule<Iterator, qi::space_type, T3_Command()> start_;
qi::rule<Iterator, std::string()> string_p;
Command_Factory cmd_creator;
};
This is how the grammar is being instantiated and used.
typedef boost::spirit::istream_iterator iter_type;
typedef roster_grammar<iter_type> student_p;
student_p my_parser;
//open the target file and wrap istream into the iterator
std::ifstream in = std::ifstream(path);
in.unsetf(std::ios::skipws);//Disable Whitespace Skipping
iter_type begin(in);
iter_type end;
using boost::spirit::qi::space;
using boost::spirit::qi::phrase_parse;
bool r = phrase_parse(begin, end, my_parser, space);
So, long story short, I have a grammar that I want to build commands out of (called T3_Command). Commands have a std::vector data member that holds the other commands beneath them in the tree.
What I need is a clean way to create a Command in a semantic action, and a way to load it into the vector of other commands (by way of attributes or just straight function calls). Commands have a type that is supposed to be specified at creation (it defines the type of tree node the command makes), and some commands have a data value (an int, string or float, all named value in their respective commands).
Or, if there is a better way to build a tree, I'd be open to suggestions.
Thank you so much for any help you're able to give!
EDIT:
I'll try to be more clear about the original problem I'm trying to solve. Thanks for the patience.
Given that grammar (or any grammar actually) I want to be able to parse through it and create a command tree based on the semantic actions taken within the parser.
So using my sample grammar, and the input
"23 Bryan 45 Tyler 4 Stephen"
I would like the final tree to result in the following data structure.
Command with type = "Roster" holding 3 "Student" type commands.
Command with type = "Student" each holding an Int_Command and a String_Command
Int_Command holds the stored integer and String_Command the stored string.
E.g.
r1 - Roster - [s1][s2][s3]
s1 - Student - [int 23][string Bryan]
s2 - Student - [int 45][string Tyler]
s3 - Student - [int 4][string Stephen]
This is the current structure of the commands I've written (The implementation is all trivial).
class T3_Command
{
public:
T3_Command(void);
T3_Command(const std::string &type);
~T3_Command(void);
//Executes this command and all subsequent commands in the command vector.
void Execute(/*const Folder_in parent AND const Model_in parent*/);
//Pushes the passed T3_Command into the Command Vector
//#param comm - The command to be pushed.
void add_command(const T3_Command &comm);
//Sets the Identity of the command.
//#param ID - the new identity to be set.
void set_identity(std::string &ID);
private:
const std::string ident;
std::vector <T3_Command> command_vec;
T3_Command& operator+=(const T3_Command& rhs);
};
#pragma once
#include "T3_command.h"
class T3_Int_Command :
public T3_Command
{
public:
T3_Int_Command();
T3_Int_Command(const std::string &type, const int val);
~T3_Int_Command(void);
void Execute();
void setValue(int val);
private:
int value;
};
So the problem I am having is I would like to be able to create a data structure of various commands that represent the parse tree as spirit parses through it.

Updated in response to the edited question
Though there's still a lot of information missing (see my [new comment]), at least now you showed some input and output :)
So, without further ado, let me interpret those:
you still want to just parse (int, string) pairs, but per line
use qi::blank_type as a skipper
do roster % eol to parse roster lines
my sample parses into a vector of Rosters (one per line)
each roster contains a variable number of Students:
start = roster % eol;
roster = +student;
student = int_ >> string_p;
Note: Rule #1 Don't complicate your parser unless you really have to
you want to output the individual elements ("commands"?!?) - I'm assuming the part where this would be non-trivial is the part where the same Student might appear in several rosters?
By defining a total ordering on Students:
bool operator<(Student const& other) const {
return boost::tie(i,s) < boost::tie(other.i, other.s);
}
you make it possible to store a unique collection of students in e.g. a std::set<Student>
perhaps generating the 'variable names' (I mean r1, s1, s2...) is part of the task as well. So, to establish a unique 'variable name' with each student I create a bi-directional map of Students (after parsing, see Rule #1: don't complicate the parser unless it's absolutely necessary):
boost::bimap<std::string, Student> student_vars;
auto generate_id = [&] () { return "s" + std::to_string(student_vars.size()+1); };
for(Roster const& r: data)
for(Student const& s: r.students)
student_vars.insert({generate_id(), s});
That's about everything I can think of here. I used c++11 and boost liberally here to save on lines-of-code, but writing this without c++11/boost would be fairly trivial too. C++03 version online now
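For readers without Boost.Bimap at hand, the id-assignment step can be sketched with a plain std::map keyed on Student. This is a swapped-in, one-directional variant (student to name only), not the bimap from the answer; assign_ids and demo_assign_ids are hypothetical helper names, and std::tie is used in place of boost::tie.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

struct Student {
    int i;
    std::string s;
    // Lexicographic ordering on (i, s), as in the answer.
    bool operator<(Student const& other) const {
        return std::tie(i, s) < std::tie(other.i, other.s);
    }
};

struct Roster {
    std::vector<Student> students;
};

// Assigns "s1", "s2", ... to each distinct student in order of first appearance.
std::map<Student, std::string> assign_ids(std::vector<Roster> const& data) {
    std::map<Student, std::string> ids;
    for (Roster const& r : data)
        for (Student const& s : r.students)
            // emplace leaves an existing entry untouched, so a repeated
            // student keeps the id it was given the first time.
            ids.emplace(s, "s" + std::to_string(ids.size() + 1));
    return ids;
}

// Small usage example: Tyler appears in both rosters but gets a single id.
void demo_assign_ids() {
    std::vector<Roster> data(2);
    data[0].students = {{23, "Bryan"}, {45, "Tyler"}};
    data[1].students = {{7, "Mary"}, {45, "Tyler"}};
    std::map<Student, std::string> ids = assign_ids(data);
    assert(ids.size() == 3);
    assert(ids.at(Student{45, "Tyler"}) == "s2");
}
```

Printing the students sorted by id would need a second map (or the bimap), since this map is ordered by Student rather than by name.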
The following sample input:
ParsedT3Data const data = parseData(
"23 Bryan 45 Tyler 4 Stephen\n"
"7 Mary 45 Tyler 8 Stephane\n"
"23 Bryan 8 Stephane");
Results in (See it Live On Coliru):
parse (partial) success
s1 - Student - [int 23][string Bryan]
s2 - Student - [int 45][string Tyler]
s3 - Student - [int 4][string Stephen]
s4 - Student - [int 7][string Mary]
s5 - Student - [int 8][string Stephane]
r1 [s1][s2][s3]
r2 [s4][s2][s5]
r3 [s1][s5]
Full code:
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/tuple/tuple_comparison.hpp>
#include <boost/bimap.hpp>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
struct Student
{
int i;
std::string s;
bool operator<(Student const& other) const {
return boost::tie(i,s) < boost::tie(other.i, other.s);
}
friend std::ostream& operator<<(std::ostream& os, Student const& o) {
return os << "Student - [int " << o.i << "][string " << o.s << "]";
}
};
struct Roster
{
std::vector<Student> students;
};
BOOST_FUSION_ADAPT_STRUCT(Student, (int, i)(std::string, s))
BOOST_FUSION_ADAPT_STRUCT(Roster, (std::vector<Student>, students))
typedef std::vector<Roster> ParsedT3Data;
template <typename Iterator>
struct roster_grammar : qi::grammar<Iterator, ParsedT3Data(), qi::blank_type>
{
roster_grammar() :
roster_grammar::base_type(start)
{
using namespace qi;
start = roster % eol;
roster = eps >> +student; // known workaround
student = int_ >> string_p;
string_p = lexeme[+(graph)];
BOOST_SPIRIT_DEBUG_NODES((start)(roster)(student)(string_p))
}
qi::rule <Iterator, ParsedT3Data(), qi::blank_type> start;
qi::rule <Iterator, Roster(), qi::blank_type> roster;
qi::rule <Iterator, Student(), qi::blank_type> student;
qi::rule <Iterator, std::string()> string_p;
};
ParsedT3Data parseData(std::string const& demoData)
{
typedef boost::spirit::istream_iterator iter_type;
typedef roster_grammar<iter_type> student_p;
student_p my_parser;
//open the target file and wrap istream into the iterator
std::istringstream iss(demoData);
iss.unsetf(std::ios::skipws);//Disable Whitespace Skipping
iter_type begin(iss), end;
ParsedT3Data result;
bool r = phrase_parse(begin, end, my_parser, qi::blank, result);
if (r)
std::cout << "parse (partial) success\n";
else
std::cerr << "parse failed: '" << std::string(begin,end) << "'\n";
if (begin!=end)
std::cerr << "trailing unparsed: '" << std::string(begin,end) << "'\n";
if (!r)
throw "TODO error handling";
return result;
}
int main()
{
ParsedT3Data const data = parseData(
"23 Bryan 45 Tyler 4 Stephen\n"
"7 Mary 45 Tyler 8 Stephane\n"
"23 Bryan 8 Stephane");
// now produce that list of stuff :)
boost::bimap<std::string, Student> student_vars;
auto generate_id = [&] () { return "s" + std::to_string(student_vars.size()+1); };
for(Roster const& r: data)
for(Student const& s: r.students)
student_vars.insert({generate_id(), s});
for(auto const& s: student_vars.left)
std::cout << s.first << " - " << s.second << "\n";
int r_id = 1;
for(Roster const& r: data)
{
std::cout << "r" << (r_id++) << " ";
for(Student const& s: r.students)
std::cout << "[" << student_vars.right.at(s) << "]";
std::cout << "\n";
}
}
OLD ANSWER
I'll respond to individual points, while awaiting more information:
1. "The issue with this is it nests the vectors into std::vector> type data, which I couldn't work with"
A solution here would be
boost::container::vector<> which allows incomplete element types at time of instantiation (Boost Containers have several other nifty properties, go read about them!)
boost::variant with recursive_wrapper<> so you can indeed make logical trees. I have many answers in the boost-spirit and boost-spirit-qi tags that show this approach (e.g. for expression trees).
2. Calling factory methods from semantic actions
I have a few minor hints:
you can use qi::_1, qi::_2... to refer to the elements of a compound attribute
you should prefer using phoenix::bind inside Phoenix actors (semantic actions are Phoenix actors)
you can assign to qi::_pass to indicate parser failure
Here's a simplified version of the grammar, which shows these in action. I haven't actually built a tree, since you didn't describe any of the desired behaviour. Instead, I just print a debug line on adding nodes to the tree.
See it Live on Coliru
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <fstream>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
struct T3_Command
{
bool add_command(int i, std::string const& s)
{
std::cout << "adding command [" << i << ", " << s << "]\n";
return i != 42; // just to show how you can do input validation
}
};
template <typename Iterator>
struct roster_grammar : qi::grammar<Iterator, T3_Command(), qi::space_type>
{
roster_grammar() :
roster_grammar::base_type(start_)
{
start_ = *(qi::int_ >> string_p)
[qi::_pass = phx::bind(&T3_Command::add_command, qi::_val, qi::_1, qi::_2)];
string_p = qi::lexeme[+(qi::graph)];
}
qi::rule <Iterator, T3_Command(), qi::space_type> start_;
qi::rule <Iterator, std::string()> string_p;
};
int main()
{
typedef boost::spirit::istream_iterator iter_type;
typedef roster_grammar<iter_type> student_p;
student_p my_parser;
//open the target file and wrap istream into the iterator
std::ifstream in("input.txt");
in.unsetf(std::ios::skipws);//Disable Whitespace Skipping
iter_type begin(in);
iter_type end;
using boost::spirit::qi::space;
using boost::spirit::qi::phrase_parse;
bool r = phrase_parse(begin, end, my_parser, space);
if (r)
std::cout << "parse (partial) success\n";
else
std::cerr << "parse failed: '" << std::string(begin,end) << "'\n";
if (begin!=end)
std::cerr << "trailing unparsed: '" << std::string(begin,end) << "'\n";
return r?0:255;
}
Input:
1 klaas-jan
2 doeke-jan
3 jan-herbert
4 taeke-jan
42 oops-invalid-number
5 not-parsed
Output:
adding command [1, klaas-jan]
adding command [2, doeke-jan]
adding command [3, jan-herbert]
adding command [4, taeke-jan]
adding command [42, oops-invalid-number]
parse (partial) success
trailing unparsed: '42 oops-invalid-number
5 not-parsed
'

Related

Extract messages from stream and ignore data between the messages using a boost::spirit parser

I'm trying to create a (pretty simple) parser using boost::spirit::qi to extract messages from a stream. Each message starts from a short marker and ends with \r\n. The message body is ASCII text (letters and numbers) separated by a comma. For example:
!START,01,2.3,ABC\r\n
!START,456.2,890\r\n
I'm using unit tests to check the parser and everything works well when I pass only correct messages one by one. But when I try to emulate some invalid input, like:
!START,01,2.3,ABC\r\n
trash-message
!START,456.2,890\r\n
The parser doesn't see the following messages after an unexpected text.
I'm new to boost::spirit and I'd like to know how a parser based on boost::spirit::qi::grammar is supposed to work.
My question is:
Should the parser slide along the input stream and try to find the beginning of a message?
Or should the caller check the parsing result and, in case of failure, advance the iterator and then call the parser again?
Many thanks for considering my request.
My question is: Should the parser slide in the input stream and try to find a beginning of a message?
Only when you tell it to. It's called qi::parse, not qi::search. But obviously you can make a grammar ignore things.
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>
namespace qi = boost::spirit::qi;
struct Command {
enum Type { START, QUIT, TRASH } type = TRASH;
std::vector<std::string> args;
};
using Commands = std::vector<Command>;
BOOST_FUSION_ADAPT_STRUCT(Command, type, args)
template <typename It> struct CmdParser : qi::grammar<It, Commands()> {
CmdParser() : CmdParser::base_type(commands_) {
type_.add("!start", Command::START);
type_.add("!quit", Command::QUIT);
trash_ = *~qi::char_("\r\n"); // just ignore the entire line
arg_ = *~qi::char_(",\r\n");
command_ = qi::no_case[type_] >> *(',' >> arg_);
commands_ = *((command_ | trash_) >> +qi::eol);
BOOST_SPIRIT_DEBUG_NODES((trash_)(arg_)(command_)(commands_))
}
private:
qi::symbols<char, Command::Type> type_;
qi::rule<It, Commands()> commands_;
qi::rule<It, Command()> command_;
qi::rule<It, std::string()> arg_;
qi::rule<It> trash_;
};
int main() {
std::string_view input = "!START,01,2.3,ABC\r\n"
"trash-message\r\n"
"!START,456.2,890\r\n";
using It = std::string_view::const_iterator;
static CmdParser<It> const parser;
Commands parsed;
auto f = input.begin(), l = input.end();
if (parse(f, l, parser, parsed)) {
std::cout << "Parsed:\n";
for(Command const& cmd : parsed) {
std::cout << cmd.type;
for (auto& arg: cmd.args)
std::cout << ", " << quoted(arg);
std::cout << "\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: " << quoted(std::string(f, l)) << "\n";
}
Printing
Parsed:
0, "01", "2.3", "ABC"
2
0, "456.2", "890"
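For comparison, the second option raised in the question (the caller checks the result and resynchronizes itself) can be sketched without Spirit. This is only an illustration under stated assumptions: parse_message, extract_messages and demo_extract_messages are hypothetical names, resynchronization is assumed to happen at line boundaries, and the marker match is case-sensitive (unlike the no_case grammar above).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Message = std::vector<std::string>;  // the comma-separated arguments

// Tries to parse one line as "!START,arg,arg,...". Returns false for anything else.
bool parse_message(std::string const& line, Message& out) {
    static const std::string marker = "!START";
    if (line.compare(0, marker.size(), marker) != 0) return false;
    if (line.size() > marker.size() && line[marker.size()] != ',') return false;
    out.clear();
    if (line.size() == marker.size()) return true;  // marker only, no arguments
    std::istringstream args(line.substr(marker.size() + 1));
    for (std::string arg; std::getline(args, arg, ',');) out.push_back(arg);
    return true;
}

// Caller-driven recovery: when a line fails to parse, skip it and
// try again at the start of the next line.
std::vector<Message> extract_messages(std::string const& input) {
    std::vector<Message> result;
    std::istringstream in(input);
    for (std::string line; std::getline(in, line);) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // strip the CR of CRLF
        Message m;
        if (parse_message(line, m)) result.push_back(std::move(m));
    }
    return result;
}

// Small usage example with the input from the question.
void demo_extract_messages() {
    std::vector<Message> msgs = extract_messages(
        "!START,01,2.3,ABC\r\n"
        "trash-message\r\n"
        "!START,456.2,890\r\n");
    assert(msgs.size() == 2);  // the trash line is skipped, both messages survive
}
```

The grammar-based version keeps the skipping rule inside the parser; this version keeps the parser strict and puts the recovery policy in the caller. Which one fits better depends on whether the trash lines need to be reported.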

Getting Boost.Spirit to handle optional elements [duplicate]

I'm attempting to parse a string of whitespace-delimited, optionally-tagged keywords. For example
descr:expense type:receivable customer 27.3
where the expression before the colon is the tag, and it is optional (i.e. a default tag is assumed).
I can't quite get the parser to do what I want. I've made some minor adaptations from a canonical example whose purpose it is to parse key/value pairs (much like an HTTP query string).
typedef std::pair<boost::optional<std::string>, std::string> pair_type;
typedef std::vector<pair_type> pairs_type;
template <typename Iterator>
struct field_value_sequence_default_field
: qi::grammar<Iterator, pairs_type()>
{
field_value_sequence_default_field()
: field_value_sequence_default_field::base_type(query)
{
query = pair >> *(qi::lit(' ') >> pair);
pair = -(field >> ':') >> value;
field = +qi::char_("a-zA-Z0-9");
value = +qi::char_("a-zA-Z0-9+-\\.");
}
qi::rule<Iterator, pairs_type()> query;
qi::rule<Iterator, pair_type()> pair;
qi::rule<Iterator, std::string()> field, value;
};
However, when I parse it, when the tag is left out, the optional<string> isn't empty/false. Rather, it's got a copy of the value. The second part of the pair has the value as well.
If the untagged keyword can't be a tag (syntax rules, e.g. has a decimal point), then things work like I'd expect.
What am I doing wrong? Is this a conceptual mistake with the PEG?
Rather, it's got a copy of the value. The second part of the pair has the value as well.
This is the common pitfall with container attributes and backtracking: use qi::hold, e.g. Understanding Boost.spirit's string parser
pair = -qi::hold[field >> ':'] >> value;
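The pitfall and the effect of hold can be illustrated without Spirit. The sketch below is a hand-written analogy, not Spirit's implementation: parse_field appends to its attribute as it matches, so a failed attempt leaves stale characters behind, while parse_field_held parses into a copy and commits only on success, which is the documented idea behind qi::hold. All function names here are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Appends alphanumerics to `attr` as they are matched, then requires ':'.
// On failure the input position is left unchanged, but the characters already
// appended stay in `attr`, much like a container attribute that keeps
// partial results when a sequence backtracks.
bool parse_field(std::string const& in, std::size_t& pos, std::string& attr) {
    std::size_t p = pos;
    while (p < in.size() && std::isalnum(static_cast<unsigned char>(in[p])))
        attr.push_back(in[p++]);
    if (p == pos || p >= in.size() || in[p] != ':') return false;
    pos = p + 1;  // consume the field and the ':'
    return true;
}

// The idea behind hold[]: parse into a copy and commit it only on success.
bool parse_field_held(std::string const& in, std::size_t& pos, std::string& attr) {
    std::string copy = attr;
    if (!parse_field(in, pos, copy)) return false;
    attr.swap(copy);
    return true;
}

// Small usage example: an untagged keyword pollutes the plain attribute only.
void demo_hold() {
    std::size_t pos = 0;
    std::string plain, held;
    assert(!parse_field("customer 27.3", pos, plain));
    assert(plain == "customer");  // stale data left behind by the failed attempt
    assert(!parse_field_held("customer 27.3", pos, held));
    assert(held.empty());         // the held version leaves the attribute untouched
}
```

This matches the symptom in the question: the optional tag ends up holding the characters that the failed field attempt had already collected.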
Complete sample Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/optional/optional_io.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
typedef std::pair<boost::optional<std::string>, std::string> pair_type;
typedef std::vector<pair_type> pairs_type;
template <typename Iterator>
struct Grammar : qi::grammar<Iterator, pairs_type()>
{
Grammar() : Grammar::base_type(query) {
query = pair % ' ';
pair = -qi::hold[field >> ':'] >> value;
field = +qi::char_("a-zA-Z0-9");
value = +qi::char_("a-zA-Z0-9+-\\.");
}
private:
qi::rule<Iterator, pairs_type()> query;
qi::rule<Iterator, pair_type()> pair;
qi::rule<Iterator, std::string()> field, value;
};
int main()
{
using It = std::string::const_iterator;
for (std::string const input : {
"descr:expense type:receivable customer 27.3",
"expense type:receivable customer 27.3",
"descr:expense receivable customer 27.3",
"expense receivable customer 27.3",
}) {
It f = input.begin(), l = input.end();
std::cout << "==== '" << input << "' =============\n";
pairs_type data;
if (qi::parse(f, l, Grammar<It>(), data)) {
std::cout << "Parsed: \n";
for (auto& p : data) {
std::cout << p.first << "\t->'" << p.second << "'\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}
Printing
==== 'descr:expense type:receivable customer 27.3' =============
Parsed:
descr ->'expense'
type ->'receivable'
-- ->'customer'
-- ->'27.3'
==== 'expense type:receivable customer 27.3' =============
Parsed:
-- ->'expense'
type ->'receivable'
-- ->'customer'
-- ->'27.3'
==== 'descr:expense receivable customer 27.3' =============
Parsed:
descr ->'expense'
-- ->'receivable'
-- ->'customer'
-- ->'27.3'
==== 'expense receivable customer 27.3' =============
Parsed:
-- ->'expense'
-- ->'receivable'
-- ->'customer'
-- ->'27.3'

Reading out a stringstream and inserting it into different vectors

(I'm sorry if I'm asking this question the wrong way; this is my first time writing on a forum.)
When I started programming my SFML game, I had a very old book, which was very C-like (e.g. it recommended atoi()).
Now I have a new C++ book (covering C++11), and I want to rewrite the old lines with newer code.
I saved the tiles in a file, stored like this:
[0-0,15-1|22,44] [0-1|0]
[4-0,10-1,3-1|0] [0-5,5-5|0]
That means:
[...] describes a Tile
0-0 etc. is the xy position on the texture sheet
22 etc. is the event that will be triggered.
The number of events and sf::Vector2i entries should not be fixed.
The Tiles are taken out separately by another class, which manages the entire Tilemap.
Now my problem: I have no idea how I should push the numbers from the stringstream into the two vectors.
My code:
class Tile{
private:
std::deque<sf::Sprite> tile;
std::deque<int> event;
public:
Tile(sf::Texture& texture, std::deque<sf::Vector2i>&& ctor_texturerects, std::deque<int>&& ctor_events);//This one is working fine
Tile(sf::Texture& texture, std::stringstream&& ctor_stream/*reads the Tile*/){
std::deque<sf::Vector2i> temp_texturerects;
std::deque<int>temp_events;
/*TODO: filter the stringstream and push them into the containers*/
Tile::Tile(texture,std::move(temp_texturerects),std::move(temp_events));
}
I'd also be very happy if you could suggest another approach, such as replacing sf::Vector2i with something better or giving me a better stream and class concept.
Thanks in advance
Xeno Ceph
Edit:
I made a little workaround:
(I changed the input stream to a normal string.)
But the code doesn't look good.
There must be an easier solution.
Tile:: Tile(sf::Texture& texture, std::string&& ctor_string){
std::deque<sf::Vector2i> temp_texturerects;
std::deque<int> temp_events;
std::stringstream strstr;
for(int i=0; i<ctor_string.size(); ++i){
while(ctor_string[i]!='|'){
while(ctor_string[i] != ','){
strstr << ctor_string[i];
}
sf::Vector2i v2i;
strstr >> v2i.x >> v2i.y;
temp_texturerects.push_front(v2i);
strstr.str("");
}
while(ctor_string[i]!=']'){
while(ctor_string[i] != ','){
strstr << ctor_string[i];
}
int integer;
strstr >> integer;
temp_events.push_front(integer);
strstr.str("");
}
}
Tile::Tile(texture, std::move(temp_texturerects), std::move(temp_events));
}
Has anybody a better solution?
If I understand your question correctly, you have some strings of the form
[0-0,15-1|22,44] [0-1|0]
[4-0,10-1,3-1|0] [0-5,5-5|0]
and you want to extract 2 types of data - positions (e.g. 0-0) and events (e.g. 22).
Your question is how to extract this data cleanly, discarding the [ and ] characters, etc.
One great way to approach this is to use the getline function that operates on stringstreams, which inherit from std::istream (http://www.cplusplus.com/reference/string/string/getline/). It can take custom delimiters, not just the newline character. So you can use '[', '|' and ']' as different delimiting characters and parse them in a logical order.
For example, since your string is just a collection of tiles, you can split it up into a number of functions - ParseTile, ParsePositions and ParseEvents, something like the following:
void Tile::ParseInput(stringstream&& ctor_string) {
    //extract input, tile by tile
    string tile;
    //you can treat each tile as though it is on a separate line by
    //specifying the ']' character as the delimiter for the "line";
    //looping on getline itself (rather than on eof()) stops cleanly at end of input
    while(getline(ctor_string, tile, ']')) {
        if (tile.find('[') == string::npos) continue; //skip trailing text that contains no tile
        tile += "]"; //add back the ']' character that was discarded from the getline
        //the string "tile" should now contain a single tile [...], which we can further process using ParseTile
        ParseTile(std::move(tile));
    }
}
The ParseTile function:
void Tile::ParseTile(const string& tile) {
    //input string tile is e.g. " [0-0,15-1|22,44]"
    string discard; //string to hold parts of tile string that should be thrown away
    string positions; //string to hold list of positions, separated by ','
    string events; //string to hold events, separated by ','
    //create stringstream from input
    stringstream tilestream(tile);
    //tilestream is e.g. " [0-0,15-1|22,44]"
    getline(tilestream, discard, '['); //gets everything until '[' and consumes the '[' itself
    //now, discard is " " and tilestream is "0-0,15-1|22,44]"
    getline(tilestream, positions, '|');
    //now, positions is "0-0,15-1" and tilestream is "22,44]"
    getline(tilestream, events,']');
    //now, events is "22,44" and tilestream is empty
    ParsePositions(positions);
    ParseEvents(events);
}
You can write your own ParsePositions and ParseEvents functions, which will basically be a few more getline calls using ',' as the delimiting character (just loop until the string ends).
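As a rough idea of what those two helpers could look like, here is a sketch written as free functions that return their results instead of Tile members. Vec2i is a hypothetical stand-in for sf::Vector2i so that the example compiles without SFML, and silently skipping malformed items is an assumption about the desired error handling.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for sf::Vector2i.
struct Vec2i {
    int x;
    int y;
};

// Parses "0-0,15-1" into {0,0},{15,1}. Malformed items are skipped.
std::vector<Vec2i> ParsePositions(std::string const& positions) {
    std::vector<Vec2i> result;
    std::istringstream in(positions);
    for (std::string item; std::getline(in, item, ',');) {
        std::istringstream pair(item);
        Vec2i v{};
        char dash = 0;
        // expect: integer, '-', integer
        if (pair >> v.x >> dash >> v.y && dash == '-') result.push_back(v);
    }
    return result;
}

// Parses "22,44" into {22, 44}. Malformed items are skipped.
std::vector<int> ParseEvents(std::string const& events) {
    std::vector<int> result;
    std::istringstream in(events);
    for (std::string item; std::getline(in, item, ',');) {
        std::istringstream number(item);
        int value = 0;
        if (number >> value) result.push_back(value);
    }
    return result;
}

// Small usage example with the first tile from the question.
void demo_parse_tile_parts() {
    std::vector<Vec2i> p = ParsePositions("0-0,15-1");
    std::vector<int> e = ParseEvents("22,44");
    assert(p.size() == 2 && p[1].x == 15 && p[1].y == 1);
    assert(e.size() == 2 && e[0] == 22 && e[1] == 44);
}
```

Inside the Tile class these would append to the member containers instead of returning vectors; the getline loop stays the same.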
I suggest either writing a proper parser manually (not unlike the other answer proposes) or to use a proper parsing framework, like Boost Spirit.
The advantage of the latter is that you get debuggability, composability, attributes etc. "for free". Here's the simplest example I could think of:
struct TileData
{
std::deque<sf::Vector2i> texturerects;
std::deque<int> events;
};
typedef std::deque<TileData> TileDatas;
template <typename It>
struct parser : qi::grammar<It, TileDatas(), qi::space_type>
{
parser() : parser::base_type(start)
{
using namespace qi;
v2i = (int_ >> '-' >> int_)
[ _val = phx::construct<sf::Vector2i>(_1, _2) ];
tiledata =
(v2i % ',') >> '|' >>
(int_ % ',');
start = *('[' >> tiledata >> ']');
}
private:
qi::rule<It, sf::Vector2i(), qi::space_type> v2i;
qi::rule<It, TileData(), qi::space_type> tiledata;
qi::rule<It, TileDatas(), qi::space_type> start;
};
Adding a bit of code to test this, see it live on http://liveworkspace.org/code/3WM0My$1, output:
Parsed: TileData {
texturerects: deque<N2sf8Vector2iE> {v2i(0, 0), v2i(15, 1), }
events: deque<i> {22, 44, }
}
Parsed: TileData {
texturerects: deque<N2sf8Vector2iE> {v2i(0, 1), }
events: deque<i> {0, }
}
Parsed: TileData {
texturerects: deque<N2sf8Vector2iE> {v2i(4, 0), v2i(10, 1), v2i(3, 1), }
events: deque<i> {0, }
}
Parsed: TileData {
texturerects: deque<N2sf8Vector2iE> {v2i(0, 5), v2i(5, 5), }
events: deque<i> {0, }
}
Full code:
#define BOOST_SPIRIT_USE_PHOENIX_V3
// #define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <deque>
#include <iostream>
#include <sstream>
#include <string>
#include <typeinfo>
namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;
// liveworkspace.org doesn't have SFML
namespace sf { struct Vector2i { int x, y; Vector2i(int ax=0, int ay=0) : x(ax), y(ay) {} }; }
struct TileData
{
std::deque<sf::Vector2i> texturerects;
std::deque<int> events;
};
BOOST_FUSION_ADAPT_STRUCT(TileData,
(std::deque<sf::Vector2i>, texturerects)
(std::deque<int>, events))
typedef std::deque<TileData> TileDatas;
template <typename It>
struct parser : qi::grammar<It, TileDatas(), qi::space_type>
{
parser() : parser::base_type(start)
{
using namespace qi;
v2i = (int_ >> '-' >> int_)
[ _val = phx::construct<sf::Vector2i>(_1, _2) ];
tiledata =
(v2i % ',') >> '|' >>
(int_ % ',');
start = *('[' >> tiledata >> ']');
}
private:
qi::rule<It, sf::Vector2i(), qi::space_type> v2i;
qi::rule<It, TileData(), qi::space_type> tiledata;
qi::rule<It, TileDatas(), qi::space_type> start;
};
typedef boost::spirit::istream_iterator It;
std::ostream& operator<<(std::ostream& os, sf::Vector2i const &v) { return os << "v2i(" << v.x << ", " << v.y << ")"; }
template <typename T> std::ostream& operator<<(std::ostream& os, std::deque<T> const &d) {
os << "deque<" << typeid(T).name() << "> {";
for (auto& t : d) os << t << ", ";
return os << "}";
}
std::ostream& operator<<(std::ostream& os, TileData const& ttd) {
return os << "TileData {\n"
"\ttexturerects: " << ttd.texturerects << "\n"
"\tevents: " << ttd.events << "\n}";
}
int main()
{
parser<It> p;
std::istringstream iss(
"[0-0,15-1|22,44] [0-1|0]\n"
"[4-0,10-1,3-1|0] [0-5,5-5|0]");
It f(iss), l;
TileDatas data;
if (qi::phrase_parse(f,l,p,qi::space,data))
{
for (auto& tile : data)
{
std::cout << "Parsed: " << tile << "\n";
}
}
if (f != l)
{
std::cout << "Remaining unparsed: '" << std::string(f, l) << "'\n";
}
}

Spirit Qi sequence parsing issues

I have some issues writing a parser with Spirit.Qi 2.4.
I have a series of key-value pairs to parse in following format <key name>=<value>.
Key name can be [a-zA-Z0-9] and is always followed by = sign with no white-space between key name and = sign. Key name is also always preceded by at least one space.
Value can be almost any C expression (spaces are possible as well), with the exception of the expressions containing = char and code blocks { }.
At the end of the sequence of the key value pairs there's a { sign.
I struggle a lot with writing parser for this expression. Since the key name always is preceded by at least one space and followed by = and contains no spaces I defined it as
KeyName %= [+char_("a-zA-Z0-9_") >> lit("=")] ;
Value can be almost anything, but it can not contain = nor { chars, so I defined it as:
Value %= +(char_ - char_("{=")) ;
I thought about using look-ahead's like this to catch the value:
ValueExpression
%= (
Value
>> *space
>> &(KeyName | lit("{"))
)
;
But it won't work, for some reason (seems like the ValueExpression greedily goes up to the = sign and "doesn't know" what to do from there). I have limited knowledge of LL parsers, so I'm not really sure what's cooking here. Is there any other way I could tackle this kind of sequence?
Here's example series:
EXP1=FunctionCall(A, B, C) TEST="Example String" \
AnotherArg=__FILENAME__ - 'BlahBlah' EXP2= a+ b+* {
Additional info: since this is a part of a much larger grammar I can't really solve this problem any other way than by a Spirit.Qi parser (like splitting by '=' and doing some custom parsing or something similar).
Edit:
I've created minimum working example here: http://ideone.com/kgYD8
(compiled under VS 2012 with boost 1.50, but should be fine on older setups as well).
I'd suggest you have a look at the article Parsing a List of Key-Value Pairs Using Spirit.Qi.
I've greatly simplified your code, while
adding attribute handling
removing phoenix semantic actions
debugging of rules
Here it is, without further ado:
#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <map>
#include <string>
namespace qi = boost::spirit::qi;
namespace fusion = boost::fusion;
typedef std::map<std::string, std::string> data_t;
template <typename It, typename Skipper>
struct grammar : qi::grammar<It, data_t(), Skipper>
{
grammar() : grammar::base_type(Sequence)
{
using namespace qi;
KeyName = +char_("a-zA-Z0-9_") >> '=';
Value = qi::no_skip [+(~char_("={") - KeyName)];
Sequence = +(KeyName > Value);
BOOST_SPIRIT_DEBUG_NODE(KeyName);
BOOST_SPIRIT_DEBUG_NODE(Value);
BOOST_SPIRIT_DEBUG_NODE(Sequence);
}
private:
qi::rule<It, data_t(), Skipper> Sequence;
qi::rule<It, std::string()> KeyName; // no skipper, removes need for qi::lexeme
qi::rule<It, std::string(), Skipper> Value;
};
template <typename Iterator>
data_t parse (Iterator begin, Iterator end)
{
grammar<Iterator, qi::space_type> p;
data_t data;
if (qi::phrase_parse(begin, end, p, qi::space, data)) {
std::cout << "parse ok\n";
if (begin!=end) {
std::cout << "remaining: " << std::string(begin,end) << '\n';
}
} else {
std::cout << "failed: " << std::string(begin,end) << '\n';
}
return data;
}
int main ()
{
std::string test(" ARG=Test still in first ARG ARG2=Zombie cat EXP2=FunctionCall(A, B C) {" );
auto data = parse(test.begin(), test.end());
for (auto& e : data)
std::cout << e.first << "=" << e.second << '\n';
}
Output will be:
parse ok
remaining: {
ARG=Test still in first ARG
ARG2=Zombie cat
EXP2=FunctionCall(A, B C)
If you really wanted '{' to be part of the last value, change the Value rule to:
Value = qi::no_skip [+(char_ - KeyName)];
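To make the effect of the `- KeyName` exclusion more concrete, here is a minimal hand-written sketch in plain C++ (no Spirit) of roughly the same splitting rule: a value keeps consuming characters until it hits `=`, `{`, or a position where a key name followed by `=` starts. The names `match_key`, `trim` and `split_pairs` are hypothetical helpers introduced only for this illustration. Unlike the Qi grammar above, this sketch trims whitespace around values and tolerates empty values, so treat it as an approximation of the behaviour rather than an equivalent parser.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: true for characters allowed in a key name ([a-zA-Z0-9_]).
bool is_key_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical helper: if "name=" starts at pos, return the index just past '=',
// otherwise return std::string::npos. Mirrors +char_("a-zA-Z0-9_") >> '='.
std::size_t match_key(const std::string& s, std::size_t pos) {
    std::size_t i = pos;
    while (i < s.size() && is_key_char(s[i])) ++i;
    if (i == pos || i >= s.size() || s[i] != '=') return std::string::npos;
    return i + 1;
}

// Hypothetical helper: strip leading and trailing whitespace.
std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Split "KEY=value KEY2=value2 ..." into a map. A value ends at '=', '{',
// or at the first position where another key ("name=") begins.
std::map<std::string, std::string> split_pairs(const std::string& s) {
    std::map<std::string, std::string> out;
    std::size_t pos = 0;
    for (;;) {
        // Skip whitespace in front of the key (the skipper's job in Qi).
        while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
        const std::size_t after_key = match_key(s, pos);
        if (after_key == std::string::npos) break;  // no further key: stop
        const std::string key = s.substr(pos, after_key - 1 - pos);
        pos = after_key;
        const std::size_t value_begin = pos;
        // Consume value characters; the match_key test plays the role of "- KeyName".
        while (pos < s.size() && s[pos] != '=' && s[pos] != '{' &&
               match_key(s, pos) == std::string::npos) {
            ++pos;
        }
        out.insert(std::make_pair(key, trim(s.substr(value_begin, pos - value_begin))));
    }
    return out;
}
```

Calling `split_pairs` on the test string from `main` above should yield three entries, with the scan stopping in front of the trailing `{`.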

Passing file-path string to semantic action in Boost.Spirit

I am new to Boost.Spirit, and I have a question related to a mini-interpreter I am trying to implement using the library. As a sub-task of parsing my language, I need to extract a file-path from an input of the form:
"path = \"/path/to/file\""
and pass it as a string (without quotes) to a semantic action.
I wrote some code which can parse this type of input, but passing the parsed string does not work as expected, probably due to my lack of experience with Boost.Spirit.
Can anyone help?
In reality, my grammar is more complex, but I have isolated the problem to:
#include <cstdlib>
#include <iostream>
#include <string>
#include "boost/spirit/include/qi.hpp"
#include "boost/spirit/include/phoenix_core.hpp"
#include "boost/spirit/include/phoenix_operator.hpp"
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;
namespace parser {
// Semantic action (note: in reality, this would use file_path_string in non-trivial way)
void display_path(std::string file_path_string) {
std::cout << "Detected file-path: " << file_path_string << std::endl;
}
// Grammar
template <typename Iterator>
struct path_command : qi::grammar<Iterator, ascii::space_type> {
path_command() : path_command::base_type(path_specifier) {
using qi::string;
using qi::lit;
path = +(qi::char_("/") >> *qi::char_("a-zA-Z_0-9"));
quoted_path_string = lit('"') >> (path- lit('"')) >> lit('"');
path_specifier = lit("path") >> qi::lit("=")
>> quoted_path_string[&display_path];
}
qi::rule<Iterator, ascii::space_type> path_specifier;
qi::rule<Iterator, std::string()> path, quoted_path_string;
};
}
int main() {
using ascii::space;
typedef std::string::const_iterator iterator_type;
typedef parser::path_command<iterator_type> path_command;
bool parse_res;
path_command command_instance; // Instance of our Grammar
iterator_type iter, end;
std::string test_command1 = "path = \"/file1\"";
std::string test_command2 = "path = \"/dirname1/dirname2/file2\"";
// Testing example command 1
iter = test_command1.begin();
end = test_command1.end();
parse_res = phrase_parse(iter, end, command_instance, space);
std::cout << "Parse result for test 1: " << parse_res << std::endl;
// Testing example command 2
iter = test_command2.begin();
end = test_command2.end();
parse_res = phrase_parse(iter, end, command_instance, space);
std::cout << "Parse result for test 2: " << parse_res << std::endl;
return EXIT_SUCCESS;
}
The output is:
Detected file-path: /
Parse result for test 1: 1
Detected file-path: ///
Parse result for test 2: 1
but I would like to obtain:
Detected file-path: /file1
Parse result for test 1: 1
Detected file-path: /dirname1/dirname2/file2
Parse result for test 2: 1
Almost everything is fine with your parser. The problem is a bug in Spirit (up to Boost V1.46) that prevents correct handling of the attribute in cases like this. This was recently fixed in SVN and will be available in Boost V1.47 (I ran your unchanged program with this version and everything works just fine).
For now, you can work around this problem by utilizing the raw[] directive (see below).
I said 'almost' above because a) you can simplify what you have, and b) you should use no_skip[] to avoid invoking the skip parser between the quotes.
path = raw[+(qi::char_("/") >> *qi::char_("a-zA-Z_0-9"))];
quoted_path_string = no_skip['"' >> path >> '"'];
path_specifier = lit("path") >> qi::lit("=")
>> quoted_path_string[&display_path];
You can omit the - lit('"') part because your path parser does not recognize quotes in the first place.
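As a rough illustration of what the semantic action is expected to receive, here is a small hand-written sketch in plain C++ (not Spirit) that accepts the same kind of input: whitespace is skipped around `path` and `=`, but not between the quotes, and the whole matched slice of the input is returned, which is the idea behind raw[]. The function names `skip_spaces`, `match_literal` and `extract_quoted_path` are hypothetical and exist only for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: advance pos past any whitespace (the skipper's job in Qi).
void skip_spaces(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
}

// Hypothetical helper: consume lit at pos if present.
bool match_literal(const std::string& s, std::size_t& pos, const std::string& lit) {
    if (s.compare(pos, lit.size(), lit) != 0) return false;
    pos += lit.size();
    return true;
}

// Accepts input like:  path = "/dir/file"
// On success, stores the path without quotes in path_out and returns true.
bool extract_quoted_path(const std::string& input, std::string& path_out) {
    std::size_t pos = 0;
    skip_spaces(input, pos);
    if (!match_literal(input, pos, "path")) return false;
    skip_spaces(input, pos);
    if (!match_literal(input, pos, "=")) return false;
    skip_spaces(input, pos);
    if (!match_literal(input, pos, "\"")) return false;

    // From here on no whitespace is skipped (the no_skip[] part).
    const std::size_t begin = pos;
    // One or more segments of '/' followed by [a-zA-Z_0-9]*.
    while (pos < input.size() && input[pos] == '/') {
        ++pos;
        while (pos < input.size() &&
               (std::isalnum(static_cast<unsigned char>(input[pos])) || input[pos] == '_')) {
            ++pos;
        }
    }
    if (pos == begin) return false;  // at least one segment is required
    const std::size_t end = pos;
    if (!match_literal(input, pos, "\"")) return false;

    // Hand back the matched slice of the input, as raw[] would.
    path_out = input.substr(begin, end - begin);
    return true;
}
```

With the two test commands from the question, this should produce `/file1` and `/dirname1/dirname2/file2`, while an input with a space right after the opening quote is rejected.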