I'm having trouble writing a parser with Spirit::Qi 2.4.
I have a series of key-value pairs to parse, in the format <key name>=<value>.
A key name consists of [a-zA-Z0-9] characters and is always followed by an = sign, with no whitespace between the key name and the = sign. A key name is also always preceded by at least one space.
A value can be almost any C expression (spaces included), except for expressions containing the = character or code blocks { }.
The sequence of key-value pairs is terminated by a { sign.
I'm really struggling to write a parser for this. Since a key name is always preceded by at least one space, is followed by =, and contains no spaces, I defined it as
KeyName %= lexeme[+char_("a-zA-Z0-9_") >> lit("=")] ;
A value can be almost anything, but it cannot contain = or { characters, so I defined it as:
Value %= +(char_ - char_("{=")) ;
I tried using look-ahead like this to capture the value:
ValueExpression
%= (
Value
>> *space
>> &(KeyName | lit("{"))
)
;
But it doesn't work: ValueExpression seems to greedily consume everything up to the = sign and then "doesn't know" what to do from there. I have limited knowledge of LL parsers, so I'm not really sure what's going on. Is there any other way I could tackle this kind of sequence?
Here's an example sequence:
EXP1=FunctionCall(A, B, C) TEST="Example String" \
AnotherArg=__FILENAME__ - 'BlahBlah' EXP2= a+ b+* {
Additional info: since this is part of a much larger grammar, I can't solve this problem any other way than with a Spirit.Qi parser (e.g. by splitting on '=' and doing some custom parsing).
Edit:
I've created a minimal working example here: http://ideone.com/kgYD8
(compiled under VS 2012 with boost 1.50, but should be fine on older setups as well).
I'd suggest you have a look at the article Parsing a List of Key-Value Pairs Using Spirit.Qi.
I've greatly simplified your code, while
adding attribute handling
removing Phoenix semantic actions
enabling debugging of rules
Here it is, without further ado:
#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <map>
#include <string>
namespace qi = boost::spirit::qi;
namespace fusion = boost::fusion;
typedef std::map<std::string, std::string> data_t;
template <typename It, typename Skipper>
struct grammar : qi::grammar<It, data_t(), Skipper>
{
grammar() : grammar::base_type(Sequence)
{
using namespace qi;
KeyName = +char_("a-zA-Z0-9_") >> '=';
Value = qi::no_skip [+(~char_("={") - KeyName)];
Sequence = +(KeyName > Value);
BOOST_SPIRIT_DEBUG_NODE(KeyName);
BOOST_SPIRIT_DEBUG_NODE(Value);
BOOST_SPIRIT_DEBUG_NODE(Sequence);
}
private:
qi::rule<It, data_t(), Skipper> Sequence;
qi::rule<It, std::string()> KeyName; // no skipper, removes need for qi::lexeme
qi::rule<It, std::string(), Skipper> Value;
};
template <typename Iterator>
data_t parse (Iterator begin, Iterator end)
{
grammar<Iterator, qi::space_type> p;
data_t data;
if (qi::phrase_parse(begin, end, p, qi::space, data)) {
std::cout << "parse ok\n";
if (begin!=end) {
std::cout << "remaining: " << std::string(begin,end) << '\n';
}
} else {
std::cout << "failed: " << std::string(begin,end) << '\n';
}
return data;
}
int main ()
{
std::string test(" ARG=Test still in first ARG ARG2=Zombie cat EXP2=FunctionCall(A, B C) {" );
auto data = parse(test.begin(), test.end());
for (auto& e : data)
std::cout << e.first << "=" << e.second << '\n';
}
Output will be:
parse ok
remaining: {
ARG=Test still in first ARG
ARG2=Zombie cat
EXP2=FunctionCall(A, B C)
If you really want '{' to be part of the last value, change the Value rule to:
Value = qi::no_skip [+(char_ - KeyName)];
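To make the lookahead in `~char_("={") - KeyName` concrete: before consuming each character of a value, the parser checks whether a `key=` would start at that position, and stops if so. That is what the original `&(KeyName | lit("{"))` attempt was missing, because by the time it ran, the greedy `+(char_ - char_("{="))` had already consumed the next key name, and PEG repetition never gives characters back. Below is a standard-library-only sketch of the idea, not Spirit code; `parse_pairs`, `key_at` and `trim` are hypothetical names, and unlike the Spirit rule it trims whitespace around each value.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

using Pairs = std::vector<std::pair<std::string, std::string>>;

// Length of "<key>=" starting at pos, where key is [A-Za-z0-9_]+; 0 if none.
static std::size_t key_at(const std::string& s, std::size_t pos) {
    std::size_t i = pos;
    while (i < s.size() && (std::isalnum(static_cast<unsigned char>(s[i])) || s[i] == '_'))
        ++i;
    return (i > pos && i < s.size() && s[i] == '=') ? i + 1 - pos : 0;
}

static std::string trim(const std::string& s) {
    std::size_t b = s.find_first_not_of(" \t\r\n");
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(" \t\r\n");
    return s.substr(b, e - b + 1);
}

// Parses "k1=v1 k2=v2 ... {". Stops where no key= starts; the unparsed
// remainder (normally "{") is stored in rest.
Pairs parse_pairs(const std::string& s, std::string& rest) {
    Pairs out;
    std::size_t pos = 0;
    for (;;) {
        while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
        std::size_t klen = key_at(s, pos);
        if (klen == 0) break;
        std::size_t vstart = pos + klen;
        std::size_t v = vstart;
        // Like +(~char_("={") - KeyName): stop at '=', '{' or where the next key= begins.
        while (v < s.size() && s[v] != '=' && s[v] != '{' && key_at(s, v) == 0) ++v;
        if (v == vstart) break;  // '+' needs at least one character
        out.emplace_back(s.substr(pos, klen - 1), trim(s.substr(vstart, v - vstart)));
        pos = v;
    }
    rest = s.substr(pos);
    return out;
}
```

With the test string from `main` above, this yields the same three pairs (without surrounding whitespace) and leaves "{" unparsed.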
Related
I'm trying to create a (pretty simple) parser using boost::spirit::qi to extract messages from a stream. Each message starts with a short marker and ends with \r\n. The message body is ASCII text (letters and numbers) separated by commas. For example:
!START,01,2.3,ABC\r\n
!START,456.2,890\r\n
I'm using unit tests to check the parser and everything works well when I pass only correct messages one by one. But when I try to emulate some invalid input, like:
!START,01,2.3,ABC\r\n
trash-message
!START,456.2,890\r\n
The parser doesn't see the messages that follow the unexpected text.
I'm new to boost::spirit and I'd like to know how a parser based on boost::spirit::qi::grammar is supposed to work.
My question is:
Should the parser slide in the input stream and try to find a beginning of a message?
Or should the caller check the parsing result and, in case of failure, move the iterator and then call the parser again?
Many thanks for considering my request.
My question is: Should the parser slide in the input stream and try to find a beginning of a message?
Only when you tell it to. It's called qi::parse, not qi::search. But obviously you can make a grammar ignore things.
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>
namespace qi = boost::spirit::qi;
struct Command {
enum Type { START, QUIT, TRASH } type = TRASH;
std::vector<std::string> args;
};
using Commands = std::vector<Command>;
BOOST_FUSION_ADAPT_STRUCT(Command, type, args)
template <typename It> struct CmdParser : qi::grammar<It, Commands()> {
CmdParser() : CmdParser::base_type(commands_) {
type_.add("!start", Command::START);
type_.add("!quit", Command::QUIT);
trash_ = *~qi::char_("\r\n"); // just ignore the entire line
arg_ = *~qi::char_(",\r\n");
command_ = qi::no_case[type_] >> *(',' >> arg_);
commands_ = *((command_ | trash_) >> +qi::eol);
BOOST_SPIRIT_DEBUG_NODES((trash_)(arg_)(command_)(commands_))
}
private:
qi::symbols<char, Command::Type> type_;
qi::rule<It, Commands()> commands_;
qi::rule<It, Command()> command_;
qi::rule<It, std::string()> arg_;
qi::rule<It> trash_;
};
int main() {
std::string_view input = "!START,01,2.3,ABC\r\n"
"trash-message\r\n"
"!START,456.2,890\r\n";
using It = std::string_view::const_iterator;
static CmdParser<It> const parser;
Commands parsed;
auto f = input.begin(), l = input.end();
if (parse(f, l, parser, parsed)) {
std::cout << "Parsed:\n";
for(Command const& cmd : parsed) {
std::cout << cmd.type;
for (auto& arg: cmd.args)
std::cout << ", " << quoted(arg);
std::cout << "\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: " << quoted(std::string(f, l)) << "\n";
}
Printing
Parsed:
0, "01", "2.3", "ABC"
2
0, "456.2", "890"
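For comparison, here is the same recovery strategy (each line is either a command or trash, and parsing always continues after the next line ending) sketched with only the standard library. It is not Spirit code: `Msg`, `parse_messages` and `starts_with_nocase` are hypothetical names, and only the `!start` marker is handled.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

struct Msg {
    bool valid = false;              // false: the line was trash
    std::vector<std::string> args;   // comma-separated fields after the marker
};

// Case-insensitive "line starts with prefix" (mirrors qi::no_case[type_]).
static bool starts_with_nocase(const std::string& line, const std::string& prefix) {
    if (line.size() < prefix.size()) return false;
    for (std::size_t i = 0; i < prefix.size(); ++i)
        if (std::tolower(static_cast<unsigned char>(line[i])) !=
            std::tolower(static_cast<unsigned char>(prefix[i])))
            return false;
    return true;
}

// Every line (terminated by "\r\n" or "\n") becomes one Msg, so a bad line
// never stops the lines after it from being parsed.
std::vector<Msg> parse_messages(const std::string& input) {
    const std::string marker = "!start";
    std::vector<Msg> out;
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::size_t eol = input.find('\n', pos);
        if (eol == std::string::npos) eol = input.size();
        std::string line = input.substr(pos, eol - pos);
        pos = eol + 1;
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;  // blank lines, like +qi::eol
        Msg m;
        if (starts_with_nocase(line, marker) &&
            (line.size() == marker.size() || line[marker.size()] == ',')) {
            m.valid = true;
            // Each ',' introduces one argument, like *(',' >> arg_).
            std::size_t p = marker.size();
            while (p < line.size()) {  // invariant: line[p] == ','
                std::size_t next = line.find(',', p + 1);
                if (next == std::string::npos) next = line.size();
                m.args.push_back(line.substr(p + 1, next - p - 1));
                p = next;
            }
        }
        out.push_back(m);
    }
    return out;
}
```

The trash line still produces an entry (with `valid == false`), just like the `2` (TRASH) line in the Spirit output.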
I'm attempting to parse a string of whitespace-delimited, optionally-tagged keywords. For example:
descr:expense type:receivable customer 27.3
where the expression before the colon is the tag, and it is optional (i.e. a default tag is assumed).
I can't quite get the parser to do what I want. I've made some minor adaptations to a canonical example whose purpose is to parse key/value pairs (much like an HTTP query string).
typedef std::pair<boost::optional<std::string>, std::string> pair_type;
typedef std::vector<pair_type> pairs_type;
template <typename Iterator>
struct field_value_sequence_default_field
: qi::grammar<Iterator, pairs_type()>
{
field_value_sequence_default_field()
: field_value_sequence_default_field::base_type(query)
{
query = pair >> *(qi::lit(' ') >> pair);
pair = -(field >> ':') >> value;
field = +qi::char_("a-zA-Z0-9");
value = +qi::char_("a-zA-Z0-9+-\\.");
}
qi::rule<Iterator, pairs_type()> query;
qi::rule<Iterator, pair_type()> pair;
qi::rule<Iterator, std::string()> field, value;
};
However, when I parse it and the tag is left out, the optional<string> isn't empty/false. Rather, it's got a copy of the value. The second part of the pair has the value as well.
If the untagged keyword can't be a tag under the syntax rules (e.g. because it contains a decimal point), then things work as I'd expect.
What am I doing wrong? Is this a conceptual mistake with the PEG?
Rather, it's got a copy of the value. The second part of the pair has the value as well.
This is a common pitfall with container attributes and backtracking: use qi::hold (see e.g. "Understanding Boost.spirit's string parser"):
pair = -qi::hold[field >> ':'] >> value;
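Why hold[] helps can be modelled in plain C++ (this illustrates the semantics only; it is not how Spirit is implemented). A parser that appends to a caller-owned container attribute restores the input position on failure, but not the characters it already appended, so a failed `field >> ':'` on an untagged word leaves that word in the tag. The hold version parses into a copy and swaps it in only on success, which is what qi::hold[] does. The function names are hypothetical, and optional<std::string> is simplified to std::string (empty meaning "no tag").

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Models `field >> ':'` writing into a container attribute: appends [A-Za-z0-9]+
// to attr, then requires ':'. On failure pos is left unchanged (backtracking),
// but characters already appended to attr stay there.
static bool field_colon(const std::string& in, std::size_t& pos, std::string& attr) {
    std::size_t p = pos;
    while (p < in.size() && std::isalnum(static_cast<unsigned char>(in[p])))
        attr += in[p++];
    if (p == pos || p >= in.size() || in[p] != ':') return false;
    pos = p + 1;
    return true;
}

// Models qi::hold[]: parse into a copy of the attribute, commit only on success.
static bool hold_field_colon(const std::string& in, std::size_t& pos, std::string& attr) {
    std::string copy = attr;
    if (!field_colon(in, pos, copy)) return false;
    attr.swap(copy);
    return true;
}

// Tag attribute left behind after trying the optional `field >> ':'` on a word.
std::string tag_without_hold(const std::string& word) {
    std::string tag;
    std::size_t pos = 0;
    field_colon(word, pos, tag);  // failure is fine: the tag is optional
    return tag;
}

std::string tag_with_hold(const std::string& word) {
    std::string tag;
    std::size_t pos = 0;
    hold_field_colon(word, pos, tag);
    return tag;
}
```

For "customer", the version without hold reports the tag "customer" (the leak described in the question), while the hold version leaves it empty.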
Complete sample Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/optional/optional_io.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
typedef std::pair<boost::optional<std::string>, std::string> pair_type;
typedef std::vector<pair_type> pairs_type;
template <typename Iterator>
struct Grammar : qi::grammar<Iterator, pairs_type()>
{
Grammar() : Grammar::base_type(query) {
query = pair % ' ';
pair = -qi::hold[field >> ':'] >> value;
field = +qi::char_("a-zA-Z0-9");
value = +qi::char_("a-zA-Z0-9+.\\-"); // '-' last is a literal; in "+-\\." it formed the range '+'..'\\', which includes ':'
}
private:
qi::rule<Iterator, pairs_type()> query;
qi::rule<Iterator, pair_type()> pair;
qi::rule<Iterator, std::string()> field, value;
};
int main()
{
using It = std::string::const_iterator;
for (std::string const input : {
"descr:expense type:receivable customer 27.3",
"expense type:receivable customer 27.3",
"descr:expense receivable customer 27.3",
"expense receivable customer 27.3",
}) {
It f = input.begin(), l = input.end();
std::cout << "==== '" << input << "' =============\n";
pairs_type data;
if (qi::parse(f, l, Grammar<It>(), data)) {
std::cout << "Parsed: \n";
for (auto& p : data) {
std::cout << p.first << "\t->'" << p.second << "'\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}
Printing
==== 'descr:expense type:receivable customer 27.3' =============
Parsed:
descr ->'expense'
type ->'receivable'
-- ->'customer'
-- ->'27.3'
==== 'expense type:receivable customer 27.3' =============
Parsed:
-- ->'expense'
type ->'receivable'
-- ->'customer'
-- ->'27.3'
==== 'descr:expense receivable customer 27.3' =============
Parsed:
descr ->'expense'
-- ->'receivable'
-- ->'customer'
-- ->'27.3'
==== 'expense receivable customer 27.3' =============
Parsed:
-- ->'expense'
-- ->'receivable'
-- ->'customer'
-- ->'27.3'
I'm using boost::spirit lex and qi to parse some source code.
I already skip whitespace in the input using the lexer. What I would like to do is switch comment skipping on or off in the parser, depending on the context.
Here is a basic demo. See the comments in Grammar::Grammar() for my problem:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <cstring> // strlen
#include <string>
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
typedef lex::lexertl::token<char const*, boost::mpl::vector<std::string>, boost::mpl::false_ > token_type;
typedef lex::lexertl::actor_lexer<token_type> lexer_type;
struct TokenId
{
enum type
{
INVALID_TOKEN_ID = lex::min_token_id,
COMMENT
};
};
struct Lexer : lex::lexer<lexer_type>
{
public:
lex::token_def<std::string> comment;
lex::token_def<std::string> identifier;
lex::token_def<std::string> lineFeed;
lex::token_def<std::string> space;
Lexer()
{
comment = "\\/\\*.*?\\*\\/|\\/\\/[^\\r\\n]*";
identifier = "[A-Za-z_][A-Za-z0-9_]*";
space = "[\\x20\\t\\f\\v]+";
lineFeed = "(\\r\\n)|\\r|\\n";
this->self = space[lex::_pass = lex::pass_flags::pass_ignore];
this->self += lineFeed[lex::_pass = lex::pass_flags::pass_ignore];
this->self.add
(comment, TokenId::COMMENT)
(identifier)
(';')
;
}
};
typedef Lexer::iterator_type Iterator;
void traceComment(const std::string& content)
{
std::cout << " comment: " << content << std::endl;
}
class Grammar : public qi::grammar<Iterator>
{
typedef token_type skipped_t;
qi::rule<Iterator, qi::unused_type, qi::unused_type> m_start;
qi::rule<Iterator, qi::unused_type, qi::unused_type, skipped_t> m_variable;
qi::rule<Iterator, std::string(), qi::unused_type> m_comment;
public:
Lexer lx;
public:
Grammar() :
Grammar::base_type(m_start)
{
// This does not work (comments are not skipped in m_variable)
m_start = *(
m_comment[phx::bind(&traceComment, qi::_1)]
| qi::skip(qi::token(TokenId::COMMENT))[m_variable]
);
m_variable = lx.identifier >> lx.identifier >> ';';
m_comment = qi::token(TokenId::COMMENT);
/** But this works:
m_start = *(
m_comment[phx::bind(&traceComment, qi::_1)]
| m_variable
);
m_variable = qi::skip(qi::token(TokenId::COMMENT))[lx.identifier >> lx.identifier >> ';'];
m_comment = qi::token(TokenId::COMMENT);
*/
}
};
void test(const char* code)
{
std::cout << code << std::endl;
Grammar parser;
const char* begin = code;
const char* end = code + strlen(code);
tokenize_and_parse(begin, end, parser.lx, parser);
if (begin == end)
std::cout << "-- OK --" << std::endl;
else
std::cout << "-- FAILED --" << std::endl;
std::cout << std::endl;
}
int main(int argc, char* argv[])
{
test("/* kept */ int foo;");
test("int /* ignored */ foo;");
test("int foo /* ignored */;");
test("int foo; // kept");
}
The output is:
/* kept */ int foo;
comment: /* kept */
-- OK --
int /* ignored */ foo;
-- FAILED --
int foo /* ignored */;
-- FAILED --
int foo; // kept
comment: // kept
-- OK --
Is there any issue with skipped_t?
The behavior you are describing is what I would expect from my experience.
When you write
my_rule = qi::skip(ws) [ foo >> lit(',') >> bar >> lit('=') >> baz ];
this is essentially the same as writing
my_rule = *ws >> foo >> *ws >> lit(',') >> *ws >> bar >> *ws >> lit('=') >> *ws >> baz;
(assuming that ws is a rule with no attribute. If it has an attribute in your grammar, that attribute is ignored, as if using qi::omit.)
Notably, the skipper does not get propagated into the foo rule (assuming foo is declared without a skipper; a rule declared with a compatible skipper type does receive it). So foo, bar, and baz can still be whitespace-sensitive in the above. What the skip directive does is make the grammar ignore leading whitespace in this rule, as well as whitespace around the ',' and '='.
More info here: http://boost-spirit.com/home/2010/02/24/parsing-skippers-and-skipping-parsers/
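The difference between the question's two variants can be modelled in plain C++ over a token list: skipping comments only once before the rule is entered, versus skipping before every element inside the rule. This is a simplification of the behaviour described above, not Spirit's implementation; `Tok`, `variable`, `outer_skip_only` and `skip_inside_rule` are hypothetical names. With a comment between the two identifiers, only the second variant succeeds, which is the failure the question shows for `int /* ignored */ foo;`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Tok { Ident, Comment, Semi };

// Skips comment tokens, like a skipper does before the element it guards.
static void skip_comments(const std::vector<Tok>& t, std::size_t& pos) {
    while (pos < t.size() && t[pos] == Tok::Comment) ++pos;
}

// Models `identifier >> identifier >> ';'`. If skip_inside is true, comments are
// skipped before every element (skipper active inside the rule); otherwise never.
static bool variable(const std::vector<Tok>& t, std::size_t& pos, bool skip_inside) {
    const Tok expected[] = {Tok::Ident, Tok::Ident, Tok::Semi};
    std::size_t p = pos;
    for (Tok e : expected) {
        if (skip_inside) skip_comments(t, p);
        if (p >= t.size() || t[p] != e) return false;
        ++p;
    }
    pos = p;
    return true;
}

// Pre-skip once, then run the rule without a skipper (skipper not propagated).
bool outer_skip_only(const std::vector<Tok>& t) {
    std::size_t pos = 0;
    skip_comments(t, pos);
    return variable(t, pos, false) && pos == t.size();
}

// Skipper applied inside the rule's own sequence.
bool skip_inside_rule(const std::vector<Tok>& t) {
    std::size_t pos = 0;
    return variable(t, pos, true) && pos == t.size();
}
```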
Edit:
Also, I don't think the skipped_t is doing what you think it is there.
When you use a custom skipper, the most straightforward way is to pass an actual parser instance as the skip parser, e.g. qi::skip(qi::blank)[...]. When you use a type instead of an object, e.g. declaring a rule as qi::rule<It, qi::blank_type>, that is a shorthand: qi::blank_type is the type of the predefined parser object qi::blank, and a rule declared with that skipper type expects to be handed such a parser object (e.g. through phrase_parse(..., qi::blank) or an enclosing skip directive).
I don't see any evidence that you've actually set up that machinery; you've just typedef'd skipped_t to token_type. What you should do if you want this to work that way (if it's even possible; I don't know) is read about Qi customization points and instead declare skipped_t as an empty struct which is linked via some template boilerplate to the rule m_comment, which is presumably what you actually want to be skipping. (If you skip all tokens of all types, then you can't possibly match anything, so that wouldn't make sense; I'm not sure what your intention was with making token_type the skipper.)
My guess is that when qi saw token_type in your parameter list, it either ignored it or interpreted it as part of the rule's attribute signature; I'm not sure exactly what it does.
I am attempting to get a qi::rule<> to emit a struct with BOOST_FUSION_ADAPT_STRUCT based on the boost employee example.
I have the following struct and its associated fusion macro:
struct LineOnCommand
{
int lineNum;
std::vector<char> humpType;
};
BOOST_FUSION_ADAPT_STRUCT(
LineOnCommand,
(int, lineNum)
(std::vector<char>, humpType)
)
The associated parsing rules are:
qi::rule<Iterator, std::vector<char> ascii::space_type> humpIdentifer = qi::lit("BH") | qi::lit("DH");
qi::rule<Iterator, LineOnCommand(), ascii::space_type> Cmd_LNON = qi::int_ >> -humpIdentifier >> qi::lit("LNON");
I then have a compound rule, which contains all the others (including this simple test case), and which is passed to the parser:
qi::rule<Iterator, qi::unused_type, ascii::space_type> commands =
+( /* other rules | */ Cmd_LNON /*| other rules */);
bool success = qi::phrase_parse(StartIterator, EndIterator, commands, ascii::space);
The problem comes when I attempt to compile, and I get the error:
<boostsource>/spirit/home/qi/detail/assign_to.hpp(152): error: no suitable constructor exists to convert from "const int" to "LineOnCommand"
attr = static_cast<Attribute>(val);
Clearly I'm doing something wrong, but I'm not sure what. If I understand the way Spirit works, the 2nd template argument of the rule represents the attribute (i.e. the data type emitted by the rule), and the BOOST_FUSION_ADAPT_STRUCT macro adapts my struct so that Boost knows how to convert a sequence of "int, std::vector<char>" into it.
The only difference between what I'm doing here and the boost employee example is that I'm not using an explicit grammar to do the parsing. My understanding is this is not necessary, and that a rule by itself is sufficient.
What am I doing wrong?
I'm not sure. I think I'm missing the problem. Perhaps I "naturally" sidestep it because your sample is not self-contained.
So, here's my take on it: See it Live On Coliru, in the hope that just comparing things helps you:
I fixed the obvious typos in your rule declarations
I suggested something other than qi::unused_type: if there's no attribute, there's no need to state it. Beyond the iterator type, the template arguments to qi::rule and qi::grammar are not positional, so
qi::rule<It, qi::unused_type(), ascii::space_type> r;
qi::rule<It, ascii::space_type, qi::unused_type()> r;
qi::rule<It, ascii::space_type> r;
are all /logically/ equivalent.
Full listing:
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
struct LineOnCommand
{
int lineNum;
std::vector<char> humpType;
};
BOOST_FUSION_ADAPT_STRUCT(
LineOnCommand,
(int, lineNum)
(std::vector<char>, humpType)
)
template <typename It, typename Skipper = ascii::space_type>
struct parser : qi::grammar<It, std::vector<LineOnCommand>(), Skipper>
{
parser() : parser::base_type(commands)
{
using namespace qi;
humpIdentifier = string("BH") | string("DH");
Cmd_LNON = int_ >> -humpIdentifier >> "LNON";
commands = +( /* other rules | */ Cmd_LNON /*| other rules */ );
}
private:
qi::rule<It, std::vector<char>(), Skipper> humpIdentifier;
qi::rule<It, LineOnCommand(), Skipper> Cmd_LNON;
qi::rule<It, std::vector<LineOnCommand>(), Skipper> commands;
};
int main()
{
typedef std::string::const_iterator Iterator;
parser<Iterator> p;
std::string const input =
"123 BH LNON\n"
"124 LNON\t\t\t"
"125 DH LNON\n"
"126 INVALID LNON";
auto f(input.begin()), l(input.end());
std::vector<LineOnCommand> data;
bool success = qi::phrase_parse(f, l, p, ascii::space, data);
std::cout << "success:" << std::boolalpha << success << ", "
<< "elements: " << data.size() << "\n";
if (success)
{
for (auto& el : data)
{
std::cout << "Item: " << el.lineNum << ", humpType '" << std::string(el.humpType.begin(), el.humpType.end()) << "'\n";
}
}
if (f!=l)
std::cout << "Trailing unparsed: '" << std::string(f,l) << "'\n";
return success? 0 : 1;
}
Output:
success:true, elements: 3
Item: 123, humpType 'BH'
Item: 124, humpType ''
Item: 125, humpType 'DH'
Trailing unparsed: '126 INVALID LNON'
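To spell out what this grammar accepts, here is a hand-written, standard-library-only counterpart of `+( int_ >> -(string("BH") | string("DH")) >> "LNON" )` with whitespace skipping. It is an illustration only: `parse_commands`, `match_word`, `match_int` and `skip_ws` are hypothetical helpers (not Spirit's), humpType is a std::string for brevity, the number parser only accepts unsigned digits, and unlike `+` it also accepts zero items. Like phrase_parse, it stops at the first item that does not match and skips whitespace after the last good one, so it reproduces the output above.

```cpp
#include <cassert>
#include <cctype>
#include <initializer_list>
#include <string>
#include <vector>

struct LineOnCommand {
    int lineNum = 0;
    std::string humpType;  // std::string instead of std::vector<char>, for brevity
};

static void skip_ws(const std::string& s, std::size_t& p) {
    while (p < s.size() && std::isspace(static_cast<unsigned char>(s[p]))) ++p;
}

// Pre-skips whitespace, then matches the literal word w.
static bool match_word(const std::string& s, std::size_t& p, const std::string& w) {
    std::size_t q = p;
    skip_ws(s, q);
    if (s.compare(q, w.size(), w) != 0) return false;
    p = q + w.size();
    return true;
}

// Pre-skips whitespace, then matches unsigned digits.
static bool match_int(const std::string& s, std::size_t& p, int& v) {
    std::size_t q = p;
    skip_ws(s, q);
    std::size_t start = q;
    while (q < s.size() && std::isdigit(static_cast<unsigned char>(s[q]))) ++q;
    if (q == start) return false;
    v = std::stoi(s.substr(start, q - start));
    p = q;
    return true;
}

// stop receives the offset where parsing stopped (after a final whitespace skip).
std::vector<LineOnCommand> parse_commands(const std::string& s, std::size_t& stop) {
    std::vector<LineOnCommand> out;
    std::size_t p = 0;
    for (;;) {
        std::size_t save = p;
        LineOnCommand c;
        if (!match_int(s, p, c.lineNum)) { p = save; break; }
        for (const char* hump : {"BH", "DH"})
            if (match_word(s, p, hump)) { c.humpType = hump; break; }
        if (!match_word(s, p, "LNON")) { p = save; break; }  // backtrack the whole item
        out.push_back(c);
    }
    skip_ws(s, p);  // like phrase_parse's post-skip
    stop = p;
    return out;
}
```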
I'm attempting to create a Boost::Spirit grammar class that can read a fairly simple grammar.
start = roster;
roster = *student;
student = int >> string;
The goal of the code is to create a tree of command objects based on the input file being parsed. The grammar is instantiated with Spirit's istream_iterator wrapping that file.
Basically, what I'm having trouble with is moving and using the synthesized attributes of each rule. I need to create a tree of objects based on this data, and the only functions that create those objects require the parent object to be known at that time. I'm using the command pattern to delay creation until I have parsed all the data and can correctly build the tree. The way I have implemented this so far is that each command contains a vector of other commands. When a command is executed, it requires only the parent object, and creates and attaches the child object accordingly. Then the object executes each of the commands in its own vector, passing itself as the parent. This creates the tree structure I need with the data intact.
The Issue:
The issue I'm having is how to build the commands when the data is parsed, and how to load them into the appropriate vector. I've tried 3 different ways so far.
I tried changing the attribute of each rule to a std::vector and parsing the attributes in as commands one at a time. The issue with this is it nests the vectors into std::vector<std::vector<...>> type data, which I couldn't work with.
I tried using the boost::phoenix placeholder _val as a surrogate for the command being created. I was proud of this solution and a bit upset that it didn't work. I overloaded the += operator for all commands so that when A and B are both commands, A += B pushes B into A's command vector. But _val isn't a Command, so the compiler didn't like this, and I couldn't tinker it into a workable state. If at all possible, this would be the cleanest solution and I would love to make it work.
The code in its current form attempts to bind the actions together instead: the idea is to call a member function on _val, passing it the created command, so that it pushes it back. Again, _val isn't actually a Command, so that didn't work out.
I'm going to post this wall of code: it's the grammar I've written, cleaned up a bit, as well as the point where it is invoked.
template <typename Iterator>
struct roster_grammar : qi::grammar<Iterator, qi::space_type, T3_Command()>
{
//roster_grammar constructor
roster_grammar() :
roster_grammar::base_type(start_)
{
using qi::lit;
using qi::int_;
using qi::char_;
using qi::lexeme;
start_ = student[boost::bind(&T3_Command::add_command, qi::_val, _1)];
//I removed the roster for the time being to simplify the grammar
//it still containes my second solution that I detailed above. This
//would be my ideal solution if it could work this way.
//roster = *(student[qi::_val += _1]);
student =
qi::eps [ boost::bind(&T3_Command::set_identity, qi::_val, "Student") ]
>>
int_age [ boost::bind(&T3_Command::add_command, qi::_val, _1) ]
>>
string_name [ boost::bind(&T3_Command::add_command, qi::_val, _1) ];
int_age =
int_ [ boost::bind(&Command_Factory::create_int_comm, &cmd_creator, "Age", _1) ];
string_name =
string_p [ boost::bind(&Command_Factory::create_string_comm, &cmd_creator, "Name", _1) ];
//The string parser. Returns type std::string
string_p %= +qi::alnum;
}
qi::rule<Iterator, qi::space_type, T3_Model_Command()> roster;
qi::rule<Iterator, qi::space_type, T3_Atom_Command()> student;
qi::rule<Iterator, qi::space_type, T3_Int_Command()> int_age;
qi::rule<Iterator, qi::space_type, T3_String_Command()> string_name;
qi::rule<Iterator, qi::space_type, T3_Command()> start_;
qi::rule<Iterator, std::string()> string_p;
Command_Factory cmd_creator;
};
This is how the grammar is being instantiated and used.
typedef boost::spirit::istream_iterator iter_type;
typedef roster_grammar<iter_type> student_p;
student_p my_parser;
//open the target file and wrap istream into the iterator
std::ifstream in = std::ifstream(path);
in.unsetf(std::ios::skipws);//Disable Whitespace Skipping
iter_type begin(in);
iter_type end;
using boost::spirit::qi::space;
using boost::spirit::qi::phrase_parse;
bool r = phrase_parse(begin, end, my_parser, space);
So, long story short: I have a grammar out of which I want to build commands (called T3_Command). Commands have a std::vector data member that holds the commands beneath them in the tree.
What I need is a clean way to create a Command in a semantic action and load it into the vector of other commands (by way of attributes or just straight function calls). Commands have a type that is specified at creation (it defines the type of tree node they make), and some commands have a data value (an int, string or float, named value in their respective commands).
Or if there's a better way to build a tree, I'm open to suggestions.
Thank you so much for any help you're able to give!
EDIT:
I'll try to be more clear about the original problem I'm trying to solve. Thanks for the patience.
Given that grammar (or any grammar actually) I want to be able to parse through it and create a command tree based on the semantic actions taken within the parser.
So using my sample grammar, and the input
"23 Bryan 45 Tyler 4 Stephen"
I would like the final tree to result in the following data structure.
Command with type = "Roster" holding 3 "Student" type commands.
Command with type = "Student" each holding an Int_Command and a String_Command
Int_Command holds the stored integer and String_Command the stored string.
E.g.
r1 - Roster - [s1][s2][s3]
s1 - Student - [int 23][string Bryan]
s2 - Student - [int 45][string Tyler]
s3 - Student - [int 4][string Stephen]
This is the current structure of the commands I've written (The implementation is all trivial).
class T3_Command
{
public:
T3_Command(void);
T3_Command(const std::string &type);
~T3_Command(void);
//Executes this command and all subsequent commands in the command vector.
void Execute(/*const Folder_in parent AND const Model_in parent*/);
//Pushes the passed T3_Command into the Command Vector
//#param comm - The command to be pushed.
void add_command(const T3_Command &comm);
//Sets the Identity of the command.
//#param ID - the new identity to be set.
void set_identity(std::string &ID);
private:
const std::string ident;
std::vector <T3_Command> command_vec;
T3_Command& operator+=(const T3_Command& rhs);
};
#pragma once
#include "T3_command.h"
class T3_Int_Command :
public T3_Command
{
public:
T3_Int_Command();
T3_Int_Command(const std::string &type, const int val);
~T3_Int_Command(void);
void Execute();
void setValue(int val);
private:
int value;
};
So the problem I am having is I would like to be able to create a data structure of various commands that represent the parse tree as spirit parses through it.
Updated in response to the edited question
Though there's still a lot of information missing (see my [new comment]), at least now you showed some input and output :)
So, without further ado, let me interpret those:
you still want to just parse (int, string) pairs, but per line
use qi::blank_type as a skipper
do roster % eol to parse roster lines
my sample parses into a vector of Rosters (one per line)
each roster contains a variable number of Students:
start = roster % eol;
roster = +student;
student = int_ >> string_p;
Note: Rule #1: don't complicate your parser unless you really have to.
you want to output the individual elements ("commands"?!?) - I'm assuming the part where this would be non-trivial is the part where the same Student might appear in several rosters?
By defining a total ordering on Students:
bool operator<(Student const& other) const {
return boost::tie(i,s) < boost::tie(other.i, other.s);
}
you make it possible to store a unique collection of students in e.g. a std::set<Student>
perhaps generating the 'variable names' (I mean r1, s1, s2...) is part of the task as well. So, to establish a unique 'variable name' for each student, I create a bi-directional map of Students after parsing (see Rule #1: don't complicate the parser unless it's absolutely necessary):
boost::bimap<std::string, Student> student_vars;
auto generate_id = [&] () { return "s" + std::to_string(student_vars.size()+1); };
for(Roster const& r: data)
for(Student const& s: r.students)
student_vars.insert({generate_id(), s});
That's about everything I can think of here. I used C++11 and Boost liberally to save on lines of code, but writing this without C++11/Boost would be fairly trivial too. A C++03 version is online now.
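If Boost.Bimap isn't an option, the bimap-based numbering above can be sketched with a std::map keyed on Student, relying on the same operator< total ordering (std::tie replacing boost::tie). `render_rosters` is a hypothetical helper: it assigns s1, s2, ... in order of first appearance and renders each roster as [s1][s2]...

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

struct Student {
    int i;
    std::string s;
    bool operator<(Student const& o) const { return std::tie(i, s) < std::tie(o.i, o.s); }
};

struct Roster {
    std::vector<Student> students;
};

// Numbers each distinct student once ("s1", "s2", ...), then renders every
// roster as a line of "[sN]" references.
std::vector<std::string> render_rosters(const std::vector<Roster>& data,
                                        std::map<Student, std::string>& ids) {
    for (const Roster& r : data)
        for (const Student& st : r.students)
            if (!ids.count(st))
                ids.emplace(st, "s" + std::to_string(ids.size() + 1));
    std::vector<std::string> lines;
    for (const Roster& r : data) {
        std::string line;
        for (const Student& st : r.students) line += "[" + ids.at(st) + "]";
        lines.push_back(line);
    }
    return lines;
}
```

Fed with the three sample rosters, this produces the same r1/r2/r3 lines as the output shown in this answer.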
The following sample input:
ParsedT3Data const data = parseData(
"23 Bryan 45 Tyler 4 Stephen\n"
"7 Mary 45 Tyler 8 Stephane\n"
"23 Bryan 8 Stephane");
Results in (See it Live On Coliru):
parse (partial) success
s1 - Student - [int 23][string Bryan]
s2 - Student - [int 45][string Tyler]
s3 - Student - [int 4][string Stephen]
s4 - Student - [int 7][string Mary]
s5 - Student - [int 8][string Stephane]
r1 [s1][s2][s3]
r2 [s4][s2][s5]
r3 [s1][s5]
Full code:
#include <boost/fusion/adapted.hpp>
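#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/tuple/tuple_comparison.hpp>
#include <boost/bimap.hpp>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>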
namespace qi = boost::spirit::qi;
struct Student
{
int i;
std::string s;
bool operator<(Student const& other) const {
return boost::tie(i,s) < boost::tie(other.i, other.s);
}
friend std::ostream& operator<<(std::ostream& os, Student const& o) {
return os << "Student - [int " << o.i << "][string " << o.s << "]";
}
};
struct Roster
{
std::vector<Student> students;
};
BOOST_FUSION_ADAPT_STRUCT(Student, (int, i)(std::string, s))
BOOST_FUSION_ADAPT_STRUCT(Roster, (std::vector<Student>, students))
typedef std::vector<Roster> ParsedT3Data;
template <typename Iterator>
struct roster_grammar : qi::grammar<Iterator, ParsedT3Data(), qi::blank_type>
{
roster_grammar() :
roster_grammar::base_type(start)
{
using namespace qi;
start = roster % eol;
roster = eps >> +student; // known workaround
student = int_ >> string_p;
string_p = lexeme[+(graph)];
BOOST_SPIRIT_DEBUG_NODES((start)(roster)(student)(string_p))
}
qi::rule <Iterator, ParsedT3Data(), qi::blank_type> start;
qi::rule <Iterator, Roster(), qi::blank_type> roster;
qi::rule <Iterator, Student(), qi::blank_type> student;
qi::rule <Iterator, std::string()> string_p;
};
ParsedT3Data parseData(std::string const& demoData)
{
typedef boost::spirit::istream_iterator iter_type;
typedef roster_grammar<iter_type> student_p;
student_p my_parser;
//open the target file and wrap istream into the iterator
std::istringstream iss(demoData);
iss.unsetf(std::ios::skipws);//Disable Whitespace Skipping
iter_type begin(iss), end;
ParsedT3Data result;
bool r = phrase_parse(begin, end, my_parser, qi::blank, result);
if (r)
std::cout << "parse (partial) success\n";
else
std::cerr << "parse failed: '" << std::string(begin,end) << "'\n";
if (begin!=end)
std::cerr << "trailing unparsed: '" << std::string(begin,end) << "'\n";
if (!r)
throw "TODO error handling";
return result;
}
int main()
{
ParsedT3Data const data = parseData(
"23 Bryan 45 Tyler 4 Stephen\n"
"7 Mary 45 Tyler 8 Stephane\n"
"23 Bryan 8 Stephane");
// now produce that list of stuff :)
boost::bimap<std::string, Student> student_vars;
auto generate_id = [&] () { return "s" + std::to_string(student_vars.size()+1); };
for(Roster const& r: data)
for(Student const& s: r.students)
student_vars.insert({generate_id(), s});
for(auto const& s: student_vars.left)
std::cout << s.first << " - " << s.second << "\n";
int r_id = 1;
for(Roster const& r: data)
{
std::cout << "r" << (r_id++) << " ";
for(Student const& s: r.students)
std::cout << "[" << student_vars.right.at(s) << "]";
std::cout << "\n";
}
}
OLD ANSWER
I'll respond to individual points, while awaiting more information:
1. "The issue with this is it nests the vectors into std::vector<std::vector<...>> type data, which I couldn't work with"
A solution here would be
boost::container::vector<>, which allows incomplete element types at the time of instantiation (Boost.Container has several other nifty properties, go read about them!)
boost::variant with recursive_wrapper<> so you can indeed make logical trees. I have many answers in the boost-spirit and boost-spirit-qi tags that show this approach (e.g. for expression trees).
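For comparison, since C++17 std::vector itself also permits an incomplete element type where the vector is declared, so a recursive command tree can be sketched with only the standard library. This assumes a C++17 compiler; `Node`, `make_roster` and `render_student` are hypothetical names, and the node types follow the question's Roster/Student/Int/String example.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// A command-tree node; std::vector<Node> inside Node relies on C++17's
// incomplete-type support for std::vector.
struct Node {
    std::string type;            // "Roster", "Student", "Int" or "String"
    std::string value;           // payload for leaf nodes
    std::vector<Node> children;
};

// Builds Roster -> Student -> {Int, String} from parsed (int, string) pairs.
Node make_roster(const std::vector<std::pair<int, std::string>>& students) {
    Node roster{"Roster", "", {}};
    for (const auto& st : students) {
        Node s{"Student", "", {}};
        s.children.push_back(Node{"Int", std::to_string(st.first), {}});
        s.children.push_back(Node{"String", st.second, {}});
        roster.children.push_back(std::move(s));
    }
    return roster;
}

// Renders a student as "[int 23][string Bryan]", the question's notation.
std::string render_student(const Node& s) {
    std::string out;
    for (const Node& leaf : s.children)
        out += "[" + std::string(leaf.type == "Int" ? "int " : "string ") + leaf.value + "]";
    return out;
}
```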
2. Calling factory methods from semantic actions
I have a few minor hints:
you can use qi::_1, qi::_2... to refer to the elements of a compound attribute
you should prefer using phoenix::bind inside Phoenix actors (semantic actions are Phoenix actors)
you can assign to qi::_pass to indicate parser failure
Here's a simplified version of the grammar, which shows these in action. I haven't actually built a tree, since you didn't describe any of the desired behaviour. Instead, I just print a debug line on adding nodes to the tree.
See it Live on Coliru
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <fstream>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
struct T3_Command
{
bool add_command(int i, std::string const& s)
{
std::cout << "adding command [" << i << ", " << s << "]\n";
return i != 42; // just to show how you can do input validation
}
};
template <typename Iterator>
struct roster_grammar : qi::grammar<Iterator, T3_Command(), qi::space_type>
{
roster_grammar() :
roster_grammar::base_type(start_)
{
start_ = *(qi::int_ >> string_p)
[qi::_pass = phx::bind(&T3_Command::add_command, qi::_val, qi::_1, qi::_2)];
string_p = qi::lexeme[+(qi::graph)];
}
qi::rule <Iterator, T3_Command(), qi::space_type> start_;
qi::rule <Iterator, std::string()> string_p;
};
int main()
{
typedef boost::spirit::istream_iterator iter_type;
typedef roster_grammar<iter_type> student_p;
student_p my_parser;
//open the target file and wrap istream into the iterator
std::ifstream in("input.txt");
in.unsetf(std::ios::skipws);//Disable Whitespace Skipping
iter_type begin(in);
iter_type end;
using boost::spirit::qi::space;
using boost::spirit::qi::phrase_parse;
bool r = phrase_parse(begin, end, my_parser, space);
if (r)
std::cout << "parse (partial) success\n";
else
std::cerr << "parse failed: '" << std::string(begin,end) << "'\n";
if (begin!=end)
std::cerr << "trailing unparsed: '" << std::string(begin,end) << "'\n";
return r?0:255;
}
Input:
1 klaas-jan
2 doeke-jan
3 jan-herbert
4 taeke-jan
42 oops-invalid-number
5 not-parsed
Output:
adding command [1, klaas-jan]
adding command [2, doeke-jan]
adding command [3, jan-herbert]
adding command [4, taeke-jan]
adding command [42, oops-invalid-number]
parse (partial) success
trailing unparsed: '42 oops-invalid-number
5 not-parsed
'
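The _pass behaviour in this output can be modelled in plain C++ (a simplified sketch, not Spirit's implementation; `Recorder` and `parse_all` are hypothetical, and the number parser accepts unsigned digits only). When the action returns false, that element fails, the position goes back to before it, and the Kleene star still succeeds with everything parsed so far. phrase_parse then post-skips whitespace, which is why the trailing text starts at 42.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Stand-in for T3_Command::add_command: records each pair and rejects 42.
struct Recorder {
    std::vector<std::pair<int, std::string>> seen;
    bool add_command(int i, const std::string& s) {
        seen.emplace_back(i, s);
        return i != 42;
    }
};

static void skip_space(const std::string& in, std::size_t& p) {
    while (p < in.size() && std::isspace(static_cast<unsigned char>(in[p]))) ++p;
}

// Models *(int_ >> string_p)[_pass = add_command(...)] under a space skipper.
// Returns the unparsed remainder.
std::string parse_all(const std::string& in, Recorder& rec) {
    std::size_t pos = 0;
    for (;;) {
        std::size_t p = pos;
        skip_space(in, p);
        std::size_t ds = p;
        while (p < in.size() && std::isdigit(static_cast<unsigned char>(in[p]))) ++p;
        if (p == ds) break;
        int num = std::stoi(in.substr(ds, p - ds));
        skip_space(in, p);
        std::size_t ws = p;
        while (p < in.size() && std::isgraph(static_cast<unsigned char>(in[p]))) ++p;
        if (p == ws) break;
        if (!rec.add_command(num, in.substr(ws, p - ws))) break;  // _pass = false
        pos = p;  // commit this element
    }
    skip_space(in, pos);  // phrase_parse post-skip
    return in.substr(pos);
}
```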