I am new to Boost.Spirit, and I have a question related to a mini-interpreter I am trying to implement using the library. As a sub-task of parsing my language, I need to extract a file-path from an input of the form:
"path = \"/path/to/file\""
and pass it as a string (without quotes) to a semantic action.
I wrote some code which can parse this type of input, but passing the parsed string does not work as expected, probably due to my lack of experience with Boost.Spirit.
Can anyone help?
In reality, my grammar is more complex, but I have isolated the problem to:
#include <cstdlib>
#include <iostream>
#include <string>
#include "boost/spirit/include/qi.hpp"
#include "boost/spirit/include/phoenix_core.hpp"
#include "boost/spirit/include/phoenix_operator.hpp"
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;
namespace parser {
// Semantic action (note: in reality, this would use file_path_string in non-trivial way)
void display_path(std::string file_path_string) {
std::cout << "Detected file-path: " << file_path_string << std::endl;
}
// Grammar
template <typename Iterator>
struct path_command : qi::grammar<Iterator, ascii::space_type> {
path_command() : path_command::base_type(path_specifier) {
using qi::string;
using qi::lit;
path = +(qi::char_("/") >> *qi::char_("a-zA-Z_0-9"));
quoted_path_string = lit('"') >> (path- lit('"')) >> lit('"');
path_specifier = lit("path") >> qi::lit("=")
>> quoted_path_string[&display_path];
}
qi::rule<Iterator, ascii::space_type> path_specifier;
qi::rule<Iterator, std::string()> path, quoted_path_string;
};
}
int main() {
using ascii::space;
typedef std::string::const_iterator iterator_type;
typedef parser::path_command<iterator_type> path_command;
bool parse_res;
path_command command_instance; // Instance of our Grammar
iterator_type iter, end;
std::string test_command1 = "path = \"/file1\"";
std::string test_command2 = "path = \"/dirname1/dirname2/file2\"";
// Testing example command 1
iter = test_command1.begin();
end = test_command1.end();
parse_res = phrase_parse(iter, end, command_instance, space);
std::cout << "Parse result for test 1: " << parse_res << std::endl;
// Testing example command 2
iter = test_command2.begin();
end = test_command2.end();
parse_res = phrase_parse(iter, end, command_instance, space);
std::cout << "Parse result for test 2: " << parse_res << std::endl;
return EXIT_SUCCESS;
}
The output is:
Detected file-path: /
Parse result for test 1: 1
Detected file-path: ///
Parse result for test 2: 1
but I would like to obtain:
Detected file-path: /file1
Parse result for test 1: 1
Detected file-path: /dirname1/dirname2/file2
Parse result for test 2: 1
Almost everything is fine with your parser. The problem is a bug in Spirit (up to Boost V1.46) that prevents correct handling of the attribute in cases like this. It has recently been fixed in SVN and the fix will be available in Boost V1.47 (I ran your unchanged program against that version and everything works just fine).
For now, you can work around this problem by utilizing the raw[] directive (see below).
I said 'almost' above because a) you can simplify what you have, and b) you should use no_skip[] to avoid invoking the skip parser in between the quotes.
path = qi::raw[+(qi::char_("/") >> *qi::char_("a-zA-Z_0-9"))];
quoted_path_string = qi::no_skip['"' >> path >> '"'];
path_specifier = lit("path") >> qi::lit("=")
>> quoted_path_string[&display_path];
You can omit the - lit('"') part because your path parser does not recognize quotes in the first place.
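To sanity-check what the grammar should hand to display_path, here is a small hand-written equivalent that does not use Spirit at all. It is only a sketch: extract_path is a hypothetical helper, and it assumes the same shape as the rules above (optional whitespace around path and =, none inside the quotes). It remembers the matched source range as a whole, which is roughly the effect raw[] has on the attribute.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Spirit): returns true and stores the
// unquoted path in `out` when `in` has the form  path = "/a/b"
bool extract_path(const std::string& in, std::string& out) {
    std::size_t i = 0;
    auto skip_ws = [&]() {
        while (i < in.size() && std::isspace(static_cast<unsigned char>(in[i])))
            ++i;
    };
    auto eat = [&](const std::string& tok) -> bool {
        if (in.compare(i, tok.size(), tok) != 0) return false;
        i += tok.size();
        return true;
    };

    skip_ws();
    if (!eat("path")) return false;
    skip_ws();
    if (!eat("=")) return false;
    skip_ws();
    if (!eat("\"")) return false;

    // path = +( '/' >> *[a-zA-Z_0-9] ), remembered as one source range
    const std::size_t first = i;
    while (i < in.size() && in[i] == '/') {
        ++i;
        while (i < in.size() &&
               (std::isalnum(static_cast<unsigned char>(in[i])) || in[i] == '_'))
            ++i;
    }
    if (i == first) return false;   // at least one '/' segment is required
    const std::size_t last = i;

    if (!eat("\"")) return false;   // the closing quote must follow directly
    out = in.substr(first, last - first);
    return true;
}
```

Calling extract_path on the two test commands from the question should yield /file1 and /dirname1/dirname2/file2, which is what the fixed grammar is expected to pass to the semantic action.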
I'm trying to create a (pretty simple) parser using boost::spirit::qi to extract messages from a stream. Each message starts with a short marker and ends with \r\n. The message body is ASCII text (letters and numbers) separated by commas. For example:
!START,01,2.3,ABC\r\n
!START,456.2,890\r\n
I'm using unit tests to check the parser and everything works well when I pass only correct messages one by one. But when I try to emulate some invalid input, like:
!START,01,2.3,ABC\r\n
trash-message
!START,456.2,890\r\n
The parser doesn't see the following messages after an unexpected text.
I'm new to boost::spirit and I'd like to know how a parser based on boost::spirit::qi::grammar is supposed to work.
My question is:
Should the parser slide in the input stream and try to find a beginning of a message?
Or should the caller check the parsing result and, in case of failure, advance the iterator and then call the parser again?
Many thanks for considering my request.
My question is: Should the parser slide in the input stream and try to find a beginning of a message?
Only when you tell it to. It's called qi::parse, not qi::search. But obviously you can make a grammar ignore things.
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>
namespace qi = boost::spirit::qi;
struct Command {
enum Type { START, QUIT, TRASH } type = TRASH;
std::vector<std::string> args;
};
using Commands = std::vector<Command>;
BOOST_FUSION_ADAPT_STRUCT(Command, type, args)
template <typename It> struct CmdParser : qi::grammar<It, Commands()> {
CmdParser() : CmdParser::base_type(commands_) {
type_.add("!start", Command::START);
type_.add("!quit", Command::QUIT);
trash_ = *~qi::char_("\r\n"); // just ignore the entire line
arg_ = *~qi::char_(",\r\n");
command_ = qi::no_case[type_] >> *(',' >> arg_);
commands_ = *((command_ | trash_) >> +qi::eol);
BOOST_SPIRIT_DEBUG_NODES((trash_)(arg_)(command_)(commands_))
}
private:
qi::symbols<char, Command::Type> type_;
qi::rule<It, Commands()> commands_;
qi::rule<It, Command()> command_;
qi::rule<It, std::string()> arg_;
qi::rule<It> trash_;
};
int main() {
std::string_view input = "!START,01,2.3,ABC\r\n"
"trash-message\r\n"
"!START,456.2,890\r\n";
using It = std::string_view::const_iterator;
static CmdParser<It> const parser;
Commands parsed;
auto f = input.begin(), l = input.end();
if (parse(f, l, parser, parsed)) {
std::cout << "Parsed:\n";
for(Command const& cmd : parsed) {
std::cout << cmd.type;
for (auto& arg: cmd.args)
std::cout << ", " << quoted(arg);
std::cout << "\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: " << quoted(std::string(f, l)) << "\n";
}
Printing
Parsed:
0, "01", "2.3", "ABC"
2
0, "456.2", "890"
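The other option raised in the question, where the caller checks the result and resynchronizes after a failure, can be sketched without Spirit. This is a minimal illustration under stated assumptions: Message, parse_line and parse_stream are hypothetical names, every message is assumed to start with !START and to end with \r\n, and a line that fails to parse is simply counted and skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical message type: the comma-separated fields after the marker.
struct Message {
    std::vector<std::string> args;
};

// Tries to parse one line (without its "\r\n") as "!START,<arg>,<arg>,...".
bool parse_line(const std::string& line, Message& out) {
    const std::string marker = "!START";
    if (line.compare(0, marker.size(), marker) != 0) return false;
    std::size_t pos = marker.size();
    Message msg;
    while (pos < line.size()) {
        if (line[pos] != ',') return false;   // junk right after the marker
        const std::size_t next = line.find(',', pos + 1);
        const std::size_t stop = (next == std::string::npos) ? line.size() : next;
        msg.args.push_back(line.substr(pos + 1, stop - pos - 1));
        pos = stop;
    }
    out = msg;
    return true;
}

// Caller-driven recovery: on failure, move on to the next line and retry.
std::vector<Message> parse_stream(const std::string& input, std::size_t& skipped) {
    std::vector<Message> result;
    skipped = 0;
    std::size_t begin = 0;
    while (begin < input.size()) {
        std::size_t eol = input.find("\r\n", begin);
        if (eol == std::string::npos) eol = input.size();
        Message msg;
        if (parse_line(input.substr(begin, eol - begin), msg))
            result.push_back(msg);
        else
            ++skipped;                        // trash line: resynchronize
        begin = eol + 2;                      // step over "\r\n"
    }
    return result;
}
```

The trade-off compared with the grammar above is where the recovery policy lives: here the caller decides what counts as a resynchronization point, while the grammar version encodes "ignore the whole line" as a rule.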
I want to parse the header columns of a text file. Column names may be quoted and may use any letter case. Currently I am using the following grammar:
#include <string>
#include <iostream>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
template <typename Iterator, typename Skipper>
struct Grammar : qi::grammar<Iterator, void(), Skipper>
{
static constexpr char colsep = '|';
Grammar() : Grammar::base_type(header)
{
using namespace qi;
using ascii::char_;
#define COL(name) (no_case[name] | ('"' >> no_case[name] >> '"'))
header = (COL("columna") | COL("column_a")) >> colsep >>
(COL("columnb") | COL("column_b")) >> colsep >>
(COL("columnc") | COL("column_c")) >> eol >> eoi;
#undef COL
}
qi::rule<Iterator, void(), Skipper> header;
};
int main()
{
const std::string s{"columnA|column_B|column_c\n"};
auto begin(std::begin(s)), end(std::end(s));
Grammar<std::string::const_iterator, qi::blank_type> p;
bool ok = qi::phrase_parse(begin, end, p, qi::blank);
if (ok && begin == end)
std::cout << "Header ok" << std::endl;
else if (ok && begin != end)
std::cout << "Remaining unparsed: '" << std::string(begin, end) << "'" << std::endl;
else
std::cout << "Parse failed" << std::endl;
return 0;
}
Is this possible without the use of a macro? Further, I would like to ignore underscores altogether. Can this be achieved with a custom skipper? In the end it would be ideal if one could write:
header = col("columna") >> colsep >> col("columnb") >> colsep >> col("columnc") >> eol >> eoi;
where col would be an appropriate grammar or rule.
@sehe how can I fix this grammar to support "\"Column_A\"" as well?
By this time you should probably have realized that there are two different things going on here.
Separate Yo Concerns
On the one hand you have a grammar (that allows |-separated columns like columna or "Column_A").
On the other hand you have semantic analysis (the phase where you check that the parsed contents match certain criteria).
The thing that is making your life hard is trying to conflate the two. Now, don't get me wrong, there could be (very rare) circumstances where fusing those responsibilities together is absolutely required - but I feel that would always be an optimization. If you need that, Spirit is not your thing, and you're much more likely to be served with a handwritten parser.
Parsing
So let's get brain-dead simple about the grammar:
static auto headers = (quoted|bare) % '|' > (eol|eoi);
The bare and quoted rules can be pretty much the same as before:
static auto quoted = lexeme['"' >> *('\\' >> char_ | "\"\"" >> attr('"') | ~char_('"')) >> '"'];
static auto bare = *(graph - '|');
As you can see, this will implicitly take care of quoting and escaping as well as whitespace skipping outside lexemes. When applied simply, it will result in a clean list of column names:
std::string const s = "\"columnA\"|column_B| column_c \n";
std::vector<std::string> headers;
bool ok = phrase_parse(begin(s), end(s), Grammar::headers, x3::blank, headers);
std::cout << "Parse " << (ok?"ok":"invalid") << std::endl;
if (ok) for(auto& col : headers) {
std::cout << std::quoted(col) << "\n";
}
Prints Live On Coliru
Parse ok
"columnA"
"column_B"
"column_c"
INTERMEZZO: Coding Style
Let's structure our code so that the separation of concerns is reflected. Our parsing code might use X3, but our validation code doesn't need to be in the same translation unit (cpp file).
Have a header defining some basic types:
#include <string>
#include <vector>
using Header = std::string;
using Headers = std::vector<Header>;
Define the operations we want to perform on them:
Headers parse_headers(std::string const& input);
bool header_match(Header const& actual, Header const& expected);
bool headers_match(Headers const& actual, Headers const& expected);
Now, main can be rewritten as just:
auto headers = parse_headers("\"columnA\"|column_B| column_c \n");
for(auto& col : headers) {
std::cout << std::quoted(col) << "\n";
}
bool valid = headers_match(headers, {"columna","columnb","columnc"});
std::cout << "Validation " << (valid?"passed":"failed") << "\n";
And e.g. a parse_headers.cpp could contain:
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
namespace Grammar {
using namespace x3;
static auto quoted = lexeme['"' >> *('\\' >> char_ | "\"\"" >> attr('"') | ~char_('"')) >> '"'];
static auto bare = *(graph - '|');
static auto headers = (quoted|bare) % '|' > (eol|eoi);
}
Headers parse_headers(std::string const& input) {
Headers output;
if (phrase_parse(begin(input), end(input), Grammar::headers, x3::blank, output))
return output;
return {}; // or throw, if you prefer
}
Validating
This is what is known as "semantic checks". You take the vector of strings and check them according to your logic:
#include <boost/range/adaptors.hpp>
#include <boost/range/algorithm/equal.hpp>
#include <boost/algorithm/string.hpp>
#include <cctype>
bool header_match(Header const& actual, Header const& expected) {
using namespace boost::adaptors;
auto significant = [](unsigned char ch) {
return ch != '_' && std::isgraph(ch);
};
return boost::algorithm::iequals(actual | filtered(significant), expected);
}
bool headers_match(Headers const& actual, Headers const& expected) {
return boost::equal(actual, expected, header_match);
}
That's all. All the power of algorithms and modern C++ at your disposal, no need to fight with constraints due to parsing context.
Full Demo
The above, Live On Wandbox
Both parts got significantly simpler:
your parser doesn't have to deal with quirky comparison logic
your comparison logic doesn't have to deal with grammar concerns (quotes, escapes, delimiters and whitespace)
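If you would rather not pull in Boost.Range for the validation step, the same check can be sketched with the standard library only. The helper normalize is a hypothetical name, and unlike the version above this sketch normalizes both sides of the comparison, so it is slightly more lenient about underscores in the expected names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Lower-cases `in` and drops underscores and non-graphic characters.
std::string normalize(const std::string& in) {
    std::string out;
    for (unsigned char ch : in)
        if (ch != '_' && std::isgraph(ch))
            out += static_cast<char>(std::tolower(ch));
    return out;
}

// Case-insensitive comparison that ignores underscores and whitespace.
bool header_match(const std::string& actual, const std::string& expected) {
    return normalize(actual) == normalize(expected);
}

// Element-wise comparison; the number of columns must agree as well.
bool headers_match(const std::vector<std::string>& actual,
                   const std::vector<std::string>& expected) {
    if (actual.size() != expected.size()) return false;
    for (std::size_t i = 0; i < actual.size(); ++i)
        if (!header_match(actual[i], expected[i])) return false;
    return true;
}
```

Because the comparison works on already-parsed strings, it can be unit tested on its own, independently of whichever parser produced the header list.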
I'm attempting to create a Boost::Spirit grammar class that can read a fairly simple grammar.
start = roster;
roster = *student;
student = int >> string;
The goal of the code is to create a tree of command objects based on an input file that is being parsed. The Iterator that this grammar is being created with is the given spirit file iterator.
Basically, what I am having trouble with is moving and using the synthesized attributes of each rule. What I need is to create a tree of objects based on this data, and the only functions to create said objects require the parent object to be known at that time. I'm using the command pattern to delay the creation until I have parsed all data and can correctly build the tree. The way I have implemented this so far is that my commands all contain a vector of other commands. When a command is executed, it requires only the parent object, and will create and attach the child object accordingly. Then the object will execute each of the commands in its own vector, passing itself as the parent. This creates the tree structure I need with the data intact.
The Issue:
The Issue I am having is how to build the commands when the data is parsed, and how to load them into the appropriate vector. I've tried 3 different ways so far.
1. I tried to alter the attribute of each rule to a std::vector and parse the attributes in as commands one at a time. The issue with this is that it nests the vectors into std::vector<std::vector<...>> type data, which I couldn't work with.
2. I tried using the boost::phoenix placeholder _val as a surrogate for the command being created. I was proud of this solution and a bit upset that it didn't work. I overloaded the += operator for all commands so that when A and B are both commands, A += B pushes B into A's command vector. _val isn't a Command, so the compiler didn't like this. I couldn't seem to tinker anything into a more workable state. If at all possible, this was the cleanest solution and I would love for this to be able to work.
3. The code in its current form has me attempting to bind the actions together. If I had a member function pointer bound to _val and passed it the created command, it would push it back. Again, _val isn't actually a Command, so that didn't work out.
I'm going to post this wall of code, it's the grammar I've written cleaned up a bit, as well as the point where it is invoked.
template <typename Iterator>
struct roster_grammar : qi::grammar<Iterator, qi::space_type, T3_Command()>
{
//roster_grammar constructor
roster_grammar() :
roster_grammar::base_type(start_)
{
using qi::lit;
using qi::int_;
using qi::char_;
using qi::lexeme;
start_ = student[boost::bind(&T3_Command::add_command, qi::_val, _1)];
//I removed the roster for the time being to simplify the grammar
//it still contains my second solution that I detailed above. This
//would be my ideal solution if it could work this way.
//roster = *(student[qi::_val += _1]);
student =
qi::eps [ boost::bind(&T3_Command::set_identity, qi::_val, "Student") ]
>>
int_age [ boost::bind(&T3_Command::add_command, qi::_val, _1) ]
>>
string_name [ boost::bind(&T3_Command::add_command, qi::_val, _1) ];
int_age =
int_ [ boost::bind(&Command_Factory::create_int_comm, &cmd_creator, "Age", _1) ];
string_name =
string_p [ boost::bind(&Command_Factory::create_string_comm, &cmd_creator, "Name", _1) ];
//The string parser. Returns type std::string
string_p %= +qi::alnum;
}
qi::rule<Iterator, qi::space_type, T3_Model_Command()> roster;
qi::rule<Iterator, qi::space_type, T3_Atom_Command()> student;
qi::rule<Iterator, qi::space_type, T3_Int_Command()> int_age;
qi::rule<Iterator, qi::space_type, T3_String_Command()> string_name;
qi::rule<Iterator, qi::space_type, T3_Command()> start_;
qi::rule<Iterator, std::string()> string_p;
Command_Factory cmd_creator;
};
This is how the grammar is being instantiated and used.
typedef boost::spirit::istream_iterator iter_type;
typedef roster_grammar<iter_type> student_p;
student_p my_parser;
//open the target file and wrap istream into the iterator
std::ifstream in = std::ifstream(path);
in.unsetf(std::ios::skipws);//Disable Whitespace Skipping
iter_type begin(in);
iter_type end;
using boost::spirit::qi::space;
using boost::spirit::qi::phrase_parse;
bool r = phrase_parse(begin, end, my_parser, space);
So, long story short, I have a grammar that I want to build commands out of (called T3_Command). Commands have a std::vector data member that holds other commands beneath it in the tree.
What I need is a clean way to create a Command as a semantic action, and I need to be able to load that into the vector of other commands (by way of attributes or just straight function calls). Commands have a type that is supposed to be specified at creation (it will define the type of tree node it makes) and some commands have a data value (an int, string or float, all named value in their respective commands).
Or if there might be a better way to build a tree, I'd be open to suggestions.
Thank you so much for any help you're able to give!
EDIT:
I'll try to be more clear about the original problem I'm trying to solve. Thanks for the patience.
Given that grammar (or any grammar actually) I want to be able to parse through it and create a command tree based on the semantic actions taken within the parser.
So using my sample grammar, and the input
"23 Bryan 45 Tyler 4 Stephen"
I would like the final tree to result in the following data structure.
Command with type = "Roster" holding 3 "Student" type commands.
Command with type = "Student" each holding an Int_Command and a String_Command
Int_Command holds the stored integer and String_Command the stored string.
E.g.
r1 - Roster - [s1][s2][s3]
s1 - Student - [int 23][string Bryan]
s2 - Student - [int 45][string Tyler]
s3 - Student - [int 4][string Stephen]
This is the current structure of the commands I've written (The implementation is all trivial).
class T3_Command
{
public:
T3_Command(void);
T3_Command(const std::string &type);
~T3_Command(void);
//Executes this command and all subsequent commands in the command vector.
void Execute(/*const Folder_in parent AND const Model_in parent*/);
//Pushes the passed T3_Command into the Command Vector
//@param comm - The command to be pushed.
void add_command(const T3_Command &comm);
//Sets the Identity of the command.
//@param ID - the new identity to be set.
void set_identity(std::string &ID);
private:
const std::string ident;
std::vector <T3_Command> command_vec;
T3_Command& operator+=(const T3_Command& rhs);
};
#pragma once
#include "T3_command.h"
class T3_Int_Command :
public T3_Command
{
public:
T3_Int_Command();
T3_Int_Command(const std::string &type, const int val);
~T3_Int_Command(void);
void Execute();
void setValue(int val);
private:
int value;
};
So the problem I am having is I would like to be able to create a data structure of various commands that represent the parse tree as spirit parses through it.
Updated in response to the edited question
Though there's still a lot of information missing (see my [new comment]), at least now you showed some input and output :)
So, without further ado, let me interpret those:
you still want to just parse (int, string) pairs, but per line
use qi::blank_type as a skipper
do roster % eol to parse roster lines
my sample parses into a vector of Rosters (one per line)
each roster contains a variable number of Students:
start = roster % eol;
roster = +student;
student = int_ >> string_p;
Note: Rule #1 Don't complicate your parser unless you really have to
you want to output the individual elements ("commands"?!?) - I'm assuming the part where this would be non-trivial is the part where the same Student might appear in several rosters?
By defining a total ordering on Students:
bool operator<(Student const& other) const {
return boost::tie(i,s) < boost::tie(other.i, other.s);
}
you make it possible to store a unique collection of students in e.g. a std::set<Student>
perhaps generating the 'variable names' (I mean r1, s1, s2...) is part of the task as well. So, to establish a unique 'variable name' with each student I create a bi-directional map of Students (after parsing, see Rule #1: don't complicate the parser unless it's absolutely necessary):
boost::bimap<std::string, Student> student_vars;
auto generate_id = [&] () { return "s" + std::to_string(student_vars.size()+1); };
for(Roster const& r: data)
for(Student const& s: r.students)
student_vars.insert({generate_id(), s});
That's about everything I can think of here. I used c++11 and boost liberally here to save on lines-of-code, but writing this without c++11/boost would be fairly trivial too. C++03 version online now
The following sample input:
ParsedT3Data const data = parseData(
"23 Bryan 45 Tyler 4 Stephen\n"
"7 Mary 45 Tyler 8 Stephane\n"
"23 Bryan 8 Stephane");
Results in (See it Live On Coliru):
parse (partial) success
s1 - Student - [int 23][string Bryan]
s2 - Student - [int 45][string Tyler]
s3 - Student - [int 4][string Stephen]
s4 - Student - [int 7][string Mary]
s5 - Student - [int 8][string Stephane]
r1 [s1][s2][s3]
r2 [s4][s2][s5]
r3 [s1][s5]
Full code:
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/tuple/tuple_comparison.hpp>
#include <boost/bimap.hpp>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
struct Student
{
int i;
std::string s;
bool operator<(Student const& other) const {
return boost::tie(i,s) < boost::tie(other.i, other.s);
}
friend std::ostream& operator<<(std::ostream& os, Student const& o) {
return os << "Student - [int " << o.i << "][string " << o.s << "]";
}
};
struct Roster
{
std::vector<Student> students;
};
BOOST_FUSION_ADAPT_STRUCT(Student, (int, i)(std::string, s))
BOOST_FUSION_ADAPT_STRUCT(Roster, (std::vector<Student>, students))
typedef std::vector<Roster> ParsedT3Data;
template <typename Iterator>
struct roster_grammar : qi::grammar<Iterator, ParsedT3Data(), qi::blank_type>
{
roster_grammar() :
roster_grammar::base_type(start)
{
using namespace qi;
start = roster % eol;
roster = eps >> +student; // known workaround
student = int_ >> string_p;
string_p = lexeme[+(graph)];
BOOST_SPIRIT_DEBUG_NODES((start)(roster)(student)(string_p))
}
qi::rule <Iterator, ParsedT3Data(), qi::blank_type> start;
qi::rule <Iterator, Roster(), qi::blank_type> roster;
qi::rule <Iterator, Student(), qi::blank_type> student;
qi::rule <Iterator, std::string()> string_p;
};
ParsedT3Data parseData(std::string const& demoData)
{
typedef boost::spirit::istream_iterator iter_type;
typedef roster_grammar<iter_type> student_p;
student_p my_parser;
//open the target file and wrap istream into the iterator
std::istringstream iss(demoData);
iss.unsetf(std::ios::skipws);//Disable Whitespace Skipping
iter_type begin(iss), end;
ParsedT3Data result;
bool r = phrase_parse(begin, end, my_parser, qi::blank, result);
if (r)
std::cout << "parse (partial) success\n";
else
std::cerr << "parse failed: '" << std::string(begin,end) << "'\n";
if (begin!=end)
std::cerr << "trailing unparsed: '" << std::string(begin,end) << "'\n";
if (!r)
throw "TODO error handling";
return result;
}
int main()
{
ParsedT3Data const data = parseData(
"23 Bryan 45 Tyler 4 Stephen\n"
"7 Mary 45 Tyler 8 Stephane\n"
"23 Bryan 8 Stephane");
// now produce that list of stuff :)
boost::bimap<std::string, Student> student_vars;
auto generate_id = [&] () { return "s" + std::to_string(student_vars.size()+1); };
for(Roster const& r: data)
for(Student const& s: r.students)
student_vars.insert({generate_id(), s});
for(auto const& s: student_vars.left)
std::cout << s.first << " - " << s.second << "\n";
int r_id = 1;
for(Roster const& r: data)
{
std::cout << "r" << (r_id++) << " ";
for(Student const& s: r.students)
std::cout << "[" << student_vars.right.at(s) << "]";
std::cout << "\n";
}
}
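The ID assignment in main does not strictly need a bimap if you only look names up by student. As a hedged sketch using just the standard library (assign_ids is a hypothetical helper, and a roster is simplified to a plain vector of students), the same "first appearance gets the next number" logic looks like this:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

struct Student {
    int i;
    std::string s;
    // Total ordering, so that Student can be used as a map key.
    bool operator<(const Student& other) const {
        return std::tie(i, s) < std::tie(other.i, other.s);
    }
};

// Gives every distinct student a name s1, s2, ... in order of first appearance.
std::map<Student, std::string>
assign_ids(const std::vector<std::vector<Student>>& rosters) {
    std::map<Student, std::string> ids;
    for (const auto& roster : rosters)
        for (const auto& student : roster)
            if (ids.find(student) == ids.end()) {
                const std::string id = "s" + std::to_string(ids.size() + 1);
                ids.emplace(student, id);
            }
    return ids;
}
```

This only supports the student-to-name direction. If you also need to go from a name back to the student, the bimap used above (or a second map) is the more natural fit.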
OLD ANSWER
I'll respond to individual points, while awaiting more information:
1. "The issue with this is it nests the vectors into std::vector<std::vector<...>> type data, which I couldn't work with"
A solution here would be
boost::container::vector<>, which allows incomplete element types at time of instantiation (Boost Containers have several other nifty properties, go read about them!)
boost::variant with recursive_wrapper<> so you can indeed make logical trees. I have many answers in the boost-spirit and boost-spirit-qi tags that show this approach (e.g. for expression trees).
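To make the "container of incomplete element type" idea concrete, here is a minimal sketch of a recursive node type. It is not the recursive_wrapper approach and it does not use Spirit: Node and build_roster are hypothetical names, plain stream extraction stands in for the grammar, and the struct relies on std::vector accepting an incomplete element type (guaranteed since C++17).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node type: a label, an optional payload, and child nodes.
// Node is still incomplete where `children` is declared.
struct Node {
    std::string type;
    std::string value;
    std::vector<Node> children;
};

// Builds Roster -> Student -> {Age, Name} from input like "23 Bryan 45 Tyler".
Node build_roster(const std::string& input) {
    Node roster{"Roster", "", {}};
    std::istringstream in(input);
    int age;
    std::string name;
    while (in >> age >> name) {
        Node student{"Student", "", {}};
        student.children.push_back(Node{"Age", std::to_string(age), {}});
        student.children.push_back(Node{"Name", name, {}});
        roster.children.push_back(student);
    }
    return roster;
}
```

With a value-type tree like this, a parent does not need to exist before its children are created, which sidesteps the ordering problem that motivated the command pattern in the question.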
2. Calling factory methods from semantic actions
I have a few minor hints:
you can use qi::_1, qi::_2... to refer to the elements of a compound attribute
you should prefer using phoenix::bind inside Phoenix actors (semantic actions are Phoenix actors)
you can assign to qi::_pass to indicate parser failure
Here's a simplified version of the grammar, which shows these in action. I haven't actually built a tree, since you didn't describe any of the desired behaviour. Instead, I just print a debug line on adding nodes to the tree.
See it Live on Coliru
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <fstream>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
struct T3_Command
{
bool add_command(int i, std::string const& s)
{
std::cout << "adding command [" << i << ", " << s << "]\n";
return i != 42; // just to show how you can do input validation
}
};
template <typename Iterator>
struct roster_grammar : qi::grammar<Iterator, T3_Command(), qi::space_type>
{
roster_grammar() :
roster_grammar::base_type(start_)
{
start_ = *(qi::int_ >> string_p)
[qi::_pass = phx::bind(&T3_Command::add_command, qi::_val, qi::_1, qi::_2)];
string_p = qi::lexeme[+(qi::graph)];
}
qi::rule <Iterator, T3_Command(), qi::space_type> start_;
qi::rule <Iterator, std::string()> string_p;
};
int main()
{
typedef boost::spirit::istream_iterator iter_type;
typedef roster_grammar<iter_type> student_p;
student_p my_parser;
//open the target file and wrap istream into the iterator
std::ifstream in("input.txt");
in.unsetf(std::ios::skipws);//Disable Whitespace Skipping
iter_type begin(in);
iter_type end;
using boost::spirit::qi::space;
using boost::spirit::qi::phrase_parse;
bool r = phrase_parse(begin, end, my_parser, space);
if (r)
std::cout << "parse (partial) success\n";
else
std::cerr << "parse failed: '" << std::string(begin,end) << "'\n";
if (begin!=end)
std::cerr << "trailing unparsed: '" << std::string(begin,end) << "'\n";
return r?0:255;
}
Input:
1 klaas-jan
2 doeke-jan
3 jan-herbert
4 taeke-jan
42 oops-invalid-number
5 not-parsed
Output:
adding command [1, klaas-jan]
adding command [2, doeke-jan]
adding command [3, jan-herbert]
adding command [4, taeke-jan]
adding command [42, oops-invalid-number]
parse (partial) success
trailing unparsed: '42 oops-invalid-number
5 not-parsed
'
I have some issues with parser writing with Spirit::Qi 2.4.
I have a series of key-value pairs to parse in following format <key name>=<value>.
Key name can be [a-zA-Z0-9] and is always followed by = sign with no white-space between key name and = sign. Key name is also always preceded by at least one space.
Value can be almost any C expression (spaces are possible as well), with the exception of the expressions containing = char and code blocks { }.
At the end of the sequence of the key value pairs there's a { sign.
I struggle a lot with writing a parser for this expression. Since the key name is always preceded by at least one space, followed by = and contains no spaces, I defined it as
KeyName %= [+char_("a-zA-Z0-9_") >> lit("=")] ;
Value can be almost anything, but it can not contain = nor { chars, so I defined it as:
Value %= +(char_ - char_("{=")) ;
I thought about using look-aheads like this to catch the value:
ValueExpression
%= (
Value
>> *space
>> &(KeyName | lit("{"))
)
;
But it won't work, for some reason (seems like the ValueExpression greedily goes up to the = sign and "doesn't know" what to do from there). I have limited knowledge of LL parsers, so I'm not really sure what's cooking here. Is there any other way I could tackle this kind of sequence?
Here's example series:
EXP1=FunctionCall(A, B, C) TEST="Example String" \
AnotherArg=__FILENAME__ - 'BlahBlah' EXP2= a+ b+* {
Additional info: since this is a part of a much larger grammar I can't really solve this problem any other way than by a Spirit.Qi parser (like splitting by '=' and doing some custom parsing or something similar).
Edit:
I've created minimum working example here: http://ideone.com/kgYD8
(compiled under VS 2012 with boost 1.50, but should be fine on older setups as well).
I'd suggest you have a look at the article Parsing a List of Key-Value Pairs Using Spirit.Qi.
I've greatly simplified your code, while
adding attribute handling
removing phoenix semantic actions
adding debugging of rules
Here it is, without further ado:
#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <map>
#include <string>
namespace qi = boost::spirit::qi;
namespace fusion = boost::fusion;
typedef std::map<std::string, std::string> data_t;
template <typename It, typename Skipper>
struct grammar : qi::grammar<It, data_t(), Skipper>
{
grammar() : grammar::base_type(Sequence)
{
using namespace qi;
KeyName = +char_("a-zA-Z0-9_") >> '=';
Value = qi::no_skip [+(~char_("={") - KeyName)];
Sequence = +(KeyName > Value);
BOOST_SPIRIT_DEBUG_NODE(KeyName);
BOOST_SPIRIT_DEBUG_NODE(Value);
BOOST_SPIRIT_DEBUG_NODE(Sequence);
}
private:
qi::rule<It, data_t(), Skipper> Sequence;
qi::rule<It, std::string()> KeyName; // no skipper, removes need for qi::lexeme
qi::rule<It, std::string(), Skipper> Value;
};
template <typename Iterator>
data_t parse (Iterator begin, Iterator end)
{
grammar<Iterator, qi::space_type> p;
data_t data;
if (qi::phrase_parse(begin, end, p, qi::space, data)) {
std::cout << "parse ok\n";
if (begin!=end) {
std::cout << "remaining: " << std::string(begin,end) << '\n';
}
} else {
std::cout << "failed: " << std::string(begin,end) << '\n';
}
return data;
}
int main ()
{
std::string test(" ARG=Test still in first ARG ARG2=Zombie cat EXP2=FunctionCall(A, B C) {" );
auto data = parse(test.begin(), test.end());
for (auto& e : data)
std::cout << e.first << "=" << e.second << '\n';
}
Output will be:
parse ok
remaining: {
ARG=Test still in first ARG
ARG2=Zombie cat
EXP2=FunctionCall(A, B C)
If you really wanted '{' to be part of the last value, change the Value rule to:
Value = qi::no_skip [+(char_ - KeyName)];
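The key idea in the Value rule is the "- KeyName" part: a value keeps consuming characters until the next key would start. Here is a hand-written sketch of that stopping condition, without Spirit. The names key_length and parse_pairs are hypothetical, values keep their trailing spaces, and an empty value simply ends the loop instead of raising an error, so treat those details as assumptions rather than a description of what the grammar does.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Length of a key such as "ARG2=" starting at `pos` (including '='), else 0.
std::size_t key_length(const std::string& in, std::size_t pos) {
    std::size_t i = pos;
    while (i < in.size() &&
           (std::isalnum(static_cast<unsigned char>(in[i])) || in[i] == '_'))
        ++i;
    return (i > pos && i < in.size() && in[i] == '=') ? i - pos + 1 : 0;
}

// Splits "K1=v1 K2=v2 {" into pairs; a value ends where the next key,
// an '=' or a '{' begins.
std::map<std::string, std::string> parse_pairs(const std::string& in) {
    std::map<std::string, std::string> data;
    std::size_t pos = 0;
    for (;;) {
        while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos])))
            ++pos;                                  // skip blanks before a key
        const std::size_t klen = key_length(in, pos);
        if (klen == 0) break;                       // no further key: stop
        const std::string key = in.substr(pos, klen - 1);
        pos += klen;

        const std::size_t start = pos;
        while (pos < in.size() && in[pos] != '=' && in[pos] != '{' &&
               key_length(in, pos) == 0)            // the "- KeyName" check
            ++pos;
        if (pos == start) break;                    // a value needs one char
        data.emplace(key, in.substr(start, pos - start));
    }
    return data;
}
```

Note how the second "ARG" in the sample input stays inside the first value: it is followed by a space rather than =, so the key check fails there and the value continues.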
I would like to write a boost::spirit parser that parses a simple string in double quotes that uses escaped double quotes, e.g. "a \"b\" c".
Here is what I tried:
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace client
{
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template <typename Iterator>
bool parse(Iterator first, Iterator last)
{
using qi::char_;
qi::rule< Iterator, std::string(), ascii::space_type > text;
qi::rule< Iterator, std::string() > content;
qi::rule< Iterator, char() > escChar;
text = '"' >> content >> '"';
content = +(~char_('"') | escChar);
escChar = '\\' >> char_("\"");
bool r = qi::phrase_parse(first, last, text, ascii::space);
if (first != last) // fail if we did not get a full match
return false;
return r;
}
}
int main() {
std::string str = "\"a \\\"b\\\" c\"";
if (client::parse(str.begin(), str.end()))
std::cout << str << " Parses OK: " << std::endl;
else
std::cout << "Fail\n";
return 0;
}
It follows the example on Parsing escaped strings with boost spirit, but the output is "Fail". How can I get it to work?
Been a while since I had a go at spirit, but I think one of your rules is the wrong way round.
Try:
content = +(escChar | ~char_('"'));
instead of:
content = +(~char_('"') | escChar);
It is matching your \ using ~char_('"') and therefore never gets around to checking whether escChar matches. It then reads the next " as the end of the string and stops parsing.
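The effect of the ordering is easy to see in a hand-written loop. The following is only a sketch without Spirit (parse_quoted is a hypothetical name, and it assumes the whole input is one quoted string): the escape branch is tried before the "any character except a quote" branch, just like in the corrected rule.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Parses a double-quoted string with \" escapes; on success stores the
// unescaped content in `out`.
bool parse_quoted(const std::string& in, std::string& out) {
    if (in.size() < 2 || in.front() != '"') return false;
    std::string content;
    std::size_t i = 1;
    while (i < in.size()) {
        if (in[i] == '\\' && i + 1 < in.size() && in[i + 1] == '"') {
            content += '"';      // first alternative: escaped quote
            i += 2;
        } else if (in[i] != '"') {
            content += in[i];    // second alternative: any other character
            ++i;
        } else {
            break;               // unescaped quote ends the content
        }
    }
    // Require some content (like the + in the rule) and a closing quote
    // that is the last character of the input (full match).
    if (content.empty() || i + 1 != in.size() || in[i] != '"') return false;
    out = content;
    return true;
}
```

If you swap the two branches, the backslash is consumed as an ordinary character, the following quote ends the content early, and the full-match check fails, which is the "Fail" seen in the question.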