I'm having trouble writing a Qi grammar which utilizes another Qi grammar. A similar question was asked here, but I'm also trying to use phoenix::construct and having compilation difficulties.
Here's a simplified version of what I'm trying to do. I realize that this example could probably be done easily using BOOST_FUSION_ADAPT_STRUCT, but my actual code deals with more complex object types so I'm hoping there's a way to accomplish this using semantic actions.
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_container.hpp>
#include <boost/spirit/include/phoenix_statement.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <complex>
#include <cstdlib>
#include <string>
#include <iostream>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
// grammar for real numbers
template <typename Iterator>
struct Real : qi::grammar<Iterator, long double()>
{
qi::rule<Iterator, long double()> r;
Real() : Real::base_type(r)
{
r %= qi::long_double;
}
};
// grammar for complex numbers of the form a+bi
template <typename Iterator>
struct Complex : qi::grammar<Iterator, std::complex<long double>()>
{
qi::rule<Iterator, std::complex<long double>()> r;
Real<Iterator> real;
Complex() : Complex::base_type(r)
{
r = real [qi::_a = qi::_1] >> (qi::lit("+") | qi::lit("-"))
>> real [qi::_b = qi::_1] >> -qi::lit("*") >> qi::lit("i")
[
qi::_val = phx::construct<std::complex<long double> >(qi::_a, qi::_b)
];
}
};
int main()
{
// test real parsing
std::cout << "Parsing '3'" << std::endl;
std::string toParse = "3";
Real<std::string::iterator> real_parser;
long double real_val;
std::string::iterator beginIt = toParse.begin();
std::string::iterator endIt = toParse.end();
bool r = qi::parse(beginIt, endIt, real_parser, real_val);
if(r && beginIt == endIt)
std::cout << "Successful parse: " << real_val << std::endl;
else
std::cout << "Could not parse" << std::endl;
// test complex parsing
std::cout << "Parsing '3+4i'" << std::endl;
toParse = "3+4i";
Complex<std::string::iterator> complex_parser;
std::complex<long double> complex_val;
beginIt = toParse.begin();
endIt = toParse.end();
r = qi::parse(beginIt, endIt, complex_parser, complex_val);
if(r && beginIt == endIt)
std::cout << "Successful parse: " << complex_val << std::endl;
else
std::cout << "Could not parse" << std::endl;
}
I'm able to parse a Complex using the phrase_parse approach demonstrated in Spirit's documentation, but I'd like to be able to easily integrate the Complex grammar into other parsers (an expression parser, for instance). Is there something I'm missing that would allow me to parse Real and Complex objects as distinct entities while still being able to effectively use them in other rules/grammars?
qi::_a and qi::_b represent the first and second local variables of a rule. They are only available if you add qi::locals<long double, long double> as a template parameter in the declaration of rule r (and, in this case, also to qi::grammar, since the start rule passed to the grammar's constructor must be compatible with the grammar, i.e. have the same template parameters).
Below you can see another alternative without the need for the local variables:
// grammar for complex numbers of the form a+bi
template <typename Iterator>
struct Complex : qi::grammar<Iterator, std::complex<long double>()>
{
qi::rule<Iterator, std::complex<long double>()> r;
Real<Iterator> real;
Complex() : Complex::base_type(r)
{
r = (
real >> (qi::lit("+") | qi::lit("-"))
>> real >> -qi::lit("*") >> qi::lit("i")
)
[
qi::_val = phx::construct<std::complex<long double> >(qi::_1, qi::_2)
];
}
};
In this case the semantic action is attached to the whole parser sequence and we can get the attributes we need with the _N placeholders. Here, qi::_1 refers to the attribute matched by the first Real parser, and qi::_2 to the second one.
Using any of the alternatives we can then use those grammars normally:
//using complex_parser, real_parser, complex_val and real_val declared in your code
std::cout << "Parsing 'variable=3+4i-2'" << std::endl;
toParse = "variable=3+4i-2";
beginIt = toParse.begin();
endIt = toParse.end();
std::string identifier;
r = qi::parse(beginIt, endIt, *qi::char_("a-z") >> '=' >> complex_parser >> '-' >> real_parser, identifier, complex_val, real_val);
if(r && beginIt == endIt)
std::cout << "Successful parse: " << identifier << complex_val.real() << " " << complex_val.imag() << " " << real_val << std::endl;
else
std::cout << "Could not parse" << std::endl;
I am currently implementing a parser for spirit::qi which succeeds on the "strongest" match. There are meaningful applications for such a thing, e.g. matching references to either simple refs (e.g. "willy") or namespace-qualified refs (e.g. "willy::anton"). That's not my actual real-world case, but it is almost self-explanatory, I guess. At least it helped me to track down the issue.
I found a solution for that. It works perfectly when the skipper parser is not involved (i.e. there is nothing to skip). It does not work as expected if there are areas which need skipping.
I believe I have tracked down the problem. It seems that under certain conditions spaces are not skipped although they should be.
Below you will find a self-contained working example. It loops over some rules and some inputs to provide enough information. If you run it with BOOST_SPIRIT_DEBUG enabled, you get in particular this output:
<qualifier>
<try> :: anton</try>
<fail/>
</qualifier>
I think, this one should not have failed. Am I right guessing so? Does anyone know a way to get around that? Or is it just my poor understanding of qi semantics? Thank you very much for your time. :)
My environment: MSVC 2015 latest, target win32 console
#define BOOST_SPIRIT_DEBUG
#include <iostream>
#include <map>
#include <string>
#include <boost/spirit/include/qi.hpp>
typedef std::string::const_iterator iterator_type;
namespace qi = boost::spirit::qi;
using map_type = std::map<std::string, qi::rule<iterator_type, std::string()>&>;
namespace maxence { namespace parser {
template <typename Iterator>
struct ident : qi::grammar<Iterator, std::string()>
{
ident();
qi::rule<Iterator, std::string()>
id, id_raw;
qi::rule<Iterator, std::string()>
not_used,
qualifier,
qualified_id, simple_id,
id_reference, id_reference_final;
map_type rules = {
{ "E1", id },
{ "E2", id_raw}
};
};
template <typename Iterator>
// we actually don't need the start rule (see below)
ident<Iterator>::ident() : ident::base_type(not_used)
{
id_reference = (!simple_id >> qualified_id) | (!qualified_id >> simple_id);
id_reference_final = id_reference;
///////////////////////////////////////////////////
// standard simple id (not followed by
// delimiter "::")
simple_id = (qi::alpha | '_') >> *(qi::alnum | '_') >> !qi::lit("::");
///////////////////////////////////////////////////
// this is qualifier <- "::" simple_id
// I repeat the simple_id pattern here to make sure
// this demo has no "early match" issues
qualifier = qi::string("::") > (qi::alpha | '_') >> *(qi::alnum | '_');
///////////////////////////////////////////////////
// this is: qualified_id <- simple_id qualifier*
qualified_id = (qi::alpha | '_') >> *(qi::alnum | '_') >> +(qualifier) >> !qi::lit("::");
id = id_reference_final;
id_raw = qi::raw[id_reference_final];
BOOST_SPIRIT_DEBUG_NODES(
(id)
(id_raw)
(qualifier)
(qualified_id)
(simple_id)
(id_reference)
(id_reference_final)
)
}
}}
int main()
{
maxence::parser::ident<iterator_type> ident;
using ss_map_type = std::map<std::string, std::string>;
ss_map_type parser_input =
{
{ "Simple id (behaves ok)", "willy" },
{ "Qualified id (behaves ok)", "willy::anton" },
{ "Skipper involved (unexpected)", "willy :: anton" }
};
for (ss_map_type::const_iterator input = parser_input.begin(); input != parser_input.end(); input++) {
for (map_type::const_iterator example = ident.rules.begin(); example != ident.rules.end(); example++) {
std::string to_parse = input->second;
std::string result;
std::string parser_name = (example->second).name();
std::cout << "--------------------------------------------" << std::endl;
std::cout << "Description: " << input->first << std::endl;
std::cout << "Parser [" << parser_name << "] parsing [" << to_parse << "]" << std::endl;
iterator_type b(to_parse.begin()), e(to_parse.end());
// --- test for parser success
bool success = qi::phrase_parse(b, e, (example)->second, qi::space, result);
if (success) std::cout << "Parser succeeded. Result: " << result << std::endl;
else std::cout << " Parser failed. " << std::endl;
//--- test for EOI
if (b == e) {
std::cout << "EOI reached.";
if (success) std::cout << " The sun is shining brightly. :)";
} else {
std::cout << "Failure: EOI not reached. Remaining: [";
while (b != e) std::cout << *b++; std::cout << "]";
}
std::cout << std::endl << "--------------------------------------------" << std::endl;
}
}
return 0;
}
I have a working boost spirit parser and was thinking if it is possible to do iterative update of an abstract syntax tree with boost spirit?
I have a struct similar to:
struct ast;
typedef boost::variant< boost::recursive_wrapper<ast> > node;
struct ast
{
std::vector<int> value;
std::vector<node> children;
};
Which is being parsed by use of:
bool r = phrase_parse(begin, end, grammar, space, ast);
Would it be possible to do iterative update of abstract syntax tree with boost spirit? I have not found any documentation on this, but I was thinking if the parsers semantic actions could push_back on an already existing AST. Has anyone tried this?
This would allow for parsing like this:
bool r = phrase_parse(begin, end, grammar, space, ast); //initial parsing
//the second parse will be called at a later state given some event/timer/io/something
bool r = phrase_parse(begin, end, grammar, space, ast); //additional parsing which will update the already existing AST
How would you know which nodes to merge? Or would you always add ("graft") at the root level? In that case, why don't you just parse another one and merge it by moving its elements into the existing ast?
ast& operator+=(ast&& other) {
std::move(other.value.begin(), other.value.end(), back_inserter(value));
std::move(other.children.begin(), other.children.end(), back_inserter(children));
return *this;
}
Demo Time
Let's devise the simplest grammar I can think of for this AST:
start = '{' >> -(int_ % ',') >> ';' >> -(start % ',') >> '}';
Note I didn't even make the ; optional. Oh well. Samples. Exercises for readers. ☡ You know the drill.
We implement the trivial function ast parse(It f, It l), and then we can simply merge the asts:
int main() {
ast merged;
for(std::string const& input : {
"{1 ,2 ,3 ;{4 ;{9 , 8 ;}},{5 ,6 ;}}",
"{10,20,30;{40;{90, 80;}},{50,60;}}",
})
{
merged += parse(input.begin(), input.end());
std::cout << "merged + " << input << " --> " << merged << "\n";
}
}
//#define BOOST_SPIRIT_DEBUG
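#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/fusion/adapted/struct.hpp>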
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
struct ast;
//typedef boost::make_recursive_variant<boost::recursive_wrapper<ast> >::type node;
typedef boost::variant<boost::recursive_wrapper<ast> > node;
struct ast {
std::vector<int> value;
std::vector<node> children;
ast& operator+=(ast&& other) {
std::move(other.value.begin(), other.value.end(), back_inserter(value));
std::move(other.children.begin(), other.children.end(), back_inserter(children));
return *this;
}
};
BOOST_FUSION_ADAPT_STRUCT(ast,
(std::vector<int>,value)
(std::vector<node>,children)
)
template <typename It, typename Skipper = qi::space_type>
struct grammar : qi::grammar<It, ast(), Skipper>
{
grammar() : grammar::base_type(start) {
using namespace qi;
start = '{' >> -(int_ % ',') >> ';' >> -(start % ',') >> '}';
BOOST_SPIRIT_DEBUG_NODES((start));
}
private:
qi::rule<It, ast(), Skipper> start;
};
// for output:
static inline std::ostream& operator<<(std::ostream& os, ast const& v) {
using namespace karma;
rule<boost::spirit::ostream_iterator, ast()> r;
r = '{' << -(int_ % ',') << ';' << -((r|eps) % ',') << '}';
return os << format(r, v);
}
template <typename It> ast parse(It f, It l)
{
ast parsed;
static grammar<It> g;
bool ok = qi::phrase_parse(f,l,g,qi::space,parsed);
if (!ok || (f!=l)) {
std::cout << "Parse failure\n";
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
exit(255);
}
return parsed;
}
int main() {
ast merged;
for(std::string const& input : {
"{1 ,2 ,3 ;{4 ;{9 , 8 ;}},{5 ,6 ;}}",
"{10,20,30;{40;{90, 80;}},{50,60;}}",
})
{
merged += parse(input.begin(), input.end());
std::cout << "merged + " << input << " --> " << merged << "\n";
}
}
Of course, it prints:
merged + {1 ,2 ,3 ;{4 ;{9 , 8 ;}},{5 ,6 ;}} --> {1,2,3;{4;{9,8;}},{5,6;}}
merged + {10,20,30;{40;{90, 80;}},{50,60;}} --> {1,2,3,10,20,30;{4;{9,8;}},{5,6;},{40;{90,80;}},{50,60;}}
UPDATE
In this - trivial - example, you can just bind the collections to the attributes in the parse call. The same thing will happen without the operator+= call needed to move the elements, because the rules are written to automatically append to the bound container attribute.
CAVEAT: A distinct disadvantage of modifying the target value in-place is what happens if parsing fails. In this version the merged value will then be "undefined" (it has received partial information from the failed parse).
So if you want to parse inputs "atomically", the first, more explicit approach is a better fit.
So the following is a slightly shorter way to write the same:
// #define BOOST_SPIRIT_DEBUG
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
struct ast;
//typedef boost::make_recursive_variant<boost::recursive_wrapper<ast> >::type node;
typedef boost::variant<boost::recursive_wrapper<ast> > node;
struct ast {
std::vector<int> value;
std::vector<node> children;
};
BOOST_FUSION_ADAPT_STRUCT(ast,
(std::vector<int>,value)
(std::vector<node>,children)
)
template <typename It, typename Skipper = qi::space_type>
struct grammar : qi::grammar<It, ast(), Skipper>
{
grammar() : grammar::base_type(start) {
using namespace qi;
start = '{' >> -(int_ % ',') >> ';' >> -(start % ',') >> '}';
BOOST_SPIRIT_DEBUG_NODES((start));
}
private:
qi::rule<It, ast(), Skipper> start;
};
// for output:
static inline std::ostream& operator<<(std::ostream& os, ast const& v) {
using namespace karma;
rule<boost::spirit::ostream_iterator, ast()> r;
r = '{' << -(int_ % ',') << ';' << -((r|eps) % ',') << '}';
return os << format(r, v);
}
template <typename It> void parse(It f, It l, ast& into)
{
static grammar<It> g;
bool ok = qi::phrase_parse(f,l,g,qi::space,into);
if (!ok || (f!=l)) {
std::cout << "Parse failure\n";
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
exit(255);
}
}
int main() {
ast merged;
for(std::string const& input : {
"{1 ,2 ,3 ;{4 ;{9 , 8 ;}},{5 ,6 ;}}",
"{10,20,30;{40;{90, 80;}},{50,60;}}",
})
{
parse(input.begin(), input.end(), merged);
std::cout << "merged + " << input << " --> " << merged << "\n";
}
}
This still prints the same output as above.
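The caveat about failed parses can be illustrated without Spirit. In this minimal sketch a toy parser (the hypothetical parse_ints, which is not part of the code above) appends to its output as it goes, much like a rule bound to a container attribute. Parsing into a temporary and committing only on success keeps the target unchanged when the input is bad. Both function names are made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for a parser (hypothetical, not Spirit): appends
// comma-separated ints to `out` as it goes and returns false on the first
// malformed token. On failure, `out` keeps whatever was appended so far.
bool parse_ints(const std::string& input, std::vector<int>& out) {
    std::istringstream in(input);
    std::string token;
    while (std::getline(in, token, ',')) {
        std::istringstream field(token);
        int value;
        char extra;
        // Reject tokens that are not an int or that have trailing garbage.
        if (!(field >> value) || (field >> extra)) return false;
        out.push_back(value);
    }
    return true;
}

// "Atomic" variant: parse into a temporary, merge only if everything parsed.
bool parse_ints_atomic(const std::string& input, std::vector<int>& target) {
    std::vector<int> parsed;
    if (!parse_ints(input, parsed)) return false;  // target left untouched
    target.insert(target.end(), parsed.begin(), parsed.end());
    return true;
}
```

Calling parse_ints on "1,2,x,4" leaves 1 and 2 behind in the target even though the parse failed, while parse_ints_atomic leaves the target exactly as it was. That is the same trade-off as between the in-place version and the operator+= version.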
In my Boost Spirit grammar I would like to have a rule that does this:
rule<...> noCaseLit = no_case[ lit( "KEYWORD" ) ];
but for a custom keyword so that I can do this:
... >> noCaseLit( "SomeSpecialKeyword" ) >> ... >> noCaseLit( "OtherSpecialKeyword1" )
Is this possible with Boost Spirit rules and if so how?
P.S. I use the case insensitive thing as an example, what I'm after is rule parameterization in general.
Edits:
Through the link provided by 'sehe' in the comments I was able to come close to what I wanted but I'm not quite there yet.
/* Defining the noCaseLit rule */
rule<Iterator, string(string)> noCaseLit = no_case[lit(_r1)];
/* Using the noCaseLit rule */
rule<...> someRule = ... >> noCaseLit(phx::val("SomeSpecialKeyword")) >> ...
I haven't yet figured out a way to automatically convert the literal string to the Phoenix value so that I can use the rule like this:
rule<...> someRule = ... >> noCaseLit("SomeSpecialKeyword") >> ...
The easiest way is to simply create a function that returns your rule/parser. In the example near the end of this page you can find a way to declare the return value of your function. (The same here in a commented example).
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace ascii = boost::spirit::ascii;
namespace qi = boost::spirit::qi;
typedef boost::proto::result_of::deep_copy<
BOOST_TYPEOF(ascii::no_case[qi::lit(std::string())])
>::type nocaselit_return_type;
nocaselit_return_type nocaselit(const std::string& keyword)
{
return boost::proto::deep_copy(ascii::no_case[qi::lit(keyword)]);
}
//C++11 VERSION EASIER TO MODIFY (AND DOESN'T REQUIRE THE TYPEDEF)
//auto nocaselit(const std::string& keyword) -> decltype(boost::proto::deep_copy(ascii::no_case[qi::lit(keyword)]))
//{
// return boost::proto::deep_copy(ascii::no_case[qi::lit(keyword)]);
//}
int main()
{
std::string test1="MyKeYWoRD";
std::string::const_iterator iter=test1.begin();
std::string::const_iterator end=test1.end();
if(qi::parse(iter,end,nocaselit("mYkEywOrd"))&& (iter==end))
std::cout << "Parse 1 Successful" << std::endl;
else
std::cout << "Parse 1 Failed. Remaining: " << std::string(iter,end) << std::endl;
qi::rule<std::string::const_iterator,ascii::space_type> myrule =
*(
( nocaselit("double") >> ':' >> qi::double_ )
| ( nocaselit("keyword") >> '-' >> *(qi::char_ - '.') >> '.')
);
std::string test2=" DOUBLE : 3.5 KEYWORD-whatever.Double :2.5";
iter=test2.begin();
end=test2.end();
if(qi::phrase_parse(iter,end,myrule,ascii::space)&& (iter==end))
std::cout << "Parse 2 Successful" << std::endl;
else
std::cout << "Parse 2 Failed. Remaining: " << std::string(iter,end) << std::endl;
return 0;
}
template <typename Iterator>
struct parse_grammar
: qi::grammar<Iterator, std::string()>
{
parse_grammar()
: parse_grammar::base_type(start_p, "start_p"){
a_p = ',' > qi::double_;
b_p = *a_p;
start_p = qi::double_ > b_p >> qi::eoi;
}
qi::rule<Iterator, std::string()> a_p;
qi::rule<Iterator, std::string()> b_p;
qi::rule<Iterator, std::string()> start_p;
};
// implementation
std::vector<double> parse(std::istream& input, const std::string& filename)
{
// iterate over stream input
typedef std::istreambuf_iterator<char> base_iterator_type;
base_iterator_type in_begin(input);
// convert input iterator to forward iterator, usable by spirit parser
typedef boost::spirit::multi_pass<base_iterator_type> forward_iterator_type;
forward_iterator_type fwd_begin = boost::spirit::make_default_multi_pass(in_begin);
forward_iterator_type fwd_end;
// prepare output
std::vector<double> output;
// wrap forward iterator with position iterator, to record the position
typedef classic::position_iterator2<forward_iterator_type> pos_iterator_type;
pos_iterator_type position_begin(fwd_begin, fwd_end, filename);
pos_iterator_type position_end;
parse_grammar<pos_iterator_type> gram;
// parse
try
{
qi::phrase_parse(
position_begin, position_end, // iterators over input
gram, // recognize list of doubles
ascii::space); // comment skipper
}
catch(const qi::expectation_failure<pos_iterator_type>& e)
{
const classic::file_position_base<std::string>& pos = e.first.get_position();
std::stringstream msg;
msg <<
"parse error at file " << pos.file <<
" line " << pos.line << " column " << pos.column << std::endl <<
"'" << e.first.get_currentline() << "'" << std::endl <<
" " << "^- here";
throw std::runtime_error(msg.str());
}
// return result
return output;
}
The sample code above is adapted from an example on the Boost.Spirit website.
In rule a_p of the grammar, I want to use a semantic action that calls a method and passes the iterators to it, something like this:
a_p = ',' > qi::double_[boost::bind(&parse_grammar::doStuff, this,
boost::ref(position_begin), boost::ref(position_end))];
where the signature of the method doStuff is:
void doStuff(pos_iterator_type const& first, pos_iterator_type const& last);
Any ideas how to do this?
I do not mind how it is done (perhaps with boost::phoenix, though I am not sure how), as long as the iterators are passed to the method with their current state.
I'm not completely sure why you think you 'need' what you describe. I'm afraid the solution to your actual task might be very simple:
start_p = qi::double_ % ',' > qi::eoi;
However, since the actual question is quite interesting, and the use of position iterators in combination with istreambuf_iterator (rather than just the usual (slower) boost::spirit::istream_iterator) has its merit, I'll show you how to do it with the semantic action as well.
For a simple (but rather complete) test main of
int main()
{
std::istringstream iss(
"1, -3.4 ,3.1415926\n"
",+inF,-NaN ,\n"
"2,-.4,4.14e7\n");
data_t parsed = parse(iss, "<inline-test>");
std::cout << "Done, parsed " << parsed.size() << " values ("
<< "min: " << *std::min_element(parsed.begin(), parsed.end()) << ", "
<< "max: " << *std::max_element(parsed.begin(), parsed.end()) << ")\n";
}
The output with the semantic action now becomes:
debug ('start_p') at <inline-test>:1:[1..2] '1' = 1
debug ('start_p') at <inline-test>:1:[4..8] '-3.4' = -3.4
debug ('start_p') at <inline-test>:1:[10..19] '3.1415926' = 3.14159
debug ('start_p') at <inline-test>:2:[2..6] '+inF' = inf
debug ('start_p') at <inline-test>:2:[7..11] '-NaN' = -nan
debug ('start_p') at <inline-test>:3:[1..2] '2' = 2
debug ('start_p') at <inline-test>:3:[3..6] '-.4' = -0.4
debug ('start_p') at <inline-test>:3:[7..13] '4.14e7' = 4.14e+07
Done, parsed 8 values (min: -3.4, max: inf)
See it live at http://liveworkspace.org/code/8a874ef3...
Note how it
demonstrates access to the name of the actual parser instance ('start_p')
demonstrates access to the full source iterator range
shows how to do specialized processing inside the semantic action
I still suggest using qi::double_ to parse the raw input, because it is the only thing I know that easily handles all cases (see test data and this other question: Is it possible to read infinity or NaN values using input streams?)
demonstrates parsing the actual data into the vector efficiently by displaying statistics of the parsed values
Full Code
Here is the full code for future reference:
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_multi_pass.hpp>
#include <boost/spirit/include/classic_position_iterator.hpp>
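#include <boost/phoenix/function/adapt_function.hpp>
#include <algorithm>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>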
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
namespace classic = boost::spirit::classic;
namespace ascii = boost::spirit::ascii;
typedef std::vector<double> data_t;
///////// USING A FREE FUNCTION
//
template <typename Grammar, typename Range>
double doStuff_(Grammar &grammar, Range pos_range)
{
// for efficiency, cache adhoc grammar:
static const qi::rule <typename Range::iterator, double()> r_double = qi::double_;
static const qi::grammar<typename Range::iterator, double()> g_double(r_double); // caching just the rule may be enough, actually
double value = 0;
qi::parse(pos_range.begin(), pos_range.end(), g_double, value);
std::cout << "debug ('" << grammar.name() << "') at "
<< pos_range.begin().get_position().file << ":"
<< pos_range.begin().get_position().line << ":["
<< pos_range.begin().get_position().column << ".."
<< pos_range.end ().get_position().column << "]\t"
<< "'" << std::string(pos_range.begin(),pos_range.end()) << "'\t = "
<< value
<< '\n';
return value;
}
BOOST_PHOENIX_ADAPT_FUNCTION(double, doStuff, doStuff_, 2)
template <typename Iterator, typename Skipper>
struct parse_grammar : qi::grammar<Iterator, data_t(), Skipper>
{
parse_grammar()
: parse_grammar::base_type(start_p, "start_p")
{
using qi::raw;
using qi::double_;
using qi::_1;
using qi::_val;
using qi::eoi;
using phx::push_back;
value_p = raw [ double_ ] [ _val = doStuff(phx::ref(*this), _1) ];
start_p = value_p % ',' > eoi;
// // To use without the semantic action (more efficient):
// start_p = double_ % ',' >> eoi;
}
qi::rule<Iterator, data_t::value_type(), Skipper> value_p;
qi::rule<Iterator, data_t(), Skipper> start_p;
};
// implementation
data_t parse(std::istream& input, const std::string& filename)
{
// iterate over stream input
typedef std::istreambuf_iterator<char> base_iterator_type;
base_iterator_type in_begin(input);
// convert input iterator to forward iterator, usable by spirit parser
typedef boost::spirit::multi_pass<base_iterator_type> forward_iterator_type;
forward_iterator_type fwd_begin = boost::spirit::make_default_multi_pass(in_begin);
forward_iterator_type fwd_end;
// wrap forward iterator with position iterator, to record the position
typedef classic::position_iterator2<forward_iterator_type> pos_iterator_type;
pos_iterator_type position_begin(fwd_begin, fwd_end, filename);
pos_iterator_type position_end;
parse_grammar<pos_iterator_type, ascii::space_type> gram;
data_t output;
// parse
try
{
if (!qi::phrase_parse(
position_begin, position_end, // iterators over input
gram, // recognize list of doubles
ascii::space, // comment skipper
output) // <-- attribute reference
)
{
std::cerr << "Parse failed at "
<< position_begin.get_position().file << ":"
<< position_begin.get_position().line << ":"
<< position_begin.get_position().column << "\n";
}
}
catch(const qi::expectation_failure<pos_iterator_type>& e)
{
const classic::file_position_base<std::string>& pos = e.first.get_position();
std::stringstream msg;
msg << "parse error at file " << pos.file
<< " line " << pos.line
<< " column " << pos.column
<< "\n\t'" << e.first.get_currentline()
<< "'\n\t " << std::string(pos.column, ' ') << "^-- here";
throw std::runtime_error(msg.str());
}
return output;
}
int main()
{
std::istringstream iss(
"1, -3.4 ,3.1415926\n"
",+inF,-NaN ,\n"
"2,-.4,4.14e7\n");
data_t parsed = parse(iss, "<inline-test>");
std::cout << "Done, parsed " << parsed.size() << " values ("
<< "min: " << *std::min_element(parsed.begin(), parsed.end()) << ", "
<< "max: " << *std::max_element(parsed.begin(), parsed.end()) << ")\n";
}
I'm parsing a text file, possibly several GB in size, consisting of lines as follows:
11 0.1
14 0.78
532 -3.5
Basically, one int and one float per line. The ints should be ordered and non-negative. I'd like to verify the data are as described, and have returned to me the min and max int in the range. This is what I've come up with:
#include <iostream>
#include <string>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/std_pair.hpp>
namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;
namespace my_parsers
{
using namespace qi;
using px::at_c;
using px::val;
template <typename Iterator>
struct verify_data : grammar<Iterator, locals<int>, std::pair<int, int>()>
{
verify_data() : verify_data::base_type(section)
{
section
= line(val(0)) [ at_c<0>(_val) = _1]
>> +line(_a) [ _a = _1]
>> eps [ at_c<1>(_val) = _a]
;
line
%= (int_ >> other) [
if_(_r1 >= _1)
[
std::cout << _r1 << " and "
<< _1 << val(" out of order\n")
]
]
;
other
= omit[(lit(' ') | '\t') >> float_ >> eol];
}
rule<Iterator, locals<int>, std::pair<int, int>() > section;
rule<Iterator, int(int)> line;
rule<Iterator> other;
};
}
using namespace std;
int main(int argc, char** argv)
{
string input("11 0.1\n"
"14 0.78\n"
"532 -3.6\n");
my_parsers::verify_data<string::iterator> verifier;
pair<int, int> p;
std::string::iterator begin(input.begin()), end(input.end());
cout << "parse result: " << boolalpha
<< qi::parse(begin, end, verifier, p) << endl;
cout << "p.first: " << p.first << "\np.second: " << p.second << endl;
return 0;
}
What I'd like to know is the following:
Is there a better way of going about this? I have used inherited and synthesised attributes, local variables and a bit of phoenix voodoo. This is great; learning the tools is good but I can't help thinking there might be a much simpler way of achieving the same thing :/ (within a PEG parser that is...)
How could it be done without the local variable for instance?
More info: I have other data formats that are being parsed at the same time and so I'd like to keep the return value as a parser attribute. At the moment this is a std::pair, the other data formats when parsed, will expose their own std::pairs for instance and it's these that I'd like to stuff in a std::vector.
This is at least a lot shorter already:
down to 28 LOC
no more locals
no more fusion vector at<> wizardry
no more inherited attributes
no more grammar class
no more manual iteration
using expectation points (see other) to enhance parse error reporting
this parser expression synthesizes neatly into a vector<int> if you choose to assign it with %= (but that will cost performance, besides potentially allocating a largish array)
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;
typedef std::string::iterator It;
int main(int argc, char** argv)
{
std::string input("11 0.1\n"
"14 0.78\n"
"532 -3.6\n");
int min=-1, max=0;
{
using namespace qi;
using px::val;
using px::ref;
It begin(input.begin()), end(input.end());
rule<It> index = int_
[
if_(ref(max) < _1) [ ref(max) = _1 ] .else_ [ std::cout << _1 << val(" out of order\n") ],
if_(ref(min) < 0) [ ref(min) = _1 ]
] ;
rule<It> other = char_(" \t") > float_ > eol;
std::cout << "parse result: " << std::boolalpha
<< qi::parse(begin, end, index % other) << std::endl;
}
std::cout << "min: " << min << "\nmax: " << max << std::endl;
return 0;
}
Bonus
I might suggest taking the validation out of the expression and make it a free-standing function; of course, this makes things more verbose (and... legible) and my braindead sample uses global variables... -- but I trust you know how to use boost::bind or px::bind to make it more real-life
In addition to the above
down to 27 LOC even with the free function
no more phoenix, no more phoenix includes (yay compile times)
no more phoenix expression types in debug builds ballooning the binary and slowing it down
no more var, ref, if_, .else_ and the wretched operator, (which had major bug risk (at some time) due to the overload not being included with phoenix.hpp)
(easily ported to C++11 lambdas, immediately removing the need for global variables)
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
typedef std::string::iterator It;
int min=-1, max=0, linenumber=0;
void validate_index(int index)
{
linenumber++;
if (min < 0) min = index;
if (max < index) max = index;
else std::cout << index << " out of order at line " << linenumber << std::endl;
}
int main(int argc, char** argv)
{
std::string input("11 0.1\n"
"14 0.78\n"
"532 -3.6\n");
It begin(input.begin()), end(input.end());
{
using namespace qi;
rule<It> index = int_ [ validate_index ] ;
rule<It> other = char_(" \t") > float_ > eol;
std::cout << "parse result: " << std::boolalpha
<< qi::parse(begin, end, index % other) << std::endl;
}
std::cout << "min: " << min << "\nmax: " << max << std::endl;
return 0;
}
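The lambda port mentioned in the list above can be sketched without any Spirit code: the validation state moves into a local struct and the callback captures it by reference. The names IndexStats and make_validator are hypothetical, and the out_of_order counter is an addition for illustration. Attaching the resulting callable as the semantic action in place of validate_index is an assumption that is not exercised here.

```cpp
#include <functional>
#include <iostream>

// Hypothetical holder for what used to be the global variables.
struct IndexStats {
    int min = -1;
    int max = 0;
    int linenumber = 0;
    int out_of_order = 0;  // extra counter, not in the original code
};

// Returns a callable with the same logic as validate_index, but working
// on caller-owned state instead of globals. `stats` must outlive the callable.
std::function<void(int)> make_validator(IndexStats& stats) {
    return [&stats](int index) {
        ++stats.linenumber;
        if (stats.min < 0) stats.min = index;
        if (stats.max < index) {
            stats.max = index;
        } else {
            ++stats.out_of_order;
            std::cout << index << " out of order at line "
                      << stats.linenumber << std::endl;
        }
    };
}
```

Calling the validator with 11, 14 and 532 leaves min at 11 and max at 532; feeding it 5 afterwards reports line 4 as out of order.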
I guess a much simpler way would be to parse the file using standard stream operations and then check the ordering in a loop. First, the input:
typedef std::pair<int, float> value_pair;
bool greater(const value_pair & left, const value_pair & right) {
return left.first > right.first;
}
std::istream & operator>>(std::istream & stream, value_pair & value) {
stream >> value.first >> value.second;
return stream;
}
Then use it like this:
std::ifstream file("your_file.txt");
std::istream_iterator<value_pair> it(file);
std::istream_iterator<value_pair> eof;
if(std::adjacent_find(it, eof, greater) != eof) {
std::cout << "The values are not ordered" << std::endl;
}
I find this a lot simpler.
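One caveat: std::adjacent_find is specified for forward iterators, while std::istream_iterator is only an input iterator, so the call above is not guaranteed to work. A minimal single-pass sketch that also reports the min and max ints, as the question asks, could look like this. The Summary struct and the verify_stream function are hypothetical names, and the sketch assumes that "ordered" means strictly increasing.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type: validity flag plus the first and last index seen.
struct Summary {
    bool ok;
    int min;
    int max;
};

// Reads "int float" records in a single pass. Assumes the ints must be
// non-negative and strictly increasing.
Summary verify_stream(std::istream& in) {
    Summary s = {true, -1, -1};
    int index;
    float value;
    while (in >> index >> value) {
        if (index < 0 || (s.max >= 0 && index <= s.max)) {
            s.ok = false;  // negative or out of order
            return s;
        }
        if (s.min < 0) s.min = index;  // first record gives the minimum
        s.max = index;                 // last record so far gives the maximum
    }
    // Extraction stopped: anything other than end of input is a format
    // error. (A truncated final record is not detected in this sketch.)
    if (!in.eof()) s.ok = false;
    return s;
}
```

For the three sample lines from the question this reports ok with min 11 and max 532, and it never holds more than one record in memory, which matters for a file of several GB.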