How to set max recursion in boost spirit


Using boost::spirit, if I have a recursive rule to parse parentheses
rule<std::string::iterator, std::string()> term;
term %= string("(") >> *term >> string(")");
how do I limit the maximum recursion depth? For example, if I try to parse a string of a million parentheses (half a million levels of nesting), I get a segfault because the stack size has been exceeded. To be concrete, here is a complete sample:
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
int main(void)
{
using namespace boost::spirit;
using namespace boost::spirit::qi;
const size_t string_size = 1000000;
std::string str;
str.resize(string_size);
for (size_t s=0; s<str.size()/2; ++s)
{
str[s]='(';
str[str.size() - s -1] = ')';
}
rule<std::string::iterator, std::string()> term;
term %= string("(") >> *term >> string(")");
std::string h;
parse(str.begin(), str.end(), term, h);
}
I compiled it with the command
g++ simple.cxx -o simple -std=c++11
It works fine if I set string_size to 1000 instead of 1000000.

Keep track of the depth in a qi::locals<> or a phx::ref().
In this case an inherited attribute can take the role of the qi::locals<> quite naturally:
qi::rule<std::string::const_iterator, std::string(size_t depth)> term;
qi::_r1_type _depth;
term %=
qi::eps(_depth < 32) >>
qi::string("(") >> *term(_depth + 1) >> qi::string(")");
term will now fail when the nesting depth exceeds 32 levels (depths 0 through 31 pass the eps guard).
Full Sample
#include <algorithm>
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
int main(void) {
for (size_t n : { 2, 4, 8, 16, 32, 64 }) {
auto const str = [&n] {
std::string str;
str.reserve(2 * n); // n opening plus n closing parentheses
while (n--) { str.insert(str.begin(), '('); str.append(1, ')'); }
return str;
}();
std::cout << "Input length " << str.length() << "\n";
qi::rule<std::string::const_iterator, std::string(size_t depth)> term;
qi::_r1_type _depth;
term %=
qi::eps(_depth < 32) >>
qi::string("(") >> *term(_depth + 1) >> qi::string(")");
std::string h;
auto f = str.begin(), l = str.end();
bool ok = qi::parse(f, l, term(0u), h);
if (ok)
std::cout << "Ok: " << h << "\n";
else
std::cout << "Fail\n";
if (f != l)
std::cout << "Remaining unparsed: '" << std::string(f, std::min(f + 40, l)) << "'...\n";
}
}
Output:
Input length 4
Ok: (())
Input length 8
Ok: (((())))
Input length 16
Ok: (((((((())))))))
Input length 32
Ok: (((((((((((((((())))))))))))))))
Input length 64
Ok: (((((((((((((((((((((((((((((((())))))))))))))))))))))))))))))))
Input length 128
Fail
Remaining unparsed: '(((((((((((((((((((((((((((((((((((((((('...
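If it helps to see the mechanism without Spirit's machinery, here is a hand-written recursive-descent sketch in plain C++ (not Spirit; `parse_term`, `parse_balanced` and `nested` are hypothetical names). It threads the depth through each recursive call, much like the inherited attribute `_r1` above, and refuses to recurse once the limit is reached, so the native stack never grows beyond `max_depth` frames whatever the input looks like.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// term(depth) := guard(depth < max_depth) '(' term(depth + 1)* ')'
// On success, advances `pos` past the term and returns true.
// On failure, leaves `pos` untouched (a failed Qi rule does not consume input either).
bool parse_term(const std::string& in, std::size_t& pos,
                std::size_t depth, std::size_t max_depth) {
    if (depth >= max_depth) return false;  // plays the role of qi::eps(_depth < 32)
    std::size_t p = pos;
    if (p >= in.size() || in[p] != '(') return false;
    ++p;
    // Kleene star: keep parsing nested terms until one fails; a failed
    // nested term has not moved p, so the loop simply stops there.
    while (parse_term(in, p, depth + 1, max_depth)) {
    }
    if (p >= in.size() || in[p] != ')') return false;
    pos = p + 1;
    return true;
}

// Hypothetical wrapper: accept only if the whole input is exactly one term.
bool parse_balanced(const std::string& in, std::size_t max_depth) {
    std::size_t pos = 0;
    return parse_term(in, pos, 0, max_depth) && pos == in.size();
}

// Hypothetical helper: n levels of nesting, e.g. nested(2) == "(())".
std::string nested(std::size_t n) {
    return std::string(n, '(') + std::string(n, ')');
}
```

With `max_depth = 32`, 32 levels are accepted and 64 are rejected, consistent with the output above, and an input with half a million levels is rejected after at most 33 nested calls instead of overflowing the stack.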

Related

boost::spirit qi::uint_ valid number range

I want to parse a string that consists of either CC[n], where 1 <= n <= 4, or SERVICE[k], where 1 <= k <= 63.
Valid strings: "CC1", "CC2", "CC3", "CC4", "SERVICE1", "SERVICE2", ..., "SERVICE63".
I wrote the following expression:
( '"' >> (qi::raw["CC" >> qi::uint_] | qi::raw["SERVICE" >> qi::uint_]) >> '"' >> qi::eoi)
But how can I limit n and k?
In the output I need to get the full string: CC1, CC2, ..., SERVICE63.

The simplest way would be to use symbols<>.
The elaborate way is to validate the numbers in semantic actions.
My recommendation is either symbols OR separating semantic validation from parsing (i.e. parse the numbers raw and validate the AST after the parse).
Symbols
This is likely the most flexible and most efficient approach, and it allows you to be strongly typed in your AST domain. It sidesteps the compilation overhead and complexity of semantic actions (see Boost Spirit: "Semantic actions are evil"?).
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
int main() {
qi::symbols<char> cc, service;
cc += "CC1", "CC2", "CC3", "CC4";
service += "SERVICE1", "SERVICE2", "SERVICE3", "SERVICE4", "SERVICE5",
"SERVICE6", "SERVICE7", "SERVICE8", "SERVICE9", "SERVICE10",
"SERVICE11", "SERVICE12", "SERVICE13", "SERVICE14", "SERVICE15",
"SERVICE16", "SERVICE17", "SERVICE18", "SERVICE19", "SERVICE20",
"SERVICE21", "SERVICE22", "SERVICE23", "SERVICE24", "SERVICE25",
"SERVICE26", "SERVICE27", "SERVICE28", "SERVICE29", "SERVICE30",
"SERVICE31", "SERVICE32", "SERVICE33", "SERVICE34", "SERVICE35",
"SERVICE36", "SERVICE37", "SERVICE38", "SERVICE39", "SERVICE40",
"SERVICE41", "SERVICE42", "SERVICE43", "SERVICE44", "SERVICE45",
"SERVICE46", "SERVICE47", "SERVICE48", "SERVICE49", "SERVICE50",
"SERVICE51", "SERVICE52", "SERVICE53", "SERVICE54", "SERVICE55",
"SERVICE56", "SERVICE57", "SERVICE58", "SERVICE59", "SERVICE60",
"SERVICE61", "SERVICE62", "SERVICE63";
for (std::string const input : {
// valid:
"CC1",
"CC2",
"CC3",
"CC4",
"SERVICE1",
"SERVICE2",
"SERVICE63",
// invalid:
"CC0",
"CC5",
"SERVICE0",
"SERVICE64",
}) {
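// eoi is required: symbols<> matches the longest prefix, so without it "SERVICE64" would match as "SERVICE6"
bool valid = parse(begin(input), end(input), (service | cc) >> qi::eoi);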
std::cout << std::quoted(input) << " -> "
<< (valid ? "valid" : "invalid") << "\n";
}
}
Prints
"CC1" -> valid
"CC2" -> valid
"CC3" -> valid
"CC4" -> valid
"SERVICE1" -> valid
"SERVICE2" -> valid
"SERVICE63" -> valid
"CC0" -> invalid
"CC5" -> invalid
"SERVICE0" -> invalid
"SERVICE64" -> invalid
Bonus: the strongly typed idea: http://coliru.stacked-crooked.com/a/2cb07d4da9aad39e
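One subtlety with the symbols approach: a symbol table lookup matches the longest entry that is a prefix of the input, so without an end-of-input check "SERVICE64" is accepted as "SERVICE6" with a stray "4" left unparsed. The plain C++ sketch below (not Spirit; `longest_match`, `make_table`, `matches_prefix` and `matches_whole` are hypothetical helpers) models that lookup to show why the whole input has to be consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a symbol-table lookup: length of the longest
// entry that is a prefix of `input` (0 if no entry matches).
std::size_t longest_match(const std::vector<std::string>& table, const std::string& input) {
    std::size_t best = 0;
    for (const auto& entry : table)
        if (entry.size() > best && input.compare(0, entry.size(), entry) == 0)
            best = entry.size();
    return best;
}

// Same contents as the symbols<> tables above: CC1..CC4 and SERVICE1..SERVICE63.
std::vector<std::string> make_table() {
    std::vector<std::string> t;
    for (int n = 1; n <= 4; ++n) t.push_back("CC" + std::to_string(n));
    for (int k = 1; k <= 63; ++k) t.push_back("SERVICE" + std::to_string(k));
    return t;
}

// Prefix-only check: like calling parse() without requiring end of input.
bool matches_prefix(const std::vector<std::string>& t, const std::string& s) {
    return longest_match(t, s) > 0;
}

// Full check: the match must also cover the entire input (the role of qi::eoi).
bool matches_whole(const std::vector<std::string>& t, const std::string& s) {
    const std::size_t n = longest_match(t, s);
    return n > 0 && n == s.size();
}
```

"SERVICE64" passes `matches_prefix` but fails `matches_whole`, which is the difference an end-of-input check makes in the Spirit version.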
Semantic Actions
In a nutshell:
qi::rule<It, intmax_t(intmax_t min, intmax_t max)> constrained_num =
qi::uint_[_pass = (_1 >= _r1 && _1 <= _r2)];
qi::rule<It> cc = "CC" >> constrained_num(1, 4),
service = "SERVICE" >> constrained_num(1, 63);
Full sample:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iomanip>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
using It = std::string::const_iterator;
int main() {
using namespace qi::labels;
qi::rule<It, intmax_t(intmax_t min, intmax_t max)> constrained_num =
qi::uint_[_pass = (_1 >= _r1 && _1 <= _r2)];
qi::rule<It> cc = "CC" >> constrained_num(1, 4),
service = "SERVICE" >> constrained_num(1, 63);
for (std::string const input : {
// valid:
"CC1",
"CC2",
"CC3",
"CC4",
"SERVICE1",
"SERVICE2",
"SERVICE63",
// invalid:
"CC0",
"CC5",
"SERVICE0",
"SERVICE64",
}) {
bool valid = parse(begin(input), end(input), service|cc);
std::cout << std::quoted(input) << " -> "
<< (valid ? "valid" : "invalid") << "\n";
}
}
Prints the same as above
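The other recommendation, separating validation from parsing, is not shown above, so here is a minimal Spirit-free sketch of the idea in plain C++: a syntax-only step splits a token into keyword and number, and a separate semantic step range-checks the number. `Token`, `split_token`, `validate` and `is_valid` are hypothetical names used only for illustration; in a Qi grammar the first step would be the parser and the second a pass over the resulting AST.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <optional>
#include <string>

// Hypothetical AST node produced by the syntax-only step.
struct Token {
    std::string keyword;   // "CC" or "SERVICE"
    unsigned long number;  // digits after the keyword, not yet range-checked
};

// Step 1 (syntax only): <keyword><digits>, nothing else.
std::optional<Token> split_token(const std::string& s) {
    for (const char* kw : {"CC", "SERVICE"}) {
        const std::string k = kw;
        if (s.compare(0, k.size(), k) != 0) continue;
        const std::size_t ndigits = s.size() - k.size();
        if (ndigits == 0 || ndigits > 9) return std::nullopt;  // 9 digits always fit in unsigned long
        unsigned long n = 0;
        for (std::size_t i = k.size(); i < s.size(); ++i) {
            if (!std::isdigit(static_cast<unsigned char>(s[i]))) return std::nullopt;
            n = n * 10 + static_cast<unsigned long>(s[i] - '0');
        }
        return Token{k, n};
    }
    return std::nullopt;
}

// Step 2 (semantics only): the range limits live here, not in the grammar.
bool validate(const Token& t) {
    if (t.keyword == "CC") return t.number >= 1 && t.number <= 4;
    if (t.keyword == "SERVICE") return t.number >= 1 && t.number <= 63;
    return false;
}

bool is_valid(const std::string& s) {
    const auto tok = split_token(s);
    return tok && validate(*tok);
}
```

Keeping the limits in `validate` leaves the grammar simple and makes it easy to report "SERVICE64: number out of range" instead of a generic parse failure.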

To limit the uint_ range, you can perform range checking in a semantic action. It can be implemented, for example, as a lambda or, more concisely, as a Boost.Phoenix expression.
The following code parses these numbers into a vector (omitting the strings):
#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
int main()
{
std::string input = "CC1 CC2 CC3 CC4 SERVICE1 SERVICE2";
std::vector<unsigned int> out;
using namespace boost::spirit::qi;
phrase_parse(
input.begin(),
input.end(),
*(lexeme[lit("CC") >> uint_ [ _pass = (_1>=1 && _1<=4) ]] |
lexeme[lit("SERVICE") >> uint_ [ _pass = (_1>=1 && _1<=63) ]]),
ascii::space,
out
);
for (auto i : out)
std::cout << i << std::endl;
}

Parsing two vectors of strings using boost:qi

I am new to using qi, and have run into a difficulty. I wish to parse an input like:
X + Y + Z , A + B
Into two vectors of strings.
I have code that does this, but only if the grammar parses single characters. Ideally, the following line should be readable:
Xi + Ye + Zou , Ao + Bi
Using a simple replacement such as elem = +(char_ - '+') % '+' fails to parse, because it will consume the ',' on the first elem, but I've not discovered a simple way around this.
Here is my single-character code, for reference:
#include <iostream>
#include <string>
#include <vector>
#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
typedef std::vector<std::string> element_array;
struct reaction_t
{
element_array reactants;
element_array products;
};
BOOST_FUSION_ADAPT_STRUCT(reaction_t, (element_array, reactants)(element_array, products))
template<typename Iterator>
struct reaction_parser : qi::grammar<Iterator,reaction_t(),qi::blank_type>
{
reaction_parser() : reaction_parser::base_type(reaction)
{
using namespace qi;
elem = char_ % '+';
reaction = elem >> ',' >> elem;
BOOST_SPIRIT_DEBUG_NODES((reaction)(elem));
}
qi::rule<Iterator, reaction_t(), qi::blank_type> reaction;
qi::rule<Iterator, element_array(), qi::blank_type> elem;
};
int main()
{
const std::string input = "X + Y + Z, A + B";
auto f = begin(input), l = end(input);
reaction_parser<std::string::const_iterator> p;
reaction_t data;
bool ok = qi::phrase_parse(f, l, p, qi::blank, data);
if (ok) std::cout << "success\n";
else std::cout << "failed\n";
if (f!=l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}

Using a simple replacement such as elem = +(char_ - '+') % '+' fails to parse, because it will consume the ',' on the first elem, but I've not discovered a simple way around this.
Well, the simplest (braindead) solution would be to use +(char_ - '+' - ',') or +~char_("+,").
Really, though, I'd make the rule for element more specific, e.g.:
elem = qi::lexeme [ +alpha ] % '+';
See "Boost spirit skipper issues" for more about lexeme and skippers.
#include <iostream>
#include <string>
#include <vector>
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
typedef std::vector<std::string> element_array;
struct reaction_t
{
element_array reactants;
element_array products;
};
BOOST_FUSION_ADAPT_STRUCT(reaction_t, (element_array, reactants)(element_array, products))
template<typename Iterator>
struct reaction_parser : qi::grammar<Iterator,reaction_t(),qi::blank_type>
{
reaction_parser() : reaction_parser::base_type(reaction) {
using namespace qi;
elem = qi::lexeme [ +alpha ] % '+';
reaction = elem >> ',' >> elem;
BOOST_SPIRIT_DEBUG_NODES((reaction)(elem));
}
qi::rule<Iterator, reaction_t(), qi::blank_type> reaction;
qi::rule<Iterator, element_array(), qi::blank_type> elem;
};
int main()
{
reaction_parser<std::string::const_iterator> p;
for (std::string const input : {
"X + Y + Z, A + B",
"Xi + Ye + Zou , Ao + Bi",
})
{
std::cout << "----- " << input << "\n";
auto f = begin(input), l = end(input);
reaction_t data;
bool ok = qi::phrase_parse(f, l, p, qi::blank, data);
if (ok) {
std::cout << "success\n";
for (auto r : data.reactants) { std::cout << "reactant: " << r << "\n"; }
for (auto prod : data.products) { std::cout << "product: " << prod << "\n"; }
}
else
std::cout << "failed\n";
if (f != l)
std::cout << "Remaining unparsed: '" << std::string(f, l) << "'\n";
}
}
Printing:
----- X + Y + Z, A + B
success
reactant: X
reactant: Y
reactant: Z
product: A
product: B
----- Xi + Ye + Zou , Ao + Bi
success
reactant: Xi
reactant: Ye
reactant: Zou
product: Ao
product: Bi
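For comparison, the same grammar shape (`elem % '+'`, then `','`, then `elem % '+'`, with blanks skipped and each element a run of letters) can be mirrored by a small hand-written parser in plain C++. This is a Spirit-free sketch under assumptions; `Reaction`, `skip_blanks`, `parse_name`, `parse_list` and `parse_reaction` are hypothetical names. Restricting an element to letters, as `lexeme[+alpha]` does, is what keeps the `','` from being swallowed by the first list, and the list backtracks over a trailing separator just as `a % b` does.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Reaction {
    std::vector<std::string> reactants, products;
};

// Skip spaces and tabs, like qi::blank used as the skipper.
void skip_blanks(const std::string& s, std::size_t& i) {
    while (i < s.size() && (s[i] == ' ' || s[i] == '\t')) ++i;
}

// lexeme[+alpha]: one or more letters, no blanks inside the name.
bool parse_name(const std::string& s, std::size_t& i, std::vector<std::string>& out) {
    std::size_t j = i;
    skip_blanks(s, j);
    const std::size_t start = j;
    while (j < s.size() && std::isalpha(static_cast<unsigned char>(s[j]))) ++j;
    if (j == start) return false;
    out.emplace_back(s, start, j - start);
    i = j;
    return true;
}

// name % '+': at least one name; a '+' not followed by a name is left unconsumed.
bool parse_list(const std::string& s, std::size_t& i, std::vector<std::string>& out) {
    if (!parse_name(s, i, out)) return false;
    for (;;) {
        const std::size_t save = i;
        skip_blanks(s, i);
        if (i < s.size() && s[i] == '+') {
            ++i;
            if (parse_name(s, i, out)) continue;
        }
        i = save;  // backtrack to just after the last name
        return true;
    }
}

// reaction := list ',' list, and (for this sketch) the whole input must be consumed.
bool parse_reaction(const std::string& s, Reaction& r) {
    std::size_t i = 0;
    if (!parse_list(s, i, r.reactants)) return false;
    skip_blanks(s, i);
    if (i >= s.size() || s[i] != ',') return false;
    ++i;
    if (!parse_list(s, i, r.products)) return false;
    skip_blanks(s, i);
    return i == s.size();
}
```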

Understanding the List Operator (%) in Boost.Spirit

Can you help me understand the difference between the a % b parser and its expanded a >> *(b >> a) form in Boost.Spirit? Even though the reference manual states that they are equivalent,
The list operator, a % b, is a binary operator that matches a list of one or more repetitions of a separated by occurrences of b. This is equivalent to a >> *(b >> a).
the following program produces different results depending on which is used:
#include <iostream>
#include <string>
#include <vector>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
struct Record {
int id;
std::vector<int> values;
};
BOOST_FUSION_ADAPT_STRUCT(Record,
(int, id)
(std::vector<int>, values)
)
int main() {
namespace qi = boost::spirit::qi;
const auto str = std::string{"1: 2, 3, 4"};
const auto rule1 = qi::int_ >> ':' >> (qi::int_ % ',') >> qi::eoi;
const auto rule2 = qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_)) >> qi::eoi;
Record record1;
if (qi::phrase_parse(str.begin(), str.end(), rule1, qi::space, record1)) {
std::cout << record1.id << ": ";
for (const auto& value : record1.values) { std::cout << value << ", "; }
std::cout << '\n';
} else {
std::cerr << "syntax error\n";
}
Record record2;
if (qi::phrase_parse(str.begin(), str.end(), rule2, qi::space, record2)) {
std::cout << record2.id << ": ";
for (const auto& value : record2.values) { std::cout << value << ", "; }
std::cout << '\n';
} else {
std::cerr << "syntax error\n";
}
}
Output:
1: 2, 3, 4,
1: 2,
rule1 and rule2 are different only in that rule1 uses the list operator ((qi::int_ % ',')) and rule2 uses its expanded form ((qi::int_ >> *(',' >> qi::int_))). However, rule1 produced 1: 2, 3, 4, (as expected) and rule2 produced 1: 2,. I cannot understand the result of rule2: 1) why is it different from that of rule1 and 2) why were 3 and 4 not included in record2.values even though phrase_parse returned true somehow?

Update: X3 version added
First off, you've fallen into a deep trap here:
Qi parser expressions don't work with auto: the expression template keeps references to temporaries that no longer exist by the time rule1 and rule2 are used. Use qi::copy or just use qi::rule<>. Your program has undefined behaviour, and indeed it crashed for me (valgrind pointed out where the dangling references originated).
So, first off:
const auto rule = qi::copy(qi::int_ >> ':' >> (qi::int_ % ',') >> qi::eoi);
Now, when you delete the redundancy in the program, you get:
Reproducing the problem
int main() {
test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
}
Printing
1: 2, 3, 4,
1: 2,
The cause and the fix
What happened to 3, 4 which was successfully parsed?
Well, the attribute propagation rules indicate that qi::int_ >> *(',' >> qi::int_) exposes a tuple<int, vector<int> >. In a bid to magically DoTheRightThing(TM), Spirit accidentally misfires and "assigns" the int into the attribute reference, ignoring the remaining vector<int>.
If you want to make container attributes parse as "an atomic group", use qi::as<>:
test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));
Here as<> acts as a barrier for the attribute compatibility heuristics and the grammar knows what you meant:
#include <iostream>
#include <string>
#include <vector>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
struct Record {
int id;
using values_t = std::vector<int>;
values_t values;
};
BOOST_FUSION_ADAPT_STRUCT(Record, id, values)
namespace qi = boost::spirit::qi;
template <typename T>
void test(T const& rule) {
const std::string str = "1: 2, 3, 4";
Record record;
if (qi::phrase_parse(str.begin(), str.end(), rule >> qi::eoi, qi::space, record)) {
std::cout << record.id << ": ";
for (const auto& value : record.values) { std::cout << value << ", "; }
std::cout << '\n';
} else {
std::cerr << "syntax error\n";
}
}
int main() {
test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));
}
Prints
1: 2, 3, 4,
1: 2,
1: 2, 3, 4,
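To make the attribute-shape argument concrete without Spirit: `a % b` naturally yields one flat container, whereas `a >> *(b >> a)` yields a head element plus a container of the rest, i.e. the `tuple<int, vector<int> >` described above. The plain C++ sketch below (hypothetical functions `parse_int`, `parse_comma`, `parse_int_list`, `parse_head_tail` and `flatten`; not Spirit code) produces both shapes from the same input. The second only becomes a `vector<int>` after an explicit flattening step, which is loosely the job `qi::as<>` takes over in the fixed grammar.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Parse an unsigned integer after skipping spaces; leaves i unchanged on failure.
bool parse_int(const std::string& s, std::size_t& i, int& out) {
    std::size_t j = i;
    while (j < s.size() && s[j] == ' ') ++j;
    const std::size_t start = j;
    int v = 0;
    while (j < s.size() && std::isdigit(static_cast<unsigned char>(s[j]))) v = v * 10 + (s[j++] - '0');
    if (j == start) return false;
    out = v;
    i = j;
    return true;
}

// Parse ',' after skipping spaces; leaves i unchanged on failure.
bool parse_comma(const std::string& s, std::size_t& i) {
    std::size_t j = i;
    while (j < s.size() && s[j] == ' ') ++j;
    if (j < s.size() && s[j] == ',') { i = j + 1; return true; }
    return false;
}

// int_ % ',' : the attribute is one flat vector<int>.
std::vector<int> parse_int_list(const std::string& s) {
    std::vector<int> out;
    std::size_t i = 0;
    int v = 0;
    if (!parse_int(s, i, v)) return out;
    out.push_back(v);
    for (;;) {
        const std::size_t save = i;
        if (parse_comma(s, i) && parse_int(s, i, v)) { out.push_back(v); continue; }
        i = save;
        return out;
    }
}

// int_ >> *(',' >> int_) : the attribute is (head, vector of the rest).
std::pair<int, std::vector<int>> parse_head_tail(const std::string& s) {
    std::pair<int, std::vector<int>> out{0, {}};
    std::size_t i = 0;
    if (!parse_int(s, i, out.first)) return out;
    int v = 0;
    for (;;) {
        const std::size_t save = i;
        if (parse_comma(s, i) && parse_int(s, i, v)) { out.second.push_back(v); continue; }
        i = save;
        return out;
    }
}

// The explicit step that turns the (head, rest) shape into a single container.
std::vector<int> flatten(const std::pair<int, std::vector<int>>& p) {
    std::vector<int> out{p.first};
    out.insert(out.end(), p.second.begin(), p.second.end());
    return out;
}
```

Keeping only `.first` of the head/tail result gives exactly the `1: 2,` seen in the broken case.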

Because it's time to get people started with X3 (the new version of Spirit), and because I like to challenge myself to do the corresponding tasks in Spirit X3, here is the Spirit X3 version.
There's no problem with auto in X3.
The "broken" case also behaves much better, triggering this static assertion:
// If you got an error here, then you are trying to pass
// a fusion sequence with the wrong number of elements
// as that expected by the (sequence) parser.
static_assert(
fusion::result_of::size<Attribute>::value == (l_size + r_size)
, "Attribute does not have the expected size."
);
That's nice, right?
The workaround seems a bit less readable:
test(int_ >> ':' >> (rule<struct _, Record::values_t>{} = (int_ >> *(',' >> int_))));
But it would be trivial to write your own as<> "directive" (or just a function), if you wanted:
namespace {
template <typename T>
struct as_type {
template <typename Expr>
auto operator[](Expr&& expr) const {
return x3::rule<struct _, T>{"as"} = x3::as_parser(std::forward<Expr>(expr));
}
};
template <typename T> static const as_type<T> as = {};
}
DEMO
#include <iostream>
#include <string>
#include <vector>
#include <boost/fusion/adapted/std_tuple.hpp>
#include <boost/spirit/home/x3.hpp>
struct Record {
int id;
using values_t = std::vector<int>;
values_t values;
};
namespace x3 = boost::spirit::x3;
template <typename T>
void test(T const& rule) {
const std::string str = "1: 2, 3, 4";
Record record;
auto attr = std::tie(record.id, record.values);
if (x3::phrase_parse(str.begin(), str.end(), rule >> x3::eoi, x3::space, attr)) {
std::cout << record.id << ": ";
for (const auto& value : record.values) { std::cout << value << ", "; }
std::cout << '\n';
} else {
std::cerr << "syntax error\n";
}
}
namespace {
template <typename T>
struct as_type {
template <typename Expr>
auto operator[](Expr&& expr) const {
return x3::rule<struct _, T>{"as"} = x3::as_parser(std::forward<Expr>(expr));
}
};
template <typename T> static const as_type<T> as = {};
}
int main() {
using namespace x3;
test(int_ >> ':' >> (int_ % ','));
//test(int_ >> ':' >> (int_ >> *(',' >> int_))); // COMPILER asserts "Attribute does not have the expected size."
// "clumsy" x3 style workaround
test(int_ >> ':' >> (rule<struct _, Record::values_t>{} = (int_ >> *(',' >> int_))));
// using an ad-hoc `as<>` implementation:
test(int_ >> ':' >> as<Record::values_t>[int_ >> *(',' >> int_)]);
}
Prints
1: 2, 3, 4,
1: 2, 3, 4,
1: 2, 3, 4,

How do you output the original unparsed code (as a comment) from a spirit parser

Given the input string: A = 23; B = 5, I currently get the (expected) output:
Output: 0xa0000023
Output: 0xa0010005
-------------------------
I would like to see this instead:
Output: 0xa0000023 // A = 23
Output: 0xa0010005 // B = 5
-------------------------
The core line of code is:
statement = eps[_val = 0x50000000] >> identifier[_val += _1<<16] >>
"=" >> hex[_val += (_1 & 0x0000FFFF)];
Where identifier is a qi::symbols table lookup.
The rest of my code looks like this:
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <iostream>
#include <iomanip>
#include <ios>
#include <string>
#include <complex>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
struct reg16_ : qi::symbols<char,unsigned> {
reg16_() {
add ("A", 0) ("B", 1) ("C", 2) ("D", 3) ;
}
} reg16;
template <typename Iterator>
struct dash_script_parser : qi::grammar<Iterator, std::vector<unsigned>(), ascii::space_type> {
dash_script_parser() : dash_script_parser::base_type(start) {
using qi::hex;
using qi::_val;
using qi::_1;
using qi::eps;
identifier %= reg16;
start %= (statement % ";" );
statement = eps[_val = 0x50000000] >> identifier[_val += _1<<16]>> "=" >> hex[_val += (_1 & 0x0000FFFF)];
}
qi::rule<Iterator, std::vector<unsigned>(), ascii::space_type> start;
qi::rule<Iterator, unsigned(), ascii::space_type> statement;
qi::rule<Iterator, unsigned()> identifier;
};
int
main()
{
std::cout << "\t\tA parser for Spirit...\n\n" << "Type [q or Q] to quit\n\n";
dash_script_parser<std::string::const_iterator> g;
std::string str;
while (getline(std::cin, str))
{
if (str.empty() || str[0] == 'q' || str[0] == 'Q') break;
std::string::const_iterator iter = str.begin();
std::string::const_iterator end = str.end();
std::vector<unsigned> strs;
bool r = phrase_parse(iter, end, g, boost::spirit::ascii::space, strs);
if (r && iter == end) {
for(std::vector<unsigned>::const_iterator it=strs.begin(); it<strs.end(); ++it)
std::cout << "Output: 0x" << std::setw(8) << std::setfill('0') << std::hex <<*it << "\n";
} else
std::cout << "Parsing failed\n";
}
return 0;
}

Update: A newer answer brought iter_pos (from the Boost Spirit Repository) to my attention:
How do I capture the original input into the synthesized output from a spirit grammar?
This basically does the same as below, but without 'abusing' semantic actions (making it a much better fit, especially with automatic attribute propagation).
My gut feeling says that it will probably be easier to isolate statements into raw source iterator ranges first, and then parse the statements in isolation. That way, you'll have the corresponding source text at the start.
With that out of the way, here is an approach that I tested and that works without subverting your sample code too much:
1. Make the attribute type a struct
Replace the primitive unsigned with a struct that also contains the source snippet, verbatim, as a string:
struct statement_t
{
unsigned value;
std::string source;
};
BOOST_FUSION_ADAPT_STRUCT(statement_t, (unsigned, value)(std::string, source));
2. Make the parser fill both fields
The good thing is, you were already using semantic actions, so it is merely building onto that. Note that the result is not very pretty, and would benefit hugely from being converted into a (fused) functor. But it shows the technique very clearly:
start %= (statement % ";" );
statement = qi::raw [
raw[eps] [ at_c<0>(_val) = 0x50000000 ]
>> identifier [ at_c<0>(_val) += _1<<16 ]
>> "=" >> hex [ at_c<0>(_val) += (_1 & 0x0000FFFF) ]
]
[ at_c<1>(_val) = construct<std::string>(begin(_1), end(_1)) ]
;
3. Print
So, at_c<0>(_val) corresponds to statement_t::value, and at_c<1>(_val) corresponds to statement_t::source. This slightly modified output loop:
for(std::vector<statement_t>::const_iterator it=strs.begin(); it<strs.end(); ++it)
std::cout << "Output: 0x" << std::setw(8) << std::setfill('0') << std::hex << it->value << " // " << it->source << "\n";
outputs:
Output: 0x50000023 // A = 23
Output: 0x50010005 // B = 5
Full sample
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <iostream>
#include <iomanip>
#include <ios>
#include <string>
#include <complex>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
namespace phx = boost::phoenix;
struct reg16_ : qi::symbols<char,unsigned> {
reg16_() {
add ("A", 0) ("B", 1) ("C", 2) ("D", 3) ;
}
} reg16;
struct statement_t
{
unsigned value;
std::string source;
};
BOOST_FUSION_ADAPT_STRUCT(statement_t, (unsigned, value)(std::string, source));
template <typename Iterator>
struct dash_script_parser : qi::grammar<Iterator, std::vector<statement_t>(), ascii::space_type> {
dash_script_parser() : dash_script_parser::base_type(start) {
using qi::hex;
using qi::_val;
using qi::_1;
using qi::eps;
using qi::raw;
identifier %= reg16;
using phx::begin;
using phx::end;
using phx::at_c;
using phx::construct;
start %= (statement % ";" );
statement = raw [
raw[eps] [ at_c<0>(_val) = 0x50000000 ]
>> identifier [ at_c<0>(_val) += _1<<16 ]
>> "=" >> hex [ at_c<0>(_val) += (_1 & 0x0000FFFF) ]
]
[ at_c<1>(_val) = construct<std::string>(begin(_1), end(_1)) ]
;
}
qi::rule<Iterator, std::vector<statement_t>(), ascii::space_type> start;
qi::rule<Iterator, statement_t(), ascii::space_type> statement;
qi::rule<Iterator, unsigned()> identifier;
};
int
main()
{
std::cout << "\t\tA parser for Spirit...\n\n" << "Type [q or Q] to quit\n\n";
dash_script_parser<std::string::const_iterator> g;
std::string str;
while (getline(std::cin, str))
{
if (str.empty() || str[0] == 'q' || str[0] == 'Q') break;
std::string::const_iterator iter = str.begin();
std::string::const_iterator end = str.end();
std::vector<statement_t> strs;
bool r = phrase_parse(iter, end, g, boost::spirit::ascii::space, strs);
if (r && iter == end) {
for(std::vector<statement_t>::const_iterator it=strs.begin(); it<strs.end(); ++it)
std::cout << "Output: 0x" << std::setw(8) << std::setfill('0') << std::hex << it->value << " // " << it->source << "\n";
} else
std::cout << "Parsing failed\n";
}
return 0;
}
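The pairing of a computed value with its verbatim source snippet can also be sketched by hand in plain C++, which may help when reasoning about what `raw[]` records. This is a hypothetical re-implementation of the statement grammar (register name, '=', hex digits, statements separated by ';'), not the answer's code; `Statement`, `reg_index`, `skip_ws`, `hex_digit` and `parse_statements` are illustrative names. Like `raw[]` after pre-skipping, it records the text from the first non-blank character to the end of the hex number.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Statement {
    unsigned value;      // encoded value, computed as in the answer
    std::string source;  // verbatim statement text, like what raw[] captures
};

// Equivalent of the reg16 symbol table: A=0, B=1, C=2, D=3, otherwise -1.
int reg_index(char c) { return (c >= 'A' && c <= 'D') ? c - 'A' : -1; }

void skip_ws(const std::string& s, std::size_t& i) {
    while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
}

int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// statement % ';' with statement := reg '=' hex; the whole input must be consumed.
bool parse_statements(const std::string& s, std::vector<Statement>& out) {
    std::size_t i = 0;
    for (;;) {
        skip_ws(s, i);
        const std::size_t begin = i;  // raw[] also starts after skipping
        if (i >= s.size() || reg_index(s[i]) < 0) return false;
        unsigned value = 0x50000000u + (static_cast<unsigned>(reg_index(s[i])) << 16);
        ++i;
        skip_ws(s, i);
        if (i >= s.size() || s[i] != '=') return false;
        ++i;
        skip_ws(s, i);
        const std::size_t digits = i;
        unsigned hex = 0;
        while (i < s.size() && hex_digit(s[i]) >= 0)
            hex = hex * 16 + static_cast<unsigned>(hex_digit(s[i++]));
        if (i == digits) return false;  // hex needs at least one digit
        value += hex & 0x0000FFFFu;
        out.push_back({value, s.substr(begin, i - begin)});
        skip_ws(s, i);
        if (i < s.size() && s[i] == ';') { ++i; continue; }
        return i == s.size();
    }
}
```

For "A = 23; B = 5" this yields the pairs (0x50000023, "A = 23") and (0x50010005, "B = 5"), matching the output shown in the answer.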

how to parse and verify an ordered list of integers using qi

I'm parsing a text file, possibly several GB in size, consisting of lines as follows:
11 0.1
14 0.78
532 -3.5
Basically, one int and one float per line. The ints should be ordered and non-negative. I'd like to verify the data are as described, and have returned to me the min and max int in the range. This is what I've come up with:
#include <iostream>
#include <string>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/std_pair.hpp>
namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;
namespace my_parsers
{
using namespace qi;
using px::at_c;
using px::val;
template <typename Iterator>
struct verify_data : grammar<Iterator, locals<int>, std::pair<int, int>()>
{
verify_data() : verify_data::base_type(section)
{
section
= line(val(0)) [ at_c<0>(_val) = _1]
>> +line(_a) [ _a = _1]
>> eps [ at_c<1>(_val) = _a]
;
line
%= (int_ >> other) [
if_(_r1 >= _1)
[
std::cout << _r1 << " and "
<< _1 << val(" out of order\n")
]
]
;
other
= omit[(lit(' ') | '\t') >> float_ >> eol];
}
rule<Iterator, locals<int>, std::pair<int, int>() > section;
rule<Iterator, int(int)> line;
rule<Iterator> other;
};
}
using namespace std;
int main(int argc, char** argv)
{
string input("11 0.1\n"
"14 0.78\n"
"532 -3.6\n");
my_parsers::verify_data<string::iterator> verifier;
pair<int, int> p;
std::string::iterator begin(input.begin()), end(input.end());
cout << "parse result: " << boolalpha
<< qi::parse(begin, end, verifier, p) << endl;
cout << "p.first: " << p.first << "\np.second: " << p.second << endl;
return 0;
}
What I'd like to know is the following:
Is there a better way of going about this? I have used inherited and synthesised attributes, local variables and a bit of phoenix voodoo. This is great; learning the tools is good but I can't help thinking there might be a much simpler way of achieving the same thing :/ (within a PEG parser that is...)
How could it be done without the local variable for instance?
More info: I have other data formats that are being parsed at the same time and so I'd like to keep the return value as a parser attribute. At the moment this is a std::pair, the other data formats when parsed, will expose their own std::pairs for instance and it's these that I'd like to stuff in a std::vector.

This is at least a lot shorter already:
down to 28 LOC
no more locals
no more fusion vector at<> wizardry
no more inherited attributes
no more grammar class
no more manual iteration
using expectation points (see other) to enhance parse error reporting
this parser expression synthesizes neatly into a vector<int> if you choose to assign it with %= (but it will cost performance, besides potentially allocating a largish array)
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;
typedef std::string::iterator It;
int main(int argc, char** argv)
{
std::string input("11 0.1\n"
"14 0.78\n"
"532 -3.6\n");
int min=-1, max=0;
{
using namespace qi;
using px::val;
using px::ref;
It begin(input.begin()), end(input.end());
rule<It> index = int_
[
if_(ref(max) < _1) [ ref(max) = _1 ] .else_ [ std::cout << _1 << val(" out of order\n") ],
if_(ref(min) < 0) [ ref(min) = _1 ]
] ;
rule<It> other = char_(" \t") > float_ > eol;
std::cout << "parse result: " << std::boolalpha
<< qi::parse(begin, end, index % other) << std::endl;
}
std::cout << "min: " << min << "\nmax: " << max << std::endl;
return 0;
}
Bonus
I might suggest taking the validation out of the expression and making it a free-standing function. Of course, this makes things more verbose (and... legible), and my braindead sample uses global variables, but I trust you know how to use boost::bind or px::bind to make it more real-life.
In addition to the above
down to 27 LOC even with the free function
no more phoenix, no more phoenix includes (yay compile times)
no more phoenix expression types in debug builds ballooning the binary and slowing it down
no more var, ref, if_, .else_ and the wretched operator, (which had major bug risk (at some time) due to the overload not being included with phoenix.hpp)
(easily ported to C++0x lambdas, immediately removing the need for global variables)
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
typedef std::string::iterator It;
int min=-1, max=0, linenumber=0;
void validate_index(int index)
{
linenumber++;
if (min < 0) min = index;
if (max < index) max = index;
else std::cout << index << " out of order at line " << linenumber << std::endl;
}
int main(int argc, char** argv)
{
std::string input("11 0.1\n"
"14 0.78\n"
"532 -3.6\n");
It begin(input.begin()), end(input.end());
{
using namespace qi;
rule<It> index = int_ [ validate_index ] ;
rule<It> other = char_(" \t") > float_ > eol;
std::cout << "parse result: " << std::boolalpha
<< qi::parse(begin, end, index % other) << std::endl;
}
std::cout << "min: " << min << "\nmax: " << max << std::endl;
return 0;
}

I guess a much simpler way would be to parse the file using standard stream operations and then check the ordering in a loop. First, the input:
#include <fstream>
#include <iostream>
#include <utility>
typedef std::pair<int, float> value_pair;
bool greater(const value_pair & left, const value_pair & right) {
return left.first > right.first;
}
std::istream & operator>>(std::istream & stream, value_pair & value) {
stream >> value.first >> value.second;
return stream;
}
Then use it like this:
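std::ifstream file("your_file.txt");
value_pair prev, cur;
// Compare each record with its predecessor; only two records are held in memory.
// (std::adjacent_find needs forward iterators, and istream_iterator is only an input iterator.)
if (file >> prev) {
    while (file >> cur) {
        if (greater(prev, cur)) {
            std::cout << "The values are not ordered" << std::endl;
            break;
        }
        prev = cur;
    }
}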
I find this a lot simpler.
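The question also asks for the minimum and maximum index, and mentions files of several GB, so it may be preferable to keep only a running summary in memory. A minimal sketch under those assumptions, in the same stream-based style (the `Summary` struct and `summarize` function are hypothetical): it reads one `int float` pair per line, rejects malformed lines and negative indices, and tracks min and max as it goes. It assumes a repeated index counts as out of order, as in the Spirit answers' `max < index` test.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type: what the question wants back from the check.
struct Summary {
    bool ok = true;     // all lines well-formed, indices non-negative and strictly increasing
    int min_index = -1; // first (smallest) index, -1 if no data was read
    int max_index = -1; // last (largest) index, -1 if no data was read
    long lines = 0;     // number of accepted records
};

Summary summarize(std::istream& in) {
    Summary s;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int index = 0;
        float value = 0.0f;
        std::string extra;
        // Exactly one int and one float per line; anything else is an error.
        if (!(fields >> index >> value) || (fields >> extra) || index < 0) {
            s.ok = false;
            break;
        }
        // Assumption: equal indices are treated as out of order.
        if (s.lines > 0 && index <= s.max_index) {
            s.ok = false;
            break;
        }
        if (s.lines == 0) s.min_index = index;
        s.max_index = index;
        ++s.lines;
    }
    return s;
}
```

Memory use stays constant regardless of file size, since only the current line and the running summary are kept; in practice `summarize` would be called with the `std::ifstream` from the snippet above.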
