Use of optional parser in spirit qi - c++

I'm trying to parse either an additive expression of the form "A+C", or "A" alone. After a few tests I realized that the problem is apparently my use of the optional parser, so here is an example:
qi::rule<string::iterator, string()> Test;
Test =
(
qi::string("A")[qi::_val= qi::_1]
>> -(
qi::string("B")[qi::_val += qi::_1]
>> qi::string("C")[qi::_val += qi::_1]
)
)
;
string s1, s2;
s1 = "AB";
bool a= qi::parse(s1.begin(), s1.end(), Test, s2);
The idea is to parse 'A' or "ABC", but if the s1 value is "AB" without 'C', the value of a is true. I believe that although I put parentheses after the '-' operator and then use the ">>" operator inside them, only the 'C' part is considered optional, and not B >> C as a whole. Any ideas?

Container attributes are not backtracked.
That's a performance choice. You need to explicitly control propagation using e.g. qi::hold:
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
int main() {
using It = std::string::const_iterator;
qi::rule<It, std::string()> Test;
Test =
(
qi::char_('A')
>> -qi::hold [
qi::char_('B')
>> qi::char_('C')
]
)
;
for (std::string const input : { "A", "AB", "ABC" })
{
std::cout << "-------------------------\nTesting '" << input << "'\n";
It f = input.begin(), l = input.end();
std::string parsed;
bool ok = qi::parse(f, l, Test, parsed);
if (ok)
std::cout << "Parsed success: " << parsed << "\n";
else
std::cout << "Parsed failed\n";
if (f != l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}
Prints:
-------------------------
Testing 'A'
Parsed success: A
-------------------------
Testing 'AB'
Parsed success: A
Remaining unparsed: 'B'
-------------------------
Testing 'ABC'
Parsed success: ABC
Note I have made a number of simplifications.
See also:
Boost Spirit: "Semantic actions are evil"?
boost::spirit::qi duplicate parsing on the output
Understanding Boost.spirit's string parser
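To make the effect of qi::hold concrete, here is a hand-written model in plain C++ (no Spirit involved). The names match_char, seq_bc_no_hold, seq_bc_hold and parse_a_opt_bc are hypothetical. The sketch assumes only what the answer states: the input position is restored when a branch fails, the container attribute is not, and hold works on a copy that is committed on success.

```cpp
#include <string>
#include <string_view>

// Hypothetical model, NOT Spirit code.
// Matches one literal character and appends it to attr on success.
bool match_char(std::string_view& in, char c, std::string& attr) {
    if (in.empty() || in.front() != c) return false;
    attr.push_back(c);
    in.remove_prefix(1);
    return true;
}

// Models 'B' >> 'C' without hold: if 'C' fails, the input position is
// restored but the 'B' already appended to attr stays there.
bool seq_bc_no_hold(std::string_view& in, std::string& attr) {
    std::string_view save = in;
    if (match_char(in, 'B', attr) && match_char(in, 'C', attr)) return true;
    in = save;  // input is backtracked, attr is not
    return false;
}

// Models hold['B' >> 'C']: parse into a copy, commit only on success.
bool seq_bc_hold(std::string_view& in, std::string& attr) {
    std::string copy = attr;
    if (!seq_bc_no_hold(in, copy)) return false;
    attr.swap(copy);
    return true;
}

// Models 'A' >> -(...): the optional part may fail without failing the whole.
// Returns the attribute; an empty string means the leading 'A' was missing.
std::string parse_a_opt_bc(std::string_view in, bool use_hold) {
    std::string attr;
    if (!match_char(in, 'A', attr)) return {};
    if (use_hold) seq_bc_hold(in, attr);
    else seq_bc_no_hold(in, attr);
    return attr;
}
```

In this model, parse_a_opt_bc("AB", false) yields "AB" (the stray 'B' leaks into the attribute), while parse_a_opt_bc("AB", true) yields "A", matching the output shown above.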

Related

Boost Spirit (x3) failing to consume last token when parsing character escapes

Using boost spirit x3 to parse escaped ascii strings I came across this answer but am getting an expectation exception. I have changed the expectation operator in the original to the sequence operator to disable the exception in the code below. Running the code it parses the input and assigns the correct value to the attribute but returns false and is not consuming the input. Any ideas what I've done wrong here?
gcc version 10.3.0
boost 1.71
std = c++17
#include <boost/spirit/home/x3.hpp>
#include <string>
#include <iostream>
namespace x3 = boost::spirit::x3;
using namespace std::string_literals;
//changed expectation to sequence
auto const qstring = x3::lexeme['"' >> *(
"\\n" >> x3::attr('\n')
| "\\b" >> x3::attr('\b')
| "\\f" >> x3::attr('\f')
| "\\t" >> x3::attr('\t')
| "\\v" >> x3::attr('\v')
| "\\0" >> x3::attr('\0')
| "\\r" >> x3::attr('\r')
| "\\n" >> x3::attr('\n')
| "\\" >> x3::char_("\"\\")
| "\\\"" >> x3::char_('"')
| ~x3::char_('"')
) >> '"'];
int main(int, char**){
auto const quoted = "\"Hel\\\"lo Wor\\\"ld"s;
auto const expected = "Hel\"lo Wor\"ld"s;
std::string result;
auto first = quoted.begin();
auto const last = quoted.end();
bool ok = x3::phrase_parse(first, last, qstring, x3::ascii::space, result);
std::cout << "parse returned " << std::boolalpha << ok << '\n';
std::cout << result << " == " << expected << " is " << std::boolalpha << (result == expected) << '\n';
std::cout << "first == last = " << (first == last) << '\n';
std::cout << "first = " << *first << '\n';
return 0;
}
Your input isn't terminated with a quote character. Writing it as a raw string literal helps:
std::string const qinput = R"("Hel\"lo Wor\"ld)";
Should be
std::string const qinput = R"("Hel\"lo Wor\"ld")";
Now, the rest is common container handling: in Spirit, when a rule fails (also when it just backtracks a branch) the container attribute is not rolled back. See e.g. boost::spirit::qi duplicate parsing on the output, Understanding Boost.spirit's string parser, etc.
Basically, you cannot rely on the result if the parse failed. This is likely why the original had an expectation point: to raise an exception.
A full demonstration of it working correctly:
#include <boost/spirit/home/x3.hpp>
#include <string>
#include <iostream>
#include <iomanip>
#include <string_view>
namespace x3 = boost::spirit::x3;
auto escapes = []{
x3::symbols<char> sym;
sym.add
("\\b", '\b')
("\\f", '\f')
("\\t", '\t')
("\\v", '\v')
("\\0", '\0')
("\\r", '\r')
("\\n", '\n')
("\\\\", '\\')
("\\\"", '"')
;
return sym;
}();
auto const qstring = x3::lexeme['"' >> *(escapes | ~x3::char_('"')) >> '"'];
int main(){
auto squote = [](std::string_view s) { return std::quoted(s, '\''); };
std::string const expected = R"(Hel"lo Wor"ld)";
for (std::string const qinput : {
R"("Hel\"lo Wor\"ld)", // oops no closing quote
R"("Hel\"lo Wor\"ld")",
"\"Hel\\\"lo Wor\\\"ld\"", // if you insist
R"("Hel\"lo Wor\"ld" trailing data)",
})
{
std::cout << "\n -- input " << squote(qinput) << "\n";
std::string result;
auto first = cbegin(qinput);
auto last = cend(qinput);
bool ok = x3::phrase_parse(first, last, qstring, x3::space, result);
ok &= (first == last);
std::cout << "parse returned " << std::boolalpha << ok << "\n";
std::cout << squote(result) << " == " << squote(expected) << " is "
<< (result == expected) << "\n";
if (first != last)
std::cout << "Remaining input unparsed: " << squote({first, last})
<< "\n";
}
}
Prints
-- input '"Hel\\"lo Wor\\"ld'
parse returned false
'Hel"lo Wor"ld' == 'Hel"lo Wor"ld' is true
Remaining input unparsed: '"Hel\\"lo Wor\\"ld'
-- input '"Hel\\"lo Wor\\"ld"'
parse returned true
'Hel"lo Wor"ld' == 'Hel"lo Wor"ld' is true
-- input '"Hel\\"lo Wor\\"ld"'
parse returned true
'Hel"lo Wor"ld' == 'Hel"lo Wor"ld' is true
-- input '"Hel\\"lo Wor\\"ld" trailing data'
parse returned false
'Hel"lo Wor"ld' == 'Hel"lo Wor"ld' is true
Remaining input unparsed: 'trailing data'
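Note that in the output above the attribute holds the expected text even when the parse returned false, which is exactly why the result must not be trusted on failure. One way to make that hard to get wrong is to hand out the result only on success. Below is a minimal hand-written sketch in plain C++ (not Spirit); the function name unquote is hypothetical. Unlike the grammar above, it assumes no skipper (so no leading or trailing whitespace is accepted) and it rejects unknown escape sequences.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not part of Spirit: returns the unescaped contents of
// a double-quoted string, or std::nullopt if the input is not exactly one
// complete quoted string. A partial result is never exposed to the caller.
std::optional<std::string> unquote(std::string_view in) {
    if (in.empty() || in.front() != '"') return std::nullopt;
    in.remove_prefix(1);
    std::string out;
    while (!in.empty()) {
        char c = in.front();
        in.remove_prefix(1);
        if (c == '"') {
            // Closing quote: succeed only if nothing follows it.
            if (!in.empty()) return std::nullopt;
            return out;
        }
        if (c != '\\') { out.push_back(c); continue; }
        if (in.empty()) return std::nullopt;  // dangling backslash
        char e = in.front();
        in.remove_prefix(1);
        switch (e) {
            case 'b': out.push_back('\b'); break;
            case 'f': out.push_back('\f'); break;
            case 't': out.push_back('\t'); break;
            case 'v': out.push_back('\v'); break;
            case '0': out.push_back('\0'); break;
            case 'r': out.push_back('\r'); break;
            case 'n': out.push_back('\n'); break;
            case '\\': out.push_back('\\'); break;
            case '"': out.push_back('"'); break;
            default: return std::nullopt;  // assumption: unknown escapes are errors
        }
    }
    return std::nullopt;  // ran out of input before the closing quote
}
```

With this shape, the input without a closing quote simply yields std::nullopt instead of a filled-in string next to a false return value.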

use boost spirit parse int pair to vector

The string content is like:
20 10 5 3...
It is a list of pairs of ints. How can I use Spirit to parse it into std::vector<std::pair<int, int>>?
std::string line;
std::vector<std::pair<int, int>> v;
boost::spirit::qi::phrase_parse(
line.cbegin(),
line.cend(),
(
???
),
boost::spirit::qi::space
);
You could do a simple parser expression like *(int_ >> int_) (see the tutorial and these documentation pages).
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/std_pair.hpp>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
namespace qi = boost::spirit::qi;
int main() {
std::string line = "20 10 5 3";
std::vector<std::pair<int, int>> v;
qi::phrase_parse(line.cbegin(), line.cend(), *(qi::int_ >> qi::int_), qi::space, v);
for (auto& p : v) {
std::cout << "(" << p.first << ", " << p.second << ")\n";
}
}
Prints
(20, 10)
(5, 3)
Pro Tip 1: Validity
If you want to make sure there's no unwanted/unexpected input, check for remaining data:
check the iterators after parsing
auto f = line.cbegin(), l = line.cend();
qi::phrase_parse(f, l, *(qi::int_ >> qi::int_), qi::space, v);
if (f!=l)
std::cout << "Unparsed input '" << std::string(f,l) << "'\n";
or simply require qi::eoi as part of the parser expression and check the return value:
bool ok = qi::phrase_parse(line.cbegin(), line.cend(), *(qi::int_ >> qi::int_) >> qi::eoi, qi::space, v);
Pro Tip 2: "Look ma, no hands"
Since the grammar is trivially the simplest thing that could parse into this datastructure, you can let Spirit do all the guesswork:
qi::phrase_parse(line.begin(), line.end(), qi::auto_, qi::space, v);
That is a grammar consisting of nothing but a single qi::auto_. Output is still:
(20, 10)
(5, 3)
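To illustrate what the validity check from Pro Tip 1 guards against, here is a hand-written model of `*(int_ >> int_) >> eoi` in plain C++ (not Spirit). The names skip_space, parse_int and parse_pairs are hypothetical, and std::from_chars stands in for qi::int_ as an assumption (it does not accept a leading '+', for instance). The point is that the repetition happily stops in front of an incomplete pair, so only the end-of-input check reveals the leftover.

```cpp
#include <cctype>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>
#include <utility>
#include <vector>

// Skips leading whitespace, in the role of the qi::space skipper.
void skip_space(std::string_view& in) {
    while (!in.empty() && std::isspace(static_cast<unsigned char>(in.front())))
        in.remove_prefix(1);
}

// Parses one int after optional whitespace; leaves `in` untouched on failure.
bool parse_int(std::string_view& in, int& out) {
    std::string_view work = in;
    skip_space(work);
    auto [ptr, ec] = std::from_chars(work.data(), work.data() + work.size(), out);
    if (ec != std::errc{}) return false;
    work.remove_prefix(static_cast<std::size_t>(ptr - work.data()));
    in = work;
    return true;
}

// Model of `*(int_ >> int_) >> eoi`: std::nullopt unless the whole input
// consists of complete pairs.
std::optional<std::vector<std::pair<int, int>>> parse_pairs(std::string_view in) {
    std::vector<std::pair<int, int>> v;
    for (;;) {
        std::string_view save = in;
        int a = 0, b = 0;
        if (!(parse_int(in, a) && parse_int(in, b))) {
            in = save;  // backtrack the incomplete pair
            break;
        }
        v.emplace_back(a, b);
    }
    skip_space(in);
    if (!in.empty()) return std::nullopt;  // the eoi check
    return v;
}
```

For "20 10 5" the loop collects (20, 10) and then backs out of the dangling 5; without the final check the caller would see a successful parse with one pair and never learn about the leftover input.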

how to access elements of a tuple in a semantic action

My grammar has various entries which start with a generic name.
After I determined the type I would like to use the expectation operator in order to create parsing errors.
rule1=name >> (type1 > something);
rule2=name >> (type2 > something);
I already figured out that I cannot mix the two operators > and >>, which is why I added the parentheses. My guess is that the parentheses cause a tuple to be created.
How do I access the elements of the tuple in the semantic action?
The following is certainly wrong but should clarify what I want to accomplish.
rule1=(name >> (type1 > something))[qi::_val = boost::phoenix::bind(
create,
qi::_1,
std::get<0>(qi::_2),
std::get<1>(qi::_2))];
thanks
Directly addressing the question:
using px::at_c;
rule1 = (name >> (type1 > something)) [_val = px::bind(create, _1, at_c<0>(_2), at_c<1>(_2))];
However, I'd use this little trick with qi::eps to avoid the complexity:
rule2 = (name >> type1 >> (eps > something)) [_val = px::bind(create, _1, _2, _3)];
Finally, look at boost::phoenix::function<>:
px::function<decltype(&create)> create_(create); // or just decltype(create) if it's a function object
rule3 = (name >> type1 >> (eps > something)) [_val = create_(_1, _2, _3)];
That way you can even have readable code!
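The difference between the first and the other two variants is the shape of the attribute that the semantic action sees. As a rough model in plain C++, with std::tuple standing in for the Fusion sequences Spirit actually uses (an assumption for illustration), the parenthesized sub-sequence yields a nested tuple, whereas the eps trick keeps everything flat. The names create, from_nested and from_flat are hypothetical.

```cpp
#include <tuple>

// Hypothetical stand-in for the create function from the question.
int create(char n, char t, char s) {
    return (n == 'n' && t == 't' && s == 's') ? 42 : -1;
}

// Shape modeled for name >> (type1 > something):
// _1 is the name, _2 is itself a two-element sequence.
int from_nested(std::tuple<char, std::tuple<char, char>> const& attr) {
    auto const& inner = std::get<1>(attr);
    return create(std::get<0>(attr), std::get<0>(inner), std::get<1>(inner));
}

// Shape modeled for name >> type1 >> (eps > something):
// three flat elements, so they map directly onto _1, _2, _3.
int from_flat(std::tuple<char, char, char> const& attr) {
    return std::apply(create, attr);
}
```

Here std::get<0>(inner) and std::get<1>(inner) play the role of at_c<0>(_2) and at_c<1>(_2) in rule1.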
DEMO
Just to prove that all three have the same behaviour¹
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/include/at_c.hpp>
#include <cassert>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
static int create(char n, char t, char s) {
assert(n=='n' && t=='t' && s=='s');
return 42;
}
int main() {
using It = std::string::const_iterator;
// fake rules just for demo
qi::rule<It, char()>
name = qi::char_("n"),
type1 = qi::char_("t"),
something = qi::char_("s");
//using boost::fusion::at_c;
qi::rule<It, int(), qi::space_type> rule1, rule2, rule3;
{
using namespace qi;
using px::at_c;
rule1 = (name >> (type1 > something)) [_val = px::bind(create, _1, at_c<0>(_2), at_c<1>(_2))];
rule2 = (name >> type1 >> (eps > something)) [_val = px::bind(create, _1, _2, _3)];
px::function<decltype(&create)> create_(create); // or just decltype(create) if it's a function object
rule3 = (name >> type1 >> (eps > something)) [_val = create_(_1, _2, _3)];
}
for(auto& parser : { rule1, rule2, rule3 }) {
for(std::string const input : { "n t s", "n t !" }) {
std::cout << "Input: '" << input << "'\n";
auto f = input.begin(), l = input.end();
int data;
try {
bool ok = qi::phrase_parse(f, l, parser, qi::space, data);
if (ok) {
std::cout << "Parsing result: " << data << '\n';
} else {
std::cout << "Parsing failed\n";
}
} catch(qi::expectation_failure<It> const& e) {
std::cout << "Expectation failure: " << e.what() << " at '" << std::string(e.first, e.last) << "'\n";
}
if (f!=l) {
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
std::cout << "-------------------------------------------\n";
}
}
}
Which prints 3x the same output:
Input: 'n t s'
Parsing result: 42
-------------------------------------------
Input: 'n t !'
Expectation failure: boost::spirit::qi::expectation_failure at '!'
Remaining unparsed: 'n t !'
-------------------------------------------
Input: 'n t s'
Parsing result: 42
-------------------------------------------
Input: 'n t !'
Expectation failure: boost::spirit::qi::expectation_failure at '!'
Remaining unparsed: 'n t !'
-------------------------------------------
Input: 'n t s'
Parsing result: 42
-------------------------------------------
Input: 'n t !'
Expectation failure: boost::spirit::qi::expectation_failure at '!'
Remaining unparsed: 'n t !'
-------------------------------------------
¹ PS: let this serve as an example of how to create an SSCCE in your questions

Empty strings in vector returned from boost spirit x3 parser

I want to check a file for all enums (this is just an MCVE, so nothing complicated), and the names of the enums should be stored in a std::vector. I build my parsers like this:
auto const any = x3::rule<class any_id, const x3::unused_type>{"any"}
= ~x3::space;
auto const identifier = x3::rule<class identifier_id, std::string>{"identifier"}
= x3::lexeme[x3::char_("A-Za-z_") >> *x3::char_("A-Za-z_0-9")];
auto const enum_finder = x3::rule<class enum_finder_id, std::vector<std::string>>{"enum_finder"}
= *(("enum" >> identifier) | any);
When I try to parse a string with this enum_finder into a std::vector, the std::vector also contains a lot of empty strings.
Why is this parser also parsing empty strings into the vector?
I've assumed you want to parse "enum" followed by an identifier out of free-form text, ignoring whitespace.
What you really want is for ("enum" >> identifier | any) to synthesize an optional<string>. Sadly, what you get is variant<string, unused_type> or somesuch.
The same happens when you wrap any with x3::omit[any] - it's still the same unused_type.
Plan B: Since you're really just parsing repeated enum-ids separated by "anything", why not use the list operator:
("enum" >> identifier) % any
This works a little. Now some tweaking: let's avoid eating "any" character by character. In fact, we can likely just consume whole whitespace-delimited words (note that +~space is equivalent to +graph):
auto const any = x3::rule<class any_id>{"any"}
= x3::lexeme [+x3::graph];
Next, to allow for multiple bogus words to be accepted in a row there's the trick to make the list's subject parser optional:
-("enum" >> identifier) % any;
This parses correctly. See a full demo:
DEMO
#include <boost/spirit/home/x3.hpp>
#include <string>
#include <vector>
namespace x3 = boost::spirit::x3;
namespace parser {
using namespace x3;
auto any = lexeme [+~space];
auto identifier = lexeme [char_("A-Za-z_") >> *char_("A-Za-z_0-9")];
auto enum_finder = -("enum" >> identifier) % any;
}
#include <iostream>
int main() {
for (std::string input : {
"",
" ",
"bogus",
"enum one",
"enum one enum two",
"enum one bogus bogus more bogus enum two !##!##Yay",
})
{
auto f = input.begin(), l = input.end();
std::cout << "------------ parsing '" << input << "'\n";
std::vector<std::string> data;
if (phrase_parse(f, l, parser::enum_finder, x3::space, data))
{
std::cout << "parsed " << data.size() << " elements:\n";
for (auto& el : data)
std::cout << "\t" << el << "\n";
} else {
std::cout << "Parse failure\n";
}
if (f!=l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}
Prints:
------------ parsing ''
parsed 0 elements:
------------ parsing ' '
parsed 0 elements:
------------ parsing 'bogus'
parsed 0 elements:
------------ parsing 'enum one'
parsed 1 elements:
one
------------ parsing 'enum one enum two'
parsed 1 elements:
one
------------ parsing 'enum one bogus bogus more bogus enum two !##!##Yay'
parsed 2 elements:
one
two

Extracting Values from string using spirit parser

I have the following line:
/90pv-RKSJ-UCS2C usecmap
std::string const line = "/90pv-RKSJ-UCS2C usecmap";
auto first = line.begin(), last = line.end();
std::string label, token;
bool ok = qi::phrase_parse(
first, last,
qi::lexeme [ "/" >> +~qi::char_(" ") ] >> ' ' >> qi::lexeme[+~qi::char_(' ')] , qi::space, label, token);
if (ok)
std::cout << "Parse success: label='" << label << "', token='" << token << "'\n";
else
std::cout << "Parse failed\n";
if (first!=last)
std::cout << "Remaining unparsed input: '" << std::string(first, last) << "'\n";
I want 90pv-RKSJ-UCS2C in the label variable and usecmap in the token variable.
I can extract the 90pv-RKSJ-UCS2C value, but not usecmap.
With space as the skipper, you can never match ' ' (it is skipped!). See also: Boost spirit skipper issues
So, either don't use a skipper, or allow the skipper to eat it:
bool ok = qi::phrase_parse(
first, last,
qi::lexeme [ "/" >> +qi::graph ] >> qi::lexeme[+qi::graph], qi::blank, label, token);
Notes:
I used qi::graph instead of the ~qi::char_(" ") formulation
I used qi::blank as the skipper because you describe the input as a line, which implies that line ends should not be skipped
Demo
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
int main()
{
std::string const line = "/90pv-rksj-ucs2c usecmap";
auto first = line.begin(), last = line.end();
std::string label, token;
bool ok = qi::phrase_parse(
first, last,
qi::lexeme [ "/" >> +qi::graph ] >> qi::lexeme[+qi::graph], qi::blank, label, token);
if (ok)
std::cout << "parse success: label='" << label << "', token='" << token << "'\n";
else
std::cout << "parse failed\n";
if (first!=last)
std::cout << "remaining unparsed input: '" << std::string(first, last) << "'\n";
}
Prints:
parse success: label='90pv-rksj-ucs2c', token='usecmap'
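Why the original `>> ' ' >>` could never match can be shown with a tiny hand-written model in plain C++ (not Spirit). The function name literal_matches_after_preskip is hypothetical; the sketch assumes only what the answer states, namely that the skipper consumes whitespace before the literal parser gets to look at the input.

```cpp
#include <cctype>
#include <string_view>

// Hypothetical model, not Spirit code: a literal parser running under a
// whitespace skipper. The skipper eats all leading whitespace first, so the
// literal only ever sees the first non-space character.
bool literal_matches_after_preskip(std::string_view in, char literal) {
    while (!in.empty() && std::isspace(static_cast<unsigned char>(in.front())))
        in.remove_prefix(1);
    return !in.empty() && in.front() == literal;
}
```

Applied to the remaining input " usecmap", the model fails to match ' ' because the space is already gone, while matching 'u' succeeds.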
If you are using C++11 or later, I suggest using a regular expression.
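#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
// The 1st () captures 90pv-RKSJ-UCS2C and the 2nd () captures usecmap
regex re("^/([^\\s]*)\\s([^\\s]*)");
smatch sm;
string s="/90pv-RKSJ-UCS2C usecmap";
// regex_match must cover the whole string; bail out if it does not
if(!regex_match(s,sm,re)) {
cout<<"no match"<<endl;
return 1;
}
// sm[0] is the full match, sm[1] and sm[2] are the captures
for(size_t i=0;i<sm.size();i++) {
cout<<sm[i]<<endl;
}
string label=sm[1],token=sm[2];
}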