Make Boost.Spirit parser to skip all spaces - c++

#include <iostream>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main ()
{
std::string input(" aaa ");
std::string::iterator strbegin = input.begin();
std::string p;
qi::phrase_parse(strbegin, input.end(),
qi::lexeme[+qi::char_],
qi::space,
p);
std::cout << p << std::endl;
std::cout << p.size() << std::endl;
}
In this code parser assigns "aaa " to p. Why doesn't it skip all spaces? I expect p to be "aaa". How it can be fixed?

You are asking Spirit to emit the spaces with qi::lexeme. Compare: http://www.boost.org/doc/libs/1_55_0/libs/spirit/doc/html/spirit/qi/reference/directive/lexeme.html
The rule +(qi::char_ - qi::space) should do.

Related

Boost Spirit X3: skip parser that would do nothing

I'm getting myself familiarized with boost spirit v3. The question I want to ask is how to state the fact that you don't want to use skip parser in any way.
Consider a simple example of parsing comma-separated sequence of integers:
#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/home/x3.hpp>
int main()
{
using namespace boost::spirit::x3;
const std::string input{"2,4,5"};
const auto parser = int_ % ',';
std::vector<int> numbers;
auto start = input.cbegin();
auto r = phrase_parse(start, input.end(), parser, space, numbers);
if(r && start == input.cend())
{
// success
for(const auto &item: numbers)
std::cout << item << std::endl;
return 0;
}
std::cerr << "Input was not parsed successfully" << std::endl;
return 1;
}
This works totally fine. However, I would like to forbid having spaces in between (i.e. "2, 4,5" should not be parsed well).
I tried using eps as a skip parser in phrase_parse, but as you can guess, the program ended up in the infinite loop because eps matches to an empty string.
Solution I found is to use no_skip directive (https://www.boost.org/doc/libs/1_75_0/libs/spirit/doc/html/spirit/qi/reference/directive/no_skip.html). So the parser now becomes:
const auto parser = no_skip[int_ % ','];
This works fine, but I don't find it to be an elegant solution (especially providing "space" parser in phrase_parse when I want no whitespace skips). Are there no skip parsers that would simply do nothing? Am I missing something?
Thanks for Your time. Looking forward to any replies.
You can use either no_skip[] or lexeme[]. They're almost identical, except for pre-skip (Boost Spirit lexeme vs no_skip).
Are there no skip parsers that would simply do nothing? Am I missing something?
A wild guess, but you might be missing the parse API that doesn't accept a skipper in the first place
Live On Coliru
#include <iostream>
#include <iomanip>
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
int main() {
std::string const input{ "2,4,5" };
auto f = begin(input), l = end(input);
const auto parser = x3::int_ % ',';
std::vector<int> numbers;
auto r = parse(f, l, parser, numbers);
if (r) {
// success
for (const auto& item : numbers)
std::cout << item << std::endl;
} else {
std::cerr << "Input was not parsed successfully" << std::endl;
return 1;
}
if (f!=l) {
std::cout << "Remaining input " << std::quoted(std::string(f,l)) << "\n";
return 2;
}
}
Prints
2
4
5

boost spirit parser : getting around the greedy kleene *

I have a grammar that should match sequence of characters followed a single character that is subset of the first one.
For example,
boost::spirit::qi::rule<Iterator, std::string()> grammar = *char_('a', 'z') >> char_('b', 'z').
Since the kleene * is greedy operator it gobbles up everything leaving nothing for the second parser, so it fails to match strings like "abcd"
Is there any way to get around this?
Yes, though your sample lacks context for us to know it.
We need to know what constitutes a complete match, because right now "b" would be a valid match, and "bb" or "bbb". So when the input is "bbb", what is going to be the match? (b, bb or bbb?).
And when you will answer (likely) "Obviously, bbb", then what happens for "bbbb"? When do you stop accepting chars from the subset? If you want the kleene star to not be greedy, do you want it to still be greedy?
The above dialog is annoying but the goal is to make you THINK about what you need. You do not need a non-greedy kleene-star. You probably want a validation constraint on the last char. Most likely, if the input has "bbba" you do not want to simply match "bbb", leaving "a". Instead you likely want to stop parsing because "bbba" is not a valid token.
Assuming That...
I'd write
grammar = +char_("a-z") >> eps(px::back(_val) != 'a');
Meaning that we accept at least 1 char as long as it matches, asserting that the last character wasn't the a.
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
template <typename It>
struct P : qi::grammar<It, std::string()>
{
P() : P::base_type(start) {
using namespace qi;
start = +char_("a-z") >> eps(px::back(_val) != 'a');
}
private:
qi::rule<It, std::string()> start;
};
#include <iomanip>
int main() {
using It = std::string::const_iterator;
P<It> const p;
for (std::string const input : { "", "b", "bb", "bbb", "aaab", "a", "bbba" }) {
std::cout << std::quoted(input) << ": ";
std::string out;
It f = input.begin(), l = input.end();
if (parse(f, l, p, out)) {
std::cout << std::quoted(out);
} else {
std::cout << "(failed) ";
}
if (f != l)
std::cout << " Remaining: " << std::quoted(std::string(f,l));
std::cout << "\n";
}
}
Prints
"": (failed)
"b": "b"
"bb": "bb"
"bbb": "bbb"
"aaab": "aaab"
"a": (failed) Remaining: "a"
"bbba": (failed) Remaining: "bbba"
BONUS
A more generic, albeit less efficient, approach would be to match the leading characters with a look-ahead assertion that it isn't the last character of its kind:
start = *(char_("a-z") >> &char_("a-z")) >> char_("b-z");
A benefit here is that no Phoenix usage is required:
Live On Coliru
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
template <typename It>
struct P : qi::grammar<It, std::string()>
{
P() : P::base_type(start) {
using namespace qi;
start = *(char_("a-z") >> &char_("a-z")) >> char_("b-z");
}
private:
qi::rule<It, std::string()> start;
};
#include <iomanip>
int main() {
using It = std::string::const_iterator;
P<It> const p;
for (std::string const input : { "", "b", "bb", "bbb", "aaab", "a", "bbba" }) {
std::cout << std::quoted(input) << ": ";
std::string out;
It f = input.begin(), l = input.end();
if (parse(f, l, p, out)) {
std::cout << std::quoted(out);
} else {
std::cout << "(failed) ";
}
if (f != l)
std::cout << " Remaining: " << std::quoted(std::string(f,l));
std::cout << "\n";
}
}

Parsing into structs with containers

How can use boost.spirit x3 to parse into structs like:
struct person{
std::string name;
std::vector<std::string> friends;
}
Coming from boost.spirit v2 I would use a grammar but since X3 doesnt support grammars I have no idea how to do this clean.
EDIT: It would be nice if someone could help me writing a parser parsing a list of strings and returns a person with the first string is the name and the res of the strings are in the friends vector.
Parsing with x3 is much simpler than it was with v2, so you shouldn't have too much trouble moving over. Grammars being gone is a good thing!
Here's how you can parse into a vector of strings:
//#define BOOST_SPIRIT_X3_DEBUG
#include <fstream>
#include <iostream>
#include <string>
#include <type_traits>
#include <vector>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
namespace x3 = boost::spirit::x3;
struct person
{
std::string name;
std::vector<std::string> friends;
};
BOOST_FUSION_ADAPT_STRUCT(
person,
(std::string, name)
(std::vector<std::string>, friends)
);
auto const name = x3::rule<struct name_class, std::string> { "name" }
= x3::raw[x3::lexeme[x3::alpha >> *x3::alnum]];
auto const root = x3::rule<struct person_class, person> { "person" }
= name >> *name;
int main(int, char**)
{
std::string const input = "bob john ellie";
auto it = input.begin();
auto end = input.end();
person p;
if (phrase_parse(it, end, root >> x3::eoi, x3::space, p))
{
std::cout << "parse succeeded" << std::endl;
std::cout << p.name << " has " << p.friends.size() << " friends." << std::endl;
}
else
{
std::cout << "parse failed" << std::endl;
if (it != end)
std::cout << "remaining: " << std::string(it, end) << std::endl;
}
return 0;
}
As you can see on Coliru, the output is :
parse succeeded
bob has 2 friends.

boost spirit qi parser failed in release and pass in debug

#include <boost/spirit/include/qi.hpp>
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>
#include <iostream>
using namespace boost::spirit;
int main()
{
std::string s;
std::getline(std::cin, s);
auto specialtxt = *(qi::char_('-', '.', '_'));
auto txt = no_skip[*(qi::char_("a-zA-Z0-9_.\\:$\'-"))];
auto anytxt = *(qi::char_("a-zA-Z0-9_.\\:${}[]+/()-"));
qi::rule <std::string::iterator, void(),ascii::space_type> rule2 = txt ('=') >> ('[') >> (']');
auto begin = s.begin();
auto end = s.end();
if (qi::phrase_parse(begin, end, rule2, ascii::space))
{
std::cout << "MATCH" << std::endl;
}
else
{
std::cout << "NO MATCH" << std::endl;
}
}
this code works fine in debug mode
parser fails in release mode
rule is to just parse text=[]; any thing else than this should fail it works fine in debug mode but not in release mode it shows result no match for any string.
if i enter string like
abc=[];
this passes in debug as expected but fails in release
You can't use auto with Spirit v2:
Assigning parsers to auto variables
You have Undefined Behaviour
DEMO
I tried to make (more) sense of the rest of the code. There were various instances that would never work:
txt('=') is an invalid Qi expression. I assumed you wanted txt >> ('=') instead
qi::char_("a-zA-Z0-9_.\\:$\\-{}[]+/()") doesn't do what you think because $-{ is actually the character "range" \x24-\x7b... Escape the - (or put it at the very end/start of the set like in the other char_ call).
qi::char_('-','.','_') can't work. Did you mean qi::char_("-._")?
specialtxt and anytxt were unused...
prefer const_iterator
prefer namespace aliases above using namespace to prevent hard-to-detect errors
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <iostream>
namespace qi = boost::spirit::qi;
int main() {
std::string const s = "abc=[];";
auto specialtxt = qi::copy(*(qi::char_("-._")));
auto anytxt = qi::copy(*(qi::char_("a-zA-Z0-9_.\\:$\\-{}[]+/()")));
(void) specialtxt;
(void) anytxt;
auto txt = qi::copy(qi::no_skip[*(qi::char_("a-zA-Z0-9_.\\:$\'-"))]);
qi::rule<std::string::const_iterator, qi::space_type> rule2 = txt >> '=' >> '[' >> ']';
auto begin = s.begin();
auto end = s.end();
if (qi::phrase_parse(begin, end, rule2, qi::space)) {
std::cout << "MATCH" << std::endl;
} else {
std::cout << "NO MATCH" << std::endl;
}
if (begin != end) {
std::cout << "Trailing unparsed: '" << std::string(begin, end) << "'\n";
}
}
Printing
MATCH
Trailing unparsed: ';'

How to extract trimmed text using Boost Spirit?

Using boost spirit, I'd like to extract a string that is followed by some data in parentheses. The relevant string is separated by a space from the opening parenthesis. Unfortunately, the string itself may contain spaces. I'm looking for a concise solution that returns the string without a trailing space.
The following code illustrates the problem:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <string>
#include <iostream>
namespace qi = boost::spirit::qi;
using std::string;
using std::cout;
using std::endl;
void
test_input(const string &input)
{
string::const_iterator b = input.begin();
string::const_iterator e = input.end();
string parsed;
bool const r = qi::parse(b, e,
*(qi::char_ - qi::char_("(")) >> qi::lit("(Spirit)"),
parsed
);
if(r) {
cout << "PASSED:" << endl;
} else {
cout << "FAILED:" << endl;
}
cout << " Parsed: \"" << parsed << "\"" << endl;
cout << " Rest: \"" << string(b, e) << "\"" << endl;
}
int main()
{
test_input("Fine (Spirit)");
test_input("Hello, World (Spirit)");
return 0;
}
Its output is:
PASSED:
Parsed: "Fine "
Rest: ""
PASSED:
Parsed: "Hello, World "
Rest: ""
With this simple grammar, the extracted string is always followed by a space (that I 'd like to eliminate).
The solution should work within Spirit since this is only part of a larger grammar. (Thus, it would probably be clumsy to trim the extracted strings after parsing.)
Thank you in advance.
Like the comment said, in the case of a single space, you can just hard code it. If you need to be more flexible or tolerant:
I'd use a skipper with raw to "cheat" the skipper for your purposes:
bool const r = qi::phrase_parse(b, e,
qi::raw [ *(qi::char_ - qi::char_("(")) ] >> qi::lit("(Spirit)"),
qi::space,
parsed
);
This works, and prints
PASSED:
Parsed: "Fine"
Rest: ""
PASSED:
Parsed: "Hello, World"
Rest: ""
See it Live on Coliru
Full program for reference:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <string>
#include <iostream>
namespace qi = boost::spirit::qi;
using std::string;
using std::cout;
using std::endl;
void
test_input(const string &input)
{
string::const_iterator b = input.begin();
string::const_iterator e = input.end();
string parsed;
bool const r = qi::phrase_parse(b, e,
qi::raw [ *(qi::char_ - qi::char_("(")) ] >> qi::lit("(Spirit)"),
qi::space,
parsed
);
if(r) {
cout << "PASSED:" << endl;
} else {
cout << "FAILED:" << endl;
}
cout << " Parsed: \"" << parsed << "\"" << endl;
cout << " Rest: \"" << string(b, e) << "\"" << endl;
}
int main()
{
test_input("Fine (Spirit)");
test_input("Hello, World (Spirit)");
return 0;
}