boost spirit qi parser failed in release and pass in debug - c++

#include <boost/spirit/include/qi.hpp>
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>
#include <iostream>
using namespace boost::spirit;
int main()
{
std::string s;
std::getline(std::cin, s);
auto specialtxt = *(qi::char_('-', '.', '_'));
auto txt = no_skip[*(qi::char_("a-zA-Z0-9_.\\:$\'-"))];
auto anytxt = *(qi::char_("a-zA-Z0-9_.\\:${}[]+/()-"));
qi::rule <std::string::iterator, void(),ascii::space_type> rule2 = txt ('=') >> ('[') >> (']');
auto begin = s.begin();
auto end = s.end();
if (qi::phrase_parse(begin, end, rule2, ascii::space))
{
std::cout << "MATCH" << std::endl;
}
else
{
std::cout << "NO MATCH" << std::endl;
}
}
this code works fine in debug mode
parser fails in release mode
rule is to just parse text=[]; any thing else than this should fail it works fine in debug mode but not in release mode it shows result no match for any string.
if i enter string like
abc=[];
this passes in debug as expected but fails in release

You can't use auto with Spirit v2:
Assigning parsers to auto variables
You have Undefined Behaviour
DEMO
I tried to make (more) sense of the rest of the code. There were various instances that would never work:
txt('=') is an invalid Qi expression. I assumed you wanted txt >> ('=') instead
qi::char_("a-zA-Z0-9_.\\:$\\-{}[]+/()") doesn't do what you think because $-{ is actually the character "range" \x24-\x7b... Escape the - (or put it at the very end/start of the set like in the other char_ call).
qi::char_('-','.','_') can't work. Did you mean qi::char_("-._")?
specialtxt and anytxt were unused...
prefer const_iterator
prefer namespace aliases above using namespace to prevent hard-to-detect errors
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <iostream>
namespace qi = boost::spirit::qi;
int main() {
std::string const s = "abc=[];";
auto specialtxt = qi::copy(*(qi::char_("-._")));
auto anytxt = qi::copy(*(qi::char_("a-zA-Z0-9_.\\:$\\-{}[]+/()")));
(void) specialtxt;
(void) anytxt;
auto txt = qi::copy(qi::no_skip[*(qi::char_("a-zA-Z0-9_.\\:$\'-"))]);
qi::rule<std::string::const_iterator, qi::space_type> rule2 = txt >> '=' >> '[' >> ']';
auto begin = s.begin();
auto end = s.end();
if (qi::phrase_parse(begin, end, rule2, qi::space)) {
std::cout << "MATCH" << std::endl;
} else {
std::cout << "NO MATCH" << std::endl;
}
if (begin != end) {
std::cout << "Trailing unparsed: '" << std::string(begin, end) << "'\n";
}
}
Printing
MATCH
Trailing unparsed: ';'

Related

Boost Spirit X3: skip parser that would do nothing

I'm getting myself familiarized with boost spirit v3. The question I want to ask is how to state the fact that you don't want to use skip parser in any way.
Consider a simple example of parsing comma-separated sequence of integers:
#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/home/x3.hpp>
int main()
{
using namespace boost::spirit::x3;
const std::string input{"2,4,5"};
const auto parser = int_ % ',';
std::vector<int> numbers;
auto start = input.cbegin();
auto r = phrase_parse(start, input.end(), parser, space, numbers);
if(r && start == input.cend())
{
// success
for(const auto &item: numbers)
std::cout << item << std::endl;
return 0;
}
std::cerr << "Input was not parsed successfully" << std::endl;
return 1;
}
This works totally fine. However, I would like to forbid having spaces in between (i.e. "2, 4,5" should not be parsed well).
I tried using eps as a skip parser in phrase_parse, but as you can guess, the program ended up in the infinite loop because eps matches to an empty string.
Solution I found is to use no_skip directive (https://www.boost.org/doc/libs/1_75_0/libs/spirit/doc/html/spirit/qi/reference/directive/no_skip.html). So the parser now becomes:
const auto parser = no_skip[int_ % ','];
This works fine, but I don't find it to be an elegant solution (especially providing "space" parser in phrase_parse when I want no whitespace skips). Are there no skip parsers that would simply do nothing? Am I missing something?
Thanks for Your time. Looking forward to any replies.
You can use either no_skip[] or lexeme[]. They're almost identical, except for pre-skip (Boost Spirit lexeme vs no_skip).
Are there no skip parsers that would simply do nothing? Am I missing something?
A wild guess, but you might be missing the parse API that doesn't accept a skipper in the first place
Live On Coliru
#include <iostream>
#include <iomanip>
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
int main() {
std::string const input{ "2,4,5" };
auto f = begin(input), l = end(input);
const auto parser = x3::int_ % ',';
std::vector<int> numbers;
auto r = parse(f, l, parser, numbers);
if (r) {
// success
for (const auto& item : numbers)
std::cout << item << std::endl;
} else {
std::cerr << "Input was not parsed successfully" << std::endl;
return 1;
}
if (f!=l) {
std::cout << "Remaining input " << std::quoted(std::string(f,l)) << "\n";
return 2;
}
}
Prints
2
4
5

Using too many alternative operator in boost::qi caused segmentation fault

I generated a simple code using qi::spirit:
#include <boost/spirit/include/qi.hpp>
#include <string>
using namespace std;
using namespace boost::spirit;
int main() {
string str = "string";
auto begin = str.begin();
auto symbols = (qi::lit(";") | qi::lit("(") | qi::lit(")") | qi::lit("+") |
qi::lit("/") | qi::lit("-") | qi::lit("*"));
qi::parse(begin, str.end(), *(qi::char_ - symbols));
}
And then this program was terminated by SEGV.Then,
my rewritten code with less alternative operators in rhs of symbols,
#include <boost/spirit/include/qi.hpp>
#include <string>
using namespace std;
using namespace boost::spirit;
int main()
{
string str = "string";
auto begin = str.begin();
auto symbols = (qi::lit(";") | qi::lit("+") | qi::lit("/") | qi::lit("-") |
qi::lit("*"));
qi::parse(begin, str.end(), *(qi::char_ - symbols));
}
now works well. What's the difference between 2 cases?
Your problem is a classic mistake: using auto to store Qi parser expressions: Assigning parsers to auto variables
That leads to UB.
Use a rule, or qi::copy (which is proto::deep_copy under the hooed).
auto symbols = qi::copy(qi::lit(";") | qi::lit("(") | qi::lit(")") | qi::lit("+") |
qi::lit("/") | qi::lit("-") | qi::lit("*"));
Even better, use a character set to match all the characters at once,
auto symbols = qi::copy(qi::omit(qi::char_(";()+/*-")));
The omit[] counteracts the fact that char_ exposes it's attribute (where lit doesn't). But since all you ever you it for is to SUBTRACT from another character-set:
qi::char_ - symbols
You could just as well just write
qi::char_ - qi::char_(";()+/*-")
Now. You might not know, but you can use ~charset to negate it, so it would just become
~qi::char_(";()+/*-")
NOTE - can have special meaning in charsets, which is why I very subtly move it to the end. See docs
Live Demo
Extending a little and showing some subtler patterns:
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <string>
using namespace std;
using namespace boost::spirit;
int main() {
string const str = "string;some(thing) else + http://me#host:*/path-element.php";
auto cs = ";()+/*-";
using qi::char_;
{
std::vector<std::string> tokens;
qi::parse(str.begin(), str.end(), +~char_(cs) % +char_(cs), tokens);
std::cout << "Condensing: ";
for (auto& tok : tokens) {
std::cout << " " << std::quoted(tok);
}
std::cout << std::endl;
}
{
std::vector<std::string> tokens;
qi::parse(str.begin(), str.end(), *~char_(cs) % char_(cs), tokens);
std::cout << "Not condensing: ";
for (auto& tok : tokens) {
std::cout << " " << std::quoted(tok);
}
std::cout << std::endl;
}
}
Prints
Condensing: "string" "some" "thing" " else " " http:" "me#host:" "path" "element.php"
Not condensing: "string" "some" "thing" " else " " http:" "" "me#host:" "" "path" "element.php"
X3
If you have c++14, you can use Spirit X3, which doesn't have the "auto problem" (because it doesn't have Proto Expression trees that can get dangling references).
Your original code would have been fine in X3, and it will compile a lot faster.
Here's my example using X3:
Live On Coliru
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
#include <string>
namespace x3 = boost::spirit::x3;
int main() {
std::string const str = "string;some(thing) else + http://me#host:*/path-element.php";
auto const cs = x3::char_(";()+/*-");
std::vector<std::string> tokens;
x3::parse(str.begin(), str.end(), +~cs % +cs, tokens);
//x3::parse(str.begin(), str.end(), *~cs % cs, tokens);
for (auto& tok : tokens) {
std::cout << " " << std::quoted(tok);
}
}
Printing
"string" "some" "thing" " else " " http:" "me#host:" "path" "element.php"
The thing is that using ( and closing it in double quotes is not really safe, it might get interpreted as something else entirely, you might want to try using / to escape it instead.

Spirit X3, How to fail parse on non-ascii input?

So the objective is to not tolerate characters from 80h through FFh in the input string. I was under the impression that
using ascii::char_;
would take care of this. But as you can see in the example code it will happily print Parsing succeeded.
In the following Spirit mailing list post, Joel suggested to let parse to fail on these non-ascii characters. But I'm not sure whether he proceeded in doing so.
[Spirit-general] ascii encoding assert on invalid input ...
Here my example code:
#include <iostream>
#include <boost/spirit/home/x3.hpp>
namespace client::parser
{
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
using ascii::char_;
using ascii::space;
using x3::lexeme;
using x3::skip;
const auto quoted_string = lexeme[char_('"') >> *(char_ - '"') >> char_('"')];
const auto entry_point = skip(space) [ quoted_string ];
}
int main()
{
for(std::string const input : { "\"naughty \x80" "bla bla bla\"" }) {
std::string output;
if (parse(input.begin(), input.end(), client::parser::entry_point, output)) {
std::cout << "Parsing succeeded\n";
std::cout << "input: " << input << "\n";
std::cout << "output: " << output << "\n";
} else {
std::cout << "Parsing failed\n";
}
}
}
How can I change the example to have Spirit to fail on this invalid input?
Furthermore, but very related, I would like to know how I should use the character parser that defines a char_set encoding. You know char_(charset) from X3 docs: Character Parsers develop branch.
The documentation is lacking so strongly to describe the basic functionality. Why can't the boost top level people force library authors to come with documentation at least on the level of cppreference.com?
Nothing bad about the docs here. It's just a library bug.
Where the code for any_char says:
template <typename Char, typename Context>
bool test(Char ch_, Context const&) const
{
return ((sizeof(Char) <= sizeof(char_type)) || encoding::ischar(ch_));
}
It should have said
template <typename Char, typename Context>
bool test(Char ch_, Context const&) const
{
return ((sizeof(Char) <= sizeof(char_type)) && encoding::ischar(ch_));
}
That makes your program behave as expected and required. That behaviour also matches the Qi behaviour:
Live On Coliru
#include <boost/spirit/include/qi.hpp>
int main() {
namespace qi = boost::spirit::qi;
char const* input = "\x80";
assert(!qi::parse(input, input+1, qi::ascii::char_));
}
Filed a bug here: https://github.com/boostorg/spirit/issues/520
You can achieve that by using print parser:
#include <iostream>
#include <boost/spirit/home/x3.hpp>
namespace client::parser
{
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
using ascii::char_;
using ascii::print;
using ascii::space;
using x3::lexeme;
using x3::skip;
const auto quoted_string = lexeme[char_('"') >> *(print - '"') >> char_('"')];
const auto entry_point = skip(space) [ quoted_string ];
}
int main()
{
for(std::string const input : { "\"naughty \x80\"", "\"bla bla bla\"" }) {
std::string output;
std::cout << "input: " << input << "\n";
if (parse(input.begin(), input.end(), client::parser::entry_point, output)) {
std::cout << "output: " << output << "\n";
std::cout << "Parsing succeeded\n";
} else {
std::cout << "Parsing failed\n";
}
}
}
Output:
input: "naughty �"
Parsing failed
input: "bla bla bla"
output: "bla bla bla"
Parsing succeeded
https://wandbox.org/permlink/HSoB8uqMC3WME5yI
It is a surprising fact that for some reason the check for char_ is done only when the sizeof(iterator char type) > sizeof(char):
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <string>
#include <boost/core/demangle.hpp>
#include <typeinfo>
namespace x3 = boost::spirit::x3;
template <typename Char>
void test(Char const* str)
{
std::basic_string<Char> s = str;
std::cout << boost::core::demangle(typeid(Char).name()) << ":\t";
Char c;
auto it = s.begin();
if (x3::parse(it, s.end(), x3::ascii::char_, c) && it == s.end())
std::cout << "OK: " << int(c) << "\n";
else
std::cout << "Failed\n";
}
int main()
{
test("\x80");
test(L"\x80");
test(u8"\x80");
test(u"\x80");
test(U"\x80");
}
Output:
char: OK: -128
wchar_t: Failed
char8_t: OK: 128
char16_t: Failed
char32_t: Failed
https://wandbox.org/permlink/j9PQeRVnGZQeELFA

Boost.Spirit X3 parser "no type named type in(...)"

I was toying with Boost.Spirit X3 calculator example when I encountered an error I couldn't get my head around.
I minimized the program to reduce complexity still throwing the same error.
Say I want to parse an input as a list of statements (strings) followed by a delimiter (';').
This is my structure:
namespace client { namespace ast
{
struct program
{
std::list<std::string> stmts;
};
}}
BOOST_FUSION_ADAPT_STRUCT(client::ast::program,
(std::list<std::string>, stmts)
)
The grammar is as follows:
namespace client
{
namespace grammar
{
x3::rule<class program, ast::program> const program("program");
auto const program_def =
*((*char_) > ';')
;
BOOST_SPIRIT_DEFINE(
program
);
auto calculator = program;
}
using grammar::calculator;
}
Invoked
int
main()
{
std::cout <<"///////////////////////////////////////////\n\n";
std::cout << "Expression parser...\n\n";
std::cout << //////////////////////////////////////////////////\n\n";
std::cout << "Type an expression...or [q or Q] to quit\n\n";
typedef std::string::const_iterator iterator_type;
typedef client::ast::program ast_program;
std::string str;
while (std::getline(std::cin, str))
{
if (str.empty() || str[0] == 'q' || str[0] == 'Q')
break;
auto& calc = client::calculator; // Our grammar
ast_program program; // Our program (AST)
iterator_type iter = str.begin();
iterator_type end = str.end();
boost::spirit::x3::ascii::space_type space;
bool r = phrase_parse(iter, end, calc, space, program);
if (r && iter == end)
{
std::cout << "-------------------------\n";
std::cout << "Parsing succeeded\n";
std::cout<< '\n';
std::cout << "-------------------------\n";
}
else
{
std::cout << "-------------------------\n";
std::cout << "Parsing failed\n";
std::cout << "-------------------------\n";
}
}
std::cout << "Bye... :-) \n\n";
return 0;
}
Error I get is
/opt/boost_1_66_0/boost/spirit/home/x3/support/traits/container_traits.hpp: In instantiation of ‘struct boost::spirit::x3::traits::container_value<client::ast::program, void>’:
.
.
.
/opt/boost_1_66_0/boost/spirit/home/x3/support/traits/container_traits.hpp:76:12: error: no type named ‘value_type’ in ‘struct client::ast::program’
struct container_value
/opt/boost_1_66_0/boost/spirit/home/x3/operator/detail/sequence.hpp:497:72: error: no type named ‘type’ in ‘struct boost::spirit::x3::traits::container_value<client::ast::program, void>’
, typename traits::is_substitute<attribute_type, value_type>::type());
^~~~~~
Things I tried:
Following Getting boost::spirit::qi to use stl containers
Even though it uses Qi I nonetheless tried:
namespace boost{namespace spirit{ namespace traits{
template<>
struct container_value<client::ast::program>
//also with struct container<client::ast::program, void>
{
typedef std::list<std::string> type;
};
}}}
You see I'm kinda in the dark, so expectably to no avail.
parser2.cpp:41:8: error: ‘container_value’ is not a class template
struct container_value<client::ast::program>
^~~~~~~~~~~~~~~
In the same SO question I author says "There is one known limitation though, when you try to use a struct that has a single element that is also a container compilation fails unless you add qi::eps >> ... to your rule."
I did try adding a dummy eps also without success.
Please, help me decipher what that error means.
Yup. This looks like another limitation with automatic propagation of attributes when single-element sequences are involved.
I'd probably bite the bullet and change the rule definition from what it is (and what you'd expect to work) to:
x3::rule<class program_, std::vector<std::string> >
That removes the root of the confusion.
Other notes:
you had char_ which also eats ';' so the grammar would never succeed because no ';' would follow a "statement".
your statements aren't lexeme, so whitespace is discarded (is this what you meant? See Boost spirit skipper issues)
your statement could be empty, which meant parsing would ALWAYS fail at the end of input (where it would always read an empty state, and then discover that the expected ';' was missing). Fix it by requiring at least 1 character before accepting a statement.
With some simplifications/style changes:
Live On Coliru
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <list>
namespace x3 = boost::spirit::x3;
namespace ast {
using statement = std::string;
struct program {
std::list<statement> stmts;
};
}
BOOST_FUSION_ADAPT_STRUCT(ast::program, stmts)
namespace grammar {
auto statement
= x3::rule<class statement_, ast::statement> {"statement"}
= +~x3::char_(';');
auto program
= x3::rule<class program_, std::list<ast::statement> > {"program"}
= *(statement >> ';');
}
#include <iostream>
#include <iomanip>
int main() {
std::cout << "Type an expression...or [q or Q] to quit\n\n";
using It = std::string::const_iterator;
for (std::string str; std::getline(std::cin, str);) {
if (str.empty() || str[0] == 'q' || str[0] == 'Q')
break;
auto &parser = grammar::program;
ast::program program; // Our program (AST)
It iter = str.begin(), end = str.end();
if (phrase_parse(iter, end, parser, x3::space, program)) {
std::cout << "Parsing succeeded\n";
for (auto& s : program.stmts) {
std::cout << "Statement: " << std::quoted(s, '\'') << "\n";
}
}
else
std::cout << "Parsing failed\n";
if (iter != end)
std::cout << "Remaining unparsed: " << std::quoted(std::string(iter, end), '\'') << "\n";
}
}
Which for input "a;b;c;d;" prints:
Parsing succeeded
Statement: 'a'
Statement: 'b'
Statement: 'c'
Statement: 'd'

Make Boost.Spirit parser to skip all spaces

#include <iostream>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main ()
{
std::string input(" aaa ");
std::string::iterator strbegin = input.begin();
std::string p;
qi::phrase_parse(strbegin, input.end(),
qi::lexeme[+qi::char_],
qi::space,
p);
std::cout << p << std::endl;
std::cout << p.size() << std::endl;
}
In this code parser assigns "aaa " to p. Why doesn't it skip all spaces? I expect p to be "aaa". How it can be fixed?
You are asking Spirit to emit the spaces with qi::lexeme. Compare: http://www.boost.org/doc/libs/1_55_0/libs/spirit/doc/html/spirit/qi/reference/directive/lexeme.html
The rule +(qi::char_ - qi::space) should do.