I wrote a simple program using Boost.Spirit Qi:
#include <boost/spirit/include/qi.hpp>
#include <string>
using namespace std;
using namespace boost::spirit;
int main() {
string str = "string";
auto begin = str.begin();
auto symbols = (qi::lit(";") | qi::lit("(") | qi::lit(")") | qi::lit("+") |
qi::lit("/") | qi::lit("-") | qi::lit("*"));
qi::parse(begin, str.end(), *(qi::char_ - symbols));
}
This program was terminated by a SEGV.
My rewritten code, with fewer alternative operators on the right-hand side of symbols,
#include <boost/spirit/include/qi.hpp>
#include <string>
using namespace std;
using namespace boost::spirit;
int main()
{
string str = "string";
auto begin = str.begin();
auto symbols = (qi::lit(";") | qi::lit("+") | qi::lit("/") | qi::lit("-") |
qi::lit("*"));
qi::parse(begin, str.end(), *(qi::char_ - symbols));
}
now works well. What's the difference between the two cases?
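Your problem is a classic mistake: using auto to store Qi parser expressions (see: Assigning parsers to auto variables).
That leads to UB, because the expression template keeps references to temporaries that are gone by the end of the statement.
Use a rule, or qi::copy (which is proto::deep_copy under the hood).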
auto symbols = qi::copy(qi::lit(";") | qi::lit("(") | qi::lit(")") | qi::lit("+") |
qi::lit("/") | qi::lit("-") | qi::lit("*"));
Even better, use a character set to match all the characters at once,
auto symbols = qi::copy(qi::omit(qi::char_(";()+/*-")));
The omit[] counteracts the fact that char_ exposes its attribute (where lit doesn't). But since all you ever use it for is to SUBTRACT from another character-set:
qi::char_ - symbols
You could just as well just write
qi::char_ - qi::char_(";()+/*-")
Now, you might not know this, but you can use ~charset to negate a set, so it would just become
~qi::char_(";()+/*-")
NOTE: the '-' character can have special meaning in charsets, which is why I very subtly moved it to the end. See the docs.
Live Demo
Extending a little and showing some subtler patterns:
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
using namespace boost::spirit;
int main() {
string const str = "string;some(thing) else + http://me#host:*/path-element.php";
auto cs = ";()+/*-";
using qi::char_;
{
std::vector<std::string> tokens;
qi::parse(str.begin(), str.end(), +~char_(cs) % +char_(cs), tokens);
std::cout << "Condensing: ";
for (auto& tok : tokens) {
std::cout << " " << std::quoted(tok);
}
std::cout << std::endl;
}
{
std::vector<std::string> tokens;
qi::parse(str.begin(), str.end(), *~char_(cs) % char_(cs), tokens);
std::cout << "Not condensing: ";
for (auto& tok : tokens) {
std::cout << " " << std::quoted(tok);
}
std::cout << std::endl;
}
}
Prints
Condensing: "string" "some" "thing" " else " " http:" "me#host:" "path" "element.php"
Not condensing: "string" "some" "thing" " else " " http:" "" "me#host:" "" "path" "element.php"
X3
If you have C++14, you can use Spirit X3, which doesn't have the "auto problem" (because it doesn't have Proto expression trees that can get dangling references).
Your original code would have been fine in X3, and it will compile a lot faster.
Here's my example using X3:
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
namespace x3 = boost::spirit::x3;
int main() {
std::string const str = "string;some(thing) else + http://me#host:*/path-element.php";
auto const cs = x3::char_(";()+/*-");
std::vector<std::string> tokens;
x3::parse(str.begin(), str.end(), +~cs % +cs, tokens);
//x3::parse(str.begin(), str.end(), *~cs % cs, tokens);
for (auto& tok : tokens) {
std::cout << " " << std::quoted(tok);
}
}
Printing
"string" "some" "thing" " else " " http:" "me#host:" "path" "element.php"
The thing is that using ( and enclosing it in double quotes is not really safe: it might get interpreted as something else entirely. You might want to try using / to escape it instead.
I'm familiarizing myself with Boost Spirit X3. My question is how to state that you don't want to use a skip parser in any way.
Consider a simple example of parsing comma-separated sequence of integers:
#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/home/x3.hpp>
int main()
{
using namespace boost::spirit::x3;
const std::string input{"2,4,5"};
const auto parser = int_ % ',';
std::vector<int> numbers;
auto start = input.cbegin();
auto r = phrase_parse(start, input.end(), parser, space, numbers);
if(r && start == input.cend())
{
// success
for(const auto &item: numbers)
std::cout << item << std::endl;
return 0;
}
std::cerr << "Input was not parsed successfully" << std::endl;
return 1;
}
This works totally fine. However, I would like to forbid having spaces in between (i.e. "2, 4,5" should not be parsed well).
I tried using eps as a skip parser in phrase_parse, but as you can guess, the program ended up in an infinite loop because eps matches an empty string.
The solution I found is to use the no_skip directive (https://www.boost.org/doc/libs/1_75_0/libs/spirit/doc/html/spirit/qi/reference/directive/no_skip.html). So the parser now becomes:
const auto parser = no_skip[int_ % ','];
This works fine, but I don't find it to be an elegant solution (especially providing "space" parser in phrase_parse when I want no whitespace skips). Are there no skip parsers that would simply do nothing? Am I missing something?
Thanks for Your time. Looking forward to any replies.
You can use either no_skip[] or lexeme[]. They're almost identical; the difference is that lexeme[] pre-skips before switching the skipper off and no_skip[] does not (see: Boost Spirit lexeme vs no_skip).
Are there no skip parsers that would simply do nothing? Am I missing something?
A wild guess, but you might be missing the parse API that doesn't accept a skipper in the first place
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
int main() {
std::string const input{ "2,4,5" };
auto f = begin(input), l = end(input);
const auto parser = x3::int_ % ',';
std::vector<int> numbers;
auto r = parse(f, l, parser, numbers);
if (r) {
// success
for (const auto& item : numbers)
std::cout << item << std::endl;
} else {
std::cerr << "Input was not parsed successfully" << std::endl;
return 1;
}
if (f!=l) {
std::cout << "Remaining input " << std::quoted(std::string(f,l)) << "\n";
return 2;
}
}
Prints
2
4
5
So the objective is to not tolerate characters from 80h through FFh in the input string. I was under the impression that
using ascii::char_;
would take care of this. But as you can see in the example code it will happily print Parsing succeeded.
In the following Spirit mailing list post, Joel suggested letting parse fail on these non-ASCII characters. But I'm not sure whether he went ahead and did so.
[Spirit-general] ascii encoding assert on invalid input ...
Here is my example code:
#include <iostream>
#include <boost/spirit/home/x3.hpp>
namespace client::parser
{
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
using ascii::char_;
using ascii::space;
using x3::lexeme;
using x3::skip;
const auto quoted_string = lexeme[char_('"') >> *(char_ - '"') >> char_('"')];
const auto entry_point = skip(space) [ quoted_string ];
}
int main()
{
for(std::string const input : { "\"naughty \x80" "bla bla bla\"" }) {
std::string output;
if (parse(input.begin(), input.end(), client::parser::entry_point, output)) {
std::cout << "Parsing succeeded\n";
std::cout << "input: " << input << "\n";
std::cout << "output: " << output << "\n";
} else {
std::cout << "Parsing failed\n";
}
}
}
How can I change the example to make Spirit fail on this invalid input?
Furthermore, but very related, I would like to know how I should use the character parser that defines a char_set encoding. You know char_(charset) from X3 docs: Character Parsers develop branch.
The documentation falls well short of describing the basic functionality. Why can't the Boost top-level people force library authors to provide documentation at least on the level of cppreference.com?
Nothing bad about the docs here. It's just a library bug.
Where the code for any_char says:
template <typename Char, typename Context>
bool test(Char ch_, Context const&) const
{
return ((sizeof(Char) <= sizeof(char_type)) || encoding::ischar(ch_));
}
It should have said
template <typename Char, typename Context>
bool test(Char ch_, Context const&) const
{
return ((sizeof(Char) <= sizeof(char_type)) && encoding::ischar(ch_));
}
That makes your program behave as expected and required. That behaviour also matches the Qi behaviour:
#include <boost/spirit/include/qi.hpp>
int main() {
namespace qi = boost::spirit::qi;
char const* input = "\x80";
assert(!qi::parse(input, input+1, qi::ascii::char_));
}
Filed a bug here: https://github.com/boostorg/spirit/issues/520
You can achieve that by using the print parser:
#include <iostream>
#include <boost/spirit/home/x3.hpp>
namespace client::parser
{
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
using ascii::char_;
using ascii::print;
using ascii::space;
using x3::lexeme;
using x3::skip;
const auto quoted_string = lexeme[char_('"') >> *(print - '"') >> char_('"')];
const auto entry_point = skip(space) [ quoted_string ];
}
int main()
{
for(std::string const input : { "\"naughty \x80\"", "\"bla bla bla\"" }) {
std::string output;
std::cout << "input: " << input << "\n";
if (parse(input.begin(), input.end(), client::parser::entry_point, output)) {
std::cout << "output: " << output << "\n";
std::cout << "Parsing succeeded\n";
} else {
std::cout << "Parsing failed\n";
}
}
}
Output:
input: "naughty �"
Parsing failed
input: "bla bla bla"
output: "bla bla bla"
Parsing succeeded
https://wandbox.org/permlink/HSoB8uqMC3WME5yI
Surprisingly, the check for char_ is only performed when sizeof(iterator char type) > sizeof(char):
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <string>
#include <boost/core/demangle.hpp>
#include <typeinfo>
namespace x3 = boost::spirit::x3;
template <typename Char>
void test(Char const* str)
{
std::basic_string<Char> s = str;
std::cout << boost::core::demangle(typeid(Char).name()) << ":\t";
Char c;
auto it = s.begin();
if (x3::parse(it, s.end(), x3::ascii::char_, c) && it == s.end())
std::cout << "OK: " << int(c) << "\n";
else
std::cout << "Failed\n";
}
int main()
{
test("\x80");
test(L"\x80");
test(u8"\x80");
test(u"\x80");
test(U"\x80");
}
Output:
char: OK: -128
wchar_t: Failed
char8_t: OK: 128
char16_t: Failed
char32_t: Failed
https://wandbox.org/permlink/j9PQeRVnGZQeELFA
I have a simple text file with the following contents:
VALUE "foo"
ANOTHERVALUE "bar"
YETANOTHERVALUE "barbar"
The values in column 1 are known.
I want to capture everything in column 1 and in column 2.
My solution involves manually writing all the possible values of column 1 (which are known) into the regex string. Obviously this is not ideal: I am basically repeating code, and it does not allow the ordering to be flexible:
const char* re =
"^[[:space:]]*"
"(VALUE)[[:space:]]*\"(.*)\"[[:space:]]*"
"(ANOTHERVALUE)[[:space:]]*\"(.*)\"[[:space:]]*"
"(YETANOTHERVALUE)[[:space:]]*\"(.*)\"[[:space:]]*";
I'm citing commenter Igor Tandetnik here, because he almost gave the complete answer in his comment:
Regular expressions capture exactly as many substrings as there are
left parentheses in the expression [...]
The right way to solve this problem is to write a regex that matches a
single pair, [...]
\s*([a-zA-Z]+)\s*"(.*?)"
Notes:
\s is equivalent to [[:space:]]
.*? is used to stop searching after the 2nd " instead of the last " in the string
and apply it repeatedly, e.g. via std::regex_iterator
The boost equivalent is boost::regex_iterator.
#include <iostream>
#include <string>
#include <algorithm>
#include <boost/regex.hpp>
const boost::regex expr{ R"__(\s*([a-zA-Z]+)\s*"(.*?)")__" };
const std::string s =
R"(VALUE "foo"
ANOTHERVALUE "bar"
YETANOTHERVALUE "barbar"
)";
int main() {
boost::sregex_iterator it{ begin(s), end(s), expr }, itEnd;
std::for_each( it, itEnd, []( const boost::smatch& m ){
std::cout << m[1] << '\n' << m[2] << std::endl;
});
}
Notes:
I'm using raw string literals to make the code cleaner.
I would use a little Spirit Parser here:
Reading Into A Map
#include <boost/fusion/adapted/std_pair.hpp> // reading maps
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <iostream>
#include <fstream>
#include <map>
auto read_config_map(std::istream& stream) {
std::map<std::string, std::string> settings;
boost::spirit::istream_iterator f(stream >> std::noskipws), l;
using namespace boost::spirit::x3;
auto key_ = lexeme [ +upper ];
auto value_ = lexeme [ '"' >> *~char_('"') >> '"' ];
if (!phrase_parse(f, l, -(key_ >> value_) % eol >> eoi, blank, settings))
throw std::invalid_argument("cannot parse config map");
return settings;
}
auto read_config_map(std::string const& fname) {
std::ifstream stream(fname);
return read_config_map(stream);
}
int main() {
for (auto&& entry : read_config_map(std::cin))
std::cout << "Key:'" << entry.first << "' Value:'" << entry.second << "'\n";
}
Prints:
Key:'ANOTHERVALUE' Value:'bar'
Key:'VALUE' Value:'foo'
Key:'YETANOTHERVALUE' Value:'barbar'
How can I use Boost.Spirit X3 to parse into structs like:
struct person{
std::string name;
std::vector<std::string> friends;
};
Coming from Boost.Spirit v2 I would use a grammar, but since X3 doesn't support grammars I have no idea how to do this cleanly.
EDIT: It would be nice if someone could help me write a parser that parses a list of strings and returns a person, where the first string is the name and the rest of the strings go into the friends vector.
Parsing with x3 is much simpler than it was with v2, so you shouldn't have too much trouble moving over. Grammars being gone is a good thing!
Here's how you can parse into a vector of strings:
//#define BOOST_SPIRIT_X3_DEBUG
#include <fstream>
#include <iostream>
#include <string>
#include <type_traits>
#include <vector>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
namespace x3 = boost::spirit::x3;
struct person
{
std::string name;
std::vector<std::string> friends;
};
BOOST_FUSION_ADAPT_STRUCT(
person,
(std::string, name)
(std::vector<std::string>, friends)
);
auto const name = x3::rule<struct name_class, std::string> { "name" }
= x3::raw[x3::lexeme[x3::alpha >> *x3::alnum]];
auto const root = x3::rule<struct person_class, person> { "person" }
= name >> *name;
int main(int, char**)
{
std::string const input = "bob john ellie";
auto it = input.begin();
auto end = input.end();
person p;
if (phrase_parse(it, end, root >> x3::eoi, x3::space, p))
{
std::cout << "parse succeeded" << std::endl;
std::cout << p.name << " has " << p.friends.size() << " friends." << std::endl;
}
else
{
std::cout << "parse failed" << std::endl;
if (it != end)
std::cout << "remaining: " << std::string(it, end) << std::endl;
}
return 0;
}
As you can see on Coliru, the output is:
parse succeeded
bob has 2 friends.
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>
#include <iostream>
using namespace boost::spirit;
int main()
{
std::string s;
std::getline(std::cin, s);
auto specialtxt = *(qi::char_('-', '.', '_'));
auto txt = no_skip[*(qi::char_("a-zA-Z0-9_.\\:$\'-"))];
auto anytxt = *(qi::char_("a-zA-Z0-9_.\\:${}[]+/()-"));
qi::rule <std::string::iterator, void(),ascii::space_type> rule2 = txt ('=') >> ('[') >> (']');
auto begin = s.begin();
auto end = s.end();
if (qi::phrase_parse(begin, end, rule2, ascii::space))
{
std::cout << "MATCH" << std::endl;
}
else
{
std::cout << "NO MATCH" << std::endl;
}
}
This code works fine in debug mode, but the parser fails in release mode.
The rule should parse just text=[]; anything else should fail. In debug mode this works, but in release mode the result is NO MATCH for any string.
If I enter a string like
abc=[];
it passes in debug as expected but fails in release.
You can't use auto with Spirit v2:
Assigning parsers to auto variables
You have Undefined Behaviour
DEMO
I tried to make (more) sense of the rest of the code. There were various instances that would never work:
txt('=') is an invalid Qi expression. I assumed you wanted txt >> ('=') instead
qi::char_("a-zA-Z0-9_.\\:$\\-{}[]+/()") doesn't do what you think because $-{ is actually the character "range" \x24-\x7b... Escape the - (or put it at the very end/start of the set like in the other char_ call).
qi::char_('-','.','_') can't work. Did you mean qi::char_("-._")?
specialtxt and anytxt were unused...
prefer const_iterator
prefer namespace aliases over using namespace to prevent hard-to-detect errors
#include <boost/spirit/include/qi.hpp>
#include <iostream>
namespace qi = boost::spirit::qi;
int main() {
std::string const s = "abc=[];";
auto specialtxt = qi::copy(*(qi::char_("-._")));
auto anytxt = qi::copy(*(qi::char_("a-zA-Z0-9_.\\:$\\-{}[]+/()")));
(void) specialtxt;
(void) anytxt;
auto txt = qi::copy(qi::no_skip[*(qi::char_("a-zA-Z0-9_.\\:$\'-"))]);
qi::rule<std::string::const_iterator, qi::space_type> rule2 = txt >> '=' >> '[' >> ']';
auto begin = s.begin();
auto end = s.end();
if (qi::phrase_parse(begin, end, rule2, qi::space)) {
std::cout << "MATCH" << std::endl;
} else {
std::cout << "NO MATCH" << std::endl;
}
if (begin != end) {
std::cout << "Trailing unparsed: '" << std::string(begin, end) << "'\n";
}
}
Printing
MATCH
Trailing unparsed: ';'