Understanding of spirit grammar - c++

While going through the documentation I read that, for a string of doubles separated by commas, we could write either of these (which I understand):
double_ >> *(',' >> double_) or double_ % ','
But what does the following expression mean? It's supposed to split comma-separated strings into a vector, and it works. I would appreciate it if someone could clarify it. I am confused by the - operator: I believe it's a difference operator, but I can't figure out its role here.
*(qi::char_ - ',') % ','

*(char_ - ',') means "match zero or more characters except ','", and it can also be written as *~char_(","). By contrast, *char_ just means "match zero or more characters".
To understand why the exclusion is needed, try it with and without:
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>
int main()
{
    using namespace boost::spirit::qi;
    std::vector<std::string> out1, out2;
    std::string s = "str1, str2, str3";
    // No skipper is used, so the space after each comma stays in the result.
    bool b = parse(s.begin(), s.end(), *~char_(",") % ",", out1); // out1: ["str1", " str2", " str3"]
    // *char_ consumes the commas too, so the list has a single element.
    b = parse(s.begin(), s.end(), *char_ % ",", out2); // out2: ["str1, str2, str3"]
}

qi::char_ - ',' matches any character except ',', which prevents the inner expression from being too greedy.
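To make the greediness concrete, here is a minimal hand-written model of the two expressions in plain C++. It does not use Spirit at all: parse_list and its exclude_comma flag are names invented for this sketch, and it only mimics how the Kleene star and the list operator interact.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hand-written model (not Spirit) of `element % ','`, where the element is
// `*(char_ - ',')` when exclude_comma is true and `*char_` when it is false.
std::vector<std::string> parse_list(const std::string& in, bool exclude_comma) {
    std::vector<std::string> out;
    std::size_t pos = 0;
    for (;;) {
        std::string item;
        // Kleene star: greedily take every character the element accepts.
        while (pos < in.size() && !(exclude_comma && in[pos] == ',')) {
            item += in[pos++];
        }
        out.push_back(item);
        // List operator: continue only if a delimiter comes next.
        if (pos < in.size() && in[pos] == ',') {
            ++pos;
        } else {
            break;
        }
    }
    // The element may match nothing, so there is always at least one item.
    assert(!out.empty());
    return out;
}
```

Calling parse_list("str1, str2, str3", true) splits at each comma, while passing false lets the first element swallow the whole input, so the delimiter never gets a chance to match.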

You really need to read the EBNF standard to understand Boost.Spirit.

Related

Starting with Spirit X3

I've just started using Spirit X3 and I have a little question related with my first test. Do you know why this function is returning "false"?
bool parse()
{
std::string rc = "a 6 literal 8";
auto iter_begin = rc.begin();
auto iter_end = rc.end();
bool bOK= phrase_parse( iter_begin, iter_end,
// ----- start parser -----
alpha >> *alnum >> "literal" >> *alnum
// ----- end parser -----
, space);
return bOK && iter_begin == iter_end;
}
I've seen the problem is related with how I write the grammar. If I replace it with this one, it returns "true"
alpha >> -alnum >> "literal" >> *alnum
I'm using the Spirit version included in Boost 1.61.0.
Thanks in advance,
Sen
Your problem is a combination of the greediness of operator * and the use of a skipper. Keep in mind that alnum is a PrimitiveParser, which means Spirit pre-skips every time this parser is tried. So your parser behaves like this:
alpha parses a.
The Kleene operator starts.
alnum skips the space and then parses 6.
alnum skips the space and then parses l.
alnum parses i.
...
alnum parses l.
alnum skips the space and then parses 8.
alnum tries and fails to parse more. This completes the Kleene operator with a parsed attribute of 6literal8.
"literal" tries and fails to parse.
The sequence operator fails and the invocation of phrase_parse returns false.
You can easily avoid this problem using the lexeme directive (barebones x3 docs, qi docs). Something like this should work:
alpha >> lexeme[*alnum] >> "literal" >> lexeme[*alnum];
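The trace above can be reproduced with a small hand-written model in plain C++. This is only a sketch of the pre-skip behaviour, not Spirit code: skip_spaces, star_alnum and parse_line are hypothetical helpers, and the model assumes a whitespace skipper that also runs once after the last parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Model of the skipper: advance past any whitespace.
void skip_spaces(const std::string& in, std::size_t& pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
}

// Model of `*alnum` (pre_skip_each == true) or `lexeme[*alnum]` (false).
std::string star_alnum(const std::string& in, std::size_t& pos, bool pre_skip_each) {
    std::string attr;
    skip_spaces(in, pos);  // both forms pre-skip once before they start
    while (pos < in.size() && std::isalnum(static_cast<unsigned char>(in[pos]))) {
        attr += in[pos++];
        if (pre_skip_each) skip_spaces(in, pos);  // skipper runs before every alnum
    }
    return attr;
}

// Model of `alpha >> X >> "literal" >> X` under a space skipper, where X is
// `lexeme[*alnum]` when use_lexeme is true and plain `*alnum` otherwise.
bool parse_line(const std::string& in, bool use_lexeme) {
    const std::string lit = "literal";
    std::size_t pos = 0;
    skip_spaces(in, pos);
    if (pos >= in.size() || !std::isalpha(static_cast<unsigned char>(in[pos]))) return false;
    ++pos;
    star_alnum(in, pos, !use_lexeme);
    skip_spaces(in, pos);
    assert(pos <= in.size());
    if (in.compare(pos, lit.size(), lit) != 0) return false;
    pos += lit.size();
    star_alnum(in, pos, !use_lexeme);
    skip_spaces(in, pos);
    return pos == in.size();
}
```

With the input "a 6 literal 8", the non-lexeme variant lets the first star run through the spaces and eat "literal" as well, so the literal has nothing left to match. The lexeme variant stops at the first space and the parse succeeds.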

Parsing a list of strings followed by a list of strings with spirit x3

I am trying to parse a string into a struct using boost spirit x3:
struct identifier {
std::vector<std::string> namespaces;
std::vector<std::string> classes;
std::string identifier;
};
Now I have a parser rule to match strings like these:
foo::bar::baz.bla.blub
foo.bar
boo::bar
foo
My parser rule looks like this:
auto const nested_identifier_def =
x3::lexeme[
-(id_string % "::")
>> -(id_string % ".")
>> id_string
];
where id_string parses combinations of alphanumeric characters.
I know this rule doesn't parse the way I want, because when parsing foo.bar, for example, the part -(id_string % ".") consumes the whole string.
How can I change the rule so that it parses correctly into the struct?
Assuming your id_string is something like this:
auto const id_string = x3::rule<struct id_string_tag, std::string>{} =
x3::lexeme[
(x3::alpha | '_')
>> *(x3::alnum | '_')
];
then I think this is what you're after:
auto const nested_identifier_def =
*(id_string >> "::")
>> *(id_string >> '.')
>> id_string;
The issue is that p % delimit is shorthand for p >> *(delimit >> p), i.e. it always consumes one p after the delimiter. However what you want is *(p >> delimit) so that no p is consumed after the delimiter and is instead left for the next rule.
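As a rough illustration of why *(p >> delimit) leaves the last p for the next parser, here is a hand-written model in plain C++ rather than X3. NestedId, parse_id, star_id_then and parse_nested are names made up for this sketch, and the id_string model follows the assumed definition given earlier in this answer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the struct in the question.
struct NestedId {
    std::vector<std::string> namespaces;
    std::vector<std::string> classes;
    std::string name;
};

// Model of id_string: (alpha | '_') >> *(alnum | '_').
bool parse_id(const std::string& in, std::size_t& pos, std::string& out) {
    auto is_start = [](char c) { return std::isalpha(static_cast<unsigned char>(c)) || c == '_'; };
    auto is_rest = [](char c) { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; };
    if (pos >= in.size() || !is_start(in[pos])) return false;
    out.clear();
    while (pos < in.size() && is_rest(in[pos])) out += in[pos++];
    return true;
}

// Model of *(id_string >> delim): an id counts only if delim follows it.
void star_id_then(const std::string& in, std::size_t& pos,
                  const std::string& delim, std::vector<std::string>& out) {
    assert(!delim.empty());
    for (;;) {
        std::size_t save = pos;
        std::string id;
        if (parse_id(in, pos, id) && in.compare(pos, delim.size(), delim) == 0) {
            out.push_back(id);
            pos += delim.size();
        } else {
            pos = save;  // backtrack: leave this id for the next parser
            return;
        }
    }
}

// Model of *(id >> "::") >> *(id >> '.') >> id, requiring a full match.
bool parse_nested(const std::string& in, NestedId& out) {
    std::size_t pos = 0;
    star_id_then(in, pos, "::", out.namespaces);
    star_id_then(in, pos, ".", out.classes);
    return parse_id(in, pos, out.name) && pos == in.size();
}
```

For foo::bar::baz.bla.blub the first loop stops at baz because no "::" follows it, the second loop stops at blub because no '.' follows it, and blub is left over for the final identifier.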

how to use list syntax with defaults spirit

I am attempting to parse comma-separated integers, with possible blanks. For instance, 1,2,,3,,-1 should be parsed as {1,2,n,3,n,-1}, where n is some constant.
The expression,
(int_ | eps) % ','
works when n == 0. More specifically, the following code works special cased for 0:
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
int main() {
    using namespace boost::spirit::qi;
    std::vector<int> v;
    std::string s("1,2,,3,4,,-1");
    phrase_parse(s.begin(), s.end(),
        (int_|eps) % ','
        , space, v);
}
I tried the following expression for arbitrary n:
(int_ | eps[_val = 3]) % ','
But apparently this is wrong. The compiler generates an error novel. I refrain from pasting all that here, as most likely what I am trying is incorrect (rather than specific compiler issues).
What would be the right way?
Nick
The attr() parser exists for this purpose:
(int_ | attr(3)) % ','
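The effect of the alternative can be sketched without Spirit: when the integer parser fails, the second branch consumes no input and supplies a fixed value. The code below is a hand-written model in plain C++, with parse_int and parse_with_default as hypothetical helper names; it ignores integer overflow for brevity.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Model of int_: optional sign followed by at least one digit.
// Overflow is not checked in this sketch.
bool parse_int(const std::string& in, std::size_t& pos, int& value) {
    std::size_t p = pos;
    bool negative = false;
    if (p < in.size() && (in[p] == '+' || in[p] == '-')) negative = (in[p++] == '-');
    if (p >= in.size() || !std::isdigit(static_cast<unsigned char>(in[p]))) return false;
    int result = 0;
    while (p < in.size() && std::isdigit(static_cast<unsigned char>(in[p]))) {
        result = result * 10 + (in[p++] - '0');
    }
    value = negative ? -result : result;
    pos = p;  // commit the position only on success
    return true;
}

// Model of (int_ | attr(fallback)) % ',' with whitespace skipped.
std::vector<int> parse_with_default(const std::string& in, int fallback) {
    std::vector<int> out;
    std::size_t pos = 0;
    auto skip = [&] {
        while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
    };
    for (;;) {
        skip();
        int value = fallback;       // second branch: consumes nothing, yields fallback
        parse_int(in, pos, value);  // first branch overwrites it when it matches
        out.push_back(value);
        skip();
        if (pos < in.size() && in[pos] == ',') ++pos; else break;
    }
    assert(!out.empty());
    return out;
}
```

Calling parse_with_default("1,2,,3,,-1", 99) fills each blank field with 99 and keeps the parsed integers elsewhere.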

Parsing a string (with spaces) but ignoring the spaces at the end (Spirit)

I have an input string I'm trying to parse. It might look like any of the following:
sys(error1, 2.3%)
sys(error2 , 2.4%)
sys(this error , 3%)
Note the space that sometimes appears before the comma. In my grammar (Boost Spirit library) I'd like to capture "error1", "error2", and "this error" respectively.
Here is the original grammar I had to capture this - which absorbed the space at the end of the name:
name_string %= lexeme[+(char_ - ',' - '"')];
name_string.name("Systematic Error Name");
start = (lit("sys")|lit("usys")) > '('
> name_string[boost::phoenix::bind(&ErrorValue::SetName, _val, _1)] > ','
> errParser[boost::phoenix::bind(&ErrorValue::CopyErrorAndRelative, _val, _1)]
> ')';
My attempt to fix this was first:
name_string %= lexeme[*(char_ - ',' - '"') > (char_ - ',' - '"' - ' ')];
However, that completely failed. It looks like it fails to parse anything with a space in the middle.
I'm fairly new with Spirit - so perhaps I'm missing something simple. Looks like lexeme turns off skipping on the leading edge - I need something that does it on the leading and trailing edge.
Thanks in advance for any help!
Thanks to psur below, I was able to put together an answer. It isn't perfect (see below), but I thought I would update the post for everyone to see it in context and nicely formatted:
qi::rule<Iterator, std::string(), ascii::space_type> name_word;
qi::rule<Iterator, std::string(), ascii::space_type> name_string;
ErrorValueParser<Iterator> errParser;
name_word %= +(qi::char_("_a-zA-Z0-9+"));
//name_string %= lexeme[name_word >> *(qi::hold[+(qi::char_(' ')) >> name_word])];
name_string %= lexeme[+(qi::char_("-_a-zA-Z0-9+")) >> *(qi::hold[+(qi::char_(' ')) >> +(qi::char_("-_a-zA-Z0-9+"))])];
start = (
lit("sys")[bind(&ErrorValue::MakeCorrelated, _val)]
|lit("usys")[bind(&ErrorValue::MakeUncorrelated, _val)]
)
>> '('
>> name_string[bind(&ErrorValue::SetName, _val, _1)] >> *qi::lit(' ')
>> ','
>> errParser[bind(&ErrorValue::CopyErrorAndRelative, _val, _1)]
>> ')';
This works! The key is name_string, and within it qi::hold, a directive I was not familiar with before. It acts almost like a sub-rule: everything inside qi::hold[...] must parse successfully, or none of it is kept. So, above, a space is only accepted after a word if another word follows. The result is that if a sequence of words ends in one or more spaces, those trailing spaces are not included in the name. They can be absorbed by the *qi::lit(' ') that follows (see the start rule).
There are two things I'd like to figure out how to improve here:
It would be nice to put the actual string parsing into name_word. The problem is the declaration of name_word - it fails when it is put in the appropriate spot in the definition of name_string.
It would be even better if name_string could include the parsing of the trailing spaces, though its return value did not. I think I know how to do that...
When/if I figure these out I will update this post. Thanks for the help!
The rules below should work for you:
name_word %= +(qi::char_("_a-zA-Z0-9"));
start %= qi::lit("sys(")
>> qi::lexeme[ name_word >> *(qi::hold[ +(qi::char_(' ')) >> name_word ]) ]
>> *qi::lit(' ')
>> qi::lit(',')
// ...
name_word parses only a single word of the name; I assumed that it contains only letters, digits and underscores.
In the start rule, qi::hold is important. It parses the spaces only if a name_word follows. Otherwise the parser rolls back and moves on to *qi::lit(' ') and then to the comma.
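The "spaces only if another word follows" behaviour can be sketched by hand in plain C++. This is a model of the idea, not Spirit code: parse_word and parse_name are hypothetical helpers, and the word character set is the letters, digits and underscore assumed above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Model of name_word: one or more of [_a-zA-Z0-9]. Returns "" on failure.
std::string parse_word(const std::string& in, std::size_t& pos) {
    std::string word;
    while (pos < in.size() &&
           (std::isalnum(static_cast<unsigned char>(in[pos])) || in[pos] == '_')) {
        word += in[pos++];
    }
    return word;
}

// Model of lexeme[name_word >> *(hold[+char_(' ') >> name_word])].
bool parse_name(const std::string& in, std::size_t& pos, std::string& name) {
    name = parse_word(in, pos);
    if (name.empty()) return false;
    for (;;) {
        std::size_t save = pos;
        std::string spaces;
        while (pos < in.size() && in[pos] == ' ') spaces += in[pos++];
        std::string word = spaces.empty() ? std::string() : parse_word(in, pos);
        if (word.empty()) {
            pos = save;  // nothing committed: the trailing spaces stay unconsumed
            return true;
        }
        name += spaces + word;  // spaces are kept only when a word follows
    }
}
```

Starting at the beginning of "this error , 3%", the model yields the name "this error" and stops just before the trailing space, which is then left for whatever parser comes next.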

Parsing string, with Boost Spirit 2, to fill data in user defined struct

I'm using the Boost.Spirit that was distributed with Boost 1.42.0, with VS2005. My problem is this.
I have a string that is delimited with commas. The first 3 fields are strings and the rest are numbers, like this:
String1,String2,String3,12.0,12.1,13.0,13.1,12.4
My rule is like this
qi::rule<string::iterator, qi::skip_type> stringrule = *(char_ - ',');
qi::rule<string::iterator, qi::skip_type> myrule= repeat(3)[*(char_ - ',') >> ','] >> (double_ % ',') ;
I'm trying to store the data in a structure like this.
struct MyStruct
{
vector<string> stringVector ;
vector<double> doubleVector ;
} ;
MyStruct var ;
I've wrapped it in BOOST_FUSION_ADAPT_STRUCT to use it with Spirit.
BOOST_FUSION_ADAPT_STRUCT (MyStruct, (vector<string>, stringVector) (vector<double>, doubleVector))
My parse function parses the line and returns true and after
qi::phrase_parse (iterBegin, iterEnd, myrule, boost::spirit::ascii::space, var) ;
I'm expecting var.stringVector and var.doubleVector are properly filled. but it is not the case.
What is going wrong ?
The code sample is located here
Thanks in advance,
Surya
qi::skip_type is not something you can use as a skipper. It is the type of the placeholder qi::skip, which applies only to the skip[] directive (to enable skipping inside a lexeme[] or to change the skipper in use) and which is not a parser component that matches input on its own. You need to specify your actual skipper type instead (in your case, boost::spirit::ascii::space_type).
Moreover, in order for your rules to return the parsed attribute, you need to specify the type of the expected attribute while defining your rule. That leaves you with:
qi::rule<string::iterator, std::string(), ascii::space_type>
stringrule = *(char_ - ',');
qi::rule<string::iterator, MyStruct(), ascii::space_type>
myrule = repeat(3)[*(char_ - ',') >> ','] >> (double_ % ',');
which should do exactly what you expect.