how does spirit process rules? - c++

This standalone grammar rule produces the expected result:
term = ( double_ >> "+" >> term ) [_val = _1 + _2]|
( double_ >> "-" >> term ) [_val = _1 - _2]|
( double_ >> "*" >> term ) [_val = _1 * _2]|
( double_ >> "/" >> term ) [_val = _1 / _2]|
double_ [_val = _1] ;
while this one does not:
term = ( term >> "+" >> term ) [_val = _1 + _2]|
( term >> "-" >> term ) [_val = _1 - _2]|
( term >> "*" >> term ) [_val = _1 * _2]|
( term >> "/" >> term ) [_val = _1 / _2]|
double_ [_val = _1] ;
I guess this has something to do with recursion or ambiguity. What does the second rule try to do when fed "1+2+3"?
Is there a good document that schematically explains how Spirit parsing is performed? I mean as plain C or as an algorithm, without templates or classes.
EDIT:
Actually, I think the second rule should fail at compile time, since it is ambiguous.

Spirit is a PEG parser:
Parsing expression grammar (Wikipedia)
See also the introduction on the About page of the http://boost-spirit.com site
Parsing Expression Grammar, in the Abstracts section of the Spirit documentation
Relevant quote:
Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG
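So, no, the second example is not ambiguous at all; it is left-recursive. To match term, the second rule first tries to match term again at the same input position, without consuming any input, so the recursion never ends and the program overflows the stack at run time (the compiler does not detect this). The first example works because every alternative consumes a double_ before recursing into term.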
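As a rough illustration of the answer, here is the first grammar hand-translated into plain C++ recursive descent, with no Spirit templates. It is a sketch under simplifying assumptions: `std::strtod` stands in for `double_`, there is no skipper, and the names `Input`, `parse_double`, `parse_char` and `parse_term` are made up for this example. It shows the two PEG properties that matter here: alternatives are tried in order with backtracking, and the recursive call only happens after some input has been consumed.

```cpp
#include <cstddef>
#include <cstdlib>
#include <initializer_list>
#include <optional>
#include <string>

// Plain recursive-descent sketch of the FIRST (working) grammar:
//   term = double_ >> '+' >> term | double_ >> '-' >> term
//        | double_ >> '*' >> term | double_ >> '/' >> term | double_
// PEG semantics: alternatives are tried in order, the first match wins,
// and a failed alternative rewinds the input position (backtracking).
struct Input {
    const std::string& s;
    std::size_t pos;
};

// Rough stand-in for double_ (strtod also accepts a leading sign).
std::optional<double> parse_double(Input& in) {
    const char* begin = in.s.c_str() + in.pos;
    char* end = nullptr;
    const double v = std::strtod(begin, &end);
    if (end == begin) return std::nullopt;
    in.pos += static_cast<std::size_t>(end - begin);
    return v;
}

bool parse_char(Input& in, char c) {
    if (in.pos < in.s.size() && in.s[in.pos] == c) { ++in.pos; return true; }
    return false;
}

std::optional<double> parse_term(Input& in) {
    const std::size_t start = in.pos;
    for (char op : {'+', '-', '*', '/'}) {
        in.pos = start;                   // every alternative starts from the same spot
        auto lhs = parse_double(in);      // consumes input BEFORE recursing...
        if (!lhs || !parse_char(in, op)) continue;
        auto rhs = parse_term(in);        // ...so this recursion always makes progress
        if (!rhs) continue;
        switch (op) {
            case '+': return *lhs + *rhs;
            case '-': return *lhs - *rhs;
            case '*': return *lhs * *rhs;
            default:  return *lhs / *rhs;
        }
    }
    in.pos = start;
    return parse_double(in);              // last alternative: a bare double_
}

// The SECOND grammar, term = term >> '+' >> term | ..., would translate to
//     std::optional<double> parse_term(Input& in) { auto lhs = parse_term(in); ... }
// i.e. the function calls itself at the same position before consuming anything.
// Nothing stops that recursion, so it overflows the stack at run time.
```

With this sketch, "1+2+3" evaluates as 1+(2+3) = 6, and "1-2-3" as 1-(2-3) = 2: the first grammar is right-recursive and therefore right-associative.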

Related

Boost::Spirit placeholders and alternative parser

// 1
Mexpression = Mterm >> *(
'+' >> Mterm [qi::_val = phoenix::new_<BinaryNode>(_1, '+', _2)]
| '-' >> Mterm [qi::_val = phoenix::new_<BinaryNode>(_1, '-', _2)]
);
Mterm = Mfactor >> *(
'*' >> Mfactor [qi::_val = phoenix::new_<BinaryNode>(_1, '*', _2)]
| '/' >> Mfactor [qi::_val = phoenix::new_<BinaryNode>(_1, '/', _2)]
);
Mfactor = Unpack
| '+' >> Mfactor [qi::_val = phoenix::new_<UnaryNode>('+', _1)]
| '-' >> Mfactor [qi::_val = phoenix::new_<UnaryNode>('-', _1)]
| '(' >> Mexpression >> ')';
Error 2 error C2664: 'BinaryNode::BinaryNode(const BinaryNode &)' : cannot convert argument 3 from 'boost::mpl::void_' to 'anExpression *' c:\boost_1_55_0\boost\spirit\home\phoenix\object\detail\new_eval.hpp 41 1 ConsoleApplication1
Error 1 error C2338: index_is_out_of_bounds c:\boost_1_55_0\boost\spirit\home\support\argument.hpp 103 1 ConsoleApplication1
And
c:\boost_1_55_0\boost\spirit\home\support\argument.hpp(166) : see reference to class template instantiation 'boost::spirit::result_of::get_arg<boost::fusion::vector1<Attribute &>,1>' being compiled with
[
Attribute=anExpression *
]
I'm coding a translator for a model language (the task provides several EBNF grammars describing the main constructs), and I'm stuck on the arithmetic operations.
(See // 1 in the paste.)
Here's a model for parsing math expressions.
Unpack is some node: something that can be executed, converted to anExpression *, and passed as an argument to BinaryNode.
These are the rules:
qi::rule<Iterator, anExpression *()> Unpack;
qi::rule<Iterator, anExpression *()> Mexpression, Mterm, Mfactor;
anExpression is an abstract class (BinaryNode and UnaryNode derive publicly from anExpression).
When compiling the whole program I get the following errors (fig. 2, quoted above).
I think error 2 is the most important one to fix first.
The build log shows something like the output quoted above (fig. 3).
Okay, I think the mistake is in how I write the semantic actions: I don't think the _2 placeholder refers to Mterm (or Mfactor). Something is wrong with the way I combine semantic actions and the alternative parser ('|').
I'll be glad to hear any ideas from you guys =)
Mexpression = Mterm >> *(
'+' >> Mterm [qi::_val = phoenix::new_<BinaryNode>(_1, '+', _2)]
| '-' >> Mterm [qi::_val = phoenix::new_<BinaryNode>(_1, '-', _2)]
);
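This can't work indeed. The semantic action is attached only to the right-hand Mterm, so its _1 is that Mterm's attribute and there is no _2 at all (hence index_is_out_of_bounds). You'd need to store the attribute of the left-hand Mterm temporarily. Luckily, you should be able to use the rule's own attribute, _val, for that: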
Mexpression = Mterm [qi::_val = qi::_1] >> *(
'+' >> Mterm [qi::_val = phoenix::new_<BinaryNode>(qi::_val, '+', qi::_1)]
| '-' >> Mterm [qi::_val = phoenix::new_<BinaryNode>(qi::_val, '-', qi::_1)]
);
However, you may have to accommodate this in the constructor of your BinaryNode type.
That being said:
I tend to avoid semantic actions. As long as you're imperatively spelling out each and every step in the semantic actions, what is the real benefit of using a parser generator like Spirit? It's no longer Rapid Application Development.
I especially avoid semantic actions that do dynamic allocations; it's very easy to create leaks in the presence of parser backtracking.
That said, this kind of expression combination is about the only point where I think semantic actions can still be considered idiomatic in Spirit. The dynamic allocations most certainly are not.
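The `_val` accumulation above is a left fold. The following plain C++ sketch (not Spirit) spells it out: the first operand becomes the running value, and each further operator wraps the running value and the next operand in a new `BinaryNode`. The AST types are hypothetical stand-ins for the question's classes; they use `std::unique_ptr`, which is one way to sidestep the leak risk mentioned above, and `Mfactor` is reduced to integer literals to keep it short.

```cpp
#include <cctype>
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST modeled on the question's anExpression/BinaryNode, but owned
// through std::unique_ptr so a discarded subtree is freed automatically.
struct anExpression {
    virtual ~anExpression() = default;
    virtual double eval() const = 0;
};

struct Number : anExpression {
    double v;
    explicit Number(double v) : v(v) {}
    double eval() const override { return v; }
};

struct BinaryNode : anExpression {
    std::unique_ptr<anExpression> lhs;
    char op;
    std::unique_ptr<anExpression> rhs;
    BinaryNode(std::unique_ptr<anExpression> l, char o, std::unique_ptr<anExpression> r)
        : lhs(std::move(l)), op(o), rhs(std::move(r)) {}
    double eval() const override {
        const double a = lhs->eval(), b = rhs->eval();
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default:  return a / b;
        }
    }
};

using It = const char*;
using Node = std::unique_ptr<anExpression>;

// Simplification: Mfactor only reads a non-negative integer literal.
Node Mfactor(It& p, It end) {
    if (p == end || !std::isdigit(static_cast<unsigned char>(*p))) return nullptr;
    double v = 0;
    while (p != end && std::isdigit(static_cast<unsigned char>(*p))) v = v * 10 + (*p++ - '0');
    return std::make_unique<Number>(v);
}

// Left fold shared by both levels:
//   sub [_val = _1] >> *(op >> sub [_val = BinaryNode(_val, op, _1)])
template <typename Sub>
Node fold(It& p, It end, const char* ops, Sub sub) {
    Node val = sub(p, end);                       // _val = first operand
    if (!val) return nullptr;
    while (p != end && (*p == ops[0] || *p == ops[1])) {
        It q = p + 1;
        Node rhs = sub(q, end);
        if (!rhs) break;                          // operator without operand: stop before it
        val = std::make_unique<BinaryNode>(std::move(val), *p, std::move(rhs));
        p = q;
    }
    return val;
}

Node Mterm(It& p, It end) { return fold(p, end, "*/", Mfactor); }
Node Mexpression(It& p, It end) { return fold(p, end, "+-", Mterm); }
```

Because the running value is always the left operand, "1-2-3" builds ((1-2)-3) and evaluates to -4, while "2+3*4" still evaluates to 14 since Mterm binds tighter.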

Trying to understand Boost Qi parsing into structs

I've got an embarrassingly simple problem that I can't seem to wrap my head around. I'm reading the Boost documentation on how to parse into structs. The sample code provided for that chapter is straightforward, or so I thought. I would like to make a super simple change.
I want to split the start-rule:
start %=
lit("employee")
>> '{'
>> int_ >> ','
>> quoted_string >> ','
>> quoted_string >> ','
>> double_
>> '}'
;
...into two (or later more) rules, like this:
params %=
int_ >> ','
>> quoted_string >> ','
>> quoted_string >> ','
>> double_;
start %=
lit("employee")
>> '{'
>> params
>> '}'
;
No matter what I've tried I couldn't get it to parse values correctly into the employee struct. Even when I got a running program that recognized the input, the attributes didn't get written to the struct. It seems parsing only works correctly if everything is specified in the "top-level" rule. Surely, I'm mistaken?! I'll definitely need a more structured approach for the parser I actually need to implement.
Also I'm unclear what the correct type of the params rule should be. I'm thinking qi::rule<Iterator, fusion::vector<int, std::string, std::string, double>, ascii::space_type>, but my compiler didn't seem to like that very much...
I should mention that I'm working with Boost v1.46.1
In this situation, you could really just make params expose an employee attribute directly:
qi::rule<Iterator, employee(), ascii::space_type> params;

check doubled symbols with spirit::qi

I need to check text for doubled operator symbols. For example, "1+1*2" should be OK, but "1**2+3" or "--1+4*3" should not. Consider this part of the Spirit calc example:
expression =
term[_val=_1]
>> *( ('+' >> term[_val+=_1])
| ('-' >> term[_val-=_1])
);
term =
factor[_val=_1]
>> *( ('*' >> factor[_val*=_1])
| ('/' >> factor[_val/=_1])
);
factor =
double_[_val=_1]
| '(' >> expression[_val=_1] >> ')'
| ('-' >> factor[_val=_1])
| ('+' >> factor[_val=_1]);
phrase_parse returns true for expressions like "1+++1" or "1**-1". I tried to use repeat like this:
term =
factor[_val=_1]
>> *( (repeat(0)[char_('*')] >> factor[_val*=_1])
| ('/' >> factor[_val/=_1])
);
But it doesn't help. What am I missing?
Thanks.
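EDIT: Found an answer. phrase_parse returns true as soon as a prefix of the input matches (for "1**2+3", just "1"), so one should also compare the string iterators after phrase_parse instead of relying on phrase_parse's return value alone.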
In this case, "1+++++1" parses successfully because factor recursively accepts '+' signs (presumably intended as unary +/-).
Split that up:
factor = ('-' >> value[_val=-_1])
| ('+' >> value[_val= _1])
| value [_val = _1];
value = double_ | '(' >> expression >> ')';
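To see both fixes together without Spirit, here is a hand-written recursive-descent sketch of the split grammar plus the full-input check from the self-answer. Assumptions: numbers are unsigned integer literals, there is no whitespace skipping, and all function names are illustrative. The unsigned number matters: Spirit's double_ accepts its own leading sign, so with value = double_ | ... an input like "1+++1" would still parse as 1 + (+(+1)); in Spirit, an unsigned real parser such as qi::real_parser<double, qi::ureal_policies<double>> would correspond to this sketch.

```cpp
#include <cctype>
#include <string>

// Hand-written sketch of the split grammar (no whitespace skipping):
//   expression = term >> *('+' >> term | '-' >> term)
//   term       = factor >> *('*' >> factor | '/' >> factor)
//   factor     = '-' >> value | '+' >> value | value
//   value      = unsigned_number | '(' >> expression >> ')'
// Every function advances p only when it succeeds (PEG-style backtracking).
using It = const char*;

bool expression(It& p, It end, double& out);

// Assumption: unsigned integer literals, so a sign can only come from factor.
bool number(It& p, It end, double& out) {
    It q = p;
    if (q == end || !std::isdigit(static_cast<unsigned char>(*q))) return false;
    double v = 0;
    while (q != end && std::isdigit(static_cast<unsigned char>(*q))) v = v * 10 + (*q++ - '0');
    p = q;
    out = v;
    return true;
}

bool lit(It& p, It end, char c) {
    if (p == end || *p != c) return false;
    ++p;
    return true;
}

bool value(It& p, It end, double& out) {
    if (number(p, end, out)) return true;
    It q = p;
    if (lit(q, end, '(') && expression(q, end, out) && lit(q, end, ')')) { p = q; return true; }
    return false;
}

bool factor(It& p, It end, double& out) {
    It q = p;
    if (lit(q, end, '-') && value(q, end, out)) { out = -out; p = q; return true; }
    q = p;
    if (lit(q, end, '+') && value(q, end, out)) { p = q; return true; }
    return value(p, end, out);              // no sign at all
}

bool term(It& p, It end, double& out) {
    if (!factor(p, end, out)) return false;
    while (p != end && (*p == '*' || *p == '/')) {
        It q = p + 1;
        double rhs = 0;
        if (!factor(q, end, rhs)) break;    // e.g. "1**2": leave p before the first '*'
        out = (*p == '*') ? out * rhs : out / rhs;
        p = q;
    }
    return true;
}

bool expression(It& p, It end, double& out) {
    if (!term(p, end, out)) return false;
    while (p != end && (*p == '+' || *p == '-')) {
        It q = p + 1;
        double rhs = 0;
        if (!term(q, end, rhs)) break;
        out = (*p == '+') ? out + rhs : out - rhs;
        p = q;
    }
    return true;
}

// The self-answer's fix: success requires a match AND that all input was consumed.
bool parse_all(const std::string& s, double& out) {
    It p = s.data();
    const It end = s.data() + s.size();
    return expression(p, end, out) && p == end;
}
```

Note that "1**2+3" is only rejected by the final p == end check: the grammar itself happily matches the prefix "1".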

boost::spirit::qi and tab as delimiter

I'm new to Boost. I have a string delimited with tabs ('\t').
How can I parse it with boost::spirit?
(Parser code from Boost's samples.)
The sample code you posted isn't the same as the actual Boost sample, which was comma-delimited, so presumably those are your modifications?
The ascii::space skipper will treat the tabs as delimiters for you, so something like this:
start %=
lit("employee")
>> '{'
>> int_
>> quoted_string
>> quoted_string
>> double_
>> '}'
;
should work (without the lit('\t')). However, this will also accept other whitespace characters (e.g. spaces or newlines) between the terms.
If you actually need exactly one tab, and nothing else, between the terms, keep the lit('\t') and wrap the sequence in lexeme[] to disable skipping by the skip parser.

stopping parser when error found in semantic action

I wish to stop a token parser when the semantic action code finds a problem.
IF x > 10
is syntactically correct, but if x does not exist, the parser should stop.
The grammar rule and semantic action look like this:
condition
= ( tok.identifier >> tok.oper_ >> tok.value )
[
boost::phoenix::bind( &cRuleKit::AddCondition, &myRulekit,
boost::spirit::_1, boost::spirit::_2, boost::spirit::_3 )
]
;
So now I add a check for the existence of the identifier:
condition
= ( tok.identifier[boost::bind(&cRuleKit::CheckIdentifier, &myRulekit, ::_1, ::_3 ) ]
>> tok.oper_ >> tok.value )
[
boost::phoenix::bind( &cRuleKit::AddCondition, &myRulekit,
boost::spirit::_1, boost::spirit::_2, boost::spirit::_3 )
]
;
This works!
I am not thrilled by the elegance. The grammar syntax is now hard to read and mixing use of boost::bind and boost::phoenix::bind is terribly confusing.
How can I improve it? I would like to get at the 'hit' parameter from phoenix::bind so that I can do the check inside cRuleKit::AddCondition() and so keep the grammar and actions separate and avoid using boost::bind.
The answer is to use the placeholder _pass:
condition
= ( tok.identifier >> tok.oper_ >> tok.value )
[
boost::phoenix::bind( &cRuleKit::AddCondition, &myRulekit,
boost::spirit::_pass, boost::spirit::_1, boost::spirit::_2, boost::spirit::_3 )
]
;
Spirit has a special value you can use in a semantic action to make the parse fail. It's called _pass and you should set it to false.
From some of my code:
variable_reference_impl_[_pass = lookup_symbol_(_1, false)][_val = _1]
In this case, lookup_symbol_ is a Phoenix function object that returns true if the symbol is found and false otherwise.