I have a grammar that defines the following rules:
constantValue = qi::token(ID_FLOAT) | qi::token(ID_INTEGER);
postfixExpression = primaryExpression |
(postfixExpression >> qi::token(ID_OPENBRACKET) >> qi::token(ID_INTEGER) >> qi::token(ID_CLOSEBRACKET)) |
(postfixExpression >> qi::token(ID_DOT) >> qi::token(ID_IDENTIFIER));
primaryExpression = qi::token(ID_IDENTIFIER) |
constantValue |
(qi::token(ID_OPENPAREN) >> primaryExpression >> qi::token(ID_CLOSEPAREN));
ges = postfixExpression >> qi::eoi;
and I want it to match the following strings:
test[1]
testident.ident
and it should not match
test[1.2]
testident.5
but it fails to match the first 2 strings.
The lexer constructor is as follows:
custom_lexer()
: identifier("[a-zA-Z_][a-zA-Z0-9_]*")
, white_space("[ \\t\\n]+")
, integer_value("[1-9][0-9]*")
, hex_value("0[xX][0-9a-fA-F]+")
, float_value("[0-9]*\\.[0-9]+([eE][+-]?[0-9]+)?")
, float_value2("[0-9]+\\.([eE][+-]?[0-9]+)?")
, punctuator("&>|\\*\\*|\\*|\\+|-|~|!|\\/|%|<<|>>|<|>|<=|>=|==|!=|\\^|&|\\||\\^\\^|&&|\\|\\||\\?|:|,")// [ ] ( ) . &> ** * + - ~ ! / % << >> < > <= >= == != ^ & | ^^ && || ? : ,
{
using boost::spirit::lex::_start;
using boost::spirit::lex::_end;
this->self.add
(identifier, ID_IDENTIFIER)
/*(white_space, ID_WHITESPACE)*/
(integer_value, ID_INTEGER)
(hex_value, ID_INTEGER)
(float_value, ID_FLOAT)
(float_value2, ID_FLOAT)
("\\(", ID_OPENPAREN)
("\\)", ID_CLOSEPAREN)
("\\[", ID_OPENBRACKET)
("\\]", ID_CLOSEBRACKET)
("\\.", ID_DOT)
(punctuator, ID_PUNCTUATOR)
;
this->self("WS") = white_space;
}
Why don't I get a match for the mentioned strings?
Thank you
Tobias
I found the reason - I had to re-phrase the rule:
postfixExpression = primaryExpression >> *((qi::token(ID_OPENBRACKET) >> qi::token(ID_INTEGER) >> qi::token(ID_CLOSEBRACKET)) | (qi::token(ID_DOT) >> qi::token(ID_IDENTIFIER)));
I don't know why it's necessary, but now it seems to work.
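A likely explanation, offered as general background rather than something confirmed in this thread: Qi rules behave like recursive-descent (PEG) parsers, so a rule whose alternative starts with the rule itself re-enters at the same input position without consuming a token. The rewritten rule consumes the primary expression first and then loops over the suffixes. Below is a minimal sketch of that shape in plain C++ without Spirit; the `Tok` enum and `match_postfix` are hypothetical names made up for illustration, and the parenthesised primary is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token kinds standing in for the lexer's token IDs.
enum class Tok { Identifier, Integer, Float, OpenBracket, CloseBracket, Dot };

// Hand-written equivalent of
//   postfix = primary >> *( '[' INTEGER ']' | '.' IDENTIFIER ) >> eoi
// The primary is consumed first, then suffixes are consumed in a loop,
// so the function never calls itself at the same input position.
bool match_postfix(const std::vector<Tok>& toks) {
    std::size_t pos = 0;
    auto at = [&](std::size_t i, Tok t) { return i < toks.size() && toks[i] == t; };

    // primary: identifier or constant (parenthesised form omitted for brevity)
    if (!(at(pos, Tok::Identifier) || at(pos, Tok::Integer) || at(pos, Tok::Float)))
        return false;
    ++pos;

    for (;;) {
        if (at(pos, Tok::OpenBracket) && at(pos + 1, Tok::Integer) &&
            at(pos + 2, Tok::CloseBracket))
            pos += 3;                       // '[' INTEGER ']'
        else if (at(pos, Tok::Dot) && at(pos + 1, Tok::Identifier))
            pos += 2;                       // '.' IDENTIFIER
        else
            break;                          // no further suffix
    }
    return pos == toks.size();              // plays the role of qi::eoi
}
```

With this shape, a token sequence such as identifier, `[`, integer, `]` is accepted, while identifier, `[`, float, `]` is rejected because the loop stops before the end of the input.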
I am trying to parse a string into a struct using boost spirit x3:
struct identifier {
std::vector<std::string> namespaces;
std::vector<std::string> classes;
std::string identifier;
};
Now I have a parser rule that should match strings like these:
foo::bar::baz.bla.blub
foo.bar
boo::bar
foo
My parser rule looks like this:
auto const nested_identifier_def =
x3::lexeme[
-(id_string % "::")
>> -(id_string % ".")
>> id_string
];
where id_string parses combinations of alphanumeric characters.
I know this rule doesn't parse the way I want, because when parsing foo.bar, for example, the part -(id_string % ".") consumes the whole string.
How can I change the rule so that it parses correctly into the struct?
Assuming your id_string is something like this:
auto const id_string = x3::rule<struct id_string_tag, std::string>{} =
x3::lexeme[
(x3::alpha | '_')
>> *(x3::alnum | '_')
];
then I think this is what you're after:
auto const nested_identifier_def =
*(id_string >> "::")
>> *(id_string >> '.')
>> id_string;
The issue is that p % delimit is shorthand for p >> *(delimit >> p), i.e. it always consumes one p after the delimiter. However what you want is *(p >> delimit) so that no p is consumed after the delimiter and is instead left for the next rule.
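To make the difference concrete, here is a rough hand-written sketch of what `*(p >> delimit)` does, in plain C++ without Spirit. `NestedId`, `read_id`, `read_prefixes` and `parse_nested` are hypothetical names for this illustration only (the struct mirrors the one in the question, with the last member renamed to `name`).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the struct in the question.
struct NestedId {
    std::vector<std::string> namespaces;
    std::vector<std::string> classes;
    std::string name;
};

// Reads [A-Za-z_][A-Za-z0-9_]* at pos; on success stores it and advances pos.
bool read_id(const std::string& s, std::size_t& pos, std::string& out) {
    auto uc = [&](std::size_t i) { return static_cast<unsigned char>(s[i]); };
    if (pos >= s.size() || !(std::isalpha(uc(pos)) || s[pos] == '_')) return false;
    std::size_t end = pos + 1;
    while (end < s.size() && (std::isalnum(uc(end)) || s[end] == '_')) ++end;
    out = s.substr(pos, end - pos);
    pos = end;
    return true;
}

// Mimics *(id >> delim): keep an id only if the delimiter follows it,
// otherwise rewind so the id is left for the next part of the rule.
void read_prefixes(const std::string& s, std::size_t& pos,
                   const std::string& delim, std::vector<std::string>& out) {
    for (;;) {
        std::size_t save = pos;
        std::string id;
        if (read_id(s, pos, id) && s.compare(pos, delim.size(), delim) == 0) {
            out.push_back(id);
            pos += delim.size();
        } else {
            pos = save;  // backtrack: this id belongs to a later part
            return;
        }
    }
}

// *(id >> "::") >> *(id >> '.') >> id, followed by end of input.
bool parse_nested(const std::string& s, NestedId& out) {
    std::size_t pos = 0;
    read_prefixes(s, pos, "::", out.namespaces);
    read_prefixes(s, pos, ".", out.classes);
    return read_id(s, pos, out.name) && pos == s.size();
}
```

For foo.bar, the "::" loop rewinds after reading foo because no "::" follows, the "." loop then keeps foo as a class, and bar is left over for the final identifier.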
// 1
Mexpression = Mterm >> *(
'+' >> Mterm [qi::_val = phoenix::new_<BinaryNode>(_1, '+', _2)]
| '-' >> Mterm [qi::_val = phoenix::new_<BinaryNode>(_1, '-', _2)]
);
Mterm = Mfactor >> *(
'*' >> Mfactor [qi::_val = phoenix::new_<BinaryNode>(_1, '*', _2)]
| '/' >> Mfactor [qi::_val = phoenix::new_<BinaryNode>(_1, '/', _2)]
);
Mfactor = Unpack
| '+' >> Mfactor [qi::_val = phoenix::new_<UnaryNode>('+', _1)]
| '-' >> Mfactor [qi::_val = phoenix::new_<UnaryNode>('-', _1)]
| '(' >> Mexpression >> ')';
// 2
Error 2 error C2664: 'BinaryNode::BinaryNode(const BinaryNode &)' : cannot convert argument 3 from 'boost::mpl::void_' to 'anExpression *' c:\boost_1_55_0\boost\spirit\home\phoenix\object\detail\new_eval.hpp 41 1 ConsoleApplication1
Error 1 error C2338: index_is_out_of_bounds c:\boost_1_55_0\boost\spirit\home\support\argument.hpp 103 1 ConsoleApplication1
// 3
c:\boost_1_55_0\boost\spirit\home\support\argument.hpp(166) : see reference to class template instantiation 'boost::spirit::result_of::get_arg<boost::fusion::vector1<Attribute &>,1>' being compiled with
[
Attribute=anExpression *
]
I'm writing a translator for a model language (the task gives several EBNF definitions for the main constructs) and I'm stuck on arithmetic operations.
(see 1 in paste)
That is the grammar for parsing math expressions.
Unpack is some node: something that can be executed, converted to anExpression *, and passed as an argument to BinaryNode.
The rules are declared as follows:
qi::rule<Iterator, anExpression *()> Unpack;
qi::rule<Iterator, anExpression *()> Mexpression, Mterm, Mfactor;
anExpression is an abstract class (BinaryNode and UnaryNode derive publicly from anExpression).
When compiling the whole program I get the following errors:
fig2
I think that error 2 is the most important one to fix first.
The build log also contains something like this:
fig3
I think the mistake is in the way I use semantic actions. I suspect the _2 placeholder does not hold an Mterm (or Mfactor), so I'm doing something wrong when combining semantic actions with the alternative parser ('|').
I'll be glad to hear any ideas from you guys =)
Mexpression = Mterm >> *(
'+' >> Mterm [qi::_val = phoenix::new_<BinaryNode>(_1, '+', _2)]
| '-' >> Mterm [qi::_val = phoenix::new_<BinaryNode>(_1, '-', _2)]
);
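This can't work indeed: each semantic action is attached only to the Mterm that follows the operator, so _1 is the only placeholder available there and _2 is out of bounds. You'd need to temporarily store the result attribute of the lhs Mterm. Luckily you should be able to use the result of the rule itself to do this: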
Mexpression = Mterm [qi::_val = qi::_1] >> *(
'+' >> Mterm [qi::_val = phoenix::new_<BinaryNode>(qi::_val, '+', qi::_1)]
| '-' >> Mterm [qi::_val = phoenix::new_<BinaryNode>(qi::_val, '-', qi::_1)]
);
However, you may have to accommodate this in the constructor for your BinaryNode type.
This said:
I tend to avoid semantic actions. As long as you're imperatively spelling each and every step out in the semantic action, what is the real benefit of using a parser generator like Spirit? It's no longer for Rapid Application Development.
I especially avoid semantic actions that do dynamic allocations; it's very easy to create leaks in the presence of parser backtracking.
That said, this kind of expression combination is about the only point where I think semantic actions can still be considered idiomatic in Spirit. The dynamic allocations most certainly are not.
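To show the "use the rule's own result as the accumulator" idea outside of Spirit, here is a rough hand-written sketch. It deliberately swaps the raw `new` for `std::unique_ptr`, so a term that is parsed and then abandoned is freed automatically. `Node`, `parse_term`, `parse_expression` and `eval` are hypothetical names for this sketch, and terms are single digits only to keep it short; it is not the asker's BinaryNode design.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical tree node: op == 0 marks a leaf holding `value`.
struct Node {
    char op = 0;
    int value = 0;
    std::unique_ptr<Node> lhs, rhs;
};

// term: a single digit, to keep the sketch short.
std::unique_ptr<Node> parse_term(const std::string& s, std::size_t& pos) {
    if (pos >= s.size() || !std::isdigit(static_cast<unsigned char>(s[pos])))
        return nullptr;
    auto leaf = std::make_unique<Node>();
    leaf->value = s[pos++] - '0';
    return leaf;
}

// expression = term >> *(('+' | '-') >> term)
std::unique_ptr<Node> parse_expression(const std::string& s, std::size_t& pos) {
    auto result = parse_term(s, pos);              // like Mterm[_val = _1]
    while (result && pos < s.size() && (s[pos] == '+' || s[pos] == '-')) {
        std::size_t save = pos;
        char op = s[pos++];
        auto rhs = parse_term(s, pos);
        if (!rhs) { pos = save; break; }           // backtrack, nothing leaks
        auto parent = std::make_unique<Node>();
        parent->op = op;
        parent->lhs = std::move(result);           // the old result becomes the lhs
        parent->rhs = std::move(rhs);
        result = std::move(parent);                // like _val = BinaryNode(_val, op, _1)
    }
    return result;
}

// Evaluates the tree, only to make the associativity visible.
int eval(const Node& n) {
    if (n.op == 0) return n.value;
    int l = eval(*n.lhs), r = eval(*n.rhs);
    return n.op == '+' ? l + r : l - r;
}
```

Because every new node takes the previous result as its left child, an input like 9-3-2 ends up grouped as (9-3)-2.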
This standalone grammar rule produces the expected result:
term = ( double_ >> "+" >> term ) [_val = _1 + _2]|
( double_ >> "-" >> term ) [_val = _1 - _2]|
( double_ >> "*" >> term ) [_val = _1 * _2]|
( double_ >> "/" >> term ) [_val = _1 / _2]|
double_ [_val = _1] ;
while this one does not:
term = ( term >> "+" >> term ) [_val = _1 + _2]|
( term >> "-" >> term ) [_val = _1 - _2]|
( term >> "*" >> term ) [_val = _1 * _2]|
( term >> "/" >> term ) [_val = _1 / _2]|
double_ [_val = _1] ;
I guess this has something to do with recursion ambiguity... What does the second rule try to do when fed with "1+2+3"?
Is there a good document that schematically explains how Spirit parsing is performed? I mean as pure C or as an algorithm, with no templates or classes.
EDIT:
Actually I think that the second rule should fail at compile time, as it is ambiguous.
Spirit is a PEG parser:
Parsing expression grammar (wikipedia)
See also the About Page introduction on the http://boost-spirit.com site
Parsing Expression Grammar in the documentation abstracts
Relevant quote:
Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG
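So, no, the second example is not ambiguous at all. Its first alternative begins with term itself, so the rule re-enters without consuming any input, which results in infinite recursion (and ultimately a stack overflow).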
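The "first match wins" part of the quote can be sketched in a few lines of plain C++. This is only an illustration of PEG-style ordered choice over literal strings, not how Spirit is implemented; `ordered_choice` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// PEG-style ordered choice over literal alternatives: try each one in order
// and commit to the first that matches at pos. Later alternatives are never
// consulted once one has matched, so there is no ambiguity to resolve.
// Returns the matched length, or 0 if no alternative matches.
std::size_t ordered_choice(const std::string& input, std::size_t pos,
                           const std::vector<std::string>& alternatives) {
    if (pos > input.size()) return 0;
    for (const auto& alt : alternatives)
        if (!alt.empty() && input.compare(pos, alt.size(), alt) == 0)
            return alt.size();
    return 0;
}
```

With the alternatives "ab" then "abc", the input "abc" matches only "ab"; swapping the order makes it match all of "abc". A CFG would treat both orders as the same grammar.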
I need to check text for doubled symbols. For example, "1+1*2" should be OK, but "1**2+3" or "--1+4*3" should not. Consider this part of the Spirit calc example:
expression =
term[_val=_1]
>> *( ('+' >> term[_val+=_1])
| ('-' >> term[_val-=_1])
);
term =
factor[_val=_1]
>> *( ('*' >> factor[_val*=_1])
| ('/' >> factor[_val/=_1])
);
factor =
double_[_val=_1]
| '(' >> expression[_val=_1] >> ')'
| ('-' >> factor[_val=_1])
| ('+' >> factor[_val=_1]);
phrase_parse returns true for expressions like "1+++1" or "1**-1". I tried to use repeat like this:
term =
factor[_val=_1]
>> *( (repeat(0)[char_('*')] >> factor[_val*=_1])
| ('/' >> factor[_val/=_1])
);
But it doesn't help. What am I missing?
Thanks.
EDIT: Found an answer. One should compare the string iterators after phrase_parse instead of relying only on the return value of phrase_parse.
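The iterator check can be sketched without Spirit. Here `parse_digits` is a hypothetical stand-in for a prefix parser: like phrase_parse, it takes the start iterator by reference, advances it past whatever it matched, and returns true even if input is left over.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for a prefix parser: consumes leading digits,
// advances `first` past them and reports whether anything matched.
template <typename It>
bool parse_digits(It& first, It last) {
    It start = first;
    while (first != last && std::isdigit(static_cast<unsigned char>(*first)))
        ++first;
    return first != start;
}

// A full match needs both a successful parse and no leftover input.
bool matches_fully(const std::string& text) {
    auto first = text.begin();
    bool ok = parse_digits(first, text.end());
    return ok && first == text.end();
}
```

For "1**2" the prefix parser reports success after consuming "1", and only the comparison against the end iterator reveals that "**2" was never parsed.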
In this case, '1+++++1' parses correctly because factor recursively accepts leading +'s (probably intended as unary +/-).
Split that up:
factor = ('-' >> value[_val=-_1])
| ('+' >> value[_val= _1])
| value [_val = _1];
value = double_ | '(' >> expression >> ')';
I tried to modify the mini_c example of boost::spirit to match my existing vocabulary.
I therefore added an operator "NOT" that should behave the same as "!":
unary_expr =
primary_expr
| ("NOT" > primary_expr [op(op_not)]) // This does not work
| ('!' > primary_expr [op(op_not)])
| ('-' > primary_expr [op(op_neg)])
| ('+' > primary_expr)
;
I can compile the modified source code, but when I try to execute it, it fails to parse. How can I solve this?
EDIT:
Because I want to access external variables, I made another modification in order to build a list of these variables when compiling:
identifier %=
raw[lexeme[alpha >> *(alnum | '§' | '_' | '.' | '-' )]]
;
variable =
identifier [add_var(_1)]
;
Where add_var and identifier are defined as
rule<Iterator, std::string(), white_space> identifier;
function<var_adder> add_var;
If I don't use this modification, "NOT" can be used. With the modification, using "NOT" generates a parsing error.
EDIT 2:
The following conditional expressions do work though:
logical_expr =
relational_expr
>> *( ("AND" > relational_expr [op(op_and)])
| ("OR" > relational_expr [op(op_or)])
)
;
With your change the small test:
int main()
{
return NOT 1;
}
parses successfully and returns 0. So it is not obvious to me what doesn't work for you. Could you provide a failing input example as well, please?