Spirit parser for boolean expression

Spirit parser for boolean expression - c++

I'm using spirit first time. I'm trying to write a boolean expression (with only &, | and ! operators) parser. I've defined my grammar like following:
template <typename Iterator>
struct boolean_expression_parser : qi::grammar<Iterator, std::string(), ascii::space_type>
{
boolean_expression_parser() : boolean_expression_parser::base_type(expr)
{
using namespace qi;
using ascii::char_;
using boost::spirit::ascii::alnum;
using namespace qi::labels;
using phoenix::construct;
using phoenix::val;
operand %= lexeme[+(alnum)];
simple_expr %= ('(' > expr > ')') | operand;
unary_expr %= ('!' > simple_expr ) ;
and_expr %= ( expr > '*' > expr);
or_expr %= (expr > '|' > expr);
expr %= simple_expr | unary_expr | *and_expr | *or_expr;
// on_error<fail>
// (
// unary_expr,
// std::cout
// << val("Error! Expecting ")
// << _4 // what failed?
// << val(" here: \"")
// << construct<std::string>(_3, _2) // iterators to error-pos, end
// << val("\"")
// << std::endl
// );
}
qi::rule<Iterator, std::string(), ascii::space_type> operand;
qi::rule<Iterator, std::string(), ascii::space_type> simple_expr;
qi::rule<Iterator, std::string(), ascii::space_type> unary_expr;
qi::rule<Iterator, std::string(), ascii::space_type> and_expr;
qi::rule<Iterator, std::string(), ascii::space_type> or_expr;
qi::rule<Iterator, std::string(), ascii::space_type> expr;
};
I'm facing few hurdles here:
It's not working for any binary expression (like 'A + B'). It's working fine for unary expressions (like '!(A)' or '(!A)'.
Can someone point me what I'm doing wrong?
I want store it in tree form (as I want to build a BDD out of it). Can someone point me how to do that?
Also, why on_error<> is not working even when I enable it?
I'm using boost 1.49 and gcc-4.2.2.
Regards,
~ Soumen

There are quite a lot problems with your parser. First of all, you meet a left-side recursion here, so the parser will crash with Stack Overflow. Your grammar should look like this:
expr = or_expr;
or_expr = and_expr >> -('|' > expr);
and_expr = unary_expr >> -('*' > expr);
unary_expr = ('!' > expr) | operand | ('(' >> expr > ')');
In this case you don't have left-recursion and everything parses.
Why your approach was failing? In your case:
input: A * B
1: expr
1.1: check simple_expr
-> fail at '(', try next
-> operand matches, return from simple_expr
matched, return.
So it should parse only A, and return without fail, but with input not fully parsed.
Also, the operator > you overused. Its purpose is to fail parsing if there is no match after it. On the other hand the operator >> returns and lets the parser check other possibilities.

Related

parsing interleaved lines with boost spirit

I am new to spirit and currently trying to parse an ini like file into a struct. Creating the grammar is ok, but the mapping generation is still some kind of magic to me. The file looks like this:
[fine]
#cmp1
#cmp2
muh=b
[fail]
#cmp1
a=b
#cmp2
it works as long as i have the requirements and properties ordered (section "fine") but I can't get it to work if the requirements and properties are interleaved (section "fail"). My struct definition looks like this:
typedef std::map<std::string, std::string> Pairs;
struct Section
{
std::string name;
std::vector<std::string> requirements;
Pairs properties;
};
BOOST_FUSION_ADAPT_STRUCT(
Section,
(std::string, name)
(std::vector<std::string>, requirements)
(Pairs, properties)
)
My current grammar looks like this:
template <typename Iterator, typename Skipper>
struct SectionParser : qi::grammar<Iterator, Section(), Skipper>
{
qi::rule<Iterator, Section(), Skipper> section;
qi::rule<Iterator, std::pair<std::string, std::string>(), Skipper> pair;
qi::rule<Iterator, std::vector<std::string>()> requirements;
qi::rule<Iterator, std::string()> key, value, sectionIdent, name, component;
SectionParser()
: SectionParser::base_type(section, "entity grammar")
{
using namespace qi;
sectionIdent = *(qi::char_ - (qi::lit('[') | qi::lit(']') | qi::eol));
name = *qi::eol > qi::lit('[') > sectionIdent > qi::lit(']') > (qi::eol | qi::eoi);
component = qi::char_('#') > *(qi::char_ - (qi::eol)) > (qi::eol | qi::eoi);
value = *(qi::char_ - (qi::eol | qi::eoi));
key = qi::char_("a-zA-Z_") > *qi::char_("a-zA-Z_0-9");
pair = key > qi::lit('=') > value > (qi::eol | qi::eoi);
section = name >> *component >> *pair;
}
};
Thats how I run the parser:
std::vector<Section> sections;
bool ok = phrase_parse(first, last, (sectionParser % +qi::eol) >> *qi::eol > qi::eoi, qi::blank, sections);
I also have the feeling that i made the line ending handling more complicated than it should be...

After reading parsing into several vector members i found a solution using only semantic actions.
The struct definition stays the same:
struct Section
{
std::string name;
std::vector<std::string> requirements;
Pairs properties;
};
dont use adapt struct anymore.
The grammar changes to:
template <typename Iterator, typename Skipper>
struct SectionParser : qi::grammar<Iterator, Section(), Skipper>
{
qi::rule<Iterator, Section(), Skipper> start;
qi::rule<Iterator, std::string()> value, ident, name, component;
qi::rule<Iterator, std::pair<std::string, std::string>()> pair;
SectionParser()
: SectionParser::base_type(start, "section grammar")
{
auto add_component = phx::push_back(phx::bind(&Section::requirements, qi::_val), qi::_1);
auto add_pair = phx::insert(phx::bind(&Section::properties, qi::_val), qi::_1);
auto set_name = phx::assign(phx::bind(&Section::name, qi::_val), qi::_1);
ident = +qi::char_("a-zA-Z0-9_");
component = qi::char_('#') > ident >> (qi::eol | qi::eoi);
value = *(qi::char_ - (qi::eol | qi::eoi));
pair = ident > qi::lit('=') > value >> (qi::eol | qi::eoi);
name = qi::lit('[') > ident > qi::lit(']') >> (qi::eol | qi::eoi);
start = name[set_name] >> *(component[add_component] | pair[add_pair]);
}
};

Create and write to vector on the fly

I want to create vector and append values to it (if any) in one spirit rule. Is it possible? I tried something like below but with no success. Read code comments for details please. Thanks.
typedef std::vector<double> number_array;
typedef std::vector<std::string> string_array;
typedef boost::variant<number_array, string_array> node
template<typename Iterator>
struct parser
: qi::grammar<Iterator, node(), ascii::space_type> {
parser(parser_impl* p)
: parser::base_type(expr_, ""),
error_handler(ErrorHandler(p)) {
// Here I want to create vector on the fly
// and append values to newly created vector.
// but this doesn't compile, specifically phoenix::push_back(...)
number_array_ = qi::lit('n[')[qi::_val = construct<number_array>()] >>
-(qi::double_ % ',')[phoenix::push_back(phoenix::ref(qi::_val), qi::_1)] >> ']';
// this doesn't compile too
string_array_ = qi::lit('s[')[qi::_val = construct<string_array>()] >>
-(quoted_string % ',')[phoenix::push_back(phoenix::ref(qi::_val), qi::_1)] >> ']';
quoted_string %= "'" > qi::lexeme[*(qi::char_ - "'")] > "'";
expr_ = number_array_[qi::_val = qi::_1] | string_array_[[qi::_val = qi::_1]];
}
qi::rule<Iterator, number_array(), ascii::space_type> number_array_;
qi::rule<Iterator, string_array(), ascii::space_type> string_array_;
qi::rule<Iterator, std::string(), ascii::space_type> quoted_string;
qi::rule<Iterator, node(), ascii::space_type> expr_;
};

Most important note here is that I think you can just do without all the semantic actions.
They only do what the default attribute rules already do (_val = _1 for scalar attributes, insert(_val, end(_val), _1) for conainer attributes, basically).
This means you can just write the whole shebang as
number_array_ = "n[" >> -(qi::double_ % ',') >> ']';
string_array_ = "s[" >> -(quoted_string % ',') >> ']';
quoted_string = "'" > qi::lexeme[*(qi::char_ - "'")] > "'";
expr_ = number_array_ | string_array_;
And this will work. Note that I fixed the multibyte literals 'n[' and 's[n'.
See also Boost Spirit: "Semantic actions are evil"?

Composite grammar in Boost::Spirit

I have the following grammar which works as expected.
struct query_term {
std::string term;
bool is_tag;
query_term(const std::string &a, bool tag = false): term(a), is_tag(tag) { } };
template<typename Iterator> struct query_grammar: grammar<Iterator, std::vector<query_term>(), space_type> {
query_grammar():
query_grammar::base_type(query) {
word %= +alnum;
tag = (omit[word >> ':'] >> word[_val = phoenix::construct<query_term>(_1, true)]);
non_tag = word[_val = phoenix::construct<query_term>(_1, false)];
query = (
(omit[word >> ':'] >> word[push_back(_val, phoenix::construct<query_term>(_1, true))])
|
word[push_back(_val,
phoenix::construct<query_term>(_1))
]
) % space;
};
qi::rule<Iterator, std::string(), space_type> word;
qi::rule<Iterator, query_term, space_type> tag;
qi::rule<Iterator, query_term, space_type> non_tag;
qi::rule<Iterator, std::vector<query_term>(), space_type> query; };
But when I replace query with
query = (
tag[phoenix::push_back(_val, _1)]
|
word[push_back(_val,
phoenix::construct<query_term>(_1))
]
) % space;
code doesn't compile. Basically I am trying to split the grammar into components that can be reused within larger grammar. When a word or tag is parsed, create a query_term object with appropriate flag in tag and word rule. Re-use these attributes in query rule.
In the previous version, tag and word rules are inlined in the query grammar.
I am not sure what I am missing here. Any help would be greatly appreciated.
FYI: This is not the final code. I am trying out the rules before using it in production code.
Thanx,
-- baliga

The real issue is that you define the attribute for the tag/non_tag rules as query_term (instead of query_term()).
Some minor issues appear to be:
using word instead of non_tag (exposes a std::string which doesn't convert to a query_type)
using % space with a space skipper doesn't really make sense
you probably wanted lexeme in the word rule because otherwise, it will just keep 'eating' chars regardless of whitespace
Other suggestions:
avoid excess scope of using namespace (or avoid it completely). You will run into hard-to-spot or hard-to-fix conflicts (e.g. boost::cref vs. std::cref, std::string vs. qi::string etc).
try to stay low on the Phoenix usage. In this case, I think you'd be far easier off using qi::attr with an adapted struct.
use BOOST_SPIRIT_DEBUG_* macros to get insight in your parser
Here is the entire grammar, the way I'd suggest it:
template<typename Iterator> struct query_grammar: qi::grammar<Iterator, std::vector<query_term>(), qi::space_type>
{
query_grammar() : query_grammar::base_type(query)
{
using namespace qi;
word = lexeme[ +alnum ];
tag = omit[word >> ':'] >> word >> attr(true);
non_tag = word >> attr(false);
query = *(tag | non_tag);
};
qi::rule<Iterator, std::string() , qi::space_type> word;
qi::rule<Iterator, query_term() , qi::space_type> tag, non_tag;
qi::rule<Iterator, std::vector<query_term>(), qi::space_type> query;
};
A fully working example with output (trivially onelined using karma):
// #define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
struct query_term {
std::string term;
bool is_tag;
};
BOOST_FUSION_ADAPT_STRUCT(query_term, (std::string,term)(bool,is_tag));
template<typename Iterator> struct query_grammar: qi::grammar<Iterator, std::vector<query_term>(), qi::space_type>
{
query_grammar() : query_grammar::base_type(query)
{
using namespace qi;
word = lexeme[ +alnum ];
tag = omit[word >> ':'] >> word >> attr(true);
non_tag = word >> attr(false);
query = *(tag | non_tag);
BOOST_SPIRIT_DEBUG_NODE(word);
BOOST_SPIRIT_DEBUG_NODE(tag);
BOOST_SPIRIT_DEBUG_NODE(non_tag);
BOOST_SPIRIT_DEBUG_NODE(query);
};
qi::rule<Iterator, std::string() , qi::space_type> word;
qi::rule<Iterator, query_term() , qi::space_type> tag, non_tag;
qi::rule<Iterator, std::vector<query_term>(), qi::space_type> query;
};
int main()
{
const std::string input = "apple tag:beer banana grape";
typedef std::string::const_iterator It;
query_grammar<It> parser;
std::vector<query_term> data;
It f(input.begin()), l(input.end());
bool ok = qi::phrase_parse(f, l, parser, qi::space, data);
if (ok)
std::cout << karma::format(karma::delimit [ karma::auto_ ] % karma::eol, data) << '\n';
if (f!=l)
std::cerr << "Unparsed: '" << std::string(f,l) << "'\n";
return ok? 0 : 255;
}
Output:
apple false
beer true
banana false
grape false

Compilation error with a boost::spirit parser

I have a strange problem with a calculator made using boost::spirit. This calculator is supposed to take a string as argument representing a series of arithmetical expression separated by commas, like "a+4*5,77,(b-c)*4". It also allows the string "?" and returns the array containing a -1 in this case. The calculator is initialized with a SymTable, which is a template class argument to describe any class offering the [string] -> int operator (example: a map), to resolve the value of variables.
The following code works on my Ubuntu 10.4 with both gcc 4.6.2 and gcc 4.4, and both boost 1.47 and 1.48. It also worked in the past on a Cray Linux machine with gcc 4.5.3 and boost 1.47.
#include <boost/bind.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
namespace sp = boost::spirit;
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace Damaris {
template <typename Iterator, typename SymTable>
struct Calc : qi::grammar<Iterator, std::vector<int>(), ascii::space_type>
{
qi::rule<Iterator, std::vector<int>(), ascii::space_type> start;
qi::rule<Iterator, int(), ascii::space_type> expr;
qi::rule<Iterator, int(), ascii::space_type> qmark;
qi::rule<Iterator, int(), ascii::space_type> factor;
qi::rule<Iterator, int(), ascii::space_type> simple;
qi::rule<Iterator, std::string(), ascii::space_type> identifier;
qi::rule<Iterator, int(SymTable), ascii::space_type> value;
/**
* \brief Constructor.
* \param[in] sym : table of symboles.
*/
Calc(SymTable &sym) : Calc::base_type(start)
{
identifier = qi::lexeme[( qi::alpha | '_') >> *( qi::alnum | '_')];
value = identifier[qi::_val = qi::labels::_r1[qi::_1]];
simple = ('(' >> expr >> ')')
| qi::int_
| value(boost::phoenix::ref(sym));
factor %= (simple >> '*' >> factor)[qi::_val = qi::_1 * qi::_2]
| (simple >> '/' >> factor)[qi::_val = qi::_1 / qi::_2]
| (simple >> '%' >> factor)[qi::_val = qi::_1 % qi::_2]
| simple;
expr %= (factor >> '+' >> expr)[qi::_val = qi::_1 + qi::_2]
| (factor >> '-' >> expr)[qi::_val = qi::_1 - qi::_2]
| factor;
qmark = qi::char_('?')[qi::_val = -1];
start = qmark
| (expr % ',');
}
};
}
Today I tried again compiling the same code on the Cray machine (which has been upgraded since then, I think), I tried with gcc 4.6.2 and gcc 4.5.2, and both with boost 1.48 and 1.49, and I always get the same compilation error that I don't understand :
/nics/b/home/mdorier/damaris-0.4/common/Calc.hpp:74:3: instantiated from 'Damaris::Calc<Iterator, SymTable>::Calc(SymTable&) [with Iterator = __gnu_cxx::__normal_iterator<const char*, std::basic_string<char> >, SymTable = Damaris::ParameterSet]'
/nics/b/home/mdorier/damaris-0.4/common/MetadataManager.cpp:45:79: instantiated from here
/nics/b/home/mdorier/deploy/include/boost/spirit/home/qi/detail/assign_to.hpp:123:13: error: invalid static_cast from type 'const boost::fusion::vector2<int, int>' to type 'int'
The line 74 in Calc.hpp corresponds to the line "factor = ...".
The instantiation line indicated (MetadataManager.cpp:45) is the following:
layoutInterp = new Calc<std::string::const_iterator,ParameterSet>(*parameters);
with layoutInterp being of type Calc* and parameters being of type ParameterSet*.
Any idea where this error comes from? Thanks

I'm pretty sure you might have been rearranging stuff in your rules. In fact, the %= auto-rule expression assignments won't work because the synthesized type of the parser expression doesn't resemble an int.
Basically, you'd change
factor %= (simple >> '*' >> factor)[ _val = _1 * _2 ]
| (simple >> '/' >> factor)[ _val = _1 / _2 ]
| (simple >> '%' >> factor)[ _val = _1 % _2 ]
| simple;
expr %= (factor >> '+' >> expr)[ _val = _1 + _2 ]
| (factor >> '-' >> expr)[ _val = _1 - _2 ]
| factor;
into
factor = (simple >> '*' >> factor)[ _val = _1 * _2 ]
| (simple >> '/' >> factor)[ _val = _1 / _2 ]
| (simple >> '%' >> factor)[ _val = _1 % _2 ]
| (simple) [_val = _1 ];
expr = (factor >> '+' >> expr)[ _val = _1 + _2 ]
| (factor >> '-' >> expr)[ _val = _1 - _2 ]
| (factor) [_val = _1 ];
I have fixed up some small issues and created a SSCCE of your post that works, as far as I can tell 1:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace sp = boost::spirit;
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace karma = boost::spirit::karma;
namespace phx = boost::phoenix;
namespace Damaris {
template <typename Iterator, typename SymTable>
struct Calc : qi::grammar<Iterator, std::vector<int>(), ascii::space_type>
{
qi::rule<Iterator, std::vector<int>(), ascii::space_type> start;
qi::rule<Iterator, int(), ascii::space_type> expr;
qi::rule<Iterator, int(), ascii::space_type> qmark;
qi::rule<Iterator, int(), ascii::space_type> factor;
qi::rule<Iterator, int(), ascii::space_type> simple;
qi::rule<Iterator, std::string(), ascii::space_type> identifier;
qi::rule<Iterator, int(SymTable), ascii::space_type> value;
Calc(SymTable &sym) : Calc::base_type(start)
{
using namespace qi;
identifier = lexeme[( alpha | '_') >> *( alnum | '_')];
value = identifier[ _val = _r1[_1] ];
simple = ('(' >> expr >> ')')
| int_
| value(boost::phoenix::ref(sym));
factor = (simple >> '*' >> factor)[ _val = _1 * _2 ]
| (simple >> '/' >> factor)[ _val = _1 / _2 ]
| (simple >> '%' >> factor)[ _val = _1 % _2 ]
| (simple) [_val = _1 ];
expr = (factor >> '+' >> expr)[ _val = _1 + _2 ]
| (factor >> '-' >> expr)[ _val = _1 - _2 ]
| (factor) [_val = _1 ];
qmark = char_('?')[ _val = -1 ];
start = qmark
| (expr % ',');
BOOST_SPIRIT_DEBUG_NODE(start);
BOOST_SPIRIT_DEBUG_NODE(qmark);
BOOST_SPIRIT_DEBUG_NODE(expr);
BOOST_SPIRIT_DEBUG_NODE(factor);
BOOST_SPIRIT_DEBUG_NODE(simple);
BOOST_SPIRIT_DEBUG_NODE(value);
BOOST_SPIRIT_DEBUG_NODE(identifier);
}
};
}
int main(int argc, const char *argv[])
{
typedef std::map<std::string, int> SymTable;
SymTable symbols;
Damaris::Calc<std::string::const_iterator, SymTable> calc(symbols);
symbols["TheAnswerToLifeUniverse"] = 100;
symbols["Everything"] = -58;
std::string input = "3*4+5/4, TheAnswerToLifeUniverse + Everything";
std::string::const_iterator f(input.begin()), l(input.end());
std::vector<int> data;
if (qi::phrase_parse(f,l,calc,ascii::space,data))
std::cout << "output: " << karma::format(karma::int_ % ", " << karma::eol, data);
else
std::cout << "problem: '" << std::string(f,l) << "'\n";
return 0;
}
Output:
output: 13, 42
1 gcc 4.6.1, boost 1_48

Improve use of alternative parser

I extended the Mini XML example from the spirit manual.
The grammar describes a xml tag that can be closed with '/>' and has no child nodes or which is closed like in the example with a closing tag '' and can optionally have children.
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/variant.hpp>
#include <boost/variant/recursive_variant.hpp>
struct XmlTree;
typedef boost::variant<boost::recursive_wrapper<XmlTree>, std::string>
mini_xml_node;
typedef std::vector<mini_xml_node> Children;
struct XmlTree
{
std::string name;
Children childs;
};
BOOST_FUSION_ADAPT_STRUCT(
XmlTree,
(std::string, name)
(Children, childs)
)
typedef std::string::const_iterator Iterator;
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phoenix = boost::phoenix;
class XmlParserGrammar : public qi::grammar<Iterator, XmlTree(), qi::locals<std::string*>, ascii::space_type>
{
public:
XmlParserGrammar() : XmlParserGrammar::base_type(xml, "xml")
{
using qi::lit;
using qi::lexeme;
using qi::attr;
using ascii::space;
using ascii::char_;
using ascii::alnum;
using phoenix::val;
xml %=
startTag[qi::_a = &qi::_1] >>
(
(
lit("/>") > attr(Children()) //can i remove this somehow?
)
|
(
lit(">")
>> *node_
> endTag(*qi::_a)
)
);
startTag %= '<' >> !lit('/') >> lexeme[ +(alnum - (space | '>' | "/>")) ] ;
node_ %= xml | text;
endTag = "</" > lit(qi::_r1) > '>';
text %= lexeme[+(char_ - '<')];
}
private:
qi::rule<Iterator, XmlTree(), qi::locals<std::string*>, ascii::space_type> xml;
qi::rule<Iterator, std::string(), ascii::space_type> startTag;
qi::rule<Iterator, mini_xml_node(), ascii::space_type> node_;
qi::rule<Iterator, void(std::string&), ascii::space_type> endTag;
qi::rule<Iterator, std::string(), ascii::space_type> text;
};
Is it possible to write this rule without the attr(Children()) tag? I think it is more or less a performance lag. I need it to avoid the optional attribute of the alternative parser.
If there are no child tags the attribute should only be an empty vector.

You should be able to write:
xml %= startTag[_a = &_1]
>> attributes
>> ( "/>" >> eps
| ">" >> *node > endTag(*_a)
)
;
That leaves the vector attribute unchanged (and empty).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Spirit parser for boolean expression - c++

Related

parsing interleaved lines with boost spirit

Create and write to vector on the fly

Composite grammar in Boost::Spirit

Compilation error with a boost::spirit parser

Improve use of alternative parser

Categories

Resources