I was attempting to replicate this example in order to implement C++ like operator precedence rules (I started with a subset, but I eventually plan to add the others).
Try as I might, I could not get the grammar to parse a single binary operation. It would parse literals (44, 3.42, "stackoverflow") just fine, but would fail anything like 3 + 4.
I did look at this question, and this one in an attempt to get my solution to work, but got the same result.
(In an attempt to keep things short, I'll post only the relevant bits here, the full code is here)
Relevant data structures for the AST:
enum class BinaryOperator
{
ADD, SUBTRACT, MULTIPLY, DIVIDE, MODULO, LEFT_SHIFT, RIGHT_SHIFT, EQUAL, NOT_EQUAL, LOWER, LOWER_EQUAL, GREATER, GREATER_EQUAL,
};
typedef boost::variant<double, int, std::string> Litteral;
struct Identifier { std::string name; };
typedef boost::variant<
Litteral,
Identifier,
boost::recursive_wrapper<UnaryOperation>,
boost::recursive_wrapper<BinaryOperation>,
boost::recursive_wrapper<FunctionCall>
> Expression;
struct BinaryOperation
{
Expression rhs, lhs;
BinaryOperator op;
BinaryOperation() {}
BinaryOperation(Expression rhs, BinaryOperator op, Expression lhs) : rhs(rhs), op(op), lhs(lhs) {}
};
The grammar:
template<typename Iterator, typename Skipper>
struct BoltGrammar : qi::grammar<Iterator, Skipper, Program()>
{
BoltGrammar() : BoltGrammar::base_type(start, "start")
{
equalOp.add("==", BinaryOperator::EQUAL)("!=", BinaryOperator::NOT_EQUAL);
equal %= (lowerGreater >> equalOp >> lowerGreater);
equal.name("equal");
lowerGreaterOp.add("<", BinaryOperator::LOWER)("<=", BinaryOperator::LOWER_EQUAL)(">", BinaryOperator::GREATER)(">=", BinaryOperator::GREATER_EQUAL);
lowerGreater %= (shift >> lowerGreaterOp >> shift);
lowerGreater.name("lower or greater");
shiftOp.add("<<", BinaryOperator::LEFT_SHIFT)(">>", BinaryOperator::RIGHT_SHIFT);
shift %= (addSub >> shiftOp >> addSub);
shift.name("shift");
addSubOp.add("+", BinaryOperator::ADD)("-", BinaryOperator::SUBTRACT);
addSub %= (multDivMod >> addSubOp >> multDivMod);
addSub.name("add or sub");
multDivModOp.add("*", BinaryOperator::MULTIPLY)("/", BinaryOperator::DIVIDE)("%", BinaryOperator::MODULO);
multDivMod %= (value >> multDivModOp >> value);
multDivMod.name("mult, div, or mod");
value %= identifier | litteral | ('(' > expression > ')');
value.name("value");
start %= qi::eps >> *(value >> qi::lit(';'));
start.name("start");
expression %= identifier | litteral | equal;
expression.name("expression");
identifier %= qi::lexeme[ascii::char_("a-zA-Z") >> *ascii::char_("0-9a-zA-Z")];
identifier.name("identifier");
litteral %= qi::double_ | qi::int_ | quotedString;
litteral.name("litteral");
quotedString %= qi::lexeme['"' >> +(ascii::char_ - '"') >> '"'];
quotedString.name("quoted string");
namespace phx = boost::phoenix;
using namespace qi::labels;
qi::on_error<qi::fail>(start, std::cout << phx::val("Error! Expecting: ") << _4 << phx::val(" here: \"") << phx::construct<std::string>(_3, _2) << phx::val("\"") << std::endl);
}
qi::symbols<char, BinaryOperator> equalOp, lowerGreaterOp, shiftOp, addSubOp, multDivModOp;
qi::rule<Iterator, Skipper, BinaryOperation()> equal, lowerGreater, shift, addSub, multDivMod;
qi::rule<Iterator, Skipper, Expression()> value;
qi::rule<Iterator, Skipper, Program()> start;
qi::rule<Iterator, Skipper, Expression()> expression;
qi::rule<Iterator, Skipper, Identifier()> identifier;
qi::rule<Iterator, Skipper, Litteral()> litteral;
qi::rule<Iterator, Skipper, std::string()> quotedString;
};
The main problem (indeed) appears to be addressed in that second answer you linked to.
Let me address some points:
the main problem was was compound:
your start rule is
start %= qi::eps >> *(value >> qi::lit(';'));
this means it expects values:
value %= identifier | literal | ('(' > expression > ')');
however, since this parses only identifiers and literals or parenthesized subexpressions, the 3+4 binary operation will never be parsed.
your expression rule, again allows identifier or literal first (redundant/confusing):
expression %= identifier | literal | equal;
I think you'd want something more like
expression = '(' >> expression >> ')' | equal | value;
value = identifier | literal;
// and then
start = qi::eps >> -expression % ';';
your BinaryOperation productions allow only for the case where the operator is present; this breaks the way the rules are nested for operator precedence: a multDivOp would never be accepted as match, unless it happens to be followed by an addSubOp:
addSub %= (multDivMod >> addSubOp >> multDivMod);
multDivMod %= (value >> multDivModOp >> value);
This can best be fixed as shown in the linked answer:
addSub = multDivMod >> -(addSubOp >> multDivMod);
multDivMod = value >> -(multDivModOp >> value);
where you can use semantic actions to build the AST nodes "dynamically":
addSub = multDivMod >> -(addSubOp >> multDivMod) [ _val = phx::construct<BinaryOperation>(_val, _1, _2) ];
multDivMod = value >> -(multDivModOp >> value) [ _val = phx::construct<BinaryOperation>(_val, _1, _2) ];
This beats the 'tedious" declarative approach hands-down (which leads to a lot of backtracking, see e.g. Boost spirit poor performance with Alternative parser)
the literal rule will parse an integer as a double, because it isn't strict:
literal %= qi::double_ | qi::int_ | quotedString;
you can fix this like:
qi::real_parser<double, qi::strict_real_policies<double> > strict_double;
literal = quotedString | strict_double | qi::int_;
FunctionCall should adapt functionName as an Identifier (not std::string)
BOOST_FUSION_ADAPT_STRUCT(FunctionCall, (Identifier, functionName)(std::vector<Expression>, args))
You Expression operator<< could (should) be a boost::static_visitor so that you
eliminate magic type switch numbers
get compiler checking of completeness of the switch
can leverage overload resolution to switch on variant member types
Using c++11, the code could still be inside the one function:
std::ostream& operator<<(std::ostream& os, const Expression& expr)
{
os << "Expression ";
struct v : boost::static_visitor<> {
v(std::ostream& os) : os(os) {}
std::ostream& os;
void operator()(Literal const& e) const { os << "(literal: " << e << ")"; }
void operator()(Identifier const& e) const { os << "(identifier: " << e.name << ")"; }
void operator()(UnaryOperation const& e) const { os << "(unary op: " << boost::fusion::as_vector(e) << ")"; }
void operator()(BinaryOperation const& e) const { os << "(binary op: " << boost::fusion::as_vector(e) << ")"; }
void operator()(FunctionCall const& e) const {
os << "(function call: " << e.functionName << "(";
if (e.args.size() > 0) os << e.args.front();
for (auto it = e.args.begin() + 1; it != e.args.end(); it++) { os << ", " << *it; }
os << ")";
}
};
boost::apply_visitor(v(os), expr);
return os;
}
you can use the BOOST_SPIRIT_DEBUG_NODES macro to name your rules
BOOST_SPIRIT_DEBUG_NODES(
(start)(expression)(identifier)(literal)(quotedString)
(equal)(lowerGreater)(shift)(addSub)(multDivMod)(value)
)
you should include from the spirit/include/ directory, which then relays to spirit/home/ or phoenix/include/ instead of including them directly.
Here is a fully working sample, that also improved the grammar rules for readability Live On Coliru:
//#define BOOST_SPIRIT_DEBUG
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/variant.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
namespace ascii = boost::spirit::ascii;
enum class UnaryOperator
{
NOT,
PLUS,
MINUS,
};
std::ostream& operator<<(std::ostream& os, const UnaryOperator op)
{
switch (op)
{
case UnaryOperator::NOT: return os << "!";
case UnaryOperator::PLUS: return os << "+";
case UnaryOperator::MINUS: return os << "-";
}
assert(false);
}
enum class BinaryOperator
{
ADD, SUBTRACT, MULTIPLY, DIVIDE,
MODULO,
LEFT_SHIFT, RIGHT_SHIFT,
EQUAL, NOT_EQUAL,
LOWER, LOWER_EQUAL,
GREATER, GREATER_EQUAL,
};
std::ostream& operator<<(std::ostream& os, const BinaryOperator op)
{
switch (op)
{
case BinaryOperator::ADD: return os << "+";
case BinaryOperator::SUBTRACT: return os << "-";
case BinaryOperator::MULTIPLY: return os << "*";
case BinaryOperator::DIVIDE: return os << "/";
case BinaryOperator::MODULO: return os << "%";
case BinaryOperator::LEFT_SHIFT: return os << "<<";
case BinaryOperator::RIGHT_SHIFT: return os << ">>";
case BinaryOperator::EQUAL: return os << "==";
case BinaryOperator::NOT_EQUAL: return os << "!=";
case BinaryOperator::LOWER: return os << "<";
case BinaryOperator::LOWER_EQUAL: return os << "<=";
case BinaryOperator::GREATER: return os << ">";
case BinaryOperator::GREATER_EQUAL: return os << ">=";
}
assert(false);
}
typedef boost::variant<
double,
int,
std::string
> Literal;
struct Identifier
{
std::string name;
};
BOOST_FUSION_ADAPT_STRUCT(Identifier, (std::string, name))
struct UnaryOperation;
struct BinaryOperation;
struct FunctionCall;
typedef boost::variant<
Literal,
Identifier,
boost::recursive_wrapper<UnaryOperation>,
boost::recursive_wrapper<BinaryOperation>,
boost::recursive_wrapper<FunctionCall>
> Expression;
struct UnaryOperation
{
Expression rhs;
UnaryOperator op;
};
BOOST_FUSION_ADAPT_STRUCT(UnaryOperation, (Expression,rhs)(UnaryOperator,op))
struct BinaryOperation
{
Expression rhs;
BinaryOperator op;
Expression lhs;
BinaryOperation() {}
BinaryOperation(Expression rhs, BinaryOperator op, Expression lhs) : rhs(rhs), op(op), lhs(lhs) {}
};
BOOST_FUSION_ADAPT_STRUCT(BinaryOperation, (Expression,rhs)(BinaryOperator,op)(Expression,lhs))
struct FunctionCall
{
Identifier functionName;
std::vector<Expression> args;
};
BOOST_FUSION_ADAPT_STRUCT(FunctionCall, (Identifier, functionName)(std::vector<Expression>, args))
struct Program
{
std::vector<Expression> statements;
};
BOOST_FUSION_ADAPT_STRUCT(Program, (std::vector<Expression>, statements))
std::ostream& operator<<(std::ostream& os, const Expression& expr)
{
os << "Expression ";
struct v : boost::static_visitor<> {
v(std::ostream& os) : os(os) {}
std::ostream& os;
void operator()(Literal const& e) const { os << "(literal: " << e << ")"; }
void operator()(Identifier const& e) const { os << "(identifier: " << e.name << ")"; }
void operator()(UnaryOperation const& e) const { os << "(unary op: " << boost::fusion::as_vector(e) << ")"; }
void operator()(BinaryOperation const& e) const { os << "(binary op: " << boost::fusion::as_vector(e) << ")"; }
void operator()(FunctionCall const& e) const {
os << "(function call: " << e.functionName << "(";
if (e.args.size() > 0) os << e.args.front();
for (auto it = e.args.begin() + 1; it != e.args.end(); it++) { os << ", " << *it; }
os << ")";
}
};
boost::apply_visitor(v(os), expr);
return os;
}
std::ostream& operator<<(std::ostream& os, const Program& prog)
{
os << "Program" << std::endl << "{" << std::endl;
for (const Expression& expr : prog.statements)
{
std::cout << "\t" << expr << std::endl;
}
os << "}" << std::endl;
return os;
}
template<typename Iterator, typename Skipper>
struct BoltGrammar : qi::grammar<Iterator, Skipper, Program()>
{
BoltGrammar() : BoltGrammar::base_type(start, "start")
{
using namespace qi::labels;
equalOp.add
("==", BinaryOperator::EQUAL)
("!=", BinaryOperator::NOT_EQUAL);
lowerGreaterOp.add
("<", BinaryOperator::LOWER)
("<=", BinaryOperator::LOWER_EQUAL)
(">", BinaryOperator::GREATER)
(">=", BinaryOperator::GREATER_EQUAL);
shiftOp.add
("<<", BinaryOperator::LEFT_SHIFT)
(">>", BinaryOperator::RIGHT_SHIFT);
addSubOp.add
("+", BinaryOperator::ADD)
("-", BinaryOperator::SUBTRACT);
multDivModOp.add
("*", BinaryOperator::MULTIPLY)
("/", BinaryOperator::DIVIDE)
("%", BinaryOperator::MODULO);
equal = lowerGreater [ _val=_1 ] >> -(equalOp >> lowerGreater) [ _val = phx::construct<BinaryOperation>(_val, _1, _2) ];
lowerGreater = shift [ _val=_1 ] >> -(lowerGreaterOp >> shift) [ _val = phx::construct<BinaryOperation>(_val, _1, _2) ];
shift = addSub [ _val=_1 ] >> -(shiftOp >> addSub) [ _val = phx::construct<BinaryOperation>(_val, _1, _2) ];
addSub = multDivMod [ _val=_1 ] >> -(addSubOp >> multDivMod) [ _val = phx::construct<BinaryOperation>(_val, _1, _2) ];
multDivMod = value [ _val=_1 ] >> -(multDivModOp >> value) [ _val = phx::construct<BinaryOperation>(_val, _1, _2) ];
start = qi::eps >> -expression % ';';
expression = '(' >> expression >> ')' | equal | value;
value = identifier | literal;
identifier = qi::lexeme[ascii::char_("a-zA-Z") >> *ascii::char_("0-9a-zA-Z")];
qi::real_parser<double, qi::strict_real_policies<double> > strict_double;
literal = quotedString | strict_double | qi::int_;
quotedString = qi::lexeme['"' >> +(ascii::char_ - '"') >> '"'];
qi::on_error<qi::fail>(start, std::cout << phx::val("Error! Expecting: ") << _4 << phx::val(" here: \"") << phx::construct<std::string>(_3, _2) << phx::val("\"") << std::endl);
BOOST_SPIRIT_DEBUG_NODES((start)(expression)(identifier)(literal)(quotedString)
(equal)(lowerGreater)(shift)(addSub)(multDivMod)(value)
)
}
qi::symbols<char, BinaryOperator> equalOp, lowerGreaterOp, shiftOp, addSubOp, multDivModOp;
qi::rule<Iterator, Skipper, Expression()> equal, lowerGreater, shift, addSub, multDivMod;
qi::rule<Iterator, Skipper, Expression()> value;
qi::rule<Iterator, Skipper, Program()> start;
qi::rule<Iterator, Skipper, Expression()> expression;
qi::rule<Iterator, Skipper, Identifier()> identifier;
qi::rule<Iterator, Skipper, Literal()> literal;
qi::rule<Iterator, Skipper, std::string()> quotedString;
};
typedef std::string::iterator Iterator;
typedef boost::spirit::ascii::space_type Skipper;
int main()
{
BoltGrammar<Iterator, Skipper> grammar;
std::string str("3; 4.2; \"lounge <c++>\"; 3 + 4;");
Program prog;
Iterator iter = str.begin(), last = str.end();
bool r = phrase_parse(iter, last, grammar, ascii::space, prog);
if (r && iter == last)
{
std::cout << "Parsing succeeded: " << prog << std::endl;
}
else
{
std::cout << "Parsing failed, remaining: " << std::string(iter, last) << std::endl;
}
return 0;
}
Prints:
Parsing succeeded: Program
{
Expression (literal: 3)
Expression (literal: 4.2)
Expression (literal: lounge <c++>)
Expression (binary op: (Expression (literal: 3) + Expression (literal: 4)))
}
Related
I am trying to model a parser for a subset of the C language, for a school project. However, I seem stuck in the process of generating recursive parsing rules for Boost.Spirit, as my rules either overflow the stack or simply do not pick up anything.
For example, I want to model the following syntax:
a ::= ... | A[a] | a1 op a2 | ...
There are some other subsets of syntax for this expression rule, but those are working without problems. For example, if I were to parse A[3*4], it should be read as a recursive parsing where A[...] (A[a] in the syntax) is the array accessor and 3*4 (a1 op a2 in the syntax) is the index.
I've tried defining the following rule objects in the grammar struct:
qi::rule<Iterator, Type(), Skipper> expr_arr;
qi::rule<Iterator, Type(), Skipper> expr_binary_arith;
qi::rule<Iterator, Type(), Skipper> expr_a;
And giving them the following grammar:
expr_arr %= qi::lexeme[identifier >> qi::omit['[']] >> expr_a >> qi::lexeme[qi::omit[']']];
expr_binary_arith %= expr_a >> op_binary_arith >> expr_a;
expr_a %= (expr_binary_arith | expr_arr);
where "op_binary_arith" is a qi::symbol<> object with the allowed operator symbols.
This compiles fine, but upon execution enters a supposedly endless loop, and the stack overflows. I've tried looking at the answer by Sehe in the following question: How to set max recursion in boost spirit.
However, I have been unsuccessful in setting a max recursion depth. Firstly, I failed to make it compile without errors for almost any of my attempts, but on the last attempt it built successfully, albeit with very unexpected results.
Can someone guide me in the right direction, as to how I should go about implementing this grammar correctly?
PEG grammars do not handle left-recursion well. In general you have to split out helper rules to write without left-recursion.
In your particular case, the goal production
a ::= ... | A[a] | a1 op a2 | ...
Seems a little off. This would allows foo[bar] or foo + bar but not foo + bar[qux].
Usually, the choice between array element reference or plain identifier is at a lower level of precedence (often "simple expression").
Here's a tiny elaboration:
literal = number_literal | string_literal; // TODO exapnd?
expr_arr = identifier >> '[' >> (expr_a % ',') >> ']';
simple_expression = literal | expr_arr | identifier;
expr_binary_arith = simple_expression >> op_binary_arith >> expr_a;
expr_a = expr_binary_arith | simple_expression;
Now you can parse e.g.:
for (std::string const& input : {
"A[3*4]",
"A[F[3]]",
"A[8 + F[0x31]]",
"3 * \"foo\"",
})
{
std::cout << std::quoted(input) << " -> ";
It f=begin(input), l=end(input);
AST::Expr e;
if (parse(f,l,g,e)) {
std::cout << "Parsed: " << e << "\n";
} else {
std::cout << "Failed\n";
}
if (f!=l) {
std::cout << "Remaining: " << std::quoted(std::string(f,l)) << "\b";
}
}
Which prints Live On Coliru
"A[3*4]" -> Parsed: A[3*4]
"A[F[3]]" -> Parsed: A[F[3]]
"A[8 + F[0x31]]" -> Parsed: A[8+F[49]]
"3 * \"foo\"" -> Parsed: 3*"foo"
NOTE I deliberately left efficiency and operator precedence out of the picture for now.
These are talked about in detail in other answers:
Boost::Spirit Expression Parser
Implementing operator precedence with boost spirit
Boost::Spirit : Optimizing an expression parser
And many more
Full Demo Listing
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <experimental/iterator>
namespace qi = boost::spirit::qi;
namespace AST {
using Var = std::string;
struct String : std::string {
using std::string::string;
};
using Literal = boost::variant<String, intmax_t, double>;
enum class ArithOp {
addition, subtraction, division, multplication
};
struct IndexExpr;
struct BinOpExpr;
using Expr = boost::variant<
Literal,
Var,
boost::recursive_wrapper<IndexExpr>,
boost::recursive_wrapper<BinOpExpr>
>;
struct IndexExpr {
Expr expr;
std::vector<Expr> indices;
};
struct BinOpExpr {
Expr lhs, rhs;
ArithOp op;
};
std::ostream& operator<<(std::ostream& os, Literal const& lit) {
struct {
std::ostream& os;
void operator()(String const& s) const { os << std::quoted(s); }
void operator()(double d) const { os << d; }
void operator()(intmax_t i) const { os << i; }
} vis {os};
boost::apply_visitor(vis, lit);
return os;
}
std::ostream& operator<<(std::ostream& os, ArithOp const& op) {
switch(op) {
case ArithOp::addition: return os << '+';
case ArithOp::subtraction: return os << '-';
case ArithOp::division: return os << '/';
case ArithOp::multplication: return os << '*';
}
return os << '?';
}
std::ostream& operator<<(std::ostream& os, BinOpExpr const& e) {
return os << e.lhs << e.op << e.rhs;
}
std::ostream& operator<<(std::ostream& os, IndexExpr const& e) {
std::copy(
begin(e.indices),
end(e.indices),
std::experimental::make_ostream_joiner(os << e.expr << '[', ","));
return os << ']';
}
}
BOOST_FUSION_ADAPT_STRUCT(AST::IndexExpr, expr, indices)
BOOST_FUSION_ADAPT_STRUCT(AST::BinOpExpr, lhs, op, rhs)
template <typename Iterator, typename Skipper = qi::space_type>
struct G : qi::grammar<Iterator, AST::Expr()> {
G() : G::base_type(start) {
using namespace qi;
identifier = alpha >> *alnum;
number_literal =
qi::real_parser<double, qi::strict_real_policies<double> >{}
| "0x" >> qi::uint_parser<intmax_t, 16> {}
| qi::int_parser<intmax_t, 10> {}
;
string_literal = '"' >> *('\\' >> char_escape | ~char_('"')) >> '"';
literal = number_literal | string_literal; // TODO exapnd?
expr_arr = identifier >> '[' >> (expr_a % ',') >> ']';
simple_expression = literal | expr_arr | identifier;
expr_binary_arith = simple_expression >> op_binary_arith >> expr_a;
expr_a = expr_binary_arith | simple_expression;
start = skip(space) [expr_a];
BOOST_SPIRIT_DEBUG_NODES(
(start)
(expr_a)(expr_binary_arith)(simple_expression)(expr_a)
(literal)(number_literal)(string_literal)
(identifier))
}
private:
struct escape_sym : qi::symbols<char, char> {
escape_sym() {
this->add
("b", '\b')
("f", '\f')
("r", '\r')
("n", '\n')
("t", '\t')
("\\", '\\')
;
}
} char_escape;
struct op_binary_arith_sym : qi::symbols<char, AST::ArithOp> {
op_binary_arith_sym() {
this->add
("+", AST::ArithOp::addition)
("-", AST::ArithOp::subtraction)
("/", AST::ArithOp::division)
("*", AST::ArithOp::multplication)
;
}
} op_binary_arith;
qi::rule<Iterator, AST::Expr()> start;
qi::rule<Iterator, AST::IndexExpr(), Skipper> expr_arr;
qi::rule<Iterator, AST::BinOpExpr(), Skipper> expr_binary_arith;
qi::rule<Iterator, AST::Expr(), Skipper> simple_expression, expr_a;
// implicit lexemes
qi::rule<Iterator, AST::Literal()> literal, string_literal, number_literal;
qi::rule<Iterator, AST::Var()> identifier;
};
int main() {
using It = std::string::const_iterator;
G<It> const g;
for (std::string const& input : {
"A[3*4]",
"A[F[3]]",
"A[8 + F[0x31]]",
"3 * \"foo\"",
})
{
std::cout << std::quoted(input) << " -> ";
It f=begin(input), l=end(input);
AST::Expr e;
if (parse(f,l,g,e)) {
std::cout << "Parsed: " << e << "\n";
} else {
std::cout << "Failed\n";
}
if (f!=l) {
std::cout << "Remaining: " << std::quoted(std::string(f,l)) << "\b";
}
}
}
I want to find out whether I can detect function call using regex. The basic case is easy: somefunction(1, 2);
But what if I had code:
somefunction(someotherfunction(), someotherotherfunction());
or
somefunction(function () { return 1; }, function() {return 2;});
or
caller_function(somefunction(function () { return 1; }, function() {return 2;}))
In this case I need to match same number of opening braces and closing braces so that I can find end of call to somefunction
Is it possible?
Thanks in advance.
Your question is misleading. It's not as simple as you think.
First, the grammar isn't regular. Regular expressions are not the right tool.
Second, you ask "detecting function call" but the samples show anonymous function definitions, a totally different ball game.
Here's a start using Boost Spirit:
start = skip(space) [ fcall ];
fcall = ident >> '(' >> -args >> ')';
args = arg % ',';
arg = fdef | fcall;
fdef = lexeme["function"] >> '(' >> -formals >> ')' >> body;
formals = ident % ',';
identch = alpha | char_("_");
ident = identch >> *(identch|digit);
body = '{' >> *~char_('}') >> '}';
Which would map onto an AST like:
struct function_definition {
std::vector<std::string> formal_arguments;
std::string body;
};
struct function_call;
using argument = boost::variant<
function_definition,
boost::recursive_wrapper<function_call>
>;
struct function_call {
std::string name;
std::vector<argument> args;
};
DEMO
Live On Coliru
// #define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/fusion/adapted/struct.hpp>
struct function_definition {
std::vector<std::string> formal_arguments;
std::string body;
};
struct function_call;
using argument = boost::variant<
function_definition,
boost::recursive_wrapper<function_call>
>;
struct function_call {
std::string name;
std::vector<argument> args;
};
BOOST_FUSION_ADAPT_STRUCT(function_call, name, args)
BOOST_FUSION_ADAPT_STRUCT(function_definition, formal_arguments, body)
namespace qi = boost::spirit::qi;
template <typename It>
struct Parser : qi::grammar<It, function_call()> {
Parser() : Parser::base_type(start) {
using namespace qi;
start = skip(space) [ fcall ];
fcall = ident >> '(' >> -args >> ')';
args = arg % ',';
arg = fdef | fcall;
fdef = lexeme["function"] >> '(' >> -formals >> ')' >> body;
formals = ident % ',';
identch = alpha | char_("_");
ident = identch >> *(identch|digit);
body = '{' >> *~char_('}') >> '}';
BOOST_SPIRIT_DEBUG_NODES((start)(fcall)(args)(arg)(fdef)(formals)(ident)(body))
}
private:
using Skipper = qi::space_type;
qi::rule<It, function_call()> start;
qi::rule<It, function_call(), Skipper> fcall;
qi::rule<It, argument(), Skipper> arg;
qi::rule<It, std::vector<argument>(), Skipper> args;
qi::rule<It, function_definition(), Skipper> fdef;
qi::rule<It, std::vector<std::string>(), Skipper> formals;
qi::rule<It, char()> identch;
qi::rule<It, std::string()> ident, body;
};
// for debug:
#include <experimental/iterator>
static inline std::ostream& operator<<(std::ostream& os, function_definition const& v) {
os << "function(";
std::copy(v.formal_arguments.begin(), v.formal_arguments.end(), std::experimental::make_ostream_joiner(os, ", "));
return os << ") {" << v.body << "}";
}
static inline std::ostream& operator<<(std::ostream& os, function_call const& v) {
os << v.name << "(";
std::copy(v.args.begin(), v.args.end(), std::experimental::make_ostream_joiner(os, ", "));
return os << ")";
}
int main() {
std::string const input = "caller_function(somefunction(function () { return 1; }, function() {return 2;}))";
using It = std::string::const_iterator;
Parser<It> const p;
It f = input.begin(), l = input.end();
function_call parsed;
bool ok = parse(f, l, p, parsed);
if (ok) {
std::cout << "Parsed ok: " << parsed << "\n";
} else {
std::cout << "Parse failed\n";
}
if (f!=l)
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
Prints
Parsed ok: caller_function(somefunction(function() { return 1; }, function() {return 2;}))
edit : I have ripped out the lexer as it does not cleanly integrate with Qi and just obfuscates grammars (see here).
on_success isn't well documented and I am trying to wire it up to my parser. The examples dealing with on_success deal with parsers just built on qi --i.e., no lex.
This is how I am trying to introduce the construct :
using namespace qi::labels;
qi::on_success(event_entry_,std::cout << _val << _1);
But it won't compile. I am dreading the problem being lex. Could someone tell me what I am doing wrong and secondly tell me what all placeholders are available, there type and what they represent (since they aren't documented).
The full file is as follows:
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/home/phoenix/bind/bind_member_variable.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/none.hpp>
#include <boost/cstdint.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <string>
#include <exception>
#include <vector>
namespace lex = boost::spirit::lex;
namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template <typename Lexer>
struct tokens : lex::lexer<Lexer>
{
tokens()
: left_curly("\"{\""),
right_curly("\"}\""),
left_paren("\"(\""),
right_paren("\")\""),
colon(":"),
scolon(";"),
namespace_("(?i:namespace)"),
event("(?i:event)"),
optional("(?i:optional)"),
required("(?i:required)"),
repeated("(?i:repeated)"),
t_int_4("(?i:int4)"),
t_int_8("(?i:int8)"),
t_string("(?i:string)"),
ordinal("\\d+"),
identifier("\\w+")
{
using boost::spirit::lex::_val;
this->self
=
left_curly [ std::cout << px::val("lpar") << std::endl]
| right_curly [ std::cout << px::val("rpar") << std::endl]
| left_paren
| right_paren
| colon [ std::cout << px::val("colon") << std::endl]
| scolon
| namespace_ [ std::cout << px::val("kw namesapce") << std::endl]
| event [ std::cout << px::val("kw event") << std::endl]
| optional [ std::cout << px::val("optional ") << "-->" << _val << "<--" << std::endl]
| required [ std::cout << px::val("required") << std::endl]
| repeated
| t_int_4
| t_int_8
| t_string
| ordinal [ std::cout << px::val("val ordinal (") << _val << ")" << std::endl]
| identifier [std::cout << px::val("val identifier(") << _val << ")" << std::endl];
this->self("WS") = lex::token_def<>("[ \\t\\n]+");
}
lex::token_def<lex::omit> left_curly, right_curly, colon, scolon,repeated, left_paren, right_paren;
lex::token_def<lex::omit> namespace_, event, optional, required,t_int_4, t_int_8, t_string;
lex::token_def<boost::uint32_t> ordinal;
lex::token_def<> identifier;
};
enum event_entry_qualifier
{
ENTRY_OPTIONAL,
ENTRY_REQUIRED,
ENTRY_REPEATED
};
enum entry_type
{
RBL_INT4,
RBL_INT8,
RBL_STRING,
RBL_EVENT
};
struct oid
{
boost::uint32_t ordinal;
std::string name;
};
BOOST_FUSION_ADAPT_STRUCT
(
oid,
(boost::uint32_t, ordinal)
(std::string, name)
)
struct type_descriptor
{
entry_type type_id;
std::string referenced_event;
};
BOOST_FUSION_ADAPT_STRUCT
(
type_descriptor,
(entry_type, type_id)
(std::string, referenced_event)
)
struct event_entry
{
event_entry_qualifier qualifier;
oid identifier;
type_descriptor descriptor;
};
BOOST_FUSION_ADAPT_STRUCT
(
event_entry,
(event_entry_qualifier, qualifier)
(oid, identifier)
(type_descriptor, descriptor)
)
struct event_descriptor
{
oid identifier;
std::vector<event_entry> event_entries;
};
BOOST_FUSION_ADAPT_STRUCT
(
event_descriptor,
(oid, identifier)
(std::vector<event_entry>, event_entries)
)
template <typename Iterator, typename Lexer>
struct grammar : qi::grammar<Iterator,event_descriptor(), qi::in_state_skipper<Lexer> >
{
template <typename TokenDef>
grammar(TokenDef const& tok)
: grammar::base_type(event_descriptor_)
{
using qi::_val;
//start = event;
event_descriptor_ = tok.event >> oid_ >> tok.left_curly >> *(event_entry_) >> tok.right_curly;
event_entry_ = event_qualifier >> oid_ >> type_descriptor_ >> tok.scolon;
event_qualifier = tok.optional [ _val = ENTRY_OPTIONAL]
| tok.required [ _val = ENTRY_REQUIRED]
| tok.repeated [ _val = ENTRY_REPEATED];
oid_ = tok.ordinal
>> tok.colon
>> tok.identifier;
type_descriptor_
= (( atomic_type >> qi::attr(""))
| ( event_type >> tok.left_paren >> tok.identifier >> tok.right_paren));
atomic_type = tok.t_int_4 [ _val = RBL_INT4]
| tok.t_int_8 [ _val = RBL_INT8]
| tok.t_string [ _val = RBL_STRING];
event_type = tok.event [_val = RBL_EVENT];
using namespace qi::labels;
qi::on_success(event_entry_,std::cout << _val << _1);
}
qi::rule<Iterator> start;
qi::rule<Iterator, event_descriptor(), qi::in_state_skipper<Lexer> > event_descriptor_;
qi::rule<Iterator, event_entry(), qi::in_state_skipper<Lexer> > event_entry_;
qi::rule<Iterator, event_entry_qualifier()> event_qualifier;
qi::rule<Iterator, entry_type()> atomic_type;
qi::rule<Iterator, entry_type()> event_type;
qi::rule<Iterator, type_descriptor(),qi::in_state_skipper<Lexer> > type_descriptor_;
qi::rule<Iterator, oid()> oid_;
};
std::string test = " EVENT 1:sihan { OPTIONAL 123:hassan int4; OPTIONAL 123:hassan int4; } ";
int main()
{
typedef lex::lexertl::token<std::string::iterator, boost::mpl::vector<boost::uint32_t, std::string> > token_type;
typedef lex::lexertl::actor_lexer<token_type> lexer_type;
typedef tokens<lexer_type>::iterator_type iterator_type;
tokens<lexer_type> token_lexer;
grammar<iterator_type,tokens<lexer_type>::lexer_def> grammar(token_lexer);
std::string::iterator it = test.begin();
iterator_type first = token_lexer.begin(it, test.end());
iterator_type last = token_lexer.end();
bool r;
r = qi::phrase_parse(first, last, grammar, qi::in_state("WS")[token_lexer.self]);
if(r)
;
else
{
std::cout << "parsing failed" << std::endl;
}
}
Looking at the header files I think the meaning of the placeholders is:
_1 = Iterator position when the rule was tried.
_2 = Iterator to the end of the input.
_3 = Iterator position right after the rule has been successfully matched.
(Since I'm not sure that the lines above are understandable, here's a little example with your input)
rule being tried
_________________________________
´ `
[EVENT][1][:][sihan][{][OPTIONAL][123][:][hassan][int4][;][OPTIONAL][321][:][hassan2][int4][;][}]
_1 _3 _2
As GManNickG mentions in the comments these are lexer iterators, and you can't access easily the original string with them. The conjure2 example combines the use of a lexer and on_error/on_success. To accomplish that it uses a special kind of token, position_token. This token always has access to the pair of iterators of the original string associated with itself (the normal token loses this information when you use lex::omit). position_token has several interesting methods. matched() returns an iterator_range<OriginalIterator>, and begin() and end() return the corresponding iterators.
In the code below I chose to create a phoenix::function that takes two lexer iterators (called with _1 and _3) and returns a string that covers the distance between them (using std::string(begin_iter->begin(), end_iter->begin())).
One problem I found was that the fact that the whitespace was in a diferent state caused that the iterators the position_token returned were invalid. What I did to solve this was put everything in the same state and then simply use lex::_pass = lex::pass_flags::pass_ignore with the whitespace.
The last (minor) problem is that if you want to use std::cout << _val you need to define operator<< for the types you are interested in.
PS: I always use BOOST_SPIRIT_USE_PHOENIX_V3 and this requires that every spirit/phoenix include comes from boost/spirit/include/.... If, for any reason, you need/want to use V2 you'll need to change the phoenix::function. I also am incapable of using an old style for loop, so if you can't use c++11 you'll have to change the definition of operator<< for event_descriptor.
#define BOOST_SPIRIT_USE_PHOENIX_V3
// #define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_bind.hpp> //CHANGED
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/lex_lexertl_position_token.hpp> //ADDED
#include <boost/none.hpp>
#include <boost/cstdint.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <string>
#include <exception>
#include <vector>
namespace lex = boost::spirit::lex;
namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template <typename Lexer>
struct tokens : lex::lexer<Lexer>
{
tokens()
: left_curly("\"{\""),
right_curly("\"}\""),
left_paren("\"(\""),
right_paren("\")\""),
colon(":"),
scolon(";"),
namespace_("(?i:namespace)"),
event("(?i:event)"),
optional("(?i:optional)"),
required("(?i:required)"),
repeated("(?i:repeated)"),
t_int_4("(?i:int4)"),
t_int_8("(?i:int8)"),
t_string("(?i:string)"),
ordinal("\\d+"),
identifier("\\w+")
{
using boost::spirit::lex::_val;
this->self
=
left_curly //[ std::cout << px::val("lpar") << std::endl]
| right_curly //[ std::cout << px::val("rpar") << std::endl]
| left_paren
| right_paren
| colon //[ std::cout << px::val("colon") << std::endl]
| scolon
| namespace_ // [ std::cout << px::val("kw namesapce") << std::endl]
| event // [ std::cout << px::val("kw event") << std::endl]
| optional //[ std::cout << px::val("optional ") << "-->" << _val << "<--" << std::endl]
| required //[ std::cout << px::val("required") << std::endl]
| repeated
| t_int_4
| t_int_8
| t_string
| ordinal //[ std::cout << px::val("val ordinal (") << _val << ")" << std::endl]
| identifier //[std::cout << px::val("val identifier(") << _val << ")" << std::endl]
| lex::token_def<>("[ \\t\\n]+") [lex::_pass = lex::pass_flags::pass_ignore] //CHANGED
;
}
lex::token_def<lex::omit> left_curly, right_curly, left_paren, right_paren, colon, scolon;
lex::token_def<lex::omit> namespace_, event, optional, required, repeated, t_int_4, t_int_8, t_string;
lex::token_def<boost::uint32_t> ordinal;
lex::token_def<> identifier;
};
enum event_entry_qualifier
{
ENTRY_OPTIONAL,
ENTRY_REQUIRED,
ENTRY_REPEATED
};
enum entry_type
{
RBL_INT4,
RBL_INT8,
RBL_STRING,
RBL_EVENT
};
struct oid
{
boost::uint32_t ordinal;
std::string name;
};
BOOST_FUSION_ADAPT_STRUCT
(
oid,
(boost::uint32_t, ordinal)
(std::string, name)
)
std::ostream& operator<<(std::ostream& os, const oid& val) //ADDED
{
return os << val.ordinal << "-" << val.name;
}
struct type_descriptor
{
entry_type type_id;
std::string referenced_event;
};
BOOST_FUSION_ADAPT_STRUCT
(
type_descriptor,
(entry_type, type_id)
(std::string, referenced_event)
)
std::ostream& operator<<(std::ostream& os, const type_descriptor& val) //ADDED
{
return os << val.type_id << "-" << val.referenced_event;
}
struct event_entry
{
event_entry_qualifier qualifier;
oid identifier;
type_descriptor descriptor;
};
BOOST_FUSION_ADAPT_STRUCT
(
event_entry,
(event_entry_qualifier, qualifier)
(oid, identifier)
(type_descriptor, descriptor)
)
std::ostream& operator<<(std::ostream& os, const event_entry& val) //ADDED
{
return os << val.qualifier << "-" << val.identifier << "-" << val.descriptor;
}
struct event_descriptor
{
oid identifier;
std::vector<event_entry> event_entries;
};
BOOST_FUSION_ADAPT_STRUCT
(
event_descriptor,
(oid, identifier)
(std::vector<event_entry>, event_entries)
)
std::ostream& operator<<(std::ostream& os, const event_descriptor& val) //ADDED
{
os << val.identifier << "[";
for(const auto& entry: val.event_entries) //C++11
os << entry;
os << "]";
return os;
}
struct build_string_impl //ADDED
{
template <typename Sig>
struct result;
template <typename This, typename Iter1, typename Iter2>
struct result<This(Iter1,Iter2)>
{
typedef std::string type;
};
template <typename Iter1, typename Iter2>
std::string operator()(Iter1 begin, Iter2 end) const
{
return std::string(begin->begin(),end->begin());
}
};
px::function<build_string_impl> build_string;
template <typename Iterator, typename Lexer>
struct grammar : qi::grammar<Iterator,event_descriptor() >
{
template <typename TokenDef>
grammar(TokenDef const& tok)
: grammar::base_type(event_descriptor_)
{
using qi::_val;
//start = event;
event_descriptor_ = tok.event >> oid_ >> tok.left_curly >> *(event_entry_) >> tok.right_curly;
event_entry_ = event_qualifier >> oid_ >> type_descriptor_ >> tok.scolon;
event_qualifier = tok.optional [ _val = ENTRY_OPTIONAL]
| tok.required [ _val = ENTRY_REQUIRED]
| tok.repeated [ _val = ENTRY_REPEATED];
oid_ = tok.ordinal
>> tok.colon
>> tok.identifier;
type_descriptor_
= (( atomic_type >> qi::attr(""))
| ( event_type >> tok.left_paren >> tok.identifier >> tok.right_paren));
atomic_type = tok.t_int_4 [ _val = RBL_INT4]
| tok.t_int_8 [ _val = RBL_INT8]
| tok.t_string [ _val = RBL_STRING];
event_type = tok.event [_val = RBL_EVENT];
using namespace qi::labels;
qi::on_success(event_entry_,std::cout << _val << " " << build_string(_1,_3) << std::endl); //CHANGED
// BOOST_SPIRIT_DEBUG_NODES( (event_descriptor_)(event_entry_)(event_qualifier)(oid_)(type_descriptor_)(atomic_type)(event_type) );
}
qi::rule<Iterator> start;
qi::rule<Iterator, event_descriptor()> event_descriptor_;
qi::rule<Iterator, event_entry()> event_entry_;
qi::rule<Iterator, event_entry_qualifier()> event_qualifier;
qi::rule<Iterator, entry_type()> atomic_type;
qi::rule<Iterator, entry_type()> event_type;
qi::rule<Iterator, type_descriptor()> type_descriptor_;
qi::rule<Iterator, oid()> oid_;
};
std::string test = " EVENT 1:sihan { OPTIONAL 123:hassan int4; OPTIONAL 321:hassan2 int4; } ";
int main()
{
typedef lex::lexertl::position_token<std::string::iterator, boost::mpl::vector<boost::uint32_t, std::string> > token_type; //CHANGED
typedef lex::lexertl::actor_lexer<token_type> lexer_type;
typedef tokens<lexer_type>::iterator_type iterator_type;
tokens<lexer_type> token_lexer;
grammar<iterator_type,tokens<lexer_type>::lexer_def> grammar(token_lexer);
std::string::iterator it = test.begin();
iterator_type first = token_lexer.begin(it, test.end());
iterator_type last = token_lexer.end();
bool r;
r = qi::parse(first, last, grammar); //CHANGED
if(r)
;
else
{
std::cout << "parsing failed" << std::endl;
}
}
I want to use boost spirit to parse an expression like
function1(arg1, arg2, function2(arg1, arg2, arg3),
function3(arg1,arg2))
and call corresponding c++ functions. What should be the grammar to parse above expression and call the corresponding c++ function by phoneix::bind()?
I have 2 types of functions to call
1) string functions;
wstring GetSubString(wstring stringToCut, int position, int length);
wstring GetStringToken(wstring stringToTokenize, wstring seperators,
int tokenNumber );
2) Functions that return integer;
int GetCount();
int GetId(wstring srcId, wstring srcType);
Second Answer (more pragmatic)
Here's a second take, for comparison:
Just in case you really didn't want to parse into an abstract syntax tree representation, but rather evaluate the functions on-the-fly during parsing, you can simplify the grammar.
It comes in at 92 lines as opposed to 209 lines in the first answer. It really depends on what you're implementing which approach is more suitable.
This shorter approach has some downsides:
less flexible (not reusable)
less robust (if functions have side effects, they will happen even if parsing fails halfway)
less extensible (the supported functions are hardwired into the grammar1)
Full code:
//#define BOOST_SPIRIT_DEBUG
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/phoenix/function.hpp>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
typedef boost::variant<int, std::string> value;
//////////////////////////////////////////////////
// Demo functions:
value AnswerToLTUAE() {
return 42;
}
value ReverseString(value const& input) {
auto& as_string = boost::get<std::string>(input);
return std::string(as_string.rbegin(), as_string.rend());
}
value Concatenate(value const& a, value const& b) {
std::ostringstream oss;
oss << a << b;
return oss.str();
}
BOOST_PHOENIX_ADAPT_FUNCTION_NULLARY(value, AnswerToLTUAE_, AnswerToLTUAE)
BOOST_PHOENIX_ADAPT_FUNCTION(value, ReverseString_, ReverseString, 1)
BOOST_PHOENIX_ADAPT_FUNCTION(value, Concatenate_, Concatenate, 2)
//////////////////////////////////////////////////
// Parser grammar
template <typename It, typename Skipper = qi::space_type>
struct parser : qi::grammar<It, value(), Skipper>
{
parser() : parser::base_type(expr_)
{
using namespace qi;
function_call_ =
(lit("AnswerToLTUAE") > '(' > ')')
[ _val = AnswerToLTUAE_() ]
| (lit("ReverseString") > '(' > expr_ > ')')
[ _val = ReverseString_(_1) ]
| (lit("Concatenate") > '(' > expr_ > ',' > expr_ > ')')
[ _val = Concatenate_(_1, _2) ]
;
string_ = as_string [
lexeme [ "'" >> *~char_("'") >> "'" ]
];
value_ = int_ | string_;
expr_ = function_call_ | value_;
on_error<fail> ( expr_, std::cout
<< phx::val("Error! Expecting ") << _4 << phx::val(" here: \"")
<< phx::construct<std::string>(_3, _2) << phx::val("\"\n"));
BOOST_SPIRIT_DEBUG_NODES((expr_)(function_call_)(value_)(string_))
}
private:
qi::rule<It, value(), Skipper> value_, function_call_, expr_, string_;
};
int main()
{
for (const std::string input: std::vector<std::string> {
"-99",
"'string'",
"AnswerToLTUAE()",
"ReverseString('string')",
"Concatenate('string', 987)",
"Concatenate('The Answer Is ', AnswerToLTUAE())",
})
{
auto f(std::begin(input)), l(std::end(input));
const static parser<decltype(f)> p;
value direct_eval;
bool ok = qi::phrase_parse(f,l,p,qi::space,direct_eval);
if (!ok)
std::cout << "invalid input\n";
else
{
std::cout << "input:\t" << input << "\n";
std::cout << "eval:\t" << direct_eval << "\n\n";
}
if (f!=l) std::cout << "unparsed: '" << std::string(f,l) << "'\n";
}
}
Note how, instead of using BOOST_PHOENIX_ADAPT_FUNCTION* we could have directly used boost::phoenix::bind.
The output is still the same:
input: -99
eval: -99
input: 'string'
eval: string
input: AnswerToLTUAE()
eval: 42
input: ReverseString('string')
eval: gnirts
input: Concatenate('string', 987)
eval: string987
input: Concatenate('The Answer Is ', AnswerToLTUAE())
eval: The Answer Is 42
1 This last downside is easily remedied by using the 'Nabialek Trick'
First Answer (complete)
I've gone and implemented a simple recursive expression grammar for functions having up-to-three parameters:
for (const std::string input: std::vector<std::string> {
"-99",
"'string'",
"AnswerToLTUAE()",
"ReverseString('string')",
"Concatenate('string', 987)",
"Concatenate('The Answer Is ', AnswerToLTUAE())",
})
{
auto f(std::begin(input)), l(std::end(input));
const static parser<decltype(f)> p;
expr parsed_script;
bool ok = qi::phrase_parse(f,l,p,qi::space,parsed_script);
if (!ok)
std::cout << "invalid input\n";
else
{
const static generator<boost::spirit::ostream_iterator> g;
std::cout << "input:\t" << input << "\n";
std::cout << "tree:\t" << karma::format(g, parsed_script) << "\n";
std::cout << "eval:\t" << evaluate(parsed_script) << "\n";
}
if (f!=l) std::cout << "unparsed: '" << std::string(f,l) << "'\n";
}
Which prints:
input: -99
tree: -99
eval: -99
input: 'string'
tree: 'string'
eval: string
input: AnswerToLTUAE()
tree: nullary_function_call()
eval: 42
input: ReverseString('string')
tree: unary_function_call('string')
eval: gnirts
input: Concatenate('string', 987)
tree: binary_function_call('string',987)
eval: string987
input: Concatenate('The Answer Is ', AnswerToLTUAE())
tree: binary_function_call('The Answer Is ',nullary_function_call())
eval: The Answer Is 42
Some notes:
I separated parsing from execution (which is always a good idea IMO)
I implemented function evaluation for zero, one or two parameters (this should be easy to extend)
Values are assumed to be integers or strings (should be easy to extend)
I added a karma generator to display the parsed expression (with a TODO marked in the comment)
I hope this helps:
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/variant/recursive_wrapper.hpp>
namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
namespace phx = boost::phoenix;
typedef boost::variant<int, std::string> value;
typedef boost::variant<value, boost::recursive_wrapper<struct function_call> > expr;
typedef std::function<value() > nullary_function_impl;
typedef std::function<value(value const&) > unary_function_impl;
typedef std::function<value(value const&, value const&)> binary_function_impl;
typedef boost::variant<nullary_function_impl, unary_function_impl, binary_function_impl> function_impl;
typedef qi::symbols<char, function_impl> function_table;
struct function_call
{
typedef std::vector<expr> arguments_t;
function_call() = default;
function_call(function_impl f, arguments_t const& arguments)
: f(f), arguments(arguments) { }
function_impl f;
arguments_t arguments;
};
BOOST_FUSION_ADAPT_STRUCT(function_call, (function_impl, f)(function_call::arguments_t, arguments))
#ifdef BOOST_SPIRIT_DEBUG
namespace std
{
static inline std::ostream& operator<<(std::ostream& os, nullary_function_impl const& f) { return os << "<nullary_function_impl>"; }
static inline std::ostream& operator<<(std::ostream& os, unary_function_impl const& f) { return os << "<unary_function_impl>"; }
static inline std::ostream& operator<<(std::ostream& os, binary_function_impl const& f) { return os << "<binary_function_impl>"; }
}
static inline std::ostream& operator<<(std::ostream& os, function_call const& call) { return os << call.f << "(" << call.arguments.size() << ")"; }
#endif
//////////////////////////////////////////////////
// Evaluation
value evaluate(const expr& e);
struct eval : boost::static_visitor<value>
{
eval() {}
value operator()(const value& v) const
{
return v;
}
value operator()(const function_call& call) const
{
return boost::apply_visitor(invoke(call.arguments), call.f);
}
private:
struct invoke : boost::static_visitor<value>
{
function_call::arguments_t const& _args;
invoke(function_call::arguments_t const& args) : _args(args) {}
value operator()(nullary_function_impl const& f) const {
return f();
}
value operator()(unary_function_impl const& f) const {
auto a = evaluate(_args.at(0));
return f(a);
}
value operator()(binary_function_impl const& f) const {
auto a = evaluate(_args.at(0));
auto b = evaluate(_args.at(1));
return f(a, b);
}
};
};
value evaluate(const expr& e)
{
return boost::apply_visitor(eval(), e);
}
//////////////////////////////////////////////////
// Demo functions:
value AnswerToLTUAE() {
return 42;
}
value ReverseString(value const& input) {
auto& as_string = boost::get<std::string>(input);
return std::string(as_string.rbegin(), as_string.rend());
}
value Concatenate(value const& a, value const& b) {
std::ostringstream oss;
oss << a << b;
return oss.str();
}
//////////////////////////////////////////////////
// Parser grammar
template <typename It, typename Skipper = qi::space_type>
struct parser : qi::grammar<It, expr(), Skipper>
{
parser() : parser::base_type(expr_)
{
using namespace qi;
n_ary_ops.add
("AnswerToLTUAE", nullary_function_impl{ &::AnswerToLTUAE })
("ReverseString", unary_function_impl { &::ReverseString })
("Concatenate" , binary_function_impl { &::Concatenate });
function_call_ = n_ary_ops > '(' > expr_list > ')';
string_ = qi::lexeme [ "'" >> *~qi::char_("'") >> "'" ];
value_ = qi::int_ | string_;
expr_list = -expr_ % ',';
expr_ = function_call_ | value_;
on_error<fail> ( expr_, std::cout
<< phx::val("Error! Expecting ") << _4 << phx::val(" here: \"")
<< phx::construct<std::string>(_3, _2) << phx::val("\"\n"));
BOOST_SPIRIT_DEBUG_NODES((expr_)(expr_list)(function_call_)(value_)(string_))
}
private:
function_table n_ary_ops;
template <typename Attr> using Rule = qi::rule<It, Attr(), Skipper>;
Rule<std::string> string_;
Rule<value> value_;
Rule<function_call> function_call_;
Rule<std::vector<expr>> expr_list;
Rule<expr> expr_;
};
//////////////////////////////////////////////////
// Output generator
template <typename It>
struct generator : karma::grammar<It, expr()>
{
generator() : generator::base_type(expr_)
{
using namespace karma;
nullary_ = eps << "nullary_function_call"; // TODO reverse lookup :)
unary_ = eps << "unary_function_call";
binary_ = eps << "binary_function_call";
function_ = nullary_ | unary_ | binary_;
function_call_ = function_ << expr_list;
expr_list = '(' << -(expr_ % ',') << ')';
value_ = karma::int_ | ("'" << karma::string << "'");
expr_ = function_call_ | value_;
}
private:
template <typename Attr> using Rule = karma::rule<It, Attr()>;
Rule<nullary_function_impl> nullary_;
Rule<unary_function_impl> unary_;
Rule<binary_function_impl> binary_;
Rule<function_impl> function_;
Rule<function_call> function_call_;
Rule<value> value_;
Rule<std::vector<expr>> expr_list;
Rule<expr> expr_;
};
int main()
{
for (const std::string input: std::vector<std::string> {
"-99",
"'string'",
"AnswerToLTUAE()",
"ReverseString('string')",
"Concatenate('string', 987)",
"Concatenate('The Answer Is ', AnswerToLTUAE())",
})
{
auto f(std::begin(input)), l(std::end(input));
const static parser<decltype(f)> p;
expr parsed_script;
bool ok = qi::phrase_parse(f,l,p,qi::space,parsed_script);
if (!ok)
std::cout << "invalid input\n";
else
{
const static generator<boost::spirit::ostream_iterator> g;
std::cout << "input:\t" << input << "\n";
std::cout << "tree:\t" << karma::format(g, parsed_script) << "\n";
std::cout << "eval:\t" << evaluate(parsed_script) << "\n\n";
}
if (f!=l) std::cout << "unparsed: '" << std::string(f,l) << "'\n";
}
}
I’m trying to write a grammar for a language which allows the following expressions:
Function calls of the form f args (note: no parentheses!)
Addition (and more complex expressions but that’s not the point here) of the form a + b
For example:
f 42 => f(42)
42 + b => (42 + b)
f 42 + b => f(42 + b)
The grammar is unambiguous (every expression can be parsed in exactly one way) but I don’t know how to write this grammar as a PEG since both productions potentially start with the same token, id. This is my wrong PEG. How can I rewrite it to make it valid?
expression ::= call / addition
call ::= id addition*
addition ::= unary
( ('+' unary)
/ ('-' unary) )*
unary ::= primary
/ '(' ( ('+' unary)
/ ('-' unary)
/ expression)
')'
primary ::= number / id
number ::= [1-9]+
id ::= [a-z]+
Now, when this grammar tries to parse the input “a + b” it parses “a” as a function call with zero arguments and chokes on “+ b”.
I’ve uploaded a C++ / Boost.Spirit.Qi implementation of the grammar in case anybody wants to play with it.
(Note that unary disambiguates unary operations and additions: In order to call a function with a negative number as an argument, you need to specify parentheses, e.g. f (-1).)
As proposed in chat you could start out with something like:
expression = addition | simple;
addition = simple >>
( ('+' > expression)
| ('-' > expression)
);
simple = '(' > expression > ')' | call | unary | number;
call = id >> *expression;
unary = qi::char_("-+") > expression;
// terminals
id = qi::lexeme[+qi::char_("a-z")];
number = qi::double_;
Since then I implemented this in C++ with an AST presentation, so you can get a feel for how this grammar actually build the expression tree by pretty printing it.
All source code is on github: https://gist.github.com/2152518
There are two versions (scroll down to 'Alternative' to read more
Grammar:
template <typename Iterator>
struct mini_grammar : qi::grammar<Iterator, expression_t(), qi::space_type>
{
qi::rule<Iterator, std::string(), qi::space_type> id;
qi::rule<Iterator, expression_t(), qi::space_type> addition, expression, simple;
qi::rule<Iterator, number_t(), qi::space_type> number;
qi::rule<Iterator, call_t(), qi::space_type> call;
qi::rule<Iterator, unary_t(), qi::space_type> unary;
mini_grammar() : mini_grammar::base_type(expression)
{
expression = addition | simple;
addition = simple [ qi::_val = qi::_1 ] >>
+(
(qi::char_("+-") > simple) [ phx::bind(&append_term, qi::_val, qi::_1, qi::_2) ]
);
simple = '(' > expression > ')' | call | unary | number;
call = id >> *expression;
unary = qi::char_("-+") > expression;
// terminals
id = qi::lexeme[+qi::char_("a-z")];
number = qi::double_;
}
};
The corresponding AST structures are defined quick-and-dirty using the very powerful Boost Variant:
struct addition_t;
struct call_t;
struct unary_t;
typedef double number_t;
typedef boost::variant<
number_t,
boost::recursive_wrapper<call_t>,
boost::recursive_wrapper<unary_t>,
boost::recursive_wrapper<addition_t>
> expression_t;
struct addition_t
{
expression_t lhs;
char binop;
expression_t rhs;
};
struct call_t
{
std::string id;
std::vector<expression_t> args;
};
struct unary_t
{
char unop;
expression_t operand;
};
BOOST_FUSION_ADAPT_STRUCT(addition_t, (expression_t, lhs)(char,binop)(expression_t, rhs));
BOOST_FUSION_ADAPT_STRUCT(call_t, (std::string, id)(std::vector<expression_t>, args));
BOOST_FUSION_ADAPT_STRUCT(unary_t, (char, unop)(expression_t, operand));
In the full code, I've also overloaded operator<< for these structures.
Full Demo
//#define BOOST_SPIRIT_DEBUG
#include <iostream>
#include <iterator>
#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/adapted.hpp>
#include <boost/optional.hpp>
namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;
struct addition_t;
struct call_t;
struct unary_t;
typedef double number_t;
typedef boost::variant<
number_t,
boost::recursive_wrapper<call_t>,
boost::recursive_wrapper<unary_t>,
boost::recursive_wrapper<addition_t>
> expression_t;
struct addition_t
{
expression_t lhs;
char binop;
expression_t rhs;
friend std::ostream& operator<<(std::ostream& os, const addition_t& a)
{ return os << "(" << a.lhs << ' ' << a.binop << ' ' << a.rhs << ")"; }
};
struct call_t
{
std::string id;
std::vector<expression_t> args;
friend std::ostream& operator<<(std::ostream& os, const call_t& a)
{ os << a.id << "("; for (auto& e : a.args) os << e << ", "; return os << ")"; }
};
struct unary_t
{
char unop;
expression_t operand;
friend std::ostream& operator<<(std::ostream& os, const unary_t& a)
{ return os << "(" << a.unop << ' ' << a.operand << ")"; }
};
BOOST_FUSION_ADAPT_STRUCT(addition_t, (expression_t, lhs)(char,binop)(expression_t, rhs));
BOOST_FUSION_ADAPT_STRUCT(call_t, (std::string, id)(std::vector<expression_t>, args));
BOOST_FUSION_ADAPT_STRUCT(unary_t, (char, unop)(expression_t, operand));
void append_term(expression_t& lhs, char op, expression_t operand)
{
lhs = addition_t { lhs, op, operand };
}
template <typename Iterator>
struct mini_grammar : qi::grammar<Iterator, expression_t(), qi::space_type>
{
qi::rule<Iterator, std::string(), qi::space_type> id;
qi::rule<Iterator, expression_t(), qi::space_type> addition, expression, simple;
qi::rule<Iterator, number_t(), qi::space_type> number;
qi::rule<Iterator, call_t(), qi::space_type> call;
qi::rule<Iterator, unary_t(), qi::space_type> unary;
mini_grammar() : mini_grammar::base_type(expression)
{
expression = addition | simple;
addition = simple [ qi::_val = qi::_1 ] >>
+(
(qi::char_("+-") > simple) [ phx::bind(&append_term, qi::_val, qi::_1, qi::_2) ]
);
simple = '(' > expression > ')' | call | unary | number;
call = id >> *expression;
unary = qi::char_("-+") > expression;
// terminals
id = qi::lexeme[+qi::char_("a-z")];
number = qi::double_;
BOOST_SPIRIT_DEBUG_NODE(expression);
BOOST_SPIRIT_DEBUG_NODE(call);
BOOST_SPIRIT_DEBUG_NODE(addition);
BOOST_SPIRIT_DEBUG_NODE(simple);
BOOST_SPIRIT_DEBUG_NODE(unary);
BOOST_SPIRIT_DEBUG_NODE(id);
BOOST_SPIRIT_DEBUG_NODE(number);
}
};
std::string read_input(std::istream& stream) {
return std::string(
std::istreambuf_iterator<char>(stream),
std::istreambuf_iterator<char>());
}
int main() {
std::cin.unsetf(std::ios::skipws);
std::string const code = read_input(std::cin);
auto begin = code.begin();
auto end = code.end();
try {
mini_grammar<decltype(end)> grammar;
qi::space_type space;
std::vector<expression_t> script;
bool ok = qi::phrase_parse(begin, end, *(grammar > ';'), space, script);
if (begin!=end)
std::cerr << "Unparsed: '" << std::string(begin,end) << "'\n";
std::cout << std::boolalpha << "Success: " << ok << "\n";
if (ok)
{
for (auto& expr : script)
std::cout << "AST: " << expr << '\n';
}
}
catch (qi::expectation_failure<decltype(end)> const& ex) {
std::cout << "Failure; parsing stopped after \""
<< std::string(ex.first, ex.last) << "\"\n";
}
}
Alternative:
I have an alternative version that build addition_t iteratively instead of recursively, so to say:
struct term_t
{
char binop;
expression_t rhs;
};
struct addition_t
{
expression_t lhs;
std::vector<term_t> terms;
};
This removes the need to use Phoenix to build the expression:
addition = simple >> +term;
term = qi::char_("+-") > simple;