I am running into the errors below after upgrading from boost_1_73 and c++14 to boost_1_77 and c++17.
What could be the problem?
**Error 1:**
include/boost/utility/result_of.hpp:218:8: error: 'GB* (boost::intrusive_ptr::*)() const noexcept' is not a class, struct, or union type
**Error 2:**
include/boost/phoenix/core/detail/function_eval.hpp:119:21: error: no type named 'type' in 'struct boost::result_of<GB* (boost::intrusive_ptr::* const(boost::intrusive_ptr&))() const noexcept>'
Here is a snippet of the code causing the issue; sorry, I can't share more of it.
run = qi::lit("g_d")[qi::_val = phoenix::new_<GB>()] > qi::lit("{") >
-(*iPtr)[phoenix::bind(&ECProperty::addToList,
phoenix::bind(&GBPtr::get, qi::_val), qi::_1)] >
+(&!qi::lit("}") > widthAndHeight(qi::_val)) > qi::lit("}");
Like I said, there's not enough information. Let me just use the crystal ball and assume types:
struct ECProperty {
virtual ~ECProperty() = default;
void addToList(int i) { _list.push_back(i); }
void addWH(int width, int height) { _dimensions.emplace_back(width, height); }
std::vector<int> _list;
std::vector<std::tuple<int, int> > _dimensions;
};
struct GB : ECProperty {
};
Now like I said, GBPtr cannot be std::shared_ptr, boost::shared_ptr or even boost::scoped_ptr because they lack implicit conversion (for good reason). So, we have to assume some kind of bespoke variation, let's call it DumbPointer:
template <typename T> struct DumbPointer : public std::shared_ptr<T> {
using SP = std::shared_ptr<T>;
using SP::SP;
// A smart pointer with implicit conversion constructor... not so smart
// Don't try this at home
/*implicit*/ DumbPointer(T* take_ownership = nullptr) : SP(take_ownership)
{ }
};
using GBPtr = DumbPointer<GB>;
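To make the "lack implicit conversion" point concrete, here is a minimal standalone sketch (standard library only, C++17). Node and ImplicitPtr are hypothetical stand-ins for GB and DumbPointer; they are my names, not the question's:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

struct Node { int value = 0; };

// Hypothetical stand-in for DumbPointer: adds the implicit T* constructor
// that std::shared_ptr deliberately leaves out.
template <typename T> struct ImplicitPtr : std::shared_ptr<T> {
    using Base = std::shared_ptr<T>;
    using Base::Base;
    /*implicit*/ ImplicitPtr(T* take_ownership = nullptr) : Base(take_ownership) {}
};

// std::shared_ptr can be constructed from a raw pointer, but only explicitly...
static_assert(std::is_constructible_v<std::shared_ptr<Node>, Node*>);
static_assert(!std::is_convertible_v<Node*, std::shared_ptr<Node>>);
// ...whereas the bespoke wrapper converts silently.
static_assert(std::is_convertible_v<Node*, ImplicitPtr<Node>>);

// Copy-initialization from a raw pointer compiles only thanks to that constructor.
inline int adopt_and_read() {
    ImplicitPtr<Node> p = new Node{7};
    assert(p.use_count() == 1); // the wrapper now owns the Node
    return p->value;
}
```

That silent conversion is what lets `qi::_val = px::new_<GB>()` assign a raw GB* to the rule's attribute in the first place.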
Okay, now we can look at the grammar. Let's assume iPtr is really a "lazy rule" of some kind, so *iPtr is not a parser expression using Kleene-star, but really just dereferences the pointer:
auto const* iPtr = std::addressof(qi::int_);
Next, let's assume widthAndHeight is a parameterized rule (why? oh well, apparently there was some reason):
qi::rule<It, void(GBPtr)> widthAndHeight;
So we can make the grammar complete and self-contained:
qi::rule<It, GBPtr(), qi::space_type> run = //
qi::lit("g_d")[qi::_val = px::new_<GB>()] > "{" //
> -(*iPtr)[ //
px::bind(&ECProperty::addToList, //
px::bind(&GBPtr::get, qi::_val),
qi::_1)] //
> +((&!qi::lit("}")) > widthAndHeight(qi::_val)) //
> qi::lit("}");
Note a few very minor simplifications.
Sidenote: &!qi::lit is funny, perhaps it should just have been !qi::lit (same effect). Also, assuming that widthAndHeight cannot start with } anyway, it's completely redundant.
Similarly, if iPtr were a parser, -(*iPtr) would be just the same as *iPtr (a Kleene star already matches zero times), which is why I had to assume that iPtr was of pointer type.
Now, the compiler error reads:
/home/sehe/custom/boost_1_77_0/boost/utility/result_of.hpp|215 col 8
error: ‘GB* (std::__shared_ptr<GB, __gnu_cxx::_S_atomic>::*)() const noexcept’ is not a class, struct, or union type
It is clear that boost::result_of, as used by Phoenix, doesn't accept a raw pointer-to-member-function there. No doubt, the noexcept throws it off (noexcept didn't exist back in the day, and since C++17 it is part of the function type). So, let's help the compiler using std::mem_fn:
qi::rule<It, GBPtr(), qi::space_type> run = //
qi::lit("g_d")[qi::_val = px::new_<GB>()] > "{" //
> -(*iPtr)[ //
px::bind(&ECProperty::addToList,
px::bind(std::mem_fn(&GBPtr::get), qi::_val), //
qi::_1)] //
> +((&!qi::lit("}")) > widthAndHeight(qi::_val)) //
> qi::lit("}");
Now it compiles. Good, let's imagine some useful definition of widthAndHeight:
qi::rule<It, void(GBPtr)> widthAndHeight = //
(qi::int_ >> "x" >> qi::int_)[ //
px::bind(&ECProperty::addWH,
px::bind(std::mem_fn(&GBPtr::get), qi::_r1), qi::_1,
qi::_2)];
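As an aside, the underlying effect can be reproduced without Spirit or Phoenix. The sketch below assumes C++17 and uses hypothetical Widget and Handle types in place of GB and GBPtr. It shows only the language-level facts (the noexcept member pointer is its own type, and it is not a class type), plus std::mem_fn turning it into an ordinary callable object. It does not claim to mirror what Boost does internally:

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

struct Widget { int id = 42; };

// Hypothetical stand-in for the smart pointer; only the noexcept getter matters.
struct Handle {
    Widget* p = nullptr;
    Widget* get() const noexcept { return p; }
};

using NoexceptGetter = Widget* (Handle::*)() const noexcept;
using PlainGetter    = Widget* (Handle::*)() const;

// Since C++17, noexcept is part of the function type, so these are distinct
// types: code that pattern-matches PlainGetter does not match NoexceptGetter.
static_assert(!std::is_same_v<NoexceptGetter, PlainGetter>);
static_assert(std::is_same_v<decltype(&Handle::get), NoexceptGetter>);
// A raw member pointer is not a class type, so it has no nested members
// (such as a result type) that a trait could look up.
static_assert(!std::is_class_v<NoexceptGetter>);

// std::mem_fn wraps the member pointer in a callable object.
inline int id_via_mem_fn(Handle const& h) {
    auto getter = std::mem_fn(&Handle::get);
    Widget* w = getter(h);
    assert(w != nullptr);
    return w->id;
}
```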
Now we can test:
int main()
{
using It = std::string::const_iterator;
auto const* iPtr = std::addressof(qi::int_);
qi::rule<It, void(GBPtr)> widthAndHeight = //
(qi::int_ >> "x" >> qi::int_)[ //
px::bind(&ECProperty::addWH,
px::bind(std::mem_fn(&GBPtr::get), qi::_r1), qi::_1,
qi::_2)];
qi::rule<It, GBPtr(), qi::space_type> run = //
qi::lit("g_d")[qi::_val = px::new_<GB>()] > "{" //
> -(*iPtr)[ //
px::bind(&ECProperty::addToList,
px::bind(std::mem_fn(&GBPtr::get), qi::_val), //
qi::_1)] //
> +((&!qi::lit("}")) > widthAndHeight(qi::_val)) //
> qi::lit("}");
BOOST_SPIRIT_DEBUG_NODES((run)(widthAndHeight))
for (std::string const s :
{
"",
"g_d { 42 1200x800 400x768 }",
}) //
{
fmt::print("===== '{}' =====\n", s);
It f = begin(s), l = end(s);
GBPtr val;
try {
if (phrase_parse(f, l, run, qi::space, val)) {
fmt::print("Parsed _list: {} _dimensions: {}\n", val->_list,
val->_dimensions);
} else {
fmt::print("Parse failed\n");
}
} catch(qi::expectation_failure<It> const& ef) {
fmt::print(stderr, "Expected {} at '{}'\n", ef.what_,
std::string(ef.first, ef.last));
}
if (f != l) {
fmt::print("Remaining input '{}'\n", std::string(f, l));
}
}
}
Which prints (Live On Compiler Explorer):
===== '' =====
Parse failed
===== 'g_d { 42 1200x800 400x768 }' =====
Parsed _list: {42} _dimensions: {(1200, 800), (400, 768)}
Buttt... This Is C++17, So...
Perhaps you could simplify. An idea is to extract the phoenix actors:
auto const gb_ = px::bind(std::mem_fn(&GBPtr::get), _val);
qi::rule<It, GBPtr(), qi::space_type> run = //
qi::lit("g_d")[_val = px::new_<GB>()] > "{" //
> -(*iPtr)[px::bind(&ECProperty::addToList, gb_, _1)] //
> +widthAndHeight(_val) //
> "}";
Still the same output: https://compiler-explorer.com/z/ec331KW4W
However, that is a bit like turd polish. Let's embrace c++17 with polymorphic lambdas and CTAD:
px::function addWH = [](auto& ecp, int w, int h) { ecp->addWH(w, h); };
px::function addToList = [](auto& ecp, int i) { ecp->addToList(i); };
Or even better, without polymorphic lambdas:
px::function addWH = [](GB& p, int w, int h) { p.addWH(w, h); };
px::function addToList = [](GB& p, int i) { p.addToList(i); };
Now we can simply write:
qi::rule<It, void(GBPtr)> widthAndHeight = //
(qi::int_ >> "x" >> qi::int_)[addWH(*_r1, _1, _2)];
qi::rule<It, GBPtr(), qi::space_type> run = //
qi::lit("g_d")[_val = px::new_<GB>()] > "{" //
> -(*iPtr)[addToList(*_val, _1)] //
> +widthAndHeight(_val) //
> "}";
See it Live Again
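If the idea of a "lazy function" feels like magic, here is a toy model of it in plain C++17. It is emphatically not how Phoenix is implemented. Context, val_, attr_ and lazy_function are all hypothetical names I made up for the sketch; the only point is that calling the wrapper builds a deferred action instead of running the lambda, and that a deduction guide gives you the same CTAD convenience:

```cpp
#include <cassert>
#include <vector>

namespace sketch {
// Everything an "actor" may look at while the parser runs (hypothetical).
struct Context { std::vector<int>* val; int attr; };

// Placeholders are just callables that pick a piece out of the context.
inline auto const val_  = [](Context& c) -> std::vector<int>& { return *c.val; };
inline auto const attr_ = [](Context& c) { return c.attr; };

// Calling a lazy_function does not run f; it returns a new callable that
// evaluates the argument actors later and only then invokes f.
template <class F> struct lazy_function {
    F f;
    template <class... Actors> auto operator()(Actors... actors) const {
        return [f = f, actors...](Context& ctx) { return f(actors(ctx)...); };
    }
};
template <class F> lazy_function(F) -> lazy_function<F>; // enables CTAD

inline std::vector<int> demo() {
    lazy_function addToList{[](std::vector<int>& v, int i) { v.push_back(i); }};
    auto action = addToList(val_, attr_); // nothing happens yet
    std::vector<int> out;
    Context ctx{&out, 42};
    assert(out.empty());
    action(ctx); // now the lambda runs with the values from the context
    return out;
}
} // namespace sketch
```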
Other Notes
Why have the inherited attribute? It complicates things:
using WandH = std::tuple<int, int>;
px::function addWH = [](GB& p, WandH wxh) { p.addWH(wxh); };
px::function addToList = [](GB& p, int i) { p.addToList(i); };
qi::rule<It, WandH()> widthAndHeight = //
qi::int_ >> "x" >> qi::int_;
qi::rule<It, GBPtr(), qi::space_type> run = //
qi::lit("g_d")[_val = px::new_<GB>()] > "{" //
> -(*iPtr)[addToList(*_val, _1)] //
> +widthAndHeight[addWH(*_val, _1)] //
> "}";
That's strictly more consistent. See it Live Again
In fact, why not just do it without any semantic action:
struct ECProperty {
std::vector<int> lst;
std::vector<WandH> dims;
};
BOOST_FUSION_ADAPT_STRUCT(ECProperty, lst, dims)
Now you can simply:
qi::rule<Iterator, WandH()> widthAndHeight = qi::int_ >> "x" >> qi::int_;
qi::rule<Iterator, ECProperty(), qi::space_type> run = qi::eps //
> "g_d" > '{' //
> qi::repeat(0, 1)[*iPtr] //
> +widthAndHeight //
> '}';
And still have the exact same output: https://compiler-explorer.com/z/3PqYMEqqP. Full listing for reference:
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
using WandH = std::tuple<int, int>;
struct ECProperty {
std::vector<int> lst;
std::vector<WandH> dims;
};
BOOST_FUSION_ADAPT_STRUCT(ECProperty, lst, dims)
int main()
{
using Iterator = std::string::const_iterator;
auto const* iPtr = std::addressof(qi::int_);
qi::rule<Iterator, WandH()> widthAndHeight = qi::int_ >> "x" >> qi::int_;
qi::rule<Iterator, ECProperty(), qi::space_type> run = qi::eps //
> "g_d" > '{' //
> qi::repeat(0, 1)[*iPtr] //
> +widthAndHeight //
> '}';
BOOST_SPIRIT_DEBUG_NODES((run)(widthAndHeight))
for (std::string const s :
{
"",
"g_d { 42 1200x800 400x768 }",
}) //
{
fmt::print("===== '{}' =====\n", s);
Iterator f = begin(s), l = end(s);
ECProperty val;
try {
if (phrase_parse(f, l, run, qi::space, val)) {
fmt::print("Parsed lst: {} dims: {}\n", val.lst, val.dims);
} else {
fmt::print("Parse failed\n");
}
} catch(qi::expectation_failure<Iterator> const& ef) {
fmt::print(stderr, "Expected {} at '{}'\n", ef.what_,
std::string(ef.first, ef.last));
}
if (f != l) {
fmt::print("Remaining input '{}'\n", std::string(f, l));
}
}
}
Related
This does not compile (code below).
There was another question here with the same error. But I don't understand the answer. I already tried inserting qi::eps in places -- but without success.
I also tried adding meta functions (boost::spirit::traits::is_container) for the types used -- but this also does not help.
I also tried using the same variant containing all types I need to use everywhere. Same problem.
Has anybody gotten this working for a lexer returning something else than double or int or string? And for the parser also returning non-trivial objects?
I've tried implementing semantic functions everywhere returning default objects. But this also does not help.
Here comes the code:
// spirit_error.cpp : Defines the entry point for the console application.
//
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/phoenix/object.hpp>
#include <boost/spirit/include/qi_char_class.hpp>
#include <boost/spirit/include/phoenix_bind.hpp>
#include <boost/mpl/index_of.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/intrusive_ptr.hpp>
#include <boost/smart_ptr/intrusive_ref_counter.hpp>
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace frank
{
class ref_counter:public boost::intrusive_ref_counter<ref_counter>
{ public:
virtual ~ref_counter(void)
{
}
};
class symbol:public ref_counter
{ public:
typedef boost::intrusive_ptr<const symbol> symbolPtr;
typedef std::vector<symbolPtr> symbolVector;
struct push_scope
{ push_scope()
{
}
~push_scope(void)
{
}
};
};
class nature:public symbol
{ public:
enum enumAttribute
{ eAbstol,
eAccess,
eDDT,
eIDT,
eUnits
};
struct empty
{ bool operator<(const empty&) const
{ return false;
}
friend std::ostream &operator<<(std::ostream &_r, const empty&)
{ return _r;
}
};
typedef boost::variant<empty, std::string> attributeValue;
};
class discipline:public symbol
{ public:
enum enumDomain
{ eDiscrete,
eContinuous
};
};
class type:public ref_counter
{ public:
typedef boost::intrusive_ptr<type> typePtr;
};
struct myIterator:std::iterator<std::random_access_iterator_tag, char, std::ptrdiff_t, const char*, const char&>
{ std::string *m_p;
std::size_t m_iPos;
myIterator(void)
:m_p(nullptr),
m_iPos(~std::size_t(0))
{
}
myIterator(std::string &_r, const bool _bEnd = false)
:m_p(&_r),
m_iPos(_bEnd ? ~std::size_t(0) : 0)
{
}
myIterator(const myIterator &_r)
:m_p(_r.m_p),
m_iPos(_r.m_iPos)
{
}
myIterator &operator=(const myIterator &_r)
{ if (this != &_r)
{ m_p = _r.m_p;
m_iPos = _r.m_iPos;
}
return *this;
}
const char &operator*(void) const
{ return m_p->at(m_iPos);
}
bool operator==(const myIterator &_r) const
{ return m_p == _r.m_p && m_iPos == _r.m_iPos;
}
bool operator!=(const myIterator &_r) const
{ return m_p != _r.m_p || m_iPos != _r.m_iPos;
}
myIterator &operator++(void)
{ ++m_iPos;
if (m_iPos == m_p->size())
m_iPos = ~std::size_t(0);
return *this;
}
myIterator operator++(int)
{ const myIterator s(*this);
operator++();
return s;
}
myIterator &operator--(void)
{ --m_iPos;
return *this;
}
myIterator operator--(int)
{ const myIterator s(*this);
operator--();
return s;
}
bool operator<(const myIterator &_r) const
{ if (m_p == _r.m_p)
return m_iPos < _r.m_iPos;
else
return m_p < _r.m_p;
}
std::ptrdiff_t operator-(const myIterator &_r) const
{ return m_iPos - _r.m_iPos;
}
};
struct onInclude
{ auto operator()(myIterator &_rStart, myIterator &_rEnd) const
{ // erase what has been matched (the include statement)
_rStart.m_p->erase(_rStart.m_iPos, _rEnd.m_iPos - _rStart.m_iPos);
// and insert the contents of the file
_rStart.m_p->insert(_rStart.m_iPos, "abcd");
_rEnd = _rStart;
return lex::pass_flags::pass_ignore;
}
};
template<typename LEXER>
class lexer:public lex::lexer<LEXER>
{ public:
lex::token_def<type::typePtr> m_sKW_real, m_sKW_integer, m_sKW_string;
lex::token_def<lex::omit> m_sLineComment, m_sCComment;
lex::token_def<lex::omit> m_sWS;
lex::token_def<lex::omit> m_sSemicolon, m_sEqual, m_sColon, m_sInclude, m_sCharOP, m_sCharCP,
m_sComma;
lex::token_def<std::string> m_sIdentifier, m_sString;
lex::token_def<double> m_sReal;
lex::token_def<int> m_sInteger;
lex::token_def<lex::omit> m_sKW_units, m_sKW_access, m_sKW_idt_nature, m_sKW_ddt_nature, m_sKW_abstol,
m_sKW_nature, m_sKW_endnature, m_sKW_continuous, m_sKW_discrete,
m_sKW_potential, m_sKW_flow, m_sKW_domain, m_sKW_discipline, m_sKW_enddiscipline, m_sKW_module,
m_sKW_endmodule, m_sKW_parameter;
//typedef const type *typePtr;
template<typename T>
struct extractValue
{ T operator()(const myIterator &_rStart, const myIterator &_rEnd) const
{ return boost::lexical_cast<T>(std::string(_rStart, _rEnd));
}
};
struct extractString
{ std::string operator()(const myIterator &_rStart, const myIterator &_rEnd) const
{ const auto s = std::string(_rStart, _rEnd);
return s.substr(1, s.size() - 2);
}
};
lexer(void)
:m_sWS("[ \\t\\n\\r]+"),
m_sKW_parameter("\"parameter\""),
m_sKW_real("\"real\""),
m_sKW_integer("\"integer\""),
m_sKW_string("\"string\""),
m_sLineComment("\\/\\/[^\\n]*"),
m_sCComment("\\/\\*"
"("
"[^*]"
"|" "[\\n]"
"|" "([*][^/])"
")*"
"\\*\\/"),
m_sSemicolon("\";\""),
m_sEqual("\"=\""),
m_sColon("\":\""),
m_sCharOP("\"(\""),
m_sCharCP("\")\""),
m_sComma("\",\""),
m_sIdentifier("[a-zA-Z_]+[a-zA-Z0-9_]*"),
m_sString("[\\\"]"
//"("
// "(\\[\"])"
// "|"
//"[^\"]"
//")*"
"[^\\\"]*"
"[\\\"]"),
m_sKW_units("\"units\""),
m_sKW_access("\"access\""),
m_sKW_idt_nature("\"idt_nature\""),
m_sKW_ddt_nature("\"ddt_nature\""),
m_sKW_abstol("\"abstol\""),
m_sKW_nature("\"nature\""),
m_sKW_endnature("\"endnature\""),
m_sKW_continuous("\"continuous\""),
m_sKW_discrete("\"discrete\""),
m_sKW_domain("\"domain\""),
m_sKW_discipline("\"discipline\""),
m_sKW_enddiscipline("\"enddiscipline\""),
m_sKW_potential("\"potential\""),
m_sKW_flow("\"flow\""),
//realnumber ({uint}{exponent})|((({uint}\.{uint})|(\.{uint})){exponent}?)
//exponent [Ee][+-]?{uint}
//uint [0-9][_0-9]*
m_sReal("({uint}{exponent})"
"|"
"("
"(({uint}[\\.]{uint})|([\\.]{uint})){exponent}?"
")"
),
m_sInteger("{uint}"),
m_sInclude("\"`include\""),
m_sKW_module("\"module\""),
m_sKW_endmodule("\"endmodule\"")
{ this->self.add_pattern
("uint", "[0-9]+")
("exponent", "[eE][\\+\\-]?{uint}");
this->self = m_sSemicolon
| m_sEqual
| m_sColon
| m_sCharOP
| m_sCharCP
| m_sComma
| m_sString[lex::_val = boost::phoenix::bind(extractString(), lex::_start, lex::_end)]
| m_sKW_real//[lex::_val = boost::phoenix::bind(&type::getReal)]
| m_sKW_integer//[lex::_val = boost::phoenix::bind(&type::getInteger)]
| m_sKW_string//[lex::_val = boost::phoenix::bind(&type::getString)]
| m_sKW_parameter
| m_sKW_units
| m_sKW_access
| m_sKW_idt_nature
| m_sKW_ddt_nature
| m_sKW_abstol
| m_sKW_nature
| m_sKW_endnature
| m_sKW_continuous
| m_sKW_discrete
| m_sKW_domain
| m_sKW_discipline
| m_sKW_enddiscipline
| m_sReal[lex::_val = boost::phoenix::bind(extractValue<double>(), lex::_start, lex::_end)]
| m_sInteger[lex::_val = boost::phoenix::bind(extractValue<int>(), lex::_start, lex::_end)]
| m_sKW_potential
| m_sKW_flow
| m_sKW_module
| m_sKW_endmodule
| m_sIdentifier
| m_sInclude [ lex::_state = "INCLUDE" ]
;
this->self("INCLUDE") += m_sString [
lex::_state = "INITIAL", lex::_pass = boost::phoenix::bind(onInclude(), lex::_start, lex::_end)
];
this->self("WS") = m_sWS
| m_sLineComment
| m_sCComment
;
}
};
template<typename Iterator, typename Lexer>
class natureParser:public qi::grammar<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> >
{ qi::rule<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> > m_sStart;
qi::rule<Iterator, std::pair<nature::enumAttribute, nature::attributeValue>(void), qi::in_state_skipper<Lexer> > m_sProperty;
qi::rule<Iterator, std::string(), qi::in_state_skipper<Lexer> > m_sName;
public:
template<typename Tokens>
natureParser(const Tokens &_rTokens)
:natureParser::base_type(m_sStart)
{ m_sProperty = (_rTokens.m_sKW_units
>> _rTokens.m_sEqual
>> _rTokens.m_sString
>> _rTokens.m_sSemicolon
)
| (_rTokens.m_sKW_access
>> _rTokens.m_sEqual
>> _rTokens.m_sIdentifier
>> _rTokens.m_sSemicolon
)
| (_rTokens.m_sKW_idt_nature
>> _rTokens.m_sEqual
>> _rTokens.m_sIdentifier
>> _rTokens.m_sSemicolon
)
| (_rTokens.m_sKW_ddt_nature
>> _rTokens.m_sEqual
>> _rTokens.m_sIdentifier
>> _rTokens.m_sSemicolon
)
| (_rTokens.m_sKW_abstol
>> _rTokens.m_sEqual
>> _rTokens.m_sReal
>> _rTokens.m_sSemicolon
)
;
m_sName = (_rTokens.m_sColon >> _rTokens.m_sIdentifier);
m_sStart = (_rTokens.m_sKW_nature
>> _rTokens.m_sIdentifier
>> -m_sName
>> _rTokens.m_sSemicolon
>> *(m_sProperty)
>> _rTokens.m_sKW_endnature
);
m_sStart.name("start");
m_sProperty.name("property");
}
};
/*
// Conservative discipline
discipline electrical;
potential Voltage;
flow Current;
enddiscipline
*/
// a parser for a discipline declaration
template<typename Iterator, typename Lexer>
class disciplineParser:public qi::grammar<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> >
{ qi::rule<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> > m_sStart;
typedef std::pair<bool, boost::intrusive_ptr<const nature> > CPotentialAndNature;
struct empty
{ bool operator<(const empty&) const
{ return false;
}
friend std::ostream &operator<<(std::ostream &_r, const empty&)
{ return _r;
}
};
typedef boost::variant<empty, CPotentialAndNature, discipline::enumDomain> property;
qi::rule<Iterator, discipline::enumDomain(), qi::in_state_skipper<Lexer> > m_sDomain;
qi::rule<Iterator, property(void), qi::in_state_skipper<Lexer> > m_sProperty;
public:
template<typename Tokens>
disciplineParser(const Tokens &_rTokens)
:disciplineParser::base_type(m_sStart)
{ m_sDomain = _rTokens.m_sKW_continuous
| _rTokens.m_sKW_discrete
;
m_sProperty = (_rTokens.m_sKW_potential >> _rTokens.m_sIdentifier >> _rTokens.m_sSemicolon)
| (_rTokens.m_sKW_flow >> _rTokens.m_sIdentifier >> _rTokens.m_sSemicolon)
| (_rTokens.m_sKW_domain >> m_sDomain >> _rTokens.m_sSemicolon)
;
m_sStart = (_rTokens.m_sKW_discipline
>> _rTokens.m_sIdentifier
>> _rTokens.m_sSemicolon
>> *m_sProperty
>> _rTokens.m_sKW_enddiscipline
);
}
};
template<typename Iterator, typename Lexer>
class moduleParser:public qi::grammar<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> >
{ public:
qi::rule<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> > m_sStart;
qi::rule<Iterator, symbol::symbolVector(void), qi::in_state_skipper<Lexer> > m_sModulePortList;
qi::rule<Iterator, symbol::symbolVector(void), qi::in_state_skipper<Lexer> > m_sPortList;
qi::rule<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> > m_sPort;
qi::rule<Iterator, std::shared_ptr<symbol::push_scope>(void), qi::in_state_skipper<Lexer> > m_sModule;
typedef boost::intrusive_ptr<const ref_counter> intrusivePtr;
typedef std::vector<intrusivePtr> vectorOfPtr;
qi::rule<Iterator, vectorOfPtr(void), qi::in_state_skipper<Lexer> > m_sModuleItemList;
qi::rule<Iterator, intrusivePtr(void), qi::in_state_skipper<Lexer> > m_sParameter;
qi::rule<Iterator, intrusivePtr(void), qi::in_state_skipper<Lexer> > m_sModuleItem;
qi::rule<Iterator, type::typePtr(void), qi::in_state_skipper<Lexer> > m_sType;
template<typename Tokens>
moduleParser(const Tokens &_rTokens)
:moduleParser::base_type(m_sStart)
{ m_sPort = _rTokens.m_sIdentifier;
m_sPortList %= m_sPort % _rTokens.m_sComma;
m_sModulePortList %= _rTokens.m_sCharOP >> m_sPortList >> _rTokens.m_sCharCP;
m_sModule = _rTokens.m_sKW_module;
m_sType = _rTokens.m_sKW_real | _rTokens.m_sKW_integer | _rTokens.m_sKW_string;
m_sParameter = _rTokens.m_sKW_parameter
>> m_sType
>> _rTokens.m_sIdentifier
;
m_sModuleItem = m_sParameter;
m_sModuleItemList %= *m_sModuleItem;
m_sStart = (m_sModule
>> _rTokens.m_sIdentifier
>> m_sModulePortList
>> m_sModuleItemList
>> _rTokens.m_sKW_endmodule);
}
};
template<typename Iterator, typename Lexer>
class fileParser:public qi::grammar<Iterator, symbol::symbolVector(void), qi::in_state_skipper<Lexer> >
{ public:
disciplineParser<Iterator, Lexer> m_sDiscipline;
natureParser<Iterator, Lexer> m_sNature;
moduleParser<Iterator, Lexer> m_sModule;
qi::rule<Iterator, symbol::symbolVector(void), qi::in_state_skipper<Lexer> > m_sStart;
qi::rule<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> > m_sItem;
//public:
template<typename Tokens>
fileParser(const Tokens &_rTokens)
:fileParser::base_type(m_sStart),
m_sNature(_rTokens),
m_sDiscipline(_rTokens),
m_sModule(_rTokens)
{ m_sItem = m_sDiscipline | m_sNature | m_sModule;
m_sStart = *m_sItem;
}
};
}
int main()
{ std::string sInput = "\
nature Current;\n\
units = \"A\";\n\
access = I;\n\
idt_nature = Charge;\n\
abstol = 1e-12;\n\
endnature\n\
\n\
// Charge in coulombs\n\
nature Charge;\n\
units = \"coul\";\n\
access = Q;\n\
ddt_nature = Current;\n\
abstol = 1e-14;\n\
endnature\n\
\n\
// Potential in volts\n\
nature Voltage;\n\
units = \"V\";\n\
access = V;\n\
idt_nature = Flux;\n\
abstol = 1e-6;\n\
endnature\n\
\n\
discipline electrical;\n\
potential Voltage;\n\
flow Current;\n\
enddiscipline\n\
";
typedef lex::lexertl::token<frank::myIterator, boost::mpl::vector<frank::type::typePtr, std::string, double, int> > token_type;
typedef lex::lexertl::actor_lexer<token_type> lexer_type;
typedef frank::lexer<lexer_type>::iterator_type iterator_type;
typedef frank::fileParser<iterator_type, frank::lexer<lexer_type>::lexer_def> grammar_type;
frank::lexer<lexer_type> sLexer;
grammar_type sParser(sLexer);
frank::symbol::push_scope sPush;
auto pStringBegin = frank::myIterator(sInput);
auto pBegin(sLexer.begin(pStringBegin, frank::myIterator(sInput, true)));
const auto b = qi::phrase_parse(pBegin, sLexer.end(), sParser, qi::in_state("WS")[sLexer.self]);
}
Has anybody gotten this working for a lexer returning something else than double or int or string?
Sure. Simple examples might be found on this site
And for the parser also returning non-trivial objects?
Here's your real problem. Spirit is nice for a subset of parsers that are expressed easily in a eDSL, and has the huge benefit of "magically" mapping to a selection of attributes.
Some of the realities are:
attributes are expected to have value semantics; using polymorphic attributes is hard (see e.g. How can I use polymorphic attributes with boost::spirit::qi parsers?)
using Lex makes most of the sweet-spot disappear since all "highlevel" parsers (like real_parser, [u]int_parser) are out the window. The Spirit devs are on record that they prefer not to use Lex. Moreover, Spirit X3 doesn't have Lex support anymore.
Background Information:
I'd very much consider parsing the source as-is, into direct value-typed AST nodes. I know, this is probably what you consider "trivial objects", but don't be deceived by apparent simplicity: recursive variant trees have some expressive power.
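To give a feel for that expressive power, here is a small sketch of a recursive, value-typed tree. I'm using std::variant (C++17) here rather than boost::variant purely to keep the example dependency-free, and Value and sum are hypothetical names:

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

namespace sketch {
// A value is a number, a string, or a list of further values.
struct Value {
    std::variant<double, std::string, std::vector<Value>> v;
};

// Recursively add up every number in the tree.
inline double sum(Value const& value) {
    struct Visitor {
        double operator()(double d) const { return d; }
        double operator()(std::string const&) const { return 0.0; }
        double operator()(std::vector<Value> const& items) const {
            double total = 0.0;
            for (auto const& item : items) total += sum(item);
            return total;
        }
    };
    return std::visit(Visitor{}, value.v);
}

inline double demo() {
    Value tree{std::vector<Value>{
        Value{1.0},
        Value{std::string("text")},
        Value{std::vector<Value>{Value{2.5}}},
    }};
    assert(std::holds_alternative<std::vector<Value>>(tree.v));
    return sum(tree); // 1.0 + 2.5
}
} // namespace sketch
```

Everything is copyable and comparable by value, with no ownership questions, which is the property Spirit's attribute propagation relies on.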
Examples
Here's a trivial AST to represent JSON in <20 LoC: Boost Karma generator for composition of classes¹
Here we represent the Graphviz source format with full fidelity: How to use boost spirit list operator with mandatory minimum amount of elements?
I've since created the code to transform that AST into a domain representation with fully correct ownership, cascading lexically scoped node/edge attributes and cross references. I have just recovered that work and put it up on github if you're interested, mainly because the task is pretty similar in many respects, like the overriding/inheriting of properties and resolving identifiers within scopes: https://github.com/sehe/spirit-graphviz/blob/master/spirit-graphviz.cpp#L660
Suggestions, Ideas
In your case I'd take a similar approach to retain simplicity. The code shown doesn't (yet) cover the trickiest ingredients (like nature attribute overrides within a discipline).
Once you start implementing use-cases like resolving compatible disciplines and the absolute tolerances at a given node, you want a domain model with full fidelity. Preferably, there would be no loss of source information, and immutable AST information².
As a middle ground, you could probably avoid building an entire source-AST in memory only to transform it in one big go. At the top-level you could have:
file = qi::skip(skipper) [
*(m_sDiscipline | m_sNature | m_sModule) [process_ast(_1)]
];
Where process_ast would apply the "trivial" AST representation into the domain types, one at a time. That way you keep only small bits of temporary AST representation around.
The domain representation can be arbitrarily sophisticated to support all your logic and use-cases.
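A rough sketch of what such a process_ast could look like, with the parser left out entirely. All type names here are hypothetical, std::variant stands in for boost::variant, and a real implementation would report errors instead of asserting:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>
#include <vector>

namespace sketch {
// Throw-away AST nodes, as produced by the parser (hypothetical, simplified).
struct NatureAst     { std::string name, inherits; };
struct DisciplineAst { std::string name, potential, flow; };
using Item = std::variant<NatureAst, DisciplineAst>;

// The long-lived domain model; it can be as rich as the use-cases demand.
struct Domain {
    std::map<std::string, std::string> nature_parent; // nature name -> base nature
    std::vector<std::string> disciplines;

    void operator()(NatureAst const& n) { nature_parent[n.name] = n.inherits; }
    void operator()(DisciplineAst const& d) {
        // Cross references can be checked as soon as each item arrives.
        assert(nature_parent.count(d.potential) && nature_parent.count(d.flow));
        disciplines.push_back(d.name);
    }
};

// Fold one small AST item into the domain model; the item can then be discarded.
inline void process_ast(Domain& domain, Item const& item) { std::visit(domain, item); }
} // namespace sketch
```

The semantic action on the top-level rule would then call something like this once per parsed item, so only one item's worth of AST is alive at any time.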
Let's "Show, Don't Tell"
Baking the simplest AST that comes to mind matching the grammar³:
namespace frank { namespace ast {
struct nature {
struct empty{};
std::string name;
std::string inherits;
enum class Attribute { units, access, idt, ddt, abstol };
using Value = boost::variant<int, double, std::string>;
std::map<Attribute, Value> attributes;
};
struct discipline {
enum enumDomain { eUnspecified, eDiscrete, eContinuous };
struct properties_t {
enumDomain domain = eUnspecified;
boost::optional<std::string> flow, potential;
};
std::string name;
properties_t properties;
};
// TODO
using module = qi::unused_type;
using file = std::vector<boost::variant<nature, discipline, module> >;
enum class type { real, integer, string };
} }
This is trivial and maps 1:1 onto the grammar productions, which means we have very little impedance.
Tokens? We Don't Need Lex For That
You can have common token parsers without requiring the complexities of Lex.
Yes, Lex (especially statically generated) can potentially improve performance, but if you need that, I wager Spirit Qi is not your best option anyway. Besides: premature optimization...
What I did:
struct tokens {
// implicit lexemes
qi::rule<It, std::string()> string, identifier;
qi::rule<It, double()> real;
qi::rule<It, int()> integer;
qi::rule<It, ast::nature::Value()> value;
qi::rule<It, ast::nature::Attribute()> attribute;
qi::rule<It, ast::discipline::enumDomain()> domain;
struct attribute_sym_t : qi::symbols<char, ast::nature::Attribute> {
attribute_sym_t() {
this->add
("units", ast::nature::Attribute::units)
("access", ast::nature::Attribute::access)
("idt_nature", ast::nature::Attribute::idt)
("ddt_nature", ast::nature::Attribute::ddt)
("abstol", ast::nature::Attribute::abstol);
}
} attribute_sym;
struct domain_sym_t : qi::symbols<char, ast::discipline::enumDomain> {
domain_sym_t() {
this->add
("discrete", ast::discipline::eDiscrete)
("continuous", ast::discipline::eContinuous);
}
} domain_sym;
tokens() {
using namespace qi;
auto kw = qr::distinct(copy(char_("a-zA-Z0-9_")));
string = '"' >> *("\\" >> char_ | ~char_('"')) >> '"';
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
real = double_;
integer = int_;
attribute = kw[attribute_sym];
domain = kw[domain_sym];
value = string | identifier | real | integer;
BOOST_SPIRIT_DEBUG_NODES((string)(identifier)(real)(integer)(value)(domain)(attribute))
}
};
Liberating, isn't it? Note how
all attributes are automatically propagated
strings handle escapes (this bit was commented out in your Lex approach). We don't even need semantic actions to (badly) pry out the unquoted/unescaped value
we used distinct to ensure keyword parsing matches only full identifiers. (See How to parse reserved words correctly in boost spirit).
This is actually where you notice the lack of a separate lexer.
On the flipside, this makes context-sensitive keywords a breeze (lex can easily prioritize keywords over identifiers that occur in places where keywords cannot occur.⁴)
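In case it isn't obvious what distinct buys you: the check it adds amounts to "the keyword must not be followed by another identifier character". A hand-rolled equivalent, assuming ASCII identifiers and using the hypothetical names is_ident_char and match_keyword, might look like this:

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

namespace sketch {
inline bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// True only if `input` starts with `kw` AND the keyword is not merely the
// prefix of a longer identifier.
inline bool match_keyword(std::string_view input, std::string_view kw) {
    assert(!kw.empty());
    if (input.substr(0, kw.size()) != kw) return false;
    return input.size() == kw.size() || !is_ident_char(input[kw.size()]);
}
} // namespace sketch
```

Without the second check, an identifier such as natureOfThings would be mistaken for the keyword nature followed by garbage.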
What About Skipping Space/Comments?
We could have added a token, but for reasons of convention I made it a parser:
struct skipParser : qi::grammar<It> {
skipParser() : skipParser::base_type(spaceOrComment) {
using namespace qi;
spaceOrComment = space
| ("//" >> *(char_ - eol) >> (eoi|eol))
| ("/*" >> *(char_ - "*/") >> "*/");
BOOST_SPIRIT_DEBUG_NODES((spaceOrComment))
}
private:
qi::rule<It> spaceOrComment;
};
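For readers less fluent in Qi, here is roughly what that skipper does, spelled out by hand. This is only an approximation under my own assumptions: skip_ignorable is a hypothetical name, and line ends are simplified to '\n' whereas qi::eol also accepts "\r" and "\r\n":

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

namespace sketch {
// Returns the position of the first character at or after `pos` that is
// neither whitespace nor part of a // or /* */ comment.
inline std::size_t skip_ignorable(std::string_view s, std::size_t pos = 0) {
    assert(pos <= s.size());
    for (;;) {
        if (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) {
            ++pos;
        } else if (s.substr(pos, 2) == "//") {
            auto eol = s.find('\n', pos);
            if (eol == std::string_view::npos) return s.size(); // comment runs to end of input
            pos = eol + 1;
        } else if (s.substr(pos, 2) == "/*") {
            auto close = s.find("*/", pos + 2);
            if (close == std::string_view::npos) return pos; // unterminated: nothing to skip
            pos = close + 2;
        } else {
            return pos;
        }
    }
}
} // namespace sketch
```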
natureParser
We inherit our AST parsers from tokens:
struct natureParser : tokens, qi::grammar<It, ast::nature(), skipParser> {
And from there it is plain sailing:
property = attribute >> '=' >> value >> ';';
nature
= kw["nature"] >> identifier >> -(':' >> identifier) >> ';'
>> *property
>> kw["endnature"];
disciplineParser
discipline = kw["discipline"] >> identifier >> ';'
>> properties
>> kw["enddiscipline"]
;
properties
= kw["domain"] >> domain >> ';'
^ kw["flow"] >> identifier >> ';'
^ kw["potential"] >> identifier >> ';'
;
This shows a competing approach that uses the permutation operator (^) to parse optional alternatives in any order into a fixed frank::ast::discipline properties struct. Of course, you might elect to have a more generic representation here, like we had with ast::nature.
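To illustrate what "any order, each at most once" means in practice, here is a hand-written approximation that works on already-tokenized (keyword, value) pairs. Properties, Entry and parse_properties are hypothetical names, and this models the behaviour as I understand it rather than reproducing Qi's implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

namespace sketch {
struct Properties { std::optional<std::string> domain, flow, potential; };
using Entry = std::pair<std::string, std::string>; // (keyword, value)

// Consumes leading entries in any order, each keyword at most once. Returns
// how many entries were consumed; the caller treats 0 as "no match".
inline std::size_t parse_properties(std::vector<Entry> const& entries, Properties& out) {
    std::size_t consumed = 0;
    for (auto const& [key, value] : entries) {
        std::optional<std::string>* slot = key == "domain"    ? &out.domain
                                         : key == "flow"      ? &out.flow
                                         : key == "potential" ? &out.potential
                                                              : nullptr;
        if (slot == nullptr || slot->has_value()) break; // unknown or repeated: stop here
        *slot = value;
        ++consumed;
    }
    return consumed;
}

inline Properties demo() {
    Properties p;
    std::size_t n = parse_properties(
        {{"potential", "Voltage"}, {"flow", "Current"}, {"flow", "Again"}}, p);
    assert(n == 2); // the repeated "flow" is left unconsumed
    return p;
}
} // namespace sketch
```

The fixed struct with optionals is what makes the "domain: (unspecified)" case fall out naturally when a property is simply absent.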
Module AST is left as an exercise for the reader, though the parser rules are implemented below.
Top Level, Encapsulating The Skipper
I hate having to specify the skipper from the calling code (it's more complex than required, and changing the skipper changes the grammar). So, I encapsulate it in the top-level parser:
struct fileParser : qi::grammar<It, ast::file()> {
fileParser() : fileParser::base_type(file) {
file = qi::skip(qi::copy(m_sSkip)) [
*(m_sDiscipline | m_sNature | m_sModule)
];
BOOST_SPIRIT_DEBUG_NODES((file))
}
private:
disciplineParser m_sDiscipline;
natureParser m_sNature;
moduleParser m_sModule;
skipParser m_sSkip;
qi::rule<It, ast::file()> file;
};
Demo Time
This demo adds operator<< for the enums, and a variant visitor to print some AST details for debug/demonstrational purposes (print_em).
Then we have a test driver:
int main() {
using iterator_type = std::string::const_iterator;
iterator_type iter = sInput.begin(), last = sInput.end();
frank::Parsers<iterator_type>::fileParser parser;
print_em print;
frank::ast::file file;
bool ok = qi::parse(iter, last, parser, file);
if (ok) {
for (auto& symbol : file)
print(symbol);
}
else {
std::cout << "Parse failed\n";
}
if (iter != last) {
std::cout << "Remaining unparsed: '" << std::string(iter,last) << "'\n";
}
}
With the sample input from your question we get the following output:
Live On Coliru
-- Nature
name: Current
inherits:
attribute: units = A
attribute: access = I
attribute: idt = Charge
attribute: abstol = 1e-12
-- Nature
name: Charge
inherits:
attribute: units = coul
attribute: access = Q
attribute: ddt = Current
attribute: abstol = 1e-14
-- Nature
name: Voltage
inherits:
attribute: units = V
attribute: access = V
attribute: idt = Flux
attribute: abstol = 1e-06
-- Discipline
name: electrical
domain: (unspecified)
flow: Current
potential: Voltage
Remaining unparsed: '
'
With BOOST_SPIRIT_DEBUG defined, you get rich debug information: Live On Coliru
Full Listing
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <map>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapted.hpp>
#include <boost/spirit/repository/include/qi_distinct.hpp>
namespace qi = boost::spirit::qi;
namespace frank { namespace ast {
struct nature {
struct empty{};
std::string name;
std::string inherits;
enum class Attribute { units, access, idt, ddt, abstol };
using Value = boost::variant<int, double, std::string>;
std::map<Attribute, Value> attributes;
};
struct discipline {
enum enumDomain { eUnspecified, eDiscrete, eContinuous };
struct properties_t {
enumDomain domain = eUnspecified;
boost::optional<std::string> flow, potential;
};
std::string name;
properties_t properties;
};
// TODO
using module = qi::unused_type;
using file = std::vector<boost::variant<nature, discipline, module> >;
enum class type { real, integer, string };
} }
BOOST_FUSION_ADAPT_STRUCT(frank::ast::nature, name, inherits, attributes)
BOOST_FUSION_ADAPT_STRUCT(frank::ast::discipline, name, properties)
BOOST_FUSION_ADAPT_STRUCT(frank::ast::discipline::properties_t, domain, flow, potential)
namespace frank {
namespace qr = boost::spirit::repository::qi;
template <typename It> struct Parsers {
struct tokens {
// implicit lexemes
qi::rule<It, std::string()> string, identifier;
qi::rule<It, double()> real;
qi::rule<It, int()> integer;
qi::rule<It, ast::nature::Value()> value;
qi::rule<It, ast::nature::Attribute()> attribute;
qi::rule<It, ast::discipline::enumDomain()> domain;
struct attribute_sym_t : qi::symbols<char, ast::nature::Attribute> {
attribute_sym_t() {
this->add
("units", ast::nature::Attribute::units)
("access", ast::nature::Attribute::access)
("idt_nature", ast::nature::Attribute::idt)
("ddt_nature", ast::nature::Attribute::ddt)
("abstol", ast::nature::Attribute::abstol);
}
} attribute_sym;
struct domain_sym_t : qi::symbols<char, ast::discipline::enumDomain> {
domain_sym_t() {
this->add
("discrete", ast::discipline::eDiscrete)
("continuous", ast::discipline::eContinuous);
}
} domain_sym;
tokens() {
using namespace qi;
auto kw = qr::distinct(copy(char_("a-zA-Z0-9_")));
string = '"' >> *("\\" >> char_ | ~char_('"')) >> '"';
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
real = double_;
integer = int_;
attribute = kw[attribute_sym];
domain = kw[domain_sym];
value = string | identifier | real | integer;
BOOST_SPIRIT_DEBUG_NODES((string)(identifier)(real)(integer)(value)(domain)(attribute))
}
};
struct skipParser : qi::grammar<It> {
skipParser() : skipParser::base_type(spaceOrComment) {
using namespace qi;
spaceOrComment = space
| ("//" >> *(char_ - eol) >> (eoi|eol))
| ("/*" >> *(char_ - "*/") >> "*/");
BOOST_SPIRIT_DEBUG_NODES((spaceOrComment))
}
private:
qi::rule<It> spaceOrComment;
};
struct natureParser : tokens, qi::grammar<It, ast::nature(), skipParser> {
natureParser() : natureParser::base_type(nature) {
using namespace qi;
auto kw = qr::distinct(copy(char_("a-zA-Z0-9_")));
property = attribute >> '=' >> value >> ';';
nature
= kw["nature"] >> identifier >> -(':' >> identifier) >> ';'
>> *property
>> kw["endnature"];
BOOST_SPIRIT_DEBUG_NODES((nature)(property))
}
private:
using Attribute = std::pair<ast::nature::Attribute, ast::nature::Value>;
qi::rule<It, ast::nature(), skipParser> nature;
qi::rule<It, Attribute(), skipParser> property;
using tokens::attribute;
using tokens::value;
using tokens::identifier;
};
struct disciplineParser : tokens, qi::grammar<It, ast::discipline(), skipParser> {
disciplineParser() : disciplineParser::base_type(discipline) {
auto kw = qr::distinct(qi::copy(qi::char_("a-zA-Z0-9_")));
discipline = kw["discipline"] >> identifier >> ';'
>> properties
>> kw["enddiscipline"]
;
properties
= kw["domain"] >> domain >> ';'
^ kw["flow"] >> identifier >> ';'
^ kw["potential"] >> identifier >> ';'
;
BOOST_SPIRIT_DEBUG_NODES((discipline)(properties))
}
private:
qi::rule<It, ast::discipline(), skipParser> discipline;
qi::rule<It, ast::discipline::properties_t(), skipParser> properties;
using tokens::domain;
using tokens::identifier;
};
struct moduleParser : tokens, qi::grammar<It, ast::module(), skipParser> {
moduleParser() : moduleParser::base_type(module) {
auto kw = qr::distinct(qi::copy(qi::char_("a-zA-Z0-9_")));
m_sPort = identifier;
m_sPortList = m_sPort % ',';
m_sModulePortList = '(' >> m_sPortList >> ')';
m_sModule = kw["module"];
m_sType = kw["real"] | kw["integer"] | kw["string"];
m_sParameter = kw["parameter"] >> m_sType >> identifier;
m_sModuleItem = m_sParameter;
m_sModuleItemList = *m_sModuleItem;
module =
(m_sModule >> identifier >> m_sModulePortList >> m_sModuleItemList >> kw["endmodule"]);
}
private:
qi::rule<It, ast::module(), skipParser> module;
qi::rule<It, skipParser> m_sModulePortList;
qi::rule<It, skipParser> m_sPortList;
qi::rule<It, skipParser> m_sPort;
qi::rule<It, skipParser> m_sModule;
qi::rule<It, skipParser> m_sModuleItemList;
qi::rule<It, skipParser> m_sParameter;
qi::rule<It, skipParser> m_sModuleItem;
qi::rule<It, skipParser> m_sType;
using tokens::identifier;
};
struct fileParser : qi::grammar<It, ast::file()> {
fileParser() : fileParser::base_type(file) {
file = qi::skip(qi::copy(m_sSkip)) [
*(m_sDiscipline | m_sNature | m_sModule)
];
BOOST_SPIRIT_DEBUG_NODES((file))
}
private:
disciplineParser m_sDiscipline;
natureParser m_sNature;
moduleParser m_sModule;
skipParser m_sSkip;
qi::rule<It, ast::file()> file;
};
};
}
extern std::string const sInput;
// just for demo
#include <boost/optional/optional_io.hpp>
namespace frank { namespace ast {
//static inline std::ostream &operator<<(std::ostream &os, const nature::empty &) { return os; }
static inline std::ostream &operator<<(std::ostream &os, nature::Attribute a) {
switch(a) {
case nature::Attribute::units: return os << "units";
case nature::Attribute::access: return os << "access";
case nature::Attribute::idt: return os << "idt";
case nature::Attribute::ddt: return os << "ddt";
case nature::Attribute::abstol: return os << "abstol";
};
return os << "?";
}
static inline std::ostream &operator<<(std::ostream &os, discipline::enumDomain d) {
switch(d) {
case discipline::eDiscrete: return os << "discrete";
case discipline::eContinuous: return os << "continuous";
case discipline::eUnspecified: return os << "(unspecified)";
};
return os << "?";
}
} }
struct print_em {
using result_type = void;
template <typename V>
void operator()(V const& variant) const {
boost::apply_visitor(*this, variant);
}
void operator()(frank::ast::nature const& nature) const {
std::cout << "-- Nature\n";
std::cout << "name: " << nature.name << "\n";
std::cout << "inherits: " << nature.inherits << "\n";
for (auto& a : nature.attributes) {
std::cout << "attribute: " << a.first << " = " << a.second << "\n";
}
}
void operator()(frank::ast::discipline const& discipline) const {
std::cout << "-- Discipline\n";
std::cout << "name: " << discipline.name << "\n";
std::cout << "domain: " << discipline.properties.domain << "\n";
std::cout << "flow: " << discipline.properties.flow << "\n";
std::cout << "potential: " << discipline.properties.potential << "\n";
}
void operator()(frank::ast::module const&) const {
std::cout << "-- Module (TODO)\n";
}
};
int main() {
using iterator_type = std::string::const_iterator;
iterator_type iter = sInput.begin(), last = sInput.end();
frank::Parsers<iterator_type>::fileParser parser;
print_em print;
frank::ast::file file;
bool ok = parse(iter, last, parser, file);
if (ok) {
for (auto& symbol : file)
print(symbol);
}
else {
std::cout << "Parse failed\n";
}
if (iter != last) {
std::cout << "Remaining unparsed: '" << std::string(iter,last) << "'\n";
}
}
std::string const sInput = R"(
nature Current;
units = "A";
access = I;
idt_nature = Charge;
abstol = 1e-12;
endnature
// Charge in coulombs
nature Charge;
units = "coul";
access = Q;
ddt_nature = Current;
abstol = 1e-14;
endnature
// Potential in volts
nature Voltage;
units = "V";
access = V;
idt_nature = Flux;
abstol = 1e-6;
endnature
discipline electrical;
potential Voltage;
flow Current;
enddiscipline
)";
¹ incidentally, the other answer there demonstrates the "impedance mismatch" with polymorphic attributes and Spirit - this time on the Karma side of it
² (to prevent subtle bugs that depend on evaluation order or things like that, e.g.)
³ (gleaning some from here but not importing too much complexity that wasn't reflected in your Lex approach)
⁴ (In fact, this is where you'd need state-switching inside the grammar, an area notoriously underdeveloped and practically unusable in Spirit Lex: e.g. when it works how to avoid defining token which matchs everything in boost::spirit::lex or when it goes badly: Boost.Spirit SQL grammar/lexer failure)
One solution would be to use a std::string everywhere and define a boost::variant with everything needed but not use it anywhere in the parser or lexer directly but only serialize & deserialize it into/from the string.
Is this what the originators of boost::spirit intended?
Cannot figure out why this rule unary_msg doesn't work; it says the attribute type is qi::unused_type, but this makes no sense to me. Why does boost torment me like this?
template<class It, class Skip= boost::spirit::ascii::space_type>
struct g3: qi::grammar<It, ast::expr(), Skip>
{
template<typename...Args>
using R = qi::rule<It, Args...>;
R<ast::expr(), Skip> start, expr_, term_, unary_term;
R<ast::intlit()> int_;
R<std::string()> selector_;
R<boost::fusion::vector<ast::expr, std::vector<std::string>>, Skip> unary_msg;
g3(): g3::base_type(start)
{
namespace ph = boost::phoenix;
using namespace boost::spirit::qi;
int_ = qi::int_;
selector_ = lexeme[+qi::alnum];
term_ = int_;
unary_msg = term_ >> *selector_;
unary_term = unary_msg[ qi::_val = ph::bind(&collect_unary, qi::_1) ];
expr_ = unary_term;
start = expr_;
}
};
full code: http://coliru.stacked-crooked.com/a/e9afef4585ce76c3
Like cv_and_he mentions, add the parens.
Working example with many cleanup suggestions:
Notes
don't use using namespace at toplevel
don't use conflicting namespaces (using std and boost are very likely to lead to surprises or conflicts)
don't use internal attribute types like fusion::vector
use modern style BOOST_FUSION_ADAPT_STRUCT
some minor style issues
For example the following function
ast::expr collect_unary (const boost::fusion::vector<ast::expr, std::vector<std::string>>& parts)
//ast::expr collect_unary (const ast::expr& a, const std::vector<std::string>& msgs)
{
ast::expr res = boost::fusion::at_c<0>(parts);//a;
const auto& msgs = boost::fusion::at_c<1>(parts);
for(const auto& m: msgs)
{
ast::message msg;
msg.name = m;
msg.args.push_back(res);
res = msg;
}
return res;
}
was changed into:
ast::expr collect_unary(ast::expr accum, const std::vector<std::string>& msgs) {
for (const auto &m : msgs)
accum = ast::message { m, { accum } };
return accum;
}
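To see what that fold does in isolation, here is a minimal sketch in plain C++ that renders the nesting as a string instead of building an ast::expr. The name fold_unary and the string representation are assumptions made for illustration; they are not part of the answer's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for collect_unary: each selector wraps the
// expression accumulated so far, so the input "42 x y" nests as (y (x 42)).
std::string fold_unary(std::string accum, const std::vector<std::string>& msgs) {
    for (const auto& m : msgs)
        accum = "(" + m + " " + accum + ")";  // newest message becomes the outermost node
    return accum;
}
```

Taking the accumulator by value, as the rewritten collect_unary does, lets the loop reuse it as the running result without an extra local.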
Full Listing And Output
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <iostream>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace ast {
struct intlit {
int value;
intlit(int i = 0) : value(i) { }
intlit(intlit const&other) = default;
};
struct nil {};
struct message;
using expr = boost::make_recursive_variant<nil, intlit, message>::type;
struct message {
std::string name;
std::vector<ast::expr> args;
};
}
#include <boost/fusion/include/adapt_struct.hpp>
BOOST_FUSION_ADAPT_STRUCT(ast::intlit, value)
BOOST_FUSION_ADAPT_STRUCT(ast::message, name, args)
struct ast_print {
void operator()(ast::nil &) const { std::cout << "nil"; }
void operator()(ast::intlit &i) const { std::cout << i.value; }
void operator()(ast::message &m) const {
std::cout << "(" << m.name;
for (auto &it : m.args) {
std::cout << " ";
boost::apply_visitor(ast_print(), it);
}
std::cout << ")" << std::endl;
}
};
ast::expr collect_unary(ast::expr accum, const std::vector<std::string>& msgs)
{
for (const auto &m : msgs)
accum = ast::message { m, { accum } };
return accum;
}
template <class It, class Skip = boost::spirit::ascii::space_type> struct g3 : qi::grammar<It, ast::expr(), Skip> {
g3() : g3::base_type(start) {
using namespace boost::spirit::qi;
namespace ph = boost::phoenix;
int_ = qi::int_;
selector_ = +qi::alnum;
term_ = int_;
unary_msg = (term_ >> *selector_) [ _val = ph::bind(collect_unary, _1, _2) ];
unary_term = unary_msg;
expr_ = unary_term;
start = expr_;
}
private:
template <typename Attr, typename... Args> using R = qi::rule<It, Attr(), Args...>;
R<ast::expr, Skip> start, expr_, term_, unary_term, unary_msg;
R<ast::intlit> int_;
R<std::string> selector_;
};
template <class Parser, typename Result> bool test(const std::string &input, const Parser &parser, Result &result) {
auto first = input.begin(), last = input.end();
return qi::phrase_parse(first, last, parser, boost::spirit::ascii::space, result);
}
int main() {
std::string const input = "42 x y";
g3<std::string::const_iterator> p;
ast::expr res;
if (test(input, p, res)) {
std::cout << "parse ok " << std::endl;
boost::apply_visitor(ast_print(), res);
}
}
Prints
parse ok
(y (x 42)
)
I'm trying to write a parser to create an AST using boost::spirit. As a first step I'm trying to wrap numerical values in an AST node. This is the code I'm using:
AST_NodePtr make_AST_NodePtr(const int& i) {
return std::make_shared<AST_Node>(i);
}
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace l = qi::labels;
template<typename Iterator>
struct test_grammar : qi::grammar<Iterator, AST_NodePtr(), ascii::space_type> {
test_grammar() : test_grammar::base_type(test) {
test = qi::int_ [qi::_val = make_AST_NodePtr(qi::_1)];
}
qi::rule<Iterator, AST_NodePtr(), ascii::space_type> test;
};
As far as I understood from the documentation, qi::_1 should contain the value parsed by qi::int_, but the above code always gives me an error along the lines of
invalid initialization of reference of type ‘const int&’ from expression of type ‘const _1_type {aka const boost::phoenix::actor<boost::spirit::argument<0> >}
Why does this not work even though qi::_1 is supposed to hold the parsed value? How else would I parse the input into a custom AST?
You're using a regular function inside the semantic action.
This means that in the constructor the compiler will try to invoke that make_AST_NodePtr function with the argument supplied: qi::_1.
Q. Why does this not work even though qi::_1 is supposed to hold the parsed value?
A. qi::_1 does not hold the parsed value. It represents (is-a-placeholder-for) the first unbound argument in the function call.
This can /obviously/ never work. The function expects an integer.
So what gives?
You need to make a "lazy" or "deferred" function for use in the semantic action. Using only pre-supplied Boost Phoenix functors, you could spell it out:
test = qi::int_ [ qi::_val = px::construct<AST_NodePtr>(px::new_<AST_Node>(qi::_1)) ];
You don't need the helper function this way. But the result is both ugly and suboptimal. So, let's do better!
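The eager-versus-deferred distinction can be seen without Spirit at all. The following is a rough sketch in plain C++, where Action, run_action and deferred_make_node are hypothetical names standing in for a semantic action and for the parser that invokes it later. It is not how Phoenix is implemented, only an analogy.

```cpp
#include <cassert>
#include <functional>
#include <memory>

struct AST_Node {
    int value;
    explicit AST_Node(int v) : value(v) {}
};
using AST_NodePtr = std::shared_ptr<AST_Node>;

// Eager: runs as soon as it is called, so it needs a real int right now.
AST_NodePtr make_node(int i) { return std::make_shared<AST_Node>(i); }

// Deferred: a callable that will receive the parsed value later (hypothetical alias).
using Action = std::function<AST_NodePtr(int)>;

// Builds the action at "grammar construction" time; nothing is created yet.
Action deferred_make_node() {
    return [](int parsed) { return make_node(parsed); };
}

// Hypothetical stand-in for the parser calling the action at parse time.
AST_NodePtr run_action(const Action& action, int parsed) { return action(parsed); }
```

Calling make_node(qi::_1) in the constructor corresponds to the eager case: there is no integer yet, only a placeholder, which is why the compiler rejects it.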
Using a Phoenix Function wrapper
struct make_shared_f {
std::shared_ptr<AST_Node> operator()(int v) const {
return std::make_shared<AST_Node>(v);
}
};
px::function<make_shared_f> make_shared_;
With this defined, you can simplify the semantic action to:
test = qi::int_ [ qi::_val = make_shared_(qi::_1) ];
Actually, if you make it generic you can reuse it for many types:
template <typename T>
struct make_shared_f {
template <typename... Args>
std::shared_ptr<T> operator()(Args&&... args) const {
return std::make_shared<T>(std::forward<Args>(args)...);
}
};
px::function<make_shared_f<AST_Node> > make_shared_;
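Before wrapping it in px::function, the generic functor can be exercised directly to check that it forwards any number of constructor arguments. This is a sketch with hypothetical node types (IntNode, NameNode) and helper functions that do not appear in the question.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

template <typename T>
struct make_shared_f {
    template <typename... Args>
    std::shared_ptr<T> operator()(Args&&... args) const {
        return std::make_shared<T>(std::forward<Args>(args)...);
    }
};

// Hypothetical node types with different constructor arities.
struct IntNode {
    int value;
    explicit IntNode(int v) : value(v) {}
};
struct NameNode {
    std::string name;
    int line;
    NameNode(std::string n, int l) : name(std::move(n)), line(l) {}
};

// The same functor template serves both types.
std::shared_ptr<IntNode> make_int(int v) { return make_shared_f<IntNode>{}(v); }
std::shared_ptr<NameNode> make_name(const std::string& n, int l) {
    return make_shared_f<NameNode>{}(n, l);
}
```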
DEMO
Here's a self-contained example showing some style fixes in the process:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <memory>
struct AST_Node {
AST_Node(int v) : _value(v) {}
int value() const { return _value; }
private:
int _value;
};
using AST_NodePtr = std::shared_ptr<AST_Node>;
AST_NodePtr make_AST_NodePtr(const int& i) {
return std::make_shared<AST_Node>(i);
}
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
template<typename Iterator>
struct test_grammar : qi::grammar<Iterator, AST_NodePtr()> {
test_grammar() : test_grammar::base_type(start) {
using boost::spirit::ascii::space;
start = qi::skip(space) [ test ];
test = qi::int_ [ qi::_val = make_shared_(qi::_1) ];
}
private:
struct make_shared_f {
std::shared_ptr<AST_Node> operator()(int v) const {
return std::make_shared<AST_Node>(v);
}
};
px::function<make_shared_f> make_shared_;
//
qi::rule<Iterator, AST_NodePtr()> start;
qi::rule<Iterator, AST_NodePtr(), boost::spirit::ascii::space_type> test;
};
int main() {
AST_NodePtr parsed;
std::string const input ("42");
auto f = input.begin(), l = input.end();
test_grammar<std::string::const_iterator> g;
bool ok = qi::parse(f, l, g, parsed);
if (ok) {
std::cout << "Parsed: " << (parsed? std::to_string(parsed->value()) : "nullptr") << "\n";
} else {
std::cout << "Failed\n";
}
if (f!=l)
{
std::cout << "Remaining input: '" << std::string(f, l) << "'\n";
}
}
Prints
Parsed: 42
BONUS: Alternative using BOOST_PHOENIX_ADAPT_FUNCTION
You can actually use your free function if you wish, and use it as follows:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <memory>
struct AST_Node {
AST_Node(int v) : _value(v) {}
int value() const { return _value; }
private:
int _value;
};
using AST_NodePtr = std::shared_ptr<AST_Node>;
AST_NodePtr make_AST_NodePtr(int i) {
return std::make_shared<AST_Node>(i);
}
BOOST_PHOENIX_ADAPT_FUNCTION(AST_NodePtr, make_AST_NodePtr_, make_AST_NodePtr, 1)
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
template<typename Iterator>
struct test_grammar : qi::grammar<Iterator, AST_NodePtr()> {
test_grammar() : test_grammar::base_type(start) {
using boost::spirit::ascii::space;
start = qi::skip(space) [ test ] ;
test = qi::int_ [ qi::_val = make_AST_NodePtr_(qi::_1) ] ;
}
private:
qi::rule<Iterator, AST_NodePtr()> start;
qi::rule<Iterator, AST_NodePtr(), boost::spirit::ascii::space_type> test;
};
int main() {
AST_NodePtr parsed;
std::string const input ("42");
auto f = input.begin(), l = input.end();
test_grammar<std::string::const_iterator> g;
bool ok = qi::parse(f, l, g, parsed);
if (ok) {
std::cout << "Parsed: " << (parsed? std::to_string(parsed->value()) : "nullptr") << "\n";
} else {
std::cout << "Failed\n";
}
if (f!=l)
{
std::cout << "Remaining input: '" << std::string(f, l) << "'\n";
}
}
I am trying to create a parser using Boost's Spirit Qi. It parses a string that contains three types of values: a constant, a variable, or a function. Functions can be nested inside each other. The test string is f(a, b) = f(g(z, x), g(x, h(x)), c), where a-e are constants, f-r are functions, and s-z are variables. I successfully created a rule that correctly parses the expression. The trouble arose when I changed the function-parsing rule into a grammar. There were several errors that I was able to fix, and I almost got the grammar to parse the expression and turn it into an abstract syntax tree I created. However, I got this error about a file in the Boost library, and I cannot figure out where it is coming from because I don't understand the compiler message. I was following the employee example on the website for putting data from a parser into a struct: http://www.boost.org/doc/libs/1_41_0/libs/spirit/example/qi/employee.cpp
main.cpp
#include "Parser.h"
#include "Term.h"
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <iostream>
#include <list>
using std::string;
using std::cout;
using std::endl;
int main()
{
cout << "Unification Algorithm" << endl << endl;
string phrase = "f(a, b) = f(g(z, x), g(x, h(x)), c)";
string::const_iterator itr = phrase.begin();
string::const_iterator last = phrase.end();
cout << phrase << endl;
// Parser grammar
Parser<string::const_iterator> g;
// Output data
Expression expression;
if (phrase_parse(itr, last, g, boost::spirit::ascii::space, expression))
{
cout << "Expression parsed." << endl;
}
else
{
cout << "Could not parse expression." << endl;
}
}
Parser.h
#ifndef _Parser_h_
#define _Parser_h_
#include "Term.h"
#include <boost/spirit/include/qi.hpp>
#include <vector>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template <typename Iterator>
struct Parser : qi::grammar<Iterator, Expression(), ascii::space_type>
{
Parser() : Parser::base_type(expression)
{
using qi::char_;
const_char = char_("a-eA-E");
fn_char = char_("f-rF-R");
var_char = char_("s-zS-Z");
basic_fn = fn_char >> char_('(') >> (const_char | var_char) % char_(',') >> char_(')');
first_fn_wrapper = fn_char >> char_('(') >> (basic_fn | const_char | var_char) % char_(',') >> char_(')');
nested_fn = fn_char >> char_('(') >> (first_fn_wrapper | const_char | var_char) % char_(',') >> char_(')');
expression = nested_fn >> char_("=") >> nested_fn;
}
// Constant character a - e
qi::rule<Iterator, T_Cons, ascii::space_type> const_char;
// Function character f - r
qi::rule<Iterator, char(), ascii::space_type> fn_char;
// Variable character s - z
qi::rule<Iterator, T_Var, ascii::space_type> var_char;
// Allows for basic function parsing eg. f(x, y, z)
qi::rule<Iterator, T_Fn, ascii::space_type> basic_fn;
// Allows for single nested functions eg. f(g(x), y, z)
qi::rule<Iterator, T_Fn, ascii::space_type> first_fn_wrapper;
// Allows for fully nested functions eg. f(g(x, h(y)), z) and so on
qi::rule<Iterator, T_Fn, ascii::space_type> nested_fn;
// Full rule for a nested function expression
qi::rule<Iterator, Expression, ascii::space_type> expression;
};
#endif // _Parser_h_
Term.h
#ifndef _Term_h_
#define _Term_h_
#include <boost/fusion/include/adapt_struct.hpp>
#include <vector>
struct Term
{
char name;
};
BOOST_FUSION_ADAPT_STRUCT(Term, (char, name))
struct T_Cons : Term
{
};
BOOST_FUSION_ADAPT_STRUCT(T_Cons, (char, name))
struct T_Var : Term
{
};
BOOST_FUSION_ADAPT_STRUCT(T_Var, (char, name))
struct T_Fn : Term
{
std::vector<Term> * params;
T_Fn() { params = new std::vector<Term>(); }
~T_Fn() { delete params; }
};
BOOST_FUSION_ADAPT_STRUCT(T_Fn, (std::vector<Term>*, params))
struct Expression
{
Term lh_term;
Term rh_term;
};
BOOST_FUSION_ADAPT_STRUCT(Expression, (char, name) (Term, lh_term) (Term, rh_term))
#endif // _Term_h_
I cannot post the entire error message from the compiler because it is extremely long, but here are the last few errors it gave:
boost_1_46_0\boost\mpl\assert.hpp|360|error: no matching function for call to 'assertion_failed(mpl_::failed************ (boost::spirit::qi::grammar<Iterator, T1, T2, T3, T4>::grammar(const boost::spirit::qi::rule<Iterator_, T1_, T2_, T3_, T4_>&, const string&) [with Iterator_ = __gnu_cxx::__normal_iterator<const char*, std::basic_string<char> >; T1_ = Expression; T2_ = boost::proto::exprns_::expr<boost::proto::tag::terminal, boost::proto::argsns_::term<boost::spirit::tag::char_code<boost::spirit::tag::space, boost::spirit::char_encoding::asci|
boost_1_46_0\boost\proto\extends.hpp|540|error: use of deleted function 'boost::proto::exprns_::expr<boost::proto::tag::terminal, boost::proto::argsns_::term<boost::spirit::qi::reference<const boost::spirit::qi::rule<__gnu_cxx::__normal_iterator<const char*, std::basic_string<char> >, Expression(), boost::proto::exprns_::expr<boost::proto::tag::terminal, boost::proto::argsns_::term<boost::spirit::tag::char_code<boost::spirit::tag::space, boost::spirit::char_encoding::ascii> >, 0l>, boost::spirit::unused_type, boost::spirit::unused_type> > >, 0l>:|
boost_1_46_0\boost\proto\detail\expr0.hpp|165|error: no matching function for call to 'boost::spirit::qi::reference<const boost::spirit::qi::rule<__gnu_cxx::__normal_iterator<const char*, std::basic_string<char> >, Expression(), boost::proto::exprns_::expr<boost::proto::tag::terminal, boost::proto::argsns_::term<boost::spirit::tag::char_code<boost::spirit::tag::space, boost::spirit::char_encoding::ascii> >, 0l>, boost::spirit::unused_type, boost::spirit::unused_type> >::reference()'|
UPDATE Showing a simplified parser with a recursive AST parsing the sample expression shown
As always, the assertion message leads to exactly the problem:
// If you see the assertion below failing then the start rule
// passed to the constructor of the grammar is not compatible with
// the grammar (i.e. it uses different template parameters).
BOOST_SPIRIT_ASSERT_MSG(
(is_same<start_type, rule<Iterator_, T1_, T2_, T3_, T4_> >::value)
, incompatible_start_rule, (rule<Iterator_, T1_, T2_, T3_, T4_>));
So it tells you you should match the grammar with the start rule: you have
struct Parser : qi::grammar<Iterator, Expression(), ascii::space_type>
but
qi::rule<Iterator, Expression, ascii::space_type> expression;
Clearly you forgot parentheses there:
qi::rule<Iterator, Expression(), ascii::space_type> expression;
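Why the parentheses matter can be checked in plain C++: Expression() is a function type (a signature returning Expression), which is a different type from Expression itself. The sketch below uses only <type_traits>; the empty Expression struct is a stand-in, and how Spirit dispatches on its template arguments is not shown here.

```cpp
#include <cassert>
#include <type_traits>

struct Expression {};  // stand-in for the attribute type

// As a template argument, Expression() is parsed as a type:
// "function taking no arguments and returning Expression".
constexpr bool signature_is_function = std::is_function<Expression()>::value;
constexpr bool plain_is_function     = std::is_function<Expression>::value;
constexpr bool same_type             = std::is_same<Expression, Expression()>::value;

static_assert(signature_is_function, "Expression() is a function type");
static_assert(!plain_is_function, "Expression alone is a class type");
static_assert(!same_type, "so the two rule instantiations differ");
```

Since the two template arguments are distinct types, the rule declared with Expression is a different instantiation from the start rule type the grammar expects, which is what the assertion reports.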
Guidelines when using generic libraries:
Some of these "rules" are generically applicable, with the exception of no. 2 which is specifically related to Boost Spirit:
baby steps; start small (empty, even)
start with the AST to match the grammar exactly
build gradually,
compiling every step along the way
UPDATE
Here's a much simplified grammar. As mentioned, in the "first rules of spirit" just before, start with the AST to match the grammar exactly:
namespace ast {
namespace tag {
struct constant;
struct variable;
struct function;
}
template <typename Tag> struct Identifier { char name; };
using Constant = Identifier<tag::constant>;
using Variable = Identifier<tag::variable>;
using Function = Identifier<tag::function>;
struct FunctionCall;
using Expression = boost::make_recursive_variant<
Constant,
Variable,
boost::recursive_wrapper<FunctionCall>
>::type;
struct FunctionCall {
Function function;
std::vector<Expression> params;
};
struct Equation {
Expression lhs, rhs;
};
}
Of course this could be much simpler still since all identifiers are just char and you could do the switching dynamically (impression).
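As a rough illustration of that simpler alternative, a single identifier type could carry its kind as a runtime value derived from the letter ranges in the question (a-e constants, f-r functions, s-z variables). Kind, classify and make_identifier are hypothetical names for this sketch, not part of the grammar shown here.

```cpp
#include <cassert>
#include <cctype>

enum class Kind { constant, variable, function, unknown };

// Decide the kind from the character alone, ignoring case.
Kind classify(char c) {
    const char lower = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    if (lower >= 'a' && lower <= 'e') return Kind::constant;
    if (lower >= 'f' && lower <= 'r') return Kind::function;
    if (lower >= 's' && lower <= 'z') return Kind::variable;
    return Kind::unknown;
}

// One identifier type instead of three tagged ones.
struct Identifier {
    char name;
    Kind kind;
};

Identifier make_identifier(char c) { return Identifier{c, classify(c)}; }
```

The trade-off is that the compiler no longer distinguishes constants from variables by type, so mistakes show up at run time instead.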
Now, the grammar will have to follow: 1. keep it simple, 2. format carefully, 3. match the AST directly, 4. add debug macros:
template <typename It, typename Skipper = ascii::space_type>
struct Parser : qi::grammar<It, ast::Equation(), Skipper>
{
Parser() : Parser::base_type(equation_)
{
using namespace qi;
constant_ = qi::eps >> char_("a-eA-E");
function_ = qi::eps >> char_("f-rF-R");
variable_ = qi::eps >> char_("s-zS-Z");
function_call = function_ >> '(' >> -(expression_ % ',') >> ')';
expression_ = constant_ | variable_ | function_call;
equation_ = expression_ >> '=' >> expression_;
BOOST_SPIRIT_DEBUG_NODES((constant_)(function_)(variable_)(function_call)(expression_)(equation_))
}
qi::rule<It, ast::Constant()> constant_;
qi::rule<It, ast::Function()> function_;
qi::rule<It, ast::Variable()> variable_;
qi::rule<It, ast::FunctionCall(), Skipper> function_call;
qi::rule<It, ast::Expression(), Skipper> expression_;
qi::rule<It, ast::Equation(), Skipper> equation_;
};
Note how the comments have become completely unneeded. Also note how recursively using expression_ solved your biggest headache!
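The effect of the recursion can be seen in miniature in a hand-written recognizer that mirrors the same rules. This is only a sketch in plain C++ under the assumption that whitespace is skipped before every token; Recognizer and recognizes are hypothetical names, and it only accepts or rejects input without building an AST.

```cpp
#include <cassert>
#include <cctype>
#include <string>

struct Recognizer {
    const std::string& in;
    std::size_t pos = 0;

    void skip() {
        while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
    }
    bool lit(char c) {
        skip();
        if (pos < in.size() && in[pos] == c) { ++pos; return true; }
        return false;
    }
    // Matches one letter in [lo, hi], case-insensitively; consumes nothing on failure.
    bool in_range(char lo, char hi) {
        skip();
        if (pos >= in.size()) return false;
        const char c = static_cast<char>(std::tolower(static_cast<unsigned char>(in[pos])));
        if (c < lo || c > hi) return false;
        ++pos;
        return true;
    }
    // expression = constant | variable | function_call
    bool expression() {
        if (in_range('a', 'e') || in_range('s', 'z')) return true;
        return function_call();
    }
    // function_call = function '(' -(expression % ',') ')'
    bool function_call() {
        if (!in_range('f', 'r') || !lit('(')) return false;
        if (lit(')')) return true;           // empty argument list
        do {
            if (!expression()) return false;  // recursion handles any nesting depth
        } while (lit(','));
        return lit(')');
    }
    // equation = expression '=' expression
    bool equation() { return expression() && lit('=') && expression(); }
};

bool recognizes(const std::string& text) {
    Recognizer r{text};
    const bool ok = r.equation();
    r.skip();
    return ok && r.pos == text.size();  // require the whole input to be consumed
}
```

Because function_call calls expression and expression calls function_call, there is no need for separate rules per nesting level.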
Full Program
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace ast {
namespace tag {
struct constant;
struct variable;
struct function;
}
template <typename Tag> struct Identifier { char name; };
using Constant = Identifier<tag::constant>;
using Variable = Identifier<tag::variable>;
using Function = Identifier<tag::function>;
struct FunctionCall;
using Expression = boost::make_recursive_variant<
Constant,
Variable,
boost::recursive_wrapper<FunctionCall>
>::type;
struct FunctionCall {
Function function;
std::vector<Expression> params;
};
struct Equation {
Expression lhs, rhs;
};
}
BOOST_FUSION_ADAPT_STRUCT(ast::Constant, (char, name))
BOOST_FUSION_ADAPT_STRUCT(ast::Variable, (char, name))
BOOST_FUSION_ADAPT_STRUCT(ast::Function, (char, name))
BOOST_FUSION_ADAPT_STRUCT(ast::FunctionCall, (ast::Function, function)(std::vector<ast::Expression>, params))
BOOST_FUSION_ADAPT_STRUCT(ast::Equation, (ast::Expression, lhs)(ast::Expression, rhs))
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template <typename It, typename Skipper = ascii::space_type>
struct Parser : qi::grammar<It, ast::Equation(), Skipper>
{
Parser() : Parser::base_type(equation_)
{
using namespace qi;
constant_ = qi::eps >> char_("a-eA-E");
function_ = qi::eps >> char_("f-rF-R");
variable_ = qi::eps >> char_("s-zS-Z");
function_call = function_ >> '(' >> -(expression_ % ',') >> ')';
expression_ = constant_ | variable_ | function_call;
equation_ = expression_ >> '=' >> expression_;
BOOST_SPIRIT_DEBUG_NODES((constant_)(function_)(variable_)(function_call)(expression_)(equation_))
}
qi::rule<It, ast::Constant()> constant_;
qi::rule<It, ast::Function()> function_;
qi::rule<It, ast::Variable()> variable_;
qi::rule<It, ast::FunctionCall(), Skipper> function_call;
qi::rule<It, ast::Expression(), Skipper> expression_;
qi::rule<It, ast::Equation(), Skipper> equation_;
};
int main() {
std::cout << "Unification Algorithm\n\n";
std::string const phrase = "f(a, b) = f(g(z, x), g(x, h(x)), c)";
using It = std::string::const_iterator;
It itr = phrase.begin(), last = phrase.end();
std::cout << phrase << std::endl;
Parser<It> g;
ast::Equation parsed;
if (phrase_parse(itr, last, g, ascii::space, parsed)) {
std::cout << "Expression parsed.\n";
} else {
std::cout << "Could not parse equation.\n";
}
if (itr != last) {
std::cout << "Remaining unparsed input: '" << std::string(itr,last) << "'\n";
}
}
A vanilla C++ solution (as per popular request)
I compiled it with MSVC 2013.
Lack of unrestricted unions support led me to duplicate the 3 possible values of an argument.
There are workarounds for this limitation, but (like so many other things in C++) they are rather messy, so I kept them out to limit code obfuscation.
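For comparison, with a C++17 compiler the three duplicated members could be replaced by std::variant. The sketch below is an assumption about how that might look; the type names and the Dumper visitor are hypothetical, and the code that follows does not use it.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

struct Constant { char name; };
struct Variable { char name; };
struct Argument;  // defined below

struct FunCall {
    char name;
    std::vector<Argument> params;  // vector of an incomplete type is allowed since C++17
};

// Exactly one of the three alternatives is stored at a time.
struct Argument {
    std::variant<Constant, Variable, FunCall> value;
};

std::string dump(const Argument& a);

// Visitor producing the same "#const", "$var", "f(...)" notation as the dump() methods.
struct Dumper {
    std::string operator()(const Constant& c) const { return std::string("#") + c.name; }
    std::string operator()(const Variable& v) const { return std::string("$") + v.name; }
    std::string operator()(const FunCall& f) const {
        std::string res(1, f.name);
        res += '(';
        for (std::size_t i = 0; i != f.params.size(); ++i) {
            if (i != 0) res += ',';
            res += dump(f.params[i]);
        }
        res += ')';
        return res;
    }
};

std::string dump(const Argument& a) { return std::visit(Dumper{}, a.value); }
```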
#include <string>
#include <vector>
#include <iostream>
using namespace std;
// possible token types
enum tTokenType {
T_CONST, // s-z
T_VAR, // a-e
T_FUNC, // f-r
T_EQUAL, // =
T_COMMA, // ,
T_OBRACE, // (
T_CBRACE, // )
T_SPACE, // ' ' or '\t'
T_ERROR, // anything but spaces
T_EOI // end of input
};
// tokens
struct tToken {
tTokenType _type; // lexical element type
char _value; // the actual const/var/func letter
size_t _index; // position in translation unit
static const string constants, variables, functions, spacing;
static const char * type_name[];
tToken(tTokenType t, size_t index) : _type(t), _value(0), _index(index) {}
static tTokenType type(char c)
{
if (constants.find(c) != string::npos) return T_CONST;
if (variables.find(c) != string::npos) return T_VAR;
if (functions.find(c) != string::npos) return T_FUNC;
if (spacing .find(c) != string::npos) return T_SPACE;
if (c == '=') return T_EQUAL;
if (c == ',') return T_COMMA;
if (c == '(') return T_OBRACE;
if (c == ')') return T_CBRACE;
return T_ERROR;
}
tToken(char c, size_t index) : _value(c), _index(index)
{
_type = type(c);
}
void croak(tTokenType type)
{
string err(_index > 0 ? _index - 1 : 0, '-'); // avoid size_t underflow when _index is 0
cerr << err << "^ expecting " << type_name[(int)type] << "\n";
}
};
const string tToken::variables("abcde");
const string tToken::functions("fghijklmnopqr");
const string tToken::constants("stuvwxyz");
const string tToken::spacing (" \t");
const char * tToken::type_name[] = { "constant", "variable", "function", "=", ",", "(", ")", "space", "error", "end of input" };
// parser
class Parser {
friend class Compiler;
string _input; // remaining program input
size_t _index; // current token index (for error tracking)
void skip_spaces(void)
{
while (_input.length() != 0 && tToken::type(_input[0]) == T_SPACE) next();
}
void next(void)
{
_input.erase(0, 1);
_index++;
}
public:
void read (string program)
{
_input = program;
_index = 0;
skip_spaces();
}
tToken get(void)
{
tToken res = peek();
next();
skip_spaces();
return res;
}
tToken peek(void)
{
if (_input.length() == 0) return tToken(T_EOI, _index);
return tToken (_input[0], _index);
}
tToken accept(tTokenType type)
{
tToken t = get();
return (t._type == type) ? t : tToken (T_ERROR, _index-1);
}
bool consume(tTokenType type)
{
tToken t = get();
bool res = t._type == type;
if (!res) t.croak(type);
return res;
}
};
// syntactic elements
struct tSyntacticElement {
char name;
bool valid;
tSyntacticElement() : name('?'), valid(false) {}
tSyntacticElement(char c) : name(c), valid(false) {}
};
class tConstant : private tSyntacticElement {
friend class tArgument;
tConstant() {}
tConstant(tToken t) : tSyntacticElement(t._value) { }
};
class tVariable : private tSyntacticElement {
friend class tArgument;
tVariable() {}
tVariable(tToken t) : tSyntacticElement(t._value) { }
};
class tFunCall : private tSyntacticElement {
friend class Compiler;
friend class tProgram;
friend class tArgument;
vector<tArgument>params;
tFunCall() {}
tFunCall(tToken t) : tSyntacticElement(t._value) { }
void add_argument(tArgument a);
string dump(void);
};
class tArgument {
friend class Compiler;
friend class tProgram;
friend class tFunCall;
tTokenType type;
// MSVC 2013 does not support unrestricted unions, so for the
// sake of simplicity I'll leave it as 3 separate attributes
tConstant c;
tVariable v;
tFunCall f;
tArgument() {}
tArgument(tToken val) : type(val._type)
{
if (val._type == T_CONST) c = val;
if (val._type == T_VAR ) v = val;
}
tArgument(tFunCall f) : type(T_FUNC ), f(f) {}
string dump(void)
{
if (type == T_VAR) return string("$") + v.name;
if (type == T_CONST) return string("#") + c.name;
if (type == T_FUNC) return f.dump();
return "!";
}
};
class tProgram {
friend class Compiler;
tArgument left;
tArgument right;
bool valid;
string dump(void) { return left.dump() + " = " + right.dump(); }
};
// syntactic analyzer
void tFunCall::add_argument(tArgument a) { params.push_back(a); }
string tFunCall::dump(void)
{
string res(1, name);
res += '(';
// it's 2015 and still no implode() in C++...
for (size_t i = 0; i != params.size(); i++)
{
res += params[i].dump();
if (i != params.size() - 1) res += ',';
}
res += ')';
return res;
}
class Compiler {
Parser parser;
tProgram program;
tFunCall parse_function(void)
{
tToken f = parser.accept(T_FUNC);
tFunCall res (f);
parser.accept(T_OBRACE);
for (;;)
{
tArgument a = parse_argument();
res.add_argument(a);
tToken next = parser.get();
if (next._type == T_CBRACE) break;
if (next._type != T_COMMA) return res;
}
res.valid = true;
return res;
}
tArgument parse_argument(void)
{
tToken id = parser.peek();
if (id._type == T_FUNC) return parse_function();
id = parser.get();
if (id._type == T_CONST) return id;
if (id._type == T_VAR) return id;
return tArgument(tToken (T_ERROR, id._index));
}
public:
void analyze(string input)
{
parser.read(input);
program.valid = true; // must be initialized before the '&=' updates below
cerr << input << "\n";
program.left = parse_argument();
program.valid &= parser.consume(T_EQUAL);
program.right = parse_argument();
program.valid &= parser.consume(T_EOI);
}
string dump(void)
{
return program.dump();
}
};
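int main(int argc, char * argv[])
{
// refuse to run without an expression instead of dereferencing a null argv[1]
if (argc < 2)
{
cerr << "usage: ExpressionCompiler \"expression\"\n";
return 1;
}
Compiler compiler;
// compiler.analyze("f(a, b) = f(g(z, x), g(x, h(x)), c)");
compiler.analyze(argv[1]);
cout << compiler.dump() << "\n";
return 0;
}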
Grammar
Given the rather terse problem definition, I invented a grammar that should at least match the test input:
program : argument = argument
argument: variable
| constant
| fun_call
fun_call: fun_name ( arg_list )
arg_list: argument
| argument , arg_list
Parsing
Given the simplicity of the syntax, parsing is pretty straightforward.
Each character is basically something valid, a space or something invalid.
Spaces are silently consumed, so that the analyzer only gets useful tokens to process.
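To make the classification concrete, here is a minimal standalone sketch of the idea. The names CharClass, classify and strip_spaces are hypothetical and not part of the program above; the character sets are copied from the tToken tables. Unlike the real Parser, this sketch does not track positions for error reporting.

```cpp
#include <string>

// Hypothetical names; the character sets mirror the tToken tables above.
enum class CharClass { Constant, Variable, Function, Punctuation, Space, Invalid };

// Classifies a single input character.
CharClass classify(char c)
{
    static const std::string variables("abcde");
    static const std::string functions("fghijklmnopqr");
    static const std::string constants("stuvwxyz");
    static const std::string spacing(" \t");
    static const std::string punctuation("=,()");
    if (variables.find(c)   != std::string::npos) return CharClass::Variable;
    if (functions.find(c)   != std::string::npos) return CharClass::Function;
    if (constants.find(c)   != std::string::npos) return CharClass::Constant;
    if (spacing.find(c)     != std::string::npos) return CharClass::Space;
    if (punctuation.find(c) != std::string::npos) return CharClass::Punctuation;
    return CharClass::Invalid;
}

// Drops every space so that only meaningful (or invalid) characters remain.
std::string strip_spaces(const std::string& input)
{
    std::string out;
    for (char c : input)
        if (classify(c) != CharClass::Space) out += c;
    return out;
}
```

Calling strip_spaces("f(a, b) = z") leaves "f(a,b)=z", which is all the analyzer ever needs to look at.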
Analyze
Since I'm doing this barehanded, I simply define a function for each grammatical rule (program, fun_call, arg_list, argument).
The grammar is predictive (LL(1) in textbook terms, once arg_list is left-factored: one token of lookahead is enough to pick the rule) and there are no arithmetic expressions, so the code is relatively lightweight.
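The one-function-per-rule idea can be shown in isolation. The sketch below is a stripped-down recognizer for the same grammar, not the Compiler class above: Recognizer and matches are hypothetical names, it builds no syntax tree, and it stops at the first error instead of substituting "!".

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical recognizer: one member function per grammar rule.
// It only answers "does the input match?" and builds no tree.
class Recognizer {
    std::string in_;
    std::size_t pos_ = 0;

    static bool one_of(char c, const char* set)
    {
        return c != '\0' && std::string(set).find(c) != std::string::npos;
    }
    // Skips spaces, then returns the current character ('\0' at end of input).
    char peek()
    {
        while (pos_ < in_.size() && (in_[pos_] == ' ' || in_[pos_] == '\t')) ++pos_;
        return pos_ < in_.size() ? in_[pos_] : '\0';
    }
    // Consumes the expected character if it is next.
    bool eat(char expected)
    {
        if (peek() != expected) return false;
        ++pos_;
        return true;
    }
    // argument: variable | constant | fun_call
    bool argument()
    {
        char c = peek();                       // a single character of lookahead picks the rule
        if (one_of(c, "fghijklmnopqr")) return fun_call();
        if (one_of(c, "abcde") || one_of(c, "stuvwxyz")) { ++pos_; return true; }
        return false;
    }
    // fun_call: fun_name ( arg_list )
    bool fun_call()
    {
        ++pos_;                                // fun_name, already checked by argument()
        return eat('(') && arg_list() && eat(')');
    }
    // arg_list: argument | argument , arg_list   (written as a loop)
    bool arg_list()
    {
        if (!argument()) return false;
        while (eat(','))
            if (!argument()) return false;
        return true;
    }
public:
    explicit Recognizer(std::string input) : in_(std::move(input)) {}
    // program: argument = argument, followed by end of input
    bool program()
    {
        return argument() && eat('=') && argument() && peek() == '\0';
    }
};

bool matches(const std::string& input) { return Recognizer(input).program(); }
```

With this sketch, matches("f(a) = z") is true, while the "+" variant from the sample runs is rejected at the point where "=" is expected.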
Error reporting
Bah, just the barest minimum, and I did not really test it.
Proper error handling can easily double code size (even with yacc), so I drew the line early.
Invalid characters will be replaced by "!", and some expected symbols will be pointed at, in the style of vintage C compiler output.
There are absolutely no re-synchronization attempts, so a typo inside a function call (especially a brace imbalance) will likely send the rest of the translation unit to the bin.
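For reference, the pointer line itself is easy to reproduce. The helper below is a hypothetical sketch (caret_line is not part of the program above) that mirrors the layout printed by tToken::croak(); the guard for index 0 is a defensive addition of mine.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the layout printed by tToken::croak():
// (index - 1) dashes, a caret, then the name of the expected symbol.
std::string caret_line(std::size_t index, const std::string& expected)
{
    // Defensive guard: index - 1 would wrap around for index == 0.
    const std::size_t dashes = (index == 0) ? 0 : index - 1;
    return std::string(dashes, '-') + "^ expecting " + expected;
}
```

For a token at index 8 and an expected "=", this yields seven dashes followed by "^ expecting =".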
Using the hard earned syntactic tree
The mighty compiler manages to spit out an equivalent of the input.
Just to show that something was done besides trimming whitespace, variables are preceded by a '$' and constants by a '#' (showing a deplorable lack of imagination).
Sample output
ExpressionCompiler "f(a) = z"
f(a) = z
f($a) = #z
ExpressionCompiler "f(a) = f(c,z)"
f(a) = f(c,z)
f($a) = f($c,#z)
ExpressionCompiler "f(a, b) = f(g(z, x), g(x, h(x)), c)"
f(a, b) = f(g(z, x), g(x, h(x)), c)
f($a,$b) = f(g(#z,#x),g(#x,h(#x)),$c)
ExpressionCompiler "f(a, b) + f(g(z, x), g(x, h(x)), c)"
f(a, b) + f(g(z, x), g(x, h(x)), c)
-------^ expecting =
f($a,$b) = f(g(#z,#x),g(#x,h(#x)),$c)
ExpressionCompiler "f(A, b) = f(g(z, x), g(x, h(x)), c)"
f(A, b) = f(g(z, x), g(x, h(x)), c)
f(!,$b) = f(g(#z,#x),g(#x,h(#x)),$c)
ExpressionCompiler "f(a, b) = K(g(z, x), g(x, h(x)), c)"
f(a, b) = K(g(z, x), g(x, h(x)), c)
----------^ expecting end of input
f($a,$b) = !
I want to parse something like the following:
1;2
=1200
3;4
5;6
Lines can appear in any order. There can be more than one line starting with the = sign, and only the last one matters; lines containing a ; represent a pair of values that I want to store in a map. Reading the answer to this question I came up with some code that should be good enough (sorry, I'm still a noob with Spirit) and should do what I'm trying to achieve. Here's the code:
#define BOOST_SPIRIT_USE_PHOENIX_V3
#define DATAPAIR_PAIR
#include <iostream>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/mpl/bool.hpp>
#include <map>
#if !defined(DATAPAIR_PAIR)
#include <vector>
#endif
static const char g_data[] = "1;2\n=1200\n3;4\n5;6\n";
typedef std::string DataTypeFirst;
#if defined(DATAPAIR_PAIR)
typedef std::string DataTypeSecond;
typedef std::pair<DataTypeFirst, DataTypeSecond> DataPair;
typedef std::map<DataTypeFirst, DataTypeSecond> DataMap;
#else
typedef std::vector<DataTypeFirst> DataPair;
typedef std::map<DataTypeFirst, DataTypeFirst> DataMap;
#endif
struct MyContainer {
DataMap data;
double number;
};
namespace boost { namespace spirit { namespace traits {
template<> struct is_container<MyContainer> : boost::mpl::true_ {};
template<>
struct container_value<MyContainer> {
typedef boost::variant<double, DataPair> type;
};
template <>
struct push_back_container<MyContainer, double> {
static bool call ( MyContainer& parContainer, double parValue ) {
parContainer.number = parValue;
return true;
}
};
template <>
struct push_back_container<MyContainer, DataPair> {
static bool call ( MyContainer& parContainer, const DataPair& parValue ) {
#if defined(DATAPAIR_PAIR)
parContainer.data[parValue.first] = parValue.second;
#else
parContainer.data[parValue[0]] = parValue[1];
#endif
return true;
}
};
} } }
template <typename Iterator>
struct TestGrammar : boost::spirit::qi::grammar<Iterator, MyContainer()> {
TestGrammar ( void );
boost::spirit::qi::rule<Iterator, MyContainer()> start;
boost::spirit::qi::rule<Iterator, DataPair()> data;
boost::spirit::qi::rule<Iterator, double()> num;
};
template <typename Iterator>
TestGrammar<Iterator>::TestGrammar() :
TestGrammar::base_type(start)
{
using boost::spirit::qi::alnum;
using boost::spirit::qi::lit;
using boost::spirit::ascii::char_;
using boost::spirit::qi::double_;
using boost::spirit::qi::eol;
using boost::spirit::qi::eoi;
start %= *((num | data) >> (eol | eoi));
data = +alnum >> lit(";") >> +alnum;
num = '=' >> double_;
}
int main() {
std::cout << "Parsing data:\n" << g_data << "\n";
TestGrammar<const char*> gramm;
MyContainer result;
boost::spirit::qi::parse(static_cast<const char*>(g_data),
g_data + sizeof(g_data) / sizeof(g_data[0]) - 1,
gramm,
result
);
std::cout << "Parsed data:\n";
std::cout << "Result: " << result.number << "\n";
for (const auto& p : result.data) {
std::cout << p.first << " = " << p.second << '\n';
}
return 0;
}
I'm developing this on Gentoo Linux, using dev-libs/boost-1.55.0-r2:0/1.55.0 and gcc (Gentoo 4.8.3 p1.1, pie-0.5.9) 4.8.3. Compiling the above code I get an error like
/usr/include/boost/spirit/home/support/container.hpp:278:13: error: ‘struct MyContainer’ has no member named ‘insert’
As a workaround, I came up with the alternative code you get by commenting out the "#define DATAPAIR_PAIR" line. In that case the code compiles and works, but what I really want is a pair where I can, for example, mix std::string and int values. Why does using std::pair as the attribute for my data rule cause the compiler to miss the correct specialization of push_back_container? Is it possible to fix the code and get it working, either using std::pair or anything equivalent?
I'd simplify this by /just/ not treating things like a container and not-a-container at the same time. So for this particular situation I might deviate from my usual mantra (avoid semantic actions) and use them¹:
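// Assumes: namespace qi = boost::spirit::qi; namespace phx = boost::phoenix;
// MyContainer::Pair is the pair type inserted into MyContainer::data (definition not shown here).
template <typename It, typename Skipper = qi::blank_type>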
struct grammar : qi::grammar<It, MyContainer(), Skipper> {
grammar() : grammar::base_type(start) {
update_number = '=' > qi::double_ [ qi::_r1 = qi::_1 ];
map_entry = qi::int_ > ';' > qi::int_;
auto number = phx::bind(&MyContainer::number, qi::_val);
auto data = phx::bind(&MyContainer::data, qi::_val);
start = *(
( update_number(number)
| map_entry [ phx::insert(data, phx::end(data), qi::_1) ]
)
>> qi::eol);
}
private:
qi::rule<It, void(double&), Skipper> update_number;
qi::rule<It, MyContainer::Pair(), Skipper> map_entry;
qi::rule<It, MyContainer(), Skipper> start;
};
If you can afford a (0;0) entry in your map, you can even dispense with the grammar:
std::map<int, int> data;
double number;
bool ok = qi::phrase_parse(f, l,
*(
(qi::omit['=' > qi::double_ [phx::ref(number)=qi::_1]]
| (qi::int_ > ';' > qi::int_)
) >> qi::eol)
, qi::blank, data);
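When checking what either parser produces, it can help to have a baseline that does not involve Spirit at all. The sketch below is plain standard-library C++, not Spirit code; ParsedInput and parse_lines are hypothetical names, and it assumes integer pairs as in the snippets above. It is lenient about trailing characters after the value.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical result type and function name, for illustration only.
struct ParsedInput {
    std::map<int, int> data;
    double number = 0.0;
};

// Returns false on the first malformed line; blank lines are skipped.
bool parse_lines(const std::string& text, ParsedInput& out)
{
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty()) continue;
        std::istringstream fields(line);
        if (line[0] == '=') {
            char eq;
            double value;
            if (!(fields >> eq >> value)) return false;
            out.number = value;             // a later '=' line overwrites an earlier one
        } else {
            int key, value;
            char sep;
            if (!(fields >> key >> sep >> value) || sep != ';') return false;
            out.data[key] = value;          // "key;value" lines fill the map
        }
    }
    return true;
}
```

Feeding it the sample input from the question should give number 1200 and the three pairs 1;2, 3;4 and 5;6, which is the result to expect from the parsers above as well (apart from the possible (0;0) entry mentioned for the grammar-less version).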
I can try to make your "advanced spirit" approach work too, but it might take a while :)
¹ I use auto for readability, but of course you don't need to; just repeat the subexpressions inline or use BOOST_AUTO. Note that this is not generally good advice for stateful parser expressions (see BOOST_SPIRIT_AUTO).