parsing chemical formula with mixtures of elements

parsing chemical formula with mixtures of elements - c++

I would like to use boost::spirit in order to extract the stoichiometry of compounds made of several elements from a brute formula. Within a given compound, my parser should be able to distinguish three kind of chemical element patterns:
natural element made of a mixture of isotopes in natural abundance
pure isotope
mixture of isotopes in non-natural abundance
Those patterns are then used to parse such following compounds:
"C" --> natural carbon made of C[12] and C[13] in natural abundance
"CH4" --> methane made of natural carbon and hydrogen
"C2H{H[1](0.8)H[2](0.2)}6" --> ethane made of natural C and non-natural H made of 80% of hydrogen and 20% of deuterium
"U[235]" --> pure uranium 235
Obviously, the chemical element patterns can be in any order (e.g. CH[1]4 and H[1]4C ...) and frequencies.
I wrote my parser which is quite close to do the job but I still face one problem.
Here is my code:
template <typename Iterator>
struct ChemicalFormulaParser : qi::grammar<Iterator,isotopesMixture(),qi::locals<isotopesMixture,double>>
{
ChemicalFormulaParser(): ChemicalFormulaParser::base_type(_start)
{
namespace phx = boost::phoenix;
// Semantic action for handling the case of pure isotope
phx::function<PureIsotopeBuilder> const build_pure_isotope = PureIsotopeBuilder();
// Semantic action for handling the case of pure isotope mixture
phx::function<IsotopesMixtureBuilder> const build_isotopes_mixture = IsotopesMixtureBuilder();
// Semantic action for handling the case of natural element
phx::function<NaturalElementBuilder> const build_natural_element = NaturalElementBuilder();
phx::function<UpdateElement> const update_element = UpdateElement();
// XML database that store all the isotopes of the periodical table
ChemicalDatabaseManager<Isotope>* imgr=ChemicalDatabaseManager<Isotope>::Instance();
const auto& isotopeDatabase=imgr->getDatabase();
// Loop over the database to the spirit symbols for the isotopes names (e.g. H[1],C[14]) and the elements (e.g. H,C)
for (const auto& isotope : isotopeDatabase) {
_isotopeNames.add(isotope.second.getName(),isotope.second.getName());
_elementSymbols.add(isotope.second.getProperty<std::string>("symbol"),isotope.second.getProperty<std::string>("symbol"));
}
_mixtureToken = "{" >> +(_isotopeNames >> "(" >> qi::double_ >> ")") >> "}";
_isotopesMixtureToken = (_elementSymbols[qi::_a=qi::_1] >> _mixtureToken[qi::_b=qi::_1])[qi::_pass=build_isotopes_mixture(qi::_val,qi::_a,qi::_b)];
_pureIsotopeToken = (_isotopeNames[qi::_a=qi::_1])[qi::_pass=build_pure_isotope(qi::_val,qi::_a)];
_naturalElementToken = (_elementSymbols[qi::_a=qi::_1])[qi::_pass=build_natural_element(qi::_val,qi::_a)];
_start = +( ( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken)[qi::_a=qi::_1] >>
(qi::double_|qi::attr(1.0))[qi::_b=qi::_1])[qi::_pass=update_element(qi::_val,qi::_a,qi::_b)] );
}
//! Defines the rule for matching a prefix
qi::symbols<char,std::string> _isotopeNames;
qi::symbols<char,std::string> _elementSymbols;
qi::rule<Iterator,isotopesMixture()> _mixtureToken;
qi::rule<Iterator,isotopesMixture(),qi::locals<std::string,isotopesMixture>> _isotopesMixtureToken;
qi::rule<Iterator,isotopesMixture(),qi::locals<std::string>> _pureIsotopeToken;
qi::rule<Iterator,isotopesMixture(),qi::locals<std::string>> _naturalElementToken;
qi::rule<Iterator,isotopesMixture(),qi::locals<isotopesMixture,double>> _start;
};
Basically each separate element pattern can be parsed properly with their respective semantic action which produces as ouput a map between the isotopes that builds the compound and their corresponding stoichiometry. The problem starts when parsing the following compound:
CH{H[1](0.9)H[2](0.4)}
In such case the semantic action build_isotopes_mixture return false because 0.9+0.4 is non sense for a sum of ratio. Hence I would have expected and wanted my parser to fail for this compound. However, because of the _start rule which uses alternative operator for the three kind of chemical element pattern, the parser manages to parse it by 1) throwing away the {H[1](0.9)H[2](0.4)} part 2) keeping the preceding H 3) parsing it using the _naturalElementToken. Is my grammar not clear enough for being expressed as a parser ? How to use the alternative operator in such a way that, when an occurrence has been found but gave a false when running the semantic action, the parser stops ?

How to use the alternative operator in such a way that, when an occurrence has been found but gave a false when running the semantic action, the parser stops ?
In general, you achieve this by adding an expectation point to prevent backtracking.
In this case you are actually "conflating" several tasks:
matching input
interpreting matched input
validating matched input
Spirit excels at matching input, has
great facilities when it comes to interpreting (mostly in the sense of AST creation). However, things get "nasty" with validating on the fly.
An advice I often repeat is to consider separating the concerns whenever possible. I'd consider
building a direct AST representation of the input first,
transforming/normalizing/expanding/canonicalizing to a more convenient or meaningful domain representation
do final validations on the result
This gives you the most expressive code while keeping it highly maintainable.
Because I don't understand the problem domain well enough and the code sample is not nearly complete enough to induce it, I will not try to give a full sample of what I have in mind. Instead I'll try my best at sketching the expectation point approach I mentioned at the outset.
Mock Up Sample To Compile
This took the most time. (Consider doing the leg work for the people who are going to help you)
Live On Coliru
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <map>
namespace qi = boost::spirit::qi;
struct DummyBuilder {
using result_type = bool;
template <typename... Ts>
bool operator()(Ts&&...) const { return true; }
};
struct PureIsotopeBuilder : DummyBuilder { };
struct IsotopesMixtureBuilder : DummyBuilder { };
struct NaturalElementBuilder : DummyBuilder { };
struct UpdateElement : DummyBuilder { };
struct Isotope {
std::string getName() const { return _name; }
Isotope(std::string const& name = "unnamed", std::string const& symbol = "?") : _name(name), _symbol(symbol) { }
template <typename T> std::string getProperty(std::string const& name) const {
if (name == "symbol")
return _symbol;
throw std::domain_error("no such property (" + name + ")");
}
private:
std::string _name, _symbol;
};
using MixComponent = std::pair<Isotope, double>;
using isotopesMixture = std::list<MixComponent>;
template <typename Isotope>
struct ChemicalDatabaseManager {
static ChemicalDatabaseManager* Instance() {
static ChemicalDatabaseManager s_instance;
return &s_instance;
}
auto& getDatabase() { return _db; }
private:
std::map<int, Isotope> _db {
{ 1, { "H[1]", "H" } },
{ 2, { "H[2]", "H" } },
{ 3, { "Carbon", "C" } },
{ 4, { "U[235]", "U" } },
};
};
template <typename Iterator>
struct ChemicalFormulaParser : qi::grammar<Iterator, isotopesMixture(), qi::locals<isotopesMixture, double> >
{
ChemicalFormulaParser(): ChemicalFormulaParser::base_type(_start)
{
using namespace qi;
namespace phx = boost::phoenix;
phx::function<PureIsotopeBuilder> build_pure_isotope; // Semantic action for handling the case of pure isotope
phx::function<IsotopesMixtureBuilder> build_isotopes_mixture; // Semantic action for handling the case of pure isotope mixture
phx::function<NaturalElementBuilder> build_natural_element; // Semantic action for handling the case of natural element
phx::function<UpdateElement> update_element;
// XML database that store all the isotopes of the periodical table
ChemicalDatabaseManager<Isotope>* imgr = ChemicalDatabaseManager<Isotope>::Instance();
const auto& isotopeDatabase=imgr->getDatabase();
// Loop over the database to the spirit symbols for the isotopes names (e.g. H[1],C[14]) and the elements (e.g. H,C)
for (const auto& isotope : isotopeDatabase) {
_isotopeNames.add(isotope.second.getName(),isotope.second.getName());
_elementSymbols.add(isotope.second.template getProperty<std::string>("symbol"),isotope.second.template getProperty<std::string>("symbol"));
}
_mixtureToken = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}";
_isotopesMixtureToken = (_elementSymbols[_a=_1] >> _mixtureToken[_b=_1])[_pass=build_isotopes_mixture(_val,_a,_b)];
_pureIsotopeToken = (_isotopeNames[_a=_1])[_pass=build_pure_isotope(_val,_a)];
_naturalElementToken = (_elementSymbols[_a=_1])[_pass=build_natural_element(_val,_a)];
_start = +( ( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken)[_a=_1] >>
(double_|attr(1.0))[_b=_1]) [_pass=update_element(_val,_a,_b)] );
}
private:
//! Defines the rule for matching a prefix
qi::symbols<char, std::string> _isotopeNames;
qi::symbols<char, std::string> _elementSymbols;
qi::rule<Iterator, isotopesMixture()> _mixtureToken;
qi::rule<Iterator, isotopesMixture(), qi::locals<std::string, isotopesMixture> > _isotopesMixtureToken;
qi::rule<Iterator, isotopesMixture(), qi::locals<std::string> > _pureIsotopeToken;
qi::rule<Iterator, isotopesMixture(), qi::locals<std::string> > _naturalElementToken;
qi::rule<Iterator, isotopesMixture(), qi::locals<isotopesMixture, double> > _start;
};
int main() {
using It = std::string::const_iterator;
ChemicalFormulaParser<It> parser;
for (std::string const input : {
"C", // --> natural carbon made of C[12] and C[13] in natural abundance
"CH4", // --> methane made of natural carbon and hydrogen
"C2H{H[1](0.8)H[2](0.2)}6", // --> ethane made of natural C and non-natural H made of 80% of hydrogen and 20% of deuterium
"C2H{H[1](0.9)H[2](0.2)}6", // --> invalid mixture (total is 110%?)
"U[235]", // --> pure uranium 235
})
{
std::cout << " ============= '" << input << "' ===========\n";
It f = input.begin(), l = input.end();
isotopesMixture mixture;
bool ok = qi::parse(f, l, parser, mixture);
if (ok)
std::cout << "Parsed successfully\n";
else
std::cout << "Parse failure\n";
if (f != l)
std::cout << "Remaining input unparsed: '" << std::string(f, l) << "'\n";
}
}
Which, as given, just prints
============= 'C' ===========
Parsed successfully
============= 'CH4' ===========
Parsed successfully
============= 'C2H{H[1](0.8)H[2](0.2)}6' ===========
Parsed successfully
============= 'C2H{H[1](0.9)H[2](0.2)}6' ===========
Parsed successfully
============= 'U[235]' ===========
Parsed successfully
General remarks:
no need for the locals, just use the regular placeholders:
_mixtureToken = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}";
_isotopesMixtureToken = (_elementSymbols >> _mixtureToken) [ _pass=build_isotopes_mixture(_val, _1, _2) ];
_pureIsotopeToken = _isotopeNames [ _pass=build_pure_isotope(_val, _1) ];
_naturalElementToken = _elementSymbols [ _pass=build_natural_element(_val, _1) ];
_start = +(
( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken) >>
(double_|attr(1.0)) ) [ _pass=update_element(_val, _1, _2) ]
);
// ....
qi::rule<Iterator, isotopesMixture()> _mixtureToken;
qi::rule<Iterator, isotopesMixture()> _isotopesMixtureToken;
qi::rule<Iterator, isotopesMixture()> _pureIsotopeToken;
qi::rule<Iterator, isotopesMixture()> _naturalElementToken;
qi::rule<Iterator, isotopesMixture()> _start;
you will want to handle conflicts between names/symbols (possibly just by prioritizing one or the other)
conforming compilers will require the template qualifier (unless I totally mis-guessed your datastructure, in which case I don't know what the template argument to ChemicalDatabaseManager was supposed to mean).
Hint, MSVC is not a standards-conforming compiler
Live On Coliru
Expectation Point Sketch
Assuming that the "weights" need to add up to 100% inside the _mixtureToken rule, we can either make build_isotopes_micture "not dummy" and add the validation:
struct IsotopesMixtureBuilder {
bool operator()(isotopesMixture&/* output*/, std::string const&/* elementSymbol*/, isotopesMixture const& mixture) const {
using namespace boost::adaptors;
// validate weights total only
return std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
}
};
However, as you note, it will thwart things by backtracking. Instead you might /assert/ that any complete mixture add up to 100%:
_mixtureToken = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}" > eps(validate_weight_total(_val));
With something like
struct ValidateWeightTotal {
bool operator()(isotopesMixture const& mixture) const {
using namespace boost::adaptors;
bool ok = std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
return ok;
// or perhaps just :
return ok? ok : throw InconsistentsWeights {};
}
struct InconsistentsWeights : virtual std::runtime_error {
InconsistentsWeights() : std::runtime_error("InconsistentsWeights") {}
};
};
Live On Coliru
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/range/adaptors.hpp>
#include <boost/range/numeric.hpp>
#include <map>
namespace qi = boost::spirit::qi;
struct DummyBuilder {
using result_type = bool;
template <typename... Ts>
bool operator()(Ts&&...) const { return true; }
};
struct PureIsotopeBuilder : DummyBuilder { };
struct NaturalElementBuilder : DummyBuilder { };
struct UpdateElement : DummyBuilder { };
struct Isotope {
std::string getName() const { return _name; }
Isotope(std::string const& name = "unnamed", std::string const& symbol = "?") : _name(name), _symbol(symbol) { }
template <typename T> std::string getProperty(std::string const& name) const {
if (name == "symbol")
return _symbol;
throw std::domain_error("no such property (" + name + ")");
}
private:
std::string _name, _symbol;
};
using MixComponent = std::pair<Isotope, double>;
using isotopesMixture = std::list<MixComponent>;
struct IsotopesMixtureBuilder {
bool operator()(isotopesMixture&/* output*/, std::string const&/* elementSymbol*/, isotopesMixture const& mixture) const {
using namespace boost::adaptors;
// validate weights total only
return std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
}
};
struct ValidateWeightTotal {
bool operator()(isotopesMixture const& mixture) const {
using namespace boost::adaptors;
bool ok = std::abs(1.0 - boost::accumulate(mixture | map_values, 0.0)) < 0.00001;
return ok;
// or perhaps just :
return ok? ok : throw InconsistentsWeights {};
}
struct InconsistentsWeights : virtual std::runtime_error {
InconsistentsWeights() : std::runtime_error("InconsistentsWeights") {}
};
};
template <typename Isotope>
struct ChemicalDatabaseManager {
static ChemicalDatabaseManager* Instance() {
static ChemicalDatabaseManager s_instance;
return &s_instance;
}
auto& getDatabase() { return _db; }
private:
std::map<int, Isotope> _db {
{ 1, { "H[1]", "H" } },
{ 2, { "H[2]", "H" } },
{ 3, { "Carbon", "C" } },
{ 4, { "U[235]", "U" } },
};
};
template <typename Iterator>
struct ChemicalFormulaParser : qi::grammar<Iterator, isotopesMixture()>
{
ChemicalFormulaParser(): ChemicalFormulaParser::base_type(_start)
{
using namespace qi;
namespace phx = boost::phoenix;
phx::function<PureIsotopeBuilder> build_pure_isotope; // Semantic action for handling the case of pure isotope
phx::function<IsotopesMixtureBuilder> build_isotopes_mixture; // Semantic action for handling the case of pure isotope mixture
phx::function<NaturalElementBuilder> build_natural_element; // Semantic action for handling the case of natural element
phx::function<UpdateElement> update_element;
phx::function<ValidateWeightTotal> validate_weight_total;
// XML database that store all the isotopes of the periodical table
ChemicalDatabaseManager<Isotope>* imgr = ChemicalDatabaseManager<Isotope>::Instance();
const auto& isotopeDatabase=imgr->getDatabase();
// Loop over the database to the spirit symbols for the isotopes names (e.g. H[1],C[14]) and the elements (e.g. H,C)
for (const auto& isotope : isotopeDatabase) {
_isotopeNames.add(isotope.second.getName(),isotope.second.getName());
_elementSymbols.add(isotope.second.template getProperty<std::string>("symbol"), isotope.second.template getProperty<std::string>("symbol"));
}
_mixtureToken = "{" >> +(_isotopeNames >> "(" >> double_ >> ")") >> "}" > eps(validate_weight_total(_val));
_isotopesMixtureToken = (_elementSymbols >> _mixtureToken) [ _pass=build_isotopes_mixture(_val, _1, _2) ];
_pureIsotopeToken = _isotopeNames [ _pass=build_pure_isotope(_val, _1) ];
_naturalElementToken = _elementSymbols [ _pass=build_natural_element(_val, _1) ];
_start = +(
( (_isotopesMixtureToken | _pureIsotopeToken | _naturalElementToken) >>
(double_|attr(1.0)) ) [ _pass=update_element(_val, _1, _2) ]
);
}
private:
//! Defines the rule for matching a prefix
qi::symbols<char, std::string> _isotopeNames;
qi::symbols<char, std::string> _elementSymbols;
qi::rule<Iterator, isotopesMixture()> _mixtureToken;
qi::rule<Iterator, isotopesMixture()> _isotopesMixtureToken;
qi::rule<Iterator, isotopesMixture()> _pureIsotopeToken;
qi::rule<Iterator, isotopesMixture()> _naturalElementToken;
qi::rule<Iterator, isotopesMixture()> _start;
};
int main() {
using It = std::string::const_iterator;
ChemicalFormulaParser<It> parser;
for (std::string const input : {
"C", // --> natural carbon made of C[12] and C[13] in natural abundance
"CH4", // --> methane made of natural carbon and hydrogen
"C2H{H[1](0.8)H[2](0.2)}6", // --> ethane made of natural C and non-natural H made of 80% of hydrogen and 20% of deuterium
"C2H{H[1](0.9)H[2](0.2)}6", // --> invalid mixture (total is 110%?)
"U[235]", // --> pure uranium 235
}) try
{
std::cout << " ============= '" << input << "' ===========\n";
It f = input.begin(), l = input.end();
isotopesMixture mixture;
bool ok = qi::parse(f, l, parser, mixture);
if (ok)
std::cout << "Parsed successfully\n";
else
std::cout << "Parse failure\n";
if (f != l)
std::cout << "Remaining input unparsed: '" << std::string(f, l) << "'\n";
} catch(std::exception const& e) {
std::cout << "Caught exception '" << e.what() << "'\n";
}
}
Prints
============= 'C' ===========
Parsed successfully
============= 'CH4' ===========
Parsed successfully
============= 'C2H{H[1](0.8)H[2](0.2)}6' ===========
Parsed successfully
============= 'C2H{H[1](0.9)H[2](0.2)}6' ===========
Caught exception 'boost::spirit::qi::expectation_failure'
============= 'U[235]' ===========
Parsed successfully

Related

How to use a pointer as token attribute in Boost::Spirit::Lex?

I write a minimum example to demonstrate this problem. It parses nested list of numbers like (1 2 3 (4 5) (6 (7 (8)))). I use spirit::lex to parse number and spirit::qi to parse list, so I code like this:
using TokenTypes = boost::mpl::vector<Object*>;
using Iterator = std::string::iterator;
class Lexer : public lex::lexer<actor_lexer<token<Iterator, TokenTypes>>>
{
public:
lex::token_def<> spaces; // used to skip spaces
lex::token_def<Object*> number; // create Number Object on heap and use the pointer as attribute
public:
Lexer();
};
template<typename... Ts>
using Rule = qi::rule<Lexer::iterator_type, Ts...>;
class Parser : public qi::grammar<Lexer::iterator_type, Object*>
{
public:
Lexer lexer;
Rule<Object*> list;
Rule<Object*> elem;
public:
Parser();
};
But in Parser::Parser(), I can't use Lexer::number in gramma expression:
Parser::Parser()
: base_type(elem)
{
// list = ...
elem %= list | lexer.number; // fail to compile!
}
Clang error message (brief):
/usr/include/boost/spirit/home/qi/detail/assign_to.hpp:42:36: error: type 'Object *' cannot be used prior to '::' because it has no members
: is_iter_range<typename C::value_type> {};
^
...
...
...
I can't understand why this is wrong considering it used to work fine when I use other scalar types like int and double as token attribute.
So, how to use pointer type as token attribute?
complete example
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
class Object
{
public:
virtual ~Object() = default;
public:
virtual void print(std::ostream& out) = 0;
};
class Number : public Object
{
public:
int64_t _val;
public:
virtual void print(std::ostream& out) override { out << _val; }
};
class List : public Object
{
public:
std::vector<Object*> _objs;
public:
virtual void print(std::ostream& out) override
{
out << '(';
for (auto&& i : _objs) {
i->print(out);
out << ' ';
}
out << ')';
}
};
namespace qi = boost::spirit::qi;
namespace fu = boost::fusion;
namespace lex = boost::spirit::lex;
using lex::lexertl::actor_lexer;
using lex::lexertl::token;
using TokenTypes = boost::mpl::vector<Object*>;
using Iterator = std::string::iterator;
class Lexer : public lex::lexer<actor_lexer<token<Iterator, TokenTypes>>>
{
public:
lex::token_def<> spaces;
lex::token_def<Object*> number;
public:
Lexer();
};
template<typename... Ts>
using Rule = qi::rule<Lexer::iterator_type, Ts...>;
class Parser : public qi::grammar<Lexer::iterator_type, Object*>
{
public:
Lexer lexer;
Rule<Object*, qi::locals<List*>> list;
Rule<Object*> elem;
public:
Parser();
};
Lexer::Lexer()
{
self += '(';
self += ')';
spaces = R"(\s+)";
self +=
spaces[([](auto& start, auto& end, auto& matched, auto& id, auto& ctx) {
matched = lex::pass_flags::pass_ignore;
})];
number = R"(\d+)";
self +=
number[([](auto& start, auto& end, auto& matched, auto& id, auto& ctx) {
auto val = new Number();
auto iter = start;
qi::parse(iter, end, qi::long_long, val->_val);
ctx.set_value(val);
})];
}
Parser::Parser()
: base_type(elem)
{
list = ( //
qi::lit('(')[( //
[](auto& attr, auto& ctx, bool& pass) {
fu::at_c<0>(ctx.locals) = new List();
})] //
>> *(elem[( //
[](auto& attr, auto& ctx, bool& pass) {
List* list = fu::at_c<0>(ctx.locals);
list->_objs.push_back(attr);
})]) //
>> ')' //
)[( //
[](auto& attr, auto& ctx, bool& pass) {
List* list = fu::at_c<0>(ctx.locals);
fu::at_c<0>(ctx.attributes) = list;
})];
elem %= list | lexer.number;
}
int
main(int argc, char* argv[])
{
Parser parser;
std::string line;
while (std::getline(std::cin, line)) {
auto begin = line.begin();
Object* obj;
lex::tokenize_and_parse(begin, line.end(), parser.lexer, parser, obj);
obj->print(std::cout);
std::cout << std::endl;
}
}

Okay. Don't take this badly. Reading your sample (kudos for including a self-contained example! This saves a ton of time) I can't help but feeling that you've somehow stumbled on the worst possible cross-section of anti-patterns in Spirit Qi.
You're using a polymorphic AST:
How can I use polymorphic attributes with boost::spirit::qi parsers?
Semantic actions runs multiple times in boost::spirit parsing
Parsing inherited struct with boost spirit
You're using semantic actions. As a rule this already misses the sweet spot for embedded grammars, which is why I linked 126 answers to Boost Spirit: "Semantic actions are evil"?.
However, that's even just talking about semantic actions for Qi. You also use them for Lex:
self +=
spaces[([](auto& start, auto& end, auto& matched, auto& id,
auto& ctx) { matched = lex::pass_flags::pass_ignore; })];
Which is then further complicated by not using Phoenix, e.g.:
self += spaces[lex::_pass = lex::pass_flags::pass_ignore];
Which does exactly the same but with about 870% less noise and equal amounts of evil magic.
The other semantic action tops it all:
self += number[(
[](auto& start, auto& end, auto& matched, auto& id, auto& ctx) {
auto val = new Number();
auto iter = start;
qi::parse(iter, end, qi::long_long, val->_val);
ctx.set_value(val);
})];
Besides having all the problems already listed, it literally makes a fractal out of things by calling Qi from a Lex semantic action. Of course, this wants to be:
self += number[lex::_val = phx::new_<Number>(/*magic*/)];
But that magic doesn't exist. My gut feeling is that your issue that the Lexer shouldn't be concerned with AST types at all. At this point I feel that the lexer could/should be something like
using TokenTypes = boost::mpl::vector<uint64_t>;
using Iterator = std::string::const_iterator; // NOTE const_
struct Lexer : lex::lexer<actor_lexer<token<Iterator, TokenTypes>>> {
lex::token_def<> spaces;
lex::token_def<uint64_t> number;
Lexer() : spaces{R"(\s+)"}, number{R"(\d+)"} {
self += '(';
self += ')';
self += spaces[lex::_pass = lex::pass_flags::pass_ignore];
self += number;
}
};
That is, if it should exist at all.
That's the structural assessment. Let me apply simplifications to the Qi grammar along the same lines, just so we can reason about the code:
struct Parser : qi::grammar<Lexer::iterator_type, Object*()> {
Parser() : base_type(elem) {
using namespace qi::labels;
static constexpr qi::_a_type _list{};
const auto _objs = phx::bind(&List::_objs, _list);
list = ( //
'(' >> //
*(elem[phx::push_back(_objs, _1)]) //
>> ')' //
)[_val = phx::new_<List>(_list)];
elem //
= list[_val = _1] //
| lexer.number[_val = phx::new_<Number>(_1)];
}
Lexer lexer; // TODO FIXME excess scope
private:
using It = Lexer::iterator_type;
qi::rule<It, Object*(), qi::locals<List>> list;
qi::rule<It, Object*()> elem;
};
Note how I made the local List instead of List*, to just slightly reduce the chance of memory leaks. I guess for efficiency you could try to make Phoenix do move-semantics for you:
[_val = phx::new_<List>(phx::static_cast_<List&&>(_list))];
But at that point I wouldn't trust all the expression templates to do what you want and go to the more elaborate (even assuming c++17):
phx::function move_new = [](List& l) { return new List(std::move(l)); };
list = ( //
'(' >> //
*(elem[phx::push_back(_objs, _1)]) //
>> ')' //
)[_val = move_new(_list)];
Now we arrive at a workable demo:
Live On Coliru
int main() {
Parser parser;
for (std::string const line : {
"",
"42",
"()",
"(1 2 3)",
"(1 (44 55 () 66) 3)",
}) {
auto begin = line.begin();
Object* obj = nullptr;
if (lex::tokenize_and_parse(begin, line.end(), parser.lexer, parser,
obj)) {
obj->print(std::cout << std::quoted(line) << " -> ");
delete obj;
} else {
std::cout << std::quoted(line) << " -> FAILED";
}
std::cout << std::endl;
}
}
Printing
"" -> FAILED
"42" -> 42
"()" -> ()
"(1 2 3)" -> (1 2 3 )
"(1 (44 55 () 66) 3)" -> (1 (44 55 () 66 ) 3 )
Note that this simple test program ALREADY leaks 11 objects, for a total of 224 bytes. That's not even complicating things with error-handling or backtracking rules.
That's craziness. You could of course fix it with smart pointers, but that just further complicates everything while making sure performance will be very poor.
Further Simplifications
I would stop using Lex and dynamic polymorphism:
No More Lex:
The only "value" Lex is adding here is skipping spaces. Qi is very capable (see e.g. Boost spirit skipper issues for variations on that theme), so we'll use skip(space)[] instead:
Live On Coliru
#include <boost/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
struct Object {
virtual ~Object() = default;
virtual void print(std::ostream& out) const = 0;
friend std::ostream& operator<<(std::ostream& os, Object const& o) { return o.print(os), os; }
};
struct Number : Object {
Number(uint64_t v = 0) : _val(v) {}
int64_t _val;
virtual void print(std::ostream& out) const override { out << _val; }
};
struct List : Object {
std::vector<Object*> _objs;
virtual void print(std::ostream& out) const override {
out << '(';
for (auto&& el : _objs)
out << ' ' << *el;
out << ')';
}
};
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
template <typename It>
struct Parser : qi::grammar<It, Object*()> {
Parser() : Parser::base_type(start) {
using namespace qi::labels;
static constexpr qi::_a_type _list{};
const auto _objs = phx::bind(&List::_objs, _list);
phx::function move_new = [](List& l) { return new List(std::move(l)); };
list = ( //
'(' >> //
*(elem[phx::push_back(_objs, _1)]) //
>> ')' //
)[_val = move_new(_list)];
elem //
= list[_val = _1] //
| qi::uint_[_val = phx::new_<Number>(_1)] //
;
start = qi::skip(qi::space)[elem];
}
private:
qi::rule<It, Object*(), qi::space_type, qi::locals<List>> list;
qi::rule<It, Object*(), qi::space_type> elem;
// lexemes
qi::rule<It, Object*()> start;
};
int main() {
Parser<std::string::const_iterator> const parser;
for (std::string const line : {
"",
"42",
"()",
"(1 2 3)",
"(1 (44 55 () 66) 3)",
}) {
Object* obj = nullptr;
if (parse(line.begin(), line.end(), parser >> qi::eoi, obj)) {
std::cout << std::quoted(line) << " -> " << *obj;
} else {
std::cout << std::quoted(line) << " -> FAILED";
}
delete obj;
std::cout << std::endl;
}
}
Still leaking like C++ went out of fashion, but at least doing so in 20 fewer LoC and half the compile time.
Static Polymorphism
Hiding all the raw pointer stuff (or avoiding it completely, depending on the exact AST requirements):
using Number = uint64_t;
using Object = boost::make_recursive_variant< //
Number, //
std::vector<boost::recursive_variant_>>::type;
using List = std::vector<Object>;
For ease of supplying operator<< I moved them into an AST namespace below.
The parser goes down to:
template <typename It> struct Parser : qi::grammar<It, AST::Object()> {
Parser() : Parser::base_type(start) {
list = '(' >> *elem >> ')';
elem = list | qi::uint_;
start = qi::skip(qi::space)[elem];
}
private:
qi::rule<It, AST::List(), qi::space_type> list;
qi::rule<It, AST::Object(), qi::space_type> elem;
qi::rule<It, AST::Object()> start;
};
No more lex, no more phoenix, no more leaks, no more manual semantic actions. Just, expressive code.
Live Demo
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
namespace AST {
struct Number {
uint64_t v;
Number(uint64_t v = 0) : v(v){};
};
using Object = boost::make_recursive_variant< //
Number, //
std::vector<boost::recursive_variant_>>::type;
using List = std::vector<Object>;
std::ostream& operator<<(std::ostream& os, Number const& n) {
return os << n.v;
}
std::ostream& operator<<(std::ostream& os, List const& l) {
os << '(';
for (auto& el : l)
os << ' ' << el;
return os << ')';
}
} // namespace AST
namespace qi = boost::spirit::qi;
template <typename It> struct Parser : qi::grammar<It, AST::Object()> {
Parser() : Parser::base_type(start) {
list = '(' >> *elem >> ')';
elem = list | qi::uint_;
start = qi::skip(qi::space)[elem];
}
private:
qi::rule<It, AST::List(), qi::space_type> list;
qi::rule<It, AST::Object(), qi::space_type> elem;
qi::rule<It, AST::Object()> start;
};
int main() {
Parser<std::string::const_iterator> const parser;
for (std::string const line : {
"",
"42",
"()",
"(1 2 3)",
"(1 (44 55 () 66) 3)",
}) {
AST::Object obj;
if (parse(line.begin(), line.end(), parser >> qi::eoi, obj))
std::cout << std::quoted(line) << " -> " << obj << "\n";
else
std::cout << std::quoted(line) << " -> FAILED\n";
}
}
Prints
"" -> FAILED
"42" -> 42
"()" -> ()
"(1 2 3)" -> ( 1 2 3)
"(1 (44 55 () 66) 3)" -> ( 1 ( 44 55 () 66) 3)
But this time, without leaking memory. And also, it now compiles fast enough that Compiler Explorer can also handle it.

Found a walkaround: use std::size_t and reinterpret_cast to replace pointer types:
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
class Object
{
public:
virtual ~Object() = default;
public:
virtual void print(std::ostream& out) = 0;
};
class Number : public Object
{
public:
int64_t _val;
public:
virtual void print(std::ostream& out) override { out << _val; }
};
class List : public Object
{
public:
std::vector<Object*> _objs;
public:
virtual void print(std::ostream& out) override
{
out << '(';
for (auto&& i : _objs) {
i->print(out);
out << ' ';
}
out << ')';
}
};
namespace qi = boost::spirit::qi;
namespace fu = boost::fusion;
namespace lex = boost::spirit::lex;
using lex::lexertl::actor_lexer;
using lex::lexertl::token;
using TokenTypes = boost::mpl::vector<std::size_t>;
using Iterator = std::string::iterator;
class Lexer : public lex::lexer<actor_lexer<token<Iterator, TokenTypes>>>
{
public:
lex::token_def<> spaces;
lex::token_def<std::size_t> number; // use std::size_t instead
public:
Lexer();
};
template<typename... Ts>
using Rule = qi::rule<Lexer::iterator_type, Ts...>;
class Parser : public qi::grammar<Lexer::iterator_type, Object*>
{
public:
Lexer lexer;
Rule<Object*, qi::locals<List*>> list;
Rule<Object*> elem;
public:
Parser();
};
Lexer::Lexer()
{
self += '(';
self += ')';
spaces = R"(\s+)";
self +=
spaces[([](auto& start, auto& end, auto& matched, auto& id, auto& ctx) {
matched = lex::pass_flags::pass_ignore;
})];
number = R"(\d+)";
self +=
number[([](auto& start, auto& end, auto& matched, auto& id, auto& ctx) {
auto val = new Number();
auto iter = start;
qi::parse(iter, end, qi::long_long, val->_val);
ctx.set_value(reinterpret_cast<std::size_t>(val)); // cast here
})];
}
Parser::Parser()
: base_type(elem)
{
list = ( //
qi::lit('(')[( //
[](auto& attr, auto& ctx, bool& pass) {
fu::at_c<0>(ctx.locals) = new List();
})] //
>> *(elem[( //
[](auto& attr, auto& ctx, bool& pass) {
List* list = fu::at_c<0>(ctx.locals);
list->_objs.push_back(attr);
})]) //
>> ')' //
)[( //
[](auto& attr, auto& ctx, bool& pass) {
List* list = fu::at_c<0>(ctx.locals);
fu::at_c<0>(ctx.attributes) = list;
})];
elem %= list | qi::omit[lexer.number[([](auto& attr, auto& ctx, bool& pass) {
fu::at_c<0>(ctx.attributes) = reinterpret_cast<Object*>(attr); // cast here
})]];
}
int
main(int argc, char* argv[])
{
Parser parser;
std::string line;
while (std::getline(std::cin, line)) {
auto begin = line.begin();
Object* obj;
lex::tokenize_and_parse(begin, line.end(), parser.lexer, parser, obj);
obj->print(std::cout);
std::cout << std::endl;
}
}
I think this is really ugly. Anyone has a better solution???

Make boost::spirit::symbol parser non greedy

I'd like to make a keyword parser that matches i.e. int, but does not match int in integer with eger left over. I use x3::symbols to get automatically get the parsed keyword represented as an enum value.
Minimal example:
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/utility/error_reporting.hpp>
namespace x3 = boost::spirit::x3;
enum class TypeKeyword { Int, Float, Bool };
struct TypeKeywordSymbolTable : x3::symbols<TypeKeyword> {
TypeKeywordSymbolTable()
{
add("float", TypeKeyword::Float)
("int", TypeKeyword::Int)
("bool", TypeKeyword::Bool);
}
};
const TypeKeywordSymbolTable type_keyword_symbol_table;
struct TypeKeywordRID {};
using TypeKeywordRule = x3::rule<TypeKeywordRID, TypeKeyword>;
const TypeKeywordRule type_keyword = "type_keyword";
const auto type_keyword_def = type_keyword_symbol_table;
BOOST_SPIRIT_DEFINE(type_keyword);
using Iterator = std::string_view::const_iterator;
/* Thrown when the parser has failed to parse the whole input stream. Contains
* the part of the input stream that has not been parsed. */
class LeftoverError : public std::runtime_error {
public:
LeftoverError(Iterator begin, Iterator end)
: std::runtime_error(std::string(begin, end))
{}
std::string_view get_leftover_data() const noexcept { return what(); }
};
template<typename Rule>
typename Rule::attribute_type parse(std::string_view input, const Rule& rule)
{
Iterator begin = input.begin();
Iterator end = input.end();
using ExpectationFailure = boost::spirit::x3::expectation_failure<Iterator>;
typename Rule::attribute_type result;
try {
bool r = x3::phrase_parse(begin, end, rule, x3::space, result);
if (r && begin == end) {
return result;
} else { // Occurs when the whole input stream has not been consumed.
throw LeftoverError(begin, end);
}
} catch (const ExpectationFailure& exc) {
throw LeftoverError(exc.where(), end);
}
}
int main()
{
// TypeKeyword::Bool is parsed and "ean" is leftover, but failed parse with
// "boolean" leftover is desired.
parse("boolean", type_keyword);
// TypeKeyword::Int is parsed and "eger" is leftover, but failed parse with
// "integer" leftover is desired.
parse("integer", type_keyword);
// TypeKeyword::Int is parsed successfully and this is the desired behavior.
parse("int", type_keyword);
}
Basicly, I want integer not to be recognized as a keyword with additional eger left to parse.

I morphed the test cases into self-describing expectations:
Live On Compiler Explorer
Prints:
FAIL "boolean" -> TypeKeyword::Bool (expected Leftover:"boolean")
FAIL "integer" -> TypeKeyword::Int (expected Leftover:"integer")
OK "int" -> TypeKeyword::Int
Now, the simplest, naive approach would be to make sure you parse till the eoi, by simply changing
auto actual = parse(input, Parser::type_keyword);
To
auto actual = parse(input, Parser::type_keyword >> x3::eoi);
And indeed the tests pass: Live
OK "boolean" -> Leftover:"boolean"
OK "integer" -> Leftover:"integer"
OK "int" -> TypeKeyword::Int
However, this fits the tests, but not the goal. Let's imagine a more involved grammar, where type identifier; is to be parsed:
auto identifier
= x3::rule<struct id_, Ast::Identifier> {"identifier"}
= x3::lexeme[x3::char_("a-zA-Z_") >> *x3::char_("a-zA-Z_0-9")];
auto type_keyword
= x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
= type_;
auto declaration
= x3::rule<struct decl_, Ast::Declaration>{"declaration"}
= type_keyword >> identifier >> ';';
I'll leave the details for Compiler Explorer:
OK "boolean b;" -> Leftover:"boolean b;"
OK "integer i;" -> Leftover:"integer i;"
OK "int j;" -> (TypeKeyword::Int j)
Looks good. But what if we add some interesting tests:
{"flo at f;", LeftoverError("flo at f;")},
{"boolean;", LeftoverError("boolean;")},
It prints (Live)
OK "boolean b;" -> Leftover:"boolean b;"
OK "integer i;" -> Leftover:"integer i;"
OK "int j;" -> (TypeKeyword::Int j)
FAIL "boolean;" -> (TypeKeyword::Bool ean) (expected Leftover:"boolean;")
So, the test cases were lacking. Your prose description is actually closer:
I'd like to make a keyword parser that matches i.e. int, but does not match int in integer with eger left over
That correctly implies you want to check the lexeme inside the type_keyword rule. A naive try might be checking that no identifier character follows the type keyword:
auto type_keyword
= x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
= type_ >> !identchar;
Where identchar was factored out of identifier like so:
auto identchar = x3::char_("a-zA-Z_0-9");
auto identifier
= x3::rule<struct id_, Ast::Identifier> {"identifier"}
= x3::lexeme[x3::char_("a-zA-Z_") >> *identchar];
However, this doesn't work. Can you see why (peeking allowed: https://godbolt.org/z/jb4zfhfWb)?
Our latest devious test case now passes (yay), but int j; is now rejected! If you think about it, it only makes sense, because you have spaced skipped.
The essential word I used a moment ago was lexeme: you want to treat some units as lexemes (and whitespace stops the lexeme. Or rather, whitespace isn't automatically skipped inside the lexeme¹). So, a fix would be:
auto type_keyword
// ...
= x3::lexeme[type_ >> !identchar];
(Note how I sneakily already included that on the identifier rule earlier)
Lo and behold (Live):
OK "boolean b;" -> Leftover:"boolean b;"
OK "integer i;" -> Leftover:"integer i;"
OK "int j;" -> (TypeKeyword::Int j)
OK "boolean;" -> Leftover:"boolean;"
Summarizing
This topic is a frequently recurring one, and it requires a solid understanding of skippers, lexemes first and foremost. Here are some other posts for inspiration:
Stop X3 symbols from matching substrings
parsing identifiers except keywords
Boost Spirit x3: parse delimited string Where I introduce a more general helper you might find useful:
auto kw = [](auto p) {
return x3::lexeme [ x3::as_parser(p) >> !x3::char_("a-zA-Z0-9_") ];
};
Stop X3 symbols from matching substrings
Dynamically switching symbol tables in x3
Good luck!
Complete Listing
Anti-Bitrot, the final listing:
#include <boost/fusion/adapted.hpp>
#include <boost/fusion/include/as_vector.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
namespace x3 = boost::spirit::x3;
namespace Ast {
enum class TypeKeyword { Int, Float, Bool };
static std::ostream& operator<<(std::ostream& os, TypeKeyword tk) {
switch (tk) {
case TypeKeyword::Int: return os << "TypeKeyword::Int";
case TypeKeyword::Float: return os << "TypeKeyword::Float";
case TypeKeyword::Bool: return os << "TypeKeyword::Bool";
};
return os << "?";
}
using Identifier = std::string;
struct Declaration {
TypeKeyword type;
Identifier id;
bool operator==(Declaration const&) const = default;
};
} // namespace Ast
BOOST_FUSION_ADAPT_STRUCT(Ast::Declaration, type, id)
namespace Ast{
static std::ostream& operator<<(std::ostream& os, Ast::Declaration const& d) {
return os << boost::lexical_cast<std::string>(boost::fusion::as_vector(d));
}
} // namespace Ast
namespace Parser {
struct Type : x3::symbols<Ast::TypeKeyword> {
Type() {
add("float", Ast::TypeKeyword::Float) //
("int", Ast::TypeKeyword::Int) //
("bool", Ast::TypeKeyword::Bool); //
}
} const static type_;
auto identchar = x3::char_("a-zA-Z_0-9");
auto identifier
= x3::rule<struct id_, Ast::Identifier> {"identifier"}
= x3::lexeme[x3::char_("a-zA-Z_") >> *identchar];
auto type_keyword
= x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
= x3::lexeme[type_ >> !identchar];
auto declaration
= x3::rule<struct decl_, Ast::Declaration>{"declaration"}
= type_keyword >> identifier >> ';';
} // namespace Parser
struct LeftoverError : std::runtime_error {
using std::runtime_error::runtime_error;
friend std::ostream& operator<<(std::ostream& os, LeftoverError const& e) {
return os << (std::string("Leftover:\"") + e.what() + "\"");
}
bool operator==(LeftoverError const& other) const {
return std::string_view(what()) == other.what();
}
};
template<typename T>
using Maybe = boost::variant<T, LeftoverError>;
template <typename Rule,
typename Attr = typename x3::traits::attribute_of<Rule, x3::unused_type>::type,
typename R = Maybe<Attr>>
R parse(std::string_view input, Rule const& rule) {
Attr result;
auto f = input.begin(), l = input.end();
return x3::phrase_parse(f, l, rule, x3::space, result)
? R(std::move(result))
: LeftoverError({f, l});
}
int main()
{
using namespace Ast;
struct {
std::string_view input;
Maybe<Declaration> expected;
} cases[] = {
{"boolean b;", LeftoverError("boolean b;")},
{"integer i;", LeftoverError("integer i;")},
{"int j;", Declaration{TypeKeyword::Int, "j"}},
{"boolean;", LeftoverError("boolean;")},
};
for (auto [input, expected] : cases) {
auto actual = parse(input, Parser::declaration >> x3::eoi);
bool ok = expected == actual;
std::cout << std::left << std::setw(6) << (ok ? "OK" : "FAIL")
<< std::setw(12) << std::quoted(input) << " -> "
<< std::setw(20) << actual;
if (not ok)
std::cout << " (expected " << expected << ")";
std::cout << "\n";
}
}
¹ see Boost spirit skipper issues

cannot get boost::spirit parser&lexer working for token types other than std::string or int or double

This does not compile (code below).
There was another question here with the same error. But I don't understand the answer. I already tried inserting qi::eps in places -- but without success.
I also tried already adding meta functions (boost::spirit::raits::is_container) for the types used -- but this also does not help.
I also tried using the same variant containing all types I need to use everywhere. Same problem.
Has anybody gotten this working for a lexer returning something else than double or int or string? And for the parser also returning non-trivial objects?
I've tried implementing semantic functions everywhere returning default objects. But this also does not help.
Here comes the code:
// spirit_error.cpp : Defines the entry point for the console application.
//
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/phoenix/object.hpp>
#include <boost/spirit/include/qi_char_class.hpp>
#include <boost/spirit/include/phoenix_bind.hpp>
#include <boost/mpl/index_of.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/intrusive_ptr.hpp>
#include <boost/smart_ptr/intrusive_ref_counter.hpp>
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace frank
{
class ref_counter:public boost::intrusive_ref_counter<ref_counter>
{ public:
virtual ~ref_counter(void)
{
}
};
class symbol:public ref_counter
{ public:
typedef boost::intrusive_ptr<const symbol> symbolPtr;
typedef std::vector<symbolPtr> symbolVector;
struct push_scope
{ push_scope()
{
}
~push_scope(void)
{
}
};
};
class nature:public symbol
{ public:
enum enumAttribute
{ eAbstol,
eAccess,
eDDT,
eIDT,
eUnits
};
struct empty
{ bool operator<(const empty&) const
{ return false;
}
friend std::ostream &operator<<(std::ostream &_r, const empty&)
{ return _r;
}
};
typedef boost::variant<empty, std::string> attributeValue;
};
class discipline:public symbol
{ public:
enum enumDomain
{ eDiscrete,
eContinuous
};
};
class type:public ref_counter
{ public:
typedef boost::intrusive_ptr<type> typePtr;
};
struct myIterator:std::iterator<std::random_access_iterator_tag, char, std::ptrdiff_t, const char*, const char&>
{ std::string *m_p;
std::size_t m_iPos;
myIterator(void)
:m_p(nullptr),
m_iPos(~std::size_t(0))
{
}
myIterator(std::string &_r, const bool _bEnd = false)
:m_p(&_r),
m_iPos(_bEnd ? ~std::size_t(0) : 0)
{
}
myIterator(const myIterator &_r)
:m_p(_r.m_p),
m_iPos(_r.m_iPos)
{
}
myIterator &operator=(const myIterator &_r)
{ if (this != &_r)
{ m_p = _r.m_p;
m_iPos = _r.m_iPos;
}
return *this;
}
const char &operator*(void) const
{ return m_p->at(m_iPos);
}
bool operator==(const myIterator &_r) const
{ return m_p == _r.m_p && m_iPos == _r.m_iPos;
}
bool operator!=(const myIterator &_r) const
{ return m_p != _r.m_p || m_iPos != _r.m_iPos;
}
myIterator &operator++(void)
{ ++m_iPos;
if (m_iPos == m_p->size())
m_iPos = ~std::size_t(0);
return *this;
}
myIterator operator++(int)
{ const myIterator s(*this);
operator++();
return s;
}
myIterator &operator--(void)
{ --m_iPos;
return *this;
}
myIterator operator--(int)
{ const myIterator s(*this);
operator--();
return s;
}
bool operator<(const myIterator &_r) const
{ if (m_p == _r.m_p)
return m_iPos < _r.m_iPos;
else
return m_p < _r.m_p;
}
std::ptrdiff_t operator-(const myIterator &_r) const
{ return m_iPos - _r.m_iPos;
}
};
struct onInclude
{ auto operator()(myIterator &_rStart, myIterator &_rEnd) const
{ // erase what has been matched (the include statement)
_rStart.m_p->erase(_rStart.m_iPos, _rEnd.m_iPos - _rStart.m_iPos);
// and insert the contents of the file
_rStart.m_p->insert(_rStart.m_iPos, "abcd");
_rEnd = _rStart;
return lex::pass_flags::pass_ignore;
}
};
template<typename LEXER>
class lexer:public lex::lexer<LEXER>
{ public:
lex::token_def<type::typePtr> m_sKW_real, m_sKW_integer, m_sKW_string;
lex::token_def<lex::omit> m_sLineComment, m_sCComment;
lex::token_def<lex::omit> m_sWS;
lex::token_def<lex::omit> m_sSemicolon, m_sEqual, m_sColon, m_sInclude, m_sCharOP, m_sCharCP,
m_sComma;
lex::token_def<std::string> m_sIdentifier, m_sString;
lex::token_def<double> m_sReal;
lex::token_def<int> m_sInteger;
lex::token_def<lex::omit> m_sKW_units, m_sKW_access, m_sKW_idt_nature, m_sKW_ddt_nature, m_sKW_abstol,
m_sKW_nature, m_sKW_endnature, m_sKW_continuous, m_sKW_discrete,
m_sKW_potential, m_sKW_flow, m_sKW_domain, m_sKW_discipline, m_sKW_enddiscipline, m_sKW_module,
m_sKW_endmodule, m_sKW_parameter;
//typedef const type *typePtr;
template<typename T>
struct extractValue
{ T operator()(const myIterator &_rStart, const myIterator &_rEnd) const
{ return boost::lexical_cast<T>(std::string(_rStart, _rEnd));
}
};
struct extractString
{ std::string operator()(const myIterator &_rStart, const myIterator &_rEnd) const
{ const auto s = std::string(_rStart, _rEnd);
return s.substr(1, s.size() - 2);
}
};
lexer(void)
:m_sWS("[ \\t\\n\\r]+"),
m_sKW_parameter("\"parameter\""),
m_sKW_real("\"real\""),
m_sKW_integer("\"integer\""),
m_sKW_string("\"string\""),
m_sLineComment("\\/\\/[^\\n]*"),
m_sCComment("\\/\\*"
"("
"[^*]"
"|" "[\\n]"
"|" "([*][^/])"
")*"
"\\*\\/"),
m_sSemicolon("\";\""),
m_sEqual("\"=\""),
m_sColon("\":\""),
m_sCharOP("\"(\""),
m_sCharCP("\")\""),
m_sComma("\",\""),
m_sIdentifier("[a-zA-Z_]+[a-zA-Z0-9_]*"),
m_sString("[\\\"]"
//"("
// "(\\[\"])"
// "|"
//"[^\"]"
//")*"
"[^\\\"]*"
"[\\\"]"),
m_sKW_units("\"units\""),
m_sKW_access("\"access\""),
m_sKW_idt_nature("\"idt_nature\""),
m_sKW_ddt_nature("\"ddt_nature\""),
m_sKW_abstol("\"abstol\""),
m_sKW_nature("\"nature\""),
m_sKW_endnature("\"endnature\""),
m_sKW_continuous("\"continuous\""),
m_sKW_discrete("\"discrete\""),
m_sKW_domain("\"domain\""),
m_sKW_discipline("\"discipline\""),
m_sKW_enddiscipline("\"enddiscipline\""),
m_sKW_potential("\"potential\""),
m_sKW_flow("\"flow\""),
//realnumber ({uint}{exponent})|((({uint}\.{uint})|(\.{uint})){exponent}?)
//exponent [Ee][+-]?{uint}
//uint [0-9][_0-9]*
m_sReal("({uint}{exponent})"
"|"
"("
"(({uint}[\\.]{uint})|([\\.]{uint})){exponent}?"
")"
),
m_sInteger("{uint}"),
m_sInclude("\"`include\""),
m_sKW_module("\"module\""),
m_sKW_endmodule("\"endmodule\"")
{ this->self.add_pattern
("uint", "[0-9]+")
("exponent", "[eE][\\+\\-]?{uint}");
this->self = m_sSemicolon
| m_sEqual
| m_sColon
| m_sCharOP
| m_sCharCP
| m_sComma
| m_sString[lex::_val = boost::phoenix::bind(extractString(), lex::_start, lex::_end)]
| m_sKW_real//[lex::_val = boost::phoenix::bind(&type::getReal)]
| m_sKW_integer//[lex::_val = boost::phoenix::bind(&type::getInteger)]
| m_sKW_string//[lex::_val = boost::phoenix::bind(&type::getString)]
| m_sKW_parameter
| m_sKW_units
| m_sKW_access
| m_sKW_idt_nature
| m_sKW_ddt_nature
| m_sKW_abstol
| m_sKW_nature
| m_sKW_endnature
| m_sKW_continuous
| m_sKW_discrete
| m_sKW_domain
| m_sKW_discipline
| m_sKW_enddiscipline
| m_sReal[lex::_val = boost::phoenix::bind(extractValue<double>(), lex::_start, lex::_end)]
| m_sInteger[lex::_val = boost::phoenix::bind(extractValue<int>(), lex::_start, lex::_end)]
| m_sKW_potential
| m_sKW_flow
| m_sKW_module
| m_sKW_endmodule
| m_sIdentifier
| m_sInclude [ lex::_state = "INCLUDE" ]
;
this->self("INCLUDE") += m_sString [
lex::_state = "INITIAL", lex::_pass = boost::phoenix::bind(onInclude(), lex::_start, lex::_end)
];
this->self("WS") = m_sWS
| m_sLineComment
| m_sCComment
;
}
};
template<typename Iterator, typename Lexer>
class natureParser:public qi::grammar<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> >
{ qi::rule<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> > m_sStart;
qi::rule<Iterator, std::pair<nature::enumAttribute, nature::attributeValue>(void), qi::in_state_skipper<Lexer> > m_sProperty;
qi::rule<Iterator, std::string(), qi::in_state_skipper<Lexer> > m_sName;
public:
template<typename Tokens>
natureParser(const Tokens &_rTokens)
:natureParser::base_type(m_sStart)
{ m_sProperty = (_rTokens.m_sKW_units
>> _rTokens.m_sEqual
>> _rTokens.m_sString
>> _rTokens.m_sSemicolon
)
| (_rTokens.m_sKW_access
>> _rTokens.m_sEqual
>> _rTokens.m_sIdentifier
>> _rTokens.m_sSemicolon
)
| (_rTokens.m_sKW_idt_nature
>> _rTokens.m_sEqual
>> _rTokens.m_sIdentifier
>> _rTokens.m_sSemicolon
)
| (_rTokens.m_sKW_ddt_nature
>> _rTokens.m_sEqual
>> _rTokens.m_sIdentifier
>> _rTokens.m_sSemicolon
)
| (_rTokens.m_sKW_abstol
>> _rTokens.m_sEqual
>> _rTokens.m_sReal
>> _rTokens.m_sSemicolon
)
;
m_sName = (_rTokens.m_sColon >> _rTokens.m_sIdentifier);
m_sStart = (_rTokens.m_sKW_nature
>> _rTokens.m_sIdentifier
>> -m_sName
>> _rTokens.m_sSemicolon
>> *(m_sProperty)
>> _rTokens.m_sKW_endnature
);
m_sStart.name("start");
m_sProperty.name("property");
}
};
/*
// Conservative discipline
discipline electrical;
potential Voltage;
flow Current;
enddiscipline
*/
// a parser for a discipline declaration
template<typename Iterator, typename Lexer>
class disciplineParser:public qi::grammar<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> >
{ qi::rule<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> > m_sStart;
typedef std::pair<bool, boost::intrusive_ptr<const nature> > CPotentialAndNature;
struct empty
{ bool operator<(const empty&) const
{ return false;
}
friend std::ostream &operator<<(std::ostream &_r, const empty&)
{ return _r;
}
};
typedef boost::variant<empty, CPotentialAndNature, discipline::enumDomain> property;
qi::rule<Iterator, discipline::enumDomain(), qi::in_state_skipper<Lexer> > m_sDomain;
qi::rule<Iterator, property(void), qi::in_state_skipper<Lexer> > m_sProperty;
public:
template<typename Tokens>
disciplineParser(const Tokens &_rTokens)
:disciplineParser::base_type(m_sStart)
{ m_sDomain = _rTokens.m_sKW_continuous
| _rTokens.m_sKW_discrete
;
m_sProperty = (_rTokens.m_sKW_potential >> _rTokens.m_sIdentifier >> _rTokens.m_sSemicolon)
| (_rTokens.m_sKW_flow >> _rTokens.m_sIdentifier >> _rTokens.m_sSemicolon)
| (_rTokens.m_sKW_domain >> m_sDomain >> _rTokens.m_sSemicolon)
;
m_sStart = (_rTokens.m_sKW_discipline
>> _rTokens.m_sIdentifier
>> _rTokens.m_sSemicolon
>> *m_sProperty
>> _rTokens.m_sKW_enddiscipline
);
}
};
template<typename Iterator, typename Lexer>
class moduleParser:public qi::grammar<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> >
{ public:
qi::rule<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> > m_sStart;
qi::rule<Iterator, symbol::symbolVector(void), qi::in_state_skipper<Lexer> > m_sModulePortList;
qi::rule<Iterator, symbol::symbolVector(void), qi::in_state_skipper<Lexer> > m_sPortList;
qi::rule<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> > m_sPort;
qi::rule<Iterator, std::shared_ptr<symbol::push_scope>(void), qi::in_state_skipper<Lexer> > m_sModule;
typedef boost::intrusive_ptr<const ref_counter> intrusivePtr;
typedef std::vector<intrusivePtr> vectorOfPtr;
qi::rule<Iterator, vectorOfPtr(void), qi::in_state_skipper<Lexer> > m_sModuleItemList;
qi::rule<Iterator, intrusivePtr(void), qi::in_state_skipper<Lexer> > m_sParameter;
qi::rule<Iterator, intrusivePtr(void), qi::in_state_skipper<Lexer> > m_sModuleItem;
qi::rule<Iterator, type::typePtr(void), qi::in_state_skipper<Lexer> > m_sType;
template<typename Tokens>
moduleParser(const Tokens &_rTokens)
:moduleParser::base_type(m_sStart)
{ m_sPort = _rTokens.m_sIdentifier;
m_sPortList %= m_sPort % _rTokens.m_sComma;
m_sModulePortList %= _rTokens.m_sCharOP >> m_sPortList >> _rTokens.m_sCharCP;
m_sModule = _rTokens.m_sKW_module;
m_sType = _rTokens.m_sKW_real | _rTokens.m_sKW_integer | _rTokens.m_sKW_string;
m_sParameter = _rTokens.m_sKW_parameter
>> m_sType
>> _rTokens.m_sIdentifier
;
m_sModuleItem = m_sParameter;
m_sModuleItemList %= *m_sModuleItem;
m_sStart = (m_sModule
>> _rTokens.m_sIdentifier
>> m_sModulePortList
>> m_sModuleItemList
>> _rTokens.m_sKW_endmodule);
}
};
template<typename Iterator, typename Lexer>
class fileParser:public qi::grammar<Iterator, symbol::symbolVector(void), qi::in_state_skipper<Lexer> >
{ public:
disciplineParser<Iterator, Lexer> m_sDiscipline;
natureParser<Iterator, Lexer> m_sNature;
moduleParser<Iterator, Lexer> m_sModule;
qi::rule<Iterator, symbol::symbolVector(void), qi::in_state_skipper<Lexer> > m_sStart;
qi::rule<Iterator, symbol::symbolPtr(void), qi::in_state_skipper<Lexer> > m_sItem;
//public:
template<typename Tokens>
fileParser(const Tokens &_rTokens)
:fileParser::base_type(m_sStart),
m_sNature(_rTokens),
m_sDiscipline(_rTokens),
m_sModule(_rTokens)
{ m_sItem = m_sDiscipline | m_sNature | m_sModule;
m_sStart = *m_sItem;
}
};
}
int main()
{ std::string sInput = "\
nature Current;\n\
units = \"A\";\n\
access = I;\n\
idt_nature = Charge;\n\
abstol = 1e-12;\n\
endnature\n\
\n\
// Charge in coulombs\n\
nature Charge;\n\
units = \"coul\";\n\
access = Q;\n\
ddt_nature = Current;\n\
abstol = 1e-14;\n\
endnature\n\
\n\
// Potential in volts\n\
nature Voltage;\n\
units = \"V\";\n\
access = V;\n\
idt_nature = Flux;\n\
abstol = 1e-6;\n\
endnature\n\
\n\
discipline electrical;\n\
potential Voltage;\n\
flow Current;\n\
enddiscipline\n\
";
typedef lex::lexertl::token<frank::myIterator, boost::mpl::vector<frank::type::typePtr, std::string, double, int> > token_type;
typedef lex::lexertl::actor_lexer<token_type> lexer_type;
typedef frank::lexer<lexer_type>::iterator_type iterator_type;
typedef frank::fileParser<iterator_type, frank::lexer<lexer_type>::lexer_def> grammar_type;
frank::lexer<lexer_type> sLexer;
grammar_type sParser(sLexer);
frank::symbol::push_scope sPush;
auto pStringBegin = frank::myIterator(sInput);
auto pBegin(sLexer.begin(pStringBegin, frank::myIterator(sInput, true)));
const auto b = qi::phrase_parse(pBegin, sLexer.end(), sParser, qi::in_state("WS")[sLexer.self]);
}

Has anybody gotten this working for a lexer returning something else than double or int or string?
Sure. Simple examples might be found on this site
And for the parser also returning non-trivial objects?
Here's your real problem. Spirit is nice for a subset of parsers that are expressed easily in a eDSL, and has the huge benefit of "magically" mapping to a selection of attributes.
Some of the realities are:
attributes are expected to have value-semantic; using polymorphic attributes is hard (How can I use polymorphic attributes with boost::spirit::qi parsers?, e.g.)
using Lex makes most of the sweet-spot disappear since all "highlevel" parsers (like real_parser, [u]int_parser) are out the window. The Spirit devs are on record they prefer not to use Lex. Moreover, Spirit X3 doesn't have Lex support anymore.
Background Information:
I'd very much consider parsing the source as-is, into direct value-typed AST nodes. I know, this is probably what you consider "trivial objects", but don't be deceived by apparent simplicity: recursive variant trees have some expressive power.
Examples
Here's a trivial AST to represent JSON in <20 LoC: Boost Karma generator for composition of classes¹
Here we represent the Graphviz source format with full fidelity: How to use boost spirit list operator with mandatory minimum amount of elements?
I've since created the code to transform that AST into a domain representation with fully correct ownership, cascading lexically scoped node/edge attributes and cross references. I have just recovered that work and put it up on github if you're interested, mainly because the task is pretty similar in many respects, like the overriding/inheriting of properties and resolving identifiers within scopes: https://github.com/sehe/spirit-graphviz/blob/master/spirit-graphviz.cpp#L660
Suggestions, Ideas
In your case I'd take similar approach to retain simplicity. The code shown doesn't (yet) cover the trickiest ingredients (like nature attribute overrides within a discipline).
Once you start implementing use-cases like resolving compatible disciplines and the absolute tolerances at a given node, you want a domain model with full fidelity. Preferrably, there would be no loss of source information, and immutable AST information².
As a middle ground, you could probably avoid building an entire source-AST in memory only to transform it in one big go, at the top-level you could have:
file = qi::skip(skipper) [
*(m_sDiscipline | m_sNature | m_sModule) [process_ast(_1)]
];
Where process_ast would apply the "trivial" AST representation into the domain types, one at a time. That way you keep only small bits of temporary AST representation around.
The domain representation can be arbitrarily sophisticated to support all your logic and use-cases.
Let's "Show, Don't Tell"
Baking the simplest AST that comes to mind matching the grammar³:
namespace frank { namespace ast {
struct nature {
struct empty{};
std::string name;
std::string inherits;
enum class Attribute { units, access, idt, ddt, abstol };
using Value = boost::variant<int, double, std::string>;
std::map<Attribute, Value> attributes;
};
struct discipline {
enum enumDomain { eUnspecified, eDiscrete, eContinuous };
struct properties_t {
enumDomain domain = eUnspecified;
boost::optional<std::string> flow, potential;
};
std::string name;
properties_t properties;
};
// TODO
using module = qi::unused_type;
using file = std::vector<boost::variant<nature, discipline, module> >;
enum class type { real, integer, string };
} }
This is trivial and maps 1:1 onto the grammar productions, which means we have very little impedance.
Tokens? We Don't Need Lex For That
You can have common token parsers without requiring the complexities of Lex
Yes, Lex (especially statically generated) can potentially improve performance, but
if you need that, I wager Spirit Qi is not your best option anyways
premature optimization...
What I did:
struct tokens {
// implicit lexemes
qi::rule<It, std::string()> string, identifier;
qi::rule<It, double()> real;
qi::rule<It, int()> integer;
qi::rule<It, ast::nature::Value()> value;
qi::rule<It, ast::nature::Attribute()> attribute;
qi::rule<It, ast::discipline::enumDomain()> domain;
struct attribute_sym_t : qi::symbols<char, ast::nature::Attribute> {
attribute_sym_t() {
this->add
("units", ast::nature::Attribute::units)
("access", ast::nature::Attribute::access)
("idt_nature", ast::nature::Attribute::idt)
("ddt_nature", ast::nature::Attribute::ddt)
("abstol", ast::nature::Attribute::abstol);
}
} attribute_sym;
struct domain_sym_t : qi::symbols<char, ast::discipline::enumDomain> {
domain_sym_t() {
this->add
("discrete", ast::discipline::eDiscrete)
("continuous", ast::discipline::eContinuous);
}
} domain_sym;
tokens() {
using namespace qi;
auto kw = qr::distinct(copy(char_("a-zA-Z0-9_")));
string = '"' >> *("\\" >> char_ | ~char_('"')) >> '"';
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
real = double_;
integer = int_;
attribute = kw[attribute_sym];
domain = kw[domain_sym];
value = string | identifier | real | integer;
BOOST_SPIRIT_DEBUG_NODES((string)(identifier)(real)(integer)(value)(domain)(attribute))
}
};
Liberating, isn't it? Note how
all attributes are automatically propagated
strings handle escapes (this bit was commented out in your Lex approach). We don't even need semantic actions to (badly) pry out the unquoted/unescaped value
we used distinct to ensure keyword parsing matches only full identifiers. (See How to parse reserved words correctly in boost spirit).
This is actually where you notice the lack of separate lexer.
On the flipside, this makes context-sensitive keywords a breeze (lex can easily prioritizes keywords over identifiers that occur in places where keywords cannot occur.⁴)
What About Skipping Space/Comments?
We could have added a token, but for reasons of convention I made it a parser:
struct skipParser : qi::grammar<It> {
skipParser() : skipParser::base_type(spaceOrComment) {
using namespace qi;
spaceOrComment = space
| ("//" >> *(char_ - eol) >> (eoi|eol))
| ("/*" >> *(char_ - "*/") >> "*/");
BOOST_SPIRIT_DEBUG_NODES((spaceOrComment))
}
private:
qi::rule<It> spaceOrComment;
};
natureParser
We inherit our AST parsers from tokens:
struct natureParser : tokens, qi::grammar<It, ast::nature(), skipParser> {
And from there it is plain sailing:
property = attribute >> '=' >> value >> ';';
nature
= kw["nature"] >> identifier >> -(':' >> identifier) >> ';'
>> *property
>> kw["endnature"];
disciplineParser
discipline = kw["discipline"] >> identifier >> ';'
>> properties
>> kw["enddiscipline"]
;
properties
= kw["domain"] >> domain >> ';'
^ kw["flow"] >> identifier >> ';'
^ kw["potential"] >> identifier >> ';'
;
This shows a competing approach that uses the permutation operator (^) to parse optional alternatives in any order into a fixed frank::ast::discipline properties struct. Of course, you might elect to have a more generic representation here, like we had with ast::nature.
Module AST is left as an exercise for the reader, though the parser rules are implemented below.
Top Level, Encapsulating The Skipper
I hate having to specify the skipper from the calling code (it's more complex than required, and changing the skipper changes the grammar). So, I encapsulate it in the top-level parser:
struct fileParser : qi::grammar<It, ast::file()> {
fileParser() : fileParser::base_type(file) {
file = qi::skip(qi::copy(m_sSkip)) [
*(m_sDiscipline | m_sNature | m_sModule)
];
BOOST_SPIRIT_DEBUG_NODES((file))
}
private:
disciplineParser m_sDiscipline;
natureParser m_sNature;
moduleParser m_sModule;
skipParser m_sSkip;
qi::rule<It, ast::file()> file;
};
Demo Time
This demo adds operator<< for the enums, and a variant visitor to print some AST details for debug/demonstrational purposes (print_em).
Then we have a test driver:
int main() {
using iterator_type = std::string::const_iterator;
iterator_type iter = sInput.begin(), last = sInput.end();
frank::Parsers<iterator_type>::fileParser parser;
print_em print;
frank::ast::file file;
bool ok = qi::parse(iter, last, parser, file);
if (ok) {
for (auto& symbol : file)
print(symbol);
}
else {
std::cout << "Parse failed\n";
}
if (iter != last) {
std::cout << "Remaining unparsed: '" << std::string(iter,last) << "'\n";
}
}
With the sample input from your question we get the following output:
Live On Coliru
-- Nature
name: Current
inherits:
attribute: units = A
attribute: access = I
attribute: idt = Charge
attribute: abstol = 1e-12
-- Nature
name: Charge
inherits:
attribute: units = coul
attribute: access = Q
attribute: ddt = Current
attribute: abstol = 1e-14
-- Nature
name: Voltage
inherits:
attribute: units = V
attribute: access = V
attribute: idt = Flux
attribute: abstol = 1e-06
-- Discipline
name: electrical
domain: (unspecified)
flow: Current
potential: Voltage
Remaining unparsed: '
'
With BOOST_SPIRIT_DEBUG defined, you get rich debug information: Live On Coliru
Full Listing
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <map>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapted.hpp>
#include <boost/spirit/repository/include/qi_distinct.hpp>
namespace qi = boost::spirit::qi;
namespace frank { namespace ast {
struct nature {
struct empty{};
std::string name;
std::string inherits;
enum class Attribute { units, access, idt, ddt, abstol };
using Value = boost::variant<int, double, std::string>;
std::map<Attribute, Value> attributes;
};
struct discipline {
enum enumDomain { eUnspecified, eDiscrete, eContinuous };
struct properties_t {
enumDomain domain = eUnspecified;
boost::optional<std::string> flow, potential;
};
std::string name;
properties_t properties;
};
// TODO
using module = qi::unused_type;
using file = std::vector<boost::variant<nature, discipline, module> >;
enum class type { real, integer, string };
} }
BOOST_FUSION_ADAPT_STRUCT(frank::ast::nature, name, inherits, attributes)
BOOST_FUSION_ADAPT_STRUCT(frank::ast::discipline, name, properties)
BOOST_FUSION_ADAPT_STRUCT(frank::ast::discipline::properties_t, domain, flow, potential)
namespace frank {
namespace qr = boost::spirit::repository::qi;
template <typename It> struct Parsers {
struct tokens {
// implicit lexemes
qi::rule<It, std::string()> string, identifier;
qi::rule<It, double()> real;
qi::rule<It, int()> integer;
qi::rule<It, ast::nature::Value()> value;
qi::rule<It, ast::nature::Attribute()> attribute;
qi::rule<It, ast::discipline::enumDomain()> domain;
struct attribute_sym_t : qi::symbols<char, ast::nature::Attribute> {
attribute_sym_t() {
this->add
("units", ast::nature::Attribute::units)
("access", ast::nature::Attribute::access)
("idt_nature", ast::nature::Attribute::idt)
("ddt_nature", ast::nature::Attribute::ddt)
("abstol", ast::nature::Attribute::abstol);
}
} attribute_sym;
struct domain_sym_t : qi::symbols<char, ast::discipline::enumDomain> {
domain_sym_t() {
this->add
("discrete", ast::discipline::eDiscrete)
("continuous", ast::discipline::eContinuous);
}
} domain_sym;
tokens() {
using namespace qi;
auto kw = qr::distinct(copy(char_("a-zA-Z0-9_")));
string = '"' >> *("\\" >> char_ | ~char_('"')) >> '"';
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
real = double_;
integer = int_;
attribute = kw[attribute_sym];
domain = kw[domain_sym];
value = string | identifier | real | integer;
BOOST_SPIRIT_DEBUG_NODES((string)(identifier)(real)(integer)(value)(domain)(attribute))
}
};
struct skipParser : qi::grammar<It> {
skipParser() : skipParser::base_type(spaceOrComment) {
using namespace qi;
spaceOrComment = space
| ("//" >> *(char_ - eol) >> (eoi|eol))
| ("/*" >> *(char_ - "*/") >> "*/");
BOOST_SPIRIT_DEBUG_NODES((spaceOrComment))
}
private:
qi::rule<It> spaceOrComment;
};
struct natureParser : tokens, qi::grammar<It, ast::nature(), skipParser> {
natureParser() : natureParser::base_type(nature) {
using namespace qi;
auto kw = qr::distinct(copy(char_("a-zA-Z0-9_")));
property = attribute >> '=' >> value >> ';';
nature
= kw["nature"] >> identifier >> -(':' >> identifier) >> ';'
>> *property
>> kw["endnature"];
BOOST_SPIRIT_DEBUG_NODES((nature)(property))
}
private:
using Attribute = std::pair<ast::nature::Attribute, ast::nature::Value>;
qi::rule<It, ast::nature(), skipParser> nature;
qi::rule<It, Attribute(), skipParser> property;
using tokens::attribute;
using tokens::value;
using tokens::identifier;
};
struct disciplineParser : tokens, qi::grammar<It, ast::discipline(), skipParser> {
disciplineParser() : disciplineParser::base_type(discipline) {
auto kw = qr::distinct(qi::copy(qi::char_("a-zA-Z0-9_")));
discipline = kw["discipline"] >> identifier >> ';'
>> properties
>> kw["enddiscipline"]
;
properties
= kw["domain"] >> domain >> ';'
^ kw["flow"] >> identifier >> ';'
^ kw["potential"] >> identifier >> ';'
;
BOOST_SPIRIT_DEBUG_NODES((discipline)(properties))
}
private:
qi::rule<It, ast::discipline(), skipParser> discipline;
qi::rule<It, ast::discipline::properties_t(), skipParser> properties;
using tokens::domain;
using tokens::identifier;
};
struct moduleParser : tokens, qi::grammar<It, ast::module(), skipParser> {
moduleParser() : moduleParser::base_type(module) {
auto kw = qr::distinct(qi::copy(qi::char_("a-zA-Z0-9_")));
m_sPort = identifier;
m_sPortList = m_sPort % ',';
m_sModulePortList = '(' >> m_sPortList >> ')';
m_sModule = kw["module"];
m_sType = kw["real"] | kw["integer"] | kw["string"];
m_sParameter = kw["parameter"] >> m_sType >> identifier;
m_sModuleItem = m_sParameter;
m_sModuleItemList = *m_sModuleItem;
module =
(m_sModule >> identifier >> m_sModulePortList >> m_sModuleItemList >> kw["endmodule"]);
}
private:
qi::rule<It, ast::module(), skipParser> module;
qi::rule<It, skipParser> m_sModulePortList;
qi::rule<It, skipParser> m_sPortList;
qi::rule<It, skipParser> m_sPort;
qi::rule<It, skipParser> m_sModule;
qi::rule<It, skipParser> m_sModuleItemList;
qi::rule<It, skipParser> m_sParameter;
qi::rule<It, skipParser> m_sModuleItem;
qi::rule<It, skipParser> m_sType;
using tokens::identifier;
};
struct fileParser : qi::grammar<It, ast::file()> {
fileParser() : fileParser::base_type(file) {
file = qi::skip(qi::copy(m_sSkip)) [
*(m_sDiscipline | m_sNature | m_sModule)
];
BOOST_SPIRIT_DEBUG_NODES((file))
}
private:
disciplineParser m_sDiscipline;
natureParser m_sNature;
moduleParser m_sModule;
skipParser m_sSkip;
qi::rule<It, ast::file()> file;
};
};
}
extern std::string const sInput;
// just for demo
#include <boost/optional/optional_io.hpp>
namespace frank { namespace ast {
//static inline std::ostream &operator<<(std::ostream &os, const nature::empty &) { return os; }
static inline std::ostream &operator<<(std::ostream &os, nature::Attribute a) {
switch(a) {
case nature::Attribute::units: return os << "units";
case nature::Attribute::access: return os << "access";
case nature::Attribute::idt: return os << "idt";
case nature::Attribute::ddt: return os << "ddt";
case nature::Attribute::abstol: return os << "abstol";
};
return os << "?";
}
static inline std::ostream &operator<<(std::ostream &os, discipline::enumDomain d) {
switch(d) {
case discipline::eDiscrete: return os << "discrete";
case discipline::eContinuous: return os << "continuous";
case discipline::eUnspecified: return os << "(unspecified)";
};
return os << "?";
}
} }
struct print_em {
using result_type = void;
template <typename V>
void operator()(V const& variant) const {
boost::apply_visitor(*this, variant);
}
void operator()(frank::ast::nature const& nature) const {
std::cout << "-- Nature\n";
std::cout << "name: " << nature.name << "\n";
std::cout << "inherits: " << nature.inherits << "\n";
for (auto& a : nature.attributes) {
std::cout << "attribute: " << a.first << " = " << a.second << "\n";
}
}
void operator()(frank::ast::discipline const& discipline) const {
std::cout << "-- Discipline\n";
std::cout << "name: " << discipline.name << "\n";
std::cout << "domain: " << discipline.properties.domain << "\n";
std::cout << "flow: " << discipline.properties.flow << "\n";
std::cout << "potential: " << discipline.properties.potential << "\n";
}
void operator()(frank::ast::module const&) const {
std::cout << "-- Module (TODO)\n";
}
};
int main() {
using iterator_type = std::string::const_iterator;
iterator_type iter = sInput.begin(), last = sInput.end();
frank::Parsers<iterator_type>::fileParser parser;
print_em print;
frank::ast::file file;
bool ok = parse(iter, last, parser, file);
if (ok) {
for (auto& symbol : file)
print(symbol);
}
else {
std::cout << "Parse failed\n";
}
if (iter != last) {
std::cout << "Remaining unparsed: '" << std::string(iter,last) << "'\n";
}
}
std::string const sInput = R"(
nature Current;
units = "A";
access = I;
idt_nature = Charge;
abstol = 1e-12;
endnature
// Charge in coulombs
nature Charge;
units = "coul";
access = Q;
ddt_nature = Current;
abstol = 1e-14;
endnature
// Potential in volts
nature Voltage;
units = "V";
access = V;
idt_nature = Flux;
abstol = 1e-6;
endnature
discipline electrical;
potential Voltage;
flow Current;
enddiscipline
)";
¹ incidentally, the other answer there demonstrates the "impedance mismatch" with polymorphic attributes and Spirit - this time on the Karma side of it
² (to prevent subtle bugs that depend on evaluation order or things like that, e.g.)
³ (gleaning some from here but not importing too much complexity that wasn't reflected in your Lex approach)
⁴ (In fact, this is where you'd need state-switching inside the grammar, an area notoriously underdeveloped and practically unusable in Spirit Lex: e.g. when it works how to avoid defining token which matchs everything in boost::spirit::lex or when it goes badly: Boost.Spirit SQL grammar/lexer failure)

One solution would be to use a std::string everywhere and define a boost::variant with everything needed but not use it anywhere in the parser or lexer directly but only serialize & deserialize it into/from the string.
Is this what the originators of boost::spirit intended?

Using lexer token attributes in grammar rules with Lex and Qi from Boost.Spirit

Let's consider following code:
#include <boost/phoenix.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace phoenix = boost::phoenix;
struct operation
{
enum type
{
add,
sub,
mul,
div
};
};
template<typename Lexer>
class expression_lexer
: public lex::lexer<Lexer>
{
public:
typedef lex::token_def<operation::type> operator_token_type;
typedef lex::token_def<double> value_token_type;
typedef lex::token_def<std::string> variable_token_type;
typedef lex::token_def<lex::omit> parenthesis_token_type;
typedef std::pair<parenthesis_token_type, parenthesis_token_type> parenthesis_token_pair_type;
typedef lex::token_def<lex::omit> whitespace_token_type;
expression_lexer()
: operator_add('+'),
operator_sub('-'),
operator_mul("[x*]"),
operator_div("[:/]"),
value("\\d+(\\.\\d+)?"),
variable("%(\\w+)"),
parenthesis({
std::make_pair(parenthesis_token_type('('), parenthesis_token_type(')')),
std::make_pair(parenthesis_token_type('['), parenthesis_token_type(']'))
}),
whitespace("[ \\t]+")
{
this->self
+= operator_add [lex::_val = operation::add]
| operator_sub [lex::_val = operation::sub]
| operator_mul [lex::_val = operation::mul]
| operator_div [lex::_val = operation::div]
| value
| variable [lex::_val = phoenix::construct<std::string>(lex::_start + 1, lex::_end)]
| whitespace [lex::_pass = lex::pass_flags::pass_ignore]
;
std::for_each(parenthesis.cbegin(), parenthesis.cend(),
[&](parenthesis_token_pair_type const& token_pair)
{
this->self += token_pair.first | token_pair.second;
}
);
}
operator_token_type operator_add;
operator_token_type operator_sub;
operator_token_type operator_mul;
operator_token_type operator_div;
value_token_type value;
variable_token_type variable;
std::vector<parenthesis_token_pair_type> parenthesis;
whitespace_token_type whitespace;
};
template<typename Iterator>
class expression_grammar
: public qi::grammar<Iterator>
{
public:
template<typename Tokens>
explicit expression_grammar(Tokens const& tokens)
: expression_grammar::base_type(start)
{
start %= expression >> qi::eoi;
expression %= sum_operand >> -(sum_operator >> expression);
sum_operator %= tokens.operator_add | tokens.operator_sub;
sum_operand %= fac_operand >> -(fac_operator >> sum_operand);
fac_operator %= tokens.operator_mul | tokens.operator_div;
if(!tokens.parenthesis.empty())
fac_operand %= parenthesised | terminal;
else
fac_operand %= terminal;
terminal %= tokens.value | tokens.variable;
if(!tokens.parenthesis.empty())
{
parenthesised %= tokens.parenthesis.front().first >> expression >> tokens.parenthesis.front().second;
std::for_each(tokens.parenthesis.cbegin() + 1, tokens.parenthesis.cend(),
[&](typename Tokens::parenthesis_token_pair_type const& token_pair)
{
parenthesised %= parenthesised.copy() | (token_pair.first >> expression >> token_pair.second);
}
);
}
}
private:
qi::rule<Iterator> start;
qi::rule<Iterator> expression;
qi::rule<Iterator> sum_operand;
qi::rule<Iterator> sum_operator;
qi::rule<Iterator> fac_operand;
qi::rule<Iterator> fac_operator;
qi::rule<Iterator> terminal;
qi::rule<Iterator> parenthesised;
};
int main()
{
typedef lex::lexertl::token<std::string::const_iterator, boost::mpl::vector<operation::type, double, std::string>> token_type;
typedef expression_lexer<lex::lexertl::actor_lexer<token_type>> expression_lexer_type;
typedef expression_lexer_type::iterator_type expression_lexer_iterator_type;
typedef expression_grammar<expression_lexer_iterator_type> expression_grammar_type;
expression_lexer_type lexer;
expression_grammar_type grammar(lexer);
while(std::cin)
{
std::string line;
std::getline(std::cin, line);
std::string::const_iterator first = line.begin();
std::string::const_iterator const last = line.end();
bool const result = lex::tokenize_and_parse(first, last, lexer, grammar);
if(!result)
std::cout << "Parsing failed! Reminder: >" << std::string(first, last) << "<" << std::endl;
else
{
if(first != last)
std::cout << "Parsing succeeded! Reminder: >" << std::string(first, last) << "<" << std::endl;
else
std::cout << "Parsing succeeded!" << std::endl;
}
}
}
It is a simple parser for arithmetic expressions with values and variables. It is build using expression_lexer for extracting tokens, and then with expression_grammar to parse the tokens.
Use of lexer for such a small case might seem an overkill and probably is one. But that is the cost of simplified example. Also note that use of lexer allows to easily define tokens with regular expression while that allows to easily define them by external code (and user provided configuration in particular). With the example provided it would be no issue at all to read definition of tokens from an external config file and for example allow user to change variables from %name to $name.
The code seems to be working fine (checked on Visual Studio 2013 with Boost 1.61).
The expression_lexer has attributes attached to tokens. I guess they work since they compile. But I don't really know how to check.
Ultimately I would like the grammar to build me an std::vector with reversed polish notation of the expression. (Where every element would be a boost::variant over either operator::type or double or std::string.)
The problem is however that I failed to use token attributes in my expression_grammar. For example if you try to change sum_operator following way:
qi::rule<Iterator, operation::type ()> sum_operator;
you will get compilation error. I expected this to work since operation::type is the attribute for both operator_add and operator_sub and so also for their alternative. And still it doesn't compile. Judging from the error in assign_to_attribute_from_iterators it seems that parser tries to build the attribute value directly from input stream range. Which means it ignores the [lex::_val = operation::add] I specified in my lexer.
Changing that to
qi::rule<Iterator, operation::type (operation::type)> sum_operator;
didn't help either.
Also I tried changing definition to
sum_operator %= (tokens.operator_add | tokens.operator_sub) [qi::_val = qi::_1];
didn't help either.
How to work around that? I know I could use symbols from Qi. But I want to have the lexer to make it easy to configure regexes for the tokens. I could also extend the assign_to_attribute_from_iterators as described in the documentation but this kind of double the work. I guess I could also skip the attributes on lexer and just have them on grammar. But this again doesn't work well with flexibility on variable token (in my actual case there is slightly more logic there so that it is configurable also which part of the token forms actual name of the variable - while here it is fixed to just skip the first character). Anything else?
Also a side question - maybe anyone knows. Is there a way to get to capture groups of the regular expression of the token from tokens action? So that instead of having
variable [lex::_val = phoenix::construct<std::string>(lex::_start + 1, lex::_end)]
instead I would be able to make a string from the capture group and so easily handle formats like $var$.
Edited! I have improved whitespace skipping along conclusions from Whitespace skipper when using Boost.Spirit Qi and Lex. It is a simplification that does not affect questions asked here.

Ok, here's my take on the RPN 'requirement'. I heavily favor natural (automatic) attribute propagation over semantic actions (see Boost Spirit: "Semantic actions are evil"?)
I consider the other options (uglifying) optimizations. You might do them if you're happy with the overall design and don't mind making it harder to maintain :)
Live On Coliru
Beyond the sample from my comment that you've already studied, I added that RPN transformation step:
namespace RPN {
using cell = boost::variant<AST::operation, AST::value, AST::variable>;
using rpn_stack = std::vector<cell>;
struct transform : boost::static_visitor<> {
void operator()(rpn_stack& stack, AST::expression const& e) const {
boost::apply_visitor(boost::bind(*this, boost::ref(stack), ::_1), e);
}
void operator()(rpn_stack& stack, AST::bin_expr const& e) const {
(*this)(stack, e.lhs);
(*this)(stack, e.rhs);
stack.push_back(e.op);
}
void operator()(rpn_stack& stack, AST::value const& v) const { stack.push_back(v); }
void operator()(rpn_stack& stack, AST::variable const& v) const { stack.push_back(v); }
};
}
That's all! Use it like so, e.g.:
RPN::transform compiler;
RPN::rpn_stack program;
compiler(program, expr);
for (auto& instr : program) {
std::cout << instr << " ";
}
Which makes the output:
Parsing success: (3 + (8 * 9))
3 8 9 * +
Full Listing
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/phoenix.hpp>
#include <boost/bind.hpp>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace phoenix = boost::phoenix;
struct operation
{
enum type
{
add,
sub,
mul,
div
};
friend std::ostream& operator<<(std::ostream& os, type op) {
switch (op) {
case type::add: return os << "+";
case type::sub: return os << "-";
case type::mul: return os << "*";
case type::div: return os << "/";
}
return os << "<" << static_cast<int>(op) << ">";
}
};
template<typename Lexer>
class expression_lexer
: public lex::lexer<Lexer>
{
public:
//typedef lex::token_def<operation::type> operator_token_type;
typedef lex::token_def<lex::omit> operator_token_type;
typedef lex::token_def<double> value_token_type;
typedef lex::token_def<std::string> variable_token_type;
typedef lex::token_def<lex::omit> parenthesis_token_type;
typedef std::pair<parenthesis_token_type, parenthesis_token_type> parenthesis_token_pair_type;
typedef lex::token_def<lex::omit> whitespace_token_type;
expression_lexer()
: operator_add('+'),
operator_sub('-'),
operator_mul("[x*]"),
operator_div("[:/]"),
value("\\d+(\\.\\d+)?"),
variable("%(\\w+)"),
parenthesis({
std::make_pair(parenthesis_token_type('('), parenthesis_token_type(')')),
std::make_pair(parenthesis_token_type('['), parenthesis_token_type(']'))
}),
whitespace("[ \\t]+")
{
this->self
+= operator_add [lex::_val = operation::add]
| operator_sub [lex::_val = operation::sub]
| operator_mul [lex::_val = operation::mul]
| operator_div [lex::_val = operation::div]
| value
| variable [lex::_val = phoenix::construct<std::string>(lex::_start + 1, lex::_end)]
| whitespace [lex::_pass = lex::pass_flags::pass_ignore]
;
std::for_each(parenthesis.cbegin(), parenthesis.cend(),
[&](parenthesis_token_pair_type const& token_pair)
{
this->self += token_pair.first | token_pair.second;
}
);
}
operator_token_type operator_add;
operator_token_type operator_sub;
operator_token_type operator_mul;
operator_token_type operator_div;
value_token_type value;
variable_token_type variable;
std::vector<parenthesis_token_pair_type> parenthesis;
whitespace_token_type whitespace;
};
namespace AST {
using operation = operation::type;
using value = double;
using variable = std::string;
struct bin_expr;
using expression = boost::variant<value, variable, boost::recursive_wrapper<bin_expr> >;
struct bin_expr {
expression lhs, rhs;
operation op;
friend std::ostream& operator<<(std::ostream& os, bin_expr const& be) {
return os << "(" << be.lhs << " " << be.op << " " << be.rhs << ")";
}
};
}
BOOST_FUSION_ADAPT_STRUCT(AST::bin_expr, lhs, op, rhs)
template<typename Iterator>
class expression_grammar : public qi::grammar<Iterator, AST::expression()>
{
public:
template<typename Tokens>
explicit expression_grammar(Tokens const& tokens)
: expression_grammar::base_type(start)
{
start = expression >> qi::eoi;
bin_sum_expr = sum_operand >> sum_operator >> expression;
bin_fac_expr = fac_operand >> fac_operator >> sum_operand;
expression = bin_sum_expr | sum_operand;
sum_operand = bin_fac_expr | fac_operand;
sum_operator = tokens.operator_add >> qi::attr(AST::operation::add) | tokens.operator_sub >> qi::attr(AST::operation::sub);
fac_operator = tokens.operator_mul >> qi::attr(AST::operation::mul) | tokens.operator_div >> qi::attr(AST::operation::div);
if(tokens.parenthesis.empty()) {
fac_operand = terminal;
}
else {
fac_operand = parenthesised | terminal;
parenthesised = tokens.parenthesis.front().first >> expression >> tokens.parenthesis.front().second;
std::for_each(tokens.parenthesis.cbegin() + 1, tokens.parenthesis.cend(),
[&](typename Tokens::parenthesis_token_pair_type const& token_pair)
{
parenthesised = parenthesised.copy() | (token_pair.first >> expression >> token_pair.second);
});
}
terminal = tokens.value | tokens.variable;
BOOST_SPIRIT_DEBUG_NODES(
(start) (expression) (bin_sum_expr) (bin_fac_expr)
(fac_operand) (terminal) (parenthesised) (sum_operand)
(sum_operator) (fac_operator)
);
}
private:
qi::rule<Iterator, AST::expression()> start;
qi::rule<Iterator, AST::expression()> expression;
qi::rule<Iterator, AST::expression()> sum_operand;
qi::rule<Iterator, AST::expression()> fac_operand;
qi::rule<Iterator, AST::expression()> terminal;
qi::rule<Iterator, AST::expression()> parenthesised;
qi::rule<Iterator, int()> sum_operator;
qi::rule<Iterator, int()> fac_operator;
// extra rules to help with AST creation
qi::rule<Iterator, AST::bin_expr()> bin_sum_expr;
qi::rule<Iterator, AST::bin_expr()> bin_fac_expr;
};
namespace RPN {
using cell = boost::variant<AST::operation, AST::value, AST::variable>;
using rpn_stack = std::vector<cell>;
struct transform : boost::static_visitor<> {
void operator()(rpn_stack& stack, AST::expression const& e) const {
boost::apply_visitor(boost::bind(*this, boost::ref(stack), ::_1), e);
}
void operator()(rpn_stack& stack, AST::bin_expr const& e) const {
(*this)(stack, e.lhs);
(*this)(stack, e.rhs);
stack.push_back(e.op);
}
void operator()(rpn_stack& stack, AST::value const& v) const { stack.push_back(v); }
void operator()(rpn_stack& stack, AST::variable const& v) const { stack.push_back(v); }
};
}
int main()
{
typedef lex::lexertl::token<std::string::const_iterator, boost::mpl::vector<operation::type, double, std::string>> token_type;
typedef expression_lexer<lex::lexertl::actor_lexer<token_type>> expression_lexer_type;
typedef expression_lexer_type::iterator_type expression_lexer_iterator_type;
typedef expression_grammar<expression_lexer_iterator_type> expression_grammar_type;
expression_lexer_type lexer;
expression_grammar_type grammar(lexer);
RPN::transform compiler;
std::string line;
while(std::getline(std::cin, line) && !line.empty())
{
std::string::const_iterator first = line.begin();
std::string::const_iterator const last = line.end();
AST::expression expr;
bool const result = lex::tokenize_and_parse(first, last, lexer, grammar, expr);
if(!result)
std::cout << "Parsing failed!\n";
else
{
std::cout << "Parsing success: " << expr << "\n";
RPN::rpn_stack program;
compiler(program, expr);
for (auto& instr : program) {
std::cout << instr << " ";
}
}
if(first != last)
std::cout << "Remainder: >" << std::string(first, last) << "<\n";
}
}

Why can I not access the value in a semantic action?

I'm trying to write a parser to create an AST using boost::spirit. As a first step I'm trying to wrap numerical values in an AST node. This is the code I'm using:
AST_NodePtr make_AST_NodePtr(const int& i) {
return std::make_shared<AST_Node>(i);
}
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace l = qi::labels;
template<typename Iterator>
struct test_grammar : qi::grammar<Iterator, AST_NodePtr(), ascii::space_type> {
test_grammar() : test_grammar::base_type(test) {
test = qi::int_ [qi::_val = make_AST_NodePtr(qi::_1)];
}
qi::rule<Iterator, AST_NodePtr(), ascii::space_type> test;
};
As far as I understood it from the documentation q::_1 should contain the value parsed by qi::int_, but the above code always gives me an error along the lines
invalid initialization of reference of type ‘const int&’ from expression of type ‘const _1_type {aka const boost::phoenix::actor<boost::spirit::argument<0> >}
Why does this not work even though qi::_1 is supposed to hold the parsed valued? How else would I parse the input into a custom AST?

You're using a regular function inside the semantic action.
This means that in the contructor the compiler will try to invoke that make_AST_NodePtr function with the argument supplied: qi::_1.
Q. Why does this not work even though qi::_1 is supposed to hold the parsed valued?
A. qi::_1 does not hold the parsed value. It represents (is-a-placeholder-for) the first unbound argument in the function call
This can /obviously/ never work. The function expects an integer.
So what gives?
You need to make a "lazy" or "deferred" function for use in the semantic action. Using only pre-supplied Boost Phoenix functors, you could spell it out:
test = qi::int_ [ qi::_val = px::construct<AST_NodePtr>(px::new_<AST_Node>(qi::_1)) ];
You don't need the helper function this way. But the result is both ugly and suboptimal. So, let's do better!
Using a Phoenix Function wrapper
struct make_shared_f {
std::shared_ptr<AST_Node> operator()(int v) const {
return std::make_shared<AST_Node>(v);
}
};
px::function<make_shared_f> make_shared_;
With this defined, you can simplify the semantic action to:
test = qi::int_ [ qi::_val = make_shared_(qi::_1) ];
Actually, if you make it generic you can reuse it for many types:
template <typename T>
struct make_shared_f {
template <typename... Args>
std::shared_ptr<T> operator()(Args&&... args) const {
return std::make_shared<T>(std::forward<Args>(args)...);
}
};
px::function<make_shared_f<AST_Node> > make_shared_;
DEMO
Here's a self-contained example showing some style fixes in the process:
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <memory>
struct AST_Node {
AST_Node(int v) : _value(v) {}
int value() const { return _value; }
private:
int _value;
};
using AST_NodePtr = std::shared_ptr<AST_Node>;
AST_NodePtr make_AST_NodePtr(const int& i) {
return std::make_shared<AST_Node>(i);
}
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
template<typename Iterator>
struct test_grammar : qi::grammar<Iterator, AST_NodePtr()> {
test_grammar() : test_grammar::base_type(start) {
using boost::spirit::ascii::space;
start = qi::skip(space) [ test ];
test = qi::int_ [ qi::_val = make_shared_(qi::_1) ];
}
private:
struct make_shared_f {
std::shared_ptr<AST_Node> operator()(int v) const {
return std::make_shared<AST_Node>(v);
}
};
px::function<make_shared_f> make_shared_;
//
qi::rule<Iterator, AST_NodePtr()> start;
qi::rule<Iterator, AST_NodePtr(), boost::spirit::ascii::space_type> test;
};
int main() {
AST_NodePtr parsed;
std::string const input ("42");
auto f = input.begin(), l = input.end();
test_grammar<std::string::const_iterator> g;
bool ok = qi::parse(f, l, g, parsed);
if (ok) {
std::cout << "Parsed: " << (parsed? std::to_string(parsed->value()) : "nullptr") << "\n";
} else {
std::cout << "Failed\n";
}
if (f!=l)
{
std::cout << "Remaining input: '" << std::string(f, l) << "'\n";
}
}
Prints
Parsed: 42
BONUS: Alternative using BOOST_PHOENIX_ADAPT_FUNCTION
You can actually use your free function if you wish, and use it as follows:
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <memory>
struct AST_Node {
AST_Node(int v) : _value(v) {}
int value() const { return _value; }
private:
int _value;
};
using AST_NodePtr = std::shared_ptr<AST_Node>;
AST_NodePtr make_AST_NodePtr(int i) {
return std::make_shared<AST_Node>(i);
}
BOOST_PHOENIX_ADAPT_FUNCTION(AST_NodePtr, make_AST_NodePtr_, make_AST_NodePtr, 1)
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
template<typename Iterator>
struct test_grammar : qi::grammar<Iterator, AST_NodePtr()> {
test_grammar() : test_grammar::base_type(start) {
using boost::spirit::ascii::space;
start = qi::skip(space) [ test ] ;
test = qi::int_ [ qi::_val = make_AST_NodePtr_(qi::_1) ] ;
}
private:
qi::rule<Iterator, AST_NodePtr()> start;
qi::rule<Iterator, AST_NodePtr(), boost::spirit::ascii::space_type> test;
};
int main() {
AST_NodePtr parsed;
std::string const input ("42");
auto f = input.begin(), l = input.end();
test_grammar<std::string::const_iterator> g;
bool ok = qi::parse(f, l, g, parsed);
if (ok) {
std::cout << "Parsed: " << (parsed? std::to_string(parsed->value()) : "nullptr") << "\n";
} else {
std::cout << "Failed\n";
}
if (f!=l)
{
std::cout << "Remaining input: '" << std::string(f, l) << "'\n";
}
}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

parsing chemical formula with mixtures of elements - c++

Related

How to use a pointer as token attribute in Boost::Spirit::Lex?

Make boost::spirit::symbol parser non greedy

cannot get boost::spirit parser&lexer working for token types other than std::string or int or double

Using lexer token attributes in grammar rules with Lex and Qi from Boost.Spirit

Why can I not access the value in a semantic action?

Categories

Resources