Boost Spirit Qi: Skipper parser does not skip under certain conditions - c++

I am currently implementing a parser that succeeds on the "strongest" match for spirit::qi. There are meaningful applications for such a thing, e.g. matching references that are either simple refs (e.g. "willy") or namespace-qualified refs (e.g. "willy::anton"). That's not my actual real-world case, but it is almost self-explanatory, I guess. At least it helped me to track down the issue.
I found a solution for that. It works perfectly when the skipper parser is not involved (i.e. there is nothing to skip). It does not work as expected if there are areas which need skipping.
I believe I have tracked down the problem. It seems that under certain conditions spaces are not skipped although they should be.
Below is a self-contained working example. It loops over some rules and some inputs to provide enough information. If you run it with BOOST_SPIRIT_DEBUG enabled, you get, in particular, this output:
<qualifier>
<try> :: anton</try>
<fail/>
</qualifier>
I think this one should not have failed. Am I right in assuming so? Does anyone know a way around that? Or is it just my poor understanding of qi semantics? Thank you very much for your time. :)
My environment: MSVC 2015 latest, target win32 console
#define BOOST_SPIRIT_DEBUG
#include <iostream>
#include <map>
#include <string>
#include <boost/spirit/include/qi.hpp>
typedef std::string::const_iterator iterator_type;
namespace qi = boost::spirit::qi;
using map_type = std::map<std::string, qi::rule<iterator_type, std::string()>&>;
namespace maxence { namespace parser {
template <typename Iterator>
struct ident : qi::grammar<Iterator, std::string()>
{
ident();
qi::rule<Iterator, std::string()>
id, id_raw;
qi::rule<Iterator, std::string()>
not_used,
qualifier,
qualified_id, simple_id,
id_reference, id_reference_final;
map_type rules = {
{ "E1", id },
{ "E2", id_raw}
};
};
template <typename Iterator>
// we actually don't need the start rule (see below)
ident<Iterator>::ident() : ident::base_type(not_used)
{
id_reference = (!simple_id >> qualified_id) | (!qualified_id >> simple_id);
id_reference_final = id_reference;
///////////////////////////////////////////////////
// standard simple id (not followed by
// delimiter "::")
simple_id = (qi::alpha | '_') >> *(qi::alnum | '_') >> !qi::lit("::");
///////////////////////////////////////////////////
// this is qualifier <- "::" simple_id
// I repeat the simple_id pattern here to make sure
// this demo has no "early match" issues
qualifier = qi::string("::") > (qi::alpha | '_') >> *(qi::alnum | '_');
///////////////////////////////////////////////////
// this is: qualified_id <- simple_id qualifier*
qualified_id = (qi::alpha | '_') >> *(qi::alnum | '_') >> +(qualifier) >> !qi::lit("::");
id = id_reference_final;
id_raw = qi::raw[id_reference_final];
BOOST_SPIRIT_DEBUG_NODES(
(id)
(id_raw)
(qualifier)
(qualified_id)
(simple_id)
(id_reference)
(id_reference_final)
)
}
}}
int main()
{
maxence::parser::ident<iterator_type> ident;
using ss_map_type = std::map<std::string, std::string>;
ss_map_type parser_input =
{
{ "Simple id (behaves ok)", "willy" },
{ "Qualified id (behaves ok)", "willy::anton" },
{ "Skipper involved (unexpected)", "willy :: anton" }
};
for (ss_map_type::const_iterator input = parser_input.begin(); input != parser_input.end(); input++) {
for (map_type::const_iterator example = ident.rules.begin(); example != ident.rules.end(); example++) {
std::string to_parse = input->second;
std::string result;
std::string parser_name = (example->second).name();
std::cout << "--------------------------------------------" << std::endl;
std::cout << "Description: " << input->first << std::endl;
std::cout << "Parser [" << parser_name << "] parsing [" << to_parse << "]" << std::endl;
auto b(to_parse.begin()), e(to_parse.end());
// --- test for parser success
bool success = qi::phrase_parse(b, e, (example)->second, qi::space, result);
if (success) std::cout << "Parser succeeded. Result: " << result << std::endl;
else std::cout << " Parser failed. " << std::endl;
//--- test for EOI
if (b == e) {
std::cout << "EOI reached.";
if (success) std::cout << " The sun is shining brightly. :)";
} else {
std::cout << "Failure: EOI not reached. Remaining: [";
while (b != e) std::cout << *b++; std::cout << "]";
}
std::cout << std::endl << "--------------------------------------------" << std::endl;
}
}
return 0;
}
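A stripped-down repro of the failing <qualifier> step above (my addition, not part of the original question), assuming the point made in one of the answers further down this page: a rule whose declaration contains no skipper type is an implicit lexeme, so the qi::space passed to phrase_parse never reaches the inside of qualified_id, and the blank in front of "::" is never skipped. The rule shapes mirror the question's qualifier/qualified_id.
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
int main()
{
    typedef std::string::const_iterator It;
    // no skipper in either signature, exactly as in the question
    qi::rule<It> qualifier    = qi::string("::") > (qi::alpha | '_') >> *(qi::alnum | '_');
    qi::rule<It> qualified_id = (qi::alpha | '_') >> *(qi::alnum | '_') >> +qualifier;
    std::string const input = "willy :: anton";
    It f = input.begin(), l = input.end();
    // qi::space applies only where a rule declares a skipper; it is not
    // propagated into these rules, so qualifier is tried on " :: anton"
    // (leading blank included) and fails, just like in the debug trace.
    bool ok = qi::phrase_parse(f, l, qualified_id, qi::space);
    std::cout << std::boolalpha << "parse ok: " << ok << '\n';   // parse ok: false
}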

Related

How to tokenize C++ using Boost Regex

I'm currently working on a tokenizer for a class using Boost Regex. I'm not too familiar with Boost, so I may be way off base with what I have so far, but anyway, here is what I'm using:
regex re("[\\s*,()=;<>\\+-]{1,2}");
sregex_token_iterator i(text.begin(), text.end(), re, -1);
sregex_token_iterator j;
sregex_token_iterator begin(text.begin(), text.end(), re), end;
unsigned count = 0;
while(i != j)
{
if(*i != ' ' && *i != '\n')
{
count++;
cout << "From i - " << count << " " << *i << endl;
}
i++;
if(*begin != ' ' && *begin != '\n')
{
count++;
cout << "Form j - " << count << " " << *begin << endl;
}
begin++;
}
cout << "There were " << count << " tokens found." << endl;
So, basically, I'm using the spaces and the symbols as delimiters, but I'm still outputting both (since I still want the symbols to be tokens). Like I said, I'm not extremely familiar with boost, so I'm not positive if I'm taking the right approach.
My end goal is to split a file that has a simple c++ block of code and tokenize it, here's the example file I am using:
#define MAX 5
int main(int argc)
{
for(int i = 0; i < MAX; i ++)
{
cout << "i is equal to " << i << endl;
}
return 0;
}
I'm having trouble with the fact that it is counting newlines and blank spaces as tokens, and I really need those to be thrown away. Also, I'm having a hard time with the "++" token; I can't seem to figure out the right expression for it to match "++".
Any help would be greatly appreciated!
Thanks!
Tim
First off,
Boost has Boost Wave which has (several, I think) ready-made tokenizers for C++ source
Boost has Spirit Lex which is a lexer that can tokenize based on regex patterns and some state support. It allows both dynamic lexer tables and statically generated lexer tables
In case you're interested in using Lex I ran a quick & dirty finger exercise for myself: it tokenizes itself Live On Coliru.
Notes:
A Lex tokenizer plays nicely with Boost Spirit Qi for parsing (though in all honesty, I prefer doing Spirit grammars directly on the source iterators).
It exposes an iterator interface, although my example leverages the callback interface to display the tokens:
int main()
{
typedef boost::spirit::istream_iterator It;
typedef lex::lexertl::token<It, boost::mpl::vector<int, double>, boost::mpl::true_ > token_type;
tokens<lex::lexertl::actor_lexer<token_type> > lexer;
std::ifstream ifs("main.cpp");
ifs >> std::noskipws;
It first(ifs), last;
bool ok = lex::tokenize(first, last, lexer, process_token());
std::cout << "\nTokenization " << (ok?"succeeded":"failed") << "; remaining input: '" << std::string(first,last) << "'\n";
}
Which is tokenized in the output as (trimming the preceding output):
[int][main][(][)][{][typedef][boost][::][spirit][::][istream_iterator][It][;][typedef][lex][::][lexertl][::][token][<][It][,][boost][::][mpl][::][vector][<][int][,][double][>][,][boost][::][mpl][::][true_][>][token_type][;][tokens][<][lex][::][lexertl][::][actor_lexer][<][token_type][>][>][lexer][;][std][::][ifstream][ifs][(]["main.cpp"][)][;][ifs][>>][std][::][noskipws][;][It][first][(][ifs][)][,][last][;][bool][ok][=][lex][::][tokenize][(][first][,][last][,][lexer][,][process_token][(][)][)][;][std][::][cout][<<]["\nTokenization "][<<][(][ok][?]["succeeded"][:]["failed"][)][<<]["; remaining input: '"][<<][std][::][string][(][first][,][last][)][<<]["'\n"][;][}]
Tokenization succeeded; remaining input: ''
You'd actually want a different lexer state for parsing the preprocessor directives (line-ends become meaningful and several other expressions/keywords are valid). In real life, there's often a separate preprocessor step doing its own lexing here. (The fallout of this can be seen when lexing the include file specifications, e.g.)
ordering of tokens in the lexer is critical for the result:
in this sample, you'd always match the & token as a binop_. You'd probably want to match an ampersand_ token instead and decide at parse time whether it's a binary operator (bitwise AND), a unary operator (address-of), a reference type-qualifier etc. C++ is really interesting to parse :| (a small sketch of this follows these notes, just before the full listing)
Comments are supported!
digraphs/trigraphs are not supported :)
pragmas, line/file directives etc. are unsupported
All in all, this should be pretty usable if you wanted to make, say, a simple syntax highlighter or formatter. Anything beyond that would require some more parsing/semantic analysis.
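Regarding the token-ordering note above, here is a small hedged sketch of the ampersand_ idea (my addition, not part of the answer; the mini_tokens name, the reduced token set and the test input are made up for illustration). For equally long matches, the token added to self first wins, so registering ampersand_ before binop_ makes a lone '&' come out as its own token, leaving the binary/unary/reference decision to the parser.
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/mpl/vector.hpp>
#include <iostream>
#include <string>
namespace lex = boost::spirit::lex;
template <typename Lexer>
struct mini_tokens : lex::lexer<Lexer>
{
    mini_tokens()
    {
        ampersand_  = '&';
        binop_      = "[-+/%&|^]|>>|<<";
        identifier_ = "[a-zA-Z_][a-zA-Z0-9_]*";
        ws_         = "[ \\t\\r\\n]";
        // order matters: swap ampersand_ and binop_ below and a lone '&'
        // is reported as binop_ again
        this->self +=
              ampersand_
            | binop_
            | identifier_
            | ws_ [ lex::_pass = lex::pass_flags::pass_ignore ]
            ;
    }
    lex::token_def<> ampersand_, binop_, identifier_;
    lex::token_def<lex::omit> ws_;
};
int main()
{
    typedef std::string::const_iterator It;
    typedef lex::lexertl::token<It, boost::mpl::vector<int, double>, boost::mpl::true_> token_type;
    mini_tokens<lex::lexertl::actor_lexer<token_type> > lexer;
    std::string const input = "a & b >> c";
    It first = input.begin(), last = input.end();
    // print only the token ids; '&' and '>>' get different ids here
    bool ok = lex::tokenize(first, last, lexer,
        [](token_type const& t) { std::cout << '[' << t.id() << ']'; return true; });
    std::cout << "\ntokenization " << (ok ? "succeeded" : "failed") << '\n';
}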
Full Listing:
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <fstream>
#include <sstream>
#include <boost/lexical_cast.hpp>
namespace lex = boost::spirit::lex;
template <typename Lexer>
struct tokens : lex::lexer<Lexer>
{
tokens()
{
pound_ = "#";
define_ = "define";
if_ = "if";
else_ = "else";
endif_ = "endif";
ifdef_ = "ifdef";
ifndef_ = "ifndef";
defined_ = "defined";
keyword_ = "for|break|continue|while|do|switch|case|default|if|else|return|goto|throw|catch"
"static|volatile|auto|void|int|char|signed|unsigned|long|double|float|"
"delete|new|virtual|override|final|"
"typename|template|using|namespace|extern|\"C\"|"
"friend|public|private|protected|"
"class|struct|enum|"
"register|thread_local|noexcept|constexpr";
scope_ = "::";
dot_ = '.';
arrow_ = "->";
star_ = '*';
popen_ = '(';
pclose_ = ')';
bopen_ = '{';
bclose_ = '}';
iopen_ = '[';
iclose_ = ']';
colon_ = ':';
semic_ = ';';
comma_ = ',';
tern_q_ = '?';
relop_ = "==|!=|<=|>=|<|>";
assign_ = '=';
incr_ = "\\+\\+";
decr_ = "--";
binop_ = "[-+/%&|^]|>>|<<";
unop_ = "[-+~!]";
real_ = "[-+]?[0-9]+(e[-+]?[0-9]+)?f?";
int_ = "[-+]?[0-9]+";
identifier_ = "[a-zA-Z_][a-zA-Z0-9_]*";
ws_ = "[ \\t\\r\\n]";
line_comment_ = "\\/\\/.*?[\\r\\n]";
block_comment_ = "\\/\\*.*?\\*\\/";
this->self.add_pattern
("SCHAR", "\\\\(x[0-9a-fA-F][0-9a-fA-F]|[\\\\\"'0tbrn])|[^\"\\\\'\\r\\n]")
;
string_lit = "\\\"('|{SCHAR})*?\\\"";
char_lit = "'(\\\"|{SCHAR})'";
this->self +=
pound_ | define_ | if_ | else_ | endif_ | ifdef_ | ifndef_ | defined_
| keyword_ | scope_ | dot_ | arrow_ | star_ | popen_ | pclose_ | bopen_ | bclose_ | iopen_ | iclose_ | colon_ | semic_ | comma_ | tern_q_
| relop_ | assign_ | incr_ | decr_ | binop_ | unop_
| int_ | real_ | identifier_ | string_lit | char_lit
// ignore whitespace and comments
| ws_ [ lex::_pass = lex::pass_flags::pass_ignore ]
| line_comment_ [ lex::_pass = lex::pass_flags::pass_ignore ]
| block_comment_[ lex::_pass = lex::pass_flags::pass_ignore ]
;
}
private:
lex::token_def<> pound_, define_, if_, else_, endif_, ifdef_, ifndef_, defined_;
lex::token_def<> keyword_, scope_, dot_, arrow_, star_, popen_, pclose_, bopen_, bclose_, iopen_, iclose_, colon_, semic_, comma_, tern_q_;
lex::token_def<> relop_, assign_, incr_, decr_, binop_, unop_;
lex::token_def<int> int_;
lex::token_def<double> real_;
lex::token_def<> identifier_, string_lit, char_lit;
lex::token_def<lex::omit> ws_, line_comment_, block_comment_;
};
struct token_value : boost::static_visitor<std::string>
{
template <typename... T> // the token value can be a variant over any of the exposed attribute types
std::string operator()(boost::variant<T...> const& v) const {
return boost::apply_visitor(*this, v);
}
template <typename T> // the default value is a pair of iterators into the source sequence
std::string operator()(boost::iterator_range<T> const& v) const {
return { v.begin(), v.end() };
}
template <typename T>
std::string operator()(T const& v) const {
// not taken unless used in Spirit Qi rules, I guess
return std::string("attr<") + typeid(v).name() + ">(" + boost::lexical_cast<std::string>(v) + ")";
}
};
struct process_token
{
template <typename T>
bool operator()(T const& token) const {
std::cout << '[' /*<< token.id() << ":" */<< print(token.value()) << "]";
return true;
}
token_value print;
};
#if 0
std::string read(std::string fname)
{
std::ifstream ifs(fname);
std::ostringstream oss;
oss << ifs.rdbuf();
return oss.str();
}
#endif
int main()
{
typedef boost::spirit::istream_iterator It;
typedef lex::lexertl::token<It, boost::mpl::vector<int, double>, boost::mpl::true_ > token_type;
tokens<lex::lexertl::actor_lexer<token_type> > lexer;
std::ifstream ifs("main.cpp");
ifs >> std::noskipws;
It first(ifs), last;
bool ok = lex::tokenize(first, last, lexer, process_token());
std::cout << "\nTokenization " << (ok?"succeeded":"failed") << "; remaining input: '" << std::string(first,last) << "'\n";
}

boost::spirit access position iterator from semantic actions

Let's say I have code like this (line numbers for reference):
1:
2:function FuncName_1 {
3: var Var_1 = 3;
4: var Var_2 = 4;
5: ...
I want to write a grammar that parses such text and puts info about all identifiers (function and variable names) into a tree (utree?).
Each node should preserve: line_num, column_num and symbol value. Example:
root: FuncName_1 (line:2,col:10)
children[0]: Var_1 (line:3, col:8)
children[1]: Var_2 (line:4, col:9)
I want to put it into a tree because I plan to traverse that tree, and for each node I must know the 'context' (all parent nodes of the current node).
E.g., while processing the node for Var_1, I must know that this is the name of a local variable of function FuncName_1 (which is itself being processed as a node, one level up).
I cannot figure out a few things:
Can this be done in Spirit with semantic actions and utrees? Or should I use variant<> trees?
How do I pass those three pieces of information (column, line, symbol name) to the node at the same time? I know I must use pos_iterator as the iterator type for the grammar, but how do I access that information in a semantic action?
I'm a newbie with Boost, so I have read the Spirit documentation over and over and tried to google my problems, but I somehow cannot put all the pieces together to find the solution. It seems like no one has had a use case like mine before (or I'm just not able to find it).
It looks like the only solutions involving the position iterator are the ones about parse error handling, but that is not the case I'm interested in.
The code that merely parses the text I was talking about is below, but I don't know how to move forward with it.
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
namespace qi = boost::spirit::qi;
typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;
template<typename Iterator=pos_iterator_t, typename Skipper=qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, Skipper>
{
ParseGrammar():ParseGrammar::base_type(SourceCode)
{
using namespace qi;
KeywordFunction = lit("function");
KeywordVar = lit("var");
SemiColon = lit(';');
Identifier = lexeme [alpha >> *(alnum | '_')];
VarAssignemnt = KeywordVar >> Identifier >> char_('=') >> int_ >> SemiColon;
SourceCode = KeywordFunction >> Identifier >> '{' >> *VarAssignemnt >> '}';
}
qi::rule<Iterator, Skipper> SourceCode;
qi::rule<Iterator > KeywordFunction;
qi::rule<Iterator, Skipper> VarAssignemnt;
qi::rule<Iterator> KeywordVar;
qi::rule<Iterator> SemiColon;
qi::rule<Iterator > Identifier;
};
int main()
{
std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var Var_2 = 4; }";
pos_iterator_t first(content.begin()), iter = first, last(content.end());
ParseGrammar<pos_iterator_t> resolver; // Our parser
bool ok = phrase_parse(iter,
last,
resolver,
qi::space);
std::cout << std::boolalpha;
std::cout << "\nok : " << ok << std::endl;
std::cout << "full : " << (iter == last) << std::endl;
if(ok && iter == last)
{
std::cout << "OK: Parsing fully succeeded\n\n";
}
else
{
int line = get_line(iter);
int column = get_column(first, iter);
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed or not complete\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "remaining: '" << std::string(iter, last) << "'\n";
std::cout << "-------------------------\n";
}
return 0;
}
This has been a fun exercise, where I finally put together a working demo of on_success[1] to annotate AST nodes.
Let's assume we want an AST like:
namespace ast
{
struct LocationInfo {
unsigned line, column, length;
};
struct Identifier : LocationInfo {
std::string name;
};
struct VarAssignment : LocationInfo {
Identifier id;
int value;
};
struct SourceCode : LocationInfo {
Identifier function;
std::vector<VarAssignment> assignments;
};
}
I know, 'location information' is probably overkill for the SourceCode node, but you know... Anyways, to make it easy to assign attributes to these nodes without requiring semantic actions or lots of specifically crafted constructors:
#include <boost/fusion/adapted/struct.hpp>
BOOST_FUSION_ADAPT_STRUCT(ast::Identifier, (std::string, name))
BOOST_FUSION_ADAPT_STRUCT(ast::VarAssignment, (ast::Identifier, id)(int, value))
BOOST_FUSION_ADAPT_STRUCT(ast::SourceCode, (ast::Identifier, function)(std::vector<ast::VarAssignment>, assignments))
There. Now we can declare the rules to expose these attributes:
qi::rule<Iterator, ast::SourceCode(), Skipper> SourceCode;
qi::rule<Iterator, ast::VarAssignment(), Skipper> VarAssignment;
qi::rule<Iterator, ast::Identifier()> Identifier;
// no skipper, no attributes:
qi::rule<Iterator> KeywordFunction, KeywordVar, SemiColon;
We don't (essentially) modify the grammar, at all: attribute propagation is "just automatic"[2] :
KeywordFunction = lit("function");
KeywordVar = lit("var");
SemiColon = lit(';');
Identifier = as_string [ alpha >> *(alnum | char_("_")) ];
VarAssignment = KeywordVar >> Identifier >> '=' >> int_ >> SemiColon;
SourceCode = KeywordFunction >> Identifier >> '{' >> *VarAssignment >> '}';
The magic
How do we get the source location information attached to our nodes?
auto set_location_info = annotate(_val, _1, _3);
on_success(Identifier, set_location_info);
on_success(VarAssignment, set_location_info);
on_success(SourceCode, set_location_info);
Now, annotate is just a lazy version of a callable that is defined as:
template<typename It>
struct annotation_f {
typedef void result_type;
annotation_f(It first) : first(first) {}
It const first;
template<typename Val, typename First, typename Last>
void operator()(Val& v, First f, Last l) const {
do_annotate(v, f, l, first);
}
private:
void static do_annotate(ast::LocationInfo& li, It f, It l, It first) {
using std::distance;
li.line = get_line(f);
li.column = get_column(first, f);
li.length = distance(f, l);
}
static void do_annotate(...) { }
};
Due to the way get_column works, the functor is stateful (as it remembers the start iterator)[3]. As you can see, do_annotate just accepts anything that derives from LocationInfo.
Now, the proof of the pudding:
std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var Var_2 = 4; }";
pos_iterator_t first(content.begin()), iter = first, last(content.end());
ParseGrammar<pos_iterator_t> resolver(first); // Our parser
ast::SourceCode program;
bool ok = phrase_parse(iter,
last,
resolver,
qi::space,
program);
std::cout << std::boolalpha;
std::cout << "ok : " << ok << std::endl;
std::cout << "full: " << (iter == last) << std::endl;
if(ok && iter == last)
{
std::cout << "OK: Parsing fully succeeded\n\n";
std::cout << "Function name: " << program.function.name << " (see L" << program.printLoc() << ")\n";
for (auto const& va : program.assignments)
std::cout << "variable " << va.id.name << " assigned value " << va.value << " at L" << va.printLoc() << "\n";
}
else
{
int line = get_line(iter);
int column = get_column(first, iter);
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed or not complete\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "remaining: '" << std::string(iter, last) << "'\n";
std::cout << "-------------------------\n";
}
This prints:
ok : true
full: true
OK: Parsing fully succeeded
Function name: FuncName_1 (see L1:1:56)
variable Var_1 assigned value 3 at L2:3:14
variable Var_2 assigned value 4 at L3:3:15
Full Demo Program
See it Live On Coliru
Also showing:
error handling, e.g.:
Error: expecting "=" in line 3:
var Var_2 - 4; }
^---- here
ok : false
full: false
-------------------------
ERROR: Parsing failed or not complete
stopped at: 1:1
remaining: 'function FuncName_1 {
var Var_1 = 3;
var Var_2 - 4; }'
-------------------------
BOOST_SPIRIT_DEBUG macros
A bit of a hacky way to conveniently stream the LocationInfo part of any AST node, sorry :)
//#define BOOST_SPIRIT_DEBUG
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
#include <iomanip>
namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;
typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;
namespace ast
{
namespace manip { struct LocationInfoPrinter; }
struct LocationInfo {
unsigned line, column, length;
manip::LocationInfoPrinter printLoc() const;
};
struct Identifier : LocationInfo {
std::string name;
};
struct VarAssignment : LocationInfo {
Identifier id;
int value;
};
struct SourceCode : LocationInfo {
Identifier function;
std::vector<VarAssignment> assignments;
};
///////////////////////////////////////////////////////////////////////////
// Completely unnecessary tweak to get a "poor man's" io manipulator going
// so we can do `std::cout << x.printLoc()` on types of `x` deriving from
// LocationInfo
namespace manip {
struct LocationInfoPrinter {
LocationInfoPrinter(LocationInfo const& ref) : ref(ref) {}
LocationInfo const& ref;
friend std::ostream& operator<<(std::ostream& os, LocationInfoPrinter const& lip) {
return os << lip.ref.line << ':' << lip.ref.column << ':' << lip.ref.length;
}
};
}
manip::LocationInfoPrinter LocationInfo::printLoc() const { return { *this }; }
// feel free to disregard this hack
///////////////////////////////////////////////////////////////////////////
}
BOOST_FUSION_ADAPT_STRUCT(ast::Identifier, (std::string, name))
BOOST_FUSION_ADAPT_STRUCT(ast::VarAssignment, (ast::Identifier, id)(int, value))
BOOST_FUSION_ADAPT_STRUCT(ast::SourceCode, (ast::Identifier, function)(std::vector<ast::VarAssignment>, assignments))
struct error_handler_f {
typedef qi::error_handler_result result_type;
template<typename T1, typename T2, typename T3, typename T4>
qi::error_handler_result operator()(T1 b, T2 e, T3 where, T4 const& what) const {
std::cerr << "Error: expecting " << what << " in line " << get_line(where) << ": \n"
<< std::string(b,e) << "\n"
<< std::setw(std::distance(b, where)) << '^' << "---- here\n";
return qi::fail;
}
};
template<typename It>
struct annotation_f {
typedef void result_type;
annotation_f(It first) : first(first) {}
It const first;
template<typename Val, typename First, typename Last>
void operator()(Val& v, First f, Last l) const {
do_annotate(v, f, l, first);
}
private:
void static do_annotate(ast::LocationInfo& li, It f, It l, It first) {
using std::distance;
li.line = get_line(f);
li.column = get_column(first, f);
li.length = distance(f, l);
}
static void do_annotate(...) {}
};
template<typename Iterator=pos_iterator_t, typename Skipper=qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, ast::SourceCode(), Skipper>
{
ParseGrammar(Iterator first) :
ParseGrammar::base_type(SourceCode),
annotate(first)
{
using namespace qi;
KeywordFunction = lit("function");
KeywordVar = lit("var");
SemiColon = lit(';');
Identifier = as_string [ alpha >> *(alnum | char_("_")) ];
VarAssignment = KeywordVar > Identifier > '=' > int_ > SemiColon; // note: expectation points
SourceCode = KeywordFunction >> Identifier >> '{' >> *VarAssignment >> '}';
on_error<fail>(VarAssignment, handler(_1, _2, _3, _4));
on_error<fail>(SourceCode, handler(_1, _2, _3, _4));
auto set_location_info = annotate(_val, _1, _3);
on_success(Identifier, set_location_info);
on_success(VarAssignment, set_location_info);
on_success(SourceCode, set_location_info);
BOOST_SPIRIT_DEBUG_NODES((KeywordFunction)(KeywordVar)(SemiColon)(Identifier)(VarAssignment)(SourceCode))
}
phx::function<error_handler_f> handler;
phx::function<annotation_f<Iterator>> annotate;
qi::rule<Iterator, ast::SourceCode(), Skipper> SourceCode;
qi::rule<Iterator, ast::VarAssignment(), Skipper> VarAssignment;
qi::rule<Iterator, ast::Identifier()> Identifier;
// no skipper, no attributes:
qi::rule<Iterator> KeywordFunction, KeywordVar, SemiColon;
};
int main()
{
std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var Var_2 - 4; }";
pos_iterator_t first(content.begin()), iter = first, last(content.end());
ParseGrammar<pos_iterator_t> resolver(first); // Our parser
ast::SourceCode program;
bool ok = phrase_parse(iter,
last,
resolver,
qi::space,
program);
std::cout << std::boolalpha;
std::cout << "ok : " << ok << std::endl;
std::cout << "full: " << (iter == last) << std::endl;
if(ok && iter == last)
{
std::cout << "OK: Parsing fully succeeded\n\n";
std::cout << "Function name: " << program.function.name << " (see L" << program.printLoc() << ")\n";
for (auto const& va : program.assignments)
std::cout << "variable " << va.id.name << " assigned value " << va.value << " at L" << va.printLoc() << "\n";
}
else
{
int line = get_line(iter);
int column = get_column(first, iter);
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed or not complete\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "remaining: '" << std::string(iter, last) << "'\n";
std::cout << "-------------------------\n";
}
return 0;
}
[1] sadly un(der)documented, except for the conjure sample(s)
[2] well, I used as_string to get proper assignment to Identifier without too much work
[3] There could be smarter ways about this in terms of performance, but for now, let's keep it simple

Boost::spirit (classic) primitives vs custom parsers

I'm a beginner with Boost::spirit and I want to define a grammar that parses the TTCN language.
(http://www.trex.informatik.uni-goettingen.de/trac/wiki/ttcn-3_4.5.1)
I'm trying to define rules for 'primitive' parsers like Alpha and AlphaNum that are faithful 1:1 to the original grammar, but obviously I'm doing something wrong, because a grammar defined this way does not work.
But when I use the primitive parsers in place of TTCN's, it starts to work.
Can someone tell me why the 'manually' defined rules do not work as expected?
How can I fix it? I would like to stick close to the original grammar.
Is it a beginner's code bug or something different?
#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/classic_symbols.hpp>
#include <boost/spirit/include/classic_tree_to_xml.hpp>
#include <boost/spirit/include/classic_position_iterator.hpp>
#include <boost/spirit/include/classic_core.hpp>
#include <boost/spirit/include/classic_parse_tree.hpp>
#include <boost/spirit/include/classic_ast.hpp>
#include <iostream>
#include <string>
#include <boost/spirit/home/classic/debug.hpp>
using namespace boost::spirit::classic;
using namespace std;
using namespace BOOST_SPIRIT_CLASSIC_NS;
typedef node_iter_data_factory<int> factory_t;
typedef position_iterator<std::string::iterator> pos_iterator_t;
typedef tree_match<pos_iterator_t, factory_t> parse_tree_match_t;
typedef parse_tree_match_t::const_tree_iterator iter_t;
struct ParseGrammar: public grammar<ParseGrammar>
{
template<typename ScannerT>
struct definition
{
definition(ParseGrammar const &)
{
KeywordImport = str_p("import");
KeywordAll = str_p("all");
SemiColon = ch_p(';');
Underscore = ch_p('_');
NonZeroNum = range_p('1','9');
Num = ch_p('0') | NonZeroNum;
UpperAlpha = range_p('A', 'Z');
LowerAlpha = range_p('a', 'z');
Alpha = UpperAlpha | LowerAlpha;
AlphaNum = Alpha | Num;
// this does not work!
Identifier = lexeme_d[Alpha >> *(AlphaNum | Underscore)];
// Uncomment below line to make rule work
// Identifier = lexeme_d[alpha_p >> *(alnum_p | Underscore)];
Module = KeywordImport >> Identifier >> KeywordAll >> SemiColon;
BOOST_SPIRIT_DEBUG_NODE(Module);
BOOST_SPIRIT_DEBUG_NODE(KeywordImport);
BOOST_SPIRIT_DEBUG_NODE(KeywordAll);
BOOST_SPIRIT_DEBUG_NODE(Identifier);
BOOST_SPIRIT_DEBUG_NODE(SemiColon);
}
rule<ScannerT> KeywordImport,KeywordAll,Module,Identifier,SemiColon;
rule<ScannerT> Alpha,UpperAlpha,LowerAlpha,Underscore,Num,AlphaNum;
rule<ScannerT> NonZeroNum;
rule<ScannerT> const&
start() const { return Module; }
};
};
int main()
{
ParseGrammar resolver; // Our parser
BOOST_SPIRIT_DEBUG_NODE(resolver);
string content = "import foobar all;";
pos_iterator_t pos_begin(content.begin(), content.end());
pos_iterator_t pos_end;
tree_parse_info<pos_iterator_t, factory_t> info;
info = ast_parse<factory_t>(pos_begin, pos_end, resolver, space_p);
std::cout << "\ninfo.length : " << info.length << std::endl;
std::cout << "info.full : " << info.full << std::endl;
if(info.full)
{
std::cout << "OK: Parsing succeeded\n\n";
}
else
{
int line = info.stop.get_position().line;
int column = info.stop.get_position().column;
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "-------------------------\n";
}
return 0;
}
I don't do Spirit Classic (which has been deprecated for some years now).
I can only assume you've mixed something up with skippers. Here's the thing translated into Spirit V2:
#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
namespace qi = boost::spirit::qi;
typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;
template <typename Iterator = pos_iterator_t, typename Skipper = qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, Skipper>
{
ParseGrammar() : ParseGrammar::base_type(Module)
{
using namespace qi;
KeywordImport = lit("import");
KeywordAll = lit("all");
SemiColon = lit(';');
#if 1
// this rule obviously works
Identifier = lexeme [alpha >> *(alnum | '_')];
#else
// this does too, but less efficiently
Underscore = lit('_');
NonZeroNum = char_('1','9');
Num = char_('0') | NonZeroNum;
UpperAlpha = char_('A', 'Z');
LowerAlpha = char_('a', 'z');
Alpha = UpperAlpha | LowerAlpha;
AlphaNum = Alpha | Num;
Identifier = lexeme [Alpha >> *(AlphaNum | Underscore)];
#endif
Module = KeywordImport >> Identifier >> KeywordAll >> SemiColon;
BOOST_SPIRIT_DEBUG_NODES((Module)(KeywordImport)(KeywordAll)(Identifier)(SemiColon))
}
qi::rule<Iterator, Skipper> Module;
qi::rule<Iterator> KeywordImport,KeywordAll,Identifier,SemiColon;
qi::rule<Iterator> Alpha,UpperAlpha,LowerAlpha,Underscore,Num,AlphaNum;
qi::rule<Iterator> NonZeroNum;
};
int main()
{
std::string const content = "import \r\n\r\nfoobar\r\n\r\n all; bogus";
pos_iterator_t first(content.begin()), iter=first, last(content.end());
ParseGrammar<pos_iterator_t> resolver; // Our parser
bool ok = phrase_parse(iter, last, resolver, qi::space);
std::cout << std::boolalpha;
std::cout << "\nok : " << ok << std::endl;
std::cout << "full : " << (iter == last) << std::endl;
if(ok && iter==last)
{
std::cout << "OK: Parsing fully succeeded\n\n";
}
else
{
int line = get_line(iter);
int column = get_column(first, iter);
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed or not complete\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "remaining: '" << std::string(iter, last) << "'\n";
std::cout << "-------------------------\n";
}
return 0;
}
I've added a little "bogus" at the end of input, so the output becomes a nicer demonstration:
<Module>
<try>import \r\n\r\nfoobar\r\n\r</try>
<KeywordImport>
<try>import \r\n\r\nfoobar\r\n\r</try>
<success> \r\n\r\nfoobar\r\n\r\n all;</success>
<attributes>[]</attributes>
</KeywordImport>
<Identifier>
<try>foobar\r\n\r\n all; bogu</try>
<success>\r\n\r\n all; bogus</success>
<attributes>[]</attributes>
</Identifier>
<KeywordAll>
<try>all; bogus</try>
<success>; bogus</success>
<attributes>[]</attributes>
</KeywordAll>
<SemiColon>
<try>; bogus</try>
<success> bogus</success>
<attributes>[]</attributes>
</SemiColon>
<success> bogus</success>
<attributes>[]</attributes>
</Module>
ok : true
full : false
-------------------------
ERROR: Parsing failed or not complete
stopped at: 3:8
remaining: 'bogus'
-------------------------
That all said, this is what I'd probably reduce it to:
template <typename Iterator, typename Skipper = qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, Skipper>
{
ParseGrammar() : ParseGrammar::base_type(Module)
{
using namespace qi;
Identifier = alpha >> *(alnum | '_');
Module = "import" >> Identifier >> "all" >> ';';
BOOST_SPIRIT_DEBUG_NODES((Module)(Identifier))
}
qi::rule<Iterator, Skipper> Module;
qi::rule<Iterator> Identifier;
};
As you can see, the Identifier rule is implicitly a lexeme because it isn't declared to use a skipper.
See it Live on Coliru

Can Boost Spirit Rules be parameterized

In my Boost Spirit grammar I would like to have a rule that does this:
rule<...> noCaseLit = no_case[ lit( "KEYWORD" ) ];
but for a custom keyword so that I can do this:
... >> noCaseLit( "SomeSpecialKeyword" ) >> ... >> noCaseLit( "OtherSpecialKeyword1" )
Is this possible with Boost Spirit rules and if so how?
P.S. I use the case-insensitive thing as an example; what I'm after is rule parameterization in general.
Edits:
Through the link provided by 'sehe' in the comments I was able to come close to what I wanted but I'm not quite there yet.
/* Defining the noCaseLit rule */
rule<Iterator, string(string)> noCaseLit = no_case[lit(_r1)];
/* Using the noCaseLit rule */
rule<...> someRule = ... >> noCaseLit(phx::val("SomeSpecialKeyword")) >> ...
I haven't yet figured out a way to automatically convert the literal string to the Phoenix value so that I can use the rule like this:
rule<...> someRule = ... >> noCaseLit("SomeSpecialKeyword") >> ...
The easiest way is to simply create a function that returns your rule/parser. In the example near the end of this page you can find a way to declare the return value of your function. (The same here in a commented example).
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace ascii = boost::spirit::ascii;
namespace qi = boost::spirit::qi;
typedef boost::proto::result_of::deep_copy<
BOOST_TYPEOF(ascii::no_case[qi::lit(std::string())])
>::type nocaselit_return_type;
nocaselit_return_type nocaselit(const std::string& keyword)
{
return boost::proto::deep_copy(ascii::no_case[qi::lit(keyword)]);
}
//C++11 VERSION EASIER TO MODIFY (AND DOESN'T REQUIRE THE TYPEDEF)
//auto nocaselit(const std::string& keyword) -> decltype(boost::proto::deep_copy(ascii::no_case[qi::lit(keyword)]))
//{
// return boost::proto::deep_copy(ascii::no_case[qi::lit(keyword)]);
//}
int main()
{
std::string test1="MyKeYWoRD";
std::string::const_iterator iter=test1.begin();
std::string::const_iterator end=test1.end();
if(qi::parse(iter,end,nocaselit("mYkEywOrd"))&& (iter==end))
std::cout << "Parse 1 Successful" << std::endl;
else
std::cout << "Parse 2 Failed. Remaining: " << std::string(iter,end) << std::endl;
qi::rule<std::string::const_iterator,ascii::space_type> myrule =
*(
( nocaselit("double") >> ':' >> qi::double_ )
| ( nocaselit("keyword") >> '-' >> *(qi::char_ - '.') >> '.')
);
std::string test2=" DOUBLE : 3.5 KEYWORD-whatever.Double :2.5";
iter=test2.begin();
end=test2.end();
if(qi::phrase_parse(iter,end,myrule,ascii::space)&& (iter==end))
std::cout << "Parse 2 Successful" << std::endl;
else
std::cout << "Parse 2 Failed. Remaining: " << std::string(iter,end) << std::endl;
return 0;
}

How to pass the iterator to a function in spirit qi

template <typename Iterator>
struct parse_grammar
: qi::grammar<Iterator, std::string()>
{
parse_grammar()
: parse_grammar::base_type(start_p, "start_p"){
a_p = ',' > qi::double_;
b_p = *a_p;
start_p = qi::double_ > b_p >> qi::eoi;
}
qi::rule<Iterator, std::string()> a_p;
qi::rule<Iterator, std::string()> b_p;
qi::rule<Iterator, std::string()> start_p;
};
// implementation
std::vector<double> parse(std::istream& input, const std::string& filename)
{
// iterate over stream input
typedef std::istreambuf_iterator<char> base_iterator_type;
base_iterator_type in_begin(input);
// convert input iterator to forward iterator, usable by spirit parser
typedef boost::spirit::multi_pass<base_iterator_type> forward_iterator_type;
forward_iterator_type fwd_begin = boost::spirit::make_default_multi_pass(in_begin);
forward_iterator_type fwd_end;
// prepare output
std::vector<double> output;
// wrap forward iterator with position iterator, to record the position
typedef classic::position_iterator2<forward_iterator_type> pos_iterator_type;
pos_iterator_type position_begin(fwd_begin, fwd_end, filename);
pos_iterator_type position_end;
parse_grammar<pos_iterator_type> gram;
// parse
try
{
qi::phrase_parse(
position_begin, position_end, // iterators over input
gram, // recognize list of doubles
ascii::space); // comment skipper
}
catch(const qi::expectation_failure<pos_iterator_type>& e)
{
const classic::file_position_base<std::string>& pos = e.first.get_position();
std::stringstream msg;
msg <<
"parse error at file " << pos.file <<
" line " << pos.line << " column " << pos.column << std::endl <<
"'" << e.first.get_currentline() << "'" << std::endl <<
" " << "^- here";
throw std::runtime_error(msg.str());
}
// return result
return output;
}
I have the above sample code (code used from the boost-spirit website, for example here).
In the rule a_p of the grammar, I want to use a semantic action to call a method and pass the iterator to it, something like below:
a_p = ',' > qi::double_[boost::bind(&parse_grammar::doStuff, this,
boost::ref(position_begin), boost::ref(position_end))];
and if the signature of the method doStuff is like this:
void doStuff(pos_iterator_type const& first, pos_iterator_type const& last);
Any ideas how to do this?
I do not mind which way (if I can do it using boost::phoenix or something, I'm not sure how), as long as the iterators are passed to the method with their current state.
I'm not completely sure why you think you 'need' what you describe. I'm afraid the solution to your actual task might be very simple:
start_p = qi::double_ % ',' > qi::eoi;
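For reference, here is a minimal standalone sketch of what that one-liner does (my addition, not part of the original answer; the test string is borrowed from the test main shown further down): double_ % ',' parses a comma-separated list of doubles straight into a std::vector<double>.
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
int main()
{
    std::string const input = "1, -3.4 ,3.1415926";
    auto first = input.begin(), last = input.end();
    std::vector<double> values;
    // the suggested rule body, used inline; qi::space eats the blanks
    bool ok = qi::phrase_parse(first, last,
                               qi::double_ % ',' > qi::eoi,
                               qi::space, values);
    std::cout << std::boolalpha << "ok: " << ok
              << ", parsed " << values.size() << " values\n";   // ok: true, parsed 3 values
}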
However, since the actual question is quite interesting, and the use of position iterators in combination with std::istreambuf_iterator (rather than just the usual (slower) boost::spirit::istream_iterator) has its merit, I'll show you how to do it with the semantic action as well.
For a simple (but rather complete) test main of
int main()
{
std::istringstream iss(
"1, -3.4 ,3.1415926\n"
",+inF,-NaN ,\n"
"2,-.4,4.14e7\n");
data_t parsed = parse(iss, "<inline-test>");
std::cout << "Done, parsed " << parsed.size() << " values ("
<< "min: " << *std::min_element(parsed.begin(), parsed.end()) << ", "
<< "max: " << *std::max_element(parsed.begin(), parsed.end()) << ")\n";
}
The output with the semantic action now becomes:
debug ('start_p') at <inline-test>:1:[1..2] '1' = 1
debug ('start_p') at <inline-test>:1:[4..8] '-3.4' = -3.4
debug ('start_p') at <inline-test>:1:[10..19] '3.1415926' = 3.14159
debug ('start_p') at <inline-test>:2:[2..6] '+inF' = inf
debug ('start_p') at <inline-test>:2:[7..11] '-NaN' = -nan
debug ('start_p') at <inline-test>:3:[1..2] '2' = 2
debug ('start_p') at <inline-test>:3:[3..6] '-.4' = -0.4
debug ('start_p') at <inline-test>:3:[7..13] '4.14e7' = 4.14e+07
Done, parsed 8 values (min: -3.4, max: inf)
See it live at http://liveworkspace.org/code/8a874ef3...
Note how it
demonstrates access to the name of the actual parser instance ('start_p')
demonstrates access to the full source iterator range
shows how to do specialized processing inside the semantic action
I still suggest using qi::double_ to parse the raw input, because it is the only thing I know that easily handles all cases (see test data and this other question: Is it possible to read infinity or NaN values using input streams?)
demonstrates parsing the actual data into the vector efficiently by displaying statistics of the parsed values
Full Code
Here is the full code for future reference:
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_multi_pass.hpp>
#include <boost/spirit/include/classic_position_iterator.hpp>
#include <boost/phoenix/function/adapt_function.hpp>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
namespace classic = boost::spirit::classic;
namespace ascii = boost::spirit::ascii;
typedef std::vector<double> data_t;
///////// USING A FREE FUNCTION
//
template <typename Grammar, typename Range>
double doStuff_(Grammar &grammar, Range pos_range)
{
// for efficiency, cache adhoc grammar:
static const qi::rule <typename Range::iterator, double()> r_double = qi::double_;
static const qi::grammar<typename Range::iterator, double()> g_double(r_double); // caching just the rule may be enough, actually
double value = 0;
qi::parse(pos_range.begin(), pos_range.end(), g_double, value);
std::cout << "debug ('" << grammar.name() << "') at "
<< pos_range.begin().get_position().file << ":"
<< pos_range.begin().get_position().line << ":["
<< pos_range.begin().get_position().column << ".."
<< pos_range.end ().get_position().column << "]\t"
<< "'" << std::string(pos_range.begin(),pos_range.end()) << "'\t = "
<< value
<< '\n';
return value;
}
BOOST_PHOENIX_ADAPT_FUNCTION(double, doStuff, doStuff_, 2)
template <typename Iterator, typename Skipper>
struct parse_grammar : qi::grammar<Iterator, data_t(), Skipper>
{
parse_grammar()
: parse_grammar::base_type(start_p, "start_p")
{
using qi::raw;
using qi::double_;
using qi::_1;
using qi::_val;
using qi::eoi;
using phx::push_back;
value_p = raw [ double_ ] [ _val = doStuff(phx::ref(*this), _1) ];
start_p = value_p % ',' > eoi;
// // To use without the semantic action (more efficient):
// start_p = double_ % ',' >> eoi;
}
qi::rule<Iterator, data_t::value_type(), Skipper> value_p;
qi::rule<Iterator, data_t(), Skipper> start_p;
};
// implementation
data_t parse(std::istream& input, const std::string& filename)
{
// iterate over stream input
typedef std::istreambuf_iterator<char> base_iterator_type;
base_iterator_type in_begin(input);
// convert input iterator to forward iterator, usable by spirit parser
typedef boost::spirit::multi_pass<base_iterator_type> forward_iterator_type;
forward_iterator_type fwd_begin = boost::spirit::make_default_multi_pass(in_begin);
forward_iterator_type fwd_end;
// wrap forward iterator with position iterator, to record the position
typedef classic::position_iterator2<forward_iterator_type> pos_iterator_type;
pos_iterator_type position_begin(fwd_begin, fwd_end, filename);
pos_iterator_type position_end;
parse_grammar<pos_iterator_type, ascii::space_type> gram;
data_t output;
// parse
try
{
if (!qi::phrase_parse(
position_begin, position_end, // iterators over input
gram, // recognize list of doubles
ascii::space, // comment skipper
output) // <-- attribute reference
)
{
std::cerr << "Parse failed at "
<< position_begin.get_position().file << ":"
<< position_begin.get_position().line << ":"
<< position_begin.get_position().column << "\n";
}
}
catch(const qi::expectation_failure<pos_iterator_type>& e)
{
const classic::file_position_base<std::string>& pos = e.first.get_position();
std::stringstream msg;
msg << "parse error at file " << pos.file
<< " line " << pos.line
<< " column " << pos.column
<< "\n\t'" << e.first.get_currentline()
<< "'\n\t " << std::string(pos.column, ' ') << "^-- here";
throw std::runtime_error(msg.str());
}
return output;
}
int main()
{
std::istringstream iss(
"1, -3.4 ,3.1415926\n"
",+inF,-NaN ,\n"
"2,-.4,4.14e7\n");
data_t parsed = parse(iss, "<inline-test>");
std::cout << "Done, parsed " << parsed.size() << " values ("
<< "min: " << *std::min_element(parsed.begin(), parsed.end()) << ", "
<< "max: " << *std::max_element(parsed.begin(), parsed.end()) << ")\n";
}