Attempting to parse SQL like statement with Boost-Spirit - c++

I'm a newbie in boost::spirit. I wrote the program to parse a SQL statement, like "select * from table where conditions". It compile failed. A large of template errors reported. So, would somebody help me?
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
struct db_select {
void exec() {}
std::string filed;
std::string table;
std::string condition;
};
std::ostream& operator<<(std::ostream& os, const db_select& se) {
return os << "filed: " << se.filed << " table: " << se.table << " condition: " << se.condition;
}
template <class Iterator>
struct selecter : qi::grammar<Iterator, db_select (), ascii::space_type> {
selecter() : selecter::base_type(se) {
se %= "select" >> +qi::char_ << "from" << +qi::char_ << "where" << +qi::char_;
}
qi::rule<Iterator, db_select (), ascii::space_type> se;
};
int main(int argc, char* argv[]) {
if (argc < 2)
return -1;
std::string str(argv[1]);
const char* first = str.c_str();
const char* last = &str[str.size()];
selecter<const char*> se;
db_select rst;
bool r = qi::phrase_parse(first, last, se, ascii::space, rst);
if (!r || first != last) {
std::cout << "parse failed, at: " << std::string(first, last) << std::endl;
return -1;
} else
std::cout << "success, " << rst << std::endl;
return 0;
}

Edit Finally behind a computer, revised answer:
There were three things to note
1. Syntax errors
The parser expression contained errors (<< instead of >> as intended). This accounts for a lot of compile errors. Note the appearance of ******* in the compiler error:
/.../qi/nonterminal/rule.hpp|176 col 13| error: no matching function for call to ‘assertion_failed(mpl_::failed************
which is designed to lead you to the corresponding comment in the source code:
// Report invalid expression error as early as possible.
// If you got an error_invalid_expression error message here,
// then the expression (expr) is not a valid spirit qi expression.
BOOST_SPIRIT_ASSERT_MATCH(qi::domain, Expr);
Easily fixed. One down, two to go!
2. Adapting the struct
In order to assign to your datatype as an rule attribute, you need to make it fusion compatible. The most convenient way:
BOOST_FUSION_ADAPT_STRUCT(db_select,
(std::string,field)(std::string,table)(std::string,condition));
Now the code compiles. But the parsing fails. One more to go:
3. Lexemes
Further more you will need to take measures to avoid 'eating' your query keywords with the +qi::char_ expressions.
As a basis, consider writing it something like
lexeme [ (!lit("where") >> +qi::graph) % +qi::space ]
lexeme inhibits the skipper for the enclosed expression
! Asserts that the specified keyword must not match
Lastly, look at the docs for qi::no_case to do case-insensitive matching.
FULL CODE
#include <iostream>
#include <string>
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
struct db_select {
void exec() {}
std::string field;
std::string table;
std::string condition;
};
BOOST_FUSION_ADAPT_STRUCT(db_select,(std::string,field)(std::string,table)(std::string,condition));
std::ostream& operator<<(std::ostream& os, const db_select& se) {
return os << "field: " << se.field << " table: " << se.table << " condition: " << se.condition;
}
template <class Iterator>
struct selecter : qi::grammar<Iterator, db_select (), qi::space_type> {
selecter() : selecter::base_type(se) {
using namespace qi;
se %= "select"
>> lexeme [ (!lit("from") >> +graph) % +space ] >> "from"
>> lexeme [ (!lit("where") >> +graph) % +space ] >> "where"
>> +qi::char_;
}
qi::rule<Iterator, db_select (), qi::space_type> se;
};
int main(int argc, char* argv[]) {
if (argc < 2)
return -1;
std::string str(argv[1]);
const char* first = str.c_str();
const char* last = &str[str.size()];
selecter<const char*> se;
db_select rst;
bool r = qi::phrase_parse(first, last, se, qi::space, rst);
if (!r || first != last) {
std::cout << "parse failed, at: " << std::string(first, last) << std::endl;
return -1;
} else
std::cout << "success, " << rst << std::endl;
return 0;
}
Test:
g++ test.cpp -o test
./test "select aap, noot, mies from table where field = 'value'"
Output:
success, field: aap,noot,mies table: table condition: field='value'

Related

Boost Spirit Qi: Skipper parser does not skip under certain conditions

I am currently implementing a parser which succeeds on the "strongest" match for spirit::qi. There are meaningful applications for such a thing. E.g matching references to either simple refs (eg "willy") or namespace qualified refs (eg. "willy::anton"). That's not my actual real world case but it is almost self-explanatory, I guess. At least it helped me to track down the issue.
I found a solution for that. It works perfectly, when the skipper parser is not involved (i.e. there is nothing to skip). It does not work as expected if there are areas which need skipping.
I believe, I tracked down the problem. It seems like under certain conditions spaces are actually not skipped allthough they should be.
Below is find a self-contained very working example. It loops over some rules and some input to provide enough information. If you run it with BOOST_SPIRIT_DEBUG enabled, you get in particular the output:
<qualifier>
<try> :: anton</try>
<fail/>
</qualifier>
I think, this one should not have failed. Am I right guessing so? Does anyone know a way to get around that? Or is it just my poor understanding of qi semantics? Thank you very much for your time. :)
My environment: MSVC 2015 latest, target win32 console
#define BOOST_SPIRIT_DEBUG
#include <io.h>
#include<map>
#include <boost/spirit/include/qi.hpp>
typedef std::string::const_iterator iterator_type;
namespace qi = boost::spirit::qi;
using map_type = std::map<std::string, qi::rule<iterator_type, std::string()>&>;
namespace maxence { namespace parser {
template <typename Iterator>
struct ident : qi::grammar<Iterator, std::string()>
{
ident();
qi::rule<Iterator, std::string()>
id, id_raw;
qi::rule<Iterator, std::string()>
not_used,
qualifier,
qualified_id, simple_id,
id_reference, id_reference_final;
map_type rules = {
{ "E1", id },
{ "E2", id_raw}
};
};
template <typename Iterator>
// we actually don't need the start rule (see below)
ident<Iterator>::ident() : ident::base_type(not_used)
{
id_reference = (!simple_id >> qualified_id) | (!qualified_id >> simple_id);
id_reference_final = id_reference;
///////////////////////////////////////////////////
// standard simple id (not followed by
// delimiter "::")
simple_id = (qi::alpha | '_') >> *(qi::alnum | '_') >> !qi::lit("::");
///////////////////////////////////////////////////
// this is qualifier <- "::" simple_id
// I repeat the simple_id pattern here to make sure
// this demo has no "early match" issues
qualifier = qi::string("::") > (qi::alpha | '_') >> *(qi::alnum | '_');
///////////////////////////////////////////////////
// this is: qualified_id <- simple_id qualifier*
qualified_id = (qi::alpha | '_') >> *(qi::alnum | '_') >> +(qualifier) >> !qi::lit("::");
id = id_reference_final;
id_raw = qi::raw[id_reference_final];
BOOST_SPIRIT_DEBUG_NODES(
(id)
(id_raw)
(qualifier)
(qualified_id)
(simple_id)
(id_reference)
(id_reference_final)
)
}
}}
int main()
{
maxence::parser::ident<iterator_type> ident;
using ss_map_type = std::map<std::string, std::string>;
ss_map_type parser_input =
{
{ "Simple id (behaves ok)", "willy" },
{ "Qualified id (behaves ok)", "willy::anton" },
{ "Skipper involved (unexpected)", "willy :: anton" }
};
for (ss_map_type::const_iterator input = parser_input.begin(); input != parser_input.end(); input++) {
for (map_type::const_iterator example = ident.rules.begin(); example != ident.rules.end(); example++) {
std::string to_parse = input->second;
std::string result;
std::string parser_name = (example->second).name();
std::cout << "--------------------------------------------" << std::endl;
std::cout << "Description: " << input->first << std::endl;
std::cout << "Parser [" << parser_name << "] parsing [" << to_parse << "]" << std::endl;
auto b(to_parse.begin()), e(to_parse.end());
// --- test for parser success
bool success = qi::phrase_parse(b, e, (example)->second, qi::space, result);
if (success) std::cout << "Parser succeeded. Result: " << result << std::endl;
else std::cout << " Parser failed. " << std::endl;
//--- test for EOI
if (b == e) {
std::cout << "EOI reached.";
if (success) std::cout << " The sun is shining brightly. :)";
} else {
std::cout << "Failure: EOI not reached. Remaining: [";
while (b != e) std::cout << *b++; std::cout << "]";
}
std::cout << std::endl << "--------------------------------------------" << std::endl;
}
}
return 0;
}

boost::spirit access position iterator from semantic actions

Lets say I have code like this (line numbers for reference):
1:
2:function FuncName_1 {
3: var Var_1 = 3;
4: var Var_2 = 4;
5: ...
I want to write a grammar that parses such text, puts all indentifiers (function and variable names) infos into a tree (utree?).
Each node should preserve: line_num, column_num and symbol value. example:
root: FuncName_1 (line:2,col:10)
children[0]: Var_1 (line:3, col:8)
children[1]: Var_1 (line:4, col:9)
I want to put it into the tree because I plan to traverse through that tree and for each node I must know the 'context': (all parent nodes of current nodes).
E.g, while processing node with Var_1, I must know that this is a name for local variable for function FuncName_1 (that is currently being processed as node, but one level earlier)
I cannot figure out few things
Can this be done in Spirit with semantic actions and utree's ? Or should I use variant<> trees ?
How to pass to the node those three informations (column,line,symbol_name) at the same time ? I know I must use pos_iterator as iterator type for grammar but how to access those information in sematic action ?
I'm a newbie in Boost so I read the Spirit documentaiton over and over, I try to google my problems but I somehow cannot put all the pieces together ot find the solution. Seems like there was no one me with such use case like mine before (or I'm just not able to find it)
Looks like the only solutions with position iterator are the ones with parsing error handling, but this is not the case I'm interested in.
The code that only parses the code I was taking about is below but I dont know how to move forward with it.
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
namespace qi = boost::spirit::qi;
typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;
template<typename Iterator=pos_iterator_t, typename Skipper=qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, Skipper>
{
ParseGrammar():ParseGrammar::base_type(SourceCode)
{
using namespace qi;
KeywordFunction = lit("function");
KeywordVar = lit("var");
SemiColon = lit(';');
Identifier = lexeme [alpha >> *(alnum | '_')];
VarAssignemnt = KeywordVar >> Identifier >> char_('=') >> int_ >> SemiColon;
SourceCode = KeywordFunction >> Identifier >> '{' >> *VarAssignemnt >> '}';
}
qi::rule<Iterator, Skipper> SourceCode;
qi::rule<Iterator > KeywordFunction;
qi::rule<Iterator, Skipper> VarAssignemnt;
qi::rule<Iterator> KeywordVar;
qi::rule<Iterator> SemiColon;
qi::rule<Iterator > Identifier;
};
int main()
{
std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var Var_2 = 4; }";
pos_iterator_t first(content.begin()), iter = first, last(content.end());
ParseGrammar<pos_iterator_t> resolver; // Our parser
bool ok = phrase_parse(iter,
last,
resolver,
qi::space);
std::cout << std::boolalpha;
std::cout << "\nok : " << ok << std::endl;
std::cout << "full : " << (iter == last) << std::endl;
if(ok && iter == last)
{
std::cout << "OK: Parsing fully succeeded\n\n";
}
else
{
int line = get_line(iter);
int column = get_column(first, iter);
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed or not complete\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "remaining: '" << std::string(iter, last) << "'\n";
std::cout << "-------------------------\n";
}
return 0;
}
This has been a fun exercise, where I finally put together a working demo of on_success[1] to annotate AST nodes.
Let's assume we want an AST like:
namespace ast
{
struct LocationInfo {
unsigned line, column, length;
};
struct Identifier : LocationInfo {
std::string name;
};
struct VarAssignment : LocationInfo {
Identifier id;
int value;
};
struct SourceCode : LocationInfo {
Identifier function;
std::vector<VarAssignment> assignments;
};
}
I know, 'location information' is probably overkill for the SourceCode node, but you know... Anyways, to make it easy to assign attributes to these nodes without requiring semantic actions or lots of specifically crafted constructors:
#include <boost/fusion/adapted/struct.hpp>
BOOST_FUSION_ADAPT_STRUCT(ast::Identifier, (std::string, name))
BOOST_FUSION_ADAPT_STRUCT(ast::VarAssignment, (ast::Identifier, id)(int, value))
BOOST_FUSION_ADAPT_STRUCT(ast::SourceCode, (ast::Identifier, function)(std::vector<ast::VarAssignment>, assignments))
There. Now we can declare the rules to expose these attributes:
qi::rule<Iterator, ast::SourceCode(), Skipper> SourceCode;
qi::rule<Iterator, ast::VarAssignment(), Skipper> VarAssignment;
qi::rule<Iterator, ast::Identifier()> Identifier;
// no skipper, no attributes:
qi::rule<Iterator> KeywordFunction, KeywordVar, SemiColon;
We don't (essentially) modify the grammar, at all: attribute propagation is "just automatic"[2] :
KeywordFunction = lit("function");
KeywordVar = lit("var");
SemiColon = lit(';');
Identifier = as_string [ alpha >> *(alnum | char_("_")) ];
VarAssignment = KeywordVar >> Identifier >> '=' >> int_ >> SemiColon;
SourceCode = KeywordFunction >> Identifier >> '{' >> *VarAssignment >> '}';
The magic
How do we get the source location information attached to our nodes?
auto set_location_info = annotate(_val, _1, _3);
on_success(Identifier, set_location_info);
on_success(VarAssignment, set_location_info);
on_success(SourceCode, set_location_info);
Now, annotate is just a lazy version of a calleable that is defined as:
template<typename It>
struct annotation_f {
typedef void result_type;
annotation_f(It first) : first(first) {}
It const first;
template<typename Val, typename First, typename Last>
void operator()(Val& v, First f, Last l) const {
do_annotate(v, f, l, first);
}
private:
void static do_annotate(ast::LocationInfo& li, It f, It l, It first) {
using std::distance;
li.line = get_line(f);
li.column = get_column(first, f);
li.length = distance(f, l);
}
static void do_annotate(...) { }
};
Due to way in which get_column works, the functor is stateful (as it remembers the start iterator)[3]. As you can see do_annotate just accepts anything that derives from LocationInfo.
Now, the proof of the pudding:
std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var Var_2 = 4; }";
pos_iterator_t first(content.begin()), iter = first, last(content.end());
ParseGrammar<pos_iterator_t> resolver(first); // Our parser
ast::SourceCode program;
bool ok = phrase_parse(iter,
last,
resolver,
qi::space,
program);
std::cout << std::boolalpha;
std::cout << "ok : " << ok << std::endl;
std::cout << "full: " << (iter == last) << std::endl;
if(ok && iter == last)
{
std::cout << "OK: Parsing fully succeeded\n\n";
std::cout << "Function name: " << program.function.name << " (see L" << program.printLoc() << ")\n";
for (auto const& va : program.assignments)
std::cout << "variable " << va.id.name << " assigned value " << va.value << " at L" << va.printLoc() << "\n";
}
else
{
int line = get_line(iter);
int column = get_column(first, iter);
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed or not complete\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "remaining: '" << std::string(iter, last) << "'\n";
std::cout << "-------------------------\n";
}
This prints:
ok : true
full: true
OK: Parsing fully succeeded
Function name: FuncName_1 (see L1:1:56)
variable Var_1 assigned value 3 at L2:3:14
variable Var_2 assigned value 4 at L3:3:15
Full Demo Program
See it Live On Coliru
Also showing:
error handling, e.g.:
Error: expecting "=" in line 3:
var Var_2 - 4; }
^---- here
ok : false
full: false
-------------------------
ERROR: Parsing failed or not complete
stopped at: 1:1
remaining: 'function FuncName_1 {
var Var_1 = 3;
var Var_2 - 4; }'
-------------------------
BOOST_SPIRIT_DEBUG macros
A bit of a hacky way to conveniently stream the LocationInfo part of any AST node, sorry :)
//#define BOOST_SPIRIT_DEBUG
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
#include <iomanip>
namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;
typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;
namespace ast
{
namespace manip { struct LocationInfoPrinter; }
struct LocationInfo {
unsigned line, column, length;
manip::LocationInfoPrinter printLoc() const;
};
struct Identifier : LocationInfo {
std::string name;
};
struct VarAssignment : LocationInfo {
Identifier id;
int value;
};
struct SourceCode : LocationInfo {
Identifier function;
std::vector<VarAssignment> assignments;
};
///////////////////////////////////////////////////////////////////////////
// Completely unnecessary tweak to get a "poor man's" io manipulator going
// so we can do `std::cout << x.printLoc()` on types of `x` deriving from
// LocationInfo
namespace manip {
struct LocationInfoPrinter {
LocationInfoPrinter(LocationInfo const& ref) : ref(ref) {}
LocationInfo const& ref;
friend std::ostream& operator<<(std::ostream& os, LocationInfoPrinter const& lip) {
return os << lip.ref.line << ':' << lip.ref.column << ':' << lip.ref.length;
}
};
}
manip::LocationInfoPrinter LocationInfo::printLoc() const { return { *this }; }
// feel free to disregard this hack
///////////////////////////////////////////////////////////////////////////
}
BOOST_FUSION_ADAPT_STRUCT(ast::Identifier, (std::string, name))
BOOST_FUSION_ADAPT_STRUCT(ast::VarAssignment, (ast::Identifier, id)(int, value))
BOOST_FUSION_ADAPT_STRUCT(ast::SourceCode, (ast::Identifier, function)(std::vector<ast::VarAssignment>, assignments))
struct error_handler_f {
typedef qi::error_handler_result result_type;
template<typename T1, typename T2, typename T3, typename T4>
qi::error_handler_result operator()(T1 b, T2 e, T3 where, T4 const& what) const {
std::cerr << "Error: expecting " << what << " in line " << get_line(where) << ": \n"
<< std::string(b,e) << "\n"
<< std::setw(std::distance(b, where)) << '^' << "---- here\n";
return qi::fail;
}
};
template<typename It>
struct annotation_f {
typedef void result_type;
annotation_f(It first) : first(first) {}
It const first;
template<typename Val, typename First, typename Last>
void operator()(Val& v, First f, Last l) const {
do_annotate(v, f, l, first);
}
private:
void static do_annotate(ast::LocationInfo& li, It f, It l, It first) {
using std::distance;
li.line = get_line(f);
li.column = get_column(first, f);
li.length = distance(f, l);
}
static void do_annotate(...) {}
};
template<typename Iterator=pos_iterator_t, typename Skipper=qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, ast::SourceCode(), Skipper>
{
ParseGrammar(Iterator first) :
ParseGrammar::base_type(SourceCode),
annotate(first)
{
using namespace qi;
KeywordFunction = lit("function");
KeywordVar = lit("var");
SemiColon = lit(';');
Identifier = as_string [ alpha >> *(alnum | char_("_")) ];
VarAssignment = KeywordVar > Identifier > '=' > int_ > SemiColon; // note: expectation points
SourceCode = KeywordFunction >> Identifier >> '{' >> *VarAssignment >> '}';
on_error<fail>(VarAssignment, handler(_1, _2, _3, _4));
on_error<fail>(SourceCode, handler(_1, _2, _3, _4));
auto set_location_info = annotate(_val, _1, _3);
on_success(Identifier, set_location_info);
on_success(VarAssignment, set_location_info);
on_success(SourceCode, set_location_info);
BOOST_SPIRIT_DEBUG_NODES((KeywordFunction)(KeywordVar)(SemiColon)(Identifier)(VarAssignment)(SourceCode))
}
phx::function<error_handler_f> handler;
phx::function<annotation_f<Iterator>> annotate;
qi::rule<Iterator, ast::SourceCode(), Skipper> SourceCode;
qi::rule<Iterator, ast::VarAssignment(), Skipper> VarAssignment;
qi::rule<Iterator, ast::Identifier()> Identifier;
// no skipper, no attributes:
qi::rule<Iterator> KeywordFunction, KeywordVar, SemiColon;
};
int main()
{
std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var Var_2 - 4; }";
pos_iterator_t first(content.begin()), iter = first, last(content.end());
ParseGrammar<pos_iterator_t> resolver(first); // Our parser
ast::SourceCode program;
bool ok = phrase_parse(iter,
last,
resolver,
qi::space,
program);
std::cout << std::boolalpha;
std::cout << "ok : " << ok << std::endl;
std::cout << "full: " << (iter == last) << std::endl;
if(ok && iter == last)
{
std::cout << "OK: Parsing fully succeeded\n\n";
std::cout << "Function name: " << program.function.name << " (see L" << program.printLoc() << ")\n";
for (auto const& va : program.assignments)
std::cout << "variable " << va.id.name << " assigned value " << va.value << " at L" << va.printLoc() << "\n";
}
else
{
int line = get_line(iter);
int column = get_column(first, iter);
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed or not complete\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "remaining: '" << std::string(iter, last) << "'\n";
std::cout << "-------------------------\n";
}
return 0;
}
[1] sadly un(der)documented, except for the conjure sample(s)
[2] well, I used as_string to get proper assignment to Identifier without too much work
[3] There could be smarter ways about this in terms of performance, but for now, let's keep it simple

Boost::spirit (classic) primitives vs custom parsers

I'm a beginner in Boost::spirit and I want to define grammar that parses TTCN language.
(http://www.trex.informatik.uni-goettingen.de/trac/wiki/ttcn-3_4.5.1)
I'm trying to define some rules for 'primitve' parsers like Alpha, AlphaNum to be faitful 1 to 1 to original grammar but obviously I do something wrong because grammar defined this way does not work.
But when I use primite parsers in place of TTCN's it started to work.
Can someone tell why 'manually' defined rules does not work as expected ?
How to fix it, because I would like to stick close to original grammar.
Is it a begginer's code bug or something different ?
#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/classic_symbols.hpp>
#include <boost/spirit/include/classic_tree_to_xml.hpp>
#include <boost/spirit/include/classic_position_iterator.hpp>
#include <boost/spirit/include/classic_core.hpp>
#include <boost/spirit/include/classic_parse_tree.hpp>
#include <boost/spirit/include/classic_ast.hpp>
#include <iostream>
#include <string>
#include <boost/spirit/home/classic/debug.hpp>
using namespace boost::spirit::classic;
using namespace std;
using namespace BOOST_SPIRIT_CLASSIC_NS;
typedef node_iter_data_factory<int> factory_t;
typedef position_iterator<std::string::iterator> pos_iterator_t;
typedef tree_match<pos_iterator_t, factory_t> parse_tree_match_t;
typedef parse_tree_match_t::const_tree_iterator iter_t;
struct ParseGrammar: public grammar<ParseGrammar>
{
template<typename ScannerT>
struct definition
{
definition(ParseGrammar const &)
{
KeywordImport = str_p("import");
KeywordAll = str_p("all");
SemiColon = ch_p(';');
Underscore = ch_p('_');
NonZeroNum = range_p('1','9');
Num = ch_p('0') | NonZeroNum;
UpperAlpha = range_p('A', 'Z');
LowerAlpha = range_p('a', 'z');
Alpha = UpperAlpha | LowerAlpha;
AlphaNum = Alpha | Num;
//this does not!
Identifier = lexeme_d[Alpha >> *(AlphaNum | Underscore)];
// Uncomment below line to make rule work
// Identifier = lexeme_d[alpha_p >> *(alnum_p | Underscore)];
Module = KeywordImport >> Identifier >> KeywordAll >> SemiColon;
BOOST_SPIRIT_DEBUG_NODE(Module);
BOOST_SPIRIT_DEBUG_NODE(KeywordImport);
BOOST_SPIRIT_DEBUG_NODE(KeywordAll);
BOOST_SPIRIT_DEBUG_NODE(Identifier);
BOOST_SPIRIT_DEBUG_NODE(SemiColon);
}
rule<ScannerT> KeywordImport,KeywordAll,Module,Identifier,SemiColon;
rule<ScannerT> Alpha,UpperAlpha,LowerAlpha,Underscore,Num,AlphaNum;
rule<ScannerT> NonZeroNum;
rule<ScannerT> const&
start() const { return Module; }
};
};
int main()
{
ParseGrammar resolver; // Our parser
BOOST_SPIRIT_DEBUG_NODE(resolver);
string content = "import foobar all;";
pos_iterator_t pos_begin(content.begin(), content.end());
pos_iterator_t pos_end;
tree_parse_info<pos_iterator_t, factory_t> info;
info = ast_parse<factory_t>(pos_begin, pos_end, resolver, space_p);
std::cout << "\ninfo.length : " << info.length << std::endl;
std::cout << "info.full : " << info.full << std::endl;
if(info.full)
{
std::cout << "OK: Parsing succeeded\n\n";
}
else
{
int line = info.stop.get_position().line;
int column = info.stop.get_position().column;
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "-------------------------\n";
}
return 0;
}
I don't do Spirit Classic (which has been deprecated for some years now).
I can only assume you've mixed something up with skippers. Here's the thing translated into Spirit V2:
#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
namespace qi = boost::spirit::qi;
typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;
template <typename Iterator = pos_iterator_t, typename Skipper = qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, Skipper>
{
ParseGrammar() : ParseGrammar::base_type(Module)
{
using namespace qi;
KeywordImport = lit("import");
KeywordAll = lit("all");
SemiColon = lit(';');
#if 1
// this rule obviously works
Identifier = lexeme [alpha >> *(alnum | '_')];
#else
// this does too, but less efficiently
Underscore = lit('_');
NonZeroNum = char_('1','9');
Num = char_('0') | NonZeroNum;
UpperAlpha = char_('A', 'Z');
LowerAlpha = char_('a', 'z');
Alpha = UpperAlpha | LowerAlpha;
AlphaNum = Alpha | Num;
Identifier = lexeme [Alpha >> *(AlphaNum | Underscore)];
#endif
Module = KeywordImport >> Identifier >> KeywordAll >> SemiColon;
BOOST_SPIRIT_DEBUG_NODES((Module)(KeywordImport)(KeywordAll)(Identifier)(SemiColon))
}
qi::rule<Iterator, Skipper> Module;
qi::rule<Iterator> KeywordImport,KeywordAll,Identifier,SemiColon;
qi::rule<Iterator> Alpha,UpperAlpha,LowerAlpha,Underscore,Num,AlphaNum;
qi::rule<Iterator> NonZeroNum;
};
int main()
{
std::string const content = "import \r\n\r\nfoobar\r\n\r\n all; bogus";
pos_iterator_t first(content.begin()), iter=first, last(content.end());
ParseGrammar<pos_iterator_t> resolver; // Our parser
bool ok = phrase_parse(iter, last, resolver, qi::space);
std::cout << std::boolalpha;
std::cout << "\nok : " << ok << std::endl;
std::cout << "full : " << (iter == last) << std::endl;
if(ok && iter==last)
{
std::cout << "OK: Parsing fully succeeded\n\n";
}
else
{
int line = get_line(iter);
int column = get_column(first, iter);
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed or not complete\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "remaining: '" << std::string(iter, last) << "'\n";
std::cout << "-------------------------\n";
}
return 0;
}
I've added a little "bogus" at the end of input, so the output becomes a nicer demonstration:
<Module>
<try>import \r\n\r\nfoobar\r\n\r</try>
<KeywordImport>
<try>import \r\n\r\nfoobar\r\n\r</try>
<success> \r\n\r\nfoobar\r\n\r\n all;</success>
<attributes>[]</attributes>
</KeywordImport>
<Identifier>
<try>foobar\r\n\r\n all; bogu</try>
<success>\r\n\r\n all; bogus</success>
<attributes>[]</attributes>
</Identifier>
<KeywordAll>
<try>all; bogus</try>
<success>; bogus</success>
<attributes>[]</attributes>
</KeywordAll>
<SemiColon>
<try>; bogus</try>
<success> bogus</success>
<attributes>[]</attributes>
</SemiColon>
<success> bogus</success>
<attributes>[]</attributes>
</Module>
ok : true
full : false
-------------------------
ERROR: Parsing failed or not complete
stopped at: 3:8
remaining: 'bogus'
-------------------------
That all said, this is what I'd probably reduce it to:
template <typename Iterator, typename Skipper = qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, Skipper>
{
ParseGrammar() : ParseGrammar::base_type(Module)
{
using namespace qi;
Identifier = alpha >> *(alnum | '_');
Module = "import" >> Identifier >> "all" >> ';';
BOOST_SPIRIT_DEBUG_NODES((Module)(Identifier))
}
qi::rule<Iterator, Skipper> Module;
qi::rule<Iterator> Identifier;
};
As you can see, the Identifier rule is implicitely a lexeme because it doesn't declared to use a skipper.
See it Live on Coliru

how to parse and verify an ordered list of integers using qi

I'm parsing a text file, possibly several GB in size, consisting of lines as follows:
11 0.1
14 0.78
532 -3.5
Basically, one int and one float per line. The ints should be ordered and non-negative. I'd like to verify the data are as described, and have returned to me the min and max int in the range. This is what I've come up with:
#include <iostream>
#include <string>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/std_pair.hpp>
namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;
namespace my_parsers
{
using namespace qi;
using px::at_c;
using px::val;
template <typename Iterator>
struct verify_data : grammar<Iterator, locals<int>, std::pair<int, int>()>
{
verify_data() : verify_data::base_type(section)
{
section
= line(val(0)) [ at_c<0>(_val) = _1]
>> +line(_a) [ _a = _1]
>> eps [ at_c<1>(_val) = _a]
;
line
%= (int_ >> other) [
if_(_r1 >= _1)
[
std::cout << _r1 << " and "
<< _1 << val(" out of order\n")
]
]
;
other
= omit[(lit(' ') | '\t') >> float_ >> eol];
}
rule<Iterator, locals<int>, std::pair<int, int>() > section;
rule<Iterator, int(int)> line;
rule<Iterator> other;
};
}
using namespace std;
int main(int argc, char** argv)
{
string input("11 0.1\n"
"14 0.78\n"
"532 -3.6\n");
my_parsers::verify_data<string::iterator> verifier;
pair<int, int> p;
std::string::iterator begin(input.begin()), end(input.end());
cout << "parse result: " << boolalpha
<< qi::parse(begin, end, verifier, p) << endl;
cout << "p.first: " << p.first << "\np.second: " << p.second << endl;
return 0;
}
What I'd like to know is the following:
Is there a better way of going about this? I have used inherited and synthesised attributes, local variables and a bit of phoenix voodoo. This is great; learning the tools is good but I can't help thinking there might be a much simpler way of achieving the same thing :/ (within a PEG parser that is...)
How could it be done without the local variable for instance?
More info: I have other data formats that are being parsed at the same time and so I'd like to keep the return value as a parser attribute. At the moment this is a std::pair, the other data formats when parsed, will expose their own std::pairs for instance and it's these that I'd like to stuff in a std::vector.
This is at least a lot shorter already:
down to 28 LOC
no more locals
no more fusion vector at<> wizardry
no more inherited attributes
no more grammar class
no more manual iteration
using expectation points (see other) to enhance parse error reporting
this parser expressions synthesizes neatly into a vector<int> if you choose to assign it with %= (but it will cost performance, besides potentially allocating a largish array)
.
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;
typedef std::string::iterator It;
int main(int argc, char** argv)
{
std::string input("11 0.1\n"
"14 0.78\n"
"532 -3.6\n");
int min=-1, max=0;
{
using namespace qi;
using px::val;
using px::ref;
It begin(input.begin()), end(input.end());
rule<It> index = int_
[
if_(ref(max) < _1) [ ref(max) = _1 ] .else_ [ std::cout << _1 << val(" out of order\n") ],
if_(ref(min) < 0) [ ref(min) = _1 ]
] ;
rule<It> other = char_(" \t") > float_ > eol;
std::cout << "parse result: " << std::boolalpha
<< qi::parse(begin, end, index % other) << std::endl;
}
std::cout << "min: " << min << "\nmax: " << max << std::endl;
return 0;
}
Bonus
I might suggest taking the validation out of the expression and make it a free-standing function; of course, this makes things more verbose (and... legible) and my braindead sample uses global variables... -- but I trust you know how to use boost::bind or px::bind to make it more real-life
In addition to the above
down to 27 LOC even with the free function
no more phoenix, no more phoenix includes (yay compile times)
no more phoenix expression types in debug builds ballooning the binary and slowing it down
no more var, ref, if_, .else_ and the wretched operator, (which had major bug risk (at some time) due to the overload not being included with phoenix.hpp)
(easily ported to c++0x lambda's - immediately removing the need for global variables)
.
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;
typedef std::string::iterator It;
int min=-1, max=0, linenumber=0;
void validate_index(int index)
{
linenumber++;
if (min < 0) min = index;
if (max < index) max = index;
else std::cout << index << " out of order at line " << linenumber << std::endl;
}
int main(int argc, char** argv)
{
std::string input("11 0.1\n"
"14 0.78\n"
"532 -3.6\n");
It begin(input.begin()), end(input.end());
{
using namespace qi;
rule<It> index = int_ [ validate_index ] ;
rule<It> other = char_(" \t") > float_ > eol;
std::cout << "parse result: " << std::boolalpha
<< qi::parse(begin, end, index % other) << std::endl;
}
std::cout << "min: " << min << "\nmax: " << max << std::endl;
return 0;
}
I guess a much simpler way would be to parse the file using standard stream operations and then check the ordering in a loop. First, the input:
typedef std::pair<int, float> value_pair;
bool greater(const value_pair & left, const value_pair & right) {
return left.first > right.first;
}
std::istream & operator>>(std::istream & stream, value_pair & value) {
stream >> value.first >> value.second;
return stream;
}
The use it like this:
std::ifstream file("your_file.txt");
std::istream_iterator<value_pair> it(file);
std::istream_iterator<value_pair> eof;
if(std::adjacent_find(it, eof, greater) != eof) {
std::cout << "The values are not ordered" << std::endl;
}
I find this a lot simpler.

How do you use a variable stored in a boost spirit closure as input to a boost spirit loop parser?

I would like to use a parsed value as the input to a loop parser.
The grammar defines a header that specifies the (variable) size of the following string. For example, say the following string is the input to some parser.
12\r\nTest Payload
The parser should extract the 12, convert it to an unsigned int and then read twelve characters. I can define a boost spirit grammar that compiles, but an assertion in the boost spirit code fails at runtime.
#include <iostream>
#include <boost/spirit.hpp>
using namespace boost::spirit;
struct my_closure : public closure<my_closure, std::size_t> {
member1 size;
};
struct my_grammar : public grammar<my_grammar> {
template <typename ScannerT>
struct definition {
typedef rule<ScannerT> rule_type;
typedef rule<ScannerT, my_closure::context_t> closure_rule_type;
closure_rule_type header;
rule_type payload;
rule_type top;
definition(const my_grammar &self)
{
using namespace phoenix;
header = uint_p[header.size = arg1];
payload = repeat_p(header.size())[anychar_p][assign_a(self.result)];
top = header >> str_p("\r\n") >> payload;
}
const rule_type &start() const { return top; }
};
my_grammar(std::string &p_) : result(p_) {}
std::string &result;
};
int
main(int argc, char **argv)
{
const std::string content = "12\r\nTest Payload";
std::string payload;
my_grammar g(payload);
if (!parse(content.begin(), content.end(), g).full) {
std::cerr << "there was a parsing error!\n";
return -1;
}
std::cout << "Payload: " << payload << std::endl;
return 0;
}
Is it possible to tell spirit that the closure variable should be evaluated lazily? Is this behaviour supported by boost spirit?
This is much easier with the new qi parser available in Spirit 2. The following code snippet provides a full example that mostly works. An unexpected character is being inserted into the final result.
#include <iostream>
#include <string>
#include <boost/version.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_repeat.hpp>
#include <boost/spirit/include/qi_grammar.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
using boost::spirit::qi::repeat;
using boost::spirit::qi::uint_;
using boost::spirit::ascii::char_;
using boost::spirit::ascii::alpha;
using boost::spirit::qi::_1;
namespace phx = boost::phoenix;
namespace qi = boost::spirit::qi;
template <typename P, typename T>
void test_parser_attr(
char const* input, P const& p, T& attr, bool full_match = true)
{
using boost::spirit::qi::parse;
char const* f(input);
char const* l(f + strlen(f));
if (parse(f, l, p, attr) && (!full_match || (f == l)))
std::cout << "ok" << std::endl;
else
std::cout << "fail" << std::endl;
}
static void
straight_forward()
{
std::string str;
int n;
test_parser_attr("12\r\nTest Payload",
uint_[phx::ref(n) = _1] >> "\r\n" >> repeat(phx::ref(n))[char_],
str);
std::cout << "str.length() == " << str.length() << std::endl;
std::cout << n << "," << str << std::endl; // will print "12,Test Payload"
}
template <typename P, typename T>
void
test_phrase_parser(char const* input, P const& p,
T& attr, bool full_match = true)
{
using boost::spirit::qi::phrase_parse;
using boost::spirit::qi::ascii::space;
char const* f(input);
char const* l(f + strlen(f));
if (phrase_parse(f, l, p, space, attr) && (!full_match || (f == l)))
std::cout << "ok" << std::endl;
else
std::cout << "fail" << std::endl;
}
template <typename Iterator>
struct test_grammar
: qi::grammar<Iterator, std::string(), qi::locals<unsigned> > {
test_grammar()
: test_grammar::base_type(my_rule)
{
using boost::spirit::qi::_a;
my_rule %= uint_[_a = _1] >> "\r\n" >> repeat(_a)[char_];
}
qi::rule<Iterator, std::string(), qi::locals<unsigned> > my_rule;
};
static void
with_grammar_local_variable()
{
std::string str;
test_phrase_parser("12\r\nTest Payload", test_grammar<const char*>(), str);
std::cout << str << std::endl; // will print "Test Payload"
}
int
main(int argc, char **argv)
{
std::cout << "boost version: " << BOOST_LIB_VERSION << std::endl;
straight_forward();
with_grammar_local_variable();
return 0;
}
What you are looking for is lazy_p, check the example here: http://www.boost.org/doc/libs/1_35_0/libs/spirit/doc/the_lazy_parser.html