Think about a preprocessor which will read the raw text (no significant white space or tokens).
There are 3 rules.
resolve_para_entry should solve the Argument inside a call. The top-level text is returned as string.
resolve_para should resolve the whole Parameter list and put all the top-level Parameter in a string list.
resolve is the entry
On the way I track the iterator and get the text portion
Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont take the Parser to step outside ..
Rules:
resolve_para_entry = +(
(iter_pos >> lit('(') >> (resolve_para_entry | eps) >> lit(')') >> iter_pos) [_val= phoenix::bind(&appendString, _val, _1,_3)]
| (!lit(',') >> !lit(')') >> !lit('(') >> (wide::char_ | wide::space)) [_val = phoenix::bind(&appendChar, _val, _1)]
);
resolve_para = (lit('(') >> lit(')'))[_val = std::vector<std::wstring>()] // empty para -> old style
| (lit('(') >> resolve_para_entry >> *(lit(',') >> resolve_para_entry) > lit(')'))[_val = phoenix::bind(&appendStringList, _val, _1, _2)]
| eps;
;
resolve = (iter_pos >> name_valid >> iter_pos >> resolve_para >> iter_pos);
In the end doesn't seem very elegant. Maybe there is a better way to parse such stuff without skipper
Indeed this should be a lot simpler.
First off, I fail to see why the absense of a skipper is at all relevant.
Second, exposing the raw input is best done using qi::raw[] instead of dancing with iter_pos and clumsy semantic actions¹.
Among the other observations I see:
negating a charset is done with ~, so e.g. ~char_(",()")
(p|eps) would be better spelled -p
(lit('(') >> lit(')')) could be just "()" (after all, there's no skipper, right)
p >> *(',' >> p) is equivalent to p % ','
With the above, resolve_para simplifies to this:
resolve_para = '(' >> -(resolve_para_entry % ',') >> ')';
resolve_para_entry seems weird, to me. It appears that any nested parentheses are simply swallowed. Why not actually parse a recursive grammar so you detect syntax errors?
Here's my take on it:
Define An AST
I prefer to make this the first step because it helps me think about the parser productions:
namespace Ast {
using ArgList = std::list<std::string>;
struct Resolve {
std::string name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
Creating The Grammar Rules
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, std::string()> arg, identifier;
And their definitions:
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
Notes:
No more semantic actions
No more eps
No more iter_pos
I've opted to make arglist not-optional. If you really wanted that, change it back:
resolve = identifier >> -arglist;
But in our sample it will generate a lot of noisy output.
Of course your entry point (start) will be different. I just did the simplest thing that could possibly work, using another handy parser directive from the Spirit Repository (like iter_pos that you were already using): seek[]
The hold is there for this reason: boost::spirit::qi duplicate parsing on the output - You might not need it in your actual parser.
Live On Coliru
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
namespace Ast {
using ArgList = std::list<std::string>;
struct Resolve {
std::string name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
Parser() : Parser::base_type(start) {
using namespace qi;
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
}
private:
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, std::string()> arg, identifier;
};
#include <iostream>
int main() {
using It = std::string::const_iterator;
std::string const samples = R"--(
Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
It f = samples.begin(), l = samples.end();
Ast::Resolves data;
if (parse(f, l, Parser<It>{}, data)) {
std::cout << "Parsed " << data.size() << " resolves\n";
} else {
std::cout << "Parsing failed\n";
}
for (auto& resolve: data) {
std::cout << " - " << resolve.name << "\n (\n";
for (auto& arg : resolve.arglist) {
std::cout << " " << arg << "\n";
}
std::cout << " )\n";
}
}
Prints
Parsed 6 resolves
- sometext
(
para
)
- sometext
(
para1
para2
)
- sometext
(
call(a)
)
- call
(
a
)
- call
(
a
b
)
- lit
(
'
'
)
More Ideas
That last output shows you a problem with your current grammar: lit(',') should obviously not be seen as a call with two parameters.
I recently did an answer on extracting (nested) function calls with parameters which does things more neatly:
Boost spirit parse rule is not applied
or this one boost spirit reporting semantic error
BONUS
Bonus version that uses string_view and also shows exact line/column information of all extracted words.
Note that it still doesn't require any phoenix or semantic actions. Instead it simply defines the necesary trait to assign to boost::string_view from an iterator range.
Live On Coliru
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
namespace Ast {
using Source = boost::string_view;
using ArgList = std::list<Source>;
struct Resolve {
Source name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static void call(It f, It l, boost::string_view& attr) {
attr = boost::string_view { f.base(), size_t(std::distance(f.base(),l.base())) };
}
};
} } }
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
Parser() : Parser::base_type(start) {
using namespace qi;
identifier = raw [ char_("a-zA-Z_") >> *char_("a-zA-Z0-9_") ];
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
}
private:
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, Ast::Source()> arg, identifier;
};
#include <iostream>
struct Annotator {
using Ref = boost::string_view;
struct Manip {
Ref fragment, context;
friend std::ostream& operator<<(std::ostream& os, Manip const& m) {
return os << "[" << m.fragment << " at line:" << m.line() << " col:" << m.column() << "]";
}
size_t line() const {
return 1 + std::count(context.begin(), fragment.begin(), '\n');
}
size_t column() const {
return 1 + (fragment.begin() - start_of_line().begin());
}
Ref start_of_line() const {
return context.substr(context.substr(0, fragment.begin()-context.begin()).find_last_of('\n') + 1);
}
};
Ref context;
Manip operator()(Ref what) const { return {what, context}; }
};
int main() {
using It = std::string::const_iterator;
std::string const samples = R"--(Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
It f = samples.begin(), l = samples.end();
Ast::Resolves data;
if (parse(f, l, Parser<It>{}, data)) {
std::cout << "Parsed " << data.size() << " resolves\n";
} else {
std::cout << "Parsing failed\n";
}
Annotator annotate{samples};
for (auto& resolve: data) {
std::cout << " - " << annotate(resolve.name) << "\n (\n";
for (auto& arg : resolve.arglist) {
std::cout << " " << annotate(arg) << "\n";
}
std::cout << " )\n";
}
}
Prints
Parsed 6 resolves
- [sometext at line:3 col:1]
(
[para at line:3 col:10]
)
- [sometext at line:4 col:1]
(
[para1 at line:4 col:10]
[para2 at line:4 col:16]
)
- [sometext at line:5 col:1]
(
[call(a) at line:5 col:10]
)
- [call at line:5 col:34]
(
[a at line:5 col:39]
)
- [call at line:6 col:10]
(
[a at line:6 col:15]
[b at line:6 col:17]
)
- [lit at line:6 col:62]
(
[' at line:6 col:66]
[' at line:6 col:68]
)
¹ Boost Spirit: "Semantic actions are evil"?
Related
Below is a very compact version of a grammar I'm trying to write using boost::spirit::qi.
Environment: VS2013, x86, Boost1.64
When #including the header file, the compiler complains about the line
rBlock = "{" >> +(rInvocation) >> "}";
with a very long log (I've only copied the beginning and the end):
more than one partial specialization matches the template argument list
...
...
see reference to function template instantiation
'boost::spirit::qi::rule
&boost::spirit::qi::rule::operator =>(const Expr &)' being compiled
Where is my mistake?
The header file:
//mygrammar.h
#pragma once
#include <boost/spirit/include/qi.hpp>
namespace myNS
{
typedef std::string Identifier;
typedef ::boost::spirit::qi::rule <const char*, Identifier()> myIdentifierRule;
typedef ::boost::variant<char, int> Expression;
typedef ::boost::spirit::qi::rule <const char*, Expression()> myExpressionRule;
struct IdntifierEqArgument
{
Identifier ident;
Expression arg;
};
typedef ::boost::variant < IdntifierEqArgument, Expression > Argument;
typedef ::boost::spirit::qi::rule <const char*, Argument()> myArgumentRule;
typedef ::std::vector<Argument> ArgumentList;
typedef ::boost::spirit::qi::rule <const char*, myNS::ArgumentList()> myArgumentListRule;
struct Invocation
{
Identifier identifier;
::boost::optional<ArgumentList> args;
};
typedef ::boost::spirit::qi::rule <const char*, Invocation()> myInvocationRule;
typedef ::std::vector<Invocation> Block;
typedef ::boost::spirit::qi::rule <const char*, myNS::Block()> myBlockRule;
}
BOOST_FUSION_ADAPT_STRUCT(
myNS::IdntifierEqArgument,
(auto, ident)
(auto, arg)
);
BOOST_FUSION_ADAPT_STRUCT(
myNS::Invocation,
(auto, identifier)
(auto, args)
);
namespace myNS
{
struct myRules
{
myIdentifierRule rIdentifier;
myExpressionRule rExpression;
myArgumentRule rArgument;
myArgumentListRule rArgumentList;
myInvocationRule rInvocation;
myBlockRule rBlock;
myRules()
{
using namespace ::boost::spirit;
using namespace ::boost::spirit::qi;
rIdentifier = as_string[((qi::alpha | '_') >> *(qi::alnum | '_'))];
rExpression = char_ | int_;
rArgument = (rIdentifier >> "=" >> rExpression) | rExpression;
rArgumentList = rArgument >> *("," >> rArgument);
rInvocation = rIdentifier >> "(" >> -rArgumentList >> ")";
rBlock = "{" >> +(rInvocation) >> "}";
}
};
}
I'm not exactly sure where the issue is triggered, but it clearly is a symptom of too many ambiguities in the attribute forwarding rules.
Conceptually this could be triggered by your attribute types having similar/compatible layouts. In language theory, you're looking at a mismatch between C++'s nominative type system versus the approximation of structural typing in the attribute propagation system. But enough theorism :)
I don't think attr_cast<> will save you here as it probably uses the same mechanics and heuristics under the hood.
It drew my attention that making the ArgumentList optional is ... not very useful (as an empty list already accurately reflects absense of arguments).
So I tried simplifying the rules:
rArgumentList = -(rArgument % ',');
rInvocation = rIdentifier >> '(' >> rArgumentList >> ')';
And the declared attribute type can be simply ArgumentList instead of boost::optional::ArgumentList.
This turns out to remove the ambiguity when propagating into the vector<Invocation>, so ... you're saved.
If this feels "accidental" to you, you should! What would I do if this hadn't removed the ambiguity "by chance"? I'd have created a semantic action to propagate the Invocation by simpler mechanics. There's a good chance that fusion::push_back(_val, _1) or similar would have worked.
See also Boost Spirit: "Semantic actions are evil"?
Review And Demo
In the cleaned up review here I present a few fixes/improvements and a test run that dumps the parsed AST.
Separate AST from parser (you don't want use qi in the AST types. You specifically do not want using namespace directives in the face of generic template libraries)
Do not use auto in the adapt macros. That's not a feature. Instead, since you can ostensibly use C++11, use the C++11 (decltype) based macros
BOOST_FUSION_ADAPT_STRUCT(myAST::IdntifierEqArgument, ident,arg);
BOOST_FUSION_ADAPT_STRUCT(myAST::Invocation, identifier,args);
AST is leading (also, prefer c++11 for clarity):
namespace myAST {
using Identifier = std::string;
using Expression = boost::variant<char, int>;
struct IdntifierEqArgument {
Identifier ident;
Expression arg;
};
using Argument = boost::variant<IdntifierEqArgument, Expression>;
using ArgumentList = std::vector<Argument>;
struct Invocation {
Identifier identifier;
ArgumentList args;
};
using Block = std::vector<Invocation>;
}
It's nice to have the definitions separate
Regarding the parser,
I'd prefer the qi::grammar convention. Also,
You didn't declare any of the rules with a skipper. I "guessed" from context that whitespace is insignificant outside of the rules for Expression and Identifier.
Expression ate every char_, so also would eat ')' or even '3'. I noticed this only when testing and after debugging with:
//#define BOOST_SPIRIT_DEBUG
BOOST_SPIRIT_DEBUG_NODES((start)(rBlock)(rInvocation)(rIdentifier)(rArgumentList)(rArgument)(rExpression))
I highly recommend using these facilities
All in all the parser comes down to
namespace myNS {
namespace qi = boost::spirit::qi;
template <typename Iterator = char const*>
struct myRules : qi::grammar<Iterator, myAST::Block()> {
myRules() : myRules::base_type(start) {
rIdentifier = qi::raw [(qi::alpha | '_') >> *(qi::alnum | '_')];
rExpression = qi::alpha | qi::int_;
rArgument = (rIdentifier >> '=' >> rExpression) | rExpression;
rArgumentList = -(rArgument % ',');
rInvocation = rIdentifier >> '(' >> rArgumentList >> ')';
rBlock = '{' >> +rInvocation >> '}';
start = qi::skip(qi::space) [ rBlock ];
BOOST_SPIRIT_DEBUG_NODES((start)(rBlock)(rInvocation)(rIdentifier)(rArgumentList)(rArgument)(rExpression))
}
private:
qi::rule<Iterator, myAST::Block()> start;
using Skipper = qi::space_type;
qi::rule<Iterator, myAST::Argument(), Skipper> rArgument;
qi::rule<Iterator, myAST::ArgumentList(), Skipper> rArgumentList;
qi::rule<Iterator, myAST::Invocation(), Skipper> rInvocation;
qi::rule<Iterator, myAST::Block(), Skipper> rBlock;
// implicit lexemes
qi::rule<Iterator, myAST::Identifier()> rIdentifier;
qi::rule<Iterator, myAST::Expression()> rExpression;
};
}
Adding a test driver
int main() {
std::string const input = R"(
{
foo()
bar(a, b, 42)
qux(someThing_awful01 = 9)
}
)";
auto f = input.data(), l = f + input.size();
myAST::Block block;
bool ok = parse(f, l, myNS::myRules<>{}, block);
if (ok) {
std::cout << "Parse success\n";
for (auto& invocation : block) {
std::cout << invocation.identifier << "(";
for (auto& arg : invocation.args) std::cout << arg << ",";
std::cout << ")\n";
}
}
else
std::cout << "Parse failed\n";
if (f!=l)
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
Complete Demo
See it Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
namespace myAST {
using Identifier = std::string;
using Expression = boost::variant<char, int>;
struct IdntifierEqArgument {
Identifier ident;
Expression arg;
};
using Argument = boost::variant<IdntifierEqArgument, Expression>;
using ArgumentList = std::vector<Argument>;
struct Invocation {
Identifier identifier;
ArgumentList args;
};
using Block = std::vector<Invocation>;
// for debug printing
static inline std::ostream& operator<<(std::ostream& os, myAST::IdntifierEqArgument const& named) {
return os << named.ident << "=" << named.arg;
}
}
BOOST_FUSION_ADAPT_STRUCT(myAST::IdntifierEqArgument, ident,arg);
BOOST_FUSION_ADAPT_STRUCT(myAST::Invocation, identifier,args);
namespace myNS {
namespace qi = boost::spirit::qi;
template <typename Iterator = char const*>
struct myRules : qi::grammar<Iterator, myAST::Block()> {
myRules() : myRules::base_type(start) {
rIdentifier = qi::raw [(qi::alpha | '_') >> *(qi::alnum | '_')];
rExpression = qi::alpha | qi::int_;
rArgument = (rIdentifier >> '=' >> rExpression) | rExpression;
rArgumentList = -(rArgument % ',');
rInvocation = rIdentifier >> '(' >> rArgumentList >> ')';
rBlock = '{' >> +rInvocation >> '}';
start = qi::skip(qi::space) [ rBlock ];
BOOST_SPIRIT_DEBUG_NODES((start)(rBlock)(rInvocation)(rIdentifier)(rArgumentList)(rArgument)(rExpression))
}
private:
qi::rule<Iterator, myAST::Block()> start;
using Skipper = qi::space_type;
qi::rule<Iterator, myAST::Argument(), Skipper> rArgument;
qi::rule<Iterator, myAST::ArgumentList(), Skipper> rArgumentList;
qi::rule<Iterator, myAST::Invocation(), Skipper> rInvocation;
qi::rule<Iterator, myAST::Block(), Skipper> rBlock;
// implicit lexemes
qi::rule<Iterator, myAST::Identifier()> rIdentifier;
qi::rule<Iterator, myAST::Expression()> rExpression;
};
}
int main() {
std::string const input = R"(
{
foo()
bar(a, b, 42)
qux(someThing_awful01 = 9)
}
)";
auto f = input.data(), l = f + input.size();
myAST::Block block;
bool ok = parse(f, l, myNS::myRules<>{}, block);
if (ok) {
std::cout << "Parse success\n";
for (auto& invocation : block) {
std::cout << invocation.identifier << "(";
for (auto& arg : invocation.args) std::cout << arg << ",";
std::cout << ")\n";
}
}
else
std::cout << "Parse failed\n";
if (f!=l)
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
Prints output
Parse success
foo()
bar(a,b,42,)
qux(someThing_awful01=9,)
Remaining unparsed input: '
'
I want to find out whether I can detect function call using regex. The basic case is easy: somefunction(1, 2);
But what if I had code:
somefunction(someotherfunction(), someotherotherfunction());
or
somefunction(function () { return 1; }, function() {return 2;});
or
caller_function(somefunction(function () { return 1; }, function() {return 2;}))
In this case I need to match same number of opening braces and closing braces so that I can find end of call to somefunction
Is it possible?
Thanks in advance.
Your question is misleading. It's not as simple as you think.
First, the grammar isn't regular. Regular expressions are not the right tool.
Second, you ask "detecting function call" but the samples show anonymous function definitions, a totally different ball game.
Here's a start using Boost Spirit:
start = skip(space) [ fcall ];
fcall = ident >> '(' >> -args >> ')';
args = arg % ',';
arg = fdef | fcall;
fdef = lexeme["function"] >> '(' >> -formals >> ')' >> body;
formals = ident % ',';
identch = alpha | char_("_");
ident = identch >> *(identch|digit);
body = '{' >> *~char_('}') >> '}';
Which would map onto an AST like:
struct function_definition {
std::vector<std::string> formal_arguments;
std::string body;
};
struct function_call;
using argument = boost::variant<
function_definition,
boost::recursive_wrapper<function_call>
>;
struct function_call {
std::string name;
std::vector<argument> args;
};
DEMO
Live On Coliru
// #define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/fusion/adapted/struct.hpp>
struct function_definition {
std::vector<std::string> formal_arguments;
std::string body;
};
struct function_call;
using argument = boost::variant<
function_definition,
boost::recursive_wrapper<function_call>
>;
struct function_call {
std::string name;
std::vector<argument> args;
};
BOOST_FUSION_ADAPT_STRUCT(function_call, name, args)
BOOST_FUSION_ADAPT_STRUCT(function_definition, formal_arguments, body)
namespace qi = boost::spirit::qi;
template <typename It>
struct Parser : qi::grammar<It, function_call()> {
Parser() : Parser::base_type(start) {
using namespace qi;
start = skip(space) [ fcall ];
fcall = ident >> '(' >> -args >> ')';
args = arg % ',';
arg = fdef | fcall;
fdef = lexeme["function"] >> '(' >> -formals >> ')' >> body;
formals = ident % ',';
identch = alpha | char_("_");
ident = identch >> *(identch|digit);
body = '{' >> *~char_('}') >> '}';
BOOST_SPIRIT_DEBUG_NODES((start)(fcall)(args)(arg)(fdef)(formals)(ident)(body))
}
private:
using Skipper = qi::space_type;
qi::rule<It, function_call()> start;
qi::rule<It, function_call(), Skipper> fcall;
qi::rule<It, argument(), Skipper> arg;
qi::rule<It, std::vector<argument>(), Skipper> args;
qi::rule<It, function_definition(), Skipper> fdef;
qi::rule<It, std::vector<std::string>(), Skipper> formals;
qi::rule<It, char()> identch;
qi::rule<It, std::string()> ident, body;
};
// for debug:
#include <experimental/iterator>
static inline std::ostream& operator<<(std::ostream& os, function_definition const& v) {
os << "function(";
std::copy(v.formal_arguments.begin(), v.formal_arguments.end(), std::experimental::make_ostream_joiner(os, ", "));
return os << ") {" << v.body << "}";
}
static inline std::ostream& operator<<(std::ostream& os, function_call const& v) {
os << v.name << "(";
std::copy(v.args.begin(), v.args.end(), std::experimental::make_ostream_joiner(os, ", "));
return os << ")";
}
int main() {
std::string const input = "caller_function(somefunction(function () { return 1; }, function() {return 2;}))";
using It = std::string::const_iterator;
Parser<It> const p;
It f = input.begin(), l = input.end();
function_call parsed;
bool ok = parse(f, l, p, parsed);
if (ok) {
std::cout << "Parsed ok: " << parsed << "\n";
} else {
std::cout << "Parse failed\n";
}
if (f!=l)
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
Prints
Parsed ok: caller_function(somefunction(function() { return 1; }, function() {return 2;}))
I need a comma delimited output from a struct with optionals. For example, if I have this struct:
MyStruct
{
boost::optional<std::string> one;
boost::optional<int> two;
boost::optional<float> three;
};
An output like: { "string", 1, 3.0 } or { "string" } or { 1, 3.0 } and so on.
Now, I have code like this:
struct MyStruct
{
boost::optional<std::string> one;
boost::optional<int> two;
boost::optional<float> three;
};
BOOST_FUSION_ADAPT_STRUCT
(MyStruct,
one,
two,
three)
template<typename Iterator>
struct MyKarmaGrammar : boost::spirit::karma::grammar<Iterator, MyStruct()>
{
MyKarmaGrammar() : MyKarmaGrammar::base_type(request_)
{
using namespace std::literals::string_literals;
namespace karma = boost::spirit::karma;
using karma::int_;
using karma::double_;
using karma::string;
using karma::lit;
using karma::_r1;
key_ = '"' << string(_r1) << '"';
str_prop_ = key_(_r1) << ':'
<< string
;
int_prop_ = key_(_r1) << ':'
<< int_
;
dbl_prop_ = key_(_r1) << ':'
<< double_
;
//REQUEST
request_ = '{'
<< -str_prop_("one"s) <<
-int_prop_("two"s) <<
-dbl_prop_("three"s)
<< '}'
;
}
private:
//GENERAL RULES
boost::spirit::karma::rule<Iterator, void(std::string)> key_;
boost::spirit::karma::rule<Iterator, double(std::string)> dbl_prop_;
boost::spirit::karma::rule<Iterator, int(std::string)> int_prop_;
boost::spirit::karma::rule<Iterator, std::string(std::string)> str_prop_;
//REQUEST
boost::spirit::karma::rule<Iterator, MyStruct()> request_;
};
int main()
{
using namespace std::literals::string_literals;
MyStruct request = {std::string("one"), 2, 3.1};
std::string generated;
std::back_insert_iterator<std::string> sink(generated);
MyKarmaGrammar<std::back_insert_iterator<std::string>> serializer;
boost::spirit::karma::generate(sink, serializer, request);
std::cout << generated << std::endl;
}
This works but I need a comma delimited output. I tried with a grammar like:
request_ = '{'
<< (str_prop_("one"s) |
int_prop_("two"s) |
dbl_prop_("three"s)) % ','
<< '}'
;
But I receive this compile error:
/usr/include/boost/spirit/home/support/container.hpp:194:52: error: no type named ‘const_iterator’ in ‘struct MyStruct’
typedef typename Container::const_iterator type;
thanks!
Your struct is not a container, therefore list-operator% will not work. The documentation states it expects the attribute to be a container type.
So, just like in the Qi counterpart I showed you to create a conditional delim production:
delim = (&qi::lit('}')) | ',';
You'd need something similar here. However, everything about it is reversed. Instead of "detecting" the end of the input sequence from the presence of a {, we need to track the absense of preceding field from "not having output a field since opening brace yet".
That's a bit trickier since the required state cannot come from the same source as the input. We'll use a parser-member for simplicity here¹:
private:
bool _is_first_field;
Now, when we generate the opening brace, we want to initialize that to true:
auto _f = px::ref(_is_first_field); // short-hand
request_ %= lit('{') [ _f = true ]
Note: Use of %= instead of = tells Spirit that we want automatic attribute propagation to happen, in spite of the presence of a Semantic Action ([ _f = true ]).
Now, we need to generate the delimiter:
delim = eps(_f) | ", ";
Simple. Usage is also simple, except we'll want to conditionally reset the _f:
auto reset = boost::proto::deep_copy(eps [ _f = false ]);
str_prop_ %= (delim << key_(_r1) << string << reset) | "";
int_prop_ %= (delim << key_(_r1) << int_ << reset) | "";
dbl_prop_ %= (delim << key_(_r1) << double_ << reset) | "";
A very subtle point here is that I changed to the declared rule attribute types from T to optional<T>. This allows Karma to do the magic to fail the value generator if it's empty (boost::none), and skipping the reset!
ka::rule<Iterator, boost::optional<double>(std::string)> dbl_prop_;
ka::rule<Iterator, boost::optional<int>(std::string)> int_prop_;
ka::rule<Iterator, boost::optional<std::string>(std::string)> str_prop_;
Now, let's put together some testcases:
Test Cases
Live On Coliru
#include "iostream"
#include <boost/optional/optional_io.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <string>
struct MyStruct {
boost::optional<std::string> one;
boost::optional<int> two;
boost::optional<double> three;
};
BOOST_FUSION_ADAPT_STRUCT(MyStruct, one, two, three)
namespace ka = boost::spirit::karma;
namespace px = boost::phoenix;
template<typename Iterator>
struct MyKarmaGrammar : ka::grammar<Iterator, MyStruct()> {
MyKarmaGrammar() : MyKarmaGrammar::base_type(request_) {
using namespace std::literals::string_literals;
using ka::int_;
using ka::double_;
using ka::string;
using ka::lit;
using ka::eps;
using ka::_r1;
auto _f = px::ref(_is_first_field);
auto reset = boost::proto::deep_copy(eps [ _f = false ]);
key_ = '"' << string(_r1) << "\":";
delim = eps(_f) | ", ";
str_prop_ %= (delim << key_(_r1) << string << reset) | "";
int_prop_ %= (delim << key_(_r1) << int_ << reset) | "";
dbl_prop_ %= (delim << key_(_r1) << double_ << reset) | "";
//REQUEST
request_ %= lit('{') [ _f = true ]
<< str_prop_("one"s) <<
int_prop_("two"s) <<
dbl_prop_("three"s)
<< '}';
}
private:
bool _is_first_field = true;
//GENERAL RULES
ka::rule<Iterator, void(std::string)> key_;
ka::rule<Iterator, boost::optional<double>(std::string)> dbl_prop_;
ka::rule<Iterator, boost::optional<int>(std::string)> int_prop_;
ka::rule<Iterator, boost::optional<std::string>(std::string)> str_prop_;
ka::rule<Iterator> delim;
//REQUEST
ka::rule<Iterator, MyStruct()> request_;
};
template <typename T> std::array<boost::optional<T>, 2> option(T const& v) {
return { { v, boost::none } };
}
int main() {
using namespace std::literals::string_literals;
for (auto a : option("one"s))
for (auto b : option(2))
for (auto c : option(3.1))
for (auto request : { MyStruct { a, b, c } }) {
std::string generated;
std::back_insert_iterator<std::string> sink(generated);
MyKarmaGrammar<std::back_insert_iterator<std::string>> serializer;
ka::generate(sink, serializer, request);
std::cout << boost::fusion::as_vector(request) << ":\t" << generated << "\n";
}
}
Printing:
( one 2 3.1): {"one":one, "two":2, "three":3.1}
( one 2 --): {"one":one, "two":2}
( one -- 3.1): {"one":one, "three":3.1}
( one -- --): {"one":one}
(-- 2 3.1): {"two":2, "three":3.1}
(-- 2 --): {"two":2}
(-- -- 3.1): {"three":3.1}
(-- -- --): {}
¹ Note this limits re-entrant use of the parser, as well as making it non-const etc. karma::locals are the true answer to that, adding a little more complexity
I won to parse structure like "text { < > }". Spirit documentation contents similar AST example.
For parsing string like this
<tag1>text1<tag2>text2</tag1></tag2>
this code work:
templ = (tree | text) [_val = _1];
start_tag = '<'
>> !lit('/')
>> lexeme[+(char_- '>') [_val += _1]]
>>'>';
end_tag = "</"
>> string(_r1)
>> '>';
tree = start_tag [at_c<1>(_val) = _1]
>> *templ [push_back(at_c<0>(_val), _1) ]
>> end_tag(at_c<1>(_val) )
;
For parsing string like this
<tag<tag>some_text>
This code not work:
templ = (tree | text) [_val = _1];
tree = '<'
>> *templ [push_back(at_c<0>(_val), _1) ]
>> '>'
;
templ is parsing structure with recursive_wrapper inside:
namespace client {
struct tmp;
typedef boost::variant <
boost::recursive_wrapper<tmp>,
std::string
> tmp_node;
struct tmp {
std::vector<tmp_node> content;
std::string text;
};
}
BOOST_FUSION_ADAPT_STRUCT(
tmp_view::tmp,
(std::vector<tmp_view::tmp_node>, content)
(std::string,text)
)
Who may explain why it happened? Maybe who knows similar parsers wrote on boost::spirit?
Just guessing you didn't actually want to parse XML at all, but rather some kind of mixed-content markup language for hierarchical text, I'd do
simple = +~qi::char_("><");
nested = '<' >> *soup >> '>';
soup = nested|simple;
With the AST/rules defined as
typedef boost::make_recursive_variant<
boost::variant<std::string, std::vector<boost::recursive_variant_> >
>::type tag_soup;
qi::rule<It, std::string()> simple;
qi::rule<It, std::vector<tag_soup>()> nested;
qi::rule<It, tag_soup()> soup;
See it Live On Coliru:
//// #define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/variant/recursive_variant.hpp>
#include <iostream>
#include <fstream>
namespace client
{
typedef boost::make_recursive_variant<
boost::variant<std::string, std::vector<boost::recursive_variant_> >
>::type tag_soup;
namespace qi = boost::spirit::qi;
template <typename It>
struct parser : qi::grammar<It, tag_soup()>
{
parser() : parser::base_type(soup)
{
simple = +~qi::char_("><");
nested = '<' >> *soup >> '>';
soup = nested|simple;
BOOST_SPIRIT_DEBUG_NODES((simple)(nested)(soup))
}
private:
qi::rule<It, std::string()> simple;
qi::rule<It, std::vector<tag_soup>()> nested;
qi::rule<It, tag_soup()> soup;
};
}
namespace boost { // leverage ADL on variant<>
static std::ostream& operator<<(std::ostream& os, std::vector<client::tag_soup> const& soup)
{
os << "<";
std::copy(soup.begin(), soup.end(), std::ostream_iterator<client::tag_soup>(os));
return os << ">";
}
}
int main(int argc, char **argv)
{
if (argc < 2) {
std::cerr << "Error: No input file provided.\n";
return 1;
}
std::ifstream in(argv[1]);
std::string const storage(std::istreambuf_iterator<char>(in), {}); // We will read the contents here.
if (!(in || in.eof())) {
std::cerr << "Error: Could not read from input file\n";
return 1;
}
static const client::parser<std::string::const_iterator> p;
client::tag_soup ast; // Our tree
bool ok = parse(storage.begin(), storage.end(), p, ast);
if (ok) std::cout << "Parsing succeeded\nData: " << ast << "\n";
else std::cout << "Parsing failed\n";
return ok? 0 : 1;
}
If you define BOOST_SPIRIT_DEBUG you'll get verbose output of the parsing process.
For the input
<some text with nested <tags <etc...> >more text>
prints
Parsing succeeded
Data: <some text with nested <tags <etc...> >more text>
Note that the output is printed from the variant, not the original text.
I'm writing a DSL and using a Boost Spirit lexer to tokenize my input. In my grammar, I want a rule similar to this (where tok is the lexer):
header_block =
tok.name >> ':' >> tok.stringval > ';' >>
tok.description >> ':' >> tok.stringval > ';'
;
Rather than specifying reserved words for the language (e.g. "name", "description") and deal with synchronizing these between the lexer and grammar, I want to just tokenize everything that matches [a-zA-Z_]\w* as a single token type (e.g. tok.symbol), and let the grammar sort it out. If I weren't using a lexer, I might do something like this:
stringval = lexeme['"' >> *(char_ - '"') >> '"'];
header_block =
lit("name") >> ':' >> stringval > ';' >>
lit("description") >> ':' >> stringval > ';'
;
With a lexer in the mix, I can compile the following rule, but of course it matches more than I want — it doesn't care about the particular symbol values "name" and "description":
header_block =
tok.symbol >> ':' >> tok.stringval > ';' >>
tok.symbol >> ':' >> tok.stringval > ';'
;
What I'm looking for is something like this:
header_block =
specific_symbol_matcher("name") >> ':' >> tok.stringval > ';' >>
specific_symbol_matcher("description") >> ':' >> tok.stringval > ';'
;
Does Qi provide anything I can use instead of my specific_symbol_matcher hand-waving, there? I'd rather not write my own matcher if I can get close using stuff that's provided. If I must write my own matcher, can anyone suggest how to do that?
If the token exposes a std::string, you should just be able to do:
statement =
( tok.keyword [ qi::_pass = (_1 == "if") ] >> if_stmt )
| ( tok.keyword [ qi::_pass = (_1 == "while) ] >> while_stmt );
If I understood you right, this is, more or less, what you were asking.
While you are at it, do look at qi::symbol<> and an especially nifty application of that, known as the Nabialek Trick.
Bonus material
In case you're just struggling to make an existing grammar work with a lexer, here's what I just did with the calc_utree_ast.cpp example to make it work with a lexer.
It shows
how you can directly consume the exposed attributes
how you can still parse based on char-literals, as long as these char literals are registered as (anonymous) tokens
how the (simple) expression gammar was minimally changed
how the skipping behaviour was moved into the lexer
///////////////////////////////////////////////////////////////////////////////
//
// Plain calculator example demonstrating the grammar. The parser is a
// syntax checker only and does not do any semantic evaluation.
//
// [ JDG May 10, 2002 ] spirit1
// [ JDG March 4, 2007 ] spirit2
// [ HK November 30, 2010 ] spirit2/utree
// [ SH July 17, 2012 ] use a lexer
//
///////////////////////////////////////////////////////////////////////////////
#define BOOST_SPIRIT_DEBUG
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/support_utree.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_function.hpp>
#include <iostream>
#include <string>
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace spirit = boost::spirit;
namespace phx = boost::phoenix;
// base iterator type
typedef std::string::const_iterator BaseIteratorT;
// token type
typedef lex::lexertl::token<BaseIteratorT, boost::mpl::vector<char, uint32_t> > TokenT;
// lexer type
typedef lex::lexertl::actor_lexer<TokenT> LexerT;
template <typename LexerT_>
struct Tokens: public lex::lexer<LexerT_> {
Tokens() {
// literals
uint_ = "[0-9]+";
space = " \t\r\n";
// literal rules
this->self += uint_;
this->self += '+';
this->self += '-';
this->self += '*';
this->self += '/';
this->self += '(';
this->self += ')';
using lex::_pass;
using lex::pass_flags;
this->self += space [ _pass = pass_flags::pass_ignore ];
}
lex::token_def<uint32_t> uint_;
lex::token_def<lex::omit> space;
};
namespace client
{
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace spirit = boost::spirit;
struct expr
{
template <typename T1, typename T2 = void>
struct result { typedef void type; };
expr(char op) : op(op) {}
void operator()(spirit::utree& expr, spirit::utree const& rhs) const
{
spirit::utree lhs;
lhs.swap(expr);
expr.push_back(spirit::utf8_symbol_range_type(&op, &op+1));
expr.push_back(lhs);
expr.push_back(rhs);
}
char const op;
};
boost::phoenix::function<expr> const plus = expr('+');
boost::phoenix::function<expr> const minus = expr('-');
boost::phoenix::function<expr> const times = expr('*');
boost::phoenix::function<expr> const divide = expr('/');
struct negate_expr
{
template <typename T1, typename T2 = void>
struct result { typedef void type; };
void operator()(spirit::utree& expr, spirit::utree const& rhs) const
{
char const op = '-';
expr.clear();
expr.push_back(spirit::utf8_symbol_range_type(&op, &op+1));
expr.push_back(rhs);
}
};
boost::phoenix::function<negate_expr> neg;
///////////////////////////////////////////////////////////////////////////////
// Our calculator grammar
///////////////////////////////////////////////////////////////////////////////
template <typename Iterator>
struct calculator : qi::grammar<Iterator, spirit::utree()>
{
template <typename Tokens>
calculator(Tokens const& toks) : calculator::base_type(expression)
{
using qi::_val;
using qi::_1;
expression =
term [_val = _1]
>> *( ('+' >> term [plus(_val, _1)])
| ('-' >> term [minus(_val, _1)])
)
;
term =
factor [_val = _1]
>> *( ('*' >> factor [times(_val, _1)])
| ('/' >> factor [divide(_val, _1)])
)
;
factor =
toks.uint_ [_val = _1]
| '(' >> expression [_val = _1] >> ')'
| ('-' >> factor [neg(_val, _1)])
| ('+' >> factor [_val = _1])
;
BOOST_SPIRIT_DEBUG_NODE(expression);
BOOST_SPIRIT_DEBUG_NODE(term);
BOOST_SPIRIT_DEBUG_NODE(factor);
}
qi::rule<Iterator, spirit::utree()> expression, term, factor;
};
}
///////////////////////////////////////////////////////////////////////////////
// Main program
///////////////////////////////////////////////////////////////////////////////
int main()
{
std::cout << "/////////////////////////////////////////////////////////\n\n";
std::cout << "Expression parser...\n\n";
std::cout << "/////////////////////////////////////////////////////////\n\n";
std::cout << "Type an expression...or [q or Q] to quit\n\n";
using boost::spirit::utree;
typedef std::string::const_iterator iterator_type;
typedef Tokens<LexerT>::iterator_type IteratorT;
typedef client::calculator<IteratorT> calculator;
Tokens<LexerT> l;
calculator calc(l); // Our grammar
std::string str;
while (std::getline(std::cin, str))
{
if (str.empty() || str[0] == 'q' || str[0] == 'Q')
break;
std::string::const_iterator iter = str.begin();
std::string::const_iterator end = str.end();
utree ut;
bool r = lex::tokenize_and_parse(iter, end, l, calc, ut);
if (r && iter == end)
{
std::cout << "-------------------------\n";
std::cout << "Parsing succeeded: " << ut << "\n";
std::cout << "-------------------------\n";
}
else
{
std::string rest(iter, end);
std::cout << "-------------------------\n";
std::cout << "Parsing failed\n";
std::cout << "stopped at: \"" << rest << "\"\n";
std::cout << "-------------------------\n";
}
}
std::cout << "Bye... :-) \n\n";
return 0;
}
For the input
8*12312*(4+5)
It prints (without debug info)
Parsing succeeded: ( * ( * 8 12312 ) ( + 4 5 ) )