modify regex to include comma - c++

I have the following string:
arg1('value1') arg2('value '')2') arg3('user\'~!##$%^&*_~!##$%^&"*_-=+[{]}\|;:<.>?21')
The regex to extract the value looks like:
boost::regex re_arg_values("('[^']*(?:''[^']*)*'[^)]*)");
The above regex properly extracts the values, but when a value contains a comma, the code fails. For example:
arg1('value1') arg2('value '')2') arg3('user\'~!##$%^&*_~!##$%^&"*_-=+[{]}\|;:<.>?21**,**')
How should I modify this regex so that values may contain a comma?
FYI: the value can contain spaces, special characters, and also tabs. The code is in C++.
Thanks in advance.

I'd not use a regex here.
The goal must be to parse the values, and no doubt they carry meaning that you need interpreted.
I'd devise a data structure like:
#include <boost/variant.hpp>
#include <map>
#include <string>
namespace Config {
using Key = std::string;
using Value = boost::variant<int, std::string, bool>;
using Setting = std::pair<Key, Value>;
using Settings = std::map<Key, Value>;
}
You can write a parser that maps 1:1 onto this structure using Boost Spirit:
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
namespace Parser {
using It = std::string::const_iterator;
using namespace Config;
namespace qi = boost::spirit::qi;
using Skip = qi::blank_type;
qi::rule<It, std::string()> quoted_ = "'" >> *(
"'" >> qi::char_("'") // double ''
| '\\' >> qi::char_ // any character escaped
| ~qi::char_("'") // non-quotes
) >> "'";
qi::rule<It, Key()> key_ = +qi::char_("a-zA-Z0-9_"); // for example
qi::rule<It, Value()> value_ = qi::int_ | quoted_ | qi::bool_;
qi::rule<It, Setting(), Skip> setting_ = key_ >> '(' >> value_ >> ')';
qi::rule<It, Settings()> settings_ = qi::skip(qi::blank) [*setting_];
}
Note how this:
- interprets non-string values correctly
- specifies what keys look like and parses them too
- interprets string escapes, so the Value in the map contains the "real" string, after un-escaping
- ignores whitespace outside values (use space_type if you want to ignore newlines as whitespace as well)
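To make the escape handling concrete, here is a minimal hand-written sketch of the same three cases that quoted_ distinguishes (doubled quote, backslash escape, ordinary character). It uses only the standard library; the name unquote and its bool/out-parameter interface are assumptions made for illustration, not part of Spirit or of the code above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Boost): decodes one single-quoted value.
// Returns true and stores the decoded text in `out` on success.
// On failure it returns false and `out` may hold a partial result.
bool unquote(std::string const& text, std::string& out) {
    out.clear();
    if (text.size() < 2 || text.front() != '\'')
        return false;                      // must start with an opening quote

    std::size_t i = 1;
    while (i < text.size()) {
        char const c = text[i];
        if (c == '\'') {
            if (i + 1 < text.size() && text[i + 1] == '\'') {
                out += '\'';               // doubled '' stands for one quote
                i += 2;
                continue;
            }
            // a lone quote is the closing delimiter; it must be the last character
            return i + 1 == text.size();
        }
        if (c == '\\') {
            if (i + 1 >= text.size())
                return false;              // dangling backslash
            out += text[i + 1];            // backslash escapes any character
            i += 2;
            continue;
        }
        out += c;                          // ordinary character, commas included
        ++i;
    }
    return false;                          // input ended before the closing quote
}
```

For the input 'value '')2' this yields value ')2, and a comma inside the quotes needs no special treatment because it is just an ordinary character.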
You can use it like:
int main() {
std::string const input = R"( arg1('value1') arg2('value '')2') arg3('user\'~!##$%^&*_~!##$%^&"*_-=+[{]}\|;:<.>?21**,**'))";
Config::Settings map;
if (parse(input.begin(), input.end(), Parser::settings_, map)) {
for(auto& entry : map)
std::cout << "config setting {" << entry.first << ", " << entry.second << "}\n";
}
}
Which prints
config setting {arg1, value1}
config setting {arg2, value ')2}
config setting {arg3, user'~!##$%^&*_~!##$%^&"*_-=+[{]}|;:<.>?21**,**}
Live Demo
#include <boost/spirit/include/qi.hpp>
#include <map>
#include <boost/fusion/adapted/std_pair.hpp>
namespace Config {
using Key = std::string;
using Value = boost::variant<int, std::string, bool>;
using Setting = std::pair<Key, Value>;
using Settings = std::map<Key, Value>;
}
namespace Parser {
using It = std::string::const_iterator;
using namespace Config;
namespace qi = boost::spirit::qi;
using Skip = qi::blank_type;
qi::rule<It, std::string()> quoted_ = "'" >> *(
"'" >> qi::char_("'") // double ''
| '\\' >> qi::char_ // any character escaped
| ~qi::char_("'") // non-quotes
) >> "'";
qi::rule<It, Key()> key_ = +qi::char_("a-zA-Z0-9_"); // for example
qi::rule<It, Value()> value_ = qi::int_ | quoted_ | qi::bool_;
qi::rule<It, Setting(), Skip> setting_ = key_ >> '(' >> value_ >> ')';
qi::rule<It, Settings()> settings_ = qi::skip(qi::blank) [*setting_];
}
int main() {
std::string const input = R"( arg1('value1') arg2('value '')2') arg3('user\'~!##$%^&*_~!##$%^&"*_-=+[{]}\|;:<.>?21**,**'))";
Config::Settings map;
if (parse(input.begin(), input.end(), Parser::settings_, map)) {
for(auto& entry : map)
std::cout << "config setting {" << entry.first << ", " << entry.second << "}\n";
}
}
BONUS
For comparison, here's the "same" but using regex:
#include <boost/regex.hpp>
#include <boost/range/iterator_range.hpp>
#include <iostream>
#include <map>
namespace Config {
using Key = std::string;
using RawValue = std::string;
using Settings = std::map<Key, RawValue>;
Settings parse(std::string const& input) {
Settings settings;
boost::regex re(R"((\w+)\(('.*?')\))");
auto f = boost::make_regex_iterator(input, re);
for (auto& match : boost::make_iterator_range(f, {}))
settings.emplace(match[1].str(), match[2].str());
return settings;
}
}
int main() {
std::string const input = R"( arg1('value1') arg2('value '')2') arg3('user\'~!##$%^&*_~!##$%^&"*_-=+[{]}\|;:<.>?21**,**'))";
Config::Settings map = Config::parse(input);
for(auto& entry : map)
std::cout << "config setting {" << entry.first << ", " << entry.second << "}\n";
}
Prints
config setting {arg1, 'value1'}
config setting {arg2, 'value ''}
config setting {arg3, 'user\'~!##$%^&*_~!##$%^&"*_-=+[{]}\|;:<.>?21**,**'}
Notes:
- it no longer interprets and converts any values
- it no longer processes escapes
- it requires an additional runtime library dependency on boost_regex
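If you do stay with a regex, the pattern has to know about both escape forms so that a quote inside a value does not end the match early. The following is a sketch only, assuming std::regex with its default ECMAScript grammar instead of boost::regex; extract_raw is a hypothetical name. Unlike the Spirit version, it still returns the raw text between the outer quotes, with the escapes not decoded.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: collects key -> raw quoted body (escapes left as-is).
std::map<std::string, std::string> extract_raw(std::string const& input) {
    // Body alternatives, tried in this order:
    //   ''        a doubled quote
    //   \\.       a backslash followed by any character
    //   [^'\\]    any other character (commas included)
    static std::regex const re(R"re((\w+)\('((?:''|\\.|[^'\\])*)'\))re");

    std::map<std::string, std::string> out;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it)
        out.emplace((*it)[1].str(), (*it)[2].str());
    return out;
}
```

With this pattern, arg2('value '')2') yields the raw body value '')2 instead of being cut off at the first inner quote.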

Related

boost spirit: copy the result in a vector of strings

I want to parse a function (with an arbitrary name and an arbitrary number of arguments) in this form:
function(bye, 1, 3, 4, foo)
The arguments could be generic strings comma separated.
And I want to copy the name of the function and the arguments into a vector of strings,
like this:
std::vector<std::string> F;
std::string fun = "function(bye, 1, 3, 4, foo)";
// The parser must produce this vector from the example
F[0] == "function"
F[1] == "1"
F[2] == "3"
F[3] == "4"
F[4] == "foo"
I've written the following code after reading some tutorials, but it does not work (in the sense that it does not compile).
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <iostream>
#include <string>
namespace client
{
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
///////////////////////////////////////////////////////////////////////////////
template <typename Iterator>
struct command_parser : qi::grammar<Iterator, std::vector<std::string>(), ascii::space_type>
{
command_parser() : command_parser::base_type(start)
{
using qi::int_;
using qi::lit;
using qi::double_;
using qi::lexeme;
using ascii::char_;
fn_name = +qi::char_("a-zA-Z");
string = +qi::char_("a-zA-Z_0-9");
rec = *( lit(",") >> string );
start %= fn_name >> lit("(") >> string >> rec >> lit(")") ;
}
qi::rule<Iterator, std::string(), ascii::space_type> fn_name;
qi::rule<Iterator, std::string(), ascii::space_type> string;
qi::rule<Iterator, std::string(), ascii::space_type> rec;
qi::rule<Iterator, std::vector<std::string>, ascii::space_type> start;
};
}
////////////////////////////////////////////////////////////////////////////
// Main program
////////////////////////////////////////////////////////////////////////////
int
main()
{
namespace qi = boost::spirit::qi;
std::cout << "/////////////////////////////////////////////////////////\n\n";
client::command_parser<std::string::iterator> CP;
std::string cmd("fun(1,2,3,4 , 5, foo) ");
std::vector<std::string> VV;
bool result = qi::parse(cmd.begin(), cmd.end(), CP, VV);
if (result) {
for ( auto sss : VV ){
std::cout << sss << std::endl;
}
} else {
std::cout << "Fail" << std::endl;
}
return 0 ;
}
Just for fun, here's my minimalist take on this grammar:
using CallList = std::vector<std::string>;
struct ParseError : std::runtime_error {
ParseError() : std::runtime_error("ParseError") {}
};
// The parse implementation
CallList parse_function_call(std::string const& fun) {
CallList elements;
using namespace boost::spirit::qi;
using It = decltype(begin(fun));
static const rule<It, std::string()> identifier = alpha >> +(alnum | char_('_'));
if (!phrase_parse(begin(fun), end(fun),
identifier >> '(' >> -(lexeme[+~char_(",)")] % ",") >> ')' >> eoi,
space, elements))
throw ParseError{};
return elements;
}
With a little bit of plumbing
// just for test output
using TestResult = std::variant<CallList, ParseError>;
// exceptions are equivalent
static constexpr bool operator==(ParseError const&, ParseError const&)
{ return true; }
static inline std::ostream& operator<<(std::ostream& os, TestResult const& tr) {
using namespace std;
if (holds_alternative<ParseError>(tr)) {
return os << "ParseError";
} else {
auto& list = get<CallList>(tr);
copy(begin(list), end(list), std::experimental::make_ostream_joiner(os << "{", ","));
return os << "}";
}
}
TestResult try_parse(std::string const& fun) {
try { return parse_function_call(fun); }
catch(ParseError const& e) { return e; }
}
Here's a test runner:
for (auto const& [input, expected]: {
Case("function(bye, 1, 3, 4, foo)", CallList{"function", "1", "3", "4", "foo"}),
{"liar(pants on fire)", CallList{"liar", "pants on fire"}},
{"liar('pants on fire')", CallList{"liar", "'pants on fire'"}},
{"nullary()", CallList{"nullary"}},
{"nullary( )", CallList{"nullary"}},
{"zerolength(a,,b)", ParseError{}},
{"zerolength(a, ,b)", ParseError{}},
{"noarglust", ParseError{}},
{"", ParseError{}},
{"()", ParseError{}},
{"1(invalidfunctionname)", ParseError{}},
{"foo(bar) BOGUS", ParseError{}},
})
{
auto const actual = try_parse(input);
bool const ok = (actual == expected);
cout << std::quoted(input) << ": " << (ok? "PASS":"FAIL") << "\n";
if (!ok) {
std::cout << " -- expected: " << expected << "\n";
std::cout << " -- actual: " << actual << "\n";
}
}
Which prints:
"function(bye, 1, 3, 4, foo)": FAIL
-- expected: {function,1,3,4,foo}
-- actual: {function,bye,1,3,4,foo}
"liar(pants on fire)": PASS
"liar('pants on fire')": PASS
"nullary()": PASS
"nullary( )": PASS
"zerolength(a,,b)": PASS
"zerolength(a, ,b)": PASS
"noarglust": PASS
"": PASS
"()": PASS
"1(invalidfunctionname)": PASS
"foo(bar) BOGUS": PASS
Note that your example test-case doesn't pass, but I think that was a mistake in the test case.
Full Listing
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <experimental/iterator>
#include <variant>
#include <iomanip>
using CallList = std::vector<std::string>;
struct ParseError : std::runtime_error {
ParseError() : std::runtime_error("ParseError") {}
};
// The parse implementation
CallList parse_function_call(std::string const& fun) {
CallList elements;
using namespace boost::spirit::qi;
using It = decltype(begin(fun));
static const rule<It, std::string()> identifier = alpha >> +(alnum | char_('_'));
if (!phrase_parse(begin(fun), end(fun),
identifier >> '(' >> -(lexeme[+~char_(",)")] % ",") >> ')' >> eoi,
space, elements))
throw ParseError{};
return elements;
}
// just for test output
using TestResult = std::variant<CallList, ParseError>;
// exceptions are equivalent
static constexpr bool operator==(ParseError const&, ParseError const&)
{ return true; }
static inline std::ostream& operator<<(std::ostream& os, TestResult const& tr) {
using namespace std;
if (holds_alternative<ParseError>(tr)) {
return os << "ParseError";
} else {
auto& list = get<CallList>(tr);
copy(begin(list), end(list), std::experimental::make_ostream_joiner(os << "{", ","));
return os << "}";
}
}
TestResult try_parse(std::string const& fun) {
try { return parse_function_call(fun); }
catch(ParseError const& e) { return e; }
}
int main() {
using namespace std;
using Case = pair<std::string, TestResult>;
for (auto const& [input, expected]: {
Case("function(bye, 1, 3, 4, foo)", CallList{"function", "1", "3", "4", "foo"}),
{"liar(pants on fire)", CallList{"liar", "pants on fire"}},
{"liar('pants on fire')", CallList{"liar", "'pants on fire'"}},
{"nullary()", CallList{"nullary"}},
{"nullary( )", CallList{"nullary"}},
{"zerolength(a,,b)", ParseError{}},
{"zerolength(a, ,b)", ParseError{}},
{"noarglust", ParseError{}},
{"", ParseError{}},
{"()", ParseError{}},
{"1(invalidfunctionname)", ParseError{}},
{"foo(bar) BOGUS", ParseError{}},
})
{
auto const actual = try_parse(input);
bool const ok = (actual == expected);
cout << std::quoted(input) << ": " << (ok? "PASS":"FAIL") << "\n";
if (!ok) {
std::cout << " -- expected: " << expected << "\n";
std::cout << " -- actual: " << actual << "\n";
}
}
}
I'm correcting my answer per suggestions made by @sehe. All the credit for these corrections goes to him. I am referencing your line numbers below. So the first error is from Spirit and it says:
incompatible_start_rule:
// If you see the assertion below failing then the start rule
// passed to the constructor of the grammar is not compatible with
// the grammar (i.e. it uses different template parameters).
The signature of the start rule does not match that of the grammar declaration:
22. struct command_parser : qi::grammar<Iterator, std::vector<std::string>(), ascii::space_type>
43. qi::rule<Iterator, std::vector<std::string>, ascii::space_type> start;
I googled this and could not find an explanation, but the attribute has to be written as a function-style signature, std::vector<std::string>(), rather than as a bare type. I did it the other way in my first answer. The proper fix is at line 43:
43. qi::rule<Iterator, std::vector<std::string>(), ascii::space_type> start;
The next spirit error is:
The rule was instantiated with a skipper type but you have not pass
any. Did you use parse instead of phrase_parse?");
So a phrase_parse is required with a skipper. Note that we need a skipper to pass along.
64. using qi::ascii::space;
65. bool result = qi::phrase_parse(cmd.begin(), cmd.end(), CP, space, VV);
Now it compiles and the output is:
fun
1
2345foo
I see that won't do: you are looking to fill the vector with each of the passed parameters. So you need a rule that is compatible with your attribute and intention. The Kleene operator working with a std::string will put all the data into one string. So use your attribute:
41. qi::rule<Iterator, std::vector<std::string>(), ascii::space_type> rec;
Now, as @sehe points out, a skipper on fn_name and string lets spaces and newlines be skipped inside a name, so separate names get concatenated. So don't use skippers there.
39. qi::rule<Iterator, std::string()> fn_name;
40. qi::rule<Iterator, std::string()> string;
The other error I made was to see the %= and call it a list operator. From what I have read since, it is a definition operator. I'm not sure why there are two, but from playing around, it seems you need %= when the rule has a semantic action. Here is the corrected code:
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <iostream>
#include <string>
namespace client
{
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template <typename Iterator>
struct command_parser : qi::grammar<Iterator, std::vector<std::string>(), ascii::space_type>
{
command_parser() : command_parser::base_type(start)
{
using qi::int_;
using qi::lit;
using qi::double_;
using qi::lexeme;
using ascii::char_;
fn_name = +qi::char_("a-zA-Z");
string = +qi::char_("a-zA-Z_0-9");
rec = *(lit(",") >> string);
start %= fn_name >> lit("(") >> string >> rec >> lit(")");
}
qi::rule<Iterator, std::string()> fn_name;
qi::rule<Iterator, std::string()> string;
qi::rule<Iterator, std::vector<std::string>(), ascii::space_type> rec;
qi::rule<Iterator, std::vector<std::string>(), ascii::space_type> start;
};
}
int main()
{
namespace qi = boost::spirit::qi;
client::command_parser<std::string::iterator> CP;
std::string cmd("function(1,2,3,4 , 5, foo) ");
std::vector<std::string> VV;
bool result = qi::phrase_parse(cmd.begin(), cmd.end(), CP, qi::ascii::space, VV);
if (result) {
for (auto sss : VV) {
std::cout << sss << std::endl;
}
}
else {
std::cout << "Fail" << std::endl;
}
return 0;
}
And here is an example using X3:
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <vector>
//your attribute, could be more complex, might use namespace
using attr = std::vector<std::string>;
namespace parser {
namespace x3 = boost::spirit::x3;
const auto fn_name = +x3::char_("a-zA-Z");
const auto string = +x3::char_("a-zA-Z_0-9");
const auto start = x3::rule<struct _, attr>() = fn_name >> "(" >> string % ',' >> ")";
}
int main()
{
namespace x3 = boost::spirit::x3;
std::string cmd("fun(1,.2,3,4 , 5, foo) ");
attr VV;
auto it = cmd.begin();
bool result = phrase_parse(it, cmd.end(), parser::start, x3::space, VV);
if (result) {
for (auto sss : VV) {
std::cout << "-> " << sss << std::endl;
}
}
else
std::cout << "Fail at" << std::endl;
return 0;
}

boost spirit parsing with no skipper

Think about a preprocessor which will read the raw text (no significant white space or tokens).
There are 3 rules.
resolve_para_entry should resolve an argument inside a call. The top-level text is returned as a string.
resolve_para should resolve the whole parameter list and put all the top-level parameters in a string list.
resolve is the entry point.
Along the way I track the iterator to get the text portion.
Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seems that the "!lit(',')" won't let the parser step outside ..
Rules:
resolve_para_entry = +(
(iter_pos >> lit('(') >> (resolve_para_entry | eps) >> lit(')') >> iter_pos) [_val= phoenix::bind(&appendString, _val, _1,_3)]
| (!lit(',') >> !lit(')') >> !lit('(') >> (wide::char_ | wide::space)) [_val = phoenix::bind(&appendChar, _val, _1)]
);
resolve_para = (lit('(') >> lit(')'))[_val = std::vector<std::wstring>()] // empty para -> old style
| (lit('(') >> resolve_para_entry >> *(lit(',') >> resolve_para_entry) > lit(')'))[_val = phoenix::bind(&appendStringList, _val, _1, _2)]
| eps;
;
resolve = (iter_pos >> name_valid >> iter_pos >> resolve_para >> iter_pos);
In the end this doesn't seem very elegant. Maybe there is a better way to parse such input without a skipper?
Indeed this should be a lot simpler.
First off, I fail to see why the absence of a skipper is at all relevant.
Second, exposing the raw input is best done using qi::raw[] instead of dancing with iter_pos and clumsy semantic actions¹.
Among the other observations I see:
- negating a charset is done with ~, so e.g. ~char_(",()")
- (p|eps) would be better spelled -p
- (lit('(') >> lit(')')) could be just "()" (after all, there's no skipper, right?)
- p >> *(',' >> p) is equivalent to p % ','
With the above, resolve_para simplifies to this:
resolve_para = '(' >> -(resolve_para_entry % ',') >> ')';
resolve_para_entry seems weird, to me. It appears that any nested parentheses are simply swallowed. Why not actually parse a recursive grammar so you detect syntax errors?
Here's my take on it:
Define An AST
I prefer to make this the first step because it helps me think about the parser productions:
namespace Ast {
using ArgList = std::list<std::string>;
struct Resolve {
std::string name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
Creating The Grammar Rules
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, std::string()> arg, identifier;
And their definitions:
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
Notes:
No more semantic actions
No more eps
No more iter_pos
I've opted to make arglist not-optional. If you really wanted that, change it back:
resolve = identifier >> -arglist;
But in our sample it will generate a lot of noisy output.
Of course your entry point (start) will be different. I just did the simplest thing that could possibly work, using another handy parser directive from the Spirit Repository (like iter_pos that you were already using): seek[]
The hold is there for this reason: boost::spirit::qi duplicate parsing on the output - You might not need it in your actual parser.
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
namespace Ast {
using ArgList = std::list<std::string>;
struct Resolve {
std::string name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
Parser() : Parser::base_type(start) {
using namespace qi;
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
}
private:
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, std::string()> arg, identifier;
};
#include <iostream>
int main() {
using It = std::string::const_iterator;
std::string const samples = R"--(
Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
It f = samples.begin(), l = samples.end();
Ast::Resolves data;
if (parse(f, l, Parser<It>{}, data)) {
std::cout << "Parsed " << data.size() << " resolves\n";
} else {
std::cout << "Parsing failed\n";
}
for (auto& resolve: data) {
std::cout << " - " << resolve.name << "\n (\n";
for (auto& arg : resolve.arglist) {
std::cout << " " << arg << "\n";
}
std::cout << " )\n";
}
}
Prints
Parsed 6 resolves
- sometext
(
para
)
- sometext
(
para1
para2
)
- sometext
(
call(a)
)
- call
(
a
)
- call
(
a
b
)
- lit
(
'
'
)
More Ideas
That last output shows you a problem with your current grammar: lit(',') should obviously not be seen as a call with two parameters.
I recently did an answer on extracting (nested) function calls with parameters which does things more neatly:
Boost spirit parse rule is not applied
or this one boost spirit reporting semantic error
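For reference, the result the question expects for sometext(call(a,b)), a single argument call(a,b), only requires tracking the nesting depth while splitting at commas. A minimal hand-rolled sketch, without Spirit, assuming the caller has already isolated the text between the outermost parentheses; split_top_level is a hypothetical name:

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: splits an argument list at commas that are not nested
// inside further parentheses. Throws std::invalid_argument when the
// parentheses are unbalanced.
std::vector<std::string> split_top_level(std::string const& args) {
    std::vector<std::string> out;
    if (args.empty())
        return out;                       // "()" has no arguments

    std::string current;
    int depth = 0;
    for (char c : args) {
        if (c == '(') {
            ++depth;
        } else if (c == ')') {
            if (--depth < 0)
                throw std::invalid_argument("unbalanced ')'");
        }
        if (c == ',' && depth == 0) {
            out.push_back(current);       // a top-level comma ends an argument
            current.clear();
        } else {
            current += c;                 // nested commas stay part of the argument
        }
    }
    if (depth != 0)
        throw std::invalid_argument("unbalanced '('");
    out.push_back(current);
    return out;
}
```

Called with "call(a,b)" it returns one element, and with "para1,para2" it returns two. It does not validate identifiers or quoting, which is where a real grammar earns its keep.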
BONUS
Bonus version that uses string_view and also shows exact line/column information of all extracted words.
Note that it still doesn't require any phoenix or semantic actions. Instead it simply defines the necessary trait to assign to boost::string_view from an iterator range.
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
namespace Ast {
using Source = boost::string_view;
using ArgList = std::list<Source>;
struct Resolve {
Source name;
ArgList arglist;
};
using Resolves = std::vector<Resolve>;
}
BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)
namespace boost { namespace spirit { namespace traits {
template <typename It>
struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
static void call(It f, It l, boost::string_view& attr) {
attr = boost::string_view { f.base(), size_t(std::distance(f.base(),l.base())) };
}
};
} } }
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
Parser() : Parser::base_type(start) {
using namespace qi;
identifier = raw [ char_("a-zA-Z_") >> *char_("a-zA-Z0-9_") ];
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
}
private:
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, Ast::Source()> arg, identifier;
};
#include <iostream>
struct Annotator {
using Ref = boost::string_view;
struct Manip {
Ref fragment, context;
friend std::ostream& operator<<(std::ostream& os, Manip const& m) {
return os << "[" << m.fragment << " at line:" << m.line() << " col:" << m.column() << "]";
}
size_t line() const {
return 1 + std::count(context.begin(), fragment.begin(), '\n');
}
size_t column() const {
return 1 + (fragment.begin() - start_of_line().begin());
}
Ref start_of_line() const {
return context.substr(context.substr(0, fragment.begin()-context.begin()).find_last_of('\n') + 1);
}
};
Ref context;
Manip operator()(Ref what) const { return {what, context}; }
};
int main() {
using It = std::string::const_iterator;
std::string const samples = R"--(Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
It f = samples.begin(), l = samples.end();
Ast::Resolves data;
if (parse(f, l, Parser<It>{}, data)) {
std::cout << "Parsed " << data.size() << " resolves\n";
} else {
std::cout << "Parsing failed\n";
}
Annotator annotate{samples};
for (auto& resolve: data) {
std::cout << " - " << annotate(resolve.name) << "\n (\n";
for (auto& arg : resolve.arglist) {
std::cout << " " << annotate(arg) << "\n";
}
std::cout << " )\n";
}
}
Prints
Parsed 6 resolves
- [sometext at line:3 col:1]
(
[para at line:3 col:10]
)
- [sometext at line:4 col:1]
(
[para1 at line:4 col:10]
[para2 at line:4 col:16]
)
- [sometext at line:5 col:1]
(
[call(a) at line:5 col:10]
)
- [call at line:5 col:34]
(
[a at line:5 col:39]
)
- [call at line:6 col:10]
(
[a at line:6 col:15]
[b at line:6 col:17]
)
- [lit at line:6 col:62]
(
[' at line:6 col:66]
[' at line:6 col:68]
)
¹ Boost Spirit: "Semantic actions are evil"?

parse std::vector<int> from comma separated integers

I'm trying to implement a very specific grammar, which requires me at a certain point to parse a list of comma separated integers. The qi rule looks like the following:
qi::rule<Iterator, ascii::space_type> ident;
qi::rule<Iterator, ascii::space_type> nlist;
...
ident = char_ >> nlist;
nlist = ("(" >> int_ % "," >> ")");
...
I need to pass the values up to the ident rule (The expression ident has to create a syntax tree node, where the parsed values from nlist are required for the constructor). I thought about creating and filling a std::vector and use the semantic action like _val = vector<int>.... What is now unclear to me is how do I create a vector of arbitrary length from this rule, since I do not make any assumptions on how long the input will be or using a predefined vector like the examples.
Is this even possible, or is there a better way to do it?
This is the bread and butter of Spirit Qi.
Just use any compatible attribute type and profit:
using nlist_t = std::vector<int>;
using ident_t = std::pair<char, nlist_t>;
qi::rule<Iterator, ident_t(), qi::ascii::space_type> ident;
qi::rule<Iterator, nlist_t(), qi::ascii::space_type> nlist;
Note: For std::pair attribute compatibility, include the relevant fusion header:
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main()
{
using nlist_t = std::vector<int>;
using ident_t = std::pair<char, nlist_t>;
using Iterator = std::string::const_iterator;
qi::rule<Iterator, ident_t(), qi::ascii::space_type> ident;
qi::rule<Iterator, nlist_t(), qi::ascii::space_type> nlist;
ident = qi::char_ >> nlist;
nlist = '(' >> qi::int_ % ',' >> ')';
for (std::string const input : { "a (1,2,3)", "+(881,-2,42) \n", "?(0)" }) {
ident_t data;
if (qi::phrase_parse(input.begin(), input.end(), ident, qi::ascii::space, data)) {
std::cout << "Parsed: " << data.first << "(";
for (auto i : data.second) std::cout << i << ",";
std::cout << ")\n";
} else
std::cout << "Parse failed: '" << input << "'\n";
}
}
Prints
Parsed: a(1,2,3,)
Parsed: +(881,-2,42,)
Parsed: ?(0,)
BONUS
Version with imagined Ast type using phoenix::construct:
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
namespace OoShinyAst {
using MyName = char;
using MyArgument = int;
using MyArgumentList = std::vector<MyArgument>;
struct MyIdent {
MyName name;
MyArgumentList args;
MyIdent() = default;
MyIdent(MyName name, MyArgumentList args)
: name(std::move(name)), args(std::move(args)) { }
};
}
int main()
{
using Iterator = std::string::const_iterator;
qi::rule<Iterator, OoShinyAst::MyIdent(), qi::ascii::space_type> ident;
qi::rule<Iterator, OoShinyAst::MyArgumentList(), qi::ascii::space_type> nlist;
nlist = '(' >> qi::int_ % ',' >> ')';
ident = (qi::char_ >> nlist) [ qi::_val = px::construct<OoShinyAst::MyIdent>(qi::_1, qi::_2) ];
for (std::string const input : { "a (1,2,3)", "+(881,-2,42) \n", "?(0)" }) {
OoShinyAst::MyIdent data;
if (qi::phrase_parse(input.begin(), input.end(), ident, qi::ascii::space, data)) {
std::cout << "Parsed: " << data.name << "(";
for (auto i : data.args) std::cout << i << ",";
std::cout << ")\n";
} else
std::cout << "Parse failed: '" << input << "'\n";
}
}

Treat escaped newline as line continuation

Here is an example of the syntax -- two groups of items:
I_name m_name parameter1=value parameter2=value
I_name m_name parameter1=value \
parameter2=value
My question is how to define the skip-type.
It is not just space_type but space_type minus newline.
But newline followed by backslash is a skip-type.
E.g., I define name like this:
qi::rule<Iterator, std::string(), ascii::space_type> m_sName;
m_sName %= qi::lexeme[ascii::alpha >> *ascii::alnum];
This is obviously not correct, as the space_type must include newline-backslash.
The following grammar works for me.
*("\\\n" | ~qi::char_('\n')) % '\n'
It will ignore any newline after the backslash. And the following is a simple test.
#include <vector>
#include <string>
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#define BOOST_TEST_MODULE example
#include <boost/test/unit_test.hpp>
typedef std::vector<std::string> Lines;
inline auto ParseLines(std::string const& str) {
Lines lines;
namespace qi = boost::spirit::qi;
if (qi::parse(
str.begin(), str.end(),
*("\\\n" | ~qi::char_('\n')) % '\n',
lines)) {
return lines;
}
else {
throw std::invalid_argument("Parse error at ParseLines");
}
}
BOOST_AUTO_TEST_CASE(TestParseLines) {
std::string const str =
"I_name m_name parameter1=value parameter2=value\n"
"I_name m_name parameter1 = value \\\n"
"parameter2 = value";
Lines const expected{
"I_name m_name parameter1=value parameter2=value",
"I_name m_name parameter1 = value parameter2 = value"
};
BOOST_TEST(ParseLines(str) == expected);
}
Compile with "-std=c++14 -lboost_unit_test_framework". The code is also easy to convert to C++03.
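The effect of the grammar can also be sketched without Spirit, which may help when checking what it is expected to produce. This hand-written version assumes the same convention (a backslash directly followed by a newline is dropped, any other newline ends the line); split_logical_lines is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits text into logical lines, where a backslash
// immediately followed by '\n' continues the line instead of ending it.
std::vector<std::string> split_logical_lines(std::string const& text) {
    std::vector<std::string> lines(1);    // start with one (possibly empty) line
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\\' && i + 1 < text.size() && text[i + 1] == '\n') {
            ++i;                          // drop both the backslash and the newline
        } else if (text[i] == '\n') {
            lines.emplace_back();         // a plain newline starts a new logical line
        } else {
            lines.back() += text[i];
        }
    }
    return lines;
}
```

Applied to the test input above, it produces the same two logical lines as the expected vector in the unit test.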
qi::blank is exactly that. It's qi::space without newlines.
You can do this too: ("\\\n" | qi::blank)
To be able to declare a rule with such a skipper, define a skipper grammar:
template <typename It>
struct my_skipper : qi::grammar<It> {
my_skipper() : my_skipper::base_type(start) {}
qi::rule<It> start = ("\\\n" | qi::blank);
};
Full Demo
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapted.hpp>
#include <iostream>
#include <map>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
namespace ast {
    struct record {
        std::string iname, mname;
        std::map<std::string, std::string> params;
    };
    using records = std::vector<record>;
}
BOOST_FUSION_ADAPT_STRUCT(ast::record, iname, mname, params)
template <typename It>
struct my_parser : qi::grammar<It, ast::records()> {
    using Skipper = qi::rule<It>;
    my_parser() : my_parser::base_type(start) {
        skipper = ("\\\n" | qi::blank);
        name    = +qi::graph;
        key     = +(qi::graph - '=');
        param   = key >> '=' >> name;
        record  = name >> name >> *param;
        records = *(record >> +qi::eol);
        start   = qi::skip(qi::copy(skipper)) [ records ];
    }
  private:
    Skipper skipper;
    qi::rule<It, ast::records(), Skipper> records;
    qi::rule<It, ast::record(), Skipper> record;
    qi::rule<It, ast::records()> start;
    // param takes the skipper so that blanks around '=' are ignored
    qi::rule<It, std::pair<std::string, std::string>(), Skipper> param;
    // name and key have no skipper: they are implicitly lexemes
    qi::rule<It, std::string()> name, key;
};
int main() {
#if 1
    using It = boost::spirit::istream_iterator;
    It f(std::cin >> std::noskipws), l;
#else
    using It = std::string::const_iterator;
    std::string const input = "something here a=1\n";
    It f = input.begin(), l = input.end();
#endif
    ast::records data;
    bool ok = qi::parse(f, l, my_parser<It>(), data);
    if (ok) {
        std::cout << "Parsed:\n";
        for (auto& r : data) {
            std::cout << "\t" << r.iname << " " << r.mname;
            for (auto& p : r.params)
                std::cout << " [" << p.first << ": " << p.second << "]";
            std::cout << "\n";
        }
    } else {
        std::cout << "Parse failed\n";
    }
    if (f!=l)
        std::cout << "Remaining input: '" << std::string(f,l) << "'\n";
}
Prints (for the input in your question):
Parsed:
I_name m_name [parameter1: value] [parameter2: value]
I_name m_name [parameter1: value] [parameter2: value]

Parsing recursive structure on boost::spirit

I want to parse a structure like "text { < > }". The Spirit documentation contains a similar AST example.
For parsing a string like this
<tag1>text1<tag2>text2</tag1></tag2>
this code works:
templ = (tree | text) [_val = _1];
start_tag = '<'
    >> !lit('/')
    >> lexeme[+(char_- '>') [_val += _1]]
    >> '>';
end_tag = "</"
    >> string(_r1)
    >> '>';
tree = start_tag [at_c<1>(_val) = _1]
    >> *templ [push_back(at_c<0>(_val), _1) ]
    >> end_tag(at_c<1>(_val) )
    ;
For parsing a string like this
<tag<tag>some_text>
this code does not work:
templ = (tree | text) [_val = _1];
tree = '<'
    >> *templ [push_back(at_c<0>(_val), _1) ]
    >> '>'
    ;
templ parses into a structure that contains a recursive_wrapper:
namespace client {
    struct tmp;
    typedef boost::variant <
        boost::recursive_wrapper<tmp>,
        std::string
    > tmp_node;
    struct tmp {
        std::vector<tmp_node> content;
        std::string text;
    };
}
BOOST_FUSION_ADAPT_STRUCT(
    client::tmp,
    (std::vector<client::tmp_node>, content)
    (std::string, text)
)
Can anyone explain why this happens? Or does anyone know of similar parsers written with boost::spirit?
Just guessing you didn't actually want to parse XML at all, but rather some kind of mixed-content markup language for hierarchical text, I'd do
simple = +~qi::char_("><");
nested = '<' >> *soup >> '>';
soup = nested|simple;
With the AST/rules defined as
typedef boost::make_recursive_variant<
    boost::variant<std::string, std::vector<boost::recursive_variant_> >
>::type tag_soup;
qi::rule<It, std::string()> simple;
qi::rule<It, std::vector<tag_soup>()> nested;
qi::rule<It, tag_soup()> soup;
See it Live On Coliru:
//// #define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/variant/recursive_variant.hpp>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
namespace client
{
    typedef boost::make_recursive_variant<
        boost::variant<std::string, std::vector<boost::recursive_variant_> >
    >::type tag_soup;
    namespace qi = boost::spirit::qi;
    template <typename It>
    struct parser : qi::grammar<It, tag_soup()>
    {
        parser() : parser::base_type(soup)
        {
            simple = +~qi::char_("><");        // text: anything but angle brackets
            nested = '<' >> *soup >> '>';      // a bracketed list of soup
            soup   = nested|simple;
            BOOST_SPIRIT_DEBUG_NODES((simple)(nested)(soup))
        }
      private:
        qi::rule<It, std::string()> simple;
        qi::rule<It, std::vector<tag_soup>()> nested;
        qi::rule<It, tag_soup()> soup;
    };
}
namespace boost { // leverage ADL on variant<>
    static std::ostream& operator<<(std::ostream& os, std::vector<client::tag_soup> const& soup)
    {
        os << "<";
        std::copy(soup.begin(), soup.end(), std::ostream_iterator<client::tag_soup>(os));
        return os << ">";
    }
}
int main(int argc, char **argv)
{
    if (argc < 2) {
        std::cerr << "Error: No input file provided.\n";
        return 1;
    }
    std::ifstream in(argv[1]);
    std::string const storage(std::istreambuf_iterator<char>(in), {}); // We will read the contents here.
    if (!(in || in.eof())) {
        std::cerr << "Error: Could not read from input file\n";
        return 1;
    }
    static const client::parser<std::string::const_iterator> p;
    client::tag_soup ast; // Our tree
    bool ok = parse(storage.begin(), storage.end(), p, ast);
    if (ok) std::cout << "Parsing succeeded\nData: " << ast << "\n";
    else std::cout << "Parsing failed\n";
    return ok? 0 : 1;
}
If you define BOOST_SPIRIT_DEBUG you'll get verbose output of the parsing process.
For the input
<some text with nested <tags <etc...> >more text>
prints
Parsing succeeded
Data: <some text with nested <tags <etc...> >more text>
Note that the output is printed from the variant, not the original text.
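If the recursion in those three rules is hard to picture, here is a hand-written recursive-descent sketch of the same grammar using only the standard library. It is not what Spirit generates; Node, ParseSoup, ParseNested, ParseSimple and ToString are all names invented for this illustration, and the AST is a simplified struct rather than the recursive variant used above.

```cpp
#include <string>
#include <vector>

// Hypothetical AST: a node is either a piece of text or a list of children.
struct Node {
    bool is_text = false;
    std::string text;           // meaningful when is_text is true
    std::vector<Node> children; // meaningful when is_text is false
};

bool ParseSoup(std::string const& s, std::size_t& pos, Node& out);

// simple = +~char_("><")
bool ParseSimple(std::string const& s, std::size_t& pos, Node& out) {
    std::size_t p = pos;
    while (p < s.size() && s[p] != '<' && s[p] != '>') ++p;
    if (p == pos) return false; // needs at least one character
    out = Node();
    out.is_text = true;
    out.text = s.substr(pos, p - pos);
    pos = p;
    return true;
}

// nested = '<' >> *soup >> '>'
bool ParseNested(std::string const& s, std::size_t& pos, Node& out) {
    if (pos >= s.size() || s[pos] != '<') return false;
    std::size_t p = pos + 1;
    Node node, child;
    while (ParseSoup(s, p, child)) node.children.push_back(child);
    if (p >= s.size() || s[p] != '>') return false; // unbalanced, pos untouched
    pos = p + 1;
    out = node;
    return true;
}

// soup = nested | simple
bool ParseSoup(std::string const& s, std::size_t& pos, Node& out) {
    return ParseNested(s, pos, out) || ParseSimple(s, pos, out);
}

// Rebuilds the text from the tree, like the operator<< overload does.
std::string ToString(Node const& n) {
    if (n.is_text) return n.text;
    std::string r = "<";
    for (Node const& c : n.children) r += ToString(c);
    return r + ">";
}
```

Calling ParseSoup on the sample input with pos starting at 0 and then printing ToString of the result should round-trip the text, which mirrors the point that the output comes from the tree rather than the original string.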