Parsing file input with spirit - c++

I played around with boost::spirit recently and wanted to use it to parse file input. Here is what I have so far. First, some semantic actions:
ifstream data("herpderp", ios::in);
std::string line;
auto pri = [&](auto &ctx){cout << "got this" << endl;};
auto bri = [&](auto &ctx){cout << "got that" << endl;};
and the actual reading happens like this:
while(getline(data, line, '\n'))
{
bool r = phrase_parse(line.begin(), line.end(), (int_ >> char_ >> int_ >> double_)[pri] | (int_ >> char_ >> int_)[bri], space);
}
However, the problem is that I have no idea how to access the contents of _attr(ctx) inside the lambdas pri and bri. I know they fire as intended, depending on the contents of the file, because the cout prints alternate. The attributes are compound types, though, as one can tell from the parsing rules. If anyone can shed some light on this, I'd be grateful.
Edit:
Got this to work the way I wanted it to. It required one more include:
#include <boost/mpl/int.hpp>
And each of the lambdas looks like this:
auto bri = [&](auto &ctx)
{
int firstIntFromMatch = at<boost::mpl::int_<0>>(_attr(ctx));
char charFromMatch = at<boost::mpl::int_<1>>(_attr(ctx));
int secondIntFromMatch = at<boost::mpl::int_<2>>(_attr(ctx));
doSomething(firstIntFromMatch, charFromMatch, secondIntFromMatch);
};
auto pri = [&](auto &ctx)
{
int firstIntFromMatch = at<boost::mpl::int_<0>>(_attr(ctx));
char charFromMatch = at<boost::mpl::int_<1>>(_attr(ctx));
int secondIntFromMatch = at<boost::mpl::int_<2>>(_attr(ctx));
double doubleFromMatch = at<boost::mpl::int_<3>>(_attr(ctx));
doSomething(firstIntFromMatch, charFromMatch, secondIntFromMatch);
doSomethingElse(doubleFromMatch);
};

I'm with @lakeweb; see also http://stackoverflow.com/questions/8259440/boost-spirit-semantic-actions-are-evil
However to answer your specific question: the attributes are fusion sequences. Including fusion/include/io.hpp enables you to just print them:
auto pri = [&](auto &ctx){std::cout << "got this: " << _attr(ctx) << std::endl;};
auto bri = [&](auto &ctx){std::cout << "got that: " << _attr(ctx) << std::endl;};
Prints
got this: (321 a 321 3.14)
Parsed
got that: (432 b 432)
Parsed
Doing Useful Stuff
Doing useful stuff is always more exciting. You could manually take apart these fusion sequences. Defining the simplest data struct I can think of to receive our data:
struct MyData {
int a = 0;
char b = 0;
int c = 0;
double d = 0;
friend std::ostream& operator<<(std::ostream& os, MyData const& md) {
return os << "MyData{" << md.a << "," << md.b << "," << md.c << "," << md.d << "}";
}
};
Now, we can "enhance" (read: complicate) stuff to parse into it:
auto pri = [&](auto &ctx) {
auto& attr = _attr(ctx);
std::cout << "got this: " << attr << std::endl;
using boost::fusion::at_c;
_val(ctx) = { at_c<0>(attr), at_c<1>(attr), at_c<2>(attr), at_c<3>(attr) };
};
auto bri = [&](auto &ctx)
{
auto& attr = _attr(ctx);
std::cout << "got that: " << attr << std::endl;
using boost::fusion::at_c;
_val(ctx) = { at_c<0>(attr), at_c<1>(attr), at_c<2>(attr), std::numeric_limits<double>::infinity()};
};
auto const pri_rule = x3::rule<struct _pri, MyData> {"pri_rule"} =
(x3::int_ >> x3::char_ >> x3::int_ >> x3::double_)[pri];
auto const bri_rule = x3::rule<struct _bri, MyData> {"bri_rule"} =
(x3::int_ >> x3::char_ >> x3::int_)[bri];
And yes, this "works":
for(std::string const line : {
"321 a 321 3.14",
"432 b 432"
})
{
MyData data;
bool r = x3::phrase_parse(
line.begin(), line.end(),
pri_rule | bri_rule,
x3::space,
data);
if (r)
std::cout << "Parsed " << data << "\n";
else
std::cout << "Failed\n";
}
Prints
got this: (321 a 321 3.14)
Parsed MyData{321,a,321,3.14}
got that: (432 b 432)
Parsed MyData{432,b,432,inf}
However this seems horribly complicated.
SIMPLIFY!!!
It seems you merely have an optional trailing double_. With a little bit of help:
BOOST_FUSION_ADAPT_STRUCT(MyData, a,b,c,d);
You can have the same effect without any of the mess:
bool r = x3::phrase_parse(
line.begin(), line.end(),
x3::int_ >> x3::char_ >> x3::int_ >> (x3::double_ | x3::attr(9999)),
x3::space, data);
Which would print:
Parsed MyData{321,a,321,3.14}
Parsed MyData{432,b,432,9999}
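For comparison, the "trailing field with a default" idea does not depend on Spirit. Below is a minimal standard-library sketch of the same behaviour using std::istringstream. The names Row and parse_row are made up for this illustration, and the 9999 default simply mirrors the x3::attr(9999) above.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for MyData; not part of the answer's code.
struct Row {
    int a = 0;
    char b = 0;
    int c = 0;
    double d = 0;
    bool ok = false; // true when the three mandatory fields were read
};

// Reads "int char int [double]"; a missing trailing double becomes `fallback`.
Row parse_row(const std::string& line, double fallback = 9999) {
    Row row;
    std::istringstream in(line);
    if (in >> row.a >> row.b >> row.c) {
        row.ok = true;
        if (!(in >> row.d))
            row.d = fallback; // plays the role of x3::attr(9999)
    }
    return row;
}
```

Unlike the X3 version, this sketch does not report trailing garbage: anything after the second int that is not a number is silently treated as "no double present".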
Optional: Optionality
If you don't have a valid default for the double you could make it an optional:
x3::int_ >> x3::char_ >> x3::int_ >> -x3::double_,
And could still parse it:
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted.hpp>
#include <boost/optional/optional_io.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
struct MyData {
int a = 0;
char b = 0;
int c = 0;
boost::optional<double> d;
friend std::ostream& operator<<(std::ostream& os, MyData const& md) {
return os << "MyData{" << md.a << "," << md.b << "," << md.c << "," << md.d << "}";
}
};
BOOST_FUSION_ADAPT_STRUCT(MyData, a,b,c,d)
int main() {
for(std::string const line : { "321 a 321 3.14", "432 b 432" }) {
MyData data;
bool r = x3::phrase_parse(
line.begin(), line.end(),
x3::int_ >> x3::char_ >> x3::int_ >> -x3::double_,
x3::space, data);
if (r)
std::cout << "Parsed " << data << "\n";
else
std::cout << "Failed\n";
}
}
Prints:
Parsed MyData{321,a,321, 3.14}
Parsed MyData{432,b,432,--}

Related

Boost X3: Can a variant member be avoided in disjunctions?

I'd like to parse string | (string, int) and store it in a structure that defaults the int component to some value. The attribute of such a construction in X3 is a variant<string, tuple<string, int>>. I was thinking I could have a struct that takes either a string or a (string, int) to automagically be populated:
struct bar
{
bar (std::string x = "", int y = 0) : baz1 {x}, baz2 {y} {}
std::string baz1;
int baz2;
};
BOOST_FUSION_ADAPT_STRUCT (disj::ast::bar, baz1, baz2)
and then simply have:
const x3::rule<class bar, ast::bar> bar = "bar";
using x3::int_;
using x3::ascii::alnum;
auto const bar_def = (+(alnum) | ('(' >> +(alnum) >> ',' >> int_ >> ')')) >> ';';
BOOST_SPIRIT_DEFINE(bar);
However this does not work:
/usr/include/boost/spirit/home/x3/core/detail/parse_into_container.hpp:139:59: error: static assertion failed: Expecting a single element fusion sequence
139 | static_assert(traits::has_size<Attribute, 1>::value,
Setting baz2 to an optional does not help. One way to solve this is to have a variant field or inherit from that type:
struct string_int {
std::string s;
int i;
};
struct foo {
boost::variant<std::string, string_int> var;
};
BOOST_FUSION_ADAPT_STRUCT (disj::ast::string_int, s, i)
BOOST_FUSION_ADAPT_STRUCT (disj::ast::foo, var)
(For some reason, I have to use boost::variant instead of x3::variant for operator<< to work; also, using std::pair or tuple for string_int does not work, but boost::fusion::deque does.) One can then equip foo somehow to get the string and integer.
Question: What is the proper, clean way to do this in X3? Is there a more natural way than this second option and equipping foo with accessors?
Sadly the wording in the x3 section is exceedingly sparse and allows it (contrast the Qi section). A quick test confirms it:
#include <boost/core/demangle.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <typeinfo>
namespace x3 = boost::spirit::x3;
template <typename Expr>
std::string inspect(Expr const& expr) {
using A = typename x3::traits::attribute_of<Expr, x3::unused_type>::type;
return boost::core::demangle(typeid(A).name());
}
int main()
{
std::cout << inspect(x3::double_ | x3::int_) << "\n"; // variant expected
std::cout << inspect(x3::int_ | "bla" >> x3::int_) << "\n"; // variant "understandable"
std::cout << inspect(x3::int_ | x3::int_) << "\n"; // variant surprising
}
Prints
boost::variant<double, int>
boost::variant<int, int>
boost::variant<int, int>
All Hope Is Not Lost
In your specific case you could trick the system:
auto const bar_def = //
(+x3::alnum >> x3::attr(-1) //
| '(' >> +x3::alnum >> ',' >> x3::int_ >> ')' //
) >> ';';
Note how we "inject" an int value for the first branch. That satisfies the attribute propagation gods:
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <iomanip>
namespace x3 = boost::spirit::x3;
namespace disj::ast {
struct bar {
std::string x;
int y;
};
using boost::fusion::operator<<;
} // namespace disj::ast
BOOST_FUSION_ADAPT_STRUCT(disj::ast::bar, x, y)
namespace disj::parser {
const x3::rule<class bar, ast::bar> bar = "bar";
auto const bar_def = //
(+x3::alnum >> x3::attr(-1) //
| '(' >> +x3::alnum >> ',' >> x3::int_ >> ')' //
) >> ';';
BOOST_SPIRIT_DEFINE(bar)
}
namespace disj {
void run_tests() {
for (std::string const input : {
"",
";",
"bla;",
"bla, 42;",
"(bla, 42);",
}) {
ast::bar val;
auto f = begin(input), l = end(input);
std::cout << "\n" << quoted(input) << " -> ";
if (phrase_parse(f, l, parser::bar, x3::space, val)) {
std::cout << "Parsed: " << val << "\n";
} else {
std::cout << "Failed\n";
}
if (f!=l) {
std::cout << " -- Remaining " << quoted(std::string_view(f, l)) << "\n";
}
}
}
}
int main()
{
disj::run_tests();
}
Prints
"" -> Failed
";" -> Failed
-- Remaining ";"
"bla;" -> Parsed: (bla -1)
"bla, 42;" -> Failed
-- Remaining "bla, 42;"
"(bla, 42);" -> Parsed: (bla 42)
¹ just today
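If you would rather keep the variant-shaped attribute from the question, the "equip foo with accessors" idea can be sketched with the standard library alone. The following is only an illustration using std::variant and std::visit (C++17), not x3::variant or boost::variant. string_int mirrors the question, while bar_view, flatten and the default of -1 are made up for this example.

```cpp
#include <string>
#include <type_traits>
#include <variant>

struct string_int {
    std::string s;
    int i;
};

// Hypothetical flat result type for this sketch.
struct bar_view {
    std::string name;
    int value;
};

// Collapses either alternative into one flat struct;
// a bare string receives `fallback` as its integer.
bar_view flatten(const std::variant<std::string, string_int>& v, int fallback = -1) {
    return std::visit(
        [fallback](const auto& alt) -> bar_view {
            using T = std::decay_t<decltype(alt)>;
            if constexpr (std::is_same_v<T, std::string>)
                return {alt, fallback};
            else
                return {alt.s, alt.i};
        },
        v);
}
```

Compared to the x3::attr(-1) trick this needs a second pass over the parsed value, but the grammar keeps its natural variant attribute.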

Tokenize a "Braced Initializer List"-Style String in C++ (With Boost?)

I have strings (possibly nested) that are formatted like a C++ braced initializer list. I want to tokenize them one level at a time into a vector of strings.
So when I input "{one, two, three}", the function should output a three-element vector:
"one",
"two",
"three"
To complicate this, it needs to support quoted tokens and preserve nested lists:
Input String: "{one, {2, \"three four\"}, \"five, six\", {\"seven, eight\"}}"
Output is a four element vector:
"one",
"{2, \"three four\"}",
"five, six",
"{\"seven, eight\"}"
I've looked at a few other SO posts:
Using Boost Tokenizer escaped_list_separator with different parameters
Boost split not traversing inside of parenthesis or braces
And used those to start a solution, but this seems slightly too complicated for the tokenizer (because of the braces):
#include <boost/algorithm/string.hpp>
#include <boost/tokenizer.hpp>
std::vector<std::string> TokenizeBracedList(const std::string& x)
{
std::vector<std::string> tokens;
std::string separator1("");
std::string separator2(",\n\t\r");
std::string separator3("\"\'");
boost::escaped_list_separator<char> elements(separator1, separator2, separator3);
boost::tokenizer<boost::escaped_list_separator<char>> tokenizer(x, elements);
for(auto i = std::begin(tokenizer); i != std::end(tokenizer); ++i)
{
auto token = *i;
boost::algorithm::trim(token);
tokens.push_back(token);
}
return tokens;
}
With this, even in the trivial case, it doesn't strip the opening and closing braces.
Boost and C++17 are fair game for a solution.
Simple (Flat) Take
Defining a flat data structure like:
using token = std::string;
using tokens = std::vector<token>;
We can define an X3 parser like:
namespace Parser {
using namespace boost::spirit::x3;
rule<struct list_, token> item;
auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
auto bare = lexeme [ +(graph-','-'}') ];
auto list = '{' >> (item % ',') >> '}';
auto sublist = raw [ list ];
auto item_def = sublist | quoted | bare;
BOOST_SPIRIT_DEFINE(item)
}
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
using token = std::string;
using tokens = std::vector<token>;
namespace x3 = boost::spirit::x3;
namespace Parser {
using namespace boost::spirit::x3;
rule<struct list_, token> item;
auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
auto bare = lexeme [ +(graph-','-'}') ];
auto list = '{' >> (item % ',') >> '}';
auto sublist = raw [ list ];
auto item_def = sublist | quoted | bare;
BOOST_SPIRIT_DEFINE(item)
}
int main() {
for (std::string const input : {
R"({one, "five, six"})",
R"({one, {2, "three four"}, "five, six", {"seven, eight"}})",
})
{
auto f = input.begin(), l = input.end();
std::vector<std::string> parsed;
bool ok = phrase_parse(f, l, Parser::list, x3::space, parsed);
if (ok) {
std::cout << "Parsed: " << parsed.size() << " elements\n";
for (auto& el : parsed) {
std::cout << " - " << std::quoted(el, '\'') << "\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: " << std::quoted(std::string{f, l}) << "\n";
}
}
Prints
Parsed: 2 elements
- 'one'
- 'five, six'
Parsed: 4 elements
- 'one'
- '{2, "three four"}'
- 'five, six'
- '{"seven, eight"}'
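For readers who cannot pull in Spirit, the flat take can also be approximated by hand. The sketch below is a plain standard-library scanner, not a translation of the X3 grammar: tokenize_braced_list is a made-up name, bare items may contain inner spaces (the grammar above does not allow that), and anything after the closing brace is ignored.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Splits ONE level of a brace-delimited list. Quoted items lose their quotes,
// nested lists are returned verbatim, bare items are trimmed.
std::vector<std::string> tokenize_braced_list(const std::string& in) {
    const std::size_t n = in.size();
    std::size_t i = 0;
    auto is_space = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    auto skip_ws = [&] { while (i < n && is_space(in[i])) ++i; };

    skip_ws();
    if (i == n || in[i] != '{') throw std::invalid_argument("expected '{'");
    ++i;

    std::vector<std::string> out;
    for (;;) {
        skip_ws();
        if (i == n) throw std::invalid_argument("unterminated list");
        std::string tok;
        if (in[i] == '"') {
            // Quoted item: drop the quotes, honour backslash escapes.
            ++i;
            while (i < n && in[i] != '"') {
                if (in[i] == '\\' && i + 1 < n) ++i;
                tok += in[i++];
            }
            if (i == n) throw std::invalid_argument("unterminated quote");
            ++i; // closing quote
        } else if (in[i] == '{') {
            // Nested list: copy verbatim up to the matching brace,
            // ignoring braces that appear inside quotes.
            int depth = 0;
            bool quoted = false;
            do {
                if (i == n) throw std::invalid_argument("unbalanced braces");
                const char c = in[i++];
                tok += c;
                if (quoted) {
                    if (c == '\\' && i < n) tok += in[i++];
                    else if (c == '"') quoted = false;
                } else if (c == '"') quoted = true;
                else if (c == '{') ++depth;
                else if (c == '}') --depth;
            } while (depth > 0);
        } else {
            // Bare item: everything up to the next ',' or '}'.
            while (i < n && in[i] != ',' && in[i] != '}') tok += in[i++];
            while (!tok.empty() && is_space(tok.back())) tok.pop_back();
            if (tok.empty()) throw std::invalid_argument("empty item");
        }
        out.push_back(tok);
        skip_ws();
        if (i < n && in[i] == ',') { ++i; continue; }
        if (i < n && in[i] == '}') break;
        throw std::invalid_argument("expected ',' or '}'");
    }
    return out;
}
```

The scanner is noticeably longer than the five grammar lines above, which is a fair argument for the X3 version once the format grows.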
Nested Data
Changing the datastructure to be a bit more specific/realistic:
namespace ast {
using value = boost::make_recursive_variant<
double,
std::string,
std::vector<boost::recursive_variant_>
>::type;
using list = std::vector<value>;
}
Now we can change the grammar, as we no longer need to treat sublist as if it is a string:
namespace Parser {
using namespace boost::spirit::x3;
rule<struct item_, ast::value> item;
auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
auto bare = lexeme [ +(graph-','-'}') ];
auto list = x3::rule<struct list_, ast::list> {"list" }
= '{' >> (item % ',') >> '}';
auto item_def = list | double_ | quoted | bare;
BOOST_SPIRIT_DEFINE(item)
}
Everything "still works":
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
namespace ast {
using value = boost::make_recursive_variant<
double,
std::string,
std::vector<boost::recursive_variant_>
>::type;
using list = std::vector<value>;
}
namespace x3 = boost::spirit::x3;
namespace Parser {
using namespace boost::spirit::x3;
rule<struct item_, ast::value> item;
auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
auto bare = lexeme [ +(graph-','-'}') ];
auto list = x3::rule<struct list_, ast::list> {"list" }
= '{' >> (item % ',') >> '}';
auto item_def = list | double_ | quoted | bare;
BOOST_SPIRIT_DEFINE(item)
}
struct pretty_printer {
using result_type = void;
std::ostream& _os;
int _indent;
pretty_printer(std::ostream& os, int indent = 0) : _os(os), _indent(indent) {}
void operator()(ast::value const& v) { boost::apply_visitor(*this, v); }
void operator()(double v) { _os << v; }
void operator()(std::string s) { _os << std::quoted(s); }
void operator()(ast::list const& l) {
_os << "{\n";
_indent += 2;
for (auto& item : l) {
_os << std::setw(_indent) << "";
operator()(item);
_os << ",\n";
}
_indent -= 2;
_os << std::setw(_indent) << "" << "}";
}
};
int main() {
pretty_printer print{std::cout};
for (std::string const input : {
R"({one, "five, six"})",
R"({one, {2, "three four"}, "five, six", {"seven, eight"}})",
})
{
auto f = input.begin(), l = input.end();
ast::value parsed;
bool ok = phrase_parse(f, l, Parser::item, x3::space, parsed);
if (ok) {
std::cout << "Parsed: ";
print(parsed);
std::cout << "\n";
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: " << std::quoted(std::string{f, l}) << "\n";
}
}
Prints:
Parsed: {
"one",
"five, six",
}
Parsed: {
"one",
{
2,
"three four",
},
"five, six",
{
"seven, eight",
},
}

Spirit X3 composed attributes

I am trying to compose spirit rules but I cannot figure out what the attribute of this new rule would be.
The following code is working as I would expect it.
#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/fusion/tuple.hpp>
namespace ast{
struct Record{
int id;
std::string name;
};
struct Document{
Record rec;
Record rec2;
//std::vector<Record> rec;
std::string name;
};
using boost::fusion::operator<<;
}
BOOST_FUSION_ADAPT_STRUCT(ast::Record,
name, id
)
BOOST_FUSION_ADAPT_STRUCT(ast::Document,
rec, rec2,
//rec,
name
)
namespace parser{
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
using x3::lit;
using x3::int_;
using ascii::char_;
const auto identifier = +char_("a-z");
const x3::rule<class record, ast::Record> record = "record";
const auto record_def = lit("record") >> identifier >> lit("{") >> int_ >> lit("}");
const x3::rule<class document, ast::Document> document = "document";
const auto document_def =
record >> record
//+record // This should generate a sequence
>> identifier
;
BOOST_SPIRIT_DEFINE(document, record);
}
namespace{
constexpr char g_input[] = R"input(
record foo{42}
record bar{73}
foobar
)input";
}
int main(){
using boost::spirit::x3::ascii::space;
std::string str = g_input;
ast::Document unit;
bool r = phrase_parse(str.begin(), str.end(), parser::document, space, unit);
std::cout << "Got: " << unit << "\n";
return 0;
}
But when I change the rule to parse multiple records(instead of exactly 2) I would expect it to have a std::vector<Record> as an attribute. But all I get is a long compiler error that does not help me very much.
Can someone point me to what I am doing wrong in order to compose the attributes correctly?
I think the whole reason it didn't compile is because you tried to print the result... and std::vector<Record> doesn't know how to be streamed:
namespace ast {
using boost::fusion::operator<<;
static inline std::ostream& operator<<(std::ostream& os, std::vector<Record> const& rs) {
os << "{ ";
for (auto& r : rs) os << r << " ";
return os << "}";
}
}
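The overload is placed in namespace ast for a reason: argument-dependent lookup also searches the namespaces of a template's arguments, so a streaming operator for std::vector<ast::Record> declared there is found from anywhere. A Spirit-free sketch of just that mechanism follows. The hand-written operator<< for Record stands in for boost::fusion::operator<<, and to_text is a made-up helper.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace ast {
    struct Record {
        int id;
        std::string name;
    };

    // Stand-in for the Fusion-provided streaming of an adapted struct.
    inline std::ostream& operator<<(std::ostream& os, Record const& r) {
        return os << "(" << r.name << " " << r.id << ")";
    }

    // Found via ADL because Record is a template argument of the vector.
    inline std::ostream& operator<<(std::ostream& os, std::vector<Record> const& rs) {
        os << "{ ";
        for (auto const& r : rs) os << r << " ";
        return os << "}";
    }
}

// Deliberately outside namespace ast: the call below still resolves.
inline std::string to_text(std::vector<ast::Record> const& rs) {
    std::ostringstream os;
    os << rs;
    return os.str();
}
```

Had the vector overload been declared in the global namespace instead, code inside namespace boost::fusion that streams the Document members would most likely not see it, which is consistent with the compiler error described in the question.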
Some more notes:
adding lexemes where absolutely required (!)
simplifying (no need to BOOST_SPIRIT_DEFINE unless recursive rules/separate TUs)
dropping redundant lit
I arrived at
#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
namespace ast {
struct Record{
int id;
std::string name;
};
struct Document{
std::vector<Record> rec;
std::string name;
};
}
BOOST_FUSION_ADAPT_STRUCT(ast::Record, name, id)
BOOST_FUSION_ADAPT_STRUCT(ast::Document, rec, name)
namespace ast {
using boost::fusion::operator<<;
static inline std::ostream& operator<<(std::ostream& os, std::vector<Record> const& rs) {
os << "{ ";
for (auto& r : rs) os << r << " ";
return os << "}";
}
}
namespace parser {
namespace x3 = boost::spirit::x3;
namespace ascii = x3::ascii;
const auto identifier = x3::lexeme[+x3::char_("a-z")];
const auto record = x3::rule<class record, ast::Record> {"record"}
= x3::lexeme["record"] >> identifier >> "{" >> x3::int_ >> "}";
const auto document = x3::rule<class document, ast::Document> {"document"}
= +record
>> identifier
;
}
int main(){
std::string const str = "record foo{42} record bar{73} foobar";
auto f = str.begin(), l = str.end();
ast::Document unit;
if (phrase_parse(f, l, parser::document, parser::ascii::space, unit)) {
std::cout << "Got: " << unit << "\n";
} else {
std::cout << "Parse failed\n";
}
if (f != l) {
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
}
Prints
Got: ({ (foo 42) (bar 73) } foobar)

tokenizing string , accepting everything between given set of characters in CPP

I have the following code:
int main()
{
string s = "server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')";
regex re("(\'[!-~]+\')");
sregex_token_iterator i(s.begin(), s.end(), re, 1);
sregex_token_iterator j;
unsigned count = 0;
while(i != j)
{
cout << "the token is " << *i++ << endl;
count++;
}
cout << "There were " << count << " tokens found." << endl;
return 0;
}
Using the above regex, I wanted to extract the strings between the parentheses and single quotes. The output should look like:
the token is 'm1.labs.teradata.com'
the token is 'use\')r_*5'
the token is 'u" er 5'
the token is 'default'
There were 4 tokens found.
Basically, the regex is supposed to extract everything between " (' " and " ') ". It can be anything: a space, a special character, a quote or a closing parenthesis.
I had earlier used the following regex:
boost::regex re_arg_values("(\'[!-~]+\')");
But it was not accepting spaces. Could someone please help me out with this? Thanks in advance.
Here's a sample of using Spirit X3 to create a grammar that actually parses this. I'd like to parse into a map of (key->value) pairs, which makes a lot more sense than just blindly assuming the names are always the same:
using Config = std::map<std::string, std::string>;
using Entry = std::pair<std::string, std::string>;
Now, we set up some grammar rules using X3:
namespace parser {
using namespace boost::spirit::x3;
auto value = quoted("'") | quoted('"');
auto key = lexeme[+alpha];
auto pair = key >> '(' >> value >> ')';
auto config = skip(space) [ *as<Entry>(pair) ];
}
The helpers as<> and quoted are simple lambdas:
template <typename T> auto as = [](auto p) { return rule<struct _, T> {} = p; };
auto quoted = [](auto q) { return lexeme[q >> *('\\' >> char_ | char_ - q) >> q]; };
Now we can parse the string into a map directly:
Config parse_config(std::string const& cfg) {
Config parsed;
auto f = cfg.begin(), l = cfg.end();
if (!parse(f, l, parser::config, parsed))
throw std::invalid_argument("Parse failed at " + std::string(f,l));
return parsed;
}
And the demo program
int main() {
Config cfg = parse_config("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");
for (auto& setting : cfg)
std::cout << "Key " << setting.first << " has value " << setting.second << "\n";
}
Prints
Key dbname has value default
Key password has value u" er 5
Key server has value m1.labs.teradata.com
Key username has value use')r_*5
LIVE DEMO
#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <map>
using Config = std::map<std::string, std::string>;
using Entry = std::pair<std::string, std::string>;
namespace parser {
using namespace boost::spirit::x3;
template <typename T> auto as = [](auto p) { return rule<struct _, T> {} = p; };
auto quoted = [](auto q) { return lexeme[q >> *(('\\' >> char_) | (char_ - q)) >> q]; };
auto value = quoted("'") | quoted('"');
auto key = lexeme[+alpha];
auto pair = key >> '(' >> value >> ')';
auto config = skip(space) [ *as<Entry>(pair) ];
}
Config parse_config(std::string const& cfg) {
Config parsed;
auto f = cfg.begin(), l = cfg.end();
if (!parse(f, l, parser::config, parsed))
throw std::invalid_argument("Parse failed at " + std::string(f,l));
return parsed;
}
int main() {
Config cfg = parse_config("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");
for (auto& setting : cfg)
std::cout << "Key " << setting.first << " has value " << setting.second << "\n";
}
Bonus
If you want to learn how to extract the raw input: just try
auto source = skip(space) [ *raw [ pair ] ];
as in this:
using RawSettings = std::vector<std::string>;
RawSettings parse_raw_config(std::string const& cfg) {
RawSettings parsed;
auto f = cfg.begin(), l = cfg.end();
if (!parse(f, l, parser::source, parsed))
throw std::invalid_argument("Parse failed at " + std::string(f,l));
return parsed;
}
int main() {
for (auto& setting : parse_raw_config(text))
std::cout << "Raw: " << setting << "\n";
}
Which prints:
Raw: server ('m1.labs.teradata.com')
Raw: username ('use\')r_*5')
Raw: password('u" er 5')
Raw: dbname ('default')
Fixing a few syntax and style issues:
you need to escape \ in C strings
you had a " in s, making a syntax error
#include <boost/regex.hpp>
#include <boost/range/iterator_range.hpp>
#include <iostream>
int main() {
std::string s = "server ('m1.labs.teradata.com') username ('use\')r_*5') password('u' er 5') dbname ('default')";
boost::regex re(R"(('([^'\\]*(?:\\[\s\S][^'\\]*)*)'))");
size_t count = 0;
for (auto tok : boost::make_iterator_range(boost::sregex_token_iterator(s.begin(), s.end(), re, 1), {})) {
std::cout << "Token " << ++count << " is " << tok << "\n";
}
}
Prints
Token 1 is 'm1.labs.teradata.com'
Token 2 is 'use'
Token 3 is ') password('
Token 4 is ' er 5'
Token 5 is 'default'
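Note that in the listing above, \' inside an ordinary string literal is just ', so the backslash never reaches the regex and the tokens split at the wrong quotes. A raw string literal keeps the backslash. Here is a minimal sketch of that variant, assuming std::regex (ECMAScript grammar) in place of Boost.Regex; the function name quoted_tokens is made up for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns every single-quoted token, quotes included.
// A backslash-escaped character (such as \') does not end a token.
std::vector<std::string> quoted_tokens(const std::string& s) {
    static const std::regex re(R"re('[^'\\]*(?:\\[\s\S][^'\\]*)*')re");
    std::vector<std::string> out;
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

Called with the input written as a raw string, for example R"in(server ('m1.labs.teradata.com') username ('use\')r_*5') password('u" er 5') dbname ('default'))in", this yields the four tokens the question asks for.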

boost::spirit access position iterator from semantic actions

Let's say I have code like this (line numbers for reference):
1:
2:function FuncName_1 {
3: var Var_1 = 3;
4: var Var_2 = 4;
5: ...
I want to write a grammar that parses such text and puts information about all identifiers (function and variable names) into a tree (utree?).
Each node should preserve line_num, column_num and the symbol value. Example:
root: FuncName_1 (line:2,col:10)
children[0]: Var_1 (line:3, col:8)
children[1]: Var_2 (line:4, col:9)
I want to put it into a tree because I plan to traverse that tree, and for each node I must know its 'context' (all parent nodes of the current node).
E.g., while processing the node for Var_1, I must know that this is the name of a local variable of function FuncName_1 (which was processed as a node one level earlier).
I cannot figure out a few things:
Can this be done in Spirit with semantic actions and utrees? Or should I use variant<> trees?
How do I pass those three pieces of information (column, line, symbol_name) to the node at the same time? I know I must use pos_iterator as the iterator type for the grammar, but how do I access that information in a semantic action?
I'm a newbie in Boost, so I read the Spirit documentation over and over and tried to google my problems, but I somehow cannot put all the pieces together to find the solution. It seems nobody has had a use case like mine before (or I'm just not able to find it).
It looks like the only solutions using a position iterator are the ones about parse error handling, but that is not the case I'm interested in.
The code that only parses the text I was talking about is below, but I don't know how to move forward with it.
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
namespace qi = boost::spirit::qi;
typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;
template<typename Iterator=pos_iterator_t, typename Skipper=qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, Skipper>
{
ParseGrammar():ParseGrammar::base_type(SourceCode)
{
using namespace qi;
KeywordFunction = lit("function");
KeywordVar = lit("var");
SemiColon = lit(';');
Identifier = lexeme [alpha >> *(alnum | '_')];
VarAssignemnt = KeywordVar >> Identifier >> char_('=') >> int_ >> SemiColon;
SourceCode = KeywordFunction >> Identifier >> '{' >> *VarAssignemnt >> '}';
}
qi::rule<Iterator, Skipper> SourceCode;
qi::rule<Iterator > KeywordFunction;
qi::rule<Iterator, Skipper> VarAssignemnt;
qi::rule<Iterator> KeywordVar;
qi::rule<Iterator> SemiColon;
qi::rule<Iterator > Identifier;
};
int main()
{
std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var Var_2 = 4; }";
pos_iterator_t first(content.begin()), iter = first, last(content.end());
ParseGrammar<pos_iterator_t> resolver; // Our parser
bool ok = phrase_parse(iter,
last,
resolver,
qi::space);
std::cout << std::boolalpha;
std::cout << "\nok : " << ok << std::endl;
std::cout << "full : " << (iter == last) << std::endl;
if(ok && iter == last)
{
std::cout << "OK: Parsing fully succeeded\n\n";
}
else
{
int line = get_line(iter);
int column = get_column(first, iter);
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed or not complete\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "remaining: '" << std::string(iter, last) << "'\n";
std::cout << "-------------------------\n";
}
return 0;
}
This has been a fun exercise, where I finally put together a working demo of on_success[1] to annotate AST nodes.
Let's assume we want an AST like:
namespace ast
{
struct LocationInfo {
unsigned line, column, length;
};
struct Identifier : LocationInfo {
std::string name;
};
struct VarAssignment : LocationInfo {
Identifier id;
int value;
};
struct SourceCode : LocationInfo {
Identifier function;
std::vector<VarAssignment> assignments;
};
}
I know, 'location information' is probably overkill for the SourceCode node, but you know... Anyways, to make it easy to assign attributes to these nodes without requiring semantic actions or lots of specifically crafted constructors:
#include <boost/fusion/adapted/struct.hpp>
BOOST_FUSION_ADAPT_STRUCT(ast::Identifier, (std::string, name))
BOOST_FUSION_ADAPT_STRUCT(ast::VarAssignment, (ast::Identifier, id)(int, value))
BOOST_FUSION_ADAPT_STRUCT(ast::SourceCode, (ast::Identifier, function)(std::vector<ast::VarAssignment>, assignments))
There. Now we can declare the rules to expose these attributes:
qi::rule<Iterator, ast::SourceCode(), Skipper> SourceCode;
qi::rule<Iterator, ast::VarAssignment(), Skipper> VarAssignment;
qi::rule<Iterator, ast::Identifier()> Identifier;
// no skipper, no attributes:
qi::rule<Iterator> KeywordFunction, KeywordVar, SemiColon;
We don't (essentially) modify the grammar, at all: attribute propagation is "just automatic"[2] :
KeywordFunction = lit("function");
KeywordVar = lit("var");
SemiColon = lit(';');
Identifier = as_string [ alpha >> *(alnum | char_("_")) ];
VarAssignment = KeywordVar >> Identifier >> '=' >> int_ >> SemiColon;
SourceCode = KeywordFunction >> Identifier >> '{' >> *VarAssignment >> '}';
The magic
How do we get the source location information attached to our nodes?
auto set_location_info = annotate(_val, _1, _3);
on_success(Identifier, set_location_info);
on_success(VarAssignment, set_location_info);
on_success(SourceCode, set_location_info);
Now, annotate is just a lazy version of a callable that is defined as:
template<typename It>
struct annotation_f {
typedef void result_type;
annotation_f(It first) : first(first) {}
It const first;
template<typename Val, typename First, typename Last>
void operator()(Val& v, First f, Last l) const {
do_annotate(v, f, l, first);
}
private:
void static do_annotate(ast::LocationInfo& li, It f, It l, It first) {
using std::distance;
li.line = get_line(f);
li.column = get_column(first, f);
li.length = distance(f, l);
}
static void do_annotate(...) { }
};
Due to the way in which get_column works, the functor is stateful (as it remembers the start iterator)[3]. As you can see, do_annotate just accepts anything that derives from LocationInfo.
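The pair of do_annotate overloads relies on overload resolution: a derived-to-base conversion always beats the C-style ellipsis, so the catch-all is only chosen for types unrelated to LocationInfo. The isolated sketch below shows that trick without Spirit. It passes pointers rather than references through the ellipsis and returns a bool so the chosen overload is observable; both are choices made for this illustration, as is the Plain type.

```cpp
#include <string>

struct LocationInfo {
    unsigned line = 0, column = 0, length = 0;
};
struct Identifier : LocationInfo {
    std::string name;
};
struct Plain { // hypothetical node type without location info
    int value = 0;
};

// Preferred overload: viable for anything derived from LocationInfo.
inline bool do_annotate(LocationInfo* li, unsigned line, unsigned column, unsigned length) {
    li->line = line;
    li->column = column;
    li->length = length;
    return true;
}

// Catch-all: the ellipsis is the worst possible match, so this is
// only selected when the overload above is not viable.
inline bool do_annotate(...) { return false; }

template <typename Node>
bool annotate(Node& node, unsigned line, unsigned column, unsigned length) {
    return do_annotate(&node, line, column, length);
}
```

A template fallback would not work here: for an Identifier argument it would be an exact match and win over the LocationInfo overload, which is why the ellipsis is used.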
Now, the proof of the pudding:
std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var Var_2 = 4; }";
pos_iterator_t first(content.begin()), iter = first, last(content.end());
ParseGrammar<pos_iterator_t> resolver(first); // Our parser
ast::SourceCode program;
bool ok = phrase_parse(iter,
last,
resolver,
qi::space,
program);
std::cout << std::boolalpha;
std::cout << "ok : " << ok << std::endl;
std::cout << "full: " << (iter == last) << std::endl;
if(ok && iter == last)
{
std::cout << "OK: Parsing fully succeeded\n\n";
std::cout << "Function name: " << program.function.name << " (see L" << program.printLoc() << ")\n";
for (auto const& va : program.assignments)
std::cout << "variable " << va.id.name << " assigned value " << va.value << " at L" << va.printLoc() << "\n";
}
else
{
int line = get_line(iter);
int column = get_column(first, iter);
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed or not complete\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "remaining: '" << std::string(iter, last) << "'\n";
std::cout << "-------------------------\n";
}
This prints:
ok : true
full: true
OK: Parsing fully succeeded
Function name: FuncName_1 (see L1:1:56)
variable Var_1 assigned value 3 at L2:3:14
variable Var_2 assigned value 4 at L3:3:15
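The line:column:length triples come from get_line, get_column and std::distance. Conceptually, this amounts to counting newlines up to a position and measuring back to the start of the current line, which is why the start of the input has to be known. Here is a plain std::string sketch of that idea. Location and locate are made-up names; columns are 1-based and a tab counts as a single column, so the numbers need not match Boost's helpers exactly.

```cpp
#include <cstddef>
#include <string>

struct Location {
    unsigned line;
    unsigned column;
};

// Maps a character offset into `text` to a 1-based line/column pair.
Location locate(const std::string& text, std::size_t offset) {
    if (offset > text.size()) offset = text.size();
    unsigned line = 1;
    std::size_t line_start = 0; // offset of the first character of the current line
    for (std::size_t i = 0; i < offset; ++i) {
        if (text[i] == '\n') {
            ++line;
            line_start = i + 1;
        }
    }
    return {line, static_cast<unsigned>(offset - line_start) + 1};
}
```

This version rescans from the beginning on every call. line_pos_iterator avoids that for the line number by tracking it while the iterator advances.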
Full Demo Program
See it Live On Coliru
Also showing:
error handling, e.g.:
Error: expecting "=" in line 3:
var Var_2 - 4; }
^---- here
ok : false
full: false
-------------------------
ERROR: Parsing failed or not complete
stopped at: 1:1
remaining: 'function FuncName_1 {
var Var_1 = 3;
var Var_2 - 4; }'
-------------------------
BOOST_SPIRIT_DEBUG macros
A bit of a hacky way to conveniently stream the LocationInfo part of any AST node, sorry :)
//#define BOOST_SPIRIT_DEBUG
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;
typedef boost::spirit::line_pos_iterator<std::string::const_iterator> pos_iterator_t;
namespace ast
{
namespace manip { struct LocationInfoPrinter; }
struct LocationInfo {
unsigned line, column, length;
manip::LocationInfoPrinter printLoc() const;
};
struct Identifier : LocationInfo {
std::string name;
};
struct VarAssignment : LocationInfo {
Identifier id;
int value;
};
struct SourceCode : LocationInfo {
Identifier function;
std::vector<VarAssignment> assignments;
};
///////////////////////////////////////////////////////////////////////////
// Completely unnecessary tweak to get a "poor man's" io manipulator going
// so we can do `std::cout << x.printLoc()` on types of `x` deriving from
// LocationInfo
namespace manip {
struct LocationInfoPrinter {
LocationInfoPrinter(LocationInfo const& ref) : ref(ref) {}
LocationInfo const& ref;
friend std::ostream& operator<<(std::ostream& os, LocationInfoPrinter const& lip) {
return os << lip.ref.line << ':' << lip.ref.column << ':' << lip.ref.length;
}
};
}
manip::LocationInfoPrinter LocationInfo::printLoc() const { return { *this }; }
// feel free to disregard this hack
///////////////////////////////////////////////////////////////////////////
}
BOOST_FUSION_ADAPT_STRUCT(ast::Identifier, (std::string, name))
BOOST_FUSION_ADAPT_STRUCT(ast::VarAssignment, (ast::Identifier, id)(int, value))
BOOST_FUSION_ADAPT_STRUCT(ast::SourceCode, (ast::Identifier, function)(std::vector<ast::VarAssignment>, assignments))
struct error_handler_f {
typedef qi::error_handler_result result_type;
template<typename T1, typename T2, typename T3, typename T4>
qi::error_handler_result operator()(T1 b, T2 e, T3 where, T4 const& what) const {
std::cerr << "Error: expecting " << what << " in line " << get_line(where) << ": \n"
<< std::string(b,e) << "\n"
<< std::setw(std::distance(b, where)) << '^' << "---- here\n";
return qi::fail;
}
};
template<typename It>
struct annotation_f {
typedef void result_type;
annotation_f(It first) : first(first) {}
It const first;
template<typename Val, typename First, typename Last>
void operator()(Val& v, First f, Last l) const {
do_annotate(v, f, l, first);
}
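private:
// Chosen for any attribute deriving from ast::LocationInfo: record where it was matched
static void do_annotate(ast::LocationInfo& li, It f, It l, It first) {
using std::distance;
li.line = get_line(f);
li.column = get_column(first, f);
li.length = distance(f, l);
}
// Fallback for attributes that carry no location info: do nothing
static void do_annotate(...) {}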
};
template<typename Iterator=pos_iterator_t, typename Skipper=qi::space_type>
struct ParseGrammar: public qi::grammar<Iterator, ast::SourceCode(), Skipper>
{
ParseGrammar(Iterator first) :
ParseGrammar::base_type(SourceCode),
annotate(first)
{
using namespace qi;
KeywordFunction = lit("function");
KeywordVar = lit("var");
SemiColon = lit(';');
Identifier = as_string [ alpha >> *(alnum | char_("_")) ];
VarAssignment = KeywordVar > Identifier > '=' > int_ > SemiColon; // note: expectation points
SourceCode = KeywordFunction >> Identifier >> '{' >> *VarAssignment >> '}';
on_error<fail>(VarAssignment, handler(_1, _2, _3, _4));
on_error<fail>(SourceCode, handler(_1, _2, _3, _4));
auto set_location_info = annotate(_val, _1, _3);
on_success(Identifier, set_location_info);
on_success(VarAssignment, set_location_info);
on_success(SourceCode, set_location_info);
BOOST_SPIRIT_DEBUG_NODES((KeywordFunction)(KeywordVar)(SemiColon)(Identifier)(VarAssignment)(SourceCode))
}
phx::function<error_handler_f> handler;
phx::function<annotation_f<Iterator>> annotate;
qi::rule<Iterator, ast::SourceCode(), Skipper> SourceCode;
qi::rule<Iterator, ast::VarAssignment(), Skipper> VarAssignment;
qi::rule<Iterator, ast::Identifier()> Identifier;
// no skipper, no attributes:
qi::rule<Iterator> KeywordFunction, KeywordVar, SemiColon;
};
int main()
{
std::string const content = "function FuncName_1 {\n var Var_1 = 3;\n var Var_2 - 4; }";
pos_iterator_t first(content.begin()), iter = first, last(content.end());
ParseGrammar<pos_iterator_t> resolver(first); // Our parser
ast::SourceCode program;
bool ok = phrase_parse(iter,
last,
resolver,
qi::space,
program);
std::cout << std::boolalpha;
std::cout << "ok : " << ok << std::endl;
std::cout << "full: " << (iter == last) << std::endl;
if(ok && iter == last)
{
std::cout << "OK: Parsing fully succeeded\n\n";
std::cout << "Function name: " << program.function.name << " (see L" << program.printLoc() << ")\n";
for (auto const& va : program.assignments)
std::cout << "variable " << va.id.name << " assigned value " << va.value << " at L" << va.printLoc() << "\n";
}
else
{
int line = get_line(iter);
int column = get_column(first, iter);
std::cout << "-------------------------\n";
std::cout << "ERROR: Parsing failed or not complete\n";
std::cout << "stopped at: " << line << ":" << column << "\n";
std::cout << "remaining: '" << std::string(iter, last) << "'\n";
std::cout << "-------------------------\n";
}
return 0;
}
[1] sadly un(der)documented, except for the conjure sample(s)
[2] well, I used as_string to get proper assignment to Identifier without too much work
[3] There could be smarter ways about this in terms of performance, but for now, let's keep it simple
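A note on the two do_annotate overloads in annotation_f: the one taking ast::LocationInfo& wins for any node that derives from LocationInfo, and the (...) overload catches everything else because an ellipsis conversion ranks lowest in overload resolution. Here is a stripped-down sketch of just that dispatch, without Spirit. The types Located and Plain are made up for illustration, and the functions return bool only so that it is visible which overload ran:

```cpp
#include <cassert>

// Minimal stand-ins for the AST types (illustrative only).
struct LocationInfo { unsigned line = 0, column = 0, length = 0; };
struct Located : LocationInfo { int value = 0; };  // derives: gets annotated
struct Plain { int value = 0; };                   // does not derive: ignored

// Preferred overload: viable whenever the first argument converts to
// LocationInfo&, i.e. for LocationInfo itself and anything derived from it.
inline bool do_annotate(LocationInfo& li, unsigned line, unsigned column) {
    li.line = line;
    li.column = column;
    return true;
}

// Fallback: only selected when the overload above is not viable.
inline bool do_annotate(...) { return false; }
```

One caveat with this trick: passing class types that are not trivially copyable (std::string, say) through the ellipsis is only conditionally supported, so the sketch sticks to simple structs.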