How to parse two strings using boost::spirit?

I am still trying to wrap my head around Boost::Spirit.
I want to parse two words into a variable. When I can do that, into a struct.
The single word compiles, the Variable doesn't. Why?
#include <boost/spirit/include/qi.hpp>
#include <boost/tuple/tuple.hpp>
#include <string>
#include <iostream>
using namespace boost::spirit;
/*
class Syntax : public qi::parser{
};
*/
int main()
{
//get user input
std::string input;
std::getline(std::cin, input);
auto it = input.begin();
bool result;
//define grammar for a single word
auto word_grammar = +qi::alnum - qi::space;
std::string singleWord;
result = qi::parse(
it, input.end(),
word_grammar,
singleWord
);
if(!result){
std::cout << "Failed to parse a word" << '\n';
return -1;
}
std::cout << "\"" << singleWord << "\"" << '\n';
//Now parse two words into a variable
std::cout << "Variable:\n";
typedef boost::tuple<std::string, std::string> Variable;
Variable variable;
auto variable_grammar = word_grammar >> word_grammar;
result = qi::parse(
it, input.end(),
variable_grammar,
variable
);
if(!result){
std::cout << "Failed to parse a variable" << '\n';
return -1;
}
std::cout << "\"" << variable.get<0>() << "\" \"" << variable.get<1>() << "\"" << '\n';
//now parse a list of variables
std::cout << "List of Variables:\n";
std::list<Variable> variables;
result = qi::parse(
it, input.end(),
variable_grammar % +qi::space,
variable
);
if(!result){
std::cout << "Failed to parse a list of variables" << '\n';
return -1;
}
for(auto var : variables)
std::cout << "DataType: " << var.get<0>() << ", VariableName: " << var.get<1>() << '\n';
}
In the end I want to parse something like this:
int a
float b
string name
Templates are nice, but when problems occur the error messages are just not human readable (thus no point in posting them here).
I am using GCC.

Sorry to take so long. I've been building a new web server in a hurry and had much to learn.
Here is what it looks like in X3. I find X3 easier to deal with than Qi, though that may be because I have used it a lot more. Qi is more mature and richer; X3, on the other hand, is meant to be adaptable and hackable, so you can make it do just about anything you want.
So, live on coliru
#include <string>
#include <iostream>
#include <vector>
#include <boost/spirit/home/x3.hpp>
#include <boost/tuple/tuple.hpp>
//as pointed out, for the error 'The parser expects tuple-like attribute type'
#include <boost/fusion/adapted/boost_tuple.hpp>
//our declarations
using Variable = boost::tuple<std::string, std::string>;
using Vector = std::vector<Variable>;
namespace parsers {
using namespace boost::spirit::x3;
auto const word = lexeme[+char_("a-zA-Z")];
//note, using 'space' as the stock skipper
auto const tuple = word >> word;
}
std::ostream& operator << (std::ostream& os, /*const*/ Variable& obj) {
return os << obj.get<0>() << ' ' << obj.get<1>();
}
std::ostream& operator << (std::ostream& os, /*const*/ Vector& obj) {
for (auto& item : obj)
os << item << " : ";
return os;
}
template<typename P, typename A>
bool test_parse(std::string in, P parser, A& attr) {
auto begin(in.begin());
bool r = phrase_parse(begin, in.end(), parser, boost::spirit::x3::space, attr);
std::cout << "result:\n " << attr << std::endl;
return r;
}
int main()
{
//not recommended but this is testing stuff
using namespace boost::spirit::x3;
using namespace parsers;
std::string input("first second third forth");
//parse one word
std::string singleWord;
test_parse(input, word, singleWord);
//parse two words into a variable
Variable variable;
test_parse(input, tuple, variable);
//parse two sets of two words
Vector vector;
test_parse(input, *tuple, vector);
}
You may like this form of testing. You can concentrate on testing parsers without a lot of extra code. It makes it easier down the road to keep your basic parsers in their own namespace. Oh yea, x3 compiles much faster than qi!

The single word compiles, the Variable doesn't. Why?
Two #includes are missing:
#include <boost/fusion/adapted/boost_tuple.hpp>
#include <boost/spirit/include/qi_list.hpp>

Related

Why is my string extraction function using back referencing in regex not working as intended?

Extraction Function
string extractStr(string str, string regExpStr) {
regex regexp(regExpStr);
smatch m;
regex_search(str, m, regexp);
string result = "";
for (string x : m)
result = result + x;
return result;
}
The Main Code
#include <iostream>
#include <regex>
using namespace std;
string extractStr(string, string);
int main(void) {
string test = "(1+1)*(n+n)";
cout << extractStr(test, "n\\+n") << endl;
cout << extractStr(test, "(\\d)\\+\\1") << endl;
cout << extractStr(test, "([a-zA-Z])[+-/*]\\1") << endl;
cout << extractStr(test, "([a-zA-Z])[+-/*]([a-zA-Z])") << endl;
return 0;
}
The Output
String = (1+1)*(n+n)
n\+n = n+n
(\d)\+\1 = 1+11
([a-zA-Z])[+-/*]\1 = n+nn
([a-zA-Z])[+-/*]([a-zA-Z]) = n+nnn
If anyone could kindly point the error I've done or point me to a similar question in SO that I've missed while searching, it would be greatly appreciated.
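Your regexes are fine; the surprise is in how std::smatch exposes its results. Iterating over m yields m[0], the whole match, followed by one entry per capture group, so concatenating every element appends the captured text to the match (hence "1+11"). Use m[0] alone for the matched text, and call regex_search in a loop if you want every match, because each call finds only one. I also have some C++ tips in here (constness and references).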
#include <cassert>
#include <iostream>
#include <sstream>
#include <regex>
#include <string>
// using namespace std; don't do this!
// https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice
// pass strings by const reference
// 1. const, you promise not to change them in this function
// 2. by reference, you avoid making copies
std::string extractStr(const std::string& str, const std::string& regExpStr)
{
std::regex regexp(regExpStr);
std::smatch m;
std::ostringstream os; // streams are more efficient for building up strings
auto begin = str.cbegin();
bool comma = false;
// regex_search finds one match per call, so loop to collect them all
while (std::regex_search(begin, str.end(), m, regexp))
{
if (comma) os << ", ";
os << m[0];
comma = true;
begin = m.suffix().first;
}
return os.str();
}
// small helper function to produce nicer output for your tests.
void test(const std::string& input, const std::string& regex, const std::string& expected)
{
auto output = extractStr(input, regex);
if (output == expected)
{
std::cout << "test succeeded : output = " << output << "\n";
}
else
{
std::cout << "test failed : output = " << output << ", expected : " << expected << "\n";
}
}
int main(void)
{
std::string input = "(1+1)*(n+n)";
test(input, "n\\+n", "n+n");
test(input, "(\\d)\\+\\1", "1+1");
test(input, "([a-zA-Z])[+-/*]\\1", "n+n");
return 0;
}

Boost Spirit X3: skip parser that would do nothing

I'm familiarizing myself with Boost Spirit X3. The question I want to ask is how to express that you don't want to use a skip parser at all.
Consider a simple example of parsing comma-separated sequence of integers:
#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/home/x3.hpp>
int main()
{
using namespace boost::spirit::x3;
const std::string input{"2,4,5"};
const auto parser = int_ % ',';
std::vector<int> numbers;
auto start = input.cbegin();
auto r = phrase_parse(start, input.end(), parser, space, numbers);
if(r && start == input.cend())
{
// success
for(const auto &item: numbers)
std::cout << item << std::endl;
return 0;
}
std::cerr << "Input was not parsed successfully" << std::endl;
return 1;
}
This works totally fine. However, I would like to forbid spaces in between (i.e. "2, 4,5" should fail to parse).
I tried using eps as the skip parser in phrase_parse, but as you can guess, the program ended up in an infinite loop because eps matches the empty string.
The solution I found is to use the no_skip directive (https://www.boost.org/doc/libs/1_75_0/libs/spirit/doc/html/spirit/qi/reference/directive/no_skip.html). So the parser now becomes:
const auto parser = no_skip[int_ % ','];
This works fine, but I don't find it to be an elegant solution (especially providing "space" parser in phrase_parse when I want no whitespace skips). Are there no skip parsers that would simply do nothing? Am I missing something?
Thanks for Your time. Looking forward to any replies.
You can use either no_skip[] or lexeme[]. They're almost identical, except for pre-skip (Boost Spirit lexeme vs no_skip).
Are there no skip parsers that would simply do nothing? Am I missing something?
A wild guess, but you might be missing the parse API that doesn't accept a skipper in the first place
Live On Coliru
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
int main() {
std::string const input{ "2,4,5" };
auto f = begin(input), l = end(input);
const auto parser = x3::int_ % ',';
std::vector<int> numbers;
auto r = parse(f, l, parser, numbers);
if (r) {
// success
for (const auto& item : numbers)
std::cout << item << std::endl;
} else {
std::cerr << "Input was not parsed successfully" << std::endl;
return 1;
}
if (f!=l) {
std::cout << "Remaining input " << std::quoted(std::string(f,l)) << "\n";
return 2;
}
}
Prints
2
4
5

Parsing into structs with containers

How can I use Boost.Spirit X3 to parse into structs like:
struct person{
std::string name;
std::vector<std::string> friends;
};
Coming from Boost.Spirit V2 I would use a grammar, but since X3 doesn't support grammars I have no idea how to do this cleanly.
EDIT: It would be nice if someone could help me write a parser that parses a list of strings and returns a person, where the first string is the name and the rest of the strings go into the friends vector.
Parsing with x3 is much simpler than it was with v2, so you shouldn't have too much trouble moving over. Grammars being gone is a good thing!
Here's how you can parse into a vector of strings:
//#define BOOST_SPIRIT_X3_DEBUG
#include <fstream>
#include <iostream>
#include <string>
#include <type_traits>
#include <vector>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
namespace x3 = boost::spirit::x3;
struct person
{
std::string name;
std::vector<std::string> friends;
};
BOOST_FUSION_ADAPT_STRUCT(
person,
(std::string, name)
(std::vector<std::string>, friends)
);
auto const name = x3::rule<struct name_class, std::string> { "name" }
= x3::raw[x3::lexeme[x3::alpha >> *x3::alnum]];
auto const root = x3::rule<struct person_class, person> { "person" }
= name >> *name;
int main(int, char**)
{
std::string const input = "bob john ellie";
auto it = input.begin();
auto end = input.end();
person p;
if (phrase_parse(it, end, root >> x3::eoi, x3::space, p))
{
std::cout << "parse succeeded" << std::endl;
std::cout << p.name << " has " << p.friends.size() << " friends." << std::endl;
}
else
{
std::cout << "parse failed" << std::endl;
if (it != end)
std::cout << "remaining: " << std::string(it, end) << std::endl;
}
return 0;
}
As you can see on Coliru, the output is:
parse succeeded
bob has 2 friends.

Boost.Spirit.x3 avoid collapsing two consecutive attributes of the same type into a vector

I am trying to learn Boost.Spirit, but I have found a difficulty.
I am trying to parse a string into the following structure:
struct employee {
std::string name;
std::string location;
};
And it seems that when two attributes with the same type are back to back, they collapse (logically) into a std::vector of that type. Because of that rule, the following parser
+x3::ascii::alnum >>
+x3::space >>
+x3::ascii::alnum
would have the attribute of std::vector<std::string>.
But I am trying to parse this into that struct, which means that the ideal attribute for me would be a boost::fusion::tuple<std::string, std::string>, so I can adapt my struct to it.
The complete version of the not working code (referenced above):
// Example program
#include <iostream>
#include <string>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
struct employee {
std::string name;
std::string location;
};
BOOST_FUSION_ADAPT_STRUCT(employee,
(std::string, name),
(std::string, location)
)
namespace x3 = boost::spirit::x3;
x3::rule<struct parse_emp_id, employee> const parse_emp = "Employee Parser";
auto parse_emp_def =
+x3::ascii::alnum >>
+x3::space >>
+x3::ascii::alnum
;
BOOST_SPIRIT_DEFINE(parse_emp);
int main()
{
std::string input = "Joe Fairbanks";
employee ret;
x3::parse(input.begin(), input.end(), parse_emp, ret);
std::cout << "Name: " << ret.name << "\tLocation: " << ret.location << std::endl;
}
See it live
This code triggers a static_assert telling me that my attribute isn't correct:
error: static_assert failed "Attribute does not have the expected size."
With the command of
clang++ -std=c++14 test.cpp
(it also fails under GCC).
What I have tried
I have found a workaround to this problem, but it is messy, and I can't believe that this is the cleanest way:
// Example program
#include <iostream>
#include <string>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
struct employee {
std::string name;
std::string location;
};
namespace x3 = boost::spirit::x3;
x3::rule<struct parse_emp_id, employee> const parse_emp = "Employee Parser";
auto parse_emp_def =
x3::eps [
([](auto& ctx) {
x3::_val(ctx) = employee{};
})
]>>
(+x3::ascii::alnum)[
([](auto& ctx) {
x3::_val(ctx).name = x3::_attr(ctx);
})
]>>
+x3::space >>
(+x3::ascii::alnum)[
([](auto& ctx) {
x3::_val(ctx).location = x3::_attr(ctx);
})
]
;
BOOST_SPIRIT_DEFINE(parse_emp);
int main()
{
std::string input = "Joe Fairbanks";
employee ret;
x3::parse(input.begin(), input.end(), parse_emp, ret);
std::cout << "Name: " << ret.name << "\tLocation: " << ret.location << std::endl;
}
See it live
I really don't like that solution: it kinda ruins the amazing expressiveness of spirit and makes it super ugly, also if I want to add new fields into the employee struct, then I have to add an extra lambda, instead of just updating my BOOST_FUSION_ADAPT_STRUCT, which is much easier.
So the question is: Is there some way to (hopefully) cleanly split two consecutive attributes of the same type from the std::vector and into a boost::fusion::vector?
Thank you in advance for getting this far ;).
The problem is that unlike character literals, x3::space has an attribute. So you don't have an attribute of two separate character sequences separated by whitespace, but rather an attribute of one big character sequence which includes the whitespace.
The omit directive is what you're after, and with that single addition your 'not working code' works. :-]
// Example program
#include <string>
#include <iostream>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
struct employee {
std::string name;
std::string location;
};
BOOST_FUSION_ADAPT_STRUCT(employee, name, location)
x3::rule<struct parse_emp_id, employee> const parse_emp = "Employee Parser";
auto parse_emp_def
= +x3::ascii::alnum
>> x3::omit[+x3::space]
>> +x3::ascii::alnum
;
BOOST_SPIRIT_DEFINE(parse_emp)
int main()
{
std::string const input = "Joe Fairbanks";
employee ret;
x3::parse(input.begin(), input.end(), parse_emp, ret);
std::cout << "Name: " << ret.name << "\tLocation: " << ret.location << '\n';
}
Online Demo

How to expand environment variables in .ini files using Boost

I have an INI file like
[Section1]
Value1 = /home/%USER%/Desktop
Value2 = /home/%USER%/%SOME_ENV%/Test
and want to parse it using Boost. I tried using Boost property_tree like
boost::property_tree::ptree pt;
boost::property_tree::ini_parser::read_ini("config.ini", pt);
std::cout << pt.get<std::string>("Section1.Value1") << std::endl;
std::cout << pt.get<std::string>("Section1.Value2") << std::endl;
But it didn't expand the environment variable. Output looks like
/home/%USER%/Desktop
/home/%USER%/%SOME_ENV%/Test
I was expecting something like
/home/Maverick/Desktop
/home/Maverick/Doc/Test
I am not sure if it is even possible with boost property_tree.
I would appreciate any hint to parse this kind of file using boost.
And here's another take on it, using the old crafts:
- not requiring Spirit, or indeed Boost
- not hardwiring the interface to std::string (instead allowing any combination of input iterators and output iterator)
- handling %% "properly" as a single % (footnote 1)
The essence:
#include <string>
#include <algorithm>
#include <cstdlib> // getenv
static std::string safe_getenv(std::string const& macro) {
auto var = getenv(macro.c_str());
return var? var : macro;
}
template <typename It, typename Out>
Out expand_env(It f, It l, Out o)
{
bool in_var = false;
std::string accum;
while (f!=l)
{
switch(auto ch = *f++)
{
case '%':
// test f==l first so a trailing '%' never dereferences the end iterator
if (in_var || f==l || (*f!='%'))
{
in_var = !in_var;
if (in_var)
accum.clear();
else
{
accum = safe_getenv(accum);
o = std::copy(begin(accum), end(accum), o);
}
break;
} else
++f; // %% -> %
default:
if (in_var)
accum += ch;
else
*o++ = ch;
}
}
return o;
}
#include <iterator>
std::string expand_env(std::string const& input)
{
std::string result;
expand_env(begin(input), end(input), std::back_inserter(result));
return result;
}
#include <iostream>
#include <sstream>
#include <list>
int main()
{
// same use case as first answer, show `%%` escape
std::cout << "'" << expand_env("Greeti%%ng is %HOME% world!") << "'\n";
// can be done streaming, to any container
std::istringstream iss("Greeti%%ng is %HOME% world!");
std::list<char> some_target;
std::istreambuf_iterator<char> f(iss), l;
expand_env(f, l, std::back_inserter(some_target));
std::cout << "Streaming results: '" << std::string(begin(some_target), end(some_target)) << "'\n";
// some more edge cases uses to validate the algorithm (note `%%` doesn't
// act as escape if the first ends a 'pending' variable)
std::cout << "'" << expand_env("") << "'\n";
std::cout << "'" << expand_env("%HOME%") << "'\n";
std::cout << "'" << expand_env(" %HOME%") << "'\n";
std::cout << "'" << expand_env("%HOME% ") << "'\n";
std::cout << "'" << expand_env("%HOME%%HOME%") << "'\n";
std::cout << "'" << expand_env(" %HOME%%HOME% ") << "'\n";
std::cout << "'" << expand_env(" %HOME% %HOME% ") << "'\n";
}
Which, on my box, prints:
'Greeti%ng is /home/sehe world!'
Streaming results: 'Greeti%ng is /home/sehe world!'
''
'/home/sehe'
' /home/sehe'
'/home/sehe '
'/home/sehe/home/sehe'
' /home/sehe/home/sehe '
' /home/sehe /home/sehe '
1 Of course, "properly" is subjective. At the very least, I think this
- would be useful (how else would you configure a value legitimately containing %?)
- is how cmd.exe does it on Windows
I'm pretty sure that this could be done trivially (see my newer answer) using a handwritten parser, but I'm personally a fan of Spirit:
grammar %= (*~char_("%")) % as_string ["%" >> +~char_("%") >> "%"]
[ _val += phx::bind(safe_getenv, _1) ];
Meaning:
- take all non-% chars, if any
- then take any word from inside %s and pass it through safe_getenv before appending
Now, safe_getenv is a trivial wrapper:
static std::string safe_getenv(std::string const& macro) {
auto var = getenv(macro.c_str());
return var? var : macro;
}
Here's a complete minimal implementation:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <cstdlib> // getenv
#include <iostream>
#include <string>
static std::string safe_getenv(std::string const& macro) {
auto var = getenv(macro.c_str());
return var? var : macro;
}
std::string expand_env(std::string const& input)
{
using namespace boost::spirit::qi;
using boost::phoenix::bind;
static const rule<std::string::const_iterator, std::string()> compiled =
*(~char_("%")) [ _val+=_1 ]
% as_string ["%" >> +~char_("%") >> "%"] [ _val += bind(safe_getenv, _1) ];
std::string::const_iterator f(input.begin()), l(input.end());
std::string result;
parse(f, l, compiled, result);
return result;
}
int main()
{
std::cout << expand_env("Greeting is %HOME% world!\n");
}
This prints
Greeting is /home/sehe world!
on my box
Notes
- this is not optimized (well, not beyond compiling the rule once)
- replace_regex_copy would do as nicely and more efficiently (?)
- see this answer for a slightly more involved 'expansion' engine: Compiling a simple parser with Boost.Spirit
  - using output iterator instead of std::string for accumulation
  - allowing nested variables
  - allowing escapes