Using Boost Spirit, I'd like to extract a string that is followed by some data in parentheses. The string is separated from the opening parenthesis by a space. Unfortunately, the string itself may contain spaces. I'm looking for a concise solution that returns the string without the trailing space.
The following code illustrates the problem:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <string>
#include <iostream>
namespace qi = boost::spirit::qi;
using std::string;
using std::cout;
using std::endl;
void
test_input(const string &input)
{
string::const_iterator b = input.begin();
string::const_iterator e = input.end();
string parsed;
bool const r = qi::parse(b, e,
*(qi::char_ - qi::char_("(")) >> qi::lit("(Spirit)"),
parsed
);
if(r) {
cout << "PASSED:" << endl;
} else {
cout << "FAILED:" << endl;
}
cout << " Parsed: \"" << parsed << "\"" << endl;
cout << " Rest: \"" << string(b, e) << "\"" << endl;
}
int main()
{
test_input("Fine (Spirit)");
test_input("Hello, World (Spirit)");
return 0;
}
Its output is:
PASSED:
Parsed: "Fine "
Rest: ""
PASSED:
Parsed: "Hello, World "
Rest: ""
With this simple grammar, the extracted string always ends with a trailing space, which I'd like to eliminate.
The solution should work within Spirit since this is only part of a larger grammar. (Thus, it would probably be clumsy to trim the extracted strings after parsing.)
Thank you in advance.
Like the comment said, in the case of a single space, you can just hard code it. If you need to be more flexible or tolerant:
I'd use a skipper with raw to "cheat" the skipper for your purposes:
bool const r = qi::phrase_parse(b, e,
qi::raw [ *(qi::char_ - qi::char_("(")) ] >> qi::lit("(Spirit)"),
qi::space,
parsed
);
This works, and prints
PASSED:
Parsed: "Fine"
Rest: ""
PASSED:
Parsed: "Hello, World"
Rest: ""
See it Live on Coliru
Full program for reference:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <string>
#include <iostream>
namespace qi = boost::spirit::qi;
using std::string;
using std::cout;
using std::endl;
void
test_input(const string &input)
{
string::const_iterator b = input.begin();
string::const_iterator e = input.end();
string parsed;
bool const r = qi::phrase_parse(b, e,
qi::raw [ *(qi::char_ - qi::char_("(")) ] >> qi::lit("(Spirit)"),
qi::space,
parsed
);
if(r) {
cout << "PASSED:" << endl;
} else {
cout << "FAILED:" << endl;
}
cout << " Parsed: \"" << parsed << "\"" << endl;
cout << " Rest: \"" << string(b, e) << "\"" << endl;
}
int main()
{
test_input("Fine (Spirit)");
test_input("Hello, World (Spirit)");
return 0;
}
I am a beginner with regex in C++. I was wondering why this code:
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main() {
std::string s = "? 8==2 : true ! false";
boost::regex re("\\?\\s+(.*)\\s*:\\s*(.*)\\s*\\!\\s*(.*)");
boost::sregex_token_iterator p(s.begin(), s.end(), re, -1); // sequence and that reg exp
boost::sregex_token_iterator end; // Create an end-of-reg-exp
// marker
while (p != end)
std::cout << *p++ << '\n';
}
prints an empty string. I put the regex into regexTester and it matches the string correctly, but when I try to iterate over the matches here, it returns nothing.
I think the tokenizer is actually meant to split text by some delimiter, and the delimiter is not included. Compare with std::regex_token_iterator:
std::regex_token_iterator is a read-only LegacyForwardIterator that accesses the individual sub-matches of every match of a regular expression within the underlying character sequence. It can also be used to access the parts of the sequence that were not matched by the given regular expression (e.g. as a tokenizer).
Indeed you invoke exactly this mode as per the docs:
if submatch is -1, then enumerates all the text sequences that did not match the expression re (that is to performs field splitting).
(emphasis mine).
So, just fix that:
for (boost::sregex_token_iterator p(s.begin(), s.end(), re), e; p != e;
++p)
{
boost::sub_match<It> const& current = *p;
if (current.matched) {
std::cout << std::quoted(current.str()) << '\n';
} else {
std::cout << "non matching" << '\n';
}
}
Other Observations
All the greedy Kleene stars are a recipe for trouble. You won't ever find a second match, because the first one's .* at the end will by definition gobble up all remaining input.
Instead, make them non-greedy (.*?) and/or much more precise (like isolating some character set, or mandating non-space characters?).
boost::regex re(R"(\?\s+(.*?)\s*:\s*(.*?)\s*\!\s*(.*?))");
// Or, if you don't want raw string literals:
boost::regex re("\\?\\s+(.*?)\\s*:\\s*(.*?)\\s*\\!\\s*(.*?)");
Live Demo
#include <boost/regex.hpp>
#include <iomanip>
#include <iostream>
#include <string>
int main() {
using It = std::string::const_iterator;
std::string const s =
"? 8==2 : true ! false;"
"? 9==3 : 'book' ! 'library';";
boost::regex re(R"(\?\s+(.*?)\s*:\s*(.*?)\s*\!\s*(.*?))");
{
std::cout << "=== regex_search:\n";
boost::smatch results;
for (It b = s.begin(); boost::regex_search(b, s.end(), results, re); b = results[0].end()) {
std::cout << results.str() << "\n";
std::cout << "remain: " << std::quoted(std::string(results[0].second, s.end())) << "\n";
}
}
std::cout << "=== token iteration:\n";
for (boost::sregex_token_iterator p(s.begin(), s.end(), re), e; p != e;
++p)
{
boost::sub_match<It> const& current = *p;
if (current.matched) {
std::cout << std::quoted(current.str()) << '\n';
} else {
std::cout << "non matching" << '\n';
}
}
}
Prints
=== regex_search:
? 8==2 : true !
remain: "false;? 9==3 : 'book' ! 'library';"
? 9==3 : 'book' !
remain: "'library';"
=== token iteration:
"? 8==2 : true ! "
"? 9==3 : 'book' ! "
BONUS: Parser Expressions
Instead of abusing regexen to do parsing, you could generate a parser, e.g. using Boost Spirit:
Live On Coliru
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted.hpp>
#include <iomanip>
#include <iostream>
namespace x3 = boost::spirit::x3;
int main() {
std::string const s =
"? 8==2 : true ! false;"
"? 9==3 : 'book' ! 'library';";
using expression = std::string;
using ternary = std::tuple<expression, expression, expression>;
std::vector<ternary> parsed;
auto expr_ = x3::lexeme [+(x3::graph - ';')];
auto ternary_ = "?" >> expr_ >> ":" >> expr_ >> "!" >> expr_;
std::cout << "=== parser approach:\n";
if (x3::phrase_parse(begin(s), end(s), *x3::seek[ ternary_ ], x3::space, parsed)) {
for (auto [cond, e1, e2] : parsed) {
std::cout
<< " condition " << std::quoted(cond) << "\n"
<< " true expression " << std::quoted(e1) << "\n"
<< " else expression " << std::quoted(e2) << "\n"
<< "\n";
}
} else {
std::cout << "non matching" << '\n';
}
}
Prints
=== parser approach:
condition "8==2"
true expression "true"
else expression "false"
condition "9==3"
true expression "'book'"
else expression "'library'"
This is much more extensible, will easily support recursive grammars and will be able to synthesize a typed representation of your syntax tree, instead of just leaving you with scattered bits of string.
So the objective is to not tolerate characters from 80h through FFh in the input string. I was under the impression that
using ascii::char_;
would take care of this. But as you can see in the example code it will happily print Parsing succeeded.
In the following Spirit mailing list post, Joel suggested letting the parse fail on these non-ASCII characters, but I'm not sure whether he went ahead and did so.
[Spirit-general] ascii encoding assert on invalid input ...
Here is my example code:
#include <iostream>
#include <boost/spirit/home/x3.hpp>
namespace client::parser
{
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
using ascii::char_;
using ascii::space;
using x3::lexeme;
using x3::skip;
const auto quoted_string = lexeme[char_('"') >> *(char_ - '"') >> char_('"')];
const auto entry_point = skip(space) [ quoted_string ];
}
int main()
{
for(std::string const input : { "\"naughty \x80" "bla bla bla\"" }) {
std::string output;
if (parse(input.begin(), input.end(), client::parser::entry_point, output)) {
std::cout << "Parsing succeeded\n";
std::cout << "input: " << input << "\n";
std::cout << "output: " << output << "\n";
} else {
std::cout << "Parsing failed\n";
}
}
}
How can I change the example to make Spirit fail on this invalid input?
Furthermore, and closely related, I would like to know how I should use the character parser that takes a char_set encoding, i.e. char_(charset) from the X3 docs: Character Parsers (develop branch).
The documentation falls well short of describing the basic functionality. Why can't the Boost top-level people force library authors to provide documentation at least on the level of cppreference.com?
Nothing bad about the docs here. It's just a library bug.
Where the code for any_char says:
template <typename Char, typename Context>
bool test(Char ch_, Context const&) const
{
return ((sizeof(Char) <= sizeof(char_type)) || encoding::ischar(ch_));
}
It should have said
template <typename Char, typename Context>
bool test(Char ch_, Context const&) const
{
return ((sizeof(Char) <= sizeof(char_type)) && encoding::ischar(ch_));
}
That makes your program behave as expected and required. That behaviour also matches the Qi behaviour:
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <cassert>
int main() {
namespace qi = boost::spirit::qi;
char const* input = "\x80";
assert(!qi::parse(input, input+1, qi::ascii::char_));
}
Filed a bug here: https://github.com/boostorg/spirit/issues/520
You can achieve that by using the print parser:
#include <iostream>
#include <boost/spirit/home/x3.hpp>
namespace client::parser
{
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
using ascii::char_;
using ascii::print;
using ascii::space;
using x3::lexeme;
using x3::skip;
const auto quoted_string = lexeme[char_('"') >> *(print - '"') >> char_('"')];
const auto entry_point = skip(space) [ quoted_string ];
}
int main()
{
for(std::string const input : { "\"naughty \x80\"", "\"bla bla bla\"" }) {
std::string output;
std::cout << "input: " << input << "\n";
if (parse(input.begin(), input.end(), client::parser::entry_point, output)) {
std::cout << "output: " << output << "\n";
std::cout << "Parsing succeeded\n";
} else {
std::cout << "Parsing failed\n";
}
}
}
Output:
input: "naughty �"
Parsing failed
input: "bla bla bla"
output: "bla bla bla"
Parsing succeeded
https://wandbox.org/permlink/HSoB8uqMC3WME5yI
Surprisingly, the check for char_ is only performed when sizeof(iterator char type) > sizeof(char):
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <string>
#include <boost/core/demangle.hpp>
#include <typeinfo>
namespace x3 = boost::spirit::x3;
template <typename Char>
void test(Char const* str)
{
std::basic_string<Char> s = str;
std::cout << boost::core::demangle(typeid(Char).name()) << ":\t";
Char c;
auto it = s.begin();
if (x3::parse(it, s.end(), x3::ascii::char_, c) && it == s.end())
std::cout << "OK: " << int(c) << "\n";
else
std::cout << "Failed\n";
}
int main()
{
test("\x80");
test(L"\x80");
test(u8"\x80");
test(u"\x80");
test(U"\x80");
}
Output:
char: OK: -128
wchar_t: Failed
char8_t: OK: 128
char16_t: Failed
char32_t: Failed
https://wandbox.org/permlink/j9PQeRVnGZQeELFA
I am trying to get the current line of the file I am parsing using Boost Spirit. I created a grammar class and the structures to parse my commands into. I would also like to keep track of which line each command was found on and parse that into my structures as well. I have wrapped my istream file iterator in a multi_pass iterator and then wrapped that in a boost::spirit::classic::position_iterator2. In the rules of my grammar, how would I get the current position of the iterator, or is this not possible?
Update: It is similar to that problem but I just need to be able to keep a count of all the lines processed. I don't need to do all of the extra buffering that was done in the solution.
Keeping a count of all lines processed is not nearly the same as "getting the current line".
Simple Take
If this is what you need, just check it after the parse:
Live On Wandbox
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
#include <fstream>
#include <set>
namespace qi = boost::spirit::qi;
int main() {
using It = boost::spirit::istream_iterator;
std::ifstream ifs("main.cpp");
boost::spirit::line_pos_iterator<It> f(It(ifs >> std::noskipws)), l;
std::set<std::string> words;
if (qi::phrase_parse(f, l, *qi::lexeme[+qi::graph], qi::space, words)) {
std::cout << "Parsed " << words.size() << " words";
if (!words.empty())
std::cout << " (from '" << *words.begin() << "' to '" << *words.rbegin() << "')";
std::cout << "\nLast line processed: " << boost::spirit::get_line(f) << "\n";
}
}
Prints
Parsed 50 words (from '"' to '}')
Last line processed: 22
Slightly More Complex Take
If you say "no, wait, I really did want to get the current line /while parsing/", the real full monty is here:
boost::spirit access position iterator from semantic actions
Here's the completely trimmed down version using iter_pos:
Live On Wandbox
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
#include <boost/spirit/repository/include/qi_iter_pos.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <fstream>
#include <map>
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
using LineNum = size_t;
struct line_number_f {
template <typename It> LineNum operator()(It it) const { return get_line(it); }
};
static boost::phoenix::function<line_number_f> line_number_;
int main() {
using Underlying = boost::spirit::istream_iterator;
using It = boost::spirit::line_pos_iterator<Underlying>;
qi::rule<It, LineNum()> line_no = qr::iter_pos [ qi::_val = line_number_(qi::_1) ];
std::ifstream ifs("main.cpp");
It f(Underlying{ifs >> std::noskipws}), l;
std::multimap<LineNum, std::string> words;
if (qi::phrase_parse(f, l, +(line_no >> qi::lexeme[+qi::graph]), qi::space, words)) {
std::cout << "Parsed " << words.size() << " words.\n";
if (!words.empty()) {
auto& first = *words.begin();
std::cout << "First word: '" << first.second << "' (in line " << first.first << ")\n";
auto& last = *words.rbegin();
std::cout << "Last word: '" << last.second << "' (in line " << last.first << ")\n";
}
std::cout << "Line 20 contains:\n";
auto p = words.equal_range(20);
for (auto it = p.first; it != p.second; ++it)
std::cout << " - '" << it->second << "'\n";
}
}
Printing:
Parsed 166 words.
First word: '#include' (in line 1)
Last word: '}' (in line 46)
Line 20 contains:
- 'int'
- 'main()'
- '{'
I am trying to write C++ code that uses the Boost libraries to read an input file like the following:
1 12 13 0 0 1 0 INLE
.
.
.
In this case, I must perform an action if the condition specified in the last (rightmost) column is INLE.
I have the following code,
#include <iostream>
#include <fstream>
#include <string>
#include <boost/algorithm/string/predicate.hpp>
int main(int argc, const char * argv[])
{
std::string line;
const std::string B_condition = "INLE";
std::ifstream myfile ("ramp.bnd");
if (myfile.is_open())
{
while ( getline (myfile,line) )
{
if (boost::algorithm::ends_with(line,B_condition)==true)
{
std::cout << "Its True! \n"; // just for testing
//add complete code
}
}
myfile.close();
}
else std::cout << "Unable to open file";
return 0;
}
There are no issues when compiling, but when I run it, it doesn't print anything.
On the other hand, if I change my boolean condition to false, it prints "Its True!" once for every line in my input file.
What am I doing wrong?
Thanks!!
I can only assume that:
your file contains whitespace at the end (use trim)
your file has Windows line endings (CRLF) but you're reading it as a UNIX text file, meaning that each line will include a trailing '\r' (CR) (often shown as ^M in various text editors/pagers).
So, either
fix the line endings
trim whitespace from the lines before comparing
or both
Best: use a 'proper' parser to do the work.
Update adding a quick & dirty approach using Boost Spirit: see it Live On Coliru
int main()
{
std::ifstream myfile("ramp.bnd");
myfile.unsetf(std::ios::skipws);
boost::spirit::istream_iterator f(myfile), l;
using namespace qi;
bool ok = phrase_parse(f, l,
(repeat(7) [ int_ ] >> as_string[lexeme[+(char_ - eol)]])
[ phx::bind(process_line, _1, _2) ]
% eol, // supports CRLF and LF
blank);
if (!ok)
std::cerr << "Parse errors\n";
if (f!=l)
std::cerr << "Remaining input: '" << std::string(f,l) << "'\n";
}
As you can see, it validates the whole line, assuming (for now) that the columns are 7 integer values and a string (e.g. "INLE"). Now, the actual work is much simpler and can be implemented in a separate function:
void process_line(std::vector<int> const& values, std::string const& kind)
{
if (kind == "INLE")
{
std::cout << "Column 1: " << values[0] << "\n";
}
}
The actual processing function doesn't have to meddle with trimming, line ends, or even parsing the detail columns :)
Full Code for reference
#include <iostream>
#include <fstream>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
static const std::string B_condition = "INLE";
void process_line(std::vector<int> const& values, std::string const& kind)
{
if (kind == B_condition)
{
std::cout << "Column 1: " << values[0] << "\n";
}
}
int main()
{
std::ifstream myfile("ramp.bnd");
myfile.unsetf(std::ios::skipws);
boost::spirit::istream_iterator f(myfile), l;
using namespace qi;
bool ok = phrase_parse(f, l,
(repeat(7) [ int_ ] >> as_string[lexeme[+(char_ - eol)]])
[ phx::bind(process_line, _1, _2) ]
% eol, // supports CRLF and LF
blank);
if (!ok)
std::cerr << "Parse errors\n";
if (f!=l)
std::cerr << "Remaining input: '" << std::string(f,l) << "'\n";
}
You don't need a library like Boost at all. A solution in pure standard C++ is possible in a few lines of code too:
const std::string B_condition = "INLE";
std::ifstream myfile ("ramp.bnd");
for( char c; myfile >> c; )
{
if( std::isdigit(c, myfile.getloc() ) ) // needs #include <locale>
{
int i;
if( myfile.putback(c) >> i )
std::cout << "read " << i << std::endl; // do something with 'i'
}
else
{
std::string token;
if( myfile.putback(c) >> token )
{
if( token == B_condition )
std::cout << B_condition << " found\n";
else
; // neither a number nor B_condition -> handle however you want
}
}
}
I have an INI file like
[Section1]
Value1 = /home/%USER%/Desktop
Value2 = /home/%USER%/%SOME_ENV%/Test
and want to parse it using Boost. I tried using Boost property_tree like
boost::property_tree::ptree pt;
boost::property_tree::ini_parser::read_ini("config.ini", pt);
std::cout << pt.get<std::string>("Section1.Value1") << std::endl;
std::cout << pt.get<std::string>("Section1.Value2") << std::endl;
But it didn't expand the environment variable. Output looks like
/home/%USER%/Desktop
/home/%USER%/%SOME_ENV%/Test
I was expecting something like
/home/Maverick/Desktop
/home/Maverick/Doc/Test
I am not sure if it is even possible with boost property_tree.
I would appreciate any hint to parse this kind of file using boost.
And here's another take on it, using the old crafts:
not requiring Spirit, or indeed Boost
not hardwiring the interface to std::string (instead allowing any combination of input iterators and output iterator)
handling %% "properly" as a single % 1
The essence:
#include <string>
#include <algorithm>
#include <cstdlib> // getenv
static std::string safe_getenv(std::string const& macro) {
auto var = getenv(macro.c_str());
return var? var : macro;
}
template <typename It, typename Out>
Out expand_env(It f, It l, Out o)
{
bool in_var = false;
std::string accum;
while (f!=l)
{
switch(auto ch = *f++)
{
case '%':
if (in_var || f==l || (*f!='%')) // don't dereference f at end of input
{
in_var = !in_var;
if (in_var)
accum.clear();
else
{
accum = safe_getenv(accum);
o = std::copy(begin(accum), end(accum), o);
}
break;
} else
++f; // %% -> %
default:
if (in_var)
accum += ch;
else
*o++ = ch;
}
}
return o;
}
#include <iterator>
std::string expand_env(std::string const& input)
{
std::string result;
expand_env(begin(input), end(input), std::back_inserter(result));
return result;
}
#include <iostream>
#include <sstream>
#include <list>
int main()
{
// same use case as first answer, show `%%` escape
std::cout << "'" << expand_env("Greeti%%ng is %HOME% world!") << "'\n";
// can be done streaming, to any container
std::istringstream iss("Greeti%%ng is %HOME% world!");
std::list<char> some_target;
std::istreambuf_iterator<char> f(iss), l;
expand_env(f, l, std::back_inserter(some_target));
std::cout << "Streaming results: '" << std::string(begin(some_target), end(some_target)) << "'\n";
// some more edge cases uses to validate the algorithm (note `%%` doesn't
// act as escape if the first ends a 'pending' variable)
std::cout << "'" << expand_env("") << "'\n";
std::cout << "'" << expand_env("%HOME%") << "'\n";
std::cout << "'" << expand_env(" %HOME%") << "'\n";
std::cout << "'" << expand_env("%HOME% ") << "'\n";
std::cout << "'" << expand_env("%HOME%%HOME%") << "'\n";
std::cout << "'" << expand_env(" %HOME%%HOME% ") << "'\n";
std::cout << "'" << expand_env(" %HOME% %HOME% ") << "'\n";
}
Which, on my box, prints:
'Greeti%ng is /home/sehe world!'
Streaming results: 'Greeti%ng is /home/sehe world!'
''
'/home/sehe'
' /home/sehe'
'/home/sehe '
'/home/sehe/home/sehe'
' /home/sehe/home/sehe '
' /home/sehe /home/sehe '
1 Of course, "properly" is subjective. At the very least, I think this would be useful (how else would you configure a value legitimately containing %?) and is how cmd.exe does it on Windows.
I'm pretty sure that this could be done trivially (see my newer answer) using a handwritten parser, but I'm personally a fan of Spirit:
grammar %= (*~char_("%")) % as_string ["%" >> +~char_("%") >> "%"]
[ _val += phx::bind(safe_getenv, _1) ];
Meaning:
take all non-% chars, if any
then take any word from inside %s and pass it through safe_getenv before appending
Now, safe_getenv is a trivial wrapper:
static std::string safe_getenv(std::string const& macro) {
auto var = getenv(macro.c_str());
return var? var : macro;
}
Here's a complete minimal implementation:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <cstdlib>  // getenv
#include <iostream>
#include <string>
static std::string safe_getenv(std::string const& macro) {
auto var = getenv(macro.c_str());
return var? var : macro;
}
std::string expand_env(std::string const& input)
{
using namespace boost::spirit::qi;
using boost::phoenix::bind;
static const rule<std::string::const_iterator, std::string()> compiled =
*(~char_("%")) [ _val+=_1 ]
% as_string ["%" >> +~char_("%") >> "%"] [ _val += bind(safe_getenv, _1) ];
std::string::const_iterator f(input.begin()), l(input.end());
std::string result;
parse(f, l, compiled, result);
return result;
}
int main()
{
std::cout << expand_env("Greeting is %HOME% world!\n");
}
This prints
Greeting is /home/sehe world!
on my box
Notes
this is not optimized (well, not beyond compiling the rule once)
replace_regex_copy would do just as nicely and might be more efficient (?)
see this answer for a slightly more involved 'expansion' engine: Compiling a simple parser with Boost.Spirit
using output iterator instead of std::string for accumulation
allowing nested variables
allowing escapes