Using boost::spirit to match words

I want to create a parser that will match exactly two alphanumeric words from a string, such as:
message1 message2
and then save that into two variables of type std::string.
I've read this previous answer, which handles an arbitrary number of repetitions using the following parser:
+qi::alnum % +qi::space
However when I try to do this:
bool const result = qi::phrase_parse(
input.begin(), input.end(),
+qi::alnum >> +qi::alnum,
+qi::space,
words
);
the words vector contains every single letter in a different string:
't'
'h'
'i'
's'
'i'
's'
This is extremely counter-intuitive, and I'm not sure as to why it's happening. Could someone please explain that?
Also, can I have two predefined strings to be populated instead of a std::vector?
Final note: I would like to avoid the using statement, as I would like to have every namespace clearly defined to help me understand how Spirit works.

Yes, but the skipper discards the whitespace before +qi::alnum ever sees it, so the parser has no word boundary to act on.
Use lexeme to control the skipper:
bool const result = qi::phrase_parse(
input.begin(), input.end(),
qi::lexeme [+qi::alnum] >> qi::lexeme [+qi::alnum],
qi::space,
words
);
Note the skipper should be qi::space instead of +qi::space.
See also Boost spirit skipper issues

Related

Boost Spirit optional parser and backtracking

Why does this parser leave 'b' in the attribute, even though the optional part wasn't matched?
namespace qi = boost::spirit::qi;
using namespace boost::spirit::qi;
std::string str = "abc";
auto a = char_("a");
auto b = char_("b");
qi::rule<std::string::iterator, std::string()> expr;
expr = +a >> -(b >> +a);
std::string res;
bool r = qi::parse(
str.begin(),
str.end(),
expr >> lit("bc"),
res
);
It parses successfully, but res is "ab".
If I parse "abac" with expr alone, the optional part is matched and the attribute is "aba".
Likewise with "aac": the optional part never starts to match and the attribute is "aa".
But with "ab" the attribute is "ab", even though b gets backtracked and, as in the example, is matched by the next parser.
Update:
With expr.name("expr"); and debug(expr); I got
<expr>
<try>abc</try>
<success>bc</success>
<attributes>[[a, b]]</attributes>
</expr>

Firstly, it's UB to use the auto variables to keep the expression templates, because they hold references to the temporaries "a" and "b" [1].
Instead write
expr = +qi::char_("a") >> -(qi::char_("b") >> +qi::char_("a"));
or, if you insist:
auto a = boost::proto::deep_copy(qi::char_("a"));
auto b = boost::proto::deep_copy(qi::char_("b"));
expr = +a >> -(b >> +a);
Now, the >> lit("bc") part hiding in the parse call suggests you may expect Spirit to backtrack over successfully matched tokens when a parse failure happens further down the road.
That doesn't happen: Spirit generates PEG grammars, and always greedily matches from left to right.
As for the sample given: "ab" results because, even though backtracking does occur, the effects on the attribute are not rolled back without qi::hold: Live On Coliru
Container attributes are passed along by reference, and the effects of previous (successful) expressions are not rolled back unless you tell Spirit to. This way, you "pay for what you use" (copying temporaries all the time would be costly).
See e.g.
boost::spirit::qi duplicate parsing on the output
Understanding Boost.spirit's string parser
Boost spirit revert parsing
<a>
<try>abc</try>
<success>bc</success>
<attributes>[a]</attributes>
</a>
<a>
<try>bc</try>
<fail/>
</a>
<b>
<try>bc</try>
<success>c</success>
<attributes>[b]</attributes>
</b>
<a>
<try>c</try>
<fail/>
</a>
<bc>
<try>bc</try>
<success></success>
<attributes>[]</attributes>
</bc>
Success: 'ab'
[1] see here:
Assigning parsers to auto variables
Generating Spirit parser expressions from a variadic list of alternative parser expressions
boost spirit V2 qi bug associated with optimization level

Quoting @sehe from this SO question:
A string attribute is a container attribute and many elements could be
assigned into it by different parser subexpressions. Now for
efficiency reasons, Spirit doesn't rollback the values of emitted
attributes on backtracking.
So I wrapped the optional parser in qi::hold, and that fixed it:
expr = +qi::char_("a") >> -(qi::hold[qi::char_("b") >> +qi::char_("a")]);
For more information, see the question mentioned above and the hold docs.

Vector push_back on duplicate strings with the help of delimiter

I am trying to read the PATH environment variable and remove any duplicates in it using vector functionality such as sort, erase and unique. But from what I've seen, vector delimits each element by newline by default. When I get the path as C:\Program Files(x86)\..., it breaks at C:/ Program. This is my code so far:
char *path = getenv("PATH");
char str[10012] = "";
strcpy(str,path);
string strr(str);
vector<string> vec;
stringstream ss(strr);
string s;
while(ss >> s)
{
vec.push_back(s);
}
sort(vec.begin(),vec.end());
vec.erase(unique(vec.begin(),vec.end()),vec.end());
for(unsigned i=0;i<vec.size();i++)
{
cout<<vec[i]<<endl;
}
Is it a delimiter problem? I need to push_back at every ; and then search for duplicates. Can anyone help me with this?

I would use a stringstream to chop it up, and then use a set to ensure there are no duplicates.
std::string p { std::getenv("PATH") };
std::set<std::string> set;
std::stringstream ss { p };
std::string s;
while(std::getline(ss, s, ':')) //this might need to be ';' for windows
{
set.insert(s);
}
for(const auto& elem : set)
std::cout << elem << std::endl;
Should you need to use a vector for some reason, you'd want to sort it with std::sort then remove duplicates with std::unique then erase the slack with erase.
std::sort(begin(vec), end(vec));
auto it=std::unique(begin(vec), end(vec));
vec.erase(it, end(vec));
EDIT: link to docs
http://en.cppreference.com/w/cpp/container/set
http://en.cppreference.com/w/cpp/algorithm/unique
http://en.cppreference.com/w/cpp/algorithm/sort

For this task it is better to use std::set<std::string> which will eliminate duplicates automatically. To read in PATH, use strtok to split it into substrings.

You need to use a different delimiter (':' or ';' to split the directories from the PATH, depending on the system). For instance, you can have a look at the std::getline() function to replace your current while () / push_back loop. This function allows you to specify a custom delimiter and would be a drop-in replacement in your code.
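As a rough sketch of that replacement, the loop below reads up to each delimiter with std::getline() and then applies the sort / unique / erase steps from the question. The helper name unique_path_entries and the choice to skip empty entries are assumptions made for illustration, not part of the original code.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `path` on `delimiter` and return the
// sorted, de-duplicated entries.
std::vector<std::string> unique_path_entries(const std::string& path, char delimiter) {
    std::vector<std::string> vec;
    std::stringstream ss(path);
    std::string s;
    // getline stops at `delimiter` instead of at whitespace, so
    // "C:\Program Files (x86)" stays in one piece.
    while (std::getline(ss, s, delimiter)) {
        if (!s.empty()) {          // assumption: ignore empty entries such as ";;"
            vec.push_back(s);
        }
    }
    std::sort(vec.begin(), vec.end());
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
    return vec;
}
```

It could be called as unique_path_entries(path, ';') on Windows or unique_path_entries(path, ':') elsewhere, after checking that std::getenv("PATH") did not return a null pointer.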

It isn't so much that std::vector<T> is delimiting anything but that the formatted input operator (operator>>()) for strings uses whitespace as the delimiter. Others have already posted about using std::getline() and the like. There are two other approaches:
Change what is considered to be whitespace for the stream! The std::string input operator uses the stream's std::locale object to obtain a std::ctype<char> facet which can be replaced. The std::ctype<char> facet has functions to do character classification and it can be used to consider, e.g., the character ';' as a space. It is a bit involved but a more solid approach than the next one.
I don't think path components can include newlines, i.e., a simple approach could be to replace all semicolons by newlines before reading the components:
std::string path(std::getenv("PATH"));
std::replace(path.begin(), path.end(), ';', '\n');
std::istringstream pin(path);
std::istream_iterator<std::string> pbegin(pin), pend;
std::vector<std::string> vec(pbegin, pend);
This approach has the problem that PATH components may themselves contain spaces: these would be split into individual strings. You might want to replace spaces with another character first (e.g., the now unused ';') and turn those back into spaces at an appropriate point afterwards.
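The first approach (changing what the stream considers whitespace) could look roughly like the sketch below. The names semicolon_is_space and split_on_semicolons are made up for illustration, and the decision to stop treating ' ' as whitespace is an assumption added so that components such as "Program Files" survive intact.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: ';' counts as whitespace, ' ' no longer does.
struct semicolon_is_space : std::ctype<char> {
    explicit semicolon_is_space(std::size_t refs = 0)
        : std::ctype<char>(make_table(), false, refs) {}

    static const mask* make_table() {
        // Start from the classic "C" table and adjust two entries.
        static const std::vector<mask> table = [] {
            std::vector<mask> t(classic_table(), classic_table() + table_size);
            t[static_cast<unsigned char>(' ')] &= ~space;
            t[static_cast<unsigned char>(';')] |= space;
            return t;
        }();
        return table.data();
    }
};

// Split `text` with operator>> after installing the facet on the stream.
std::vector<std::string> split_on_semicolons(const std::string& text) {
    std::istringstream in(text);
    // The locale takes ownership of the facet (refs == 0).
    in.imbue(std::locale(in.getloc(), new semicolon_is_space));
    std::istream_iterator<std::string> first(in), last;
    return std::vector<std::string>(first, last);
}
```

Because operator>> skips runs of whitespace, empty components (";;") simply disappear with this approach.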

Finding if a string is 'numeric only' using tr1 regex

Is it possible for me to detect whether a string is 'all numeric' using tr1 regex?
If yes, please help me with a snippet as well, since I am new to regex.
I am looking at tr1 regex for this because I don't want to create a separate function for detecting whether the string is numeric. I want to do it inline in the rest of the client code, but I don't want it to look ugly either. I feel tr1 regex might help, but I'm not sure. Any advice on this?

If you just want to test whether the string has all numeric characters, you can use std::find_if_not and std::isdigit:
std::find_if_not(s.begin(), s.end(), (int(*)(int))std::isdigit) == s.end()
If you do not have a Standard Library implementation with std::find_if_not, you can easily write it:
template <typename ForwardIt, typename Predicate>
ForwardIt find_if_not(ForwardIt first, ForwardIt last, Predicate pred)
{
for (; first != last; ++first)
if (!pred(*first))
return first;
return first;
}
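One caveat: calling std::isdigit with a negative char value is undefined behavior, so converting to unsigned char first is the safer variant. A minimal sketch, assuming C++11; the name is_all_digits is hypothetical, and treating the empty string as "not numeric" is a choice made here (the one-liner above would report true for it).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true if `s` is non-empty and every character
// is a decimal digit.
bool is_all_digits(const std::string& s) {
    // Taking the character as unsigned char keeps the argument to
    // std::isdigit within the range it is defined for.
    auto is_digit = [](unsigned char c) { return std::isdigit(c) != 0; };
    return !s.empty() &&
           std::find_if_not(s.begin(), s.end(), is_digit) == s.end();
}
```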

You can use the string::find_first_not_of member function to test for numeric characters.
if (mystring.find_first_not_of("0123456789") == std::string::npos)
{
std::cout << "numeric only!";
}

The regular expression for this is rather trivial. Just search for "\\D", which matches any character that's not a digit: if the search finds anything, the string is not purely numeric. If you'd like to allow a decimal separator too, you could use "[^\\d\\.]", which translates to "neither a digit nor a dot".
However, how about simply using strtol() to read the number? It hands back a pointer to the first character it could not convert, so if that pointer is at the end of the string, the whole string was a number. On the plus side, you won't even need TR1 for this.
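Both ideas can be sketched as follows. This assumes a C++11 compiler and uses std::regex, the standardized successor of std::tr1::regex; the function names are made up for illustration. Note that the two checks are not equivalent: strtol() also accepts leading whitespace and a sign.

```cpp
#include <cerrno>
#include <cstdlib>
#include <regex>
#include <string>

// Hypothetical helper: true if `s` is non-empty and contains no non-digit.
bool has_only_digits_regex(const std::string& s) {
    static const std::regex non_digit("\\D");
    return !s.empty() && !std::regex_search(s, non_digit);
}

// Hypothetical helper: true if strtol() consumes the whole string and
// the value fits in a long.
bool parses_as_long(const std::string& s) {
    if (s.empty()) {
        return false;
    }
    char* end = nullptr;
    errno = 0;
    std::strtol(s.c_str(), &end, 10);
    // `end` points at the first character that was not converted.
    return errno != ERANGE && end == s.c_str() + s.size();
}
```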

C++: storing CSV in container

I have a std::string that contains comma-separated values, and I need to store those values in a suitable container, e.g. an array, a vector or some other container. Is there a built-in function I could use for this, or do I need to write custom code?

If you're willing and able to use the Boost libraries, Boost Tokenizer would work really well for this task.
That would look like:
std::string str = "some,comma,separated,words";
typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
boost::char_separator<char> sep(",");
tokenizer tokens(str, sep);
std::vector<std::string> vec(tokens.begin(), tokens.end());

You basically need to tokenize the string using , as the delimiter. This earlier Stackoverflow thread shall help you with it.
Here is another relevant post.

I don't think there is anything for this in the standard library. I would approach it like this:
Tokenize the string on the , delimiter using strtok.
Convert each token to an integer using the atoi function.
push_back the value to the vector.
If you are comfortable with the Boost library, check this thread.
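The steps above can be sketched as follows, assuming the values really are integers (as the atoi step implies). The name parse_int_csv is hypothetical. Since strtok modifies its input, the sketch works on a private copy of the string.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: split `csv` on ',' and convert each token to int.
std::vector<int> parse_int_csv(const std::string& csv) {
    // strtok writes '\0' into the buffer, so copy the string first.
    std::vector<char> buffer(csv.begin(), csv.end());
    buffer.push_back('\0');

    std::vector<int> values;
    for (char* token = std::strtok(buffer.data(), ",");
         token != nullptr;
         token = std::strtok(nullptr, ",")) {
        // atoi yields 0 for non-numeric tokens; it reports no errors.
        values.push_back(std::atoi(token));
    }
    return values;
}
```

strtok skips empty fields and keeps hidden global state, so this is only suitable for simple, single-threaded use.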

Using AXE parser generator you can easily parse your csv string, e.g.
std::string input = "aaa,bbb,ccc,ddd";
std::vector<std::string> v; // your strings get here
auto value = *(r_any() - ',') >> r_push_back(v); // rule for single value
auto csv = *(value & ',') & value & r_end(); // rule for csv string
csv(input.begin(), input.end());
Disclaimer: I didn't test the code above, it might have some superficial errors.

Parsing a pair of ints with boost spirit

I have the following code:
std::string test("1.1");
std::pair<int, int> d;
bool r = qi::phrase_parse(
test.begin(),
test.end(),
qi::int_ >> '.' >> qi::int_,
space,
d
);
So I'm trying to parse the string test and place the result in the std::pair d. However it is not working, I suspect it has to do with the Compound Attribute Rules.
Any hints to how to get this working?
The compiler error is the following:
error: no matching function for call to 'std::pair::pair(const int&)'

It should work. What people forget very often is to add a
#include <boost/fusion/include/std_pair.hpp>
to their list of includes. This is necessary to make std::pair a full blown Fusion citizen.
