How to use the result of std::regex_search?

I'm simply calling
std::smatch m;
if (std::regex_search
(std::string (strT.GetString ()),
m,
std::regex ("((\\d[\\s_\\-.]*){10,13})")))
{
...
}
I can't for the life of me figure out how to extract the matched values from m.
EVERY SINGLE page on the subject writes it to cout which is worthless to me. I just want to get what's been captured in a string, but no matter what I try it crashes with a "string iterators incompatible" error message.
OK so I tried a few more things and got annoyed at a lot more, most notably about how the same code worked in online testers but not on my computer. I've come down to this
std::string s (strT.GetString ()) ;
std::smatch m;
if (std::regex_search (
s,
m,
std::regex ("((\\d[\\s_\\-.]*){10,13})")))
{
std::string v = m[ 0 ] ;
}
working, but this
std::smatch m;
if (std::regex_search (
std::string (strT.GetString ()),
m,
std::regex ("((\\d[\\s_\\-.]*){10,13})")))
{
std::string v = m[ 0 ] ;
}
Not Working For Some Reason (with the incompatible string iterator error thingy).
There's surely some trick to it. I'll let someone who knows explain it.

You are correct that you can just assign the match to a std::string; you don't have to use the stream insertion feature.
However, your third example crashes because std::smatch does not copy the matched text; it holds iterators into the original source string. In the crashing case that source is the temporary std::string (strT.GetString ()), which is destroyed as soon as the regex_search call finishes, so m is left pointing into freed memory.
Your second example is correct.
I concede that the C++ regex implementation is not entirely intuitive at first glance.

Related

Checking if a string contains more than just keywords C++

Thank you for clicking on my question.
After countless hours of searching, I have not come across a solution, and it's quite difficult to search for something you don't know how to properly phrase in a search. Please help me out, I would appreciate it.
The data of the string would be like:
std::string keyword1 = "Hello";
std::string keyword2 = "Ola";
std::string test = keyword1 + keyword2 + keyword2;
Example of what I'm trying to achieve as pseudocode:
if (test.contains(more than the 2 keywords))
I wanna make sure the string has other text than just the keywords above.

You can remove all instances of these keywords from your data and see what's left. It's not terribly efficient but shouldn't matter for reasonably sized inputs.
bool contains_more_than(std::vector<std::string> const& keywords, std::string sample) {
for (std::string const& keyword: keywords) {
size_t pos;
while ((pos = sample.find(keyword)) != sample.npos) {
sample.replace(pos, keyword.size(), "");
}
}
return !sample.empty();
}
Note that this might fail if some keyword is a substring of another:
contains_more_than({"123", "12345"}, "12345") returns true.
To avoid this, first sort the keywords by std::string::size, longest first (this needs a non-const copy of keywords):
std::sort(keywords.begin(), keywords.end(),
          [](std::string const& s1, std::string const& s2) {
              return s1.size() > s2.size();
          });
Now:
contains_more_than({"12345", "123"}, "12345") returns false.

A possible solution: expressed as a regular expression, you are testing whether the string matches ^(Hello|Ola)*$. That is, does the whole string match any number of repeats of "Hello" and/or "Ola" (and with nothing else)? You can use the regex standard library to match regular expressions in C++.

Regex matches under g++ 4.9 but fails under g++-5.3.1

I am tokenizing a string with a regex; this works normally under g++-4.9, but fails under g++-5.3.1.
I have the following txt file:
0001-SCAND ==> "Scandaroon" (from Philjumba)
0002-KINVIN ==> "King's Vineyard" (from Philjumba)
0003-HANNI ==> "Hannibal: Rome vs. Carthage" (from Philjumba)
0004-LOX ==> "Lords of Xidit" (from Philjumba)
which I am tokenizing using regular expressions, by spaces, quotation marks pairs and parentheses pairs. For example, the first line should be tokenized as follows:
0001-SCAND
==>
"Scandaroon"
(from Philjumba)
I have written the following std::regex:
std::regex FPAT("(\\S+)|(\"[^\"]*\")|(\\([^\\)]+\\))");
And I am tokenizing the string with:
std::vector<std::string>
split( const std::string & input, const std::regex & regex ) {
std::sregex_token_iterator
first{input.begin(), input.end(), regex, 0},
last;
return {first, last};
}
This returns the matches. Under g++-4.9 the string is tokenized as requested, but under g++-5.3.1 it's tokenized as follows:
0001-SCAND
==>
"Scandaroon"
(from
Philjumba)
or the third line is tokenized as follows:
0003-HANNI
==>
"Hannibal:
Rome
vs.
Carthage"
(from
Philjumba)
What could the issue be?
edit: I am calling the function as follows:
std::string line("0001-SCAND ==> \"Scandaroon\" (from Philjumba)");
auto elems = split( line, FPAT );
edit: following feedback from @xaxxon, I replaced returning the iterator by a vector, but it's still not working correctly under g++-5.3.
std::vector<std::string>
split( const std::string & input, const std::regex & regex ) {
std::sregex_token_iterator
first{input.begin(), input.end(), regex, 0},
last;
std::vector< std::string > elems;
elems.reserve( std::distance(first,last) );
for ( auto it = first; it != last; ++ it ) {
//std::cout << (*it) << std::endl;
elems.push_back( *it );
}
return elems;
}

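Regular expression alternation is eager: alternatives are tried from left to right and the first one that matches wins.
So for the regular expression "Set|SetValue" and the text "SetValue", the regex finds "Set".
You have to choose the order carefully:
std::regex FPAT(R"(("[^\"]*\")|(\([^\)]+\))|(\S+))");
with \S+ at the end so that it is considered last.
Another alternative is to not use the default option (see http://en.cppreference.com/w/cpp/regex/syntax_option_type)
and use std::regex::extended, which follows the POSIX leftmost-longest rule:
std::regex FPAT(R"((\S+)|("[^\"]*\")|(\([^\)]+\)))", std::regex::extended);
(\S is not defined by POSIX extended syntax, so whether it is accepted there may depend on the implementation; [^[:space:]]+ is the portable spelling.)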
So it seems that g++-5.3.1 has fixed a bug since g++-4.9 in this regard.

You don't post enough for me to know for sure (you updated it showing you are calling it with an lvalue, so this post probably doesn't pertain, but I'll leave it up unless people want me to take it down), but if you're doing what I did, you forgot that the iterators are into the source string and that string is no longer valid.
You could remove the const from input, but it's so damn convenient to be able to put an rvalue there, so.....
Here's what I do to avoid this - I return a unique_ptr to something that looks like the results, but I hide the actual source string along with it so the string can't go away before I'm done using it. This is likely UB, but I think it will work virtually all the time:
// Holds a regex match as well as the original source string so the matches remain valid as long as the
// caller holds on to this object - but it acts just like a std::smatch
struct MagicSmatch {
std::smatch match;
std::string data;
// constructor makes a copy of the string and associates
// the copy's lifetime with the iterators into the string (the smatch)
MagicSmatch(const std::string & data) : data(data)
{}
};
// this deleter knows about the hidden string and makes sure to delete it
// this cast is probably UB because std::smatch isn't a standard layout type
struct MagicSmatchDeleter {
void operator()(std::smatch * smatch) {
delete reinterpret_cast<MagicSmatch *>(smatch);
}
};
// the caller just thinks they're getting a smatch ptr.. but we know the secret
std::unique_ptr<std::smatch, MagicSmatchDeleter> regexer(const std::regex & regex, const std::string & source)
{
auto magic_smatch = new MagicSmatch(source);
std::regex_search(magic_smatch->data, magic_smatch->match, regex);
return std::unique_ptr<std::smatch, MagicSmatchDeleter>(reinterpret_cast<std::smatch *>(magic_smatch));
}
as long as you call it as auto results = regexer(....) then it's quite easy to use, though results is a pointer, not a proper smatch, so the [] syntax doesn't work as nicely.

boost::regex_search refuses to take my arguments

I'm struggling with this one and I'm at a point where I'm not making any headway, so it's time to ask for help. My familiarity with the boost libraries is only slightly better than superficial. I'm trying to do a progressive scan through a rather large string. In fact, it's the entire contents of a file read into a std::string object (the file isn't going to be that large, it's the output from a command line program).
The output of this program, pnputil, is repetitive. I'm looking for certain patterns in an effort to find the "oemNNN.inf" file I want. Essentially, my algorithm is to find the first "oemNNN.inf", search for identifying characteristics for that file. If it's not the one I want, move on to the next.
In code, it's something like:
std::string filesContents;
std::string::size_type index(filesContents.find_first_of("oem"));
std::string::iterator start(filesContents.begin() + index);
boost::match_results<std::string::const_iterator> matches;
while(!found) {
if(boost::regex_search(start, filesContents.end(), matches, re))
{
// do important stuff with the matches
found = true; // found is used outside of loop too
break;
}
index = filesContents.find_first_of("oem", index + 1);
if(std::string::npos == index) break;
start = filesContents.begin() + index;
}
I'm using this example from the boost library documentation for 1.47 (the version I'm using). Someone please explain to me how my usage differs from what this example has (aside from the fact that I'm not storing stuff into maps and such).
From what I can tell, I'm using the same type of iterators the example uses. Yet, when I compile the code, Microsoft's compiler tells me that: no instance of overloaded function boost::regex_search matches argument list. Yet, the intellisense shows this function with the arguments I'm using, although the iterators are named something BidiIterator. I don't know the significance of this, but given the example, I'm assuming that whatever the BidiIterator is, it takes a std::string::iterator for construction (perhaps a bad assumption, but seems to make sense given the example). The example does show a fifth argument, match_flags, but that argument is defaulted to the value: boost::match_default. Therefore, it should be unnecessary. However, just for kicks and grins, I've added that fifth argument and still it doesn't work. How am I misusing the arguments? Especially, when considering the example.
Below is a simple program which demonstrates the problem without the looping algorithm.
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main() {
std::string haystack("This is a string which contains stuff I want to find");
boost::regex needle("stuff");
boost::match_results<std::string::const_iterator> what;
if(boost::regex_search(haystack.begin(), haystack.end(), what, needle, boost::match_default)) {
std::cout << "Found some matches" << std::endl;
std::cout << what[0].first << std::endl;
}
return 0;
}
If you decide to compile, I am compiling and linking against 1.47 of the boost library. The project that I'm working with uses this version extensively and updating isn't for me to decide.
Thanks for any help. This is most frustrating.
Andy

The iterator types are different.
std::string haystack("This is a string which contains stuff I want to find");
Because haystack is not const, the values returned by begin() and end() are std::string::iterator.
But your match type is
boost::match_results<std::string::const_iterator> what;
std::string::iterator and std::string::const_iterator are different types, so no regex_search overload fits both the iterators and the match object.
So there are a few options:
declare the string as const (i.e. const std::string haystack("...");)
declare the iterators as const_iterators (i.e. std::string::const_iterator begin = haystack.begin(), end = haystack.end();) and pass them to regex_search
use boost::match_results<std::string::iterator> what;
if you have C++11, use haystack.cbegin() and haystack.cend()

boost::regex_search - boost kills my brain cells, again

Good programmers keep simple things easy right?
And it's not like the boost documentation makes your life less uneasy...
All I want is an implementation for:
// fulfils the function of a regex matching where the pattern may match a
// substring instead of the entire string
bool search( std::string, std::string, SomeResultsType )
So it can be used as in:
std::string text, pattern;
SomeResultsType match;
if( search( text, pattern, match ) )
{
std::string result = match[0];
if( match[1].matched )
// where this is the second capture group, not recapturing the same group
std::string secondMatch = match[1];
}
I want my client code not to be bothered with templates and iterators... I know, I'm a wuss. After peering for an hour over the template spaghetti in the boost docs for doing something so simple, I feel like my productivity is seriously getting hampered and I don't feel like I've learned anything from it.
boost::regex_match does it pretty simple with boost::cmatch, except that it only matches the whole string, so I've been adapting all my patterns to match the whole strings, but I feel that it is a dirty hack and would prefer some more proper solution. If I would have known it would take this long, I would have stuck with regex_match
Also welcome, a copy of Reading boost documentation for dummies
Next week in Keep it simple and easy with boost, function binders! No, just kidding, I wouldn't do that to anyone.
Thanks for all help

I think you want regex_search: http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/ref/regex_search.html
Probably this overload is the one you want:
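template <class ST, class SA, class Allocator, class charT, class traits>
bool regex_search(const basic_string<charT, ST, SA>& s,
                  match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m,
                  const basic_regex<charT, traits>& e,
                  match_flag_type flags = match_default);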
That seems to match what you wanted - SomeResultsType is smatch, and you need to convert your pattern to a regex first.

On Windows, you can use the .NET Regex class:
Example (copied from the linked page):
#using <System.dll>
using namespace System;
using namespace System::Text::RegularExpressions;
int main()
{
// Define a regular expression for repeated words.
Regex^ rx = gcnew Regex( "\\b(?<word>\\w+)\\s+(\\k<word>)\\b",static_cast<RegexOptions>(RegexOptions::Compiled | RegexOptions::IgnoreCase) );
// Define a test string.
String^ text = "The the quick brown fox fox jumped over the lazy dog dog.";
// Find matches.
MatchCollection^ matches = rx->Matches( text );
// Report the number of matches found.
Console::WriteLine( "{0} matches found.", matches->Count );
// Report on each match.
for each (Match^ match in matches)
{
String^ word = match->Groups["word"]->Value;
int index = match->Index;
Console::WriteLine("{0} repeated at position {1}", word, index);
}
}

Boost phoenix or lambda library problem: removing elements from a std::vector

I recently ran into a problem that I thought boost::lambda or boost::phoenix could help me solve, but I was not able to get the syntax right, so I did it another way. What I wanted to do was remove all the elements in "strings" that were shorter than a certain length and not in another container.
This is my first try:
std::vector<std::string> strings = getstrings();
std::set<std::string> others = getothers();
strings.erase(std::remove_if(strings.begin(), strings.end(), (_1.length() < 24 && others.find(_1) == others.end())), strings.end());
How I ended up doing it was this:
struct Discard
{
bool operator()(std::set<std::string> &cont, const std::string &s)
{
return cont.find(s) == cont.end() && s.length() < 24;
}
};
lines.erase(std::remove_if( lines.begin(), lines.end(), boost::bind<bool>(Discard(), old_samples, _1)), lines.end());

You need boost::lambda::bind to lambda-ify function calls; for example, the length < 24 part becomes:
bind(&string::length, _1) < 24
EDIT
See "Head Geek"'s post for why set::find is tricky. He got it to resolve the correct set::find overload (so I copied that part), but he missed an essential boost::ref() -- which is why the comparison with end() always failed (the container was copied).
int main()
{
vector<string> strings = getstrings();
set<string> others = getothers();
set<string>::const_iterator (set<string>::*findFn)(const std::string&) const = &set<string>::find;
strings.erase(
remove_if(strings.begin(), strings.end(),
bind(&string::length, _1) < 24 &&
bind(findFn, boost::ref(others), _1) == others.end()
), strings.end());
copy(strings.begin(), strings.end(), ostream_iterator<string>(cout, ", "));
return 0;
}

The main problem, other than the bind calls (Adam Mitz was correct on that part), is that std::set<std::string>::find is an overloaded function, so you can't specify it directly in the bind call. You need to tell the compiler which find to use, like so:
using namespace boost::lambda;
typedef std::vector<std::string> T1;
typedef std::set<std::string> T2;
T1 strings = getstrings();
T2 others = getothers();
T2::const_iterator (T2::*findFn)(const std::string&) const=&T2::find;
T2::const_iterator othersEnd=others.end();
strings.erase(std::remove_if(strings.begin(), strings.end(),
(bind(&std::string::length, _1) < 24
&& bind(findFn, boost::ref(others), _1) == othersEnd)),
strings.end());
This compiles, but it doesn't work properly, for reasons I haven't yet figured out... the find function is never returning others.end(), so it's never deleting anything. Still working on that part.
EDIT: Correction, the find function is returning others.end(), but the comparison isn't recognizing it. I don't know why.
LATER EDIT: Thanks to Adam's comment, I see what was going wrong, and have corrected the problem. It now works as intended.
(Look at the edit history if you want to see my full test program.)
