boost::regex_search - boost kills my brain cells, again - c++

Good programmers keep simple things easy right?
And it's not like the boost documentation makes your life less uneasy...
All I want is an implementation for:
// fulfils the function of a regex matching where the pattern may match a
// substring instead of the entire string
bool search( std::string, std::string, SomeResultsType )
So it can be used as in:
std::string text, pattern;
SomeResultsType match;
if( search( text, pattern, match ) )
{
std::string result = match[0];
if( match[1].matched )
// where this is the second capture group, not recapturing the same group
std::string secondMatch = match[1];
}
I want my client code not to be bothered with templates and iterators... I know, I'm a wuss. After peering for an hour over the template spaghetti in the boost docs for doing something so simple, I feel like my productivity is seriously getting hampered and I don't feel like I've learned anything from it.
boost::regex_match makes this pretty simple with boost::cmatch, except that it only matches the whole string, so I've been adapting all my patterns to match whole strings. That feels like a dirty hack and I would prefer a more proper solution. If I had known it would take this long, I would have stuck with regex_match.
Also welcome, a copy of Reading boost documentation for dummies
Next week in Keep it simple and easy with boost, function binders! No, just kidding, I wouldn't do that to anyone.
Thanks for all help

I think you want regex_search: http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/ref/regex_search.html
Probably this overload is the one you want:
bool regex_search(const basic_string<charT, ST, SA>& s,
                  match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m,
                  const basic_regex<charT, traits>& e,
                  match_flag_type flags = match_default);
That seems to match what you wanted - SomeResultsType is smatch, and you need to convert your pattern to a regex first.
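A minimal sketch of the wrapper the question asks for. It uses std::regex from C++11 rather than Boost.Regex, on the assumption that a standard-library version is acceptable; the Boost calls have the same shape (boost::regex, boost::smatch, boost::regex_search). The function name search comes from the question, everything else is illustrative.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper: client code passes plain strings and gets a match
// object back, without touching iterators or templates.
// Note: std::smatch stores iterators into `text`, so `text` must stay alive
// for as long as `match` is used.
bool search(const std::string& text, const std::string& pattern, std::smatch& match) {
    const std::regex re(pattern);               // compile the pattern
    return std::regex_search(text, match, re);  // true if any substring matches
}
```

With this, match[0] is the whole matched substring and match[1], match[2], ... are the capture groups, each convertible to std::string. Compiling the pattern on every call is wasteful if the same pattern is reused; in that case, keeping a std::regex object around is the usual choice.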

On Windows, you can use the .NET Regex class:
Example (copied from the linked page):
#using <System.dll>
using namespace System;
using namespace System::Text::RegularExpressions;
int main()
{
// Define a regular expression for repeated words.
Regex^ rx = gcnew Regex( "\\b(?<word>\\w+)\\s+(\\k<word>)\\b",static_cast<RegexOptions>(RegexOptions::Compiled | RegexOptions::IgnoreCase) );
// Define a test string.
String^ text = "The the quick brown fox fox jumped over the lazy dog dog.";
// Find matches.
MatchCollection^ matches = rx->Matches( text );
// Report the number of matches found.
Console::WriteLine( "{0} matches found.", matches->Count );
// Report on each match.
for each (Match^ match in matches)
{
String^ word = match->Groups["word"]->Value;
int index = match->Index;
Console::WriteLine("{0} repeated at position {1}", word, index);
}
}

Related

Checking if a string contains more than just keywords C++

Thank you for clicking on my question.
After countless hours of searching, I have not come across a solution, and it's quite difficult to search for something you don't know how to phrase properly. Please help me out, I would appreciate it.
The data of the string would be like:
std::string keyword1 = "Hello";
std::string keyword2 = "Ola";
std::string test = keyword1 + keyword2 + keyword2;
Example of what I'm trying to achieve as a pseudocode:
if(test.contains(more than the 2 keywords))
I wanna make sure the string has other text than just the keywords above.
You can remove all instances of these keywords from your data and see what's left. It's not terribly efficient but shouldn't matter for reasonably sized inputs.
bool contains_more_than(std::vector<std::string> const& keywords, std::string sample) {
for (std::string const& keyword: keywords) {
size_t pos;
while ((pos = sample.find(keyword)) != sample.npos) {
sample.replace(pos, keyword.size(), "");
}
}
return !sample.empty();
}
Note that this might fail if some keyword is a substring of another:
contains_more_than({"123", "12345"}, "12345") returns true.
To avoid this you can first sort your keywords by std::string::size:
// Requires <algorithm>; keywords must be a mutable copy, not the const reference.
std::sort(keywords.begin(), keywords.end(),
          [](std::string const& s1, std::string const& s2) {
              return s1.size() > s2.size();
          });
Now:
contains_more_than({"12345", "123"}, "12345") returns false.
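Putting the removal loop and the sort together, a sketch might look like the following. The name contains_more_than_sorted is hypothetical, the keywords are taken by value so they can be sorted, and the guard against empty keywords is an addition (an empty keyword would otherwise be "found" forever).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical combined version: sorts longest keyword first, then strips
// every occurrence of each keyword and reports whether anything is left.
bool contains_more_than_sorted(std::vector<std::string> keywords, std::string sample) {
    std::sort(keywords.begin(), keywords.end(),
              [](const std::string& a, const std::string& b) {
                  return a.size() > b.size();
              });
    for (const std::string& keyword : keywords) {
        if (keyword.empty()) continue;  // find("") always succeeds: skip it
        std::string::size_type pos;
        while ((pos = sample.find(keyword)) != std::string::npos) {
            sample.erase(pos, keyword.size());
        }
    }
    return !sample.empty();
}
```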
A possible solution: expressed as a regular expression, you are testing whether the string matches ^(Hello|Ola)*$. That is, does the whole string match any number of repeats of "Hello" and/or "Ola" (and with nothing else)? You can use the regex standard library to match regular expressions in C++.
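As a rough sketch of the regex approach, with the two keywords from the question hard-coded: std::regex_match only succeeds when the whole string matches, so the ^ and $ anchors can be left out. The function name has_other_text is made up for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `sample` contains anything other than
// repetitions of the keywords "Hello" and "Ola".
bool has_other_text(const std::string& sample) {
    static const std::regex only_keywords("(Hello|Ola)*");
    // regex_match requires the entire string to match the pattern.
    return !std::regex_match(sample, only_keywords);
}
```

For keywords that are only known at run time, the alternation would have to be built as a string, and any regex metacharacters in the keywords would need escaping first.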

How to use the result of std::regex_search?

I'm simply calling
std::smatch m;
if (std::regex_search
(std::string (strT.GetString ()),
m,
std::regex ("((\\d[\\s_\\-.]*){10,13})")))
{
...
}
I can't for the life of me figure out how to extract the matched values from m.
EVERY SINGLE page on the subject writes it to cout which is worthless to me. I just want to get what's been captured in a string, but no matter what I try it crashes with a "string iterators incompatible" error message.
OK so I tried a few more things and got annoyed at a lot more, most notably about how the same code worked in online testers but not on my computer. I've come down to this
std::string s (strT.GetString ()) ;
std::smatch m;
if (std::regex_search (
s,
m,
std::regex ("((\\d[\\s_\\-.]*){10,13})")))
{
std::string v = m[ 0 ] ;
}
working, but this
std::smatch m;
if (std::regex_search (
std::string (strT.GetString ()),
m,
std::regex ("((\\d[\\s_\\-.]*){10,13})")))
{
std::string v = m[ 0 ] ;
}
Not Working For Some Reason (with the incompatible string iterator error thingy).
There's surely some trick to it. I'll let someone who knows explain it.
You are correct that you can just assign the match to a std::string; you don't have to use the stream insertion feature.
However, your third example crashes because std::smatch holds iterators to positions in the original source string, which in your crashy case is the temporary std::string built from strT.GetString(). That temporary is destroyed as soon as the regex_search call finishes, so the iterators in m are left dangling.
Your second example is correct.
I concede that the C++ regex implementation is not entirely intuitive at first glance.
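A small sketch of the safe pattern, assuming the source text is already available as a std::string: the match is copied out while the searched string is still alive. The helper name first_digit_run is hypothetical; the pattern is the one from the question. Since C++14 the standard library also deletes the regex_search overload that takes a temporary std::string together with a match object, so the crashing form no longer compiles on conforming implementations.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first run of 10 to 13 digits (optionally
// separated by whitespace, '_', '-' or '.'), or an empty string if none.
std::string first_digit_run(const std::string& source) {
    static const std::regex re("((\\d[\\s_\\-.]*){10,13})");
    std::smatch m;  // stores iterators into `source`, not copies of the text
    if (std::regex_search(source, m, re)) {
        return m.str(0);  // copy the matched text while `source` is alive
    }
    return std::string();
}
```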

Prebuilt function to find character sequence in a string?

I'm working on a multithreading project where for one segment of the project I need to find if a given character sequence exists within a string. I'm wondering if C++/C have any pre-built functions which can handle this, but am having trouble figuring out the exact 'definition' to search for.
I know about 'strstr' and 'find'; the issue is that the function needs to be able to find a sequence which is SPLIT across a string.
Given the string 'Hello World', I need a function that returns true if the sequence 'H-W-l' exists. Is there anything prebuilt which can handle this?
As far as I know, subsequence searching as such is not part of either the standard C library or the standard C++ library.
However, you can express subsequence searching as either a regular expression or a "glob". Posix mandates both regex and glob matching functions, while the C++ standard library includes regular expressions since C++11. Both of these techniques require modifying the search string:
Regular expression: HWl ⇒ H.*W.*l. regexec will do a search for the regular expression (unless anchored, which this one is not); in C++, you would want to use std::regex_search rather than std::regex_match.
Glob: HWl ⇒ *H*W*l*. Glob matching is always a complete match, although in all the implementations I know of a trailing * is optimized. This is available as the fnmatch function in the Posix header fnmatch.h. For this application, provide 0 for the flags parameter.
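The regular-expression variant could be sketched like this in C++. The function name has_subsequence_regex is hypothetical, and the escaping of metacharacters is an addition so that needles such as "a.b" are treated literally. Note that in the default ECMAScript grammar .* does not cross line breaks.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turns "HWl" into the pattern "H.*W.*l" and searches
// for it anywhere in the haystack.
bool has_subsequence_regex(const std::string& haystack, const std::string& needle) {
    static const std::string special = "\\^$.|?*+()[]{}";
    std::string pattern;
    for (std::string::size_type i = 0; i < needle.size(); ++i) {
        if (i > 0) pattern += ".*";                 // gap between characters
        if (special.find(needle[i]) != std::string::npos)
            pattern += '\\';                        // escape regex metacharacters
        pattern += needle[i];
    }
    return std::regex_search(haystack, std::regex(pattern));
}
```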
If you don't like any of the above, you can use the standard C strchr function in a simple loop:
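// Requires <cstring> for strchr.
bool has_subsequence(const char* haystack, const char* needle) {
    const char* p = haystack;
    // Find each needle character in turn, resuming just past the previous hit
    // so that one haystack character cannot satisfy two needle characters.
    for (; *needle && (p = strchr(p, *needle)); ++needle, ++p) {
    }
    return p != NULL;
}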
If I understand correctly, you're trying to search for chars that appear in a given order but aren't necessarily contiguous. If you're in C++, I don't see why you couldn't use the std::find function from the <algorithm> header. I would load both into a string and then search as follows:
bool has_noncontig_sequence(const std::string& str, const std::string& subStr)
{
    typedef std::string::const_iterator iter;
    iter start = str.begin();
    // loop over subStr, moving start forward past each character found
    for (iter i = subStr.begin(); i != subStr.end(); ++i) {
        start = std::find(start, str.end(), *i);
        if (start == str.end())
            return false;  // character not found in the rest of str
        ++start;           // the next search begins after this match
    }
    return true;
}
The std::find function will position start over the first correct character in str if it can find it and then search for the next. If it can't, then start will be positioned at the end, indicating failure.

Parse string into an unknown number of regex groups in C++

I know the exact format of the text I should be getting. In particular, it should match a regex with a variable number of groups.
I want to use the C++ regex library to determine (a) if it is valid text, and (b) to parse those groups into a vector. How can I do this? I can find examples online to do (a), but not (b).
#include <string>
#include <regex>
#include <vector>
bool parse_this_text(std::string & text, std::vector<std::string> & group) {
// std::string text_regex = "^([a-z]*)(,[0-9]+)*$"
// if the text matches the regex, return true and parse each group into the vector
// else return false
???
}
Such that the following lines of code return the expected results.
std::vector<std::string> group;
parse_this_text("green,1", group);
// should return true with group = {"green", ",1"};
parse_this_text("yellow", group);
// should return true with group = {"yellow"};
parse_this_text("red,1,2,3", group);
// should return true with group = {"red", ",1", ",2", ",3"};
parse_this_text("blue,1.0,3.0,1,a", group);
// should return false (since it doesn't match the regex)
Thanks!
(?=^([a-zA-Z]*)(?:\,\d+)+$)^.*?(?:((?:\,\d+)+)).*?$
You can use this. It first validates the input using a lookahead and then returns 2 groups:
1) the name
2) all the remaining integers (this can easily be split afterwards; in Python you could use re.findall here)
Though it does not answer your question fully, it might be of help.
Have a look.
http://regex101.com/r/wE3dU7/3
One option is to scan the string twice, the first time to check for validity and the second time to split it into fields. With the example in the OP, you don't really need regexen to split the line, once you know that it is correct; you can simply split on commas. But for the sake of exposition, you could use a std::regex_token_iterator (assuming you have a C++ library which supports those), something like this:
bool parse_this_text(const std::string& s, std::vector<std::string>& result) {
static const std::regex check("[[:alpha:]][[:alnum:]]*(,[[:digit:]]+)*",
std::regex_constants::nosubs);
static const std::regex split(",");
if (!std::regex_match(s, check))
return false;
std::sregex_token_iterator tokens(s.begin(), s.end(), split, -1);
result.clear();
std::copy(tokens, std::sregex_token_iterator(), std::back_inserter(result));
return true;
}
For more complicated cases, or applications in which the double scan is undesired, you can tokenize using successive calls to std::regex_search(), supplying the end of the previous match as the starting point, and std::regex_constants::match_continuous as the match flag; that anchors each search to the character after the previous match. You could, in that case, use a std::regex_iterator, but I'm not convinced that the resulting code is any simpler.
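A sketch of that single-pass approach, using the pattern from the question (lowercase letters, then any number of ",digits" groups). The name parse_in_one_pass and the split into two sub-patterns are assumptions made for illustration; each call to std::regex_search starts where the previous match ended and must match right there because of match_continuous.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical single-pass tokenizer: validates and splits at the same time.
bool parse_in_one_pass(const std::string& s, std::vector<std::string>& result) {
    static const std::regex first("[a-z]*");   // leading name (may be empty)
    static const std::regex rest(",[0-9]+");   // each following ",number" group
    const auto flags = std::regex_constants::match_continuous;

    result.clear();
    std::smatch m;
    std::string::const_iterator pos = s.cbegin();

    if (!std::regex_search(pos, s.cend(), m, first, flags))
        return false;
    result.push_back(m.str());
    pos = m[0].second;                         // resume after the name

    while (pos != s.cend()) {
        if (!std::regex_search(pos, s.cend(), m, rest, flags)) {
            result.clear();                    // trailing text does not fit
            return false;
        }
        result.push_back(m.str());
        pos = m[0].second;                     // resume after this group
    }
    return true;
}
```

Unlike the split-on-comma version, this keeps the leading comma in each group, which is what the expected results in the question show.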

Using regex_search

I'm trying to parse some text to figure out if it's a link or not. This is the code I have:
smatch m;
regex e("http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+");
bool match = regex_search(proto->message, m, e);
if(match == true) {
chan->Say("Matched link");
}
The error I'm getting is:
/main.cpp|26|error: no matching function for call to ‘regex_search(char [1024], boost::smatch&, boost::regex&)’|
When I take the m out of regex_search it works and returns a boolean, but I want know what the actual match was.
Boost's regex_search has no overload matching the call you're making. boost::smatch holds std::string iterators, so it only pairs with a std::string argument; proto->message is a char array, which needs boost::cmatch instead (or copy the buffer into a std::string first and search that). See the docs, and in particular the example at the bottom of the page.
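To get at the matched text from a char buffer, something along these lines should work. This sketch uses std::regex and std::cmatch so that it is self-contained; Boost.Regex offers the same shape with boost::regex and boost::cmatch. The helper name first_link and the simplified URL pattern are assumptions, not the pattern from the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first URL-like substring of a
// NUL-terminated C string, or an empty string if there is none.
std::string first_link(const char* message) {
    static const std::regex link("https?://[^\\s]+");  // simplified pattern
    std::cmatch m;  // match_results over const char*, pairs with C strings
    if (std::regex_search(message, m, link)) {
        return m.str(0);  // the whole matched link
    }
    return std::string();
}
```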