Using regex_search - c++

I'm trying to parse some text to figure out if it's a link or not. This is the code I have:
smatch m;
regex e("http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+");
bool match = regex_search(proto->message, m, e);
if(match == true) {
chan->Say("Matched link");
}
The error I'm getting is:
/main.cpp|26|error: no matching function for call to ‘regex_search(char [1024], boost::smatch&, boost::regex&)’|
When I take the m out of regex_search it works and returns a boolean, but I want know what the actual match was.

Boost's regex_search does not define such a signature that you're trying to use. In none of the overloads is boost::smatch the second parameter. In fact, smatch isn't used at all. See docs and in particular the example at the bottom of the page.

Related

How to match a vector of regular expression with a one string?

If I want to verify one string is completely matches with the any one in the vector of strings then i will use
std::find(vectOfStrings.begin(), vectOfStrings.end(), "<targetString>") != v.end()
If the target string matches with any of the string in the vector then it will return true.
But what if i want to check one string is matches with any one of the vector of regular expressions?
Is there any standard library i can use to make it work like
std::find(vectOfRegExprsns.begin(), vectOfRegExprsns.end(), "<targetString>") != v.end()?
Any suggestions would be highly appreciated.
How about using std::find_if() with a lambda?
std::find_if(
vectOfRegExprsns.begin(), vectOfRegExprsns.end(),
[](const std::string& item) { return regex_match(item, std::regex(targetString))});

Prebuilt function to find character sequence in a string?

I'm working on a multithreading project where for one segment of the project I need to find if a given character sequence exists within a string. Im wondering if C++/C have any pre-built functions which can handle this, but am having trouble figuring out the exact 'definition' to search for.
I know about 'strtr' and 'find', the issue is the function needs to be able to find a sequence which is SPLIT across a string.
Given the string 'Hello World', I need a function that returns true if the sequence 'H-W-l' exists. Is there anything prebuilt which can handle this?
As far as I know, subsequence searching as such is not part of either the standard C library or the standard C++ library.
However, you can express subsequence searching as either a regular expression or a "glob". Posix mandates both regex and glob matching functions, while the C++ standard library includes regular expressions since C++11. Both of these techniques require modifying the search string:
Regular expression: HWl ⇒ H.*W.*l. regexec will do a search for the regular expression (unless anchored, which this one is not); in C++, you would want to use std::regex_search rather than std::regex_match.
Glob: HWl ⇒ *H*W*l*. Glob matching is always a complete match, although in all the implementations I know of a trailing * is optimized. This is available as the fnmatch function in the Posix header fnmatch.h. For this application, provide 0 for the flags parameter.
If you don't like any of the above, you can use the standard C strchr function in a simple loop:
bool has_subsequence(const char* haystack, const char* needle) {
const char* p;
for (p = haystack; *needle && (p = strchr(p, *needle)); ++needle) {
}
return p != NULL;
}
If I understand correctly, then you're trying to search for chars in a given order but aren't necessarily contiguous. If you're in C++, I don't see why you couldn't use the std::find function under the <algorithm> system header. I would load both into a string and then search as follows:
bool has_noncontig_sequence(const std::string& str, const std::string& subStr)
{
typedef std::string::const_iterator iter;
iter start = str.begin();
// loop over substr and save iterator position;
for (iter i = subStr.begin(); i != subStr.end(); ++i)
start = std::find(start, str.end(), *i);
// check position, if at end, then false;
return start != str.end() ? true : false;
}
The std::find function will position start over the first correct character in str if it can find it and then search for the next. If it can't, then start will be positioned at the end, indicating failure.

Parse string into and unknown amount of regex groups in C++

I know the exact format of the text I should be getting. In particular, it should match a regex with a variable number of groups.
I want to use the C++ regex library to determine (a) if it is valid text, and (b) to parse those groups into a vector. How can I do this? I can find examples online to do (a), but not (b).
#include <string>
#include <regex>
#include <vector>
bool parse_this_text(std::string & text, std::vector<std::string> & group) {
// std::string text_regex = "^([a-z]*)(,[0-9]+)*$"
// if the text matches the regex, return true and parse each group into the vector
// else return false
???
}
Such that the following lines of code return the expected results.
std::vector<std::string> group;
parse_this_text("green,1", group);
// should return true with group = {"green", ",1"};
parse_this_text("yellow", group);
// should return true with group = {"yellow"};
parse_this_text("red,1,2,3", group);
// should return true with group = {"red", ",1", ",2", ",3"};
parse_this_text("blue,1.0,3.0,1,a", group);
// should return false (since it doesn't match the regex)
Thanks!
(?=^([a-zA-Z]*)(?:\,\d+)+$)^.*?(?:((?:\,\d+)+)).*?$
You can use this.This will first validate using lookahead and then return 2 groups.
1) containing name
2) containing all the rest of integers (This can be easily split) or you can use re.findall here
Though it doesnot answer your question fully , it might be of help.
Have a look.
http://regex101.com/r/wE3dU7/3
One option is to scan the string twice, the first time to check for validity and the second time to split it into fields. With the example in the OP, you don't really need regexen to split the line, once you know that it is correct; you can simply split on commas. But for the sake of exposition, you could use a std::regex_token_iterator (assuming you have a C++ library which supports those), something like this:
bool parse_this_text(const std::string& s, std::vector<std::string>& result) {
static const std::regex check("[[:alpha:]][[:alnum:]]*(,[[:digit:]])*",
std::regex_constants::nosubs);
static const std::regex split(",");
if (!std::regex_match(s, check))
return false;
std::sregex_token_iterator tokens(s.begin(), s.end(), split, -1);
result.clear();
std::copy(tokens, std::sregex_token_iterator(), std::back_inserter(result));
return true;
}
For more complicated cases, or applications in which the double scan is undesired, you can tokenize using successive calls to std::regex_search(), supplying the end of the previous match as the starting point, and std::regex_constants::continuous as the match flags; that will anchor each search to the character after the previous match. You could, in that case, use a std::regex_iterator, but I'm not convinced that the resulting code is any simpler.

Finding in a std map using regex

I want to know how I can look up for an item in a map using a regex function. In my case, I have a map with expressions such as en*, es*, en-AU and so on and I have a string of possible values like en, en-US, en-GB, es-CL and so on.
I want to search using that string to find that item in the map.
Looking for the key without the wildcard first then looking for the key with the wildcard as the 2nd priority.
Please help me out with this problem or if this is an inefficient or if any one has a different approach please tell me another method for that. I use C++ with boost and stl.
If the map is small or the search is executed rarely, then just iterate through the map and match each key with the regular expression.
Otherwise: If the regular expression is used only for some kind of prefix search, you can use the member function lower_bound to efficiently find all entries with the given prefix. For example the following function first looks for an entry that matches exactly. If no such entry exists, the function returns the range of all entries with a matching prefix.
using items = std::map<std::string, item>;
auto lookup(const items& items, const std::string& key)
-> std::pair<items::const_iterator, items::const_iterator>
{
auto p = items.lower_bound(key);
auto q = items.end();
if (p != q && p->first == key) {
return std::make_pair(p, std::next(p));
} else {
auto r = p;
while (r != q && r->first.compare(0, key.size(), key) == 0) {
++r;
}
return std::make_pair(p, r);
}
}
Otherwise: If you have to cope with regular expressions or wildcards, then you can combine the two approaches. First search for an entry that matches exactly with the member function find. If no such entry exists, then extract the constant prefix from the regular expression. The prefix may be empty. Use the member function lower_bound to find the first entry with that prefix. Iterate through all entries with that prefix and test if the regular expression matches.

boost::regex_search - boost kills my brain cells, again

Good programmers keep simple things easy right?
And it's not like the boost documentation makes your life less uneasy...
All I want is an implementation for:
// fulfils the function of a regex matching where the pattern may match a
// substring instead of the entire string
bool search( std::string, std::string, SomeResultType )
So it can be used as in:
std::string text, pattern;
SomeResultsType match;
if( search( text, pattern, match ) )
{
std::string result = match[0];
if( match[1].matched )
// where this is the second capture group, not recapturing the same group
std::string secondMatch = match[1];
}
I want my client code not to be bothered with templates and iterators... I know, I'm a wuss. After peering for an hour over the template spaghetti in the boost docs for doing something so simple, I feel like my productivity is seriously getting hampered and I don't feel like I've learned anything from it.
boost::regex_match does it pretty simple with boost::cmatch, except that it only matches the whole string, so I've been adapting all my patterns to match the whole strings, but I feel that it is a dirty hack and would prefer some more proper solution. If I would have known it would take this long, I would have stuck with regex_match
Also welcome, a copy of Reading boost documentation for dummies
Next week in Keep it simple and easy with boost, function binders! No, just kidding, I wouldn't do that to anyone.
Thanks for all help
I think you want regex_search: http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/ref/regex_search.html
Probably this overload is the one you want:
bool regex_search(const basic_string& s,
match_results::const_iterator, Allocator>& m,
const basic_regex& e,
match_flag_type flags = match_default);
That seems to match what you wanted - SomeResultsType is smatch, and you need to convert your pattern to a regex first.
On Windows, you can use the .NET Regex class:
Example (copied from the linked page):
#using <System.dll>
using namespace System;
using namespace System::Text::RegularExpressions;
int main()
{
// Define a regular expression for repeated words.
Regex^ rx = gcnew Regex( "\\b(?<word>\\w+)\\s+(\\k<word>)\\b",static_cast<RegexOptions>(RegexOptions::Compiled | RegexOptions::IgnoreCase) );
// Define a test string.
String^ text = "The the quick brown fox fox jumped over the lazy dog dog.";
// Find matches.
MatchCollection^ matches = rx->Matches( text );
// Report the number of matches found.
Console::WriteLine( "{0} matches found.", matches->Count );
// Report on each match.
for each (Match^ match in matches)
{
String^ word = match->Groups["word"]->Value;
int index = match->Index;
Console::WriteLine("{0} repeated at position {1}", word, index);
}
}