Regex replace names of methods - c++

I'm trying to replace all occurrences of names within a given string. I'm using regex because a simple substring match won't work in this case: I need to match full words.
My problem is that I can only match words that are surrounded by blanks. For example, I cannot replace a name that is followed by something other than a blank, such as:
toReplace()
with:
theReplacement()
My regex replace method looks like this:
void replaceWord(std::string &str, const std::string& search, const std::string& replace)
{
    // Regular expression to match words beginning with 'search'
    // std::regex e ("(\\b("+search+"))([^,. ]*)");
    // std::regex e ("(\\b("+search+"))\\b)");
    std::regex e("(\\b("+search+"))([^,.()<>{} ]*)");
    str = std::regex_replace(str,e,replace) ;
}
What should the regex look like in order to ignore leading and trailing non-alphanumeric characters?

You need to:
Escape all special characters in the search pattern with std::regex_replace(search, std::regex(R"([.^$|{}()[\]*+?/\\])"), std::string(R"(\$&)")).
Escape all special characters in the replacement pattern with std::regex_replace(replace, std::regex("[$]"), std::string("$$$$")). This matters when the replacement contains literal text such as $1: a literal $ is written as $$ in a replacement pattern, so turning each $ into $$ requires $$$$ here.
Wrap the search pattern in unambiguous word boundaries, i.e. "(\\W|^)(" + search + ")(?!\\w)".
When you replace, add $1 at the start of the replacement pattern to keep the character captured by the (\W|^) group, typically the whitespace before the word.
See the C++ sample code:
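std::string replaceWord(const std::string& str, std::string search, std::string replace)
{
    // search and replace are taken by value so the caller's strings are not modified
    // Escape regex metacharacters so the search text is matched literally
    search = std::regex_replace(search, std::regex(R"([.^$|{}()[\]*+?/\\])"), std::string(R"(\$&)"));
    // Escape $ so the replacement text is inserted literally
    replace = std::regex_replace(replace, std::regex("[$]"), std::string("$$$$"));
    // (\W|^): non-word character or start of string; (?!\w): not followed by a word character
    std::regex e("(\\W|^)("+search+")(?!\\w)");
    return std::regex_replace(str, e, std::string("$1") + replace);
}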
Then,
std::string text("String toReplace()");
std::string s("toReplace()");
std::string r("theReplacement()");
std::cout << replaceWord(text, s, r);
// => String theReplacement()
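To see why the answer wraps the pattern in (\W|^) and (?!\w) instead of using \b, here is a minimal sketch that compares the two on the same input. The helper names foundWithWordBoundary and foundWithExplicitBoundary are hypothetical, and both assume the search text has already been escaped as described above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `escapedSearch` is found between two \b assertions.
bool foundWithWordBoundary(const std::string& text, const std::string& escapedSearch)
{
    // \b needs a word character on exactly one side, so it cannot succeed
    // right after ')' when a non-word character or the end of the string follows.
    const std::regex e("\\b" + escapedSearch + "\\b");
    return std::regex_search(text, e);
}

// Hypothetical helper: the same check with the (\W|^) ... (?!\w) wrapping.
bool foundWithExplicitBoundary(const std::string& text, const std::string& escapedSearch)
{
    const std::regex e("(\\W|^)(" + escapedSearch + ")(?!\\w)");
    return std::regex_search(text, e);
}
```

With the text "String toReplace()" and the escaped search toReplace\(\), the \b version finds nothing because there is no word boundary after the closing parenthesis, while the explicit version matches.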

Related

How to find the exact substring with regex in c++11?

I am trying to find substrings that are not surrounded by other a-zA-Z0-9 symbols.
For example: I want to find the substring hello, so it won't match hello1 or hellow but will match Hello and heLLo!##$%.
Here is my sample:
std::string s = "1mySymbol1, /_mySymbol_ mysymbol";
const std::string sub = "mysymbol";
std::regex rgx("[^a-zA-Z0-9]*" + sub + "[^a-zA-Z0-9]*", std::regex::icase);
std::smatch match;
while (std::regex_search(s, match, rgx)) {
    std::cout << match.size() << "match: " << match[0] << '\n';
    s = match.suffix();
}
The result is:
1match: mySymbol
1match: , /_mySymbol_
1match: mysymbol
But I don't understand why the first occurrence, 1mySymbol1, also matches my regex.
How do I create a proper regex that ignores such strings?
UPD
If I do it like this
std::string s = "mySymbol, /_mySymbol_ mysymbol";
const std::string sub = "mysymbol";
std::regex rgx("[^a-zA-Z0-9]+" + sub + "[^a-zA-Z0-9]+", std::regex::icase);
then I find only the substring in the middle:
1match: , /_mySymbol_
and don't find the substrings at the beginning and at the end.
The regex [^a-zA-Z0-9]* will match 0 or more characters, so it's perfectly valid for [^a-zA-Z0-9]*mysymbol[^a-zA-Z0-9]* to match mysymbol in 1mySymbol1 (allowing for case insensitivity). As you saw, this is fixed when you use [^a-zA-Z0-9]+ (matching 1 or more characters) instead.
With your update, you see that this doesn't match strings at the beginning or end. That's because [^a-zA-Z0-9]+ has to match 1 or more characters (which don't exist at the beginning or end of the string).
You have a few options:
Use beginning/end anchors: (?:[^a-zA-Z0-9]+|^)mysymbol(?:[^a-zA-Z0-9]+|$) (non-alphanumeric OR beginning of string, followed by mysymbol, followed by non-alphanumeric OR end of string).
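Use negative lookahead and negative lookbehind: (?<![a-zA-Z0-9])mysymbol(?![a-zA-Z0-9]) (match mysymbol which doesn't have an alphanumeric character before or after it). Note that with this approach the match won't include the characters before/after mysymbol. Note also that the default ECMAScript grammar of std::regex supports lookahead but not lookbehind, so this form needs an engine that supports it, such as Boost.Regex or PCRE.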
I recommend using https://regex101.com/ to play around with regular expressions. It lists all the different constructs you can use.
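As a rough illustration for std::regex, the sketch below combines the two options: an anchor-or-non-alphanumeric alternative in front and a negative lookahead behind, with the word itself in a capture group so the surrounding characters are not part of the result. The helper name findStandalone is hypothetical, and the word is assumed to contain no regex metacharacters.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every standalone occurrence of `word` in `text`,
// ignoring case. `word` is assumed to contain no regex metacharacters.
std::vector<std::string> findStandalone(const std::string& text, const std::string& word)
{
    // (?:[^a-zA-Z0-9]|^) : preceded by a non-alphanumeric character or the start
    // (word)             : capture group 1 holds the occurrence itself
    // (?![a-zA-Z0-9])    : not followed by an alphanumeric character
    const std::regex rgx("(?:[^a-zA-Z0-9]|^)(" + word + ")(?![a-zA-Z0-9])",
                         std::regex::icase);

    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), rgx), end; it != end; ++it) {
        found.push_back((*it)[1].str());
    }
    return found;
}
```

Called on the sample string from the question, this skips 1mySymbol1 and returns the other two occurrences; the trailing lookahead does not consume the separator, so adjacent occurrences that share one separator are still found.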

C++ regex string literal with capturing group

I have a std::string containing backslashes and double quotes. I want to extract a substring using a capture group, but I am not able to get the syntax right.
e.g.
std::string str(R"(some\"string"name":"john"\"lastname":"doe")"); //==> want to extract "john"
std::regex re(R"(some\"string"name":")"(.*)R"("\"lastname":"doe")"); //==> wrong syntax
std::smatch match;
std::string name;
if (std::regex_search(str, match, re) && match.size() > 1)
{
    name = match.str(1);
}
Use a delimiter that does not occur in the string, e.g. R"~( .... )~".
You still need to escape the \ for the regex. To match \ literally, use \\.
You probably want to stop as soon as the shortest possible match is found, so use (.*?):
std::regex re(R"~(some\\"string"name":"(.*?)"\\"lastname":"doe")~");
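Put together, the three points might look like the following sketch. The function name extractName is hypothetical; the pattern is the one from the line above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by (.*?) in the pattern,
// or an empty string when the input does not match.
std::string extractName(const std::string& input)
{
    // R"~( ... )~" uses ~ as the raw-string delimiter, so the )" sequence
    // after (.*?) cannot end the literal early. \\ matches one literal backslash.
    static const std::regex re(R"~(some\\"string"name":"(.*?)"\\"lastname":"doe")~");
    std::smatch match;
    if (std::regex_search(input, match, re) && match.size() > 1) {
        return match.str(1);
    }
    return std::string();
}
```

For the string from the question, extractName returns john without the surrounding quotes, since the quotes sit outside the capture group.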

Search only beginning of string in c++ using regex

Edit: I am trying to tokenize a string from left to right using regex, comparing it against a list of regex strings. I decided to do this by adding a caret to each regex string; when I find a match, I take the substring after the matched text and look for the next match at the beginning of that substring.
I have a list of strings to convert to regexes, stored inside a vector container. Here is just one example:
vector<vector<string>> operators = {
{{",|;|//.*"}} //punctuation
};
I then take substrings and search each one for a match at the beginning. To do that, I add a caret at the beginning of each string before I turn it into a regex:
string find_Match(const string &s, int i)
{
    string substring = s.substr(i); // everything from position i to the end
    string somestring;
    for (const string &c : operators[x]) // x: index of the operator group, defined elsewhere
    {
        regex r("^" + c);
        smatch sm;
        regex_search(substring, sm, r);
        if (!sm.empty()) // MATCH FOUND
        {
            somestring = sm[0];
        }
    }
    return somestring;
}
Now the problem is that, for the punctuation regexes, it only looks for the comma at the beginning and then finds any of the other alternatives anywhere in the string; for example, a; returns a match for ;. What is the best way in C++ to say that I want only a match at the beginning, without having to go through every | operator to add the caret?
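One possible approach, sketched here under the assumption that each entry is a plain ECMAScript alternation, is to wrap the whole entry in a non-capturing group before prepending the caret. The caret binds more tightly than |, so "^" + ",|;|//.*" anchors only the comma. The helper name matchAtStart is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the token matched by `alternatives` at the very
// start of `input`, or an empty string when nothing matches there.
std::string matchAtStart(const std::string& input, const std::string& alternatives)
{
    // "^,|;|//.*" parses as (^,)|(;)|(//.*), so only the comma is anchored.
    // Grouping first, "^(?:,|;|//.*)", anchors every alternative.
    const std::regex r("^(?:" + alternatives + ")");
    std::smatch sm;
    if (std::regex_search(input, sm, r)) {
        return sm.str(0);
    }
    return std::string();
}
```

With this wrapping, the input a; no longer yields ; as a match. std::regex_search also accepts the std::regex_constants::match_continuous flag, which restricts matches to those starting at the first character and may serve the same purpose without changing the pattern.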

c++ Find and replace whole word

How can I find and replace, matching whole words only?
I have this:
void ReplaceString(std::string &subject, const std::string& search, const std::string& replace)
{
    size_t pos = 0;
    while ((pos = subject.find(search, pos)) != std::string::npos) {
        subject.replace(pos, search.length(), replace);
        pos += replace.length();
    }
}
but it doesn't search for whole words.
For example, if I try
string test = "i like cake";
ReplaceString(test, "cak", "notcake");
it still replaces, but I want it to match whole words only.
You're blindly replacing every instance of search with replace, without checking whether it is a full word before performing the replacement.
Here are a couple of things you can try to work around that:
Split the string into individual words, check each word against search, and replace it if necessary. Then rebuild the string.
Replace only if the characters at pos - 1 and pos + search.length() are both spaces (or the match touches the start or end of the string).
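The second suggestion could be sketched as follows. As an assumption that goes slightly beyond the text, any non-alphanumeric character (not only a space) counts as a word boundary here, and the helper names isBoundary and replaceWholeWord are hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if index `i` is past the end of the string or holds
// a character that is not alphanumeric.
static bool isBoundary(const std::string& s, std::size_t i)
{
    return i >= s.size() || !std::isalnum(static_cast<unsigned char>(s[i]));
}

// Replaces `search` with `replace` only where the match stands as a whole word.
void replaceWholeWord(std::string& subject, const std::string& search, const std::string& replace)
{
    if (search.empty()) return;  // avoid an endless loop on an empty needle
    std::size_t pos = 0;
    while ((pos = subject.find(search, pos)) != std::string::npos) {
        const bool startOk = (pos == 0) || isBoundary(subject, pos - 1);
        const bool endOk = isBoundary(subject, pos + search.length());
        if (startOk && endOk) {
            subject.replace(pos, search.length(), replace);
            pos += replace.length();  // continue after the inserted text
        } else {
            ++pos;  // partial match: keep searching from the next character
        }
    }
}
```

With this version, "i like cake" is left unchanged when searching for cak, because the character after the match is a letter.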
A regular expression solution, if you have access to a C++11 compiler:
#include <iostream>
#include <string>
#include <regex>
void ReplaceString(std::string &subject, const std::string& search, const std::string& replace)
{
    // Regular expression to match words beginning with 'search'
    std::regex e ("(\\b("+search+"))([^,. ]*)");
    subject = std::regex_replace(subject,e,replace) ;
}
int main ()
{
    // String to search within and do replacement
    std::string s ("Cakemoney, cak, cake, thecakeisalie, cake.\n");
    // String you want to find and replace
    std::string find ("cak") ;
    // String you want to replace with
    std::string replace("notcake") ;
    ReplaceString(s, find, replace) ;
    std::cout << s << std::endl;
    return 0 ;
}
Output:
Cakemoney, notcake, notcake, thecakeisalie, notcake.
More about the regular expression string "(\\b("+search+"))([^,. ]*)". Note that after substituting search, this string will be:
(\\b(cak))([^,. ]*)
\\b(cak) matches words beginning with cak, regardless of what comes after.
([^,. ]*) matches everything up to the next ,, ., or space.
The above is essentially adapted from the example provided here. The answer is case sensitive, and it will also swallow punctuation other than the three characters listed after ^, but feel free to learn more about regular expressions to build a more general solution.
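For comparison, a stricter variant that matches only the exact word and ignores case might look like the sketch below. The function name replaceWholeWordIcase is hypothetical, search is assumed to consist of word characters only, and replace is assumed to contain no $ (which std::regex_replace treats specially).

```cpp
#include <regex>
#include <string>

// Hypothetical variant: replaces `search` only when it stands as a whole word,
// ignoring case. Assumes `search` holds only word characters and `replace`
// contains no '$'.
std::string replaceWholeWordIcase(const std::string& subject, const std::string& search,
                                  const std::string& replace)
{
    // \b on both sides: the match must start and end at a word boundary
    const std::regex e("\\b" + search + "\\b", std::regex::icase);
    return std::regex_replace(subject, e, replace);
}
```

Unlike the pattern above, this leaves cake untouched when searching for cak, and it replaces Cak as well as cak.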

Ignore String containing special words (Months)

I am trying to find alphanumeric strings by using the following regular expression:
^(?=.*\d)(?=.*[a-zA-Z]).{3,90}$
Alphanumeric string: any string that contains at least one digit and one letter, plus any of the following special characters: # - _ [] () {} ç _ \ ù %
I want to add an extra constraint to ignore all alphanumerical strings containing the following month formats :
JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre
One solution is to actually match an alphanumerical string. Then check if this string contains one of these names by using the following function:
vector<string> findString(const string& s)
{
    vector<string> vec;
    // The pattern must stay on one line: a string literal cannot contain a raw line break
    boost::regex rgx("JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre");
    boost::sregex_iterator begin {s.begin(), s.end(), rgx},
                           end {};
    // Collect every month name found in s
    for (boost::sregex_iterator i = begin; i != end; ++i)
    {
        boost::smatch m = *i;
        vec.push_back(m.str());
    }
    return vec;
}
Question: How can I add this constraint directly into the regular expression instead of using this function.
One solution is to use negative lookahead as mentioned in How to ignore words in string using Regular Expressions.
I used it as follows:
String : 2-hello-001
Regular expression : ^(?=.*\d)(?=.*[a-zA-Z]^(?!Jan|Feb|Mar)).{3,90}$
Result: no match
Test website: http://regexlib.com/
The edit provided by @Robin and @RyanCarlson, ^[][\w#_(){}ç\\ù%-]{3,90}$, works perfectly for detecting alphanumeric strings with special characters. Only the negative lookahead part isn't working.
You can use negative look ahead, the same way you're using positive lookahead:
(?=.*\d)(?=.*[a-zA-Z])
(?!.*(?:JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre)).{3,90}$
Also, your regex is pretty unclear. If you want alphanumeric strings with a length between 3 and 90, you can just do:
/^(?!.*(?:JANVIER|F[Eé]VRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AO[Uù]T|SEPTEMBRE|OCTOBRE|NOVEMBRE|D[Eé]CEMBRE|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))
[][\w#_(){}ç\\ù%-]{3,90}$/i
The i flag makes the match case-insensitive (so you can shorten your forbidden list), and \w is a shortcut for [0-9a-zA-Z_]. Careful if you copy-paste: there is a line break between the (?! ) part and the [ ] part, added here for readability. Just add whatever special characters you want to match to the final [...].
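As a minimal sketch of the lookahead placement, here is the same idea with std::regex, whose ECMAScript grammar supports lookahead. The month list is cut down to three ASCII entries purely for illustration, and the function name isAlnumWithoutMonth is hypothetical. In the asker's attempt, the ^ sits inside the second lookahead after [a-zA-Z], where a start-of-string assertion cannot succeed; here each lookahead stands on its own right after the leading ^.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` has 3 to 90 characters, contains at least one
// digit and one letter, and contains none of the listed month names.
bool isAlnumWithoutMonth(const std::string& s)
{
    // The month list is shortened to three ASCII entries for illustration.
    static const std::regex re(
        R"(^(?=.*\d)(?=.*[a-zA-Z])(?!.*(?:Jan|Feb|Mar)).{3,90}$)");
    return std::regex_match(s, re);
}
```

With this pattern, 2-hello-001 is accepted and a string such as 12-Jan-2015 is rejected.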