Search only beginning of string in c++ using regex - c++

Edit: I am trying to tokenize a string from left to right using regex, with a list of regex strings to compare against. I decided to do this by adding a caret to each regex string; when I find a match, I take the substring after the matched text and look for the next match at the beginning of that substring.
I have a list of strings, stored in a vector container, that I convert to regexes and search for. Here is just one example:
vector<vector<string>> operators = {
{{",|;|//.*"}} //punctuation
};
I then take substrings and search each one for a match at the beginning. To do that, I add a caret at the beginning of each string before turning it into a regex:
string find_Match(const string &s, int i, int x)
{
    // Text from position i to the end of s
    string substring = s.substr(i);
    string somestring;
    for (const string &c : operators[x])
    {
        // Prepend a caret so the pattern should only match at the start
        regex r("^" + c);
        smatch sm;
        regex_search(substring, sm, r);
        if (!sm.empty()) // MATCH FOUND
        {
            somestring = sm[0];
        }
    }
    return somestring;
}
Now the problem is that, for the punctuation regex, the caret only applies to the comma: the other alternatives can still match anywhere in the string, so for example a; returns a match for ;. What is the best way in C++ to say that I want only a match at the beginning, without having to go through every | alternative and add a caret to it?

Related

Regex replace names of methods

I'm trying to replace all occurrences of names within a given string. I'm using regex, since a simple substring match won't work in this case and I need to match full words.
My problem is that I can only match words that have blanks before and after them. For example, I cannot replace a string like:
toReplace()
with:
theReplacement()
My regex replace method looks like this:
void replaceWord(std::string &str, const std::string& search, const std::string& replace)
{
// Regular expression to match words beginning with 'search'
// std::regex e ("(\\b("+search+"))([^,. ]*)");
// std::regex e ("(\\b("+search+"))\\b)");
std::regex e("(\\b("+search+"))([^,.()<>{} ]*)");
str = std::regex_replace(str,e,replace) ;
}
What should the regex look like in order to ignore leading and trailing non-alphanumeric characters?
You need to:
Escape all special characters in the regex pattern with std::regex_replace(search, std::regex(R"([.^$|{}()[\]*+?/\\])"), std::string(R"(\$&)")).
Escape all special characters in the replacement pattern with std::regex_replace(replace, std::regex("[$]"), std::string("$$$$")). A literal $ is written as $$ in a replacement pattern, so to produce $$ here we need $$$$ (this matters if you replace with literal text such as $1).
Wrap your search pattern in unambiguous word boundaries, i.e. "(\\W|^)("+search+")(?!\\w)".
When you replace, add $1 at the start of the replacement pattern to keep the preceding character (the whitespace or other non-word character matched and captured into the first group by (\W|^)).
See C++ sample code:
#include <regex>
#include <string>

std::string replaceWord(const std::string &str, std::string search, std::string replace)
{
    // Escape the literal regex pattern
    search = std::regex_replace(search, std::regex(R"([.^$|{}()[\]*+?/\\])"), std::string(R"(\$&)"));
    // Escape the literal replacement pattern
    replace = std::regex_replace(replace, std::regex("[$]"), std::string("$$$$"));
    // (\W|^) captures the preceding character; (?!\w) forbids a word character right after the match
    std::regex e("(\\W|^)("+search+")(?!\\w)");
    return std::regex_replace(str, e, std::string("$1") + replace);
}
Then,
std::string text("String toReplace()");
std::string s("toReplace()");
std::string r("theReplacement()");
std::cout << replaceWord(text, s, r);
// => String theReplacement()

Regex Replace everything except between the first " and the last "

I need a regex that replaces everything except the content between the first " and the last ".
I need it like this:
Input String:["Key:"Value""]
And after the regex i only need this:
Output String:Key:"Value"
Thanks!
You can try something like this.
Pattern:
^.*?"(.*)".*$
Substitution:
$1
(Try it on Regex101.)
Explanation:
The first part, ^.*?", matches as few characters as possible between the start of the string and a double quote.
The second part, (.*)", makes the largest match it can that ends in a double quote, and puts it all in a capture group.
The last part, .*$, grabs whatever is left and includes it in the match.
Finally, you replace the entire match with the contents of the first capture group.
Can you say why you need a RegExp?
A function like:
String unquote(String input) {
int start = input.indexOf('"');
if (start < 0) return input; // or throw.
int end = input.lastIndexOf('"');
if (start == end) return input; // or throw
return input.substring(start + 1, end);
}
is going to be faster and easier to understand than a RegExp.
Anyway, for the challenge, let's say we do want a RegExp that replaces the part up to the first " and the part from the last " with nothing. That's two replacements, so you can do:
input.replaceAll(RegExp(r'^[^"]*"|"[^"]*$'), "")
or you can use a capturing group and a computed replacement like:
input.replaceFirstMapped(RegExp(r'^[^"]*"([^]*)"[^"]*$'), (m) => m[1]!)
Alternatively, you can use the capturing group to select the text between the two and extract it in code, instead of doing string replacement:
String unquote(String input) {
var re = RegExp(r'^[^"]*"([^]*)"[^"]*$');
var match = re.firstMatch(input);
if (match == null) return input; // or throw.
return match[1]!;
}

c++ Find and replace whole word

How can I find and replace whole words only?
I have this:
void ReplaceString(std::string &subject, const std::string& search, const std::string& replace)
{
size_t pos = 0;
while ((pos = subject.find(search, pos)) != std::string::npos) {
subject.replace(pos, search.length(), replace);
pos += replace.length();
}
}
but it doesn't search for whole words.
For example, if I try
string test = "i like cake";
ReplaceString(test, "cak", "notcake");
it still replaces "cak" inside "cake", but I want it to match whole words only.
You're just blindly replacing any instances of search with replace without checking if they're full words prior to performing the replacement.
Here are just a couple of things you can try to work around that:
Split the string into individual words, then check each word against search, and replace if necessary. Then rebuild the string.
Replace only if the characters at pos - 1 and pos + search.length() are both spaces (or the match touches the start or end of the string).
A regular-expression solution, if you have access to a C++11 compiler:
#include <iostream>
#include <string>
#include <regex>
void ReplaceString(std::string &subject, const std::string& search, const std::string& replace)
{
// Regular expression to match words beginning with 'search'
std::regex e ("(\\b("+search+"))([^,. ]*)");
subject = std::regex_replace(subject,e,replace) ;
}
int main ()
{
// String to search within and do replacement
std::string s ("Cakemoney, cak, cake, thecakeisalie, cake.\n");
// String you want to find and replace
std::string find ("cak") ;
// String you want to replace with
std::string replace("notcake") ;
ReplaceString(s, find, replace) ;
std::cout << s << std::endl;
return 0 ;
}
Output:
Cakemoney, notcake, notcake, thecakeisalie, notcake.
More about the regular expression string (\\b("+search+"))([^,. ]*): after search is substituted, the string becomes:
(\\b(cak))([^,. ]*)
\b(cak) matches words beginning with cak, regardless of what comes after.
([^,. ]*) matches anything up to a comma, a period, or a space.
The above basically just rips off the example provided here. The answer is case sensitive, and it will also swallow punctuation other than the three characters listed after ^. Because it replaces any word that merely starts with cak (such as cake), it is not a strict whole-word match; feel free to learn more about regular expressions to make a more general solution.

C++ boost::regex multiples captures

I'm trying to extract multiple substrings using boost::regex and put each one in a variable. Here is my code:
unsigned int i = 0;
std::string string = "--perspective=45.0,1.33,0.1,1000";
std::string::const_iterator start = string.begin();
std::string::const_iterator end = string.end();
std::vector<std::string> matches;
boost::smatch what;
boost::regex const ex(R"(^-?\d*\.?\d+),(^-?\d*\.?\d+),(^-?\d*\.?\d+),(^-?\d*\.?\d+))");
matches.resize(4);
while (boost::regex_search(start, end, what, ex))
{
std::string stest(what[1].first, what[1].second);
matches[i] = stest;
start = what[0].second;
++i;
}
I'm trying to extract each float from my string and put it in my vector variable matches. At the moment I can extract the first one (in my vector I can see 45, without double quotes), but the second element of my vector is empty (matches[1] is "").
I can't figure out why, or how to correct this. Is my regex wrong? Is my smatch wrong?
Firstly, ^ is the symbol for the beginning of a line. Secondly, if you write the pattern as an ordinary (non-raw) string literal, \ must be escaped. So you should change each (^-?\d*\.?\d+) group to (-?\\d*\\.?\\d+). (Probably (-?\\d+(?:\\.\\d+)?) is better.)
Your regular expression searches for the number,number,number,number pattern, not for each number individually. You add only the first submatch to matches and ignore the others. To fix this, you can replace your expression with (-?\\d*\\.?\\d+), or just add all the submatches stored in what to your matches vector:
while (boost::regex_search(start, end, what, ex))
{
for(int j = 1; j < what.size(); ++j)
{
std::string stest(what[j].first, what[j].second);
matches.push_back(stest);
}
start = what[0].second;
}
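You are using ^ several times in your regex, and that's why it didn't match: ^ means the beginning of the string. Also note that in R"( ... )" the parentheses right after R" and right before the closing " are the raw-string delimiters, not part of the pattern. So the final ) is not part of the regex, but the opening ( of your first group is consumed as a delimiter, which leaves the ) after the first \d+ unbalanced.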
Here is your regex after correction:
(-?\d*\.?\d+),(-?\d*\.?\d+),(-?\d*\.?\d+),(-?\d*\.?\d+)
A better version of your regex could be (only if you want to avoid matching numbers like .01 and .1):
(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)
A repeated search in combination with a regular expression that apparently is built to match all of the target string is pointless.
If you are searching repeatedly in a moving window delimited by a moving iterator and string.end() then you should reduce the pattern to something that matches a single fraction.
If you know that the number of fractions in your string is/must be constant, match once, not in a loop and extract the matched substrings from what.

Ignore String containing special words (Months)

I am trying to find alphanumeric strings by using the following regular expression:
^(?=.*\d)(?=.*[a-zA-Z]).{3,90}$
Alphanumeric string: any string that contains at least one digit and one letter, plus optionally any of these special characters: # - _ [] () {} ç \ ù %
I want to add an extra constraint to ignore all alphanumerical strings containing the following month formats :
JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre
One solution is to first match an alphanumeric string, then check whether it contains one of these names using the following function:
#include <boost/regex.hpp>
#include <string>
#include <vector>

std::vector<std::string> findString(const std::string &s)
{
    std::vector<std::string> vec;
    // The whole pattern must stay on one line: a string literal cannot contain a raw line break
    boost::regex rgx("JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre");
    boost::sregex_iterator begin {s.begin(), s.end(), rgx},
                           end {};
    // Collect every month name found anywhere in s
    for (boost::sregex_iterator i = begin; i != end; ++i)
    {
        vec.push_back(i->str());
    }
    return vec;
}
Question: How can I add this constraint directly into the regular expression instead of using this function.
One solution is to use negative lookahead as mentioned in How to ignore words in string using Regular Expressions.
I used it as follows:
String : 2-hello-001
Regular expression : ^(?=.*\d)(?=.*[a-zA-Z]^(?!Jan|Feb|Mar)).{3,90}$
Result: no match
Test website: http://regexlib.com/
The edit provided by @Robin and @RyanCarlson, ^[][\w#_(){}ç\\ù%-]{3,90}$, works perfectly for detecting alphanumeric strings with special characters. It's just the negative lookahead part that isn't working.
You can use a negative lookahead the same way you're using the positive lookaheads:
^(?=.*\d)(?=.*[a-zA-Z])(?!.*(?:JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre)).{3,90}$
Also, your regex is pretty unclear. If you want alphanumeric strings with a length between 3 and 90, you can just do:
/^(?!.*(?:JANVIER|F[Eé]VRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AO[Uù]T|SEPTEMBRE|OCTOBRE|NOVEMBRE|D[Eé]CEMBRE|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))
[][\w#_(){}ç\\ù%-]{3,90}$/i
The i flag makes matching case-insensitive (so you can shorten your forbidden list), and \w is shorthand for [0-9a-zA-Z_]. Be careful if you copy-paste: for readability there is a line break between the (?! ) group and the [ ] class. Just add whatever special characters you want to match inside the final [...].