c++ Find and replace whole word - c++

How can I find and replace, matching whole words only?
I have this:
void ReplaceString(std::string &subject, const std::string& search, const std::string& replace)
{
size_t pos = 0;
while ((pos = subject.find(search, pos)) != std::string::npos) {
subject.replace(pos, search.length(), replace);
pos += replace.length();
}
}
But it doesn't search for whole words.
For example, if I try
string test = "i like cake";
ReplaceString(test, "cak", "notcake");
it still replaces the text, but I want it to match whole words only.

You're just blindly replacing any instances of search with replace without checking if they're full words prior to performing the replacement.
Here are just a couple of things you can try to work around that:
Split the string into individual words, then check each word against search, and replace if necessary. Then rebuild the string.
Replace only if the characters at pos - 1 and pos + search.length() are both spaces (or the match touches the start or end of the string).

A regular expression solution, if you have access to a C++11 compiler:
#include <iostream>
#include <string>
#include <regex>
void ReplaceString(std::string &subject, const std::string& search, const std::string& replace)
{
// Regular expression to match words beginning with 'search'
std::regex e ("(\\b("+search+"))([^,. ]*)");
subject = std::regex_replace(subject,e,replace) ;
}
int main ()
{
// String to search within and do replacement
std::string s ("Cakemoney, cak, cake, thecakeisalie, cake.\n");
// String you want to find and replace
std::string find ("cak") ;
// String you want to replace with
std::string replace("notcake") ;
ReplaceString(s, find, replace) ;
std::cout << s << std::endl;
return 0 ;
}
Output:
Cakemoney, notcake, notcake, thecakeisalie, notcake.
More about the regular expression string (\\b("+search+"))([^,. ]*). Note that after replacing search this string will be:
(\\b(cak))([^,. ]*)
\b(cak) - match words beginning with cak regardless of what comes after
([^,. ]*) - matches anything up to a ,, ., or (space).
The above is closely based on an existing example. The answer is case sensitive, and because the trailing group stops only at a comma, a period, or a space, any other punctuation that directly follows the word is replaced too. Feel free to learn more about regular expressions to build a more general solution.

Related

Regex replace names of methods

I'm trying to replace all occurrences of names within a given string. I'm using regex, since a simple substring match won't work in this case and I need to match full words.
My problem is that I can only match words before and after blanks. But for example I cannot replace a string when it's followed by a blank, like:
toReplace()
with:
theReplacement()
My regex replace method looks like this:
void replaceWord(std::string &str, const std::string& search, const std::string& replace)
{
// Regular expression to match words beginning with 'search'
// std::regex e ("(\\b("+search+"))([^,. ]*)");
// std::regex e ("(\\b("+search+"))\\b)");
std::regex e("(\\b("+search+"))([^,.()<>{} ]*)");
str = std::regex_replace(str,e,replace) ;
}
What should the regex look like in order to ignore leading and trailing non-alphanumeric characters?
You need to:
Escape all special characters in the regex pattern with std::regex_replace(search, std::regex(R"([.^$|{}()[\]*+?/\\])"), std::string(R"(\$&)"))
Escape all special chars in the replacement pattern with std::regex_replace(replace, std::regex("[$]"), std::string("$$$$")) (that is in case you replace with literal $1 text, $ can be set with $$, so to replace with a double $, we need $$$$ in the replacement here)
Wrap your search pattern with unambiguous word boundaries, i.e. "(\\W|^)("+search+")(?!\\w)"
When you replace, add $1 at the start of the replacement pattern to keep the whitespace (if it is matched and captured into the first group with the (\W|^) pattern).
See C++ sample code:
std::string replaceWord(const std::string& str, std::string search, std::string replace)
{
    // Escape regex metacharacters so that search is matched literally
    search = std::regex_replace(search, std::regex(R"([.^$|{}()[\]*+?/\\])"), std::string(R"(\$&)"));
    // Double every $ so that the replacement text is inserted literally
    replace = std::regex_replace(replace, std::regex("[$]"), std::string("$$$$"));
    // (\W|^) captures the character before the word; (?!\w) requires that no word character follows
    std::regex e("(\\W|^)(" + search + ")(?!\\w)");
    return std::regex_replace(str, e, std::string("$1") + replace);
}
Then,
std::string text("String toReplace()");
std::string s("toReplace()");
std::string r("theReplacement()");
std::cout << replaceWord(text, s, r);
// => String theReplacement()

need support defining the right regex

I would like to parse a file using boost::sregex_token_iterator.
Unfortunately I'm not able to find the right regex to extract strings in the form FOO:BAR out of it.
The code example below works only if there is one such occurrence per line, but I would like to support multiple such entries per line, and ideally also a comment after a '#'.
So entries like this
AA:BB CC:DD EE:FF #this is a comment
should result in 3 identified tokens (AA:BB, CC:DD, EE:FF)
boost::regex re("((\\W+:\\W+)\\S*)+");
boost::sregex_token_iterator i(line.begin(), line.end(), re, -1), end;
for(; i != end; i++){
std::stringstream ss(*i);
...
}
Any support is very welcome.
I suggest you use splitting to get the values you need.
I would begin by first splitting using #. This separates the comment from the rest of the line. Then split using white space, which separates the pairs out. After this, individual pairs can be split using :.
If, for whatever reason, you must use regex, you can iterate over the matches. In this case I would use the following regex:
(?:#(?:.*))*(\w+:\w+)\s*
This regex will match every pair until it finds a comment. If there is a comment, it will skip to the next new line.
You want to match sequences of 1 or more word chars followed with : and then having again 1 or more word chars.
Thus, replace -1 with 1 in the call to boost::sregex_token_iterator to get the Group 1 text chunks, and replace your regex with a pattern built around \w+:\w+:
boost::regex re(R"(#.*|(\w+:\w+))");
boost::sregex_token_iterator i(line.begin(), line.end(), re, 1), end;
Note that R"(#.*|(\w+:\w+))" is a raw string literal representing the pattern #.*|(\w+:\w+), which either matches # and the rest of the line, or matches the pattern you need and captures it into Group 1.
See an std::regex C++ example (you may easily adjust the code for Boost):
#include <string>
#include <iostream>
#include <regex>
using namespace std;
int main() {
std::regex r(R"(#.*|(\w+:\w+))");
std::string s = "AA:BB CC:DD EE:FF #this is a comment XX:YY";
for(std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
i != std::sregex_iterator();
++i)
{
std::smatch m = *i;
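// Group 1 is unset when the #.* alternative matched the comment, so skip that match
if (m[1].matched) std::cout << m[1].str() << '\n';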
}
return 0;
}

Search only beginning of string in c++ using regex

Edit: I am trying to tokenize a string from left to right, comparing it against a list of regex strings. I decided to do this by prepending a caret to each regex string; when I find a match, I take the substring after the matched text and look for the next match at the beginning of that substring.
I have a list of strings, stored in a vector container, that I convert to regexes to search with. Here is an example of one:
vector<vector<string>> operators = {
{{",|;|//.*"}} //punctuation
};
I then take substrings and search each one for a match at the beginning. In this case I add a caret at the beginning of each string before I add it to the regex to do that:
Token *find_Match(string &s, int i)
{
string substring = s.substr(i, s.length() - i);
string somestring;
for (string c : operators[x])
{
regex r = regex("^" + c);
smatch sm;
regex_search(substring, sm, r); // , std::regex_constants::;
int size = sm.size();
if (size > 0) //MATCH FOUND
{
somestring = sm[0];
}
}
return somestring;
}
Now the problem is that, for the punctuation regex, the caret applies only to the first alternative (the comma); the remaining alternatives match anywhere in the string, so a; returns a match for ;. What is the best way in C++ to say that I want only a match at the beginning, without having to add a caret after every | operator?

C++ RegExp and placeholders

I'm on C++11 MSVC2013, I need to extract a number from a file name, for example:
string filename = "s 027.wav";
If I were writing code in Perl, Java or Basic, I would use a regular expression and something like this would do the trick in Perl5:
$filename =~ /(\d+)/g;
and I would have the number "027" in placeholder variable $1.
Can I do this in C++ as well? Or can you suggest a different method to extract the number 027 from that string? Also, I should convert the resulting numerical string into an integral scalar, I think atoi() is what I need, right?
You can do this in C++, as of C++11 with the collection of classes found in regex. It's pretty similar to other regular expressions you've used in other languages. Here's a no-frills example of how you might search for the number in the filename you posted:
const std::string filename = "s 027.wav";
std::regex re = std::regex("[0-9]+");
std::smatch matches;
if (std::regex_search(filename, matches, re)) {
std::cout << matches.size() << " matches." << std::endl;
for (auto &match : matches) {
std::cout << match << std::endl;
}
}
As for converting 027 into a number, you could use atoi (from cstdlib) as you mentioned, but this will store the value 27, not 027. If you want to keep the leading 0, I believe you will need to keep it as a string. match above is a sub_match, so extract a string and convert it to a const char* for atoi:
int value = atoi(match.str().c_str());
OK, I solved it using std::regex, which for some reason I couldn't get to work properly when trying to modify the examples I found around the web. It was simpler than I thought. This is the code I wrote:
#include <cstdlib>
#include <regex>
#include <string>
using namespace std;
string FileName = "s 027.wav";
// The match results object
smatch m;
// The regexp /\d+/ works in Perl and Java but didn't work here
// (note that in a C++ string literal the backslash must be escaped: "\\d+").
// With this other variation I look for a run of 1 to 3 characters
// containing only digits from 0 to 9
regex re("[0-9]{1,3}");
// Do the search
regex_search (FileName, m, re);
// m[0] is the whole match; m[1], m[2], ... hold the capture groups
// (like $1, $2, $3, etc. in Perl)
string sMidiNoteNum = m[0];
// This converts the string to an integer number
int MidiNote = atoi(sMidiNoteNum.c_str());
Here is an example using Boost; substitute the proper namespace and it should work with std::regex as well.
typedef std::string::const_iterator SITR;
SITR start = str.begin();
SITR end = str.end();
boost::regex NumRx("\\d+");
boost::smatch m;
while ( boost::regex_search ( start, end, m, NumRx ) )
{
int val = atoi( m[0].str().c_str() );
start = m[0].second;
}

Get String Between 2 Strings

How can I get a string that is between two other declared strings, for example:
String 1 = "[STRING1]"
String 2 = "[STRING2]"
Source:
"832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"
How can I get the "I need this text here"?
Since this is homework, only clues:
Find index1 of occurrence of String1
Find index2 of occurrence of String2
Substring from index1+lengthOf(String1) (inclusive) to index2 (exclusive) is what you need
Copy this to a result buffer if necessary (don't forget to null-terminate)
Might be a good case for std::regex, which is part of C++11.
#include <iostream>
#include <string>
#include <regex>
int main()
{
using namespace std::string_literals;
auto start = "\\[STRING1\\]"s;
auto end = "\\[STRING2\\]"s;
std::regex base_regex(start + "(.*)" + end);
auto example = "832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"s;
std::smatch base_match;
std::string matched;
if (std::regex_search(example, base_match, base_regex)) {
// The first sub_match is the whole match; the next
// sub_match is the first parenthesized expression.
if (base_match.size() == 2) {
matched = base_match[1].str();
}
}
std::cout << "example: \""<<example << "\"\n";
std::cout << "matched: \""<<matched << "\"\n";
}
Prints:
example: "832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"
matched: "I need this text here"
What I did was create a program with two strings, start and end, that serve as my start and end delimiters. I then build a regular expression that looks for those and captures anything in between (including nothing). Then I use regex_search to find the matching part of the expression, and set matched to the captured text. Note that (.*) is greedy, so if [STRING2] occurs more than once, the capture extends to the last occurrence.
For more info, see http://en.cppreference.com/w/cpp/regex and http://en.cppreference.com/w/cpp/regex/regex_search
Use strstr ( http://www.cplusplus.com/reference/clibrary/cstring/strstr/ ): call it once for each delimiter to get two pointers, then compare them. If pointer1 < pointer2, read the characters between them, starting just after the end of the first delimiter.