C++ regex replace whole word


I am writing a small game in which I sometimes need to replace a group of characters in a sentence with the name of the player.
For example, I could have a sentence like:
"[Player]! Are you okay? A plane crash happened, it's on fire!"
And I need to replace the "[Player]" with some name contained in a std::string.
I have been looking for about 20 minutes in other SO questions and in the C++ reference, and I really can't understand how to use regex.
I would like to know how I can replace all instances of the "[Player]" string in a std::string.

Personally I would not use regex for this. A simple search and replace should be enough.
These are (roughly) the functions I use:
#include <iostream>
#include <string>
// change the string in-place
std::string& replace_all_mute(std::string& s,
    const std::string& from, const std::string& to)
{
    // find returns npos when nothing is left; npos + 1 wraps to 0 and ends the loop
    if(!from.empty())
        for(std::size_t pos = 0; (pos = s.find(from, pos) + 1); pos += to.size())
            s.replace(--pos, from.size(), to);
    return s;
}
// return a copy of the string
std::string replace_all_copy(std::string s,
    const std::string& from, const std::string& to)
{
    return replace_all_mute(s, from, to);
}
int main()
{
    std::string s = "[Player]! Are you okay? A plane crash happened, it's on fire!";
    replace_all_mute(s, "[Player]", "Uncle Bob");
    std::cout << s << '\n';
}
Output:
Uncle Bob! Are you okay? A plane crash happened, it's on fire!

Regex is meant for more complex patterns. Consider, for example, that instead of simply matching [Player], you wanted to match anything between brackets. That would be a good use for regex.
Following is an example that does just that. Unfortunately, the interface of <regex> is not flexible enough to enable dynamic replacements, so we have to implement the actual replacing ourselves.
#include <iostream>
#include <map>
#include <regex>
#include <string>
int main() {
    // Anything stored here can be replaced in the string.
    std::map<std::string, std::string> vars {
        {"Player1", "Bill"},
        {"Player2", "Ted"}
    };
    // Matches anything between brackets.
    std::regex r(R"(\[([^\]]+?)\])");
    std::string str = "[Player1], [Player1]! Are you okay? [Player2] said that a plane crash happened!";
    // We need to keep track of where we are, or else we would need to search from the start of
    // the string every time, which is very wasteful.
    // std::regex_iterator won't help, because the replacement may be smaller
    // than the match, and it would cause strings like "[Player1][Player1]" to not match properly.
    // The position is kept as an index, because str.replace invalidates iterators into str.
    std::size_t pos = 0;
    while (true) {
        // First, we try to get a match. If there are no more matches, exit.
        std::smatch m;
        if (!std::regex_search(str.cbegin() + pos, str.cend(), m, r)) break;
        // The interface of std::match_results is terrible. Let's get what we need and
        // place it in appropriately named variables.
        const std::string var_name = m[1].str();
        const std::size_t start = pos + m.position(0); // m.position is relative to the search start
        const std::size_t length = m.length(0);
        const std::string value = vars[var_name];
        // This does the actual replacement
        str.replace(start, length, value);
        // We update our position. The new search will start right at the end of the replacement.
        pos = start + value.size();
    }
    std::cout << str << '\n';
}
Output:
Bill, Bill! Are you okay? Ted said that a plane crash happened!
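For the fixed [Player] token from the question, std::regex_replace on its own is probably enough, as long as the brackets are escaped so that they are not read as a character class. A minimal sketch (the helper name replace_player is made up for illustration):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces every literal "[Player]" in text with name.
// The brackets are escaped because an unescaped [Player] is a character class.
std::string replace_player(const std::string& text, const std::string& name)
{
    static const std::regex token(R"(\[Player\])");
    // Caveat: name is used as a format string, so sequences such as "$&"
    // or "$1" in it are interpreted rather than inserted literally.
    return std::regex_replace(text, token, name);
}
```

Calling replace_player(sentence, "Bill") returns a new string and leaves the original sentence untouched.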

Simply find and replace, e.g. with boost::replace_all():
#include <boost/algorithm/string.hpp>
std::string target("[Player]! Are you okay? A plane crash happened, it's on fire!");
boost::replace_all(target, "[Player]", "NiNite");

As some people have mentioned, find and replace might be more useful for this scenario. You could do something like this:
std::string name = "Bill";
std::string strToFind = "[Player]";
std::string str = "[Player]! Are you okay? A plane crash happened, it's on fire!";
// Replaces only the first occurrence; str.find returns std::string::npos if there is none.
std::size_t pos = str.find(strToFind);
if (pos != std::string::npos)
    str.replace(pos, strToFind.length(), name);

Related

Change and replace words in a string with C++

First of all, I want to apologize for my bad English.
My question is: suppose we have a lot of sentences, and some of the words in them must be replaced with other words, something like this:
In this cool day it's perfect to go to park and nice to play football.
And the changed string becomes:
In this nice day it's so good to go to park and cool to play football.
As you can see, the word "perfect" is replaced with "so good", and this part is not difficult. My problem is how to replace every "cool" with "nice" and every "nice" with "cool".
What is the best way to do this with C++?
Thanks.

You can use std::string::replace to replace a part of an std::string.
And you can use std::string::find to find a specific substring in an std::string:
std::string foo = "hello replaceme!";
std::string bar = "replaceme";
size_t pos = foo.find(bar);
size_t len = bar.length();
foo.replace(pos, len, "world");
std::cout << foo << std::endl;
The above code will print hello world!.
You can then continue to loop that until foo.find returns string::npos which means it didn't find the specified substring in foo.
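Looping find and replace handles "perfect" to "so good", but swapping "cool" and "nice" in two separate passes would leave only one of the two words in the text. One possible way around that is to make a single left-to-right pass and, at each step, replace whichever search string occurs first. The sketch below assumes plain substring matching (it does not check word boundaries), and the name replace_simultaneously is made up for illustration:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: applies all (from, to) pairs in one left-to-right pass,
// so text produced by one replacement is never replaced again.
std::string replace_simultaneously(
    const std::string& text,
    const std::vector<std::pair<std::string, std::string>>& pairs)
{
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        // Find the pair whose search string occurs earliest at or after pos.
        std::size_t best_pos = std::string::npos;
        const std::pair<std::string, std::string>* best = nullptr;
        for (const auto& p : pairs) {
            if (p.first.empty()) continue;            // empty patterns never advance
            const std::size_t hit = text.find(p.first, pos);
            if (hit < best_pos) { best_pos = hit; best = &p; }
        }
        if (best == nullptr) break;                   // no more matches
        out.append(text, pos, best_pos - pos);        // unchanged text before the match
        out += best->second;                          // the replacement
        pos = best_pos + best->first.size();          // continue after the match
    }
    out.append(text, pos, std::string::npos);         // remaining tail
    return out;
}
```

With the pairs {"cool", "nice"}, {"nice", "cool"} and {"perfect", "so good"}, the sentence from the question comes out with the two words swapped.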

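There is also a way to do it with char pointers if you really wanna get fancy.
Here's what I found:
#include <cstddef>
#include <cstring>
const bool SUCCESS = true;
const bool FAIL = false;
// Replaces the first occurrence of bar in foo with foo_bar.
// The inputs are const, so the result is written to out, a caller-supplied
// buffer of out_size bytes (including the terminating '\0').
bool replace_word(const char *foo, const char *bar, const char *foo_bar,
                  char *out, std::size_t out_size){
    if(foo==NULL || bar==NULL || foo_bar==NULL || out==NULL){
        return FAIL;
    }
    const char *hit = std::strstr(foo, bar);
    if (*bar == '\0' || hit == NULL) {
        return FAIL; // nothing to replace
    }
    const std::size_t prefix_len = hit - foo;
    const std::size_t bar_len = std::strlen(bar);
    const std::size_t new_len = std::strlen(foo_bar);
    const std::size_t suffix_len = std::strlen(hit + bar_len);
    if (prefix_len + new_len + suffix_len + 1 > out_size) {
        return FAIL; // output buffer too small
    }
    std::memcpy(out, foo, prefix_len);                // text before the match
    std::memcpy(out + prefix_len, foo_bar, new_len);  // the replacement
    std::memcpy(out + prefix_len + new_len, hit + bar_len, suffix_len + 1); // tail and '\0'
    return SUCCESS;
}
The replace method is a bit easier, but also less dynamic.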

C++ Get String between two delimiter String

Is there any built-in function available to get the string between two delimiter strings in C/C++?
My input looks like
_STARTDELIMITER_0_192.168.1.18_STOPDELIMITER_
And my output should be
_0_192.168.1.18_
Thanks in advance...

You can do it like this:
string str = "STARTDELIMITER_0_192.168.1.18_STOPDELIMITER";
// find returns a std::size_t index, or string::npos if nothing is found
std::size_t first = str.find("STARTDELIMITER");
std::size_t last = str.find("STOPDELIMITER");
// substr takes a start index and a length; first is where the start delimiter
// begins, so the result still contains the delimiter text itself
string strNew = str.substr(first, last - first);
This assumes that STOPDELIMITER occurs only once, at the end.
EDIT:
If the delimiter can occur multiple times, change the statement that finds STOPDELIMITER to:
std::size_t last = str.rfind("STOPDELIMITER");
This gets you the text between the first STARTDELIMITER and the last STOPDELIMITER, even if they are repeated multiple times.

I have no idea how the top answer received so many votes that it did when the question clearly asks how to get a string between two delimiter strings, and not a pair of characters.
If you would like to do so you need to account for the length of the string delimiter, since it will not be just a single character.
Case 1: Both delimiters are unique:
Given a string _STARTDELIMITER_0_192.168.1.18_STOPDELIMITER_ that you want to extract _0_192.168.1.18_ from, you could modify the top answer like so to get the desired effect. This is the simplest solution without introducing extra dependencies (e.g Boost):
#include <iostream>
#include <string>
std::string get_str_between_two_str(const std::string &s,
    const std::string &start_delim,
    const std::string &stop_delim)
{
    // std::size_t, not unsigned: find returns std::string::npos on failure
    std::size_t first_delim_pos = s.find(start_delim);
    std::size_t end_pos_of_first_delim = first_delim_pos + start_delim.length();
    std::size_t last_delim_pos = s.find(stop_delim);
    return s.substr(end_pos_of_first_delim,
        last_delim_pos - end_pos_of_first_delim);
}
int main() {
    // Want to extract _0_192.168.1.18_
    std::string s = "_STARTDELIMITER_0_192.168.1.18_STOPDELIMITER_";
    std::string s2 = "ABC123_STARTDELIMITER_0_192.168.1.18_STOPDELIMITER_XYZ345";
    std::string start_delim = "_STARTDELIMITER";
    std::string stop_delim = "STOPDELIMITER_";
    std::cout << get_str_between_two_str(s, start_delim, stop_delim) << std::endl;
    std::cout << get_str_between_two_str(s2, start_delim, stop_delim) << std::endl;
    return 0;
}
This prints _0_192.168.1.18_ twice.
The second argument to std::string::substr is a length, not an end position, so it has to be computed as last - (first + start_delim.length()). This keeps the extraction correct even when the start delimiter is not located at the very beginning of the string, as the second call above demonstrates.
Case 2: Unique first delimiter, non-unique second delimiter:
Say you want to get the string between a unique start delimiter and the first occurrence of a non-unique stop delimiter after it. You could modify the function get_str_between_two_str above to begin the search for the stop delimiter right after the start delimiter, by passing a start position to find. (find_first_of is not suitable here: it looks for any single character of its argument, not for the whole delimiter string.)
std::string get_str_between_two_str(const std::string &s,
    const std::string &start_delim,
    const std::string &stop_delim)
{
    std::size_t first_delim_pos = s.find(start_delim);
    std::size_t end_pos_of_first_delim = first_delim_pos + start_delim.length();
    // search for the stop delimiter only after the start delimiter
    std::size_t last_delim_pos = s.find(stop_delim, end_pos_of_first_delim);
    return s.substr(end_pos_of_first_delim,
        last_delim_pos - end_pos_of_first_delim);
}
If instead you want to capture everything between the first unique delimiter and the last occurrence of the second delimiter, as the asker commented above, use rfind instead.
Case 3: Non-unique first delimiter, unique second delimiter:
Very similar to case 2, just reverse the logic between the first delimiter and second delimiter.
Case 4: Both delimiters are not unique:
Again, very similar to case 2: make a container to capture all strings between any pair of the two delimiters. Loop through the string, adding each string found in between to the container and moving the search position for the first delimiter to the position where the second delimiter was encountered. Repeat until std::string::npos is reached.
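Case 4 could be sketched roughly as follows. The function name get_all_between is made up for illustration, and this version resumes the search just after each stop delimiter, which is one of several reasonable choices:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects every substring that sits between a
// start_delim and the next stop_delim after it.
std::vector<std::string> get_all_between(const std::string& s,
                                         const std::string& start_delim,
                                         const std::string& stop_delim)
{
    std::vector<std::string> found;
    if (start_delim.empty() || stop_delim.empty()) return found;
    std::size_t pos = 0;
    while (true) {
        const std::size_t start = s.find(start_delim, pos);
        if (start == std::string::npos) break;        // no further start delimiter
        const std::size_t inner = start + start_delim.length();
        const std::size_t stop = s.find(stop_delim, inner);
        if (stop == std::string::npos) break;         // unterminated section
        found.push_back(s.substr(inner, stop - inner));
        pos = stop + stop_delim.length();             // resume after the stop delimiter
    }
    return found;
}
```

If the two delimiters are the same string, each occurrence is consumed once, so the sections between the 1st and 2nd, the 3rd and 4th, and so on are returned.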

To get a string between 2 delimiter strings without white spaces:
string str = "STARTDELIMITER_0_192.168.1.18_STOPDELIMITER";
string startDEL = "STARTDELIMITER";
// this is really only needed for the first delimiter
string stopDEL = "STOPDELIMITER";
std::size_t firstLim = str.find(startDEL);
std::size_t lastLim = str.find(stopDEL);
// substr takes a length, so pass lastLim - firstLim rather than lastLim
string strNew = str.substr(firstLim, lastLim - firstLim);
// This won't exclude the first delimiter because there is no whitespace
strNew = strNew.substr(startDEL.size());
// this will start your substring after the delimiter
I tried combining the two substring functions, but it started printing the STOPDELIMITER.
Hope that helps.

Hope you won't mind that I'm answering by pointing to another question :)
I would use boost::split or boost::iter_split.
http://www.boost.org/doc/libs/1_54_0/doc/html/string_algo/usage.html#idp166856528
For example code see this SO question:
How to avoid empty tokens when splitting with boost::iter_split?

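Let's say you need to get the brand field (the 6th one, index 5) from the output below:
zoneid:zonename:state:zonepath:uuid:brand:ip-type:r/w:file-mac-profile
A single str.find call is awkward here because the field sits in the middle of the line, but you can use strtok, e.g.:
// line must be a modifiable char array; strtok writes '\0' over each ':'
// note: strtok skips empty fields ("::"), which shifts the count
char *brand;
brand = strtok( line, ":" );       // 1st field (zoneid)
for (int i=0; i<5 && brand != NULL; i++) {
    brand = strtok( NULL, ":" );   // advance to the 2nd ... 6th field
}
// brand now points at the brand field, or is NULL if the line had fewer fields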

This is a late answer, but this might work too:
string strgOrg= "STARTDELIMITER_0_192.168.1.18_STOPDELIMITER";
string strg= strgOrg;
strg.replace(strg.find("STARTDELIMITER"), 14, "");
strg.replace(strg.find("STOPDELIMITER"), 13, "");
Hope it works for others.

void getBtwString(const std::string &oStr, const std::string &sStr1, const std::string &sStr2, std::string &rStr)
{
    std::size_t start = oStr.find(sStr1);
    if (start != std::string::npos)
    {
        start += sStr1.length();
        // look for the second delimiter only after the first one
        std::size_t stop = oStr.find(sStr2, start);
        if (stop != std::string::npos)
            rStr = oStr.substr(start, stop - start);
        else
            rStr = "error";
    }
    else
        rStr = "error";
}
or, if you have access to C++14 (needed only for the s string literals in the example; std::regex itself is C++11), the following:
void getBtwString(const std::string &oStr, const std::string &sStr1, const std::string &sStr2, std::string &rStr)
{
    // note: sStr1 and sStr2 are inserted into the pattern as-is, so any
    // regex metacharacters in them are interpreted, not matched literally
    std::regex base_regex(sStr1 + "(.*)" + sStr2);
    std::smatch base_match;
    if (std::regex_search(oStr, base_match, base_regex) && base_match.size() == 2)
        rStr = base_match[1].str();
}
Example:
using namespace std::literals::string_literals; // for the "..."s literals
std::string strout;
getBtwString("it's_12345bb2","it's","bb2",strout);
getBtwString("it's_12345bb2"s,"it's"s,"bb2"s,strout); // second solution
Headers:
#include <regex> // second solution
#include <string>

Equality of two strings

What is the easiest way, with the least amount of code, to compare two strings, while ignoring the following:
"hello world" == "hello world" // spaces
"hello-world" == "hello world" // hyphens
"Hello World" == "hello worlD" // case
"St pierre" == "saint pierre" == "St. Pierre" // word replacement
I'm sure this has been done before, and there are some libraries to do this kind of stuff, but I don't know any. This is in C++ preferably, but if there's a very short option in whatever other language, I'll want to hear about it too.
Alternatively, I'd also be interested in any library that could give a percentage of matching. Say, hello-world and hello wolrd are 97% likely to have the same meaning, with just a hyphen and a misspelling between them.

Remove spaces from both strings.
Remove hyphens from both strings.
Convert both strings to lower case.
Convert all occurrences of “saint” and “st.” to “st”.
Compare strings like normal.
For example:
#include <cctype>
#include <string>
#include <algorithm>
#include <iostream>
static void remove_spaces_and_hyphens(std::string &s)
{
    s.erase(std::remove_if(s.begin(), s.end(), [](char c) {
        return c == ' ' || c == '-';
    }), s.end());
}
static void convert_to_lower_case(std::string &s)
{
    // std::tolower expects a value representable as unsigned char
    for (auto &c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}
static void
replace_word(std::string &s, const std::string &from, const std::string &to)
{
    size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();
    }
}
static void replace_words(std::string &s)
{
    replace_word(s, "saint", "st");
    replace_word(s, "st.", "st");
}
int main()
{
    // Given two strings:
    std::string s1 = "Hello, Saint Pierre!";
    std::string s2 = "hELlO,St.PiERRe!";
    // Remove spaces and hyphens.
    remove_spaces_and_hyphens(s1);
    remove_spaces_and_hyphens(s2);
    // Convert to lower case.
    convert_to_lower_case(s1);
    convert_to_lower_case(s2);
    // Replace words...
    replace_words(s1);
    replace_words(s2);
    // Compare.
    std::cout << (s1 == s2 ? "Equal" : "Doesn't look like equal") << std::endl;
}
There is a way, of course, to code this more efficiently, but I recommend you start with something working and optimize it only when it proves to be a bottleneck.
It also sounds like you might be interested in string similarity algorithms like “Levenshtein distance”. Similar algorithms are used, for example, by search engines and editors to offer spelling suggestions.
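As a rough illustration of the Levenshtein idea, here is a minimal sketch of the usual dynamic-programming formulation. The similarity function and its normalization by the longer string's length are assumptions made for this example, not a standard definition of a "percentage of matching":

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: the minimum number of single-character insertions,
// deletions, and substitutions needed to turn a into b.
std::size_t levenshtein(const std::string& a, const std::string& b)
{
    // prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = std::min({prev[j] + 1,           // deletion
                               cur[j - 1] + 1,        // insertion
                               prev[j - 1] + cost});  // substitution or match
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Hypothetical similarity score in [0, 1]; 1 means the strings are identical.
double similarity(const std::string& a, const std::string& b)
{
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 1.0;
    return 1.0 - static_cast<double>(levenshtein(a, b)) / static_cast<double>(longest);
}
```

Running the normalization steps from this answer before calling similarity would keep spaces, hyphens, and case from counting as edits.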

I don't know of any library, but for equality, if speed is not a problem, you can do a char-by-char comparison and ignore "special" characters (that is, move the iterator further along in the text when you meet one).
As for comparing texts, you can use a simple Levenshtein distance.
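The char-by-char comparison described above might look something like this. It is only a sketch: the name loosely_equal is made up, and "special" characters are assumed here to be just spaces and hyphens:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if a and b are equal once spaces and hyphens are
// skipped and letter case is ignored.
bool loosely_equal(const std::string& a, const std::string& b)
{
    auto skippable = [](char c) { return c == ' ' || c == '-'; };
    std::size_t i = 0, j = 0;
    while (true) {
        // Move both positions past any characters we want to ignore.
        while (i < a.size() && skippable(a[i])) ++i;
        while (j < b.size() && skippable(b[j])) ++j;
        if (i == a.size() || j == b.size()) break;
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[j]));
        if (ca != cb) return false;
        ++i;
        ++j;
    }
    // Equal only if both strings were consumed completely.
    return i == a.size() && j == b.size();
}
```

This avoids building modified copies of the strings, at the cost of handling only the character-level rules and not word replacement such as "St." versus "saint".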

For spaces and hyphens, just remove all spaces and hyphens from both strings and do a comparison. For case, convert all text to upper or lower case and do the comparison. For word replacement, you would need a dictionary of words with the key being the abbreviation and the value being the replacement word. You may also consider using the Levenshtein distance algorithm for showing how similar one phrase is to another. If you want a statistical probability of how close a word or phrase is to another, you will need sample data to do a comparison.

QRegExp is what you are looking for. It won't print out percentages, but you can build some pretty slick ways of comparing one string to another and of counting the matches of one string in another.
Regular expressions are available in almost every language out there. I like GSkinner's RegEx page for learning regular expressions.
http://qt-project.org/doc/qt-4.8/qregexp.html
Hope that helps.

For the first three requirements:
Remove all spaces and hyphens from the strings (or replace them with the empty string ''):
"hello world" --> "helloworld"
Then compare them, ignoring case. See:
Case insensitive string comparison in C++
The last requirement is more complicated.
First, you need a dictionary with a key-value structure:
'St.': 'saint'
'Mr.': 'mister'
Second, use the Boost tokenizer to separate the string into tokens and look each one up in the key-value store.
Then replace each token that was found with its mapped string. This may perform poorly, though:
http://www.boost.org/doc/libs/1_53_0/libs/tokenizer/tokenizer.htm

Replace whole words in a string list without using external libraries

I want to replace some words without using external libraries.
My first attempt was to make a copy of the string, but it was not efficient, so this is another attempt where I use addresses:
void ReplaceString(std::string &subject, const std::string &search, const std::string &replace)
{
    size_t position = 0;
    while ((position = subject.find(search, position)) != std::string::npos) //if something messes up --> failure
    {
        subject.replace(position, search.length(), replace);
        position = position + replace.length();
    }
}
Because this is not very efficient either, I want to use another thing, but I got stuck; I want to use a function like replace_stuff(std::string & a); with a single parameter using string.replace() and string.find() (parsing it with a for loop or something) and then make use of std::map <std::string,std::string>; which is very convenient for me.
I want to use it for a large number of input words. (let's say replacing many bad words with some harmless ones)

The problem with your question is the lack of the necessary components in the Standard library. If you want an efficient implementation, you'd probably need a trie for efficient lookups. Writing one as part of the answer would be way too much code.
If you use a std::map or, if C++11 is available in your environment, a std::unordered_map, you will need to utilize additional information about the input string and the search-replace pairs from the map. You'd then tokenize the string and check each token to see whether it has to be replaced. Using positions pointing into the input string is a good idea since it avoids copying data. Which brings us to:
Efficiency will depend on memory access (reads and writes), so you should not modify the input string. Create the output by starting with an empty string and by appending pieces from the input. Check each part of the input: If it is a word, check if it needs to be replaced or if it is appended to the output unmodified. If it is not part of a word, append it unmodified.
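The approach described above could be sketched as follows. This is only an illustration: replace_words is a made-up name, a "word" is assumed to be a run of letters or digits, and each word is still copied once for the map lookup:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: builds a new string from the input, replacing whole
// words that appear as keys in the map and copying everything else unchanged.
std::string replace_words(const std::string& input,
                          const std::unordered_map<std::string, std::string>& replacements)
{
    auto is_word_char = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    };
    std::string output;
    output.reserve(input.size());
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::size_t end = pos;
        if (is_word_char(input[pos])) {
            // Scan to the end of the word, then look it up once.
            while (end < input.size() && is_word_char(input[end])) ++end;
            const std::string word = input.substr(pos, end - pos);
            const auto it = replacements.find(word);
            output += (it != replacements.end()) ? it->second : word;
        } else {
            // Separators (spaces, punctuation) are appended unmodified.
            while (end < input.size() && !is_word_char(input[end])) ++end;
            output.append(input, pos, end - pos);
        }
        pos = end;
    }
    return output;
}
```

Because each token is looked up exactly once, the cost depends on the length of the input rather than on the number of entries in the map.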

It sounds like you want to replace all the "bad" words in a string with harmless ones, but your current implementation is inefficient because the list of bad words is much larger than the length of your input string (subject). Is this correct?
If so, the following code should make it more efficient. As you can see, I had to pass the map as a parameter, but if your function is going to be part of a class, you don't need to do so.
void ReplaceString(std::string &subject, const std::map<std::string, std::string>& replace_map)
{
    size_t startofword = 0;
    while(startofword < subject.size())
    {
        //get next word in string
        size_t endofword = subject.find(' ', startofword);
        if(endofword == std::string::npos)
            endofword = subject.size();
        size_t length = endofword-startofword;
        std::string search = subject.substr(startofword, length);
        //try to find this word in the map
        //(operator[] cannot be used on a const map, so keep the iterator)
        std::map<std::string, std::string>::const_iterator found = replace_map.find(search);
        if(found != replace_map.end())
        {
            //if found, replace the word with a new word
            subject.replace(startofword, length, found->second);
            startofword += found->second.length();
        }
        else
        {
            startofword += length;
        }
        //skip the space that ended the word, otherwise the loop never advances
        ++startofword;
    }
}

I use the following functions, hope it helps:
//=============================================================================
//replaces each occurrence of the phrase in sWhat with sReplacement
std::string& sReplaceAll(std::string& sS, const std::string& sWhat, const std::string& sReplacement)
{
    if (sWhat.empty()) return sS; //an empty phrase would match forever
    size_t pos = 0, fpos;
    while ((fpos = sS.find(sWhat, pos)) != std::string::npos)
    {
        sS.replace(fpos, sWhat.size(), sReplacement);
        pos = fpos + sReplacement.length();
    }
    return sS;
}
//=============================================================================
// replaces each single char from sCharList that is found within sS with entire sReplacement
std::string& sReplaceChars(std::string& sS, const std::string& sCharList, const std::string& sReplacement)
{
    size_t pos=0;
    while (pos < sS.length())
    {
        if (sCharList.find(sS.at(pos),0)!=std::string::npos) //pos is where a charlist-char was found
        {
            sS.replace(pos, 1, sReplacement);
            pos += sReplacement.length(); //continue after the inserted text
        }
        else
        {
            pos++;
        }
    }
    return sS;
}

You might create a class, say Replacer:
class Replacer
{
    std::map<std::string, std::string> replacement;
public:
    Replacer()
    {
        // init the map here
        replacement.insert ( std::pair<std::string,std::string>("C#","C++") );
        //...
    }
    void replace_stuff(std::string & a);
};
Then the replace_stuff definition would be very similar to your original ReplaceString (it would use map entries instead of the passed parameters).

C++ Regular Expressions with Boost Regex

I am trying to take a string in C++ and find all IP addresses contained inside, and put them into a new vector string.
I've read a lot of documentation on regex, but I just can't seem to understand how to do this simple function.
I believe I can use this Perl expression to find any IP address:
re("\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b");
But I am still stumped on how to do the rest.

Perhaps you're looking for something like this. It uses regex_iterator to get all matches of the current pattern. See reference.
#include <boost/regex.hpp>
#include <iostream>
#include <string>
int main()
{
    std::string text(" 192.168.0.1 abc 10.0.0.255 10.5.1 1.2.3.4a 5.4.3.2 ");
    const char* pattern =
        "\\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
        "\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
        "\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
        "\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b";
    boost::regex ip_regex(pattern);
    boost::sregex_iterator it(text.begin(), text.end(), ip_regex);
    boost::sregex_iterator end;
    for (; it != end; ++it) {
        std::cout << it->str() << "\n";
        // v.push_back(it->str()); or something similar
    }
}
Output:
192.168.0.1
10.0.0.255
5.4.3.2
Side note: you probably meant \\b instead of \b; I doubt you wanted to match the backspace character.

The offered solution is quite good, thanks for it. However, I found a slight mistake in the pattern itself.
For example, something like 49.000.00.01 would be taken as a valid IPv4 address, and from my understanding it shouldn't be (this just happened to me during some dump processing).
I suggest improving the pattern to:
"\\b(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)"
"\\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)"
"\\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)"
"\\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)\\b";
This should allow 0.0.0.0 as the only all-zero address, which I believe is correct, and it eliminates octets such as .00. and .000.
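To try the stricter pattern without Boost, it can be dropped into std::regex, which offers a very similar iterator interface. The swap to std::regex and the function name find_ipv4 are assumptions made for this sketch; the octet pattern itself is the one suggested above:

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch using std::regex in place of boost::regex (an assumption made so the
// example needs no external library). Returns every dotted-quad match in text.
std::vector<std::string> find_ipv4(const std::string& text)
{
    // One octet: 0-255 with no leading zeros.
    static const std::string octet =
        "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|0)";
    static const std::regex ip_regex(
        "\\b" + octet + "\\." + octet + "\\." + octet + "\\." + octet + "\\b");
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), ip_regex), end;
         it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}
```

Like the original pattern, this relies on \b alone, so it would still pick 1.2.3.4 out of a longer dotted sequence such as 1.2.3.4.5.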

#include <cstddef>
#include <string>
#include <list>
#include <boost/regex.hpp>
typedef std::string::const_iterator ConstIt;
int main()
{
    // input text, expected result, & proper address pattern
    const std::string sInput
    (
        "192.168.0.1 10.0.0.255 abc 10.5.1.00"
        " 1.2.3.4a 168.72.0 0.0.0.0 5.4.3.2"
    );
    const std::string asExpected[] =
    {
        "192.168.0.1",
        "10.0.0.255",
        "0.0.0.0",
        "5.4.3.2"
    };
    boost::regex regexIPs
    (
        "(^|[ \t])("
        "(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])[.]"
        "(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])[.]"
        "(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])[.]"
        "(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])"
        ")($|[ \t])"
    );
    // parse the input and collect every address (capture group 2)
    boost::smatch what;
    std::list<std::string> ns;
    ConstIt end = sInput.end();
    for (ConstIt begin = sInput.begin();
         boost::regex_search(begin, end, what, regexIPs);
         begin = what[0].second)
    {
        ns.push_back(std::string(what[2].first, what[2].second));
    }
    // check results and return number of errors (zero)
    const std::size_t nExpected = sizeof(asExpected) / sizeof(asExpected[0]);
    int iErrors = (ns.size() == nExpected) ? 0 : 1; // a wrong count is an error too
    std::size_t i = 0;
    for (const std::string & s : ns)
    {
        if (i < nExpected && s != asExpected[i])
            ++ iErrors;
        ++ i;
    }
    return iErrors;
}
