How to remove an entire word from a sentence in C++

I want to remove a specific word from a sentence. I tried breaking the sentence down into words and comparing each word, but when I call the erase function the indices shift. I also tried another method, but it deletes the substring inside longer words as well, which I don't want. Can anyone help me out a bit? What approach should I use?
input
The house whirled around two or three times and rose slowly through the air.
output
The house whirled around two or three times and rose slowly through air.
here is my prototype of the function
int RemoveWordFromLine(string line, string word)
{
// ==========================
string tmp_str="",spacebar=" ";
int start=0,end=-1;
for(int i=0;i<line.length();i++)
{
if(isspace(line[i])||int(line[i])==44||int(line[i])==46)
{
cout<<tmp_str<<" "<<start<<" "<<end<<endl; // compare
if(tmp_str==word)
{
line.erase(start,end);
}
tmp_str="";
start=i+1;
end=i;
} else
{
tmp_str+=line[i];
end++;
}
}
if(tmp_str==word)
{
line.erase(start,end);
}
cout<<tmp_str<<" "<<start<<" "<<end<<endl; // compare
cout<<line<<endl;
// ==========================
}

You can write your function in the following way:
void RemoveWordFromLine(std::string &line, const std::string &word)
{
auto n = line.find(word);
if (n != std::string::npos)
{
line.erase(n, word.length());
}
}
And use it like:
std::string line("This is a wrong line");
RemoveWordFromLine(line, "wrong");
printf("line is: '%s'\n", line.c_str());
Prints out (note the double space left where the word was; find also matches the text inside longer words):
line is: 'This is a  line'

You are passing the end position instead of the length of the string to be deleted; the second argument of erase is a character count.
You just need to replace
if(tmp_str==word)
{
line.erase(start,end);
}
with
if(tmp_str==word)
{
line.erase(start,word.length()+1); //+1 to prevent 2 spaces
}
You could also do line.erase(start-1,word.length()+1); if you would like to keep the '.' or ',' after the deleted word (this assumes start > 0, i.e. the word is not the first one in the line).
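If the shifting indices are the main pain point, another option is to avoid erasing in place and rebuild the line from its tokens instead. The following is only a minimal sketch of that idea: removeWholeWord is a made-up name, it is case sensitive, it collapses runs of whitespace into single spaces, and it drops any punctuation attached to a removed word.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper (not from the question): rebuilds the line from its
// whitespace-separated tokens and skips every token equal to `word`.
std::string removeWholeWord(const std::string& line, const std::string& word) {
    assert(!word.empty());  // the sketch expects an actual word
    std::istringstream in(line);
    std::string token, result;
    while (in >> token) {
        // Compare without trailing punctuation, so "air." still matches "air".
        std::string core = token;
        while (!core.empty() &&
               std::ispunct(static_cast<unsigned char>(core.back()))) {
            core.pop_back();
        }
        if (core == word) continue;          // drop the whole token
        if (!result.empty()) result += ' ';  // single space between kept tokens
        result += token;
    }
    return result;
}
```

Called as removeWholeWord(line, "the") on the sentence from the question, this keeps "The" and "three" and removes only the standalone "the".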

With <regex>, you might do:
std::string RemoveWordFromLine(const std::string& line, std::string word)
{
// \s* : extra optional spaces at the start
// \b : word boundary
// Ideally, we should check that `word` doesn't break regex too:
std::regex reg("\\s*\\b" + word + "\\b");
return std::regex_replace(line, reg, "");
}
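The comment about `word` breaking the regex can be addressed by escaping metacharacters before building the pattern. This is a rough sketch, assuming the default ECMAScript grammar of std::regex; escapeRegex and removeWordRegex are names I made up for illustration. Note that \b only helps when the word starts and ends with a letter, digit or underscore.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: puts a backslash in front of every character that has
// a special meaning in the ECMAScript regex grammar.
std::string escapeRegex(const std::string& text) {
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : text) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Same idea as the function above, but with the word escaped first.
std::string removeWordRegex(const std::string& line, const std::string& word) {
    assert(!word.empty());  // an empty word would strip whitespace everywhere
    const std::regex reg("\\s*\\b" + escapeRegex(word) + "\\b");
    return std::regex_replace(line, reg, "");
}
```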

Remove multiple occurrences as well with this function
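void removeWords(string& s, const string& p) {
    const size_t n = p.length();
    if (n == 0) return; // an empty pattern would never terminate
    size_t i = s.find(p);
    while (i != string::npos) {
        // whole word only if the neighbours are not letters or digits (isalnum is in <cctype>)
        bool left_ok = (i == 0) || !isalnum(static_cast<unsigned char>(s[i - 1]));
        bool right_ok = (i + n == s.length()) || !isalnum(static_cast<unsigned char>(s[i + n]));
        if (left_ok && right_ok)
            s.erase(i, n); // erase the word; surrounding spaces are left in place
        else
            ++i; // subword match: skip it so the search makes progress
        i = s.find(p, i);
    }
}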

Related

ask for text to edit, text formatting

I would like to write a program that asks for text (a paragraph with several words) separated by commas.
It should then transform the text by wrapping each word in a tag, for example to format the text as HTML.
Example:
word1, word2, word3
to
<a> word1 </a>, <a> word2 </a>, <a> word3 </a>
So I started writing this code, but I do not know how to continue. How can I scan the text to find the start of each word? With ASCII tests, I imagine?
Maybe with a table that tests every case?
I am not necessarily asking for the complete answer; a direction to follow would already help.
#include <iostream>
#include <iomanip>
#include <string> //For getline()
using namespace std;
// Creating class
class GetText
{
public:
string text;
string line; //Using this as a buffer
void userText()
{
cout << "Please type a message: ";
do
{
getline(cin, line);
text += line;
}
while(line != "");
}
void to_string()
{
cout << "\n" << "User's Text: " << "\n" << text << endl;
}
};
int main() {
GetText test;
test.userText();
test.to_string();
system("pause");
return 0;
}
The next thing you need to do is split your input by a delimiter (in your case ',') into a vector and then join everything back together with the prefixes and suffixes. The C++ standard library has no built-in split function, so you would have to write one yourself or search for an existing solution.
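One common way to do the split is std::getline with ',' as the delimiter. Here is a small sketch of that approach, assuming each piece should be trimmed and then wrapped as in the example from the question; trimSpaces and wrapWords are names I made up.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: strips leading and trailing spaces and tabs.
std::string trimSpaces(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Splits `text` on commas and wraps every non-empty piece in open/close.
std::string wrapWords(const std::string& text,
                      const std::string& open,
                      const std::string& close) {
    std::istringstream in(text);
    std::string token, result;
    while (std::getline(in, token, ',')) {
        const std::string word = trimSpaces(token);
        if (word.empty()) continue;           // ignore empty pieces such as ",,"
        if (!result.empty()) result += ", ";  // re-insert the separator
        result += open + " " + word + " " + close;
    }
    return result;
}
```

With the input "word1, word2, word3" and the tags "<a>" and "</a>", this should give the output asked for in the question.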
If you want to keep it really simple, you can detect word boundaries by checking two characters at a time. Here's a working example.
#include <iostream>
#include <string>
#include <cctype>
using namespace std; // after the includes, so that namespace std is already declared
typedef enum boundary_type_e {
E_BOUNDARY_TYPE_ERROR = -1,
E_BOUNDARY_TYPE_NONE,
E_BOUNDARY_TYPE_LEFT,
E_BOUNDARY_TYPE_RIGHT,
} boundary_type_t;
typedef struct boundary_s {
boundary_type_t type;
int pos;
} boundary_t;
bool is_word_char(int c) {
return ' ' <= c && c <= '~' && !isspace(c) && c != ',';
}
// Brace initialization is standard C++; the (type){.field = ...} form is a C99 feature
// that C++ compilers accept only as an extension.
boundary_t maybe_word_boundary(const string &str, int pos) {
    int len = str.length();
    if (pos < 0 || pos >= len) {
        return boundary_t{E_BOUNDARY_TYPE_ERROR, 0};
    } else {
        if (pos == 0 && is_word_char(str[pos])) {
            // if the first character is word-y, we have a left boundary at the beginning
            return boundary_t{E_BOUNDARY_TYPE_LEFT, pos};
        } else if (pos == len - 1 && is_word_char(str[pos])) {
            // if the last character is word-y, we have a right boundary left of the null terminator
            return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
        } else if (!is_word_char(str[pos]) && is_word_char(str[pos + 1])) {
            // if we have a delimiter followed by a word char, we have a left boundary left of the word char
            return boundary_t{E_BOUNDARY_TYPE_LEFT, pos + 1};
        } else if (is_word_char(str[pos]) && !is_word_char(str[pos + 1])) {
            // if we have a word char followed by a delimiter, we have a right boundary right of the word char
            return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
        }
        return boundary_t{E_BOUNDARY_TYPE_NONE, 0};
    }
}
int main() {
string str;
string ins_left("<tag>");
string ins_right("</tag>");
getline(cin, str);
// can't use length for the loop condition without recalculating it all the time
for (int i = 0; str[i] != '\0'; i++) {
boundary_t boundary = maybe_word_boundary(str, i);
if (boundary.type == E_BOUNDARY_TYPE_LEFT) {
str.insert(boundary.pos, ins_left);
i += ins_left.length();
} else if (boundary.type == E_BOUNDARY_TYPE_RIGHT) {
str.insert(boundary.pos, ins_right);
i += ins_right.length();
}
}
cout << str << endl; // show the tagged result
}
It would be better to use enum class but I forgot the notation. You can also copy to a buffer instead of generating the new string in-place, I was just trying to keep it simple. Feel free to expand it to a class based C++ style. To get your exact desired output, strip the spaces first and add spaces to ins_left and ins_right.
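For reference, the enum class notation mentioned above looks roughly like this. The type and enumerator names are illustrative choices of mine, not part of the example above.

```cpp
#include <cassert>
#include <string>

// Scoped enumeration: the enumerators live inside BoundaryType and do not
// convert to int implicitly.
enum class BoundaryType { Error = -1, None, Left, Right };

struct Boundary {
    BoundaryType type;
    int pos;
};

// Example of consuming the scoped enum in a switch.
std::string boundaryName(BoundaryType type) {
    switch (type) {
        case BoundaryType::Error: return "error";
        case BoundaryType::None:  return "none";
        case BoundaryType::Left:  return "left";
        case BoundaryType::Right: return "right";
    }
    return "unknown";  // only reached for values outside the enumerators
}
```

The enumerators then have to be written with their scope, for example Boundary{BoundaryType::Left, 3}.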

Parse String Between Brackets

I have a string that goes like this:
Room -> Subdiv("X", 0.5, 0.5) { sleep | work } : 0.5
I need to somehow extract the 2 strings between {} , i.e. sleep and work. The format is strict, there can be just 2 words between the brackets, the words can change though. The text before and after the brackets can also change. My initial way of doing it was:
string split = line.substr(line.find("Subdiv(") + _count_of_fchars);
split = split.substr(4, axis.find(") { "));
split = split.erase(split.length() - _count_of_chars);
However, I realised that this is not going to work if the strings inside the brackets change to anything with a different length.
How can this be done? Thanks!
Without hard-coding any numbers:
Find A as the index of the first "{" from the end of the string, search backward.
Find B as the index of the first "|" from the position of "{", search forward.
Find C as the index of the first "}" from the position of "|", search forward.
The substring between A and B gives you the first string, while the substring between B and C gives you the second string. You can include the spaces in your substring search, or take them out later.
std::pair<std::string, std::string> SplitMyCustomString(const std::string& str){
auto first = str.find_last_of('{');
if(first == std::string::npos) return {};
auto mid = str.find_first_of('|', first);
if(mid == std::string::npos) return {};
auto last = str.find_first_of('}', mid);
if(last == std::string::npos) return {};
return { str.substr(first+1, mid-first-1), str.substr(mid+1, last-mid-1) };
}
For Trimming the spaces:
std::string Trim(const std::string& str){
    auto first = str.find_first_not_of(' ');
    if(first == std::string::npos) return {}; // empty or all spaces: nothing left after trimming
    auto last = str.find_last_not_of(' ');    // cannot be npos once a non-space character exists
    return str.substr(first, last-first+1);
}
Something like:
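// assumes "{ ", " | " and " }" are all present, as the strict format guarantees
std::size_t open = str.find("{ ") + 2;   // first character after "{ "
std::size_t separator = str.find(" | ");
std::size_t close = str.find(" }");      // position of the space before '}'
string strNew1 = str.substr(open, separator - open);
string strNew2 = str.substr(separator + 3, close - (separator + 3));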
Even though you said that the number of words to find is fixed, I made a slightly more flexible example using a regular expression. However, you could still achieve the same result using Мотя's answer.
std::string s = ("Room -> Subdiv(\"X\", 0.5, 0.5) { sleep | work } : 0.5");
std::regex rgx("\\{((?:\\s*\\w*\\s*\\|?)+)\\}");
std::smatch match;
if (std::regex_search(s, match, rgx) && match.size() == 2) {
// match[1] now contains "sleep | work"
std::istringstream iss(match[1]);
std::string token;
while (std::getline(iss, token, '|')) {
std::cout << trim(token) << std::endl;
}
}
Here trim is a helper that removes leading and trailing spaces. The input string could easily be extended to look like this: "...{ sleep | work | eat }...".

string::replace not working correctly 100% of the time?

I'm trying to replace every space character with '%20' in a string, and I'm thinking of using the built in replace function for the string class.
Currently, I have:
void replaceSpace(string& s)
{
int len = s.length();
string str = "%20";
for(int i = 0; i < len; i++) {
if(s[i] == ' ') {
s.replace(i, 1, str);
}
}
}
When I pass in the string "_a_b_c_e_f_g__", where the underscores represent space, my output is "%20a%20b%20c%20e_f_g__". Again, underscores represent space.
Why is it that the spaces near the beginning of the string are replaced, but the spaces towards the end aren't?
You are making s longer with each replacement, but you are not updating len which is used in the loop condition.
Modifying the string that you are just scanning is like cutting the branch under your feet. It may work if you are careful, but in this case you aren't.
Namely, you take the string len at the beginning but with each replacement your string gets longer and you are pushing the replacement places further away (so you never reach all of them).
The correct way to cut this branch is from its end (tip) towards the trunk - this way you always have a safe footing:
void replaceSpace(string& s)
{
int len = s.length();
string str = "%20";
for(int i = len - 1; i >= 0; i--) {
if(s[i] == ' ') {
s.replace(i, 1, str);
}
}
}
You're growing the string but only looping to its initial size.
Looping over a collection while modifying it is very prone to error.
Here's a solution that doesn't:
void replace(string& s)
{
string s1;
std::for_each(s.begin(),
s.end(),
[&](char c) {
if (c == ' ') s1 += "%20";
else s1 += c;
});
s.swap(s1);
}
As others have already mentioned, the problem is you're using the initial string length in your loop, but the string gets bigger along the way. Your loop never reaches the end of the string.
You have a number of ways to fix this. You can correct your solution and make sure you go to the end of the string as it is now, not as it was before you started looping.
Or you can use @molbdnilo's way, which creates a copy of the string along the way.
Or you can use something like this:
std::string input = " a b c e f g ";
std::string::size_type pos = 0;
while ((pos = input.find(' ', pos)) != std::string::npos)
{
input.replace(pos, 1, "%20");
}
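The first option mentioned above, correcting the original loop, could look roughly like this. It is only a sketch; replaceSpaceForward is a name I picked to keep it apart from the function in the question.

```cpp
#include <cassert>
#include <string>

// Forward scan that stays correct while the string grows.
void replaceSpaceForward(std::string& s) {
    const std::string str = "%20";
    // s.length() is re-read on every pass, so the loop sees the grown string.
    for (std::string::size_type i = 0; i < s.length(); ++i) {
        if (s[i] == ' ') {
            s.replace(i, 1, str);
            i += str.length() - 1;  // jump over the text that was just inserted
        }
    }
}
```

The two changes compared to the question are the loop condition and the extra step over the inserted characters.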
Here's a function that can make it easier for you:
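string replace_char_str(string str, const string &find_str, const string &replace_str)
{
    if (find_str.empty()) return str; // nothing to search for
    for (size_t pos = str.find(find_str); pos != std::string::npos;
         pos = str.find(find_str, pos + replace_str.length()))
    {
        // replace the whole match, then continue searching after the inserted text
        str.replace(pos, find_str.length(), replace_str);
    }
    return str;
}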
So when you want to replace the spaces, use it like this:
string new_str = replace_char_str(yourstring, " ", "%20");
Hope this helps you ! :)

Separator character in string c++

This is the requirement: Read a string and loop it, whenever a new word is encountered insert it into std::list. If the . character has a space, tab, newline or digit on the left and a digit on the right then it is treated as a decimal point and thus part of a word. Otherwise it is treated as a full stop and a word separator.
And this is the result I run from the template program:
foo.bar -> 2 words (foo, bar)
f5.5f -> 1 word
.4.5.6.5 -> 1 word
d.4.5f -> 3 words (d, 4, 5f)
.5.6..6.... -> 2 words (.5.6, 6)
This seems very complex to me, as it is my first time dealing with strings in C++. I'm really stuck on implementing the code. Could anyone give me a hint? Thanks.
I have just sketched some rough ideas:
bool isDecimal(std::string &word) {
bool ok = false;
for (unsigned int i = 0; i < word.size(); i++) {
if (word[i] == '.') {
if ((std::isdigit(word[(int)i - 1]) ||
std::isspace(word[(int)i -1]) ||
(int)(i - 1) == (int)(word.size() - 1)) && std::isdigit(word[i + 1]))
ok = true;
else {
ok = false;
break;
}
}
}
return ok;
}
void checkDecimal(std::string &word) {
if (!isDecimal(word)) {
std::string temp = word;
word.clear();
for (unsigned int i = 0; i < temp.size(); i++) {
if (temp[i] != '.')
word += temp[i];
else {
if (std::isalpha(temp[i + 1]) || std::isdigit(temp[i + 1]))
word += ' ';
}
}
}
trimLeft(word);
}
I think you may be approaching the problem from the wrong direction. It seems much easier if you turn the condition upside down. To give you some pointers in a pseudocode skeleton:
bool isSeparator(const std::string& string, size_t position)
{
// Determine whether the character at <position> in <string> is a word separator
}
void tokenizeString(const std::string& string, std::list<std::string>& wordList)
{
// for every character in string
// if(isSeparator(character) || end of string)
// list.push_back(substring from last separator to this one)
}
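One possible way to fill in that skeleton is sketched below. It follows the rule exactly as worded in the question, plus two assumptions of mine: whitespace always separates words, and a '.' at the very start of the input counts as having whitespace on its left. It returns the list instead of filling a reference parameter. Be aware that for "d.4.5f" the worded rule gives two words (d and 4.5f), not the three listed in the question, so the template program seems to apply an extra condition that is not stated there.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <list>
#include <string>

// True if the character at `pos` separates words under the stated rule.
bool isSeparator(const std::string& s, std::size_t pos) {
    const unsigned char c = static_cast<unsigned char>(s[pos]);
    if (std::isspace(c)) return true;  // assumption: whitespace always separates
    if (c != '.') return false;
    // Left side must be start of input, whitespace, or a digit.
    bool leftOk = (pos == 0);
    if (!leftOk) {
        const unsigned char left = static_cast<unsigned char>(s[pos - 1]);
        leftOk = std::isspace(left) || std::isdigit(left);
    }
    // Right side must be a digit.
    const bool rightOk = pos + 1 < s.size() &&
                         std::isdigit(static_cast<unsigned char>(s[pos + 1]));
    return !(leftOk && rightOk);  // a decimal point is not a separator
}

std::list<std::string> tokenizeString(const std::string& s) {
    std::list<std::string> words;
    std::string current;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isSeparator(s, i)) {
            if (!current.empty()) {  // close the word collected so far
                words.push_back(current);
                current.clear();
            }
        } else {
            current += s[i];
        }
    }
    if (!current.empty()) words.push_back(current);  // last word, if any
    return words;
}
```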
I suggest implementing it with flex and bison, using their C++ support.

Loop quitting for no reason

I have a question regarding C++. This is my current function:
string clarifyWord(string str) {
//Remove all spaces before string
unsigned long i = 0;
int currentASCII = 0;
while (i < str.length()) {
currentASCII = int(str[i]);
if (currentASCII == 32) {
str.erase(i);
i++;
continue;
} else {
break;
}
}
//Remove all spaces after string
i = str.length();
while (i > -1) {
currentASCII = int(str[i]);
if (currentASCII == 32) {
str.erase(i);
i--;
continue;
} else {
break;
}
}
return str;
}
Just to get the basic and obvious things out of the way, I have #include <string> and using namespace std; so I do have access to the string functions.
The thing is though that the loop is quitting and sometimes skipping the second loop. I am passing in the str to be " Cheese " and it should remove all the spaces before the string and after the string.
In the main function, I am also assigning a variable to clarifyWord(str) where str is above. It doesn't seem to print that out either using cout << str;.
Is there something I am missing with printing out strings or looping with strings? Also ASCII code 32 is Space.
Okay so the erase function you are calling looks like this:
string& erase ( size_t pos = 0, size_t n = npos );
The n parameter is the number of characters to delete. Its default value, npos, means delete everything up to the end of the string, so set the second parameter to 1:
str.erase(i,1);
[EDIT]
You could change the first loop to this:
while (str.length() > 0 && str[0] == ' ')
{
str.erase(0,1);
}
and the second loop to this:
while (str.length() > 0 && str[str.length() - 1] == ' ')
{
str.erase(str.length() - 1, 1);
}
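The difference between the one-argument and two-argument calls to erase discussed above can be checked with a tiny sketch like this one. The two wrapper functions are hypothetical and exist only to make the behaviour easy to compare.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// erase(pos) removes everything from pos to the end of the string.
std::string eraseFrom(std::string s, std::size_t pos) {
    assert(pos <= s.size());  // erase throws std::out_of_range beyond that
    s.erase(pos);
    return s;
}

// erase(pos, 1) removes a single character (or nothing if pos == size()).
std::string eraseOneAt(std::string s, std::size_t pos) {
    assert(pos <= s.size());
    s.erase(pos, 1);
    return s;
}
```

With " Cheese " and position 0, the first form leaves an empty string, which is why the original function appeared to print nothing, while the second form only drops the leading space.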
In your second loop, you can't initialize i to str.length().
str[str.length()] is going to be after the end of your string, and so is unlikely to be a space (thus triggering the break out of the second loop).
You're using erase (modifying the string) while you're in a loop checking its size. This is a dangerous way of processing the string. As you return a new string, I would recommend you first to search for the first occurrence in the string of the non-space character, and then the last one, and then returning a substring. Something along the lines of (not tested):
size_t init = str.find_first_not_of(' ');
if (init == std::string::npos)
return "";
size_t fini = str.find_last_not_of(' ');
return str.substr(init, fini - init + 1);
You see, no loops, erases, etc.
unsigned long i ... while (i > -1): well, that's not right, is it? How would you expect that to work? The compiler will in fact convert both operands to the same type: while (i > static_cast<unsigned long>(-1)). And that's just another way to write ULONG_MAX, i.e. while (i > ULONG_MAX). In other words, while (false).
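A small sketch to back this up: the static_assert confirms the conversion, and countTrailingSpaces (a hypothetical helper, not part of the question) shows one common way to loop backward with an unsigned index.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Converting -1 to unsigned long yields the largest value of that type,
// so `i > -1` can never be true for an unsigned long i.
static_assert(static_cast<unsigned long>(-1) == ULONG_MAX,
              "-1 converts to ULONG_MAX");

// Backward loop that is safe with an unsigned index: `i-- > 0` tests the
// old value before decrementing, so the body only ever sees valid indices.
std::size_t countTrailingSpaces(const std::string& str) {
    std::size_t count = 0;
    for (std::size_t i = str.size(); i-- > 0;) {
        if (str[i] != ' ') break;
        ++count;
    }
    return count;
}
```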
You're using erase incorrectly. It'll erase from pos to npos.
i.e. string& erase ( size_t pos = 0, size_t n = npos );
See: http://www.cplusplus.com/reference/string/string/erase/
A better way to do this is to note the position of the first non space and where the spaces occur at the end of the string. Then use either substr or erase twice.
You also don't need to go to the trouble of doing this:
currentASCII = int(str[i]);
if (currentASCII == 32) {
Instead do this:
if (str[i] == ' ') {
Which I think you'll agree is a lot easier to read.
So, you can shorten it somewhat with something like:
string clarifyWord(string str) {
    size_t start = 0, end = str.length();
    while (start < end && str[start] == ' ') ++start; // skip leading spaces
    while (end > start && str[end - 1] == ' ') --end; // skip trailing spaces
    return str.substr(start, end - start);            // second argument is a length, not a position
}