Removing Special Characters from C++ String (Except ' and - ) [duplicate] - c++

I'm trying to remove special characters from a string using an isWordChar() method. However, I need to keep two special characters, " ' " and " - ", such as the apostrophe in "isn't" and the hyphens in mother-in-law. Here's what I'm trying to implement:
std::string WordCount::stripWord(std::string wrd) {
for(unsigned int i = 0; i < wrd.size(); ++i)
{
if( !isWordChar(wrd[i]) && (wrd[i]!=39 && wrd[i]!=45))
{
wrd.erase(wrd.begin()+i);
--i;
}
}
return wrd;
}
After adding the special cases to my boolean condition, I can't seem to get the exception to work correctly. Any hints or advice? Thanks!

I would use the remove/erase idiom:
word.erase(std::remove_if(word.begin(),
word.end(),
[](char c) {
return !(isWordChar(c) || '-' == c || '\'' == c);
}), word.end());
The way you're erasing characters has complexity of approximately O(N * M) (where N is the original length of the string and M is the number of characters you remove). This has a complexity of approximately O(N), so if you're removing very many characters (or the string is very long) it's likely to give a substantial speed improvement.
If you care about why it's so much faster, it's because it works somewhat differently. To be specific, when you erase an element from the middle of a string, the erase function immediately copies all the characters after it to fill the hole where you erased the character. If you do this M times, all those characters get copied once for each character you remove.
When you use remove_if, it does something more like this:
template <class Iter, class F>
Iter remove_if(Iter b, Iter e, F f) {
    auto dst = b;
    for (auto src = b; src != e; ++src) {
        if (!f(*src))        // keep only elements the predicate rejects
            *dst++ = *src;   // copy each kept element forward exactly once
    }
    return dst;              // new logical end of the range
}
This way, each character that's retained is only copied once, rather than being copied every time you remove one character from the string. Then when you do the final erase, it just removes characters from the end of the string, so it's basically just adjusting the length of the string downward.
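Putting the idiom together, a minimal sketch of the whole function might look like the following. The question does not show isWordChar, so the version here (letters only, via std::isalpha) is an assumption, and stripWord is written as a free function rather than a WordCount member to keep the sketch self-contained.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Assumed stand-in for the asker's isWordChar(): letters only.
bool isWordChar(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Removes every character that is not a word character, apostrophe or hyphen.
std::string stripWord(std::string word) {
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](char c) {
                                  return !(isWordChar(c) || c == '-' || c == '\'');
                              }),
               word.end());
    return word;
}
```

With this stand-in, stripWord("isn't!") keeps the apostrophe and drops only the exclamation mark.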

Your logic is incorrect. It should be: !isWordChar(wrd[i]) && wrd[i] != 39 && wrd[i] != 45. Read as: If the character isn't a word character, and it's not an apostrophe, and it's not a hyphen, do whatever is in the if-statement.

Related

Finding words in a (weird) string in C++

What is technically wrong in this program? The expected result is 6 since that is the total number of words present in the string.
#include <iostream>
using namespace std;
int main()
{
string str = " Let's count the number of words ";
int word = 0;
for (int i = 0; str[i] != '\0';)
{
if ((str[i] == 32 && str[i + 1] == 32) || (str[i] == 32 && str[i - 1] == 32))
{
++i;
}
else if ((str[i] == 32 && str[i - 1] != 32) || (str[i] == 32 && str[i + 1] != 32))
{
word++;
}
++i;
}
cout << "No. of words: " << word << endl;
return 0;
}
My incorrect result:
No. of words: 0
Also, if I try changing the spaces in the string or even the string itself to a totally new set of spaced out words, say:
string str = " Hello world ";
string str = "Hello world! How are you? ";
I still get incorrect results, but different from 0. I'm new to C++ programming and these kinds of strange behaviors are giving me nightmares. Is this common? What can I do to get this corrected?
If you could highlight or correct my program the way I've written it, that would be very helpful and would let me understand the mistake quickly, without having to learn new commands at this point. As I said, I'm a total beginner in C/C++.
Thanks for your time!
"I'm new to C++ programming and these kinds of strange behaviors are giving me nightmares. Is this common?"
Yes, it's very common. You've written a load of logic piled up in a heap and you don't have the tools to understand how it behaves.
"What can I do to get this corrected?"
You can work on this from both directions:
1. Debug this to improve your understanding of how it operates:
   - identify in advance what you expect it to do for some short input, at each line
   - single-step through it in the debugger to see what it actually does
   - think about why it doesn't do what you expected
   Sometimes the problem is that your code doesn't implement your algorithm correctly, and sometimes the algorithm itself is broken, and often it's a bit of both. Working through both will give you some insight.
2. Write code that is easier to understand in the first place (and equivalently, write algorithms that are easy to reason about).
   This depends on you having some intuition about whether something is easy to reason about, which you develop from iterating step 1.
"... instead of having to know some new commands at this point."
Well, you need to learn to use a debugger anyway, so now is as good a time to start as any.
We can certainly improve the existing code, although I'd prefer to fix the logic. In general I'd encourage you to abstract your existing if conditions out into little functions, but the problem is that they don't currently seem to make any sense.
So, how do we define a word?
Your code says it is at least one non-space character preceded or followed by a space. (Do definitely prefer ' ' to 32, by the way, and std::isspace is better than either.)
However, your code's implied definition is problematic, because:
- each word longer than one character has both a first and last character, and you'll count each of them
- you can't check whether the first character is preceded by anything, without going out of bounds
- the last character is followed by the null terminator, but you don't count that as whitespace
Let's just choose a different definition, that doesn't require reading str[i-1], and doesn't require the tricky traversal your current code gets wrong.
I claim that a word is a contiguous substring of non-whitespace characters, and words are separated by contiguous substrings of whitespace characters. So, instead of looking at each pair of consecutive characters, we can write pseudocode to work in those terms:
for (current = str.begin(); current != str.end(); ) {
// skip any leading whitespace
current = find_next_non_whitespace(str, current);
if (current != str.end()) {
// we found a word
++words;
current = find_next_whitespace(str, current);
}
}
NB. When I talked about abstracting your code out into little functions, I meant things like find_next_non_whitespace - they should be trivial to implement, easy to test, and have a name that tells you something.
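The pseudocode above can be turned into compilable code fairly directly. The following is one possible sketch, not the only way to write it: the helper names come from the pseudocode, while the Iter alias, the count_words wrapper and the use of std::find_if with std::isspace are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

using Iter = std::string::const_iterator;

// Returns the first position at or after `from` that is not whitespace.
Iter find_next_non_whitespace(const std::string& str, Iter from) {
    return std::find_if(from, str.end(),
                        [](unsigned char c) { return !std::isspace(c); });
}

// Returns the first position at or after `from` that is whitespace.
Iter find_next_whitespace(const std::string& str, Iter from) {
    return std::find_if(from, str.end(),
                        [](unsigned char c) { return std::isspace(c) != 0; });
}

// Counts maximal runs of non-whitespace characters.
int count_words(const std::string& str) {
    int words = 0;
    for (Iter current = str.begin(); current != str.end(); ) {
        // skip any leading whitespace
        current = find_next_non_whitespace(str, current);
        if (current != str.end()) {
            // we found a word
            ++words;
            current = find_next_whitespace(str, current);
        }
    }
    return words;
}
```

Each helper is a single standard-library call, so it can be tested on its own before the loop is written.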
When I said your existing conditions didn't seem to make sense, it's because replacing
if ((str[i] == 32 && str[i + 1] == 32) || (str[i] == 32 && str[i - 1] == 32))
with, say,
if (two_consecutive_spaces(str, i))
prompts more questions than it answers. Why have a special case for exactly two consecutive spaces? Is it different to just one space? What will actually happen if we have two words with a single space between them? Why do we advance by two characters in this case, but only one on the word branch?
The fact that the code can't easily be mapped back onto explicable logic is a bad sign - even if it worked (which we know it doesn't), we don't understand it well enough to ever change, extend or refactor it.
There are a few ways to do it. Take a look at this code, which is very similar to yours:
string s = " Let's count the number of words ";
int word = 0;
for (auto i = 0; s[i] != '\0'; i++) {
if (i == 0) {
if (s[i] != ' ') {
++word;
}
continue;
}
if (s[i - 1] == ' ' && s[i] != ' ') {
++word;
}
}
cout << "No of Words: " << word << endl;
The idea is to iterate over the string character by character, applying this logic:
If we are at the first character and it equals ' ', go to the next loop iteration.
If we are at the first character and it differs from ' ', we are starting a word, so count it and jump to the next loop iteration.
If we reach the second if, we are not at the first position, so accessing i - 1 is valid. Then we just check whether the previous character is a space and the current one is not. If so, we are starting a new word, so count it.
Another, simpler way to do it is to use stringstream (from <sstream>):
string s = " Let's count the number of words ";
stringstream ss(s);
string sub;
int word = 0;
while (ss >> sub) {
++word;
}
cout << "No of Words: " << word << endl;
This way you're basically extracting word by word from your string.

Print out each character randomly

I am creating a small game where the user gets hints (characters of a string) to guess the word. I have code that prints each individual character of the string, but is it possible to print those characters in random order?
string str("TEST");
for (int i = 0; i < str.size(); i++){
cout <<" "<< str[i];
}
output: T E S T
desired sample output: E T S T
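Use random_shuffle on the string:
random_shuffle(str.begin(), str.end());
Edit: random_shuffle was deprecated in C++14 and removed in C++17. From C++11 onwards, use shuffle with an explicit engine instead:
auto engine = std::default_random_engine{};
std::shuffle(begin(str), end(str), engine);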
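One caveat with the snippet above: a default-constructed engine starts from a fixed seed, so it will typically produce the same order on every run. A minimal sketch that seeds the engine from std::random_device is shown below; the function name shuffled and the choice of std::mt19937 are assumptions for illustration.

```cpp
#include <algorithm>
#include <random>
#include <string>

// Returns a copy of `str` with its characters in random order.
std::string shuffled(std::string str) {
    // Seed from std::random_device so each run can give a different order.
    std::mt19937 engine{std::random_device{}()};
    std::shuffle(str.begin(), str.end(), engine);
    return str;
}
```

Because the function takes its argument by value, the original string is left untouched, which suits a guessing game that still needs the answer.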
Use the following code to generate the letters randomly.
const int stl = str.size();
int stl2 = stl;
while (stl2 > 0) // stop once every character has been printed
{
int r = rand() % stl;
if (str[r] != '0')
{
cout<<" "<<str[r];
str[r] = '0';
stl2--;
}
}
This code generates a random index based on the size of the string and then prints the character at that position.
To avoid reprinting a character that has already been printed, each printed character is replaced with '0', so the next time the same index is generated, the code sees the '0' and skips it.
If you need to preserve the original string, copy it to another variable and use the copy in this code.
Note: This assumes that the string contains only alphabetic characters, which is why '0' works as a marker. If your string may contain digits, use a different marker character.

Have an if statement look ONLY at the first word in a string [duplicate]

I have a vector composed of strings, some strings have single words, some have multiple words, some have numbers, etc. I have a code that deletes elements of the vector IF the entire string is one specific word ("event") that works perfectly:
for (int j = 0; j< myvec.size()-1; j++) {
if(myvec[j] == "<event>") { //erase all instances of "<event>"
myvec.erase(myvec.begin()+j);
}
}
However, now I need to delete a few elements in the vector that only START with a word (these all have differing junk after that first key word "wgt").
I have no idea how to get this working. I'm assuming it will be something similar to the above for/if loop, I just don't know how to make the if statement only look at the first word in the string.
Any ideas?
Thanks in advance!
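The first statement moves the strings that do not begin with "abc" to the front of the vector and returns an iterator to the new logical end; the second erases the leftover elements from there to the end. Replace "abc" with your own key word, such as "wgt".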
auto end_it = std::remove_if(myvec.begin(), myvec.end(),
[](const string &str){
return str.find("abc") == 0 ;}) ;
myvec.erase(end_it, myvec.end()) ;
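Note that str.find("abc") == 0 searches the whole string before the result is compared with 0. A sketch that only inspects the start of each string is shown below, using std::string::compare; the helper names starts_with and erase_with_prefix are hypothetical and not part of the answer above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// True if `str` begins with `prefix`.
bool starts_with(const std::string& str, const std::string& prefix) {
    return str.compare(0, prefix.size(), prefix) == 0;
}

// Erases every element of `vec` that begins with `prefix`.
void erase_with_prefix(std::vector<std::string>& vec, const std::string& prefix) {
    vec.erase(std::remove_if(vec.begin(), vec.end(),
                             [&prefix](const std::string& s) {
                                 return starts_with(s, prefix);
                             }),
              vec.end());
}
```

For the vector in the question, the call would be erase_with_prefix(myvec, "wgt").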

remove non alphabet characters from string c++ [duplicate]

I'm trying to remove all non-alphabetic characters from an input string in C++ and don't know how to. I know it probably involves ASCII values because that's what we're learning about. We have only learned up to loops and haven't started arrays yet, so I'm not sure what to do.
If the string is Hello 1234 World&*
It would print HelloWorld
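If you use std::string and the STL, you can do this:
string s("Hello 1234 World&*");
// take the character as unsigned char: passing a negative char to isalpha is undefined behavior
s.erase(remove_if(s.begin(), s.end(), [](unsigned char c) { return !isalpha(c); } ), s.end());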
http://ideone.com/OIsJmb
Note: If you want to handle text in languages other than English, or your program uses a locale other than the default, you can use the locale-aware overload std::isalpha(c, loc) from <locale>.
PS: If you use a c-style string such as char *, you can convert it to std::string by its constructor, and convert back by its member function c_str().
If you're working with C-style strings (e.g. const char* str = "foobar"), then you can't "remove" characters from a string trivially: a string is just a sequence of characters stored sequentially in memory, so removing a character means copying the following bytes forward to fill the gap left by the deleted character. A string literal cannot be modified at all.
You'd have to allocate space for a new string and copy characters into it as-needed. The problem is, you have to allocate memory before you fill it, so you'd over-allocate memory unless you do an initial pass to get a count of the number of characters remaining in the string.
Like so:
bool IsAlphabet(char c) { // `IsAlphabet` rather than `IsNonAlphabet` to avoid negatives in function names/behaviors for simplicity
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
void BlatentlyObviousHomeworkExercise() {
    const char* str = "someString"; // string literals are const in C++
    size_t strLength = ... // how `strLength` is set depends on how `str` gets its value, if it's a literal then using the `sizeof` operator is fine, otherwise use `strlen` (assuming it's a null-terminated string).
    // First pass: count the characters that will be kept.
    size_t finalLength = 0;
    for(size_t i = 0; i < strLength; i++ ) {
        char c = str[i]; // get the ith element of the `str` array.
        if( IsAlphabet(c) ) finalLength++;
    }
    char* filteredString = new char[ finalLength + 1 ]; // note I use `new[]` instead of `malloc` as this is C++, not C. Use the right idioms :) The +1 is for the null-terminator.
    // Second pass: copy the kept characters into the new buffer.
    size_t filteredStringI = 0;
    for(size_t i = 0; i < strLength; i++ ) {
        char c = str[i];
        if( IsAlphabet(c) ) filteredString[ filteredStringI++ ] = c;
    }
    filteredString[ filteredStringI ] = '\0'; // set the null terminator
    delete[] filteredString; // release the buffer once you are done with it
}
I do not want to spoil the solution so I will not type out the code, only describe the solution. For your problem think of iterating through your string. Start with that. Then you need to decide if the currently selected character is part of the alphabet or not. You can do this numerous different ways. Checking ASCII values? Comparing against a string of the alphabet? Once you decide if it is a letter, then you need to rebuild the new string with that letter plus the valid letters before and after that you found or will find. Finally you need to display your new string.
If you look at an ASCII table, you can see that A-Z is between 65 and 90 and a-z is between 97 and 122.
So, assuming that you only need to keep those letters (not accented ones, and not characters from other languages that are not represented in ASCII), all you need to do is loop over the string, check whether each char falls within these ranges, and remove it if it does not.
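The loop described above can be sketched using only a loop and comparisons, with no algorithms or arrays beyond std::string itself. This version builds a new string from the characters to keep rather than deleting from the original; the function name letters_only is an assumption for illustration.

```cpp
#include <cstddef>
#include <string>

// Keeps only the ASCII letters A-Z and a-z from `input`.
std::string letters_only(const std::string& input) {
    std::string result;
    for (std::size_t i = 0; i < input.size(); ++i) {
        char c = input[i];
        // 'A'..'Z' is 65..90 and 'a'..'z' is 97..122 in ASCII.
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
            result += c; // append each letter to the new string
        }
    }
    return result;
}
```

Applied to the example from the question, letters_only("Hello 1234 World&*") yields "HelloWorld".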

c++ string member function substr usage

Please tell me whether I am understanding the substr member function correctly.
result = result.substr(0, pos) + result.substr(pos + 1);
It takes the string from position 0 up to (but not including) pos, where remove[i] was found,
and then + result.substr(pos + 1); concatenates the rest of the string, leaving out the character from remove?
string removeLetters2(string text, string remove)
{
int pos;
string result = text;
for (int i = 0; i < remove.length(); i++)
{
while (true)
{
pos = result.find(remove[i]);
if (pos == string::npos)
{
break;
}
else
{
result = result.substr(0, pos) +
result.substr(pos + 1);
}
}
}
return result;
}
In short, you are asking if
result = result.substr(0, pos) +
result.substr(pos + 1);
removes the character at position pos, right?
Short Answer:
Yes.
Longer Answer:
The two-argument call takes the start index and the length (the one argument call goes to the end of string).
It helps to imagine the string like this:
F o o / B a r
0 1 2 3 4 5 6 <- indices
Now remove /:
F o o / B a r
0 1 2 3 4 5 6   <- indices
1 2 3 |         <- 1st length
      | 1 2 3   <- 2nd length
result = result.substr(0, 3)   <- from index 0 with length 3
       + result.substr(4);     <- from index 4 to end
As a programmer, always be aware of the difference between distance/index and length.
Better: If index is known:
Your code creates two new, temporary strings, which are then concatenated into a third temporary string, which is then copied to result.
It would be better to ask string to erase (wink wink) in place:
result.erase(pos,1);
// or by iterator
string::iterator it = ....;
result.erase(it,it+1);
This leaves more optimization freedom to the string implementer, who may choose to just move all characters after pos by one to the left, for example with a single memmove-style block copy.
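The two approaches discussed so far can be compared side by side. This is only a sketch: the function names remove_at_substr and remove_at_erase are hypothetical, and both assume that pos is a valid index into the string.

```cpp
#include <string>

// Removes the character at index `pos` by joining the text before and after it.
std::string remove_at_substr(const std::string& s, std::string::size_type pos) {
    return s.substr(0, pos) + s.substr(pos + 1);
}

// Same result, but erasing one character in place on a copy.
std::string remove_at_erase(std::string s, std::string::size_type pos) {
    s.erase(pos, 1);
    return s;
}
```

For the string "Foo/Bar" and pos 3, both return "FooBar".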
Better: If characters to be deleted are known:
Alternatively, use the remove_if algorithm combined with erase. I am not sure whether this performs better, but it may give clearer code:
#include <algorithm>
// this would remove all slashes, question marks and dots
....
std::string foobar = "ab/d?...";
// remove_if only shifts the kept characters forward; erase then trims the leftover tail
foobar.erase(std::remove_if(foobar.begin(), foobar.end(), [](char c) {
    return c=='/' || c=='?' || c=='.';
}), foobar.end());
remove_if accepts any function object.
If there is just one character, it gets easier:
// this would remove all slashes
foobar.erase(std::remove(foobar.begin(), foobar.end(), '/'), foobar.end());
Although the answer to your question is "yes", there is a better way to go about what you are trying to do. Use string::erase, like this:
result.erase(pos, 1);
This API is designed for removal of characters from the string; it achieves the same result much more efficiently.
Yes, this function removes all letters in remove from text.
Since you seem to be deleting more than one kind of character, also have a look at remove_if from <algorithm> with a suitable predicate, although the response of dasblinkenlignt is the good one.
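As a rough illustration of that suggestion, the function from the question could be rewritten with remove_if and a predicate that checks whether each character appears in remove. This is a sketch under that reading of the suggestion, and the name removeLetters is chosen here for illustration.

```cpp
#include <algorithm>
#include <string>

// Returns `text` without any character that appears in `remove`.
std::string removeLetters(std::string text, const std::string& remove) {
    text.erase(std::remove_if(text.begin(), text.end(),
                              [&remove](char c) {
                                  return remove.find(c) != std::string::npos;
                              }),
               text.end());
    return text;
}
```

This makes a single pass over text instead of restarting find from the beginning after every removal.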