Remove excess spaces from a string in C++ - c++

I have written a program for removing excess spaces from a string.
#include <iostream>
#include <string>
void RemoveExcessSpaces(std::string &s) {
for (int i = 0; i < s.length(); i++) {
while (s[i] == ' ')s.erase(s.begin() + i);
while (s[i] != ' ' && i < s.length())i++;
}
if (s[s.length() - 1] == ' ')s.pop_back();
}
int main() {
std::string s(" this is string ");
RemoveExcessSpaces(s);
std::cout << "\"" << s << "\"";
return 0;
}
One thing is not clear to me. The line while (s[i] == ' ')s.erase(s.begin() + i); should remove every space in the string, so the output would be thisisstring, but I got the correct output, which is this is string.
Could you explain why the program didn't remove the single space between this and is, and why I got the correct output?
Note: I cannot use auxiliary strings.

That is because when your second while loop stops at the space after a word (for example, the space after this), control passes to the increment part of your for loop, which increases i. Now i points to the next character of the string, the i of is, so the first while loop never sees that space. That is why one space remains between this and is.

Your second while loop will break when s[i]==' '. But then your for loop will increment i and s[i] for that i will be skipped. This will happen for every first space character after each word.
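The behaviour described above depends on the for loop's increment silently stepping over one space. A version that makes the "keep exactly one space between words" rule explicit might look like the sketch below. It is not the asker's code: the name remove_excess_spaces and the in_word flag are made up for illustration, and it assumes that only ' ' counts as a space. It compacts the string in place, so no auxiliary string is needed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: trims leading/trailing spaces and collapses
// runs of spaces between words to a single space, in place.
void remove_excess_spaces(std::string& s) {
    std::size_t write = 0;   // next free slot in the compacted prefix
    bool in_word = false;    // true while the previous character was not a space
    for (std::size_t read = 0; read < s.size(); ++read) {
        if (s[read] != ' ') {
            // A new word that is not the first one gets one separating space.
            if (!in_word && write > 0)
                s[write++] = ' ';
            s[write++] = s[read];
            in_word = true;
        } else {
            in_word = false;
        }
    }
    s.resize(write);         // drop the leftover tail
}
```

Called on " this is string ", this leaves "this is string" in s. Unlike the original, it also handles an empty string, where s[s.length() - 1] would read out of bounds.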

Related

How to get the number of words from a line from a text file

I am trying to get the number of words from a line in a text file. I used .getline() in order to extract a line from the entire text file. The code is:
#include <iostream>
#include <iomanip>
#include <fstream>
int main()
{
char const* filename = "duck.txt";
std::ifstream ifs{ filename };
constexpr size_t MAX_LINE_LEN{ 2048 };
char line[MAX_LINE_LEN];
int lineCount = 0;
int totalWordCount = 0;
int totalByteCount = 0;
while (ifs.getline(line, MAX_LINE_LEN-1))
{
int lineWord = 0;
char* q = &line[0];
if (ifs.eof())
{
lineCount--;
totalByteCount--;
}
while (*q != '\0')
{
q++;
totalByteCount++;
}
totalByteCount++;
if (*q == '\0')
{
lineCount++;
}
int i = 0;
int j = 0;
while (line[i] != '\0' && line[j] != '\0')
{
while (line[i] == ' ')
{
i++;
}
j = i;
while (line[j] != ' ')
{
j++;
}
lineWord++;
j = i;
}
totalWordCount += lineWord;
}
std::cout << "Total Lines: " << lineCount << '\n' << "Total Words: " << totalWordCount << '\n' << "Total Bytes: " << totalByteCount;
}
But the only important part is:
int i = 0;
int j = 0;
while (line[i] != '\0' && line[j] != '\0')
{
while (line[i] == ' ')
{
i++;
}
j = i;
while (line[j] != ' ')
{
j++;
}
lineWord++;
j = i;
}
totalWordCount += lineWord;
I'm trying to read the line character by character until I reach a non-whitespace character, after which I'll assign that subscript to i. Then I'll set j to the subscript of the first whitespace encountered after the character at line[i]. If j finds a whitespace, then there is a word. If j reaches '\0', then the line has ended and I end the while loop. When I try to compile and run this, the program just displays nothing. What am I doing wrong? Also, I can't add any more header files.
In your original code:
int i = 0;
int j = 0;
while (line[i] != '\0' && line[j] != '\0')
{
while (line[i] == ' ') // you do not check for end of string character?
{
i++;
}
j = i; // here i = j = beginning of word.
// since you rewind to beginning of the word below,
// your program keeps repeating this loop endlessly
while (line[j] != ' ') // this loop could easily run for quite a while.
// until it seg-faults
{
j++;
}
lineWord++;
j = i; // BUG here! You're 'rewinding' j to the beginning of the word.
// You loop back and keep counting the same word over and over.
}
totalWordCount += lineWord;
What is the purpose of i and j? Wouldn't the code be simpler and easier to read and maintain using a single pointer? Or a single index? This kind of algorithm is where a pointer would excel, though, as the only pointer arithmetic operation needed is increment.
As in:
const char* p = &line[0];
int word_count = 0;
line[MAX_LINE_LEN - 1] = 0; // making sure the code below stays within boundaries.
for(;;)
{
// skip to next word
while (*p && *p == ' ') ++p; // stay within the string by testing for zero.
if (!*p)
break; // done!
// since p now points to the beginning of a word, we've got one
++word_count;
// skip to end of word
while (*p && *p != ' ') ++p;
}
This is all fine for most cases, but there could be some exceptional typos in the text, like "hello, world !", where the last punctuation mark would be counted as a word. There is also the problem of horizontal tabs, which could also be counted erroneously as words.
To cover these cases, you should test for valid word characters instead of for spaces, which is a rather vague concept.
Without using library calls, you'd need to define what constitutes spaces and punctuation, either with a constant or with a function.
Substituting the test for space with a more targeted test, using isalnum() (declared in <cctype>) to check for alphabetic or numeric characters:
for(;;)
{
// skip to next word
while (*p && !std::isalnum(*p & 0xFF)) ++p;
if (!*p)
break; // done!
++word_count;
// skip to end of word
while (*p && std::isalnum(*p & 0xFF)) ++p;
}
Beware of the functions in the isalnum(), isalpha()... family: they take their input as an int, and the mask ensures that characters in the 128-255 range are not sign-extended and are passed correctly as positive values.
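Put together as a function, the loop above could look roughly like the following sketch. The name count_words is invented here, and the cast to unsigned char is used as an alternative to the & 0xFF mask, with the same purpose of keeping the argument non-negative. Note that it needs <cctype>, which the asker may not be allowed to include.

```cpp
#include <cctype>

// Hypothetical helper: counts runs of alphanumeric characters
// in a NUL-terminated string.
int count_words(const char* p) {
    int word_count = 0;
    for (;;) {
        // skip everything that cannot be part of a word
        while (*p && !std::isalnum(static_cast<unsigned char>(*p))) ++p;
        if (!*p)
            break;  // end of string
        ++word_count;
        // skip to the end of the current word
        while (*p && std::isalnum(static_cast<unsigned char>(*p))) ++p;
    }
    return word_count;
}
```

With this definition, count_words("hello, world !\tfoo") returns 3: the stray "!" and the tab are no longer counted as words.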

Fastest way to count words of string

How could I make this algorithm faster and shorten this code, which counts the words of a given string?
int number_of_words(std::string &s) {
int count = 0;
for (int i = 0; i < s.length(); i++) {
// skip spaces
while (s[i] == ' ' && i < s.length())
i++;
if (i == s.length())
break;
// word found
count++;
// inside word
while (s[i] != ' ' && i < s.length())
i++;
}
return count;
}
Your code is quite alright, speed-wise. But if you want to make your code shorter, you may use the standard find_first_not_of() and find_first_of() functions, as I did in the following code, which solves your task.
I made the assumption that all your words are separated only by spaces. If other separators are needed, you may pass something like " \r\n\t" instead of ' ' in both lines of my code.
One small optimization that can be made in your code: after the first while loop we are located on a non-space character, so we can add a ++i; line for free before the second loop. Similarly, after the second while loop we are located on a space character (or at the end of the string), so we may add one more ++i; line after it. This gives a tiny speed gain by avoiding two extra checks inside the while loops.
#include <iostream>
#include <string>
int number_of_words(std::string const & s) {
    int cnt = 0;
    std::string::size_type pos = 0;
    // find the start of the next word; npos means no word is left
    while ((pos = s.find_first_not_of(' ', pos)) != s.npos) {
        ++cnt;
        // jump to the space after this word (npos if the word ends the string)
        pos = s.find_first_of(' ', pos);
    }
    return cnt;
}
int main() {
    std::cout << number_of_words(" abc def ghi ") << std::endl;
}
Output:
3
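The small ++i optimization mentioned in this answer can be sketched as follows. This is only an illustration of the idea, not measured code: the name number_of_words_skip is hypothetical, and whether the saved comparisons are noticeable at all depends on the compiler and the input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical variant of the asker's loop with the two "free" increments.
int number_of_words_skip(const std::string& s) {
    int count = 0;
    const std::size_t n = s.length();
    std::size_t i = 0;
    while (i < n) {
        // skip spaces
        while (i < n && s[i] == ' ') ++i;
        if (i == n)
            break;
        ++count;   // s[i] is the first character of a word
        ++i;       // it is known to be a non-space, so skip it without testing
        // rest of the word
        while (i < n && s[i] != ' ') ++i;
        ++i;       // s[i] is a space or i == n; either way nothing to test here
    }
    return count;
}
```

For " abc def ghi " this returns 3, the same result as the versions above.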

Replacing a substring with a space character

I am given a string and I have to remove a substring from it, namely WUB, and replace it with a space character.
There are two WUBs between 'ARE' and 'THE', so the first condition in the if statement is meant to avoid printing two blank spaces, but on executing the code two blank spaces are being printed.
Input: WUBWEWUBAREWUBWUBTHEWUBCHAMPIONSWUBMYWUBFRIENDWUB
Output: WE ARE THE CHAMPIONS MY FRIEND
Here is my code so far:
#include <iostream>
using namespace std;
int main()
{
const string check = "WUB";
string s, p;
int ct = 0;
cin >> s;
for (int i = 0; i < s.size(); i++)
{
if (s[i] == 'W' && s[i+1] == 'U' && s[i+2] == 'B')
{
i += 2;
if (p[ct] == '32' || p.empty())
{
continue;
}
else
{
p += ' ';
ct++;
}
}
else
{
p += s[i];
ct++;
}
}
cout << p;
return 0;
}
Why is the first if statement never executed?
Two things are going to break your code:
you are looping with int i=0;i<s.size() but reading (s[i]=='W' && s[i+1]=='U' && s[i+2]=='B'), which looks past the end of the string
and here: if(p[ct]=='32') you surely mean if(p[ct]==32) or if(p[ct]==' ')
This condition
if(p[ct]=='32')
should read either
if(p[ct]==32)
or
if(p[ct]==' ')
that is, compare to the numeric value of the space character or to the space character itself.
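Additionally, when your i grows close to the string's length, the subexpressions s[i+1] and s[i+2] may reach non-existing characters of the string. You should guard those reads, for example by testing i+2<s.length() before them. Merely shortening the loop to i<s.length()-2 would also stop the last two characters from being copied.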
EDIT
For a full solution you need to fully understand the problem you want to solve. The problem statement is a bit vague:
remove a substring ("WUB") from (a given string). And put a space in place of it if required.
You considered the last condition, but not deeply enough. What does it mean, 'if required'? Replacement is not required if the resulting string is empty or you appended a space to it already (when you encounter a second or further consecutive WUB). It is also not necessary if you are at a WUB but there is nothing more following it, except possibly more WUBs...
So, when you find a "WUB" substring it is too early to decide if a space is needed. You know you need a space when you find a non-WUB text following some WUB (or WUBs) and there was some text before those WUB(s).
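The rule in the previous paragraph (emit the space only once real text follows one or more WUBs that themselves follow real text) could be sketched like this. The function name decode_wub and the pending_space flag are invented for the example; std::string::compare is used for the three-character test because it stops at the end of the string instead of reading past it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes every "WUB" and joins the remaining
// pieces with single spaces, with no leading or trailing space.
std::string decode_wub(const std::string& s) {
    std::string out;
    bool pending_space = false;  // a WUB was seen after some real text
    std::size_t i = 0;
    while (i < s.size()) {
        if (s.compare(i, 3, "WUB") == 0) {
            // Too early to emit a space; just remember that one may be needed.
            pending_space = !out.empty();
            i += 3;
        } else {
            if (pending_space) {
                out += ' ';
                pending_space = false;
            }
            out += s[i];
            ++i;
        }
    }
    return out;
}
```

For the input WUBWEWUBAREWUBWUBTHEWUBCHAMPIONSWUBMYWUBFRIENDWUB this returns WE ARE THE CHAMPIONS MY FRIEND.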
There are actually three bugs here, so it's probably worth summarizing them in one answer:
The first condition:
if (s[i] == 'W' && s[i+1] == 'U' && s[i+2] == 'B')
is out of bounds for the last two characters. One fix would be to check the length first:
if(i + 2 < s.length() && s[i] == 'W' && s[i+1] == 'U' && s[i+2] == 'B')
There's a multicharacter-literal in
if (p[ct] == '32' || p.empty())
Use ' ' or 32 or std::isspace instead. IMO the last one is the best.
In the same condition
p[ct] == '32'
is always out of bounds: ct is equal to p.length(). (Credits to Some programmer dude, who mentioned this in the comments!) The variable ct is also redundant, since std::string knows its length. I suggest using std::string::back() to access the last character and reordering the condition like so:
if (p.empty() || std::isspace(p.back()))
The algorithm of this program is on the right track.
However, there are a few issues.
The condition inside the for loop reads past the end of the string: near the end, s[i+1] and s[i+2] refer to characters that do not exist. Shrinking the loop bound alone is not enough, because the last characters would then never be copied. It is simpler to advance i inside the loop body and let substr() do the comparison, since it stops at the end of the string. Something like this.
for (std::size_t i = 0; i < s.size(); ) {
}
I do not suggest using extra variables as counters like ct. In this case p[ct] always indexes one past the last character of p.
Building the string with the append() function is a better solution. We iterate through the string, and if we find "WUB" we append a " " (unless p is empty or already ends with a space). Otherwise, we append the character.
I highly recommend writing the first if() statement using substr() from C++.
This makes the code easier to read.
substr() creates and returns a new string that starts at a specific position and has at most a given length. Here is the syntax
syntax: substr(startingIndex, length);
if fewer than length characters remain, the result is simply shorter
#include <string>
#include <iostream>
int main() {
    std::string s, p;
    std::cin >> s;
    for (std::size_t i = 0; i < s.size(); ) {
        if (s.substr(i, 3) == "WUB") {
            // one space per run of WUBs, and none at the very start
            if (!p.empty() && p.back() != ' ')
                p.append(" ");
            i += 3;
        } else {
            p.append(s.substr(i, 1));
            ++i;
        }
    }
    // a trailing WUB leaves one space at the end; drop it
    if (!p.empty() && p.back() == ' ')
        p.pop_back();
    std::cout << p;
}

Replacing all spaces in a string with '%20' (C++)

Having some trouble understanding parts of the code; the output I am getting is also wrong. The problem is to replace all spaces in a string with '%20'. The full code is shown below; it compiles but doesn't run exactly as it should.
#include <iostream>
#include <string>
using namespace std;
void replaceSpaces(string str){
//Getting the length of the string, counting the number of spaces
int strLen = str.length();
int i, count = 0;
for (i = 0; i <= strLen; i++) {
if(str[i]==' ')
count++;
}
//Determining the new length needed to allocate for replacement characters '%20'
int newLength = strLen + count * 2;
str[newLength] = '\0';
for (i = strLen - 1; i >= 0; i--) {
if (str[i] == ' ') {
str[newLength - 1] = '0';
str[newLength - 2] = '2';
str[newLength - 3] = '%';
newLength = newLength - 3;
}
else {
str[newLength - 1] = str[i];
newLength = newLength -1;
}
}
cout << str <<endl;
}
int main() {
string str = "hello jellybean hello";
replaceSpaces(str);
return 0;
}
I am probably missing something obvious, but when allocating for the new string length in this line:
int newLength = strLen + count * 2;
Here we are multiplying the number of spaces by 2, but if we are trying to replace all spaces with '%20', why not multiply it by 3?
str[newLength] = '\0';
Does this line indicate that the position past the last character in the string is assigned a null space?
Am also confused about the else statement.
else {
str[newLength - 1] = str[i];
newLength = newLength -1;
}
Not sure if I completely understand the circumstance when this would be executed.
When the functions are compiled and run, if
string str = "hello jellybean hello";
the expected output would be hello%20jellybean%20hello, except the output I am getting is hello%20jellybean%20h.
In terms of time complexity, since there are two independent for loops, would the time complexity be O(n)?
I know I'm asking a lot of different questions, many thanks in advance for any answers!
This is wrong:
str[newLength] = '\0';
std::string objects maintain their NUL terminator internally based on their size. You want
str.resize(newLength);
instead.
int newLength = strLen + count * 2;
computes the space to allocate (later): the length of the string plus twice the number of spaces found, which makes sense.
For example, in so glad to help, each space already occupies one slot, which the % can reuse, so each space needs only two more slots, for the 20 part of the replacement.
This is WRONG:
str[newLength] = '\0';
You access memory outside the bounds of your string. You act as if you had actually allocated space equal to newLength, but you haven't done that anywhere in the code yet.
Out-of-bounds access results in Undefined Behavior, and that's bad.
The else statement just copies the non-space characters, but you should give up on that code (if it's not yours) and start from scratch and/or take a sneak peek at: Encode/Decode URLs in C++.
As for the wrong result, having read this far, you should know that it is to be expected.
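If the in-place, back-to-front approach from the question is to be kept, one way to repair it is to grow the string with resize() before writing, as suggested above. The following is a sketch under that assumption; the name replace_spaces_in_place is hypothetical, and the string is taken by reference so that the caller sees the result.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every ' ' with "%20" inside the same string.
void replace_spaces_in_place(std::string& str) {
    const std::size_t old_len = str.length();
    const std::size_t spaces =
        static_cast<std::size_t>(std::count(str.begin(), str.end(), ' '));
    // Each space grows from one character to three, i.e. by two.
    str.resize(old_len + 2 * spaces);
    std::size_t write = str.length();  // one past the next slot to fill
    // Walk backwards so that unread characters are never overwritten.
    for (std::size_t read = old_len; read-- > 0; ) {
        if (str[read] == ' ') {
            str[--write] = '0';
            str[--write] = '2';
            str[--write] = '%';
        } else {
            str[--write] = str[read];
        }
    }
}
```

Applied to "hello jellybean hello", this produces hello%20jellybean%20hello. It still makes two linear passes, so the running time stays O(n).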
Trying to do the modification in place is tricky. It's much easier to create a new string:
std::string new_string;
for (int i = 0; i < str.length(); ++i) {
if (str[i] == ' ')
new_string += "%20";
else
new_string += str[i];
}
return new_string;
or, if you like range-for:
std::string new_string;
for (char ch : str) {
if (ch == ' ')
new_string += "%20";
else
new_string += ch;
}
return new_string;
You can change the string argument of the function to a reference; then there won't be any need for a new string. In the rest of the code, you can use the insert function to add '2' and '0', so you only need to convert the space to '%'.
void replaceSpaces(string &str) {
size_t strLen = str.length();
for (int i = 0; i < strLen; i++) {
if (str[i] == ' ') {
str[i] = '%';
str.insert(str.begin() + i + 1, '2');
str.insert(str.begin() + i + 2, '0');
strLen += 2;
}
}
}
This is easy; replace examplestring with your string in the code, and use as you would:
#include <cstdlib> //system()
#include <iostream> //debug output
#include <string>
using std::string;
using std::cout;
using std::endl;
//the string to convert
string examplestring = "this is the example string for spaces into %20";
int main()
{
    int countspaces = 0; //it's faster to fill a known size
    for (char x : examplestring) if (x == ' ') countspaces++; //counts spaces
    string newstring; //declare new string
    //each space grows by two characters (' ' becomes "%20"), so pre-set the final size
    newstring.resize(examplestring.size() + (countspaces*2));
    int newstringiterator = 0; //keep track of new string location
    //if ' '(space), overwrite three slots with %20 and add 3 to iteration
    //else just place the letter and iterate
    for (int i=0;i<examplestring.size();i++)
    {
        if (examplestring[i] == ' ')
        {
            //replace() overwrites in place; insert() would grow the string again
            newstring.replace(newstringiterator, 3, "%20");
            newstringiterator += 3;
        }
        else newstring[newstringiterator++] = examplestring[i];
    }
    //final newstring is the original with %20 instead of spaces.
    cout << newstring << endl;
    system("PAUSE"); //to read console output (Windows only)
    return 0; //report success
}
This will output newstring, which is the old string with '%20' instead of spaces.

Remove additional spaces C++

I have a homework assignment where I should remove all leading spaces in a char array. If there is a space at the beginning of the array, it should be deleted too. Also, I have to handle an empty array. I can't use <cstring> or <string.h>. Also, I can't create any additional arrays. I googled a lot, but I have no idea how to solve this problem. Here is what I have so far.
void clean(char* tab)
{
//char *p = tab;
for (int i = 0; i <= sizeof(tab); i++)
{
if ((tab[i] = ' ') && (tab[i + 1] = ' '))
{
tab[i] = tab[i + 1];
}
}
}
int main()
{
char tab1[] = " h qa w e ";
cout << tab1 << endl;
clean(tab1);
cout << tab1 << endl;
}
I would be very grateful if someone could give me a hint on how to figure it out.
Regard each space as a "free space" in the array, one which you should fill.
You can maintain two pointers to different places in your char array: one to the first available space, another to the character you are currently examining.
You only need to figure out these questions:
How would you fill up the spaces?
How would you know when you're done?
You have a few separate problems here.
One is that you've mis-typed your comparisons: if ((tab[i] = ' ') && (tab[i+1] = ' ')) should undoubtedly be: if ((tab[i] == ' ') && (tab[i+1] == ' ')).
Second, you're using sizeof(a pointer), where you really wanted strlen(a pointer) (or at least something that works similarly, scanning for a NUL character that signals the end of the string).
Third, the algorithm your loop implements doesn't seem to reflect the intent of removing leading spaces very well (if at all). I'd scan for the first thing that wasn't a space, then copy the remainder of the string so it starts at the beginning of the string.
[Sorry, but I'm not going to post actual code or much more detail than that for a homework assignment.]
The code below has no dependency on <cstring> or <string.h>, and it doesn't allocate any temporary buffers. Note that a home-made copy function must be defined to copy the string content when a space is being removed.
Note that this code is easy to understand but may/should be optimized to minimize copies if performance is a requirement.
As it's unclear what you exactly want...
This one removes all spaces in the string.
#include <iostream>
void copy( char* to, char* from )
{
int i = 0;
while ( true ) // risky in case char* has no EOS character!
{
to[i] = from[i];
if ( from[i] != '\0' )
++i;
else
break;
}
}
void clean( char* tab )
{
int i = 0;
while ( tab[i] != '\0' )
{
if ( tab[i] == ' ')
{
copy( tab + i, tab + i + 1 );
// do not increment i, to test tab[i] that was newly copied from tab[i+1]
}
else
{
++i;
}
}
}
int main() {
char tab1[] = " h qa w e ";
std::cout << tab1 << std::endl;
clean(tab1);
std::cout << tab1 << std::endl;
}
https://ideone.com/Yv4aqL
If you only want to remove leading spaces, it's even easier, just change the clean function:
void clean( char* tab )
{
int i = 0;
while ( tab[i] == ' ' )
{
copy( tab + i, tab + i + 1 );
}
}
https://ideone.com/RIAsGt