c++ splitting a string and getting the part coming after whitespace - c++

I need to split a string. Given a string like the one below:
string foo="thisIsThe Test Input";
I need to get the part that comes after the single or multiple whitespace. In this case I need to get "Test Input". I know that I can get the first part with:
int index=foo.find(' ');
string subString=foo.substr(0,index);
But I don't know how to get the part I want. Can anyone help me?

std::string::find_first_not_of accepts a position argument that indicates where to start searching. Use it to find the first non-space character, starting at the first space.
int index=foo.find(' ');
index=foo.find_first_not_of(' ', index);
string subString=foo.substr(index);
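The three lines above assume the string contains a space with something after it. If it does not, find or find_first_not_of returns std::string::npos and substr throws std::out_of_range. A minimal sketch that checks both cases might look like this (the helper name afterFirstGap is an assumption, not something from the posts above):

```cpp
#include <string>

// Returns the part of `s` that follows the first run of spaces,
// or an empty string if there is no space or nothing after it.
// (afterFirstGap is a hypothetical helper name.)
std::string afterFirstGap(const std::string& s) {
    const std::string::size_type space = s.find(' ');
    if (space == std::string::npos) return "";   // no space at all
    const std::string::size_type start = s.find_first_not_of(' ', space);
    if (start == std::string::npos) return "";   // only spaces follow
    return s.substr(start);
}
```

Called with "thisIsThe Test Input" this returns "Test Input". Using std::string::size_type instead of int avoids the narrowing conversion of npos.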

You can also copy the string to a new one character by character, dropping or collapsing the spaces as you go.
That makes a later foo.find(' ') easier to work with.
Eliminate all whitespace:
string foo = "thisIsThe Test Input";
string bar; // result; a single string rather than an array of 100 strings
for (size_t i = 0; i < foo.length(); i++)
{
if (foo[i] != ' ')
bar += foo[i]; // append, so skipped spaces leave no gaps
}
cout << bar;
Keep one space between each term:
string foo = "thisIsThe Test Input";
string bar; // result; a single string rather than an array of 100 strings
for (size_t i = 0; i < foo.length(); i++)
{
if (foo[i] != ' ')
bar += foo[i];
// keep only the last space of each run of spaces
else if (i + 1 == foo.length() || foo[i + 1] != ' ')
bar += ' ';
}
cout << bar;
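The same two operations can also be written with standard algorithms. This is only a sketch of an alternative, not the answerer's method; the function names removeSpaces and collapseSpaces are made up for illustration:

```cpp
#include <algorithm>
#include <string>

// Removes every space from a copy of `s` (erase-remove idiom).
std::string removeSpaces(std::string s) {
    s.erase(std::remove(s.begin(), s.end(), ' '), s.end());
    return s;
}

// Collapses each run of consecutive spaces into a single space.
std::string collapseSpaces(std::string s) {
    auto bothSpaces = [](char a, char b) { return a == ' ' && b == ' '; };
    s.erase(std::unique(s.begin(), s.end(), bothSpaces), s.end());
    return s;
}
```

Both functions take the string by value, so the caller's string is left unchanged.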

Related

How to get the number of words from a line from a text file

I am trying to get the number of words from a line in a text file. I used .getline() in order to extract a line from the entire text file. The code is:
#include <iostream>
#include <iomanip>
#include <fstream>
int main()
{
char const* filename = "duck.txt";
std::ifstream ifs{ filename };
constexpr size_t MAX_LINE_LEN{ 2048 };
char line[MAX_LINE_LEN];
int lineCount = 0;
int totalWordCount = 0;
int totalByteCount = 0;
while (ifs.getline(line, MAX_LINE_LEN-1))
{
int lineWord = 0;
char* q = &line[0];
if (ifs.eof())
{
lineCount--;
totalByteCount--;
}
while (*q != '\0')
{
q++;
totalByteCount++;
}
totalByteCount++;
if (*q == '\0')
{
lineCount++;
}
int i = 0;
int j = 0;
while (line[i] != '\0' && line[j] != '\0')
{
while (line[i] == ' ')
{
i++;
}
j = i;
while (line[j] != ' ')
{
j++;
}
lineWord++;
j = i;
}
totalWordCount += lineWord;
}
std::cout << "Total Lines: " << lineCount << '\n' << "Total Words: " << totalWordCount << '\n' << "Total Bytes: " << totalByteCount;
}
But the only important part is:
int i = 0;
int j = 0;
while (line[i] != '\0' && line[j] != '\0')
{
while (line[i] == ' ')
{
i++;
}
j = i;
while (line[j] != ' ')
{
j++;
}
lineWord++;
j = i;
}
totalWordCount += lineWord;
I'm trying to read the line character by character until I reach a non-whitespace character, and I assign that subscript to i. Then I set j to the subscript of the first whitespace encountered after line[i]. If j finds a whitespace, there is a word. If j reaches '\0', the line has ended and I end the while loop. When I compile and run this, the program displays nothing. What am I doing wrong? Also, I can't add any more header files.
In your original code:
int i = 0;
int j = 0;
while (line[i] != '\0' && line[j] != '\0')
{
while (line[i] == ' ') // you do not check for end of string character?
{
i++;
}
j = i; // here i = j = beginning of word.
// since you rewind to beginning of the word below,
// your program keeps repeating this loop endlessly
while (line[j] != ' ') // this loop could easily run for quite a while.
// until it seg-faults
{
j++;
}
lineWord++;
j = i; // BUG here! You're 'rewinding' j to the beginning of the word.
// you loop back and keep counting the same word over and over.
}
totalWordCount += lineWord;
What is the purpose of i and j ? Wouldn't the code be simpler and easier to read and maintain using a single pointer? Or a single index? This kind of algorithm is where a pointer would excel, though, as the only arithmetic pointer operation needed is increment.
As in:
const char* p = &line[0];
int word_count = 0;
line[MAX_LINE_LEN - 1] = 0; // making sure the code below stays within boundaries.
for(;;)
{
// skip to next word
while (*p && *p == ' ') ++p; // stay within the string by testing for zero.
if (!*p)
break; // done!
// since p now points to the beginning of a word, we've got one
++word_count;
// skip to end of word
while (*p && *p != ' ') ++p;
}
This works for most input, but text with stray punctuation such as "hello, world !" would have the final "!" counted as a word. Horizontal tabs are another problem: a tab standing alone between spaces would also be counted as a word.
To cover these cases, test for characters that are valid in a word instead of testing for a space, which is a rather vague concept.
Without library calls, you'd need to define what constitutes spaces and punctuation yourself, either with a constant or with a function.
Here the test for a space is replaced with a more targeted one, using isalnum() (declared in <cctype>) to check for alphabetic or numeric characters:
for(;;)
{
// skip to next word
while (*p && !std::isalnum(*p & 0xFF)) ++p;
if (!*p)
break; // done!
++word_count;
// skip to end of word
while (*p && std::isalnum(*p & 0xFF)) ++p;
}
Beware of the isalnum(), isalpha(), ... family of functions: they take their input as an int whose value must be representable as unsigned char (or be EOF). The mask ensures that characters in the 128-255 range are not sign-extended and are passed as positive values.
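Put together as a function, the loop above could be sketched as follows. The name countWords and the use of static_cast<unsigned char> in place of the & 0xFF mask are choices made for this example. It needs <cctype>, which the asker may not be allowed to add:

```cpp
#include <cctype>
#include <cstddef>

// Counts runs of alphanumeric characters in a NUL-terminated string.
std::size_t countWords(const char* p) {
    std::size_t words = 0;
    while (*p) {
        // skip everything that cannot be part of a word
        while (*p && !std::isalnum(static_cast<unsigned char>(*p))) ++p;
        if (!*p) break;
        ++words;  // p now points at the first character of a word
        // skip to the end of that word
        while (*p && std::isalnum(static_cast<unsigned char>(*p))) ++p;
    }
    return words;
}
```

With this definition, "hello, world !" counts as two words, and tabs act as separators like any other non-alphanumeric character.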

What is wrong with my program to find the longest word in a sentence?

#include <iostream>
using namespace std;
int main() {
char a[101]{0};
cin>>a;
cin.getline(a,101);
cin.ignore();
int currLen{0};
int maxLen{0};
int startInd{-1};
int endInd{-1};
for(int i=0; i<101; i++) {
if(a[i]!=' ' ) {
++currLen;
} else if(a[i]==' '||a[i]=='\0') {
if(currLen>maxLen) {
maxLen=currLen;
startInd=i-currLen;
endInd=i-1;
}
if(a[i]=='\0')
break;
currLen=0;
}
}
cout<<maxLen<<endl;
if(startInd==-1)
cout<<-1;
else
for(int i=startInd; i<=endInd; i++)
cout<<a[i];
return 0;
}
If I enter, for example, "My name is Manav Kampani",
it outputs 5
Manav instead of 7
Kampani
But if I write "My name is Manav Kampani ", with a space after the last word,
then it does consider Kampani and prints Kampani.
Also, when I input "Kampani Manav is my name" the output is wrong too. That means it is not considering the first word of the sentence.
if(a[i]!=' ' )
{
++currLen;
}
else if(a[i]==' '||a[i]=='\0')
{
....
}
Consider the case of a[i] == 0. Which of these if-statements will apply?
Answer: the first one, which means you never process the final word in the string. You also don't exit at the end of the string, but keep looping over whatever is in the array all the way to the end of its 101 characters.
As a general structure, be very, very careful with this:
if (condition)
else if (condition)
// without a final else section
If you do that, you need to think about what you're doing. In this particular case, you can have:
if (a[i] != 0 && a[i] != ' ')
else
It may not solve all your issues, but it should solve some.
A nice sliding window pattern implementation.
You have 3 problems in your code:
You must not write cin >> a;
You must not write cin.ignore();
You need to modify your if statement like so: if (a[i] != ' ' && a[i] != '\0'). Otherwise you will not detect the last word.
Your complete working code with those minor fixes will look like this:
#include <iostream>
using namespace std;
int main()
{
char a[101]{ 0 };
//cin >> a;
cin.getline(a, 101);
//cin.ignore();
int currLen{ 0 };
int maxLen{ 0 };
int startInd{ -1 };
int endInd{ -1 };
for (int i = 0; i < 101; i++)
{
if (a[i] != ' ' && a[i] != '\0')// Add comparison
{
++currLen;
}
else if (a[i] == ' ' || a[i] == '\0')
{
if (currLen > maxLen)
{
maxLen = currLen;
startInd = i - currLen;
endInd = i - 1;
}
if (a[i] == '\0')
break;
currLen = 0;
}
}
cout << maxLen << endl;
if (startInd == -1)
cout << -1;
else
for (int i = startInd; i <= endInd; i++)
cout << a[i];
return 0;
}
Additionally, you should not use C-style arrays in C++; please use std::string instead.
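As a rough illustration of the std::string suggestion, and not a drop-in replacement for the index-tracking code above, the longest word can be found by letting a string stream do the splitting. The function name longestWord is an assumption made for this sketch:

```cpp
#include <sstream>
#include <string>

// Returns the longest whitespace-separated word in `line`
// (the first one wins on ties), or an empty string if there is none.
std::string longestWord(const std::string& line) {
    std::istringstream in(line);
    std::string word, best;
    while (in >> word) {             // operator>> skips leading whitespace
        if (word.size() > best.size()) best = word;
    }
    return best;
}
```

The input line would be read with std::getline(std::cin, line), which has no fixed 101-character limit. For "My name is Manav Kampani" the function returns "Kampani", whose size() is 7.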
There are a couple of things here:
1- You don't need cin>>a. It consumes the first word, and the contents of a are then overwritten by cin.getline(). Remove the first cin>>a and you'll be fine.
2- The last word is not detected because of how the two conditions are ordered:
if(a[i]!=' ' ) // case of not a space
//not end of word
else if(a[i]==' '||a[i]=='\0') // case of space or null
//end of word
The terminating null is not a space, so it matches the first branch and the end-of-word branch never runs for it. That means you don't detect the last word.

Connecting n pipes linux

I was studying about pipes recently and saw this answer:
Connecting n commands with pipes in a shell?
I was intrigued by it and tried to make a "dynamic" version, in which I pass in a string with n processes and then execute them (e.g. ls | sort). I tried to tokenize the string and save the tokens in an array, but it did not work. Here is the code of my "tokenizer":
int main()
{
char str[] = "ls | sort";
int length = (sizeof(str) / sizeof(*str))-1;
int sizeCMD = 1; //If the string has zero pipe, it means it has at least 1 process
vector<char> tempV;
for (int i = 0; i < length; i++)
{
if (str[i] == '|')
{//If the string has one |, it means it has at least 2 process.
sizeCMD++;
}
tempV.push_back(str[i]);//I was going to do something else with this, but I forgot.
//cout<<i<<" "<<tempV.at(i)<<endl;
}
int j = 0;//Current position of the cmd
string comLetter = "";//it will save every single letter in certain conditions
string comLine = "";//it will save all the characters of comLetter in certain conditions
struct command cmd[sizeCMD];
const char *ls[2];
const char *sort[2];
const char *toCChar;
for (int i = 0; i < tempV.size(); i++)
{
if (tempV.at(i) != ' ' && tempV.at(i) != '|')
{//comLetter will only save characters that are not equal to blank or |.
//cout<<tempV.at(i);
comLetter += tempV.at(i);
//cout<< comLetter <<endl;
}
if (tempV.at(i) == ' ' || i == tempV.size() - 1)
{//comLine will save everything of comLetter when it detects a blank or the very end
//cout<<comLetter<<endl;
comLine = comLetter;
comLetter = "";
}
if (tempV.at(i) == '|' || i == tempV.size() - 1)
{//cmd will save everything of comLine when it detects a | or the very end.
//cout<<j<<endl;
cout << "." << comLine << "." << endl;
//cout<<i<<endl;
//cout<<toCChar<<endl;
if(comLine == "ls"){
toCChar = comLine.c_str();
ls[0] = toCChar;
ls[1] = 0; //THIS IF
cmd[0] = {ls}; //WORKS
}
if(comLine == "sort"){
sort[0] = "sort";
sort[1] = 0; //THIS IF
cmd[1] = {sort}; //WORKS
}
/*const char *ls[2];
cout<<toCChar<<endl;
ls[0] = toCChar;
ls[1] = 0;
cout<< *ls[0] << " - "<< endl;
cmd[j] = {ls};
//cout << cmd << endl;
comLine = "";*/
j++; //The position will move by one.
}
}
return fork_pipes(sizeCMD, cmd);
}
Everything made sense to me until I found out that a const char* can't point to a temporary, because the data it points to has to stay alive, so I need to create two const char* arrays for two commands. That's why I have two arrays, *sort[] and *ls[], for sort and ls.
Also, I was wondering why these lines get "ignored":
toCChar = comLine.c_str();
ls[0] = toCChar;
I'm struggling right now. If someone could help or guide me on how to do it, I would appreciate it.
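One likely reason the c_str() lines appear to be ignored: the pointer returned by comLine.c_str() refers to comLine's own buffer, and comLine is reassigned on later loop iterations, so ls[0] ends up pointing at stale data. A possible way around this, sketched here under the assumption that each command is just whitespace-separated words between '|' characters, is to keep every token in its own std::string and build the pointer arrays only at the end. The names splitPipeline and toArgv are hypothetical, and the sketch does not call fork_pipes or use struct command, since their definitions are not shown here:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits "ls -l | sort" into {{"ls", "-l"}, {"sort"}}.
std::vector<std::vector<std::string>> splitPipeline(const std::string& line) {
    std::vector<std::vector<std::string>> commands;
    std::istringstream stages(line);
    std::string stage;
    while (std::getline(stages, stage, '|')) {   // one stage per '|'
        std::istringstream words(stage);
        std::vector<std::string> args;
        std::string word;
        while (words >> word) args.push_back(word);
        if (!args.empty()) commands.push_back(args);
    }
    return commands;
}

// Builds a null-terminated argv array whose pointers refer into `args`.
// The pointers stay valid only while `args` is alive and unmodified.
std::vector<const char*> toArgv(const std::vector<std::string>& args) {
    std::vector<const char*> argv;
    for (const std::string& a : args) argv.push_back(a.c_str());
    argv.push_back(nullptr);
    return argv;
}
```

Because the strings live in the vector returned by splitPipeline, the number of commands no longer has to be known in advance, and argv.data() can be handed to code that expects a const char** as long as that vector outlives the call.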

Replacing all spaces in a string with '%20' (C++)

Having some trouble understanding parts of the code; the output I am getting is also wrong. The problem is to replace all spaces in a string with '%20'. The full code is shown below; it compiles but doesn't run exactly as it should.
#include <iostream>
#include <string>
using namespace std;
void replaceSpaces(string str){
//Getting the length of the string, counting the number of spaces
int strLen = str.length();
int i, count = 0;
for (i = 0; i <= strLen; i++) {
if(str[i]==' ')
count++;
}
//Determining the new length needed to allocate for replacement characters '%20'
int newLength = strLen + count * 2;
str[newLength] = '\0';
for (i = strLen - 1; i >= 0; i--) {
if (str[i] == ' ') {
str[newLength - 1] = '0';
str[newLength - 2] = '2';
str[newLength - 3] = '%';
newLength = newLength - 3;
}
else {
str[newLength - 1] = str[i];
newLength = newLength -1;
}
}
cout << str <<endl;
}
int main() {
string str = "hello jellybean hello";
replaceSpaces(str);
return 0;
}
I am probably missing something obvious, but when allocating for the new string length in this line:
int newLength = strLen + count * 2;
Here we are multiplying the number of spaces by 2, but if we are trying to replace all spaces with '%20', why not multiply it by 3?
str[newLength] = '\0';
Does this line indicate that the position past the last character in the string is assigned a null space?
Am also confused about the else statement.
else {
str[newLength - 1] = str[i];
newLength = newLength -1;
}
Not sure if I completely understand the circumstance when this would be executed.
When the functions are compiled and run, if
string str = "hello jellybean hello";
the expected output would be hello%20jellybean%20hello, except the output I am getting is hello%20jellybean%20h.
In terms of time complexity, since there are two independent for loops, would the time complexity be O(n)?
I know I'm asking a lot of different questions, many thanks in advance for any answers!
This is wrong:
str[newLength] = '\0';
std::string objects maintain their NUL terminator internally based on their size. You want
str.resize(newLength);
instead.
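For reference, here is a sketch of the question's back-to-front algorithm with the resize in place. It returns the result instead of printing it, and uses separate read and write indices instead of reusing newLength; both are choices made for this example:

```cpp
#include <cstddef>
#include <string>

// Replaces each space with "%20", filling the string from the back.
std::string replaceSpaces(std::string str) {
    const std::size_t oldLen = str.length();
    std::size_t spaces = 0;
    for (char ch : str)
        if (ch == ' ') ++spaces;

    // Each space already occupies one slot, so it needs two extra ones.
    std::size_t write = oldLen + 2 * spaces;
    str.resize(write);  // grow first, so every index below is in bounds

    for (std::size_t read = oldLen; read-- > 0; ) {
        if (str[read] == ' ') {
            str[--write] = '0';
            str[--write] = '2';
            str[--write] = '%';
        } else {
            str[--write] = str[read];
        }
    }
    return str;
}
```

This also shows why the factor is 2 rather than 3, and both loops visit each character once, so the running time is O(n) in the length of the result.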
int newLength = strLen + count * 2;
says to allocate space (later) equal to the length of the string plus twice the number of spaces found, which makes sense.
For example, in "so glad to help" each space's own slot is reused for the '%', and each space needs two more slots for the "20" part of the replacement.
This is WRONG:
str[newLength] = '\0';
This accesses memory outside the bounds of your string. You act as if you had allocated space for newLength characters, but you haven't done that anywhere in the code yet.
Out-of-bounds access results in undefined behavior, and that's bad.
The else statement just copies the non-space characters, but you should give up on that code (if it's not yours) and start from scratch, and/or take a peek at: Encode/Decode URLs in C++.
As for the wrong result: having read this far, you should know that it is to be expected.
Trying to do the modification in place is tricky. It's much easier to create a new string:
std::string new_string;
for (int i = 0; i < str.length(); ++i) {
if (str[i] == ' ')
new_string += "%20";
else
new_string += str[i];
}
return new_string;
or, if you like range-for:
std::string new_string;
for (char ch : str) {
if (ch == ' ')
new_string += "%20";
else
new_string += ch;
}
return new_string;
You can change the string argument of the function to a reference, so there is no need for a new string. In the loop you only need to convert the space to '%' and then use insert to add '2' and '0'.
void replaceSpaces(string &str) {
size_t strLen = str.length();
for (int i = 0; i < strLen; i++) {
if (str[i] == ' ') {
str[i] = '%';
str.insert(str.begin() + i + 1, '2');
str.insert(str.begin() + i + 2, '0');
strLen += 2;
}
}
}
This is easy; replace examplestring with your string in the code, and use as you would:
#include <iostream> //debug output
#include <string>
using std::string;
using std::cout;
using std::endl;
//the string to convert
string examplestring = "this is the example string for spaces into %20";
int main()
{
int countspaces = 0; //it's faster to fill a known size
for (auto &x : examplestring)if (x == ' ')countspaces++; //counts spaces
string newstring; //declare new string
//each space grows from 1 char to 3, so reserve 2 extra chars per space
newstring.resize(examplestring.size() + (countspaces*2));
int newstringiterator = 0; //keep track of new string location
//if ' '(space), overwrite 3 chars of newstring with %20 and add 3 to iteration
//else just place the letter and iterate
for (size_t i=0;i<examplestring.size();i++)
{
if (examplestring[i] == ' ')
{
//replace() overwrites in place; insert() would grow the string again
newstring.replace(newstringiterator, 3, "%20");
newstringiterator += 3;
}
else newstring[newstringiterator++] = examplestring[i];
}
//final newstring is the original with %20 instead of spaces.
cout << newstring << endl;
return 0;
}
This will output newstring, which is the old string with '%20' instead of spaces.

Counting words in a string

I am learning C++ on my own. I have written this program to count the number of words in a string. I know it's not the best way to do this, but this was what I could think of.
I am using spaces to count the number of words. Here is the problem.
countWords(""); // ok, 'x.empty()' identifies it as an empty string.
countWords(" "); // 'x.empty()' fails, function returns 1.
P.S. I want this program not to count symbols like "!" and "?" as words. Here is my code:
#include <iostream>
#include <string>
int countWords(std::string x);
int main() {
std::cout << countWords("Hello world!");
}
int countWords(std::string x) {
if(x.empty()) return 0; // if the string is empty
int Num = 1;
for(unsigned int i = 0; i < x.size(); i++) {
// if there is a space in the start
if(x[0] == ' ') continue;
// second condition makes sure that i don't count 2 spaces as 2 words
else if(x[i] == ' ' && x[i - 1] != ' ') Num++;
}
return Num;
}
Your function can be reduced to this:
int countWords(std::string x) {
int Num = 0;
char prev = ' ';
for(unsigned int i = 0; i < x.size(); i++) {
if(x[i] != ' ' && prev == ' ') Num++;
prev = x[i];
}
return Num;
}
Edit: To follow up comment:
Here is a simple way to replace other characters with ' ', though there might be a built-in method for this:
void replace(std::string &s, char replacer, std::set<char> &replacies)
{
for (int i=0; i < s.size(); i++)
if (replacies.count(s[i])) s[i] = replacer;
}
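The standard library does have something close to this: std::replace_if from <algorithm>. A minimal sketch of the same helper built on it (the name replaceAny is made up for this example):

```cpp
#include <algorithm>
#include <set>
#include <string>

// Replaces every character of `s` found in `targets` with `replacement`.
void replaceAny(std::string& s, char replacement, const std::set<char>& targets) {
    std::replace_if(s.begin(), s.end(),
                    [&targets](char c) { return targets.count(c) != 0; },
                    replacement);
}
```

Calling replaceAny(text, ' ', {',', '!', '?'}) before counting turns those symbols into spaces, so they no longer form words of their own.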
The problem with your code is that you are counting the words that are followed by a ' ' character. I believe you start with Num = 1 because otherwise the last word would not be counted. However, that only holds when the string you're analysing does not end with ' '. Otherwise you will have one word too many. The easiest way to fix this is to add
if(x.back() == ' ')
Num--;
right before returning the answer.
Your solution is insufficient. It will fail when applied with:
Leading spaces
Trailing spaces
Only spaces
Other forms of whitespace
You need to rethink how your algorithm should work as you simply need a more sophisticated method to cover all the use cases.
Or you could avoid reinventing the wheel and use what the standard library already provides, e.g.:
#include <iterator>
#include <sstream>
#include <string>
int countWords(const std::string& s) {
std::istringstream iss{s};
return std::distance(std::istream_iterator<std::string>{iss},
std::istream_iterator<std::string>{});
}
Here std::istringstream and std::istream_iterator are used to tokenize the string, and std::distance is used to get the number of tokens extracted.
I found a string stream to work best:
int Count(const std::string &string)
{
std::stringstream ss(string);
std::string word; // a std::string cannot overflow, unlike a fixed char buffer
int Words = 0;
while (ss >> word) // extraction fails once no tokens are left
Words++;
return Words;
}
Input: " Hello my dear friend "
Output: 4
It will not fail even if applied with:
Leading spaces
Trailing spaces
Only spaces
Other forms of whitespace
So I tried on my own after reading some useful comments. Here is my solution. I have checked my program against the worst-case scenarios I could think of. If any of you can find a case for which this program doesn't work, let me know so that I can improve it.
And just to be clear, we don't want symbols like "," , "!" , "?", "." , "\n" to be counted as words. But obviously "I" should be counted as a word, as it is one in the language. I have made sure of all this by replacing the symbols with spaces. Let me know if I missed something.
#include <iostream>
#include <string>
void replace(std::string& str, char x, char y);
int countWords(std::string x);
int main(){
std::cout<<countWords(" \n \t Hello, world ! ");
}
void replace(std::string& str, char x, char y){
for(unsigned int i=0;i<str.size();i++){
if(str[i]==x) str[i]=y;
}
}
int countWords(std::string x){
replace(x,',',' ');
replace(x,'.',' ');
replace(x,'!',' ');
replace(x,'?',' ');
replace(x,'(',' ');
replace(x,')',' ');
replace(x,'\n',' ');
replace(x,'\t',' ');
replace(x,'"',' ');
if(x.empty()) return 0;
int Num=1;
for(unsigned int i=1;i<x.size();i++){
if(x[i]==' ' && x[i-1]!=' ') Num++;
}
if(x.back() == ' ') Num--;
return Num;
}
This is simple and fast on my machine. It iterates over the string, using a bool to track
whether it's inside a word or not, and whitespace characters as word delimiters. I tested with the isspace() library function but this switch statement was slightly faster.
int countwords(const std::string &str)
{
int count = 0;
bool in_word = false;
for (char ch : str) {
switch (ch) {
case '\t': case '\n': case '\v': case '\f': case '\r': case ' ':
in_word = false;
break;
default:
if (!in_word) {
in_word = true;
++count;
}
break;
}
}
return count;
}
This is easy to extend or modify for different word delimiters. Here is a version that considers any non-alphabetical character as a delimiter. Changing the !isalpha() call to isspace() will give the same results as the code above.
int countwords(const std::string &str)
{
int count = 0;
bool in_word = false;
for (char ch : str) {
if (!isalpha(static_cast<unsigned char>(ch))) { // non-alpha chars are word delimiters; the cast avoids undefined behavior for negative char values
in_word = false;
} else if (!in_word) {
in_word = true;
++count;
}
}
return count;
}
int countwords(std::string x)
{
int i, count = 0;
for (i = 0; i < x.size(); i++)
if (x[i] == ' ')
count++; //just count empty spaces
count++; //count++ is same as count+1,so there will be count+1 words in string
if (x.size() == 0)
count = 0;
return count;
}
Add the following lines to the code
int Num;
if(x[0] == ' ') Num = 0;
else Num = 1;
this would eliminate the count of a blank in the start of the string
#include <iostream>
#include <string>
int countWords(std::string x);
int main() {
std::cout << countWords("Hello world!");
}
int countWords(std::string x) {
if(x.empty()) return 0; // if the string is empty
int Num;
if(x[0] == ' ') Num = 0;
else Num = 1;
for(unsigned int i = 1; i < x.size(); i++) {
// x[0] is already accounted for by the initial value of Num
// a new word starts wherever a non-space follows a space
if(x[i] != ' ' && x[i - 1] == ' ') Num++;
}
return Num;
}