Splitting string into a vector<string> of words - c++

In Accelerated C++ (the book), I found the code further below. It does the same job as the first program, but the way it processes the input is different, and part of it confuses me.
The first program obviously outputs each word one by one (in a loop) as the user enters them, and ends once the user signals end-of-file.
int main()
{
string s;
while (cin >> s)
cout << s << endl;
return 0;
}
Unlike the code above, this one stores each word in a vector, using the indices i and j to find the non-whitespace characters. The real question is that I don't understand how this works with the vector.
What is whitespace in a vector? An element?
At first I thought the program would proceed character by character, because whitespace is a character (which is what i and j are for). Then the book says it proceeds word by word. I don't know how to test this myself, for example by watching what happens inside the program as it runs.
vector<string> split(const string& s)
{
vector<string> ret;
typedef string::size_type string_size;
string_size i = 0;
// invariant: we have processed characters [original value of i, i)
while (i != s.size())
{
// ignore leading blanks
// invariant: characters in range [original i, current i) are all spaces
while (i != s.size() && isspace(s[i]))
++i;
// find end of next word
string_size j = i;
// invariant: none of the characters in range [original j, current j)is a space
while (j != s.size() && !isspace(s[j]))
j++;
// if we found some nonwhitespace characters
if (i != j) {
// copy from s starting at i and taking j - i chars
ret.push_back(s.substr(i, j - i));
i = j;
}
}
return ret;
}
int main() {
string s;
// read and split each line of input
while (getline(cin, s)) {
vector<string> v = split(s);
// write each word in v
for (vector<string>::size_type i = 0; i != v.size(); ++i)
cout << v[i] << endl;
}
return 0;
}

The code you posted above does not split a line of text into words based on whitespace; it instead splits a line into characters. However, that's only if the code was actually compilable and not missing any necessary braces ({, }). EDIT: Actually, whether it splits into words or individual characters depends on where the braces go. The bottom line is that the code doesn't compile.
Here is a fixed version of the code that splits out each word, rather than each character, by simply moving the last if statement in split outside of its immediate while block:
#include <cctype>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
vector<string> split(const string& s)
{
vector<string> ret;
typedef string::size_type string_size;
string_size i = 0;
// invariant: we have processed characters [original value of i, i)
while (i != s.size()) {
// ignore leading blanks
// invariant: characters in range [original i, current i) are all spaces
while (i != s.size() && isspace(s[i]))
++i;
// find end of next word
string_size j = i;
// invariant: none of the characters in range [original j, current j)is a space
while (j != s.size() && !isspace(s[j]))
j++;
// if we found some nonwhitespace characters
if (i != j) {
// copy from s starting at i and taking j - i chars
ret.push_back(s.substr(i, j - i));
i = j;
}
}
return ret;
}
int main() {
string s;
// read and split each line of input
while (getline(cin, s)) {
vector<string> v = split(s);
// write each word in v
for (vector<string>::size_type i = 0; i != v.size(); ++i)
cout << v[i] << endl;
}
return 0;
}
What happens to the string passed to split is:
While there are still characters in the string (while (i != s.size()))
While we're reading a space from the string (while (i != s.size() && isspace(s[i])))
Increment the counter until we get to the start of a word (++i)
Set the end of the word as the start of the word (string_size j = i)
While we're still inside this word and not up to a space (while (j != s.size() && !isspace(s[j])))
Increment the counter indicating the end of the word (j++)
If there are some non-whitespace characters - end is greater than the start (if (i != j))
Create a sub-string from the start point to the end point of the word (s.substr(i, j - i)), and add that word to the vector (ret.push_back(..)).
Rinse and repeat.
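To see for yourself that i and j are positions of characters in the string (and not elements of the vector), the same loop can record the range of each word it finds. This is only a sketch: word_ranges is a hypothetical helper name, not something from the book, and it returns the half-open range [i, j) for every word instead of the word itself.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the book): runs the same loop as split,
// but stores the half-open character range [i, j) of every word.
std::vector<std::pair<std::size_t, std::size_t>> word_ranges(const std::string& s)
{
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t i = 0;
    while (i != s.size()) {
        // skip whitespace characters in front of the next word
        while (i != s.size() && std::isspace(static_cast<unsigned char>(s[i])))
            ++i;
        // move j to one past the last character of the word
        std::size_t j = i;
        while (j != s.size() && !std::isspace(static_cast<unsigned char>(s[j])))
            ++j;
        // only record a range if at least one non-whitespace character was found
        if (i != j) {
            ranges.push_back(std::make_pair(i, j));
            i = j;
        }
    }
    return ranges;
}
```

For each returned pair r, s.substr(r.first, r.second - r.first) is exactly the word that split would push into the vector, so the whitespace never ends up in the vector at all.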

If you are just splitting on whitespace, then you don't need to write a custom function. The STL has options for you.
std::string line;
std::vector<std::string> strings;
while ( std::getline(std::cin, line))
{
std::istringstream s ( line);
strings.insert(strings.end(),
std::istream_iterator<std::string>(s),
std::istream_iterator<std::string>());
}
// For simplicity sake using lambda.
std::for_each(strings.begin(), strings.end(), [](const std::string& str)
{
std::cout << str << "\n";
});
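The fragment above needs a few headers to compile. A minimal self-contained sketch of the same idea, wrapped in a function (the name split_words is my own, chosen for illustration):

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line on runs of whitespace using
// std::istringstream and std::istream_iterator, as in the fragment above.
std::vector<std::string> split_words(const std::string& line)
{
    std::istringstream in(line);
    // operator>> skips leading whitespace and reads one word at a time;
    // the default-constructed iterator marks the end of the stream.
    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}
```

Leading, trailing and repeated spaces produce no empty words, because operator>> skips all whitespace before each word.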

Related

How to fix the random character outputs in C++?

When I get string input by using char arrays and I cycle through them with a for loop, my code always has random character outputs that should not be there.
I have tried debugging my code, by checking the output at various stages, but I can't find the reason for what is happening.
int k, s, counter = 0;
char word[21];
std::cin>>k;
std::cin.getline(word,21);
for (int i = 0; word[i] != ' '; i++)
{
s = 3*(i + 1) + k;
std::cout<<s;
for (int k = 0; k < s; k++)
{
word[i]--;
if (word[i] < 'A')
word[i] = 'Z';
}
std::cout<<word[i];
}
When I type in 3 to get the value of k, I already get the output "URORIFCFWOQNJCEBFVSPMJNKD" when I should not get any output.
The problem is that the newline left in the input buffer is not discarded before getline is called.
When you hit Enter after typing the number, that newline character ('\n') stays in the buffer and is handed to getline(), which stops right away and leaves word empty.
The solution is simple: discard the rest of the line before calling getline.
Here is the complete solution:
#include <iostream>
int main() {
int k, s, counter = 0;
char word[21];
std::cin>>k;
// Reset any error flags, then discard the rest of the current line (including '\n')
std::cin.clear();
while (std::cin.get() != '\n')
{
continue;
}
std::cin.getline(word,21);
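std::cout<<"TEST>"<<word<<"<TEST"<<std::endl;
// Stop at the end of the string as well as at a space, so we never read past the terminator
for (int i = 0; word[i] != '\0' && word[i] != ' '; i++)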
{
s = 3*(i + 1) + k;
std::cout<<s;
for (int k = 0; k < s; k++)
{
word[i]--;
if (word[i] < 'A')
word[i] = 'Z';
}
// Use std::flush to forcefully print current output.
std::cout<<word[i]<<std::flush;
}
}
Notes:
I've used one common buffer-clearing mechanism. You might use another, but the idea is the same.
If you comment out the 4 lines of the buffer-clearing part, you'll notice that as soon as you type "3" and hit Enter, you see output like "TEST><TEST", which means that word is empty.
Consider using std::flush with cout if you want to force the output to be printed before the for loop ends.
std::cin >> k; is reading an integer only. It does not read the trailing line break. The documentation of the >> operator says
The extraction stops if one of the following conditions are met:
a whitespace character [...] is found. The whitespace character is not extracted.
As Just Shadow pointed out this line break is causing the getline() call to return an empty string.
You can ignore any number of line breaks by calling
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
BTW: Looking at your outer for loop, I would be concerned that you might read beyond the end of word if the string doesn't contain any whitespace. The following solution fixes that as well:
#include <iostream>
#include <limits>
int main() {
int k, s, counter = 0;
char word[21];
std::cin >> k;
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::cin.getline(word, 21);
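// Stop at the array bound, at the string terminator, or at the first space
for (int i = 0; i < static_cast<int>(sizeof(word)) && word[i] != '\0' && word[i] != ' '; i++)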
{
s = 3 * (i + 1) + k;
std::cout<<s;
for (int k = 0; k < s; k++)
{
word[i]--;
if (word[i] < 'A')
word[i] = 'Z';
}
std::cout << word[i];
}
}
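The leftover-newline behaviour can be checked without typing anything by feeding a std::istringstream in place of std::cin. This is a sketch under a few assumptions: it uses std::string and std::getline instead of the char array from the question, and the names read_number_then_line and demo_with_fake_input are made up for this example.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: reads an integer and then the following full line
// from any input stream. Returns false if either read fails.
bool read_number_then_line(std::istream& in, int& k, std::string& line)
{
    if (!(in >> k))
        return false;
    // operator>> leaves the '\n' in the stream; discard the rest of that line
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    return static_cast<bool>(std::getline(in, line));
}

// Demonstration: the string stream stands in for what the user would type.
std::string demo_with_fake_input()
{
    std::istringstream fake_input("3\nHELLO WORLD\n");
    int k = 0;
    std::string line;
    read_number_then_line(fake_input, k, line);
    return line;  // the second line of input, not an empty string
}
```

Without the ignore call, std::getline would stop at the newline that follows the 3 and line would come back empty.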

C++ string parser issues

Ok, so I'm working on a homework project in C++ and am running into an issue, and can't seem to find a way around it. The function is supposed to break an input string at user-defined delimiters and store the substrings in a vector to be accessed later. I think I got the basic parser figured out, but it doesn't want to split the last part of the input.
int main() {
string input = "comma-delim-delim&delim-delim";
vector<string> result;
vector<char> delims;
delims.push_back('-');
delims.push_back('&');
int begin = 0;
for (int i = begin; i < input.length(); i++ ){
for(int j = 0; j < delims.size(); j++){
if(input.at(i) == delims.at(j)){
//Compares chars in delim vector to current char in string, and
//creates a substring from the beginning to the current position
//minus 1, to account for the current char being a delimiter.
string subString = input.substr(begin, (i - begin));
result.push_back(subString);
begin = i + 1;
}
The above code works fine for splitting the input code up until the last dash. Anything after that, because it doesn't run into another delimiter, it won't save as a substring and push into the result vector. So in an attempt to rectify the matter, I put together the following:
else if(input.at(i) == input.at(input.length())){
string subString = input.substr(begin, (input.length() - begin));
result.push_back(subString);
}
However, I keep getting out of bounds errors with the above portion. It seems to be having an issue with the boundaries for splitting the substring, and I can't figure out how to get around it. Any help?
In your code, remember that .size() is always one more than the last valid index, because indexing starts at 0: a string of size 1 has its only element at [0]. So input.at(input.length()) is always one past the end and throws std::out_of_range; input.at(input.length()-1) is the last element. Here is an example that works for me. After your loops, just grab the last piece of the string.
if(begin != input.length()){
string subString = input.substr(begin,(input.length()-begin));
result.push_back(subString);
}
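Put together with the loop from the question, the whole thing might look like the sketch below. Two things here are my own assumptions: the function name split_on, and passing the delimiters as a std::string instead of a vector<char> so that std::string::find can replace the inner loop.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits input at any character contained in delims.
std::vector<std::string> split_on(const std::string& input, const std::string& delims)
{
    std::vector<std::string> result;
    std::size_t begin = 0;
    for (std::size_t i = 0; i < input.length(); ++i) {
        // is the current character one of the delimiters?
        if (delims.find(input[i]) != std::string::npos) {
            result.push_back(input.substr(begin, i - begin));
            begin = i + 1;
        }
    }
    // the last piece is not followed by a delimiter, so collect it after the loop
    if (begin != input.length())
        result.push_back(input.substr(begin));
    return result;
}
```

Calling split_on("comma-delim-delim&delim-delim", "-&") yields five pieces, the last of which is the trailing "delim" that the original loop dropped.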
Working from the code in the question I've substituted iterators so that we can check for the end() of the input:
int main() {
string input = "comma-delim-delim&delim-delim";
vector<string> result;
vector<char> delims;
delims.push_back('-');
delims.push_back('&');
auto begin = input.begin(); // use iterator
for(auto ii = input.begin(); ii <= input.end(); ii++){
for(auto j : delims) {
if(ii == input.end() || *ii == j){
string subString(begin,ii); // can construct string from iterators, also when ii is at end
result.push_back(subString);
if(ii != input.end())
begin = ii + 1;
else
goto done;
}
}
}
done:
return 0;
}
This program uses std::find_first_of to parse the multiple delimiters:
int main() {
string input = "comma-delim-delim&delim-delim";
vector<string> result;
vector<char> delims;
delims.push_back('-');
delims.push_back('&');
auto begin = input.begin(); // use iterator
for(;;) {
auto next = find_first_of(begin, input.end(), delims.begin(), delims.end());
string subString(begin, next); // can construct string from iterators
result.push_back(subString);
if(next == input.end())
break;
begin = next + 1;
}
}

Last word in a sentence is not printing after the sentence is reversed

When I reverse a sentence, the code below fails to print the last word of the reversed sentence.
#include "stdafx.h"
#include "conio.h"
#include <string.h>
#include <iostream>
using namespace std;
int main()
{
char sentence[80]={0};
cout<<"Input the string: ";
cin.getline(sentence,80,'\n');
int length=strlen(sentence);
int check=0;
for(int i=length; i>0; i--)
{
if(sentence[i]!=' ')
{
check++;
}
else
{
for(int j=i; j<(check+i); j++)
cout<<sentence[j+1];
cout<<" ";
check=0;
}
}
return 0;
}
If we enter the sentence "My Name is Rakesh", the output displayed is "Rakesh is Name". It does not display "My".
I have found two mistakes in your code.
Mistake # 01:
You are not iterating over the whole input. You are skipping the first index of the array because of the condition i>0.
Possible Solution:
Change the loop condition from i>0 to i>=0 so that the loop covers the whole input.
Mistake # 02:
You are not handling the first word of the input, which is My in your case. A word is printed only when the condition sentence[i]!=' ' is false. If sentence[0] is not a space character, check++ is executed and the loop then terminates, so the first word of the input is never printed.
Possible Solution:
Handle this case either by printing the word after the loop or by adding an if condition inside the loop that prints the word when i == 0 && sentence[i] != ' '. I have updated the code according to the first method, and now it works fine.
Updated Code:
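int i = 0;
// start at the last character; index length holds the '\0' terminator, which should not be printed
for (i = length - 1; i>=0; i--)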
{
if (sentence[i] != ' ')
{
check++;
}
else
{
for (int j = i; j<(check + i); j++)
cout << sentence[j + 1];
cout << " ";
check = 0;
}
}
//Printing the missing word outside the loop
for (int j = i; j<(check + i); j++)
cout << sentence[j + 1];
Hope this helps.
Well,
for(int i=length; i>0; i--)
stops after i=1, and array indices start from 0, so that's ONE of the problems here.
Change i>0 to i>=0.
If you begin and end your sentence with a space character, it will work. You need to treat the space character, the end of the string (null terminator) and the start of the string as the same kind of delimiter, so that you detect the start and the end of the string as well as the spaces in between.
Try entering: " My Name is Rakesh " (with a space at the start and end)
to see the scope of your problem... Use a debugger to step through.
(You indirectly handle the null terminator by using strlen, and you catch all the space characters, but what do you do with the remaining string, that is, the word delimited by the beginning of the string at index 0?)
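One way to treat the start of the string, the end of the string and the spaces as the same kind of boundary is to scan backwards with two positions. The following is only a sketch, assuming std::string instead of the char array from the question; reverse_words is a name I made up for it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the words of sentence in reverse order,
// separated by single spaces.
std::string reverse_words(const std::string& sentence)
{
    std::string out;
    std::size_t end = sentence.size();  // one past the last character of the current word
    while (end > 0) {
        // skip spaces immediately to the left of end
        while (end > 0 && sentence[end - 1] == ' ')
            --end;
        // walk left to the start of the word; reaching index 0 ends a word too
        std::size_t start = end;
        while (start > 0 && sentence[start - 1] != ' ')
            --start;
        if (start != end) {
            if (!out.empty())
                out += ' ';
            out.append(sentence, start, end - start);
        }
        end = start;
    }
    return out;
}
```

Because the inner loop stops either at a space or at index 0, the first word of the input is handled by the same code path as every other word.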

read string into array

I want to read a string of integers separated by whitespace into an array. For example, I have a string that looks like 1 2 3 4 5, and I want to convert it into an integer array arr[5]={1, 2, 3, 4, 5}. How should I do that?
I tried deleting the whitespace, but that just assigns the whole 12345 to every array element. If I don't, every element is assigned 1.
for (int i = 0; i < str.length(); i++){
if (str[i] == ' ')
str.erase(i, 1);
}
for (int j = 0; j < size; j++){ // size is given
arr[j] = atoi(str.c_str());
}
A couple of notes:
Use a std::vector. You will most likely never know the size of an input at compile time. If you do, use a std::array.
If you have C++11 available to you, maybe think about stoi or stol, as they will throw upon failed conversion
You could accomplish your task with a std::stringstream which will allow you to treat a std::string as a std::istream like std::cin. I recommend this way
Alternatively, you could go the hard route and attempt to tokenize your std::string based on ' ' as a delimiter, which is what it appears you are trying to do.
Finally, why reinvent the wheel if you go the tokenization route? Use Boost's split function.
Stringstream approach
std::vector<int> ReadInputFromStream(const std::string& _input, int _num_vals)
{
std::vector<int> toReturn;
toReturn.reserve(_num_vals);
std::istringstream fin(_input);
for(int i=0, nextInt=0; i < _num_vals && fin >> nextInt; ++i)
{
toReturn.emplace_back(nextInt);
}
// assert (toReturn.size() == _num_vals, "Error, stream did not contain enough input")
return toReturn;
}
Tokenization approach
std::vector<int> ReadInputFromTokenizedString(const std::string& _input, int _num_vals)
{
std::vector<int> toReturn;
toReturn.reserve(_num_vals);
char tok = ' '; // whitespace delimiter
size_t beg = 0;
size_t end = 0;
for(beg = _input.find_first_not_of(tok, end); toReturn.size() < static_cast<size_t>(_num_vals) &&
beg != std::string::npos; beg = _input.find_first_not_of(tok, end))
{
end = beg+1;
// advance end to the next delimiter (or the end of the string) so the whole token is taken
while(end < _input.size() && _input[end] != tok)
++end;
toReturn.push_back(std::stoi(_input.substr(beg, end-beg)));
}
// assert (toReturn.size() == _num_vals, "Error, string did not contain enough input")
return toReturn;
}
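Your code arr[j] = atoi(str.c_str()); is at fault. str.c_str() always points to the beginning of the string, so every call to atoi parses the same text and every element gets the same value. Passing a pointer to a later position, as in atoi(&str[pos]), makes atoi start parsing there, but pos has to be advanced past each number you have already read; the loop counter j is generally not the right position. By the way, if you want to convert a string to an int, you could also use std::stoi(str). I hope this helps.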
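If you want to stay close to atoi, std::strtol does the same conversion but also reports where the number ended, so the next call can continue from there. A minimal sketch, assuming the input contains only integers and whitespace; parse_ints is a hypothetical name and overflow is not checked:

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: parses every whitespace-separated integer in str.
std::vector<int> parse_ints(const std::string& str)
{
    std::vector<int> values;
    const char* p = str.c_str();
    char* next = nullptr;
    for (;;) {
        // strtol skips leading whitespace and sets next to the first unparsed character
        long v = std::strtol(p, &next, 10);
        if (next == p)  // nothing was consumed: end of input or not a number
            break;
        values.push_back(static_cast<int>(v));  // assumes the value fits in an int
        p = next;       // continue right after the number just parsed
    }
    return values;
}
```

With the string "1 2 3 4 5" this collects the five values in order, and parsing stops at the first character that cannot start a number.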
You modify the string in one loop but copy into the integer array in another loop, without recording where each embedded integer in the string starts and ends. So both actions have to happen in a single loop.
This code is not perfect, but it should give you some idea; it follows the same process you followed, but uses a vector.
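string str = "12 13 14";
vector<int> integers;
string::size_type start = 0, i = 0;
for (; i < str.length(); i++){
if (str[i] == ' ')
{
// substr takes (position, length), so the length is i - start
integers.push_back(atoi(str.substr(start, i - start).c_str()));
start = i + 1; // the next number begins after the space
}
}
// convert the last number, which is not followed by a space
integers.push_back(atoi(str.substr(start, i - start).c_str()));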

Longest unique substring

This question may seem repeated but I am posting it since I was not able to find the solution that I wanted.
If the input string is "abcaadafghae", I want the first longest unique substring (without repeated characters) which should be "dafgh". I got the below program for finding the length of this substring which is 5, but I want the substring itself as the output.
Thanks in advance.
int lengthOfLongestSubstring(string s) {
int n = s.length();
int i = 0, j = 0;
int maxLen = 0;
bool exist[256] = { false };
while (j < n) {
if (exist[s[j]]) {
maxLen = max(maxLen, j-i);
while (s[i] != s[j]) {
exist[s[i]] = false;
i++;
}
i++;
j++;
} else {
exist[s[j]] = true;
j++;
}
}
maxLen = max(maxLen, n-i);
return maxLen;
}
Assuming that this is a learning exercise, here is how you can modify your algorithm to find the longest unique substring.
Start by identifying the places in your code where you modify maxLen. There are three of them:
The place where you set it to zero,
The place where you set it to max(maxLen, j-i), and
The place where you set it to max(maxLen, n-i)
Replace maxLen with maxStr, and use it as follows:
Replace assignment of zero with an assignment to an empty string,
Replace assignment to max(maxLen, j-i) with a check maxStr.length() < (j-i), and setting maxStr to substring of s from i, inclusive, to j, exclusive
Replace assignment to max(maxLen, n-i) with a check maxStr.length() < (n-i), and setting maxStr to substring of s from i, inclusive, to n, exclusive
Return maxStr, that would be your answer.
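Applied to the function from the question, the steps above might look like the sketch below. The name longestUniqueSubstring is my own, and I have added casts to unsigned char when indexing exist, which the original does not have, so that characters with negative values cannot index outside the array.

```cpp
#include <cstddef>
#include <string>

// Sketch: same sliding-window loop as lengthOfLongestSubstring,
// with maxLen replaced by maxStr as described in the steps above.
std::string longestUniqueSubstring(const std::string& s)
{
    const std::size_t n = s.length();
    std::size_t i = 0, j = 0;
    std::string maxStr;                 // was: int maxLen = 0
    bool exist[256] = { false };
    while (j < n) {
        const unsigned char c = static_cast<unsigned char>(s[j]);
        if (exist[c]) {
            // was: maxLen = max(maxLen, j-i)
            if (maxStr.length() < j - i)
                maxStr = s.substr(i, j - i);
            // shrink the window until the earlier copy of s[j] is dropped
            while (s[i] != s[j]) {
                exist[static_cast<unsigned char>(s[i])] = false;
                i++;
            }
            i++;
            j++;
        } else {
            exist[c] = true;
            j++;
        }
    }
    // was: maxLen = max(maxLen, n-i)
    if (maxStr.length() < n - i)
        maxStr = s.substr(i, n - i);
    return maxStr;
}
```

The strict less-than comparison only replaces maxStr when a longer window is found, so among windows of equal length the first one is kept. For the input "abcaadafghae" this returns "dafgh".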
/*C++ program to print the largest substring in a string without repetition of characters.
eg. given string :- abcabbabcd
largest substring possible without repetition of character is abcd.*/
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
string str,str1;
int max =0;
string finalstr;
vector<string> str2;
cin>>str;
int len = str.length();
for(int i=0;i<len;i++)
{
if(str1.find(str[i]) != std::string::npos)
{
str2.push_back(str1);
char p = str[i];
str1 = "";
i--;
while(p!=str[i])
i--;
}
else
str1.append(str,i,1);
}
str2.push_back(str1);
for(int i=0;i<str2.size();i++)
{
if(max<str2[i].length()){
max = str2[i].length();
finalstr = str2[i];
}
}
cout<<finalstr<<endl;
}