Splitting a line in C/C++ using whitespace as delimiter [duplicate]

Closed 12 years ago.
Possible Duplicate:
How do I tokenize a string in C++?
pseudocode:
Attributes[] = Split line(' ')
How?
I have been doing this:
char *pch;
pch = strtok(line," ");
while(pch!=NULL)
{
    fputs ( pch, stdout );
}
and the program gets stuck, never exits, and writes nothing. Is something wrong with this?
Even so, this doesn't meet my pseudocode requirement: I'm confused about how to store the tokens (as char arrays) in my array. I guess I should use a 2-D array?

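Use strtok with " " as your delimiter, and inside the loop call strtok(NULL, " ") to move on to the next token. Without that second call pch never changes, so the loop never terminates.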
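A minimal sketch of the corrected strtok loop, assuming the line is held in a writable char buffer and that at most MAX_TOKENS tokens (a hypothetical limit) are needed. Instead of a 2-D array, it stores a pointer to each token; the pointers point into the line itself, because strtok replaces each delimiter with '\0' in place. split_line is a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical upper bound on the number of tokens stored.
const std::size_t MAX_TOKENS = 64;

// Splits `line` in place on spaces and stores a pointer to each token in
// `attributes`. Returns the number of tokens stored.
// strtok overwrites each delimiter with '\0', so `line` must be writable
// and must outlive the stored pointers.
std::size_t split_line(char *line, char *attributes[], std::size_t max_tokens) {
    std::size_t count = 0;
    char *pch = std::strtok(line, " ");
    while (pch != nullptr && count < max_tokens) {
        attributes[count++] = pch;
        // Passing nullptr continues from where the previous call stopped;
        // without this call pch never changes and the loop never ends.
        pch = std::strtok(nullptr, " ");
    }
    return count;
}
```

Note that strtok keeps hidden state between calls, so it is not reentrant; two tokenizing loops must not be interleaved.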

This is not quite a dup: for C++, see and upvote the accepted answer here by @Zunino.
Basic code below but to see the full glorious elegance of the answer you are going to have to click on it.
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
int main() {
    using namespace std;
    string sentence = "Something in the way she moves...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}
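This works because istream_iterator<string> reads each element with operator>>, which by default skips leading whitespace and stops at the next whitespace character, so whitespace acts as the separator. The resulting tokens are written to cout on separate lines because "\n" is passed as the delimiter argument of the ostream_iterator constructor.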
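The same idea can be wrapped in a small helper that returns the tokens instead of printing them. This is a sketch; split_ws is a hypothetical name. Because only whitespace separates tokens, punctuation stays attached to the neighbouring word.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Splits `sentence` on whitespace by letting istream_iterator<string>
// call operator>> repeatedly; each call skips leading whitespace
// (spaces, tabs, newlines) and stops at the next whitespace character.
std::vector<std::string> split_ws(const std::string &sentence) {
    std::istringstream iss(sentence);
    return std::vector<std::string>(std::istream_iterator<std::string>(iss),
                                    std::istream_iterator<std::string>());
}
```

For example, split_ws("Something in the way she moves...") yields six tokens, the last of which is "moves...".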

The easiest method is boost::split:
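#include <boost/algorithm/string.hpp>  // boost::split, boost::is_space
std::vector<std::string> words;
// token_compress_on treats a run of adjacent separators as one;
// without it, consecutive spaces produce empty strings in words.
boost::split(words, your_string, boost::is_space(), boost::token_compress_on);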

Related

I have a programming project for an intro C++ class, and one of the functions we need to create is a split function.

I was hoping to get some feedback on whether I am doing this the "smart way", or whether I could be doing it faster. If I were splitting on whitespace, I would probably use getline(stringstream, word, delimiter), but I didn't know how to adapt the delimiter to cover all the good characters. So I just looped through the whole string, building a new word until I reached a bad character. As I am fairly new to programming, I'm not sure if this is the best way to do it.
Thanks for any feedback.
#include <iostream>
#include <string>
using std::string;
#include <vector>
using std::vector;
#include <sstream>
#include <algorithm>
#include <cctype>    // std::tolower
#include <iterator>  // std::ostream_iterator (only needed for the test in main)
using std::cout; using std::cin; using std::endl;
/*
void split(string line, vector<string>& words, string good_chars)
- Find words in the line that consist of good_chars.
  Any other character is considered a separator.
- Once you have a word, convert all the characters to lower case.
  You then push each word onto the reference vector words.
Important: split goes in its own file. This is both for your own benefit (you can reuse
split) and for grading purposes. We will provide a split.h for you.
*/
void split(string line, vector<string> & words, string good_chars){
    string good_word;
    for(auto c : line){
        if(good_chars.find(c)!=string::npos){
            good_word.push_back(c);
        }
        else{
            if(good_word.size()){
                // std::tolower requires a value representable as unsigned char
                std::transform(good_word.begin(), good_word.end(), good_word.begin(),
                               [](unsigned char ch){ return static_cast<char>(std::tolower(ch)); });
                words.push_back(good_word);
            }
            good_word = "";
        }
    }
    // a word that runs to the end of the line has no separator after it,
    // so it must be stored here as well
    if(good_word.size()){
        std::transform(good_word.begin(), good_word.end(), good_word.begin(),
                       [](unsigned char ch){ return static_cast<char>(std::tolower(ch)); });
        words.push_back(good_word);
    }
}
int main(){
    vector<string> words;
    string good_chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'";
    // TEST split
    split("This isn't a TEST.", words, good_chars);
    // words should have: {"this", "isn't", "a", "test"}, no period in test
    std::copy(words.begin(), words.end(), std::ostream_iterator<string>(cout, ","));
    cout << endl;
    return 0;
}
I'd say that this is a reasonable approach given the context of an intro to C++ class. I'd even say that it's fairly likely that this is the approach your instructor expects to see.
There are, of course, a few optimization tweaks that can be done. For example, you could create a 256-element bool array, use good_chars to set the corresponding entries to true (all others default to false), and then replace the find() call with a quick array lookup.
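A minimal sketch of that lookup table, assuming 8-bit chars (so 256 entries cover every byte value). CharTable is a hypothetical name; in split() you would build it once before the loop and replace good_chars.find(c)!=string::npos with table.contains(c).

```cpp
#include <array>
#include <string>

// A 256-entry lookup table: entry c is true exactly when byte c appears
// in good_chars; all other entries are value-initialized to false.
class CharTable {
public:
    explicit CharTable(const std::string &good_chars) : is_good_{} {
        for (unsigned char c : good_chars) {
            is_good_[c] = true;
        }
    }

    // Constant-time replacement for good_chars.find(c) != std::string::npos.
    // Casting to unsigned char keeps negative char values in range.
    bool contains(char c) const {
        return is_good_[static_cast<unsigned char>(c)];
    }

private:
    std::array<bool, 256> is_good_;
};
```

The table trades 256 bytes of memory for replacing a linear search through good_chars with a single indexed read per character.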
But I'd predict that if you were to hand in such a thing, you'd be suspected of copying stuff you found on the intertubes, so leave that alone.
One thing you might consider instead is calling tolower on each character as you push_back it, which removes the extra std::transform pass over the word.
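A sketch of that variant, under the assumption that the same "good characters" rule applies. split_lower is a hypothetical name, and it takes its strings by const reference rather than by value as in the assignment's required signature. It also stores a word that runs to the end of the line.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Variant of split() that lower-cases each character as it is appended,
// so no separate std::transform pass over the finished word is needed.
void split_lower(const std::string &line, std::vector<std::string> &words,
                 const std::string &good_chars) {
    std::string good_word;
    for (char c : line) {
        if (good_chars.find(c) != std::string::npos) {
            // std::tolower expects a value representable as unsigned char.
            good_word.push_back(static_cast<char>(
                std::tolower(static_cast<unsigned char>(c))));
        } else if (!good_word.empty()) {
            words.push_back(good_word);
            good_word.clear();
        }
    }
    // Store a word that runs to the end of the line (no trailing separator).
    if (!good_word.empty()) {
        words.push_back(good_word);
    }
}
```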

Take a String with any number of words and store the words in different string variables? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
My aim is to do this in C++:
1. Let the user enter a line with any number of words.
2. Split the line into different words.
3. Store those words into separate string variables.
I know we can split the words of string using istringstream object.
But my question is: how do I store them in DIFFERENT string variables? I know that it is not possible to create an array of strings.
Also, how do I detect the end of the string in a string stream, like the eof() marker in a file stream?
Since you're already using the standard library, why not use a vector?
#include <iostream>
#include <sstream>
#include <vector>
#include <string>
int main() {
    std::string input = "abc def ghi";
    std::istringstream ss(input);
    std::string token;
    std::vector<std::string> vec;
    while(std::getline(ss, token, ' ')) {
        vec.push_back(token);
    }
    // vec now contains {"abc", "def", "ghi"}
}
You have various options:
You can have an array of pointers to string; however, you need to know in advance how many words there are. UPDATE: As pointed out by @Blastfurnace, this option is prone to errors and should be avoided.
You can use a vector (or any other container) to store them.
To get the words you can use a while loop and the extraction operator; the loop stops automatically when you reach the end of your string, because the extraction then fails, so no separate eof() check is needed.
Example:
istringstream iss(str);
string word;
while(iss >> word) {
    /* do stuff with the word */
}
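The two approaches above (getline with ' ' and the extraction operator) differ when words are separated by more than one space: getline with ' ' returns an empty token for each extra space, while >> skips any run of whitespace, including tabs and newlines. A small sketch comparing them; both function names are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits on the single character ' ' with std::getline; consecutive
// spaces produce empty tokens, and tabs stay inside tokens.
std::vector<std::string> split_getline(const std::string &input) {
    std::istringstream ss(input);
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(ss, token, ' ')) {
        tokens.push_back(token);
    }
    return tokens;
}

// Splits on any run of whitespace with operator>>; the loop ends when
// extraction fails at the end of the string.
std::vector<std::string> split_extract(const std::string &input) {
    std::istringstream iss(input);
    std::vector<std::string> tokens;
    std::string word;
    while (iss >> word) {
        tokens.push_back(word);
    }
    return tokens;
}
```

For "abc  def" (two spaces), split_getline gives {"abc", "", "def"} while split_extract gives {"abc", "def"}.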
Yes, I'm answering my own question, after drawing conclusions from the various answers and comments above. I'll answer in the form of code.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
int main()
{
    string str, word;
    vector<string> myVector;
    getline(cin, str);
    stringstream iss(str);
    while(iss >> word)
        myVector.push_back(word);
}

why does char[1] read entire word from my input file?

I want to read words from a file in C++, and I am allowed to use only the cstring library. This is what I have done so far:
#include <cstring>
#include <fstream>
#include <stdio.h>
using namespace std;
int main(){
    ifstream file;
    char word[1];
    file.open("p.txt");
    while (!file.eof()){
        file >> word;
        cout << word << endl;
    }
    system("pause");
    return 0;
}
It is working fine and reading one word at a time, but I don't understand why it works.
How can a char array of any size, be it char word[1] or char word[50], read exactly one word at a time, skipping spaces?
Further, I want to store these words in a dynamic array. How can I achieve this? Any guidance would be appreciated.
Your code has undefined behaviour. char word[1] has room only for the terminating '\0', so operator >> simply overwrites the memory beyond the end of the array.
Note that the header <stdio.h> you included is not used in the program. On the other hand, you need to include the header <cstdlib>, which declares the function system, and <iostream>, which declares cout and endl.
As for your second question, you should use a standard container such as std::vector<std::string>.
For example:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdlib>
int main()
{
    std::ifstream file("p.txt");
    std::string s;
    std::vector<std::string> v;
    v.reserve( 100 );
    while ( file >> s ) v.push_back( s );
    std::system( "pause" );
    return 0;
}
Or you can simply define the vector as
std::vector<std::string> v( ( std::istream_iterator<std::string>( file ) ),
std::istream_iterator<std::string>() );
provided that you include the header <iterator>.
For example
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <cstdlib>
int main()
{
    std::ifstream file("p.txt");
    std::vector<std::string> v( ( std::istream_iterator<std::string>( file ) ),
                                std::istream_iterator<std::string>() );
    for ( const std::string &s : v ) std::cout << s << std::endl;
    std::system( "pause" );
    return 0;
}
Your code is invoking undefined behavior. That it doesn't crash is a roll of the dice; because the behavior is undefined, nothing about how it runs can be relied on.
The easiest way (I've found) to load a file of words with whitespace separation is by:
std::ifstream inp("p.txt");
std::istream_iterator<std::string> inp_it(inp), inp_eof;
std::vector<std::string> strs(inp_it, inp_eof);
strs will contain every whitespace delimited char sequence as a linear vector of std::string. Use std::string for dynamic string content and don't feel the least bit guilty about exploiting the hell out of the hard work those that came before you gave us all: The Standard Library.
Your code is failing due to the overload of char * for operator>>.
An array of char, regardless of its size, decays to the type char *, whose value is the address of the start of the array.
For compatibility with the C language, the overloaded operator>>(char *) has been implemented to read one or more characters until a terminating whitespace character is reached or there is an error with the stream. (Since C++20 this overload takes a reference to the array instead, so it knows the array size and stops before overflowing it.)
If you declare an array of 1 character and read from a file containing "California", the function will put 'C' into the first location of the array and keep writing the remaining characters to the next locations in memory (regardless of what data type they are). This is known as a buffer overflow.
A much safer method is to read into a std::string or if you only want one character, use a char variable. Look in your favorite C++ reference for the getline methods. There is an overload for reading until a given delimiter is reached.
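If you are restricted to a char array, as the question's cstring-only rule suggests, one bounded alternative is to set the stream width before each read: with std::setw(n), operator>> stores at most n - 1 characters plus the terminating '\0'. The width resets to 0 after every extraction, so it must be set on each read. A sketch, with a deliberately tiny buffer and a hypothetical function name; a word longer than the buffer comes back in several pieces.

```cpp
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated words into a fixed char buffer without
// overflowing it: std::setw(sizeof buf) limits each extraction to
// sizeof buf - 1 characters plus the terminating '\0'.
std::vector<std::string> read_bounded(std::istream &in) {
    char buf[5];  // deliberately small to show the limit
    std::vector<std::string> pieces;
    while (in >> std::setw(sizeof buf) >> buf) {
        pieces.emplace_back(buf);
    }
    return pieces;
}
```

Reading "California is big" this way yields "Cali", "forn", "ia", "is", "big".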
You only need a couple of changes:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main(){
    ifstream file;
    string word;
    file.open("p.txt");
    while (file >> word){
        cout << word << endl;
    }
    system("pause");
    return 0;
}
It works only because you are lucky and don't overwrite any critical memory. You need to allocate enough bytes for the char word array, say char word[64] (a longer word would still overflow it). Use while (file >> word) as your loop test instead of checking eof(). In the loop you can push_back the word into a std::vector<std::string> if you are allowed to use the C++ standard library.
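To see why the loop test matters, here is a small sketch contrasting the question's while (!file.eof()) pattern with while (file >> word), using a std::istringstream in place of p.txt (function names are hypothetical). When the input ends with whitespace, such as the trailing newline most text files have, eof() only becomes true after a read has already failed, so the failed read's leftover value is stored once more.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Pattern from the question: the loop checks eof() before reading, but
// eof() is set only by a read that hits the end of input.
std::vector<std::string> read_eof_loop(const std::string &text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (!in.eof()) {
        in >> word;  // on the final pass this fails and leaves word unchanged
        words.push_back(word);
    }
    return words;
}

// Recommended form: test the result of the extraction itself.
std::vector<std::string> read_checked_loop(const std::string &text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

With the input "one two\n", the first function returns {"one", "two", "two"} and the second returns {"one", "two"}.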
If you want a simple C++11 STL-like solution, use this
#include <algorithm>
#include <iterator>
#include <vector>
#include <string>
#include <fstream>
#include <iostream>
using namespace std;
int main()
{
    ifstream fin("./in.txt"); // input file
    vector<string> words; // store the words in a vector
    copy(istream_iterator<string>(fin),{}, back_inserter(words)); // insert the words
    for(auto &elem: words)
        cout << elem << endl; // display them
}
Or, more compactly, construct the container directly from the stream iterator like
vector<string> words(istream_iterator<string>(fin),{});
and remove the copy statement.
If instead of a vector<string> you use a multiset<string> (#include <set>) and change
copy(istream_iterator<string>(fin),{}, back_inserter(words)); // insert the words
to
copy(istream_iterator<string>(fin),{}, inserter(words, words.begin())); // insert the words
you get the words ordered. So using STL is the cleanest approach in my opinion.
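A self-contained sketch of the multiset version, reading from any std::istream so it can be tried with a std::istringstream instead of ./in.txt (sorted_words is a hypothetical name). std::inserter calls insert() on the multiset, which places each word in sorted position and keeps duplicates.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>

// Reads whitespace-separated words into a std::multiset, which keeps
// them in sorted order and retains duplicates.
std::multiset<std::string> sorted_words(std::istream &in) {
    std::multiset<std::string> words;
    std::copy(std::istream_iterator<std::string>(in), {},
              std::inserter(words, words.begin()));
    return words;
}
```

Swapping std::multiset for std::set would keep only one copy of each distinct word.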
You're using C++, so you can avoid all that C stuff.
std::string word;
std::vector<std::string> words;
std::fstream stream("wordlist");
// this assumes one word (or phrase, with spaces, etc) per line...
while (std::getline(stream, word))
    words.push_back(word);
or for multiple words (or phrases, with spaces, etc) separated by commas. Note that a newline is not a separator here, so the last field on one line and the first field on the next come back as one string containing '\n'; this form suits input that is a single comma-separated line:
while (std::getline(stream, word, ','))
    words.push_back(word);
or for multiple words per line separated by spaces:
while(stream >> word)
    words.push_back(word);
No need to worry about buffer sizes or memory allocation or anything like that.
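For a file with several comma-separated lines, a common approach is to read each line first and then split that line on commas, so a newline always ends the current field. A sketch, with a std::istream parameter so a std::istringstream can stand in for the "wordlist" file; read_csv_words is a hypothetical name, and it does not handle quoted fields.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads `stream` line by line, then splits each line on commas, so a
// newline always ends the current field.
std::vector<std::string> read_csv_words(std::istream &stream) {
    std::vector<std::string> words;
    std::string line;
    while (std::getline(stream, line)) {
        std::istringstream fields(line);
        std::string word;
        while (std::getline(fields, word, ',')) {
            words.push_back(word);
        }
    }
    return words;
}
```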
file >> char *
will work with any char *, and when you write
file >> word;
it simply treats the word variable as a char *. Writing past the end of the array corrupts memory, which can cause a segmentation fault somewhere else, and as your code grows you will see things stop working for no apparent reason. The GDB debugger can show you where such a segmentation fault happens.

C++ how to put an input string from stdio into a vector, one word per container element

I'm learning C++, and I'm a bit of a newbie. I've researched this question quite a bit. I've studied vectors, strings, and stringstreams in C++, but I still can't find the 'right' way to do this.
Basically, I want to type "some text" at the command line and have "some" put into a vector container at position 0 and "text" put into the same container at position 1.
I've found a lot of ways that sorta work, but nothing that just does that.
Thanks for the help.
As per your comment:
#include <string>
#include <iostream>
#include <sstream>
#include <vector>
#include <algorithm>
#include <iterator>
int main() {
    std::string line;
    std::getline(std::cin, line); // read one line from cin
    std::stringstream buffer(line);
    std::vector<std::string> words;
    // copy each word from line to words
    std::copy(std::istream_iterator<std::string>(buffer),
              std::istream_iterator<std::string>(),
              std::back_inserter(words));
}
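You can simply use >> to achieve this effect. Note that this loop keeps reading words until the end of input (Ctrl+D on Unix-like systems, Ctrl+Z then Enter on Windows), not just until the end of the first line.
std::vector<std::string> words;
std::string word;
while(std::cin >> word)
    words.push_back(word);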

how to tokenize a string [duplicate]

Closed 11 years ago.
Possible Duplicate:
How do I tokenize a string in C++?
Hello everyone, I want to divide my string into two parts based on '\t'. Is there any built-in function? I tried strtok, but it takes a char * as its first input and my variable is of type string.
Thanks.
#include <sstream>
#include <vector>
#include <string>
int main() {
    std::string str("abc\tdef");
    char split_char = '\t';
    std::istringstream split(str);
    std::vector<std::string> token;
    // read each tab-separated field and append it to token
    for(std::string each; std::getline(split, each, split_char); token.push_back(each));
}
Why can't you use the C standard library?
Variant 1.
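Use the std::string::c_str() function to get a C string from a std::string. Note that c_str() returns a const char *, while strtok writes '\0' characters into the string it tokenizes, so copy the characters into a writable, '\0'-terminated buffer (for example a std::vector<char>) and pass that to strtok.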
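A sketch of Variant 1, assuming the goal is to split a std::string on tab characters; split_tabs_strtok is a hypothetical name. The characters are copied into a std::vector<char> because strtok needs a writable buffer.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Splits `str` on tab characters with strtok. c_str() returns a
// const char *, and strtok writes '\0' into its argument, so the
// characters are first copied into a writable, '\0'-terminated buffer.
std::vector<std::string> split_tabs_strtok(const std::string &str) {
    std::vector<char> buf(str.begin(), str.end());
    buf.push_back('\0');
    std::vector<std::string> parts;
    for (char *tok = std::strtok(buf.data(), "\t"); tok != nullptr;
         tok = std::strtok(nullptr, "\t")) {
        parts.emplace_back(tok);
    }
    return parts;
}
```

One design consequence: strtok treats a run of adjacent tabs as a single separator, so empty fields are silently dropped.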
Variant 2.
Use std::string::find(char, size_t) to find the separator ('\t' in your case), then make a new string with std::string::substr. Loop, saving the 'current position', until the end of the line.
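A minimal sketch of Variant 2 (split_find is a hypothetical name). Each call to find starts at the saved position, and substr copies the field between that position and the next delimiter.

```cpp
#include <string>
#include <vector>

// Splits `str` on `delim` with find/substr, keeping a "current position"
// that moves past each delimiter until the end of the string.
// Unlike strtok, empty fields between adjacent delimiters are kept.
std::vector<std::string> split_find(const std::string &str, char delim) {
    std::vector<std::string> parts;
    std::string::size_type pos = 0;
    while (true) {
        std::string::size_type next = str.find(delim, pos);
        if (next == std::string::npos) {
            parts.push_back(str.substr(pos));  // last field
            break;
        }
        parts.push_back(str.substr(pos, next - pos));
        pos = next + 1;  // skip past the delimiter
    }
    return parts;
}
```

For the question's input, split_find("abc\tdef", '\t') gives {"abc", "def"}.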