Splitting a C++ std::string using tokens, e.g. ";" [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to split a string in C++?
Best way to split a string in C++? The string can be assumed to be composed of words separated by ;
Under our coding guidelines, C string functions are not allowed, and Boost is ruled out as well: open source is not permitted because of security concerns.
The best solution I have right now is:
string str("denmark;sweden;india;us");
The str above should be split and stored in a vector of strings. How can we achieve this?
Thanks for inputs.

I find std::getline() is often the simplest. The optional delimiter parameter means it's not just for reading "lines":
#include <sstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main() {
vector<string> strings;
istringstream f("denmark;sweden;india;us");
string s;
while (getline(f, s, ';')) {
cout << s << endl;
strings.push_back(s);
}
}

You could use a string stream and read the elements into the vector.
Here are many different examples...
A copy of one of the examples:
#include <string>
#include <vector>
std::vector<std::string> split(const std::string& s, char separator)
{
    std::vector<std::string> output;
    std::string::size_type prev_pos = 0, pos = 0;
    while ((pos = s.find(separator, pos)) != std::string::npos)
    {
        // Everything between the previous separator and this one is a token.
        output.push_back(s.substr(prev_pos, pos - prev_pos));
        prev_pos = ++pos;
    }
    output.push_back(s.substr(prev_pos)); // Last word
    return output;
}

There are several libraries available solving this problem, but the simplest is probably to use Boost Tokenizer:
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
#include <boost/foreach.hpp>
typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
int main()
{
    std::string str("denmark;sweden;india;us");
    boost::char_separator<char> sep(";");
    tokenizer tokens(str, sep);
    BOOST_FOREACH(std::string const& token, tokens)
    {
        std::cout << "<" << token << "> " << "\n";
    }
}

Related

Ignore spaces in vector C++

I'm trying to split a string into individual words and store them in a vector in C++. So I would like to know how to ignore spaces if the user puts more than one space between words in the string.
How would I do that?
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
int main(){
cout<<"Sentence: ";
string sentence;
getline(cin,sentence);
vector<string> my;
int start=0;
unsigned int end=sentence.size();
unsigned int temp=0;
while(temp<end){
int te=sentence.find(" ",start);
temp=te;
my.push_back(sentence.substr(start, temp-start));
start=temp+1;
}
unsigned int i;
for(i=0 ; i<my.size() ; i++){
cout<<my[i]<<endl;
}
return 0;
}

Four things:
When reading input from a stream into a string using the overloaded >> operator, the stream automatically separates on whitespace, i.e. it reads "words".
There is an input stream that uses a string as its input: std::istringstream.
You can use iterators with streams, e.g. std::istream_iterator.
std::vector has a constructor taking a pair of iterators.
That means your code could simply be
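std::string line;
std::getline(std::cin, line);
std::istringstream istr(line);
// The extra parentheses around the first argument keep this from being parsed as a function declaration
std::vector<std::string> words((std::istream_iterator<std::string>(istr)),
    std::istream_iterator<std::string>());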
After this, the vector words will contain all the "words" from the input line.
You can easily print the "words" using std::ostream_iterator and std::copy:
std::copy(begin(words), end(words),
std::ostream_iterator<std::string>(std::cout, "\n"));

The easiest way is to use a std::istringstream as follows:
std::string sentence;
std::getline(std::cin,sentence);
std::istringstream iss(sentence);
std::vector<std::string> my;
std::string word;
while(iss >> word) {
my.push_back(word);
}
Any whitespace, including runs of several spaces, is skipped automatically.

You can create the vector directly using the std::istream_iterator which skips white spaces:
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
int main() {
std::string str = "Hello World Lorem Ipsum The Quick Brown Fox";
std::istringstream iss(str);
std::vector<std::string> vec {std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>() };
for (const auto& el : vec) {
std::cout << el << '\n';
}
}

Here is a function which divides a given sentence into words.
#include <string>
#include <vector>
#include <sstream>
#include <utility>
std::vector<std::string> divideSentence(const std::string& sentence) {
std::stringstream stream(sentence);
std::vector<std::string> words;
std::string word;
while(stream >> word) {
words.push_back(std::move(word));
}
return words;
}

Reducing double, triple etc. spaces in a string is a problem you'll encounter again and again. I've always used the following very simple algorithm:
Pseudocode:
while "  " in string:
    string = string.replace("  ", " ")
After the while loop, you know your string only has single spaces, since every run of consecutive spaces has been compressed to one.
Most languages allow you to search for a substring in a string and most languages have the ability to run string.replace() so it's a useful trick.
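As a rough C++ counterpart to the pseudocode above, here is a minimal sketch. The helper name collapse_spaces is made up for illustration, and it erases one space at a time instead of calling a replace function, which gives the same result.

```cpp
#include <string>

// Collapse every run of consecutive spaces in `text` down to a single space.
// `collapse_spaces` is a hypothetical helper name, not part of the answer above.
std::string collapse_spaces(std::string text) {
    const std::string two_spaces(2, ' ');
    std::string::size_type pos = 0;
    // Keep looking for a pair of adjacent spaces and drop one of them.
    while ((pos = text.find(two_spaces, pos)) != std::string::npos) {
        text.erase(pos, 1);  // search again from pos: more spaces may follow
    }
    return text;
}
```

After this, splitting on a single space no longer produces empty tokens in the middle of the string, although a leading or trailing space still would.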

getline(param1,param2,param3) usage in c++ ,linux

This may be a simple question, but I am writing a simple C++ program to parse a string using a delimiter, and I want the delimiter to consist of multiple spaces (actually one or more spaces). Is that possible? My sample code is:
#include <stdio.h>
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <stdlib.h>
#include <cstring>
#include <sstream>
using namespace std;
int main()
{
string str="HELLO THIS IS 888and777";
char buf[1000];
getline(buf, 1000);
string str(buf);
stringstream stream(buf);
string toStr;
getline(stream, toStr,' ');//here the delimiter is six spaces
string str1=tostr;
getline(stream, toStr,' ');//here the delimiter is two spaces
string str2=tostr;
getline(stream, toStr,' ');//here the delimiter is three spaces
string str3=tostr;
cout<<str1<<"\t"<<str2<<"\t"<<str3<<endl;
return 0;
}
But I can't use a delimiter of multiple chars. Any ideas, please?
I get the following error:
error: invalid conversion from ‘void*’ to ‘char**’
error: cannot convert ‘std::string’ to ‘size_t*’ for argument ‘2’ to ‘__ssize_t getline(char**, size_t*, FILE*)’

std::getline() accepts only a single character as its delimiter. Accepting a string would require a non-trivial matching algorithm to guarantee suitable performance. In addition, a character literal written as 'x' is normally expected to denote a single char.
For the example I think the easiest approach is to simply tokenize the string directly:
#include <tuple>
#include <utility>
#include <string>
#include <iostream>
std::pair<std::string, std::string::size_type>
get_token(std::string const& value, std::string::size_type pos, std::string const& delimiter)
{
if (pos == value.npos) {
return std::make_pair(std::string(), pos);
}
std::string::size_type end(value.find(delimiter, pos));
return end == value.npos
? std::make_pair(value.substr(pos), end)
: std::make_pair(value.substr(pos, end - pos), end + delimiter.size());
}
int main()
{
std::string str("HELLO THIS IS 888and777");
std::string str1, str2, str3;
std::string::size_type pos(0);
std::tie(str1, pos) = get_token(str, pos, " ");
std::tie(str2, pos) = get_token(str, pos, " ");
std::tie(str3, pos) = get_token(str, pos, " ");
std::cout << "str1='" << str1 << "' str2='" << str2 << "' str3='" << str3 << "'\n";
}
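If the same multi-character delimiter separates every token, the find-based idea above can be put in a loop that collects all tokens. This is only a sketch under that assumption; split_on is a hypothetical name, and the choice to return the whole input for an empty delimiter is arbitrary.

```cpp
#include <string>
#include <vector>

// Split `value` at every occurrence of the (possibly multi-character) `delimiter`.
// `split_on` is a hypothetical helper; an empty delimiter returns the input as one token.
std::vector<std::string> split_on(const std::string& value, const std::string& delimiter) {
    std::vector<std::string> tokens;
    if (delimiter.empty()) {
        tokens.push_back(value);
        return tokens;
    }
    std::string::size_type start = 0;
    std::string::size_type end;
    while ((end = value.find(delimiter, start)) != std::string::npos) {
        tokens.push_back(value.substr(start, end - start));
        start = end + delimiter.size();  // continue right after the delimiter
    }
    tokens.push_back(value.substr(start));  // remainder after the last delimiter
    return tokens;
}
```

Unlike the >> operator, this keeps empty tokens, so two delimiters in a row produce an empty string between them.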

Using strtok() to parse text file

I've been trying to make a program that parses a text file and feeds 6 pieces of information into an array of objects. The problem for me is that I'm having issues figuring out how to process the text file. I was told that the first step I needed to do was to write some code that counted how many letters long each entry was. The txt file is in this format:
"thing1","thing2","thing3","thing4","thing5","thing6"
This is the current version of my code:
#include<iostream>
#include<string>
#include<fstream>
#include<cstring>
using namespace std;
int main()
{
ifstream myFile("Book List.txt");
while(myFile.good())
{
string line;
getline(myFile, line);
char *sArr = new char[line.length() + 1];
strcpy(sArr, line.c_str());
char *sPtr;
sPtr = strtok(sArr, " ");
while(sPtr != NULL)
{
cout << strlen(sPtr) << " ";
sPtr = strtok(NULL, " ");
}
cout << endl;
}
myFile.close();
return 0;
}
So there are two things making it hard for me right now.
1) How do I deal with the delimiters?
2) How do I deal with "skipping" the first quotation mark in each line?

Read each line into a std::string instead of a C-style string. This means that you can use the handy std::string member functions.
The std::string::find() method should help you out with finding each thing that you want to parse.
http://www.cplusplus.com/reference/string/string/find/
You can use this to find all the commas, which will give you the starts of all the things.
Then you can use std::string::substr() to cut up the string into each piece.
http://www.cplusplus.com/reference/string/string/substr/
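A minimal sketch of that find()/substr() approach might look like the following. The function name parse_quoted_line is made up for illustration, and the code assumes that no field contains a comma or an escaped quote. It also trims the surrounding quotes by starting one character later and taking two characters fewer.

```cpp
#include <string>
#include <vector>

// Split one line of the form "a","b","c" at the commas and drop the
// surrounding quotes. Assumes no commas or escaped quotes inside a field.
// `parse_quoted_line` is a hypothetical helper name.
std::vector<std::string> parse_quoted_line(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (start <= line.size()) {
        std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) comma = line.size();  // last field
        std::string piece = line.substr(start, comma - start);
        // Strip one leading and one trailing quote if both are present.
        if (piece.size() >= 2 && piece.front() == '"' && piece.back() == '"') {
            piece = piece.substr(1, piece.size() - 2);
        }
        fields.push_back(piece);
        start = comma + 1;
    }
    return fields;
}
```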
You can manage to get rid of the quotation marks by passing in 1 more than the start and 1 less than the length of the thing, you can also use

If you have to use strtok then this code snippet should give enough to modify your program to parse your data:
#include <cstdio>
#include <cstring>
int main ()
{
char str[] ="\"thing1\",\"thing2\",\"thing3\",\"thing4\",\"thing5\"";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str,"\",");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, ",\"");
}
return 0;
}
If you do not have to use strtok then you should use std::string as others have advised. Using std::string and std::istringstream:
#include <string>
#include <sstream>
#include <vector>
#include <iostream>
int main ()
{
std::string str2( "\"thing1\",\"thing2\",\"thing3\",\"thing4\",\"thing5\"" ) ;
std::istringstream is(str2);
std::string part;
while (getline(is, part, ','))
std::cout << part.substr(1,part.length()-2) << std::endl;
return 0;
}

For starters, don't use strtok if you can avoid it (and you easily can here - and you can even avoid using the find series of functions as well).
If you want to read in the whole line and then parse it:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>
// defines a new ctype that treats commas as whitespace
struct csv_reader : std::ctype<char>
{
    csv_reader() : std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
        rc['\n'] = std::ctype_base::space;
        rc[','] = std::ctype_base::space;
        return &rc[0];
    }
};
int main()
{
    std::ifstream fin("yourFile.txt");
    std::string line;
    std::vector<std::vector<std::string> > values;
    while (std::getline(fin, line))
    {
        std::istringstream iss(line);
        // std::locale takes ownership of the facet, so it has to be allocated with new
        iss.imbue(std::locale(std::locale(), new csv_reader));
        std::vector<std::string> vec;
        std::copy(std::istream_iterator<std::string>(iss), std::istream_iterator<std::string>(), std::back_inserter(vec));
        values.push_back(vec);
    }
    // values now contains a vector for each line that has the strings split by their commas
    fin.close();
    return 0;
}
That answers your first question. For your second, you can skip all the quotation marks by adding them to the rc mask (also treating them as whitespace) or you can strip them out afterwards (either directly or by using a transform):
std::transform(vec.begin(), vec.end(), vec.begin(), [](std::string s)
{
    // erase-remove idiom: drop every double quote, then return the cleaned copy
    s.erase(std::remove(s.begin(), s.end(), '"'), s.end());
    return s;
});
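For the first option, adding the quotation mark to the mask, a sketch could look like this. The names quoted_csv_ctype and split_quoted_csv are invented for the example. Because ordinary spaces are not marked as whitespace in this table, a field such as "Book List" stays in one piece, but empty fields ("") disappear entirely.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// A ctype facet whose only "whitespace" characters are newline, comma, and
// double quote. `quoted_csv_ctype` is a hypothetical name for this sketch.
struct quoted_csv_ctype : std::ctype<char> {
    quoted_csv_ctype() : std::ctype<char>(get_table()) {}
    static const std::ctype_base::mask* get_table() {
        static std::vector<std::ctype_base::mask> table(table_size, std::ctype_base::mask());
        table['\n'] = std::ctype_base::space;
        table[','] = std::ctype_base::space;
        table['"'] = std::ctype_base::space;
        return &table[0];
    }
};

// Split one line into fields, letting operator>> skip commas and quotes.
std::vector<std::string> split_quoted_csv(const std::string& line) {
    std::istringstream iss(line);
    // The locale takes ownership of the facet allocated with new.
    iss.imbue(std::locale(std::locale(), new quoted_csv_ctype));
    std::vector<std::string> fields;
    std::string field;
    while (iss >> field) {
        fields.push_back(field);
    }
    return fields;
}
```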

How to take formatted input from ifstream

I have a text file with a set of names formatted in the following way:
"MARY","PATRICIA","LINDA","BARBARA","ELIZABETH"
and so on. I want to open the file using ifstream and read the names into a string array (without quotes, commas). I somehow managed to do it by checking the input stream character by character. Is there an easier way to take this formatted input?
EDIT:
I heard that you can use something like
fscanf (f, "\"%[a-zA-Z]\",", str);
in C, but is there such a method for ifstream?

That input should be parsable with std::getline or std::regex_token_iterator (though the latter is shooting sparrows with artillery).
Examples:
Regex
Quick and dirty, yet heavyweight solution (using boost so most compilers eat this)
#include <boost/regex.hpp>
#include <iostream>
#include <string>
int main() {
const std::string s = "\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\",\"ELIZABETH\"";
boost::regex re("\"(.*?)\"");
for (boost::sregex_token_iterator it(s.begin(), s.end(), re, 1), end;
it != end; ++it)
{
std::cout << *it << std::endl;
}
}
Output:
MARY
PATRICIA
LINDA
BARBARA
ELIZABETH
Alternatively, you can use
boost::regex re(",");
for (boost::sregex_token_iterator it(s.begin(), s.end(), re, -1), end;
to let it split along commas (note also the -1) or other regexes.
getline
getline solution (whitespace allowed)
#include <sstream>
#include <iostream>
#include <string>
int main() {
std::stringstream ss;
ss.str ("\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\",\"ELIZABETH\"");
std::string curr;
while (std::getline (ss, curr, ',')) {
size_t from = 1 + curr.find_first_of ('"'),
to = curr.find_last_of ('"');
std::cout << curr.substr (from, to-from) << std::endl;
}
}
Output is the same.
getline
getline solution (whitespace not allowed)
The loop becomes almost trivial:
std::string curr;
while (std::getline (ss, curr, ',')) {
std::cout << curr.substr (1, curr.length()-2) << std::endl;
}
homebrew solution
Least wasteful w.r.t. performance (especially when you wouldn't store those strings, but iterators or indices instead)
#include <iostream>
#include <string>
int main() {
const std::string str ("\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\",\"ELIZABETH\"");
size_t i = 0;
while (i != std::string::npos) {
size_t begin = str.find ('"', i) + 1, // one behind initial '"'
end = str.find ('"', begin),
comma = str.find (',', end);
i = comma;
std::cout << str.substr(begin, end-begin) << std::endl;
}
}

As far as I know, there is no tokenizer in the STL. But if you are willing to use Boost, there's a very good tokenizer class there. Other than that, going character by character is your best C++ way of addressing it (unless you are willing to go the C route and use strtok on your raw char * strings).

A simple tokenizer should do the trick; no need for something heavy-weight like regular expressions. C++ doesn't have a built-in one, but it's easy enough to write. Here's one which I myself stole off the internet so long ago I don't even remember who wrote it, so apologies for the blatant plagiarism:
#include <vector>
#include <string>
std::vector<std::string>
tokenize(const std::string & str, const std::string & delimiters)
{
std::vector<std::string> tokens;
// Skip delimiters at beginning.
std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find the end of the first token (the next delimiter).
std::string::size_type pos = str.find_first_of(delimiters, lastPos);
while (std::string::npos != pos || std::string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find the end of the next token (the next delimiter).
pos = str.find_first_of(delimiters, lastPos);
}
return tokens;
}
Usage: std::vector<std::string> words = tokenize(line, ",");

Actually, because I was interested, I worked out how to do this using Boost.Spirit.Qi:
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace boost::spirit::qi;
int main() {
// our test-string
std::string data("\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\"");
// this is where we will store the names
std::vector<std::string> names;
// parse the string
phrase_parse(data.begin(), data.end(),
( lexeme['"' >> +(char_ - '"') >> '"'] % ',' ),
space, names);
// print what we have parsed
std::copy(names.begin(), names.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
To check if an error occurred during parsing, simply store the iterators over the string in variables, and compare them afterwards. If they are equal, the whole string was matched, if not, the begin-iterator will point to the error site.

Splitting strings in C++ [duplicate]

This question already has answers here:
How do I iterate over the words of a string?
(84 answers)
Closed 4 years ago.
How do you split a string into tokens in C++?

This works nicely for me :). It puts the results in elems; delim can be any char.
#include <sstream>
#include <string>
#include <vector>
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
std::stringstream ss(s);
std::string item;
while(std::getline(ss, item, delim)) {
elems.push_back(item);
}
return elems;
}
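If you would rather get the tokens back by value than pass in a vector, the same loop can be wrapped as in the sketch below. The name split_copy is made up here to avoid clashing with the function above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Same getline loop as above, but returning the tokens by value.
// `split_copy` is a hypothetical name chosen for this sketch.
std::vector<std::string> split_copy(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
```

Note how std::getline treats edge cases: two delimiters in a row yield an empty token between them, while a delimiter at the very end of the input does not produce a trailing empty token.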

With this Mingw distro that includes Boost:
#include <iostream>
#include <string>
#include <vector>
#include <iterator>
#include <ostream>
#include <algorithm>
#include <boost/algorithm/string.hpp>
using namespace std;
using namespace boost;
int main() {
vector<string> v;
split(v, "1=2&3=4&5=6", is_any_of("=&"));
copy(v.begin(), v.end(), ostream_iterator<string>(cout, "\n"));
}

You can use the C function strtok:
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
The Boost Tokenizer will also do the job:
#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>
int main(){
using namespace std;
using namespace boost;
string s = "This is, a test";
tokenizer<> tok(s);
for(tokenizer<>::iterator beg=tok.begin(); beg!=tok.end();++beg){
cout << *beg << "\n";
}
}

Try using stringstream:
std::string line("A line of tokens");
std::stringstream lineStream(line);
std::string token;
while(lineStream >> token)
{
}
Check out my answer to your last question:
C++ Reading file Tokens

See also boost::split from String Algo library
string str1("hello abc-*-ABC-*-aBc goodbye");
vector<string> tokens;
boost::split(tokens, str1, boost::is_any_of("-*"), boost::token_compress_on);
// tokens == { "hello abc","ABC","aBc goodbye" }

It depends on how complex the token delimiter is and whether there is more than one. For easy problems, just use std::istringstream and std::getline. For more complex tasks, or if you want to iterate over the tokens in an STL-compliant way, use Boost's Tokenizer. Another possibility (although messier than either of these two) is to set up a while loop that calls std::string::find and uses the position of the last token found as the starting point when searching for the next. But this is probably the most bug-prone of the three options.
