Splitting a string - c++

I have this code to split a string. For some reason, it just sits there doing nothing. I am not sure what the problem is. By the way, delim = ' ' here.
vector<string> split( const string &str, const char &delim )
{
typedef string::const_iterator iter;
iter beg = str.begin();
vector<string> tokens;
while(beg != str.end())
{
iter temp = find(beg, str.end(), delim);
if(beg != str.end())
tokens.push_back(string(beg, temp));
beg = temp;
}
return tokens;
}

Here is another nice and short Boost-based version that uses a whole string as delimiter:
std::vector<std::string> result;
boost::iter_split(result, str, boost::first_finder(delim));
Or case-insensitive:
std::vector<std::string> result;
boost::iter_split(result, str,
boost::first_finder(delim, boost::is_iequal()));

I could debug it for you, I guess, but that won't help you in the long run. Here's what you do.
After every line, put a printf() or cout statement dumping the changed variables to standard output. Then run your code, passing a simple set of parameters to it:
vector<string> x = split ("Hello there, Bob.", ' ');
Then, examine the output to see why your implementation isn't working. You'll probably have to break out of the code since, if it's just sitting there, you've probably got yourself one of those new-fangled infinite loops.
Give a man a fish and he'll eat for a day, teach a man to fish, he'll never be hungry again.
Or the Terry Pratchett version:
Give a man some fire and he'll be warm for a day, set a man on fire, he'll be warm for the rest of his life.
Update:
Since you've stated that you've actually done what I suggested, here's what I found out from doing it. It's evident that when you set beg to temp at the end of the while loop, it's pointing at the space. That was discovered by printing the beg string at the top of the while loop - it never changed after the first word was extracted.
Then, when you do the next find, it finds that exact same space rather than first skipping spaces then calling find properly. You need to skip the spaces after each find, making sure you don't iterate beyond the end of the string.
This is my solution. Use it as you wish.
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
vector<string> split( const string &str, const char &delim ) {
typedef string::const_iterator iter;
iter beg = str.begin();
vector<string> tokens;
while(beg != str.end()) {
//cout << ":" << beg._Myptr << ":" << endl;
iter temp = find(beg, str.end(), delim);
if(beg != str.end())
tokens.push_back(string(beg, temp));
beg = temp;
while ((beg != str.end()) && (*beg == delim))
beg++;
}
return tokens;
}
int main () {
vector<string> x = split ("Hello, my name is Bob. ", ' ');
return 0;
}
Without that space-skipping code at the end of the while loop, the output was:
:Hello, my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
and so on, ad infinitum. With the skipping code, you get:
:Hello, my name is Bob. :
:my name is Bob. :
:name is Bob. :
:is Bob. :
:Bob. :

I've got to love Boost, as it's providing a handy solution to this one as well:
std::vector<std::string> Split(const std::string &s, const std::string &d)
{
std::vector<std::string> v;
for (boost::split_iterator<std::string::const_iterator> i = boost::make_split_iterator(s, boost::first_finder(d, boost::is_iequal()));
i != boost::split_iterator<std::string::const_iterator>();
++i) {
v.push_back(boost::copy_range<std::string>(*i));
}
return v;
}

There is a problem in your while loop in that if the delimiter is found then temp will point to the first delimiter after the first find call.
At the end of the while loop you set beg to the value of temp.
Now beg also points to the first delimiter.
When find is next called it will return the current value of beg again as it does point to a delimiter.
temp hasn't moved on from its previous value, so you are in an infinite loop.
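The stall described above can be reproduced in isolation. Here is a minimal sketch, assuming a hypothetical helper next_delim_offset (not part of the question's code) that reports where std::find lands for a given starting offset:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: offset of the first `delim` at or after `start`,
// or str.size() if there is none. Assumes start <= str.size().
// std::find examines the element at `start` first, so it returns at once
// when that element is already the delimiter.
std::size_t next_delim_offset(const std::string& str, std::size_t start, char delim) {
    std::string::const_iterator from =
        str.begin() + static_cast<std::string::difference_type>(start);
    std::string::const_iterator it = std::find(from, str.end(), delim);
    return static_cast<std::size_t>(it - str.begin());
}
```

Searching "Hello there" from offset 0 gives 5 (the space). Searching again from offset 5 gives 5 once more, which is the loop making no progress. Starting one past the delimiter, at offset 6, is what lets the search move on.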

Maybe this one:
std::vector<std::string> &mysplit(const std::string &s, char delim, std::vector<std::string> &elems) {
std::stringstream ss(s);
std::string item;
while(std::getline(ss, item, delim)) {
elems.push_back(item);
}
return elems;
}

vector<string> split( const string &str, const char &delim )
{
typedef string::const_iterator iter;
iter beg = str.begin();
vector<string> tokens;
while(beg != str.end())
{
iter temp = find(beg, str.end(), delim);
if(beg != str.end())
tokens.push_back(string(beg, temp));
if(temp != str.end())
temp++;
beg = temp;
}
return tokens;
}

find() will return the position X of the next delimiter. When you then assign this to beg and go into the next iteration, it will start searching at position X again - and again - and again ... i.e. you are stuck in an endless loop.
Try this code:
vector<string> split( const string &str, const char &delim )
{
typedef string::const_iterator iter;
vector<string> tokens;
iter pos = str.begin(), last = str.begin();
while(pos != str.end()) {
last = pos;
pos = find(pos, str.end(), delim);
if (pos != str.end()) {
string token = string(last, pos);
if (token.length() > 0)
tokens.push_back(token);
last = ++pos;
}
}
string lastToken = string(last, pos);
if (lastToken.length() > 0)
tokens.push_back(lastToken);
return tokens;
}
This has the added benefit that it will include the last token in the list (e.g. when splitting on space, the string "a b c" will now return tokens a, b and c instead of only a and b) and that multiple delims will not lead to empty tokens.

You don't have to reinvent the wheel, boost provides a string splitting function for you.
Example code:
string stringtobesplit = "AA/BB-CC";
vector<string> tokens;
boost::split(tokens, stringtobesplit, boost::is_any_of("/-"));
// tokens now holds 3 items: AA BB CC

The simplest way to debug this code is to print every position beg takes. If beg doesn't advance, that's your problem.
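One way to do that without relying on compiler-specific iterator internals is to print the offset of beg from the start of the string. The sketch below uses a hypothetical tracing helper (the name and the max_steps cap are my own additions) that replays the question's update step and stops after a fixed number of iterations, so that a stuck loop still terminates:

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical tracing helper: replays the question's loop (beg = temp, with
// no step past the delimiter), records the offset of beg at the top of each
// iteration, and gives up after max_steps iterations.
std::vector<std::size_t> trace_beg_offsets(const std::string& str, char delim,
                                           std::size_t max_steps) {
    std::vector<std::size_t> offsets;
    std::string::const_iterator beg = str.begin();
    while (beg != str.end() && offsets.size() < max_steps) {
        const std::size_t off = static_cast<std::size_t>(beg - str.begin());
        std::cout << "beg at offset " << off << '\n';
        offsets.push_back(off);
        beg = std::find(beg, str.end(), delim);  // same update as the question
    }
    return offsets;
}
```

For the input "a b" with a cap of 4, the recorded offsets are 0, 1, 1, 1: beg reaches the space and never leaves it.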

Apart from the beg needing to be incremented with the size of the delimiter, one special case is missed: the case where no delimiters are in the string.

Related

How to break up a string into a vector fast?

I am processing CSV and using the following code to process a single line.
std::vector<std::string> string_to_vector(const std::string& s, const char delimiter, const char escape) {
std::stringstream sstr{s};
std::vector<std::string> result;
while (sstr.good()) {
std::string substr;
getline(sstr, substr, delimiter);
while (substr.back() == escape) {
std::string tmp;
getline(sstr, tmp, delimiter);
substr += "," + tmp;
}
result.emplace_back(substr);
}
return result;
}
What it does: Function breaks up string s based on delimiter. If the delimiter is escaped with escape the delimiter will be ignored.
This code works but is super slow. How can I speed it up?
Do you know any existing csv processing implementation that does exactly this and which I could use?
The fastest way to do something is to not do it at all.
If you can ensure that your source string s will outlive the use of the returned vector, you could replace your std::vector<std::string> with std::vector<char*> which would point to the beginning of each substring. You then replace your identified delimiters with zeroes.
[EDIT] I have not moved up to C++17, so no string_view for me :)
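For readers who can use C++17, the same non-owning idea might look roughly like the sketch below. The name split_views is hypothetical, and the escape handling is my reading of the question: a delimiter preceded by the escape character is not a split point, and the escape character stays in the field.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Sketch only (requires C++17): returns views into `s`, so the underlying
// string must outlive the result. Nothing is copied or modified.
std::vector<std::string_view> split_views(std::string_view s, char delimiter, char escape) {
    std::vector<std::string_view> fields;
    std::size_t prev = 0;  // start of the current field
    std::size_t from = 0;  // where the next search begins
    std::size_t pos;
    while ((pos = s.find(delimiter, from)) != std::string_view::npos) {
        if (pos == 0 || s[pos - 1] != escape) {
            fields.push_back(s.substr(prev, pos - prev));
            prev = pos + 1;
        }
        from = pos + 1;
    }
    fields.push_back(s.substr(prev));  // last field, possibly empty
    return fields;
}
```

Calling this on a temporary std::string would leave the views dangling, so the source line should be kept in a named variable for as long as the fields are in use.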
NOTE: typical CSV is different from what you imply; it doesn't use an escape character for the comma, but surrounds entries that contain a comma with double quotes. But I assume you know your data.
Implementation:
#include <iostream>
#include <vector>
#include <string>
std::vector<char*> string_to_vector(std::string& s,
const char delimiter, const char escape)
{
size_t prev(0), pos(0), from(0);
std::vector<char*> v;
while ((pos = s.find(delimiter, from)) != s.npos)
{
if (pos == 0 || s[pos - 1] != escape)
{
s[pos] = 0;
v.push_back(&s[prev]);
prev = pos + 1;
}
from = pos + 1;
}
v.push_back(&s[prev]);
return v;
}
int main() {
std::string test("this,is,a\\,test");
std::vector<char*> v = string_to_vector(test, ',', '\\');
for (auto& s : v)
std::cout << s << " ";
}
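If the data follows the double-quote convention mentioned in the note above (the one described in RFC 4180) instead of an escape character, a single-pass splitter could be sketched as follows. The function name is hypothetical, and the sketch assumes that a doubled quote inside a quoted field stands for one literal quote and that a line contains no embedded line breaks.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of a quote-aware splitter for a single CSV line (hypothetical name).
// Delimiters inside double-quoted fields are kept as ordinary characters.
std::vector<std::string> split_csv_line(const std::string& line, char delimiter = ',') {
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (in_quotes) {
            if (c == '"' && i + 1 < line.size() && line[i + 1] == '"') {
                field += '"';  // doubled quote: one literal quote
                ++i;
            } else if (c == '"') {
                in_quotes = false;  // closing quote
            } else {
                field += c;
            }
        } else if (c == '"') {
            in_quotes = true;  // opening quote
        } else if (c == delimiter) {
            fields.push_back(field);
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);  // last field, possibly empty
    return fields;
}
```

This version copies each field, so it trades some speed for handling quoted commas; whether that matters depends on the data.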

Remove char if in front of word in vector of string

I have a vector of strings with different chars in front of them. For example:
"Hello
(Hello
I want to remove the first occurrence of the char. So if there is a " or ( before the word, I want it gone. My code so far is this.
void wash(std::vector<std::string> & data)
{
std::string b_chars = "\"'("; //Before_chars
std::string a_chars = "!?;,:.\"')"; //after_chars
data.erase(std::remove_if(data.begin(), data.end(), [&b_chars](const char& c) {
return data.find_first_of(b_chars) != std::string::npos;
}), data.end());
}
Your condition is wrong - you should determine whether c is one of the offending characters, which is b_chars.find(c) != std::string::npos.
To iterate over the entire vector, you could go:
std::for_each(data.begin(),
data.end(),
[&b_chars](std::string& str)
{
str.erase(std::remove_if(str.begin(),
str.end(),
[&b_chars](const char& c)
{return b_chars.find(c) != std::string::npos;}),
str.end());
});
But it makes sense to have a separate string-washing function and not limit yourself to vectors of strings (I didn't read your code properly because this is a more useful building block to start with):
void wash_string(std::string & s)
{
static const std::string b_chars = "\"'("; //Before_chars
static const std::string a_chars = "!?;,:.\"')"; //after_chars
s.erase(std::remove_if(s.begin(),
s.end(),
[](const char& c) // b_chars is static, so it is used without being captured
{return b_chars.find(c) != std::string::npos;}),
s.end());
}
void wash(std::vector<std::string> & data)
{
std::for_each(data.begin(), data.end(), wash_string);
}
There are various problems with your code:
In your code you don't remove chars from a string but from a vector of strings. You should iterate over your vector and do the removal on each string.
As in molbdnilo's answer, you need to change the condition of your lambda to look for the character inside the string of offending chars.
void wash(std::vector<std::string> & data)
{
std::string b_chars = "\"'("; //Before_chars
std::string a_chars = "!?;,:.\"')"; //after_chars
for (auto& str : data) {
str.erase(std::remove_if(str.begin(), str.end(),
[&](const char &c) { return b_chars.find_first_of(c) != std::string::npos; }),
str.end());
}
}
If you want to remove these chars only before the first occurrence of a 'normal char':
for (std::vector<std::string>::iterator vt_it = data.begin(); vt_it<data.end(); ++vt_it)
{
std::string::iterator str_it = (*vt_it).begin();
while (str_it != (*vt_it).end())
{
if ((b_chars).find((*str_it)) == std::string::npos)
break;
str_it++;
}
(*vt_it).erase ((*vt_it).begin(), str_it);
}
But if you want to remove all those chars:
for (std::vector<std::string>::iterator vt_it = data.begin(); vt_it<data.end(); ++vt_it)
{
(*vt_it).erase(
std::remove_if(
(*vt_it).begin(),
(*vt_it).end(),
[&b_chars](const char& c) {return b_chars.find(c) != std::string::npos;}),
(*vt_it).end()
);
}
Note: you didn't say anything about what you want to do with a_chars. I tested all the code in this post, including mine.
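If a_chars is meant to be stripped from the end of each word (that is only my guess at the intent, since the question doesn't say), both ends can be handled with find_first_not_of and find_last_not_of. A minimal sketch with a hypothetical helper name:

```cpp
#include <string>

// Hypothetical helper: strips any run of `before` characters from the front
// of the word and any run of `after` characters from the back. Characters in
// the middle of the word (e.g. the apostrophe in "it's") are left alone.
std::string strip_word(const std::string& word,
                       const std::string& before = "\"'(",
                       const std::string& after = "!?;,:.\"')") {
    const std::string::size_type first = word.find_first_not_of(before);
    if (first == std::string::npos) {
        return std::string();  // nothing but leading punctuation
    }
    const std::string::size_type last = word.find_last_not_of(after);
    if (last == std::string::npos || last < first) {
        return std::string();  // nothing left between the two runs
    }
    return word.substr(first, last - first + 1);
}
```

Applying it to the whole vector is then a plain loop: for each str in data, assign str = strip_word(str).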

Getting the words from a sentence and storing them in a vector of strings

Alright, guys ...
Here's my set that has all the letters. I'm defining a word as consisting of consecutive letters from the set.
const char LETTERS_ARR[] = {"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"};
const std::set<char> LETTERS_SET(LETTERS_ARR, LETTERS_ARR + sizeof(LETTERS_ARR)/sizeof(char));
I was hoping that this function would take in a string representing a sentence and return a vector of strings that are the individual words in the sentence.
std::vector<std::string> get_sntnc_wrds(std::string S) {
std::vector<std::string> retvec;
std::string::iterator it = S.begin();
while (it != S.end()) {
if (LETTERS_SET.count(*it) == 1) {
std::string str(1,*it);
int k(0);
while (((it+k+1) != S.end()) && (LETTERS_SET.count(*(it+k+1) == 1))) {
str.push_back(*(it + (++k)));
}
retvec.push_back(str);
it += k;
}
else {
++it;
}
}
return retvec;
}
For instance, the following call should return a vector of the strings "Yo", "dawg", etc.
std::string mystring("Yo, dawg, I heard you life functions, so we put a function inside your function so you can derive while you derive.");
std::vector<std::string> mystringvec = get_sntnc_wrds(mystring);
But everything isn't going as planned. I tried running my code and it was putting the entire sentence into the first and only element of the vector. My function is very messy code and perhaps you can help me come up with a simpler version. I don't expect you to be able to trace my thought process in my pitiful attempt at writing that function.
Try this instead:
#include <vector>
#include <cctype>
#include <string>
#include <algorithm>
using std::find_if;
using std::string;
using std::vector;
// true if the argument is whitespace, false otherwise
bool space(char c)
{
return isspace(c);
}
// false if the argument is whitespace, true otherwise
bool not_space(char c)
{
return !isspace(c);
}
vector<string> split(const string& str)
{
typedef string::const_iterator iter;
vector<string> ret;
iter i = str.begin();
while (i != str.end())
{
// ignore leading blanks
i = find_if(i, str.end(), not_space);
// find end of next word
iter j = find_if(i, str.end(), space);
// copy the characters in [i, j)
if (i != str.end())
ret.push_back(string(i, j));
i = j;
}
return ret;
}
The split function will return a vector of strings, each element containing one word.
This code is taken from the Accelerated C++ book, so it's not mine, but it works. There are other superb examples of using containers and algorithms for solving every-day problems in this book. I could even get a one-liner to show the contents of a file at the output console. Highly recommended.
It's just a bracketing issue. My advice is to (almost) never put in more brackets than are necessary; it only confuses things.
while (it+k+1 != S.end() && LETTERS_SET.count(*(it+k+1)) == 1) {
Your code compares the character with 1 not the return value of count.
Also, although count does return an integer, in this context I would simplify further and treat the return value as a boolean:
while (it+k+1 != S.end() && LETTERS_SET.count(*(it+k+1))) {
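As for why the misplaced bracket swallowed the whole sentence instead of stopping after one letter: sizeof(LETTERS_ARR) counts the string literal's terminating '\0', so the set in the question contains '\0' as well. The comparison *(it+k+1) == 1 is false for ordinary text, false converts to the char '\0', and count('\0') returns 1, so the inner loop keeps going to the end of the string. A small sketch, where the two helper names are my own and not from the question:

```cpp
#include <set>

// Same construction as in the question: sizeof includes the terminating
// '\0', so the set ends up with 53 elements, not 52.
const char LETTERS_ARR[] = {"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"};
const std::set<char> LETTERS_SET(LETTERS_ARR, LETTERS_ARR + sizeof(LETTERS_ARR) / sizeof(char));

// The condition as written in the question: the comparison happens first,
// and its bool result is converted to char before count() sees it.
bool misplaced_paren(char c) {
    return LETTERS_SET.count(c == 1) != 0;
}

// The intended condition: look up the character, then compare the count.
bool intended(char c) {
    return LETTERS_SET.count(c) == 1;
}
```

With this set, misplaced_paren(',') is true while intended(',') is false. Building the set from sizeof(LETTERS_ARR) - 1 characters would keep '\0' out of it.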
You should use a string stream with std::copy, like so:
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <vector>
int main() {
std::string sentence = "And I feel fine...";
std::istringstream iss(sentence);
std::vector<std::string> split;
std::copy(std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>(),
std::back_inserter(split));
// This is to print the vector
for(auto iter = split.begin();
iter != split.end();
++iter)
{
std::cout << *iter << "\n";
}
}
I would use a simpler approach based on member functions of class std::string. For example:
const char LETTERS[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
std::string s( "This12 34is 56a78 test." );
std::vector<std::string> v;
for ( std::string::size_type first = s.find_first_of( LETTERS, 0 );
first != std::string::npos;
first = s.find_first_of( LETTERS, first ) )
{
std::string::size_type last = s.find_first_not_of( LETTERS, first );
v.push_back(
std::string( s, first, last == std::string::npos ? std::string::npos : last - first ) );
first = last;
}
for ( const std::string &s : v ) std::cout << s << ' ';
std::cout << std::endl;
You made two mistakes here, which I have corrected in the following code.
First, it should be
while (((it+k+1) != S.end()) && (LETTERS_SET.count(*(it+k+1)) == 1))
and, it should move to next by
it += (k+1);
and the code is
std::vector<std::string> get_sntnc_wrds(std::string S) {
std::vector<std::string> retvec;
std::string::iterator it = S.begin();
while (it != S.end()) {
if (LETTERS_SET.count(*it) == 1) {
std::string str(1,*it);
int k(0);
while (((it+k+1) != S.end()) && (LETTERS_SET.count(*(it+k+1)) == 1)) {
str.push_back(*(it + (++k)));
}
retvec.push_back(str);
it += (k+1);
}
else {
++it;
}
}
return retvec;
}
The output has been tested.

finding substring c++ [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
How to split a string?
Hi,
I have a string, say "1,0,1". How can I get the substrings separated by commas?
C++ doesn't have a built in function for doing exactly this. However, it can be implemented using either the std::string::find_first_of member function, or the non-member std::find.
Here's an example using the latter:
#include <string>
#include <vector>
#include <algorithm>
// given a string str, split it on every occurrence of the character delim
std::vector<std::string> tokenize(std::string str, char delim) {
// store the results in a vector of strings
std::vector<std::string> tokens;
std::string::iterator end = str.end();
std::string::iterator left = str.begin();
for (;;) {
// find the next occurrence of the delimiter
std::string::iterator right = std::find(left, end, delim);
// create a string from the end of the last token up to the delimiter we just found
tokens.push_back(std::string(left, right));
// if we reached the end of the string, exit the loop
if (right == end) { break; }
// otherwise, start the next iteration just past the delimiter we just found
left = right + 1;
}
return tokens;
}
// test program
int main() {
std::string str = "foo, bar, baz";
std::string str2 = "foo, bar, baz,";
std::string str3 = "foo";
std::string str4 = "";
std::string str5 = ",";
std::vector<std::string> tokens = tokenize(str, ',');
std::vector<std::string> tokens2 = tokenize(str2, ',');
std::vector<std::string> tokens3 = tokenize(str3, ',');
std::vector<std::string> tokens4 = tokenize(str4, ',');
std::vector<std::string> tokens5 = tokenize(str5, ',');
}
Of course there are a lot of border cases to handle, and this implementation might not do exactly what you want, but it should give you a starting point.
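For comparison, here is a sketch of the other route mentioned above, using the member function std::string::find_first_of with positions instead of iterators. The name tokenize_by_pos is hypothetical. It follows the same policy as the iterator version: empty tokens are kept.

```cpp
#include <string>
#include <vector>

// Sketch using std::string::find_first_of (hypothetical name). Empty tokens
// are kept, so the result always has one more entry than there are delimiters.
std::vector<std::string> tokenize_by_pos(const std::string& str, char delim) {
    std::vector<std::string> tokens;
    std::string::size_type left = 0;
    for (;;) {
        const std::string::size_type right = str.find_first_of(delim, left);
        if (right == std::string::npos) {
            tokens.push_back(str.substr(left));  // last token, possibly empty
            break;
        }
        tokens.push_back(str.substr(left, right - left));
        left = right + 1;  // continue just past the delimiter
    }
    return tokens;
}
```

The border cases come out as follows: "1,0,1" gives three tokens, "" gives one empty token, "," gives two empty tokens, and "foo, bar, baz," gives four tokens, the last of which is empty. Note that the leading spaces in " bar" and " baz" are preserved.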
Another way of doing this is by using strtok. This is an old C way, but it still applies to the problem.
#include <string.h> // strcpy, strtok
#include <string>
#include <vector>
char* token, line[512];
std::string tokenStr;
std::string lineStr = "0, 1, 2";
std::vector<std::string> commaSplit;
strcpy ( line, lineStr.c_str());
//Remove spaces and find the first instance of ','
token = strtok( line, " ," );
while(token != NULL)
{
//Copy the token to a string
tokenStr = token;
//Add the token to the vector
commaSplit.push_back(token);
//Find next instance of the ,
token = strtok(NULL, " ,");
}
Search google for an algorithm to explode or tokenize your string. It's trivial.
You can also check out the documentation and use available tools : http://www.cplusplus.com/reference/string/string/
A simple implementation could be :
void tokenize(const string & text, vector<string> & tokens, char delim)
{
size_t length = text.size();
string token = "";
for(size_t i=0;i<length;i++)
{
if(text[i] != delim)
{
token += text[i];
}
else
{
if(token.size() > 0)
{
tokens.push_back(token);
}
token = "";
}
}
tokens.push_back(token);
}

Is There A Built-In Way to Split Strings In C++?

well is there? by string i mean std::string
Here's a perl-style split function I use:
void split(const string& str, const string& delimiters , vector<string>& tokens)
{
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find the first delimiter after that.
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find the next delimiter.
pos = str.find_first_of(delimiters, lastPos);
}
}
There's no built-in way to split a string in C++, but boost provides the string algo library to do all sort of string manipulation, including string splitting.
Yup, stringstream.
std::istringstream oss(std::string("This is a test string"));
std::string word;
while(oss >> word) {
std::cout << "[" << word << "] ";
}
STL strings
You can use string iterators to do your dirty work.
std::string str = "hello world";
std::string::const_iterator pos = std::find(str.begin(), str.end(), ' '); // Split at ' '.
std::string left(str.begin(), pos);
std::string right(pos + 1, str.end());
// Echoes "hello|world".
std::cout << left << "|" << right << std::endl;
void split(string StringToSplit, string Separators)
{
size_t EndPart1 = StringToSplit.find_first_of(Separators);
string Part1 = StringToSplit.substr(0, EndPart1);
string Part2 = StringToSplit.substr(EndPart1 + 1);
}
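Both two-part snippets above assume that a separator is actually present. A sketch of a variant that returns its result and handles the not-found case explicitly is shown below; the name split_once and the choice to return the whole input as the first part are my own assumptions.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: splits at the first character found in `separators`.
// If none is present, the whole input is returned as the first part and the
// second part is empty, rather than letting npos + 1 wrap around to 0.
std::pair<std::string, std::string> split_once(const std::string& text,
                                               const std::string& separators) {
    const std::string::size_type pos = text.find_first_of(separators);
    if (pos == std::string::npos) {
        return std::make_pair(text, std::string());
    }
    return std::make_pair(text.substr(0, pos), text.substr(pos + 1));
}
```

For example, split_once("a=b=c", "=") yields "a" and "b=c", since only the first separator is used.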
The answer is no. You have to break them up using one of the library functions.
Something I use:
std::vector<std::string> parse(std::string l, char delim)
{
std::replace(l.begin(), l.end(), delim, ' ');
std::istringstream stm(l);
std::vector<std::string> tokens;
for (;;) {
std::string word;
if (!(stm >> word)) break;
tokens.push_back(word);
}
return tokens;
}
You can also take a look at the basic_streambuf<T>::underflow() method and write a filter.
What the heck... Here's my version...
Note: Splitting on ("XZaaaXZ", "XZ") will give you 3 strings. 2 of those strings will be empty, and won't be added to theStringVector if theIncludeEmptyStrings is false.
Delimiter is not any element in the set, but rather matches that exact string.
inline void
StringSplit( vector<string> * theStringVector, /* Altered/returned value */
const string & theString,
const string & theDelimiter,
bool theIncludeEmptyStrings = false )
{
UASSERT( theStringVector, !=, (vector<string> *) NULL );
UASSERT( theDelimiter.size(), >, 0 );
size_t start = 0, end = 0, length = 0;
while ( end != string::npos )
{
end = theString.find( theDelimiter, start );
// If at end, use length=maxLength. Else use length=end-start.
length = (end == string::npos) ? string::npos : end - start;
if ( theIncludeEmptyStrings
|| ( ( length > 0 ) /* At end, end == length == string::npos */
&& ( start < theString.size() ) ) )
theStringVector -> push_back( theString.substr( start, length ) );
// If at end, use start=maxSize. Else use start=end+delimiter.
start = ( ( end > (string::npos - theDelimiter.size()) )
? string::npos : end + theDelimiter.size() );
}
}
inline vector<string>
StringSplit( const string & theString,
const string & theDelimiter,
bool theIncludeEmptyStrings = false )
{
vector<string> v;
StringSplit( & v, theString, theDelimiter, theIncludeEmptyStrings );
return v;
}
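For readers who don't have the UASSERT macro used above, the same behaviour can be sketched with the standard library alone. This is not the author's code: the function name is hypothetical, and returning the input as a single token when the delimiter is empty is an assumption made here in place of the assertion.

```cpp
#include <string>
#include <vector>

// Sketch (hypothetical name): splits on the exact string `delim`, not on any
// one of its characters. An empty delimiter returns the input as one token.
std::vector<std::string> split_on_substring(const std::string& text,
                                            const std::string& delim,
                                            bool keep_empty = false) {
    std::vector<std::string> parts;
    if (delim.empty()) {
        parts.push_back(text);
        return parts;
    }
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = text.find(delim, start);
        const std::string part = (end == std::string::npos)
            ? text.substr(start)
            : text.substr(start, end - start);
        if (keep_empty || !part.empty()) {
            parts.push_back(part);
        }
        if (end == std::string::npos) {
            break;
        }
        start = end + delim.size();  // continue after the whole delimiter
    }
    return parts;
}
```

With keep_empty set to true, splitting "XZaaaXZ" on "XZ" gives the three strings "", "aaa" and "", matching the note above; with the default of false only "aaa" is returned.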
There is no common way of doing this.
I prefer boost::tokenizer; it's header-only and easy to use.
C strings
Simply insert a \0 where you wish to split. This is about as built-in as you can get with standard C functions.
This function splits on the first occurrence of a char separator, returning the second string.
char *split_string(char *str, char separator) {
char *second = strchr(str, separator);
if(second == NULL)
return NULL;
*second = '\0';
++second;
return second;
}
A fairly simple method would be to copy the contents of the std::string into a writable C-style character array (c_str() itself returns a pointer to const data, and strtok() modifies the buffer it scans), then use strtok() to tokenize it. Not quite as elegant as some of the other solutions listed here, but it's easy and works.
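A minimal sketch of that approach, with a hypothetical function name, might look like this. It copies the string into a std::vector<char> so that strtok() has a writable, null-terminated buffer of the right size:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Sketch (hypothetical name): strtok() writes '\0' into the buffer it scans,
// so it is given a private, writable copy rather than the result of c_str().
// Note that strtok() keeps hidden static state and is not reentrant.
std::vector<std::string> split_with_strtok(const std::string& text, const char* delimiters) {
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');  // strtok() needs a null-terminated buffer
    std::vector<std::string> tokens;
    for (char* token = std::strtok(&buffer[0], delimiters);
         token != NULL;
         token = std::strtok(NULL, delimiters)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Because strtok() treats a run of delimiters as one, empty tokens never appear: "a,,b" split on "," gives just "a" and "b".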