Is There A Built-In Way to Split Strings In C++?

Well, is there? By string I mean std::string.

Here's a perl-style split function I use:
void split(const string& str, const string& delimiters , vector<string>& tokens)
{
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find the first delimiter after the start of the token.
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find the next delimiter.
pos = str.find_first_of(delimiters, lastPos);
}
}

There's no built-in way to split a string in C++, but Boost provides the String Algorithms library for all sorts of string manipulation, including string splitting.

Yup, stringstream.
std::istringstream oss(std::string("This is a test string"));
std::string word;
while(oss >> word) {
std::cout << "[" << word << "] ";
}
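The loop above splits on whitespace only. As a minimal sketch of the same stream-based idea for a single delimiter character, std::getline can be given the delimiter as its third argument. The helper name split_on is hypothetical and not part of the answer above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on one delimiter character
// by reading fields from a string stream with std::getline.
std::vector<std::string> split_on(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::istringstream in(text);
    std::string part;
    // getline stops at each delimiter. Empty fields between adjacent
    // delimiters are kept; an empty field after a trailing delimiter is not.
    while (std::getline(in, part, delim)) {
        parts.push_back(part);
    }
    return parts;
}
```

Calling split_on("a,b,,c", ',') yields four fields, the third of which is empty, whereas the operator>> loop would have skipped empty fields.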

STL strings
You can use string iterators to do your dirty work.
std::string str = "hello world";
std::string::iterator pos = std::find(str.begin(), str.end(), ' '); // Split at ' '.
std::string left(str.begin(), pos);
std::string right(pos + 1, str.end());
// Echoes "hello|world".
std::cout << left << "|" << right << std::endl;

void split(string StringToSplit, string Separators)
{
size_t EndPart1 = StringToSplit.find_first_of(Separators);
string Part1 = StringToSplit.substr(0, EndPart1);
string Part2 = StringToSplit.substr(EndPart1 + 1);
}

The answer is no. You have to break them up using one of the library functions.
Something I use:
std::vector<std::string> parse(std::string l, char delim)
{
std::replace(l.begin(), l.end(), delim, ' ');
std::istringstream stm(l);
std::vector<std::string> tokens;
for (;;) {
std::string word;
if (!(stm >> word)) break;
tokens.push_back(word);
}
return tokens;
}
You can also take a look at the basic_streambuf<T>::underflow() method and write a filter.

What the heck... Here's my version...
Note: Splitting on ("XZaaaXZ", "XZ") will give you 3 strings. 2 of those strings will be empty, and won't be added to theStringVector if theIncludeEmptyStrings is false.
Delimiter is not any element in the set, but rather matches that exact string.
inline void
StringSplit( vector<string> * theStringVector, /* Altered/returned value */
const string & theString,
const string & theDelimiter,
bool theIncludeEmptyStrings = false )
{
UASSERT( theStringVector, !=, (vector<string> *) NULL );
UASSERT( theDelimiter.size(), >, 0 );
size_t start = 0, end = 0, length = 0;
while ( end != string::npos )
{
end = theString.find( theDelimiter, start );
// If at end, use length=maxLength. Else use length=end-start.
length = (end == string::npos) ? string::npos : end - start;
if ( theIncludeEmptyStrings
|| ( ( length > 0 ) /* At end, end == length == string::npos */
&& ( start < theString.size() ) ) )
theStringVector -> push_back( theString.substr( start, length ) );
// If at end, use start=maxSize. Else use start=end+delimiter.
start = ( ( end > (string::npos - theDelimiter.size()) )
? string::npos : end + theDelimiter.size() );
}
}
inline vector<string>
StringSplit( const string & theString,
const string & theDelimiter,
bool theIncludeEmptyStrings = false )
{
vector<string> v;
StringSplit( & v, theString, theDelimiter, theIncludeEmptyStrings );
return v;
}

There is no common way of doing this.
I prefer boost::tokenizer; it's header-only and easy to use.

C strings
Simply insert a \0 where you wish to split. This is about as built-in as you can get with standard C functions.
This function splits on the first occurrence of a char separator, returning the second string.
char *split_string(char *str, char separator) {
char *second = strchr(str, separator);
if(second == NULL)
return NULL;
*second = '\0';
++second;
return second;
}

A fairly simple method would be to copy the contents of the std::string into a writable C-style character array, then use strtok() to tokenize it. The copy is needed because c_str() returns a pointer to const data and strtok() modifies its input. Not quite as elegant as some of the other solutions listed here, but it's easy and it works.
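A minimal sketch of the strtok() approach, assuming the tokens should end up in a vector. The function name split_strtok is hypothetical. The string is copied into a writable buffer first because strtok() overwrites delimiters with '\0'.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenize `text` with std::strtok, treating every
// character in `delims` as a separator.
std::vector<std::string> split_strtok(const std::string& text, const char* delims)
{
    // strtok writes into its input, so work on a NUL-terminated copy.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(&buffer[0], delims);
         tok != NULL;
         tok = std::strtok(NULL, delims))
    {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Runs of delimiters are collapsed, so empty tokens are never produced. strtok() also keeps hidden static state, which makes it unsuitable for nested or multithreaded tokenizing.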

Related

Tokenize returns a vector that contains the delimiters - C++

I have a function that tokenizes a string and returns the tokens without the delimiters. But I want the delimiters returned as tokens too.
Desired output:
tokenize("<ab><>cd<", "<>")
should display: "<", "ab", ">", "<", ">", "cd", "<"
Here's my function:
vector<string> tokenize1(const string& s, const string& delim) {
vector<string> tokens;
string::size_type lastPos = s.find_first_not_of(delim, 0);
string::size_type pos = s.find_first_of(delim, lastPos);
while (string::npos != pos || string::npos != lastPos) {
tokens.push_back(s.substr(lastPos, pos - lastPos));
lastPos = s.find_first_not_of(delim, pos);
pos = s.find_first_of(delim, lastPos);
}
return tokens;
}

I'm not quite sure why you'd want to do this, but you've got almost all the code to do it already. Here's a minor modification that alternates between stuffing individual delimiter characters into the token vector (simpler if you're happy to return delimiter chunks, eg. "><>" instead of "<", ">", "<") and adding whole non-delimiter chunks to it.
std::vector<std::string> tokenize2(const std::string& s, const std::string& delim)
{
std::vector<std::string> tokens;
auto nextDelimiter = s.find_first_of(delim, 0);
auto nextNonDelimiter = s.find_first_not_of(delim, 0);
while (std::string::npos != nextDelimiter || std::string::npos != nextNonDelimiter)
{
if (nextNonDelimiter > nextDelimiter)
{
for (auto d = nextDelimiter; d < nextNonDelimiter && d < s.size(); d++)
tokens.push_back(s.substr(d, 1));
nextDelimiter = s.find_first_of(delim, nextNonDelimiter);
}
else
{
tokens.push_back(s.substr(nextNonDelimiter, nextDelimiter - nextNonDelimiter));
nextNonDelimiter = s.find_first_not_of(delim, nextDelimiter);
}
}
return tokens;
}
Note the use of auto (because we're living in the future now) and std:: (because using namespace std; is considered bad practise, for good reason).

String Tokenizer with multiple delimiters including delimiter without Boost

I need to create string parser in C++. I tried using
vector<string> Tokenize(const string& strInput, const string& strDelims)
{
vector<string> vS;
string strOne = strInput;
string delimiters = strDelims;
int startpos = 0;
int pos = strOne.find_first_of(delimiters, startpos);
while (string::npos != pos || string::npos != startpos)
{
if(strOne.substr(startpos, pos - startpos) != "")
vS.push_back(strOne.substr(startpos, pos - startpos));
// if delimiter is a new line (\n) then add new line
if(strOne.substr(pos, 1) == "\n")
vS.push_back("\\n");
// else if the delimiter is not a space
else if (strOne.substr(pos, 1) != " ")
vS.push_back(strOne.substr(pos, 1));
if( string::npos == strOne.find_first_not_of(delimiters, pos) )
startpos = strOne.find_first_not_of(delimiters, pos);
else
startpos = pos + 1;
pos = strOne.find_first_of(delimiters, startpos);
}
return vS;
}
This works for 2X+7cos(3Y)
(tokenizer("2X+7cos(3Y)","+-/^() \t");)
But gives a runtime error for 2X
I need non Boost solution.
I tried using C++ String Toolkit (StrTk) Tokenizer
std::vector<std::string> results;
strtk::split(delimiter, source,
strtk::range_to_type_back_inserter(results),
strtk::tokenize_options::include_all_delimiters);
return results;
but it doesn't give token as a separate string.
eg: if I give the input as 2X+3Y
output vector contains
2X+
3Y

What's probably happening is this is crashing when passed npos:
lastPos = str.find_first_not_of(delimiters, pos);
Just add breaks to your loop instead of relying on the while clause to break out of it.
if (pos == string::npos)
break;
lastPos = str.find_first_not_of(delimiters, pos);
if (lastPos == string::npos)
break;
pos = str.find_first_of(delimiters, lastPos);

Loop exit condition is broken:
while (string::npos != pos || string::npos != startpos)
Allows entry with, say pos = npos and startpos = 1.
So
strOne.substr(startpos, pos - startpos)
strOne.substr(1, npos - 1)
The count runs past the end of the string, but substr clamps it, so this call survives.
If pos = npos and startpos = 0,
strOne.substr(startpos, pos - startpos)
lives, but
strOne.substr(pos, 1) == "\n"
strOne.substr(npos, 1) == "\n"
dies. So does
strOne.substr(pos, 1) != " "
Sadly I'm out of time and can't solve this right now, but QuestionC's got the right idea. Better filtering. Something along the lines of:
if (string::npos != pos)
{
if (strOne.substr(pos, 1) == "\n") // can possibly simplify this with strOne[pos] == '\n'
vS.push_back("\\n");
// else if the delimiter is not a space
else if (strOne[pos] != ' ')
vS.push_back(strOne.substr(pos, 1));
}
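Putting the two suggestions above together, here is a minimal sketch of the tokenizer with the npos check done before any delimiter access. It assumes the same rules as the question's code: every delimiter except a space becomes its own token, and a newline is emitted as the two characters \n. The name TokenizeChecked is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical rewrite of the question's Tokenize: positions are kept as
// size_type (not int) and substr is never called with pos == npos.
std::vector<std::string> TokenizeChecked(const std::string& input,
                                         const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (start < input.size()) {
        std::string::size_type pos = input.find_first_of(delims, start);
        if (pos == std::string::npos) {
            // No more delimiters: the rest of the input is the last token.
            tokens.push_back(input.substr(start));
            break;
        }
        if (pos > start)
            tokens.push_back(input.substr(start, pos - start));
        if (input[pos] == '\n')
            tokens.push_back("\\n");
        else if (input[pos] != ' ')
            tokens.push_back(std::string(1, input[pos]));
        start = pos + 1;
    }
    return tokens;
}
```

With the delimiter set "+-/^() \t", the input 2X gives the single token 2X, and 2X+7cos(3Y) gives 2X, +, 7cos, (, 3Y, ).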

Would be great if you could share some info on your environment. Your program ran fine with an input value of 2X on my Fedora 20 using g++.

I created a little function that splits a string into substrings (which are stored in a vector) and it allows you to set which characters you want to treat as whitespace. Normal whitespace will still be treated as whitespace, so you don't have to define that. Actually, all it does is turns the character you defined as whitespace into actual whitespace (space char ' '). Then it runs that in a stream (stringstream) to separate the substrings and store them in a vector. This may not be what you need for this particular problem, but maybe it can give you some ideas.
// split a string into its whitespace-separated substrings and store
// each substring in a vector<string>. Whitespace can be defined in argument
// w as a string (e.g. ".;,?-'")
vector<string> split(const string& s, const string& w)
{
string temp{ s };
// go through each char in temp (or s)
for (char& ch : temp) {
// check if any characters in temp (s) are whitespace defined in w
for (char white : w) {
if (ch == white)
ch = ' '; // if so, replace them with a space char (' ')
}
}
vector<string> substrings;
stringstream ss{ temp };
for (string buffer; ss >> buffer;) {
substrings.push_back(buffer);
}
return substrings;
}

C++ function to replace in a string all occurrences of a given substring

I want a function that takes a string and replaces all occurrences of a given word with asterisks in place of its letters. I want to do this elegantly, like a real C++ programmer.
As an example,
int main()
{
std::string str = "crap this craping shit.";
censor_word("crap", str);
std::cout << str;
return 0;
}
should output
"**** this ****ing shit"
I need help coming up with an elegant way of filling in the following function:
void censor_word(const std::string& word, std::string& text)
{
...
}
I know the geniuses at Stack Overflow can probably come up with a 1-line solution.
My code looks yucky
void censor_word(const std::string& word, std::string& text)
{
int wordsize= word.size();
if (wordsize < text.size())
{
for (std::string::iterator it(text.begin()), endpos(text.size() - wordsize), int curpos = 0; it != endpos; ++it, ++curpos)
{
if (text.substr(curpos, wordsize) == word)
{
std::string repstr(wordsize, '*');
text.replace(curpos, wordsize, repstr);
}
}
}
}
Teach me how to do this the way that a C++ purist would do it.

for( auto pos = str.find( word ); pos != std::string::npos; pos = str.find( word ) )
{
str.replace( str.begin() + pos, str.begin() + pos + word.size(), word.size(),'*' );
}
We find the first appearance of the word we want replaced. We then replace it. We do this until there are no more appearances, as they have all been replaced.
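Dropped into the function signature from the question, the loop might look like the sketch below. Two details are additions rather than part of the answer: each search resumes just past the previous replacement, so a word that itself consists of asterisks cannot loop forever, and an empty word is rejected up front.

```cpp
#include <string>

// Sketch of censor_word using find/replace. Overwrites every occurrence
// of `word` in `text` with the same number of '*' characters.
void censor_word(const std::string& word, std::string& text)
{
    if (word.empty())
        return; // find("") matches at every position

    for (std::string::size_type pos = text.find(word);
         pos != std::string::npos;
         pos = text.find(word, pos + word.size()))
    {
        // replace(pos, len, count, ch): swap len chars for count copies of ch.
        text.replace(pos, word.size(), word.size(), '*');
    }
}
```

Applied to the question's example string with the word crap, this leaves "**** this ****ing shit." in the string.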

Extract substrings of a filename

In C/C++, how can I extract from c:\Blabla - dsf\blup\AAA - BBB\blabla.bmp the substrings AAA and BBB ?
i.e. extract the parts before and after - in the last folder of a filename.
Thanks in advance.
(PS: if possible, with no Framework .net or such things, in which I could easily get lost)

#include <iostream>
#include <sstream> // stringstream
#include <string>
using namespace std;
#include <windows.h>
#include <Shlwapi.h> // link with shlwapi.lib
int main()
{
char buffer_1[ ] = "c:\\Blabla - dsf\\blup\\AAA - BBB\\blabla.bmp";
char *lpStr1 = buffer_1;
// Remove the file name from the string
PathRemoveFileSpec(lpStr1);
string s(lpStr1);
// Find the last directory name
stringstream ss(s.substr(s.rfind('\\') + 1));
// Split the last directory name into tokens separated by '-'
while (getline(ss, s, '-'))
cout << s << endl;
}
Explanation in comments.
This doesn't trim the spaces around each token in the output; if you also want that, trim each token separately.

This can relatively easily be done with regular expressions:
std::regex if you have C++11; boost::regex if you don't:
static std::regex const regex( R"(.*\\(\w+)\s*-\s*(\w+)\\[^\\]*$)" );
std::smatch results;
if ( std::regex_match( path, results, regex ) ) {
std::string firstMatch = results[1];
std::string secondMatch = results[2];
// ...
}
Also, you definitely should have the functions split and
trim in your toolkit:
template <std::ctype_base::mask test>
class IsNot
{
std::locale ensureLifetime;
std::ctype<char> const* ctype; // Pointer to allow assignment
public:
IsNot( std::locale const& loc = std::locale() )
: ensureLifetime( loc )
, ctype( &std::use_facet<std::ctype<char>>( loc ) )
{
}
bool operator()( char ch ) const
{
return !ctype->is( test, ch );
}
};
typedef IsNot<std::ctype_base::space> IsNotSpace;
std::vector<std::string>
split( std::string const& original, char separator )
{
std::vector<std::string> results;
std::string::const_iterator current = original.begin();
std::string::const_iterator end = original.end();
std::string::const_iterator next = std::find( current, end, separator );
while ( next != end ) {
results.push_back( std::string( current, next ) );
current = next + 1;
next = std::find( current, end, separator );
}
results.push_back( std::string( current, next ) );
return results;
}
std::string
trim( std::string const& original )
{
std::string::const_iterator end
= std::find_if( original.rbegin(), original.rend(), IsNotSpace() ).base();
std::string::const_iterator begin
= std::find_if( original.begin(), end, IsNotSpace() );
return std::string( begin, end );
}
(These are just the ones you need here. You'll obviously want
the full complement of IsXxx and IsNotXxx predicates, a split
which can split according to a regular expression, a trim which
can be passed a predicate object specifying what is to be
trimmed, etc.)
Anyway, the application of split and trim should be obvious
to give you what you want.
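As a rough sketch of that application: split the path on the backslash, take the second-to-last component, split that on '-', and trim both halves. The helpers below are simplified stand-ins with hypothetical names (split_chars, trim_spaces, last_folder_parts), not the toolkit versions above, and they only treat the space character as whitespace.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for split: always returns at least one element.
std::vector<std::string> split_chars(const std::string& s, char sep)
{
    std::vector<std::string> out;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = s.find(sep, start)) != std::string::npos) {
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    out.push_back(s.substr(start));
    return out;
}

// Simplified stand-in for trim: strips leading and trailing spaces only.
std::string trim_spaces(const std::string& s)
{
    std::string::size_type first = s.find_first_not_of(' ');
    if (first == std::string::npos)
        return std::string();
    std::string::size_type last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Returns false if the path has no folder or the folder is not "X - Y".
bool last_folder_parts(const std::string& path,
                       std::string& first, std::string& second)
{
    std::vector<std::string> dirs = split_chars(path, '\\');
    if (dirs.size() < 2)
        return false;
    // The last element is the file name; the one before it is the folder.
    std::vector<std::string> parts = split_chars(dirs[dirs.size() - 2], '-');
    if (parts.size() != 2)
        return false;
    first = trim_spaces(parts[0]);
    second = trim_spaces(parts[1]);
    return true;
}
```

For the path in the question this fills first with AAA and second with BBB. A folder name containing more than one '-' is rejected rather than guessed at.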

This does all the work and validations in plain C:
int FindParts(const char* source, char** firstOut, char** secondOut)
{
const char* last = NULL;
const char* previous = NULL;
const char* middle = NULL;
const char* middle1 = NULL;
const char* middle2 = NULL;
char* first;
char* second;
last = strrchr(source, '\\');
if (!last || (last == source))
return -1;
--last;
if (last == source)
return -1;
previous = last;
for (; (previous != source) && (*previous != '\\'); --previous);
++previous;
{
middle = strchr(previous, '-');
if (!middle || (middle > last))
return -1;
middle1 = middle-1;
middle2 = middle+1;
}
// now skip spaces
for (; (previous != middle1) && (*previous == ' '); ++previous);
if (previous == middle1)
return -1;
for (; (middle1 != previous) && (*middle1 == ' '); --middle1);
if (middle1 == previous)
return -1;
for (; (middle2 != last) && (*middle2 == ' '); ++middle2);
if (middle2 == last)
return -1;
for (; (middle2 != last) && (*last == ' '); --last);
if (middle2 == last)
return -1;
first = (char*)malloc(middle1-previous+1 + 1);
second = (char*)malloc(last-middle2+1 + 1);
if (!first || !second)
{
free(first);
free(second);
return -1;
}
strncpy(first, previous, middle1-previous+1);
first[middle1-previous+1] = '\0';
strncpy(second, middle2, last-middle2+1);
second[last-middle2+1] = '\0';
*firstOut = first;
*secondOut = second;
return 1;
}

The regex solution of James Kanze (https://stackoverflow.com/a/16605408/1032277) is still the most generic and elegant, but here is a plain C++ solution (no Boost, no C++11):
inline void Trim(std::string& source)
{
size_t position = source.find_first_not_of(" ");
if (std::string::npos != position)
source = source.substr(position);
position = source.find_last_not_of(" ");
if (std::string::npos != position)
source = source.substr(0, position+1);
}
inline bool FindParts(const std::string& source, std::string& first, std::string& second)
{
size_t last = source.find_last_of('\\');
if ((std::string::npos == last) || !last)
return false;
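size_t previous = source.find_last_of('\\', last-1);
// If there is no earlier backslash, previous is npos and 1+previous wraps to 0,
// so the folder name is taken to start at the beginning of the string.
size_t middle = source.find_first_of('-',1+previous);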
if ((std::string::npos == middle) || (middle > last))
return false;
first = source.substr(1+previous, (middle-1)-(1+previous)+1);
second = source.substr(1+middle, (last-1)-(1+middle)+1);
Trim(first);
Trim(second);
return true;
}

Use std::string::rfind(char c, size_t pos = npos):
Find the last '\' by searching from the end with rfind (pos1).
Find the '\' before it with rfind, starting the search at pos1 - 1 (pos2).
Get the substring between pos2 and pos1 with substr; this is the last folder name.
Find the character '-' in it (pos3).
Extract the two substrings between pos2 and pos3, and between pos3 and pos1.
Remove the spaces around the substrings.
The resulting substrings will be AAA and BBB.
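The steps above can be sketched as follows. The function names strip_spaces and extract_parts are hypothetical, and the sketch assumes the folder name contains exactly one relevant '-' (the first one found is used).

```cpp
#include <string>

// Hypothetical helper: remove leading and trailing spaces.
std::string strip_spaces(const std::string& s)
{
    std::string::size_type first = s.find_first_not_of(' ');
    if (first == std::string::npos)
        return std::string();
    std::string::size_type last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Hypothetical implementation of the rfind steps. Returns false when the
// path has no folder part or the folder name has no '-'.
bool extract_parts(const std::string& path,
                   std::string& first, std::string& second)
{
    const std::string::size_type npos = std::string::npos;

    std::string::size_type pos1 = path.rfind('\\');            // last backslash
    if (pos1 == npos || pos1 == 0)
        return false;
    std::string::size_type pos2 = path.rfind('\\', pos1 - 1);  // the one before
    std::string::size_type begin = (pos2 == npos) ? 0 : pos2 + 1;

    // Last folder name, e.g. "AAA - BBB".
    std::string folder = path.substr(begin, pos1 - begin);

    std::string::size_type pos3 = folder.find('-');
    if (pos3 == npos)
        return false;

    first = strip_spaces(folder.substr(0, pos3));
    second = strip_spaces(folder.substr(pos3 + 1));
    return true;
}
```

Searching for '-' inside the extracted folder name, rather than in the whole path, keeps an earlier folder such as "Blabla - dsf" from being matched by mistake.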

Tokenize a string and include delimiters in C++

I'm tokenizing with the following, but I'm unsure how to include the delimiters with it.
void Tokenize(const string str, vector<string>& tokens, const string& delimiters)
{
int startpos = 0;
int pos = str.find_first_of(delimiters, startpos);
string strTemp;
while (string::npos != pos || string::npos != startpos)
{
strTemp = str.substr(startpos, pos - startpos);
tokens.push_back(strTemp.substr(0, strTemp.length()));
startpos = str.find_first_not_of(delimiters, pos);
pos = str.find_first_of(delimiters, startpos);
}
}

The C++ String Toolkit Library (StrTk) has the following solution:
std::string str = "abc,123 xyz";
std::vector<std::string> token_list;
strtk::split(";., ",
str,
strtk::range_to_type_back_inserter(token_list),
strtk::include_delimiters);
This should leave token_list with the following elements:
Token0 = "abc,"
Token1 = "123 "
Token2 = "xyz"
More examples can be found Here

I know this is a little sloppy, but this is what I ended up with. I did not want to use Boost since this is a school assignment and my instructor wanted me to use find_first_of to accomplish this.
Thanks for everyone's help.
vector<string> Tokenize(const string& strInput, const string& strDelims)
{
vector<string> vS;
string strOne = strInput;
string delimiters = strDelims;
int startpos = 0;
int pos = strOne.find_first_of(delimiters, startpos);
while (string::npos != pos || string::npos != startpos)
{
if(strOne.substr(startpos, pos - startpos) != "")
vS.push_back(strOne.substr(startpos, pos - startpos));
// if delimiter is a new line (\n) then add new line
if(strOne.substr(pos, 1) == "\n")
vS.push_back("\\n");
// else if the delimiter is not a space
else if (strOne.substr(pos, 1) != " ")
vS.push_back(strOne.substr(pos, 1));
if( string::npos == strOne.find_first_not_of(delimiters, pos) )
startpos = strOne.find_first_not_of(delimiters, pos);
else
startpos = pos + 1;
pos = strOne.find_first_of(delimiters, startpos);
}
return vS;
}

I can't really follow your code, could you post a working program?
Anyway, this is a simple tokenizer, without testing edge cases:
#include <iostream>
#include <string>
#include <vector>
using namespace std;
void tokenize(vector<string>& tokens, const string& text, const string& del)
{
string::size_type startpos = 0,
currentpos = text.find(del, startpos);
do
{
tokens.push_back(text.substr(startpos, currentpos-startpos+del.size()));
startpos = currentpos + del.size();
currentpos = text.find(del, startpos);
} while(currentpos != string::npos);
tokens.push_back(text.substr(startpos, currentpos-startpos+del.size()));
}
Example input, delimiter = $$:
Hello$$Stack$$Over$$$Flow$$$$!
Tokens:
Hello$$
Stack$$
Over$$
$Flow$$
$$
!
Note: I would never use a tokenizer I wrote without testing! please use boost::tokenizer!

If the delimiters are single characters and not strings, then you can use strtok.

It depends on whether you want the preceding delimiters, the following delimiters, or both, and what you want to do with strings at the beginning and end of the string that may not have delimiters before/after them.
I'm going to assume you want each word, with its preceding and following delimiters, but NOT any strings of delimiters by themselves (e.g. if there's a delimiter following the last string).
template <class iter>
void tokenize(std::string const &str, std::string const &delims, iter out) {
int pos = 0;
do {
int beg_word = str.find_first_not_of(delims, pos);
if (beg_word == std::string::npos)
break;
int end_word = str.find_first_of(delims, beg_word);
int beg_next_word = str.find_first_not_of(delims, end_word);
*out++ = std::string(str, pos, beg_next_word-pos);
pos = end_word;
} while (pos != std::string::npos);
}
For the moment, I've written it more like an STL algorithm, taking an iterator for its output instead of assuming it's always pushing onto a collection. Since it depends (for the moment) on the input being a string, it doesn't use iterators for the input.
