Tokenize a string and include delimiters in C++ - c++

I'm tokening with the following, but unsure how to include the delimiters with it.
void Tokenize(const string str, vector<string>& tokens, const string& delimiters)
{
int startpos = 0;
int pos = str.find_first_of(delimiters, startpos);
string strTemp;
while (string::npos != pos || string::npos != startpos)
{
strTemp = str.substr(startpos, pos - startpos);
tokens.push_back(strTemp.substr(0, strTemp.length()));
startpos = str.find_first_not_of(delimiters, pos);
pos = str.find_first_of(delimiters, startpos);
}
}

The C++ String Toolkit Library (StrTk) has the following solution:
std::string str = "abc,123 xyz";
std::vector<std::string> token_list;
strtk::split(";., ",
str,
strtk::range_to_type_back_inserter(token_list),
strtk::include_delimiters);
It should result with token_list have the following elements:
Token0 = "abc,"
Token1 = "123 "
Token2 = "xyz"
More examples can be found Here

I now this a little sloppy, but this is what I ended up with. I did not want to use boost since this is a school assignment and my instructor wanted me to use find_first_of to accomplish this.
Thanks for everyone's help.
vector<string> Tokenize(const string& strInput, const string& strDelims)
{
vector<string> vS;
string strOne = strInput;
string delimiters = strDelims;
int startpos = 0;
int pos = strOne.find_first_of(delimiters, startpos);
while (string::npos != pos || string::npos != startpos)
{
if(strOne.substr(startpos, pos - startpos) != "")
vS.push_back(strOne.substr(startpos, pos - startpos));
// if delimiter is a new line (\n) then addt new line
if(strOne.substr(pos, 1) == "\n")
vS.push_back("\\n");
// else if the delimiter is not a space
else if (strOne.substr(pos, 1) != " ")
vS.push_back(strOne.substr(pos, 1));
if( string::npos == strOne.find_first_not_of(delimiters, pos) )
startpos = strOne.find_first_not_of(delimiters, pos);
else
startpos = pos + 1;
pos = strOne.find_first_of(delimiters, startpos);
}
return vS;
}

I can't really follow your code, could you post a working program?
Anyway, this is a simple tokenizer, without testing edge cases:
#include <iostream>
#include <string>
#include <vector>
using namespace std;
void tokenize(vector<string>& tokens, const string& text, const string& del)
{
string::size_type startpos = 0,
currentpos = text.find(del, startpos);
do
{
tokens.push_back(text.substr(startpos, currentpos-startpos+del.size()));
startpos = currentpos + del.size();
currentpos = text.find(del, startpos);
} while(currentpos != string::npos);
tokens.push_back(text.substr(startpos, currentpos-startpos+del.size()));
}
Example input, delimiter = $$:
Hello$$Stack$$Over$$$Flow$$$$!
Tokens:
Hello$$
Stack$$
Over$$
$Flow$$
$$
!
Note: I would never use a tokenizer I wrote without testing! please use boost::tokenizer!

if the delimiters are characters and not strings, then you can use strtok.

It depends on whether you want the preceding delimiters, the following delimiters, or both, and what you want to do with strings at the beginning and end of the string that may not have delimiters before/after them.
I'm going to assume you want each word, with its preceding and following delimiters, but NOT any strings of delimiters by themselves (e.g. if there's a delimiter following the last string).
template <class iter>
void tokenize(std::string const &str, std::string const &delims, iter out) {
int pos = 0;
do {
int beg_word = str.find_first_not_of(delims, pos);
if (beg_word == std::string::npos)
break;
int end_word = str.find_first_of(delims, beg_word);
int beg_next_word = str.find_first_not_of(delims, end_word);
*out++ = std::string(str, pos, beg_next_word-pos);
pos = end_word;
} while (pos != std::string::npos);
}
For the moment, I've written it more like an STL algorithm, taking an iterator for its output instead of assuming it's always pushing onto a collection. Since it depends (for the moment) in the input being a string, it doesn't use iterators for the input.

Related

Replace a string in a vector of structs C++ [duplicate]

How do I replace part of a string with another string using the standard C++ libraries?
QString s("hello $name"); // Example using Qt.
s.replace("$name", "Somename");
There's a function to find a substring within a string (find), and a function to replace a particular range in a string with another string (replace), so you can combine those to get the effect you want:
bool replace(std::string& str, const std::string& from, const std::string& to) {
size_t start_pos = str.find(from);
if(start_pos == std::string::npos)
return false;
str.replace(start_pos, from.length(), to);
return true;
}
std::string string("hello $name");
replace(string, "$name", "Somename");
In response to a comment, I think replaceAll would probably look something like this:
void replaceAll(std::string& str, const std::string& from, const std::string& to) {
if(from.empty())
return;
size_t start_pos = 0;
while((start_pos = str.find(from, start_pos)) != std::string::npos) {
str.replace(start_pos, from.length(), to);
start_pos += to.length(); // In case 'to' contains 'from', like replacing 'x' with 'yx'
}
}
With C++11 you can use std::regex like so:
#include <regex>
...
std::string string("hello $name");
string = std::regex_replace(string, std::regex("\\$name"), "Somename");
The double backslash is required for escaping an escape character.
Using std::string::replace:
s.replace(s.find("$name"), sizeof("$name") - 1, "Somename");
To have the new string returned use this:
std::string ReplaceString(std::string subject, const std::string& search,
const std::string& replace) {
size_t pos = 0;
while ((pos = subject.find(search, pos)) != std::string::npos) {
subject.replace(pos, search.length(), replace);
pos += replace.length();
}
return subject;
}
If you need performance, here is an optimized function that modifies the input string, it does not create a copy of the string:
void ReplaceStringInPlace(std::string& subject, const std::string& search,
const std::string& replace) {
size_t pos = 0;
while ((pos = subject.find(search, pos)) != std::string::npos) {
subject.replace(pos, search.length(), replace);
pos += replace.length();
}
}
Tests:
std::string input = "abc abc def";
std::cout << "Input string: " << input << std::endl;
std::cout << "ReplaceString() return value: "
<< ReplaceString(input, "bc", "!!") << std::endl;
std::cout << "ReplaceString() input string not modified: "
<< input << std::endl;
ReplaceStringInPlace(input, "bc", "??");
std::cout << "ReplaceStringInPlace() input string modified: "
<< input << std::endl;
Output:
Input string: abc abc def
ReplaceString() return value: a!! a!! def
ReplaceString() input string not modified: abc abc def
ReplaceStringInPlace() input string modified: a?? a?? def
string.replace(string.find("%s"), string("%s").size(), "Something");
You could wrap this in a function but this one-line solution sounds acceptable.
The problem is that this will change the first occurence only, you might want to loop over it, but it also allows you to insert several variables into this string with the same token (%s).
Yes, you can do it, but you have to find the position of the first string with string's find() member, and then replace with it's replace() member.
string s("hello $name");
size_type pos = s.find( "$name" );
if ( pos != string::npos ) {
s.replace( pos, 5, "somename" ); // 5 = length( $name )
}
If you are planning on using the Standard Library, you should really get hold of a copy of the book The C++ Standard Library which covers all this stuff very well.
I use generally this:
std::string& replace(std::string& s, const std::string& from, const std::string& to)
{
if(!from.empty())
for(size_t pos = 0; (pos = s.find(from, pos)) != std::string::npos; pos += to.size())
s.replace(pos, from.size(), to);
return s;
}
It repeatedly calls std::string::find() to locate other occurrences of the searched for string until std::string::find() doesn't find anything. Because std::string::find() returns the position of the match we don't have the problem of invalidating iterators.
If all strings are std::string, you'll find strange problems with the cutoff of characters if using sizeof() because it's meant for C strings, not C++ strings. The fix is to use the .size() class method of std::string.
sHaystack.replace(sHaystack.find(sNeedle), sNeedle.size(), sReplace);
That replaces sHaystack inline -- no need to do an = assignment back on that.
Example usage:
std::string sHaystack = "This is %XXX% test.";
std::string sNeedle = "%XXX%";
std::string sReplace = "my special";
sHaystack.replace(sHaystack.find(sNeedle),sNeedle.size(),sReplace);
std::cout << sHaystack << std::endl;
This could be even better to use
void replace(string& input, const string& from, const string& to)
{
auto pos = 0;
while(true)
{
size_t startPosition = input.find(from, pos);
if(startPosition == string::npos)
return;
input.replace(startPosition, from.length(), to);
pos += to.length();
}
}
wstring myString = L"Hello $$ this is an example. By $$.";
wstring search = L"$$";
wstring replace = L"Tom";
for (int i = myString.find(search); i >= 0; i = myString.find(search))
myString.replace(i, search.size(), replace);
If you want to do it quickly you can use a two scan approach.
Pseudo code:
first parse. find how many matching chars.
expand the length of the string.
second parse. Start from the end of the string when we get a match we replace, else we just copy the chars from the first string.
I am not sure if this can be optimized to an in-place algo.
And a C++11 code example but I only search for one char.
#include <string>
#include <iostream>
#include <algorithm>
using namespace std;
void ReplaceString(string& subject, char search, const string& replace)
{
size_t initSize = subject.size();
int count = 0;
for (auto c : subject) {
if (c == search) ++count;
}
size_t idx = subject.size()-1 + count * replace.size()-1;
subject.resize(idx + 1, '\0');
string reverseReplace{ replace };
reverse(reverseReplace.begin(), reverseReplace.end());
char *end_ptr = &subject[initSize - 1];
while (end_ptr >= &subject[0])
{
if (*end_ptr == search) {
for (auto c : reverseReplace) {
subject[idx - 1] = c;
--idx;
}
}
else {
subject[idx - 1] = *end_ptr;
--idx;
}
--end_ptr;
}
}
int main()
{
string s{ "Mr John Smith" };
ReplaceString(s, ' ', "%20");
cout << s << "\n";
}
What about the boost solution:
boost::replace_all(value, "token1", "token2");
std::string replace(std::string base, const std::string from, const std::string to) {
std::string SecureCopy = base;
for (size_t start_pos = SecureCopy.find(from); start_pos != std::string::npos; start_pos = SecureCopy.find(from,start_pos))
{
SecureCopy.replace(start_pos, from.length(), to);
}
return SecureCopy;
}
My own implementation, taking into account that string needs to be resized only once, then replace can happen.
template <typename T>
std::basic_string<T> replaceAll(const std::basic_string<T>& s, const T* from, const T* to)
{
auto length = std::char_traits<T>::length;
size_t toLen = length(to), fromLen = length(from), delta = toLen - fromLen;
bool pass = false;
std::string ns = s;
size_t newLen = ns.length();
for (bool estimate : { true, false })
{
size_t pos = 0;
for (; (pos = ns.find(from, pos)) != std::string::npos; pos++)
{
if (estimate)
{
newLen += delta;
pos += fromLen;
}
else
{
ns.replace(pos, fromLen, to);
pos += delta;
}
}
if (estimate)
ns.resize(newLen);
}
return ns;
}
Usage could be for example like this:
std::string dirSuite = replaceAll(replaceAll(relPath.parent_path().u8string(), "\\", "/"), ":", "");
I'm just now learning C++, but editing some of the code previously posted, I'd probably use something like this. This gives you the flexibility to replace 1 or multiple instances, and also lets you specify the start point.
using namespace std;
// returns number of replacements made in string
long strReplace(string& str, const string& from, const string& to, size_t start = 0, long count = -1) {
if (from.empty()) return 0;
size_t startpos = str.find(from, start);
long replaceCount = 0;
while (startpos != string::npos){
str.replace(startpos, from.length(), to);
startpos += to.length();
replaceCount++;
if (count > 0 && replaceCount >= count) break;
startpos = str.find(from, startpos);
}
return replaceCount;
}
Here is a one liner that uses c++'s standard library.
The replacement better not have the old string in it (ex: replacing , with ,,), otherwise you have an INFINITE LOOP. Moreso, it is slow for large strings compared to other techniques because the find operations start at the begining of the string call every time. Look for better solutions if you're not too lazy. I put this in for completeness and inspiration for others. You've been warned.
while(s.find(old_s) != string::npos) s.replace(s.find(old_s), old_s.size(), new_s);
And a lambda option
auto replaceAll = [](string& s, string o, string n){ while(s.find(o) != string::npos) s.replace(s.find(o), o.size(), n); };
// EXAMPLES:
// Used like
string text = "hello hello world";
replaceAll(text, "hello", "bye"); // Changes text to "bye bye world"
// Do NOT use like
string text = "hello hello world";
replaceAll(text, "hello", "hello hello"); // Loops forever
You can use this code for remove subtring and also replace , and also remove extra white space .
code :
#include<bits/stdc++.h>
using namespace std;
void removeSpaces(string &str)
{
int n = str.length();
int i = 0, j = -1;
bool spaceFound = false;
while (++j <= n && str[j] == ' ');
while (j <= n)
{
if (str[j] != ' ')
{
if ((str[j] == '.' || str[j] == ',' ||
str[j] == '?') && i - 1 >= 0 &&
str[i - 1] == ' ')
str[i - 1] = str[j++];
else str[i++] = str[j++];
spaceFound = false;
}
else if (str[j++] == ' ')
{
if (!spaceFound)
{
str[i++] = ' ';
spaceFound = true;
}
}
}
if (i <= 1)
str.erase(str.begin() + i, str.end());
else str.erase(str.begin() + i - 1, str.end());
}
int main()
{
string s;
cin >> s;
for(int i = s.find("WUB"); i >= 0; i = s.find("WUB"))
s.replace(i,3," ");
removeSpaces(s);
cout << s << endl;
return 0;
}

C++ String.Replace - no overloaded function takes 2 arguments? [duplicate]

How do I replace part of a string with another string using the standard C++ libraries?
QString s("hello $name"); // Example using Qt.
s.replace("$name", "Somename");
There's a function to find a substring within a string (find), and a function to replace a particular range in a string with another string (replace), so you can combine those to get the effect you want:
bool replace(std::string& str, const std::string& from, const std::string& to) {
size_t start_pos = str.find(from);
if(start_pos == std::string::npos)
return false;
str.replace(start_pos, from.length(), to);
return true;
}
std::string string("hello $name");
replace(string, "$name", "Somename");
In response to a comment, I think replaceAll would probably look something like this:
void replaceAll(std::string& str, const std::string& from, const std::string& to) {
if(from.empty())
return;
size_t start_pos = 0;
while((start_pos = str.find(from, start_pos)) != std::string::npos) {
str.replace(start_pos, from.length(), to);
start_pos += to.length(); // In case 'to' contains 'from', like replacing 'x' with 'yx'
}
}
With C++11 you can use std::regex like so:
#include <regex>
...
std::string string("hello $name");
string = std::regex_replace(string, std::regex("\\$name"), "Somename");
The double backslash is required for escaping an escape character.
Using std::string::replace:
s.replace(s.find("$name"), sizeof("$name") - 1, "Somename");
To have the new string returned use this:
std::string ReplaceString(std::string subject, const std::string& search,
const std::string& replace) {
size_t pos = 0;
while ((pos = subject.find(search, pos)) != std::string::npos) {
subject.replace(pos, search.length(), replace);
pos += replace.length();
}
return subject;
}
If you need performance, here is an optimized function that modifies the input string, it does not create a copy of the string:
void ReplaceStringInPlace(std::string& subject, const std::string& search,
const std::string& replace) {
size_t pos = 0;
while ((pos = subject.find(search, pos)) != std::string::npos) {
subject.replace(pos, search.length(), replace);
pos += replace.length();
}
}
Tests:
std::string input = "abc abc def";
std::cout << "Input string: " << input << std::endl;
std::cout << "ReplaceString() return value: "
<< ReplaceString(input, "bc", "!!") << std::endl;
std::cout << "ReplaceString() input string not modified: "
<< input << std::endl;
ReplaceStringInPlace(input, "bc", "??");
std::cout << "ReplaceStringInPlace() input string modified: "
<< input << std::endl;
Output:
Input string: abc abc def
ReplaceString() return value: a!! a!! def
ReplaceString() input string not modified: abc abc def
ReplaceStringInPlace() input string modified: a?? a?? def
string.replace(string.find("%s"), string("%s").size(), "Something");
You could wrap this in a function but this one-line solution sounds acceptable.
The problem is that this will change the first occurence only, you might want to loop over it, but it also allows you to insert several variables into this string with the same token (%s).
Yes, you can do it, but you have to find the position of the first string with string's find() member, and then replace with it's replace() member.
string s("hello $name");
size_type pos = s.find( "$name" );
if ( pos != string::npos ) {
s.replace( pos, 5, "somename" ); // 5 = length( $name )
}
If you are planning on using the Standard Library, you should really get hold of a copy of the book The C++ Standard Library which covers all this stuff very well.
I use generally this:
std::string& replace(std::string& s, const std::string& from, const std::string& to)
{
if(!from.empty())
for(size_t pos = 0; (pos = s.find(from, pos)) != std::string::npos; pos += to.size())
s.replace(pos, from.size(), to);
return s;
}
It repeatedly calls std::string::find() to locate other occurrences of the searched for string until std::string::find() doesn't find anything. Because std::string::find() returns the position of the match we don't have the problem of invalidating iterators.
If all strings are std::string, you'll find strange problems with the cutoff of characters if using sizeof() because it's meant for C strings, not C++ strings. The fix is to use the .size() class method of std::string.
sHaystack.replace(sHaystack.find(sNeedle), sNeedle.size(), sReplace);
That replaces sHaystack inline -- no need to do an = assignment back on that.
Example usage:
std::string sHaystack = "This is %XXX% test.";
std::string sNeedle = "%XXX%";
std::string sReplace = "my special";
sHaystack.replace(sHaystack.find(sNeedle),sNeedle.size(),sReplace);
std::cout << sHaystack << std::endl;
This could be even better to use
void replace(string& input, const string& from, const string& to)
{
auto pos = 0;
while(true)
{
size_t startPosition = input.find(from, pos);
if(startPosition == string::npos)
return;
input.replace(startPosition, from.length(), to);
pos += to.length();
}
}
wstring myString = L"Hello $$ this is an example. By $$.";
wstring search = L"$$";
wstring replace = L"Tom";
for (int i = myString.find(search); i >= 0; i = myString.find(search))
myString.replace(i, search.size(), replace);
If you want to do it quickly you can use a two scan approach.
Pseudo code:
first parse. find how many matching chars.
expand the length of the string.
second parse. Start from the end of the string when we get a match we replace, else we just copy the chars from the first string.
I am not sure if this can be optimized to an in-place algo.
And a C++11 code example but I only search for one char.
#include <string>
#include <iostream>
#include <algorithm>
using namespace std;
void ReplaceString(string& subject, char search, const string& replace)
{
size_t initSize = subject.size();
int count = 0;
for (auto c : subject) {
if (c == search) ++count;
}
size_t idx = subject.size()-1 + count * replace.size()-1;
subject.resize(idx + 1, '\0');
string reverseReplace{ replace };
reverse(reverseReplace.begin(), reverseReplace.end());
char *end_ptr = &subject[initSize - 1];
while (end_ptr >= &subject[0])
{
if (*end_ptr == search) {
for (auto c : reverseReplace) {
subject[idx - 1] = c;
--idx;
}
}
else {
subject[idx - 1] = *end_ptr;
--idx;
}
--end_ptr;
}
}
int main()
{
string s{ "Mr John Smith" };
ReplaceString(s, ' ', "%20");
cout << s << "\n";
}
What about the boost solution:
boost::replace_all(value, "token1", "token2");
std::string replace(std::string base, const std::string from, const std::string to) {
std::string SecureCopy = base;
for (size_t start_pos = SecureCopy.find(from); start_pos != std::string::npos; start_pos = SecureCopy.find(from,start_pos))
{
SecureCopy.replace(start_pos, from.length(), to);
}
return SecureCopy;
}
My own implementation, taking into account that string needs to be resized only once, then replace can happen.
template <typename T>
std::basic_string<T> replaceAll(const std::basic_string<T>& s, const T* from, const T* to)
{
auto length = std::char_traits<T>::length;
size_t toLen = length(to), fromLen = length(from), delta = toLen - fromLen;
bool pass = false;
std::string ns = s;
size_t newLen = ns.length();
for (bool estimate : { true, false })
{
size_t pos = 0;
for (; (pos = ns.find(from, pos)) != std::string::npos; pos++)
{
if (estimate)
{
newLen += delta;
pos += fromLen;
}
else
{
ns.replace(pos, fromLen, to);
pos += delta;
}
}
if (estimate)
ns.resize(newLen);
}
return ns;
}
Usage could be for example like this:
std::string dirSuite = replaceAll(replaceAll(relPath.parent_path().u8string(), "\\", "/"), ":", "");
I'm just now learning C++, but editing some of the code previously posted, I'd probably use something like this. This gives you the flexibility to replace 1 or multiple instances, and also lets you specify the start point.
using namespace std;
// returns number of replacements made in string
long strReplace(string& str, const string& from, const string& to, size_t start = 0, long count = -1) {
if (from.empty()) return 0;
size_t startpos = str.find(from, start);
long replaceCount = 0;
while (startpos != string::npos){
str.replace(startpos, from.length(), to);
startpos += to.length();
replaceCount++;
if (count > 0 && replaceCount >= count) break;
startpos = str.find(from, startpos);
}
return replaceCount;
}
Here is a one liner that uses c++'s standard library.
The replacement better not have the old string in it (ex: replacing , with ,,), otherwise you have an INFINITE LOOP. Moreso, it is slow for large strings compared to other techniques because the find operations start at the begining of the string call every time. Look for better solutions if you're not too lazy. I put this in for completeness and inspiration for others. You've been warned.
while(s.find(old_s) != string::npos) s.replace(s.find(old_s), old_s.size(), new_s);
And a lambda option
auto replaceAll = [](string& s, string o, string n){ while(s.find(o) != string::npos) s.replace(s.find(o), o.size(), n); };
// EXAMPLES:
// Used like
string text = "hello hello world";
replaceAll(text, "hello", "bye"); // Changes text to "bye bye world"
// Do NOT use like
string text = "hello hello world";
replaceAll(text, "hello", "hello hello"); // Loops forever
You can use this code for remove subtring and also replace , and also remove extra white space .
code :
#include<bits/stdc++.h>
using namespace std;
void removeSpaces(string &str)
{
int n = str.length();
int i = 0, j = -1;
bool spaceFound = false;
while (++j <= n && str[j] == ' ');
while (j <= n)
{
if (str[j] != ' ')
{
if ((str[j] == '.' || str[j] == ',' ||
str[j] == '?') && i - 1 >= 0 &&
str[i - 1] == ' ')
str[i - 1] = str[j++];
else str[i++] = str[j++];
spaceFound = false;
}
else if (str[j++] == ' ')
{
if (!spaceFound)
{
str[i++] = ' ';
spaceFound = true;
}
}
}
if (i <= 1)
str.erase(str.begin() + i, str.end());
else str.erase(str.begin() + i - 1, str.end());
}
int main()
{
string s;
cin >> s;
for(int i = s.find("WUB"); i >= 0; i = s.find("WUB"))
s.replace(i,3," ");
removeSpaces(s);
cout << s << endl;
return 0;
}

Tokenize returns a vector that contains the delimiters - C++

I have the function that tokenizes a vector string and returns without the delimiters. But, I want to return with delimiters.
Desired output:
tokenize("<ab><>cd<", "<>")
should display: "<", "ab", ">", "<", ">", "cd", "<"
Here's my function:
vector<string> tokenize1(const string& s, const string& delim) {
vector<string> tokens;
string::size_type lastPos = s.find_first_not_of(delim, 0);
string::size_type pos = s.find_first_of(delim, lastPos);
while (string::npos != pos || string::npos != lastPos) {
tokens.push_back(s.substr(lastPos, pos - lastPos));
lastPos = s.find_first_not_of(delim, pos);
pos = s.find_first_of(delim, lastPos);
}
return tokens;
}
I'm not quite sure why you'd want to do this, but you've got almost all the code to do it already. Here's a minor modification that alternates between stuffing individual delimiter characters into the token vector (simpler if you're happy to return delimiter chunks, eg. "><>" instead of "<", ">", "<") and adding whole non-delimiter chunks to it.
std::vector<std::string> tokenize2(const std::string& s, const std::string& delim)
{
std::vector<std::string> tokens;
auto nextDelimiter = s.find_first_of(delim, 0);
auto nextNonDelimiter = s.find_first_not_of(delim, 0);
while (std::string::npos != nextDelimiter || std::string::npos != nextNonDelimiter)
{
if (nextNonDelimiter > nextDelimiter)
{
for (auto d = nextDelimiter; d < nextNonDelimiter && d < s.size(); d++)
tokens.push_back(s.substr(d, 1));
nextDelimiter = s.find_first_of(delim, nextNonDelimiter);
}
else
{
tokens.push_back(s.substr(nextNonDelimiter, nextDelimiter - nextNonDelimiter));
nextNonDelimiter = s.find_first_not_of(delim, nextDelimiter);
}
}
return tokens;
}
Note the use of auto (because we're living in the future now) and std:: (because using namespace std; is considered bad practise, for good reason).

String Tokenizer with multiple delimiters including delimiter without Boost

I need to create string parser in C++. I tried using
vector<string> Tokenize(const string& strInput, const string& strDelims)
{
vector<string> vS;
string strOne = strInput;
string delimiters = strDelims;
int startpos = 0;
int pos = strOne.find_first_of(delimiters, startpos);
while (string::npos != pos || string::npos != startpos)
{
if(strOne.substr(startpos, pos - startpos) != "")
vS.push_back(strOne.substr(startpos, pos - startpos));
// if delimiter is a new line (\n) then add new line
if(strOne.substr(pos, 1) == "\n")
vS.push_back("\\n");
// else if the delimiter is not a space
else if (strOne.substr(pos, 1) != " ")
vS.push_back(strOne.substr(pos, 1));
if( string::npos == strOne.find_first_not_of(delimiters, pos) )
startpos = strOne.find_first_not_of(delimiters, pos);
else
startpos = pos + 1;
pos = strOne.find_first_of(delimiters, startpos);
}
return vS;
}
This works for 2X+7cos(3Y)
(tokenizer("2X+7cos(3Y)","+-/^() \t");)
But gives a runtime error for 2X
I need non Boost solution.
I tried using C++ String Toolkit (StrTk) Tokenizer
std::vector<std::string> results;
strtk::split(delimiter, source,
strtk::range_to_type_back_inserter(results),
strtk::tokenize_options::include_all_delimiters);
return results;
but it doesn't give token as a separate string.
eg: if I give the input as 2X+3Y
output vector contains
2X+
3Y
What's probably happening is this is crashing when passed npos:
lastPos = str.find_first_not_of(delimiters, pos);
Just add breaks to your loop instead of relying on the while clause to break out of it.
if (pos == string::npos)
break;
lastPos = str.find_first_not_of(delimiters, pos);
if (lastPos == string::npos)
break;
pos = str.find_first_of(delimiters, lastPos);
Loop exit condition is broken:
while (string::npos != pos || string::npos != startpos)
Allows entry with, say pos = npos and startpos = 1.
So
strOne.substr(startpos, pos - startpos)
strOne.substr(1, npos - 1)
end is not npos, so substr doesn't stop where it should and BOOM!
If pos = npos and startpos = 0,
strOne.substr(startpos, pos - startpos)
lives, but
strOne.substr(pos, 1) == "\n"
strOne.substr(npos, 1) == "\n"
dies. So does
strOne.substr(pos, 1) != " "
Sadly I'm out of time and can't solve this right now, but QuestionC's got the right idea. Better filtering. Something along the lines of:
if (string::npos != pos)
{
if (strOne.substr(pos, 1) == "\n") // can possibly simplify this with strOne[pos] == '\n'
vS.push_back("\\n");
// else if the delimiter is not a space
else if (strOne[pos] != ' ')
vS.push_back(strOne.substr(pos, 1));
}
Would be great if you could share some info on your environment. Your program ran fine with an input value of 2X on my Fedora 20 using g++.
I created a little function that splits a string into substrings (which are stored in a vector) and it allows you to set which characters you want to treat as whitespace. Normal whitespace will still be treated as whitespace, so you don't have to define that. Actually, all it does is turns the character you defined as whitespace into actual whitespace (space char ' '). Then it runs that in a stream (stringstream) to separate the substrings and store them in a vector. This may not be what you need for this particular problem, but maybe it can give you some ideas.
// split a string into its whitespace-separated substrings and store
// each substring in a vector<string>. Whitespace can be defined in argument
// w as a string (e.g. ".;,?-'")
vector<string> split(const string& s, const string& w)
{
string temp{ s };
// go through each char in temp (or s)
for (char& ch : temp) {
// check if any characters in temp (s) are whitespace defined in w
for (char white : w) {
if (ch == white)
ch = ' '; // if so, replace them with a space char (' ')
}
}
vector<string> substrings;
stringstream ss{ temp };
for (string buffer; ss >> buffer;) {
substrings.push_back(buffer);
}
return substrings;
}

Is There A Built-In Way to Split Strings In C++?

well is there? by string i mean std::string
Here's a perl-style split function I use:
void split(const string& str, const string& delimiters , vector<string>& tokens)
{
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
}
There's no built-in way to split a string in C++, but boost provides the string algo library to do all sort of string manipulation, including string splitting.
Yup, stringstream.
std::istringstream oss(std::string("This is a test string"));
std::string word;
while(oss >> word) {
std::cout << "[" << word << "] ";
}
STL strings
You can use string iterators to do your dirty work.
std::string str = "hello world";
std::string::const_iterator pos = std::find(string.begin(), string.end(), ' '); // Split at ' '.
std::string left(str.begin(), pos);
std::string right(pos + 1, str.end());
// Echoes "hello|world".
std::cout << left << "|" << right << std::endl;
void split(string StringToSplit, string Separators)
{
size_t EndPart1 = StringToSplit.find_first_of(Separators)
string Part1 = StringToSplit.substr(0, EndPart1);
string Part2 = StringToSplit.substr(EndPart1 + 1);
}
The answer is no. You have to break them up using one of the library functions.
Something I use:
std::vector<std::string> parse(std::string l, char delim)
{
std::replace(l.begin(), l.end(), delim, ' ');
std::istringstream stm(l);
std::vector<std::string> tokens;
for (;;) {
std::string word;
if (!(stm >> word)) break;
tokens.push_back(word);
}
return tokens;
}
You can also take a look at the basic_streambuf<T>::underflow() method and write a filter.
What the heck... Here's my version...
Note: Splitting on ("XZaaaXZ", "XZ") will give you 3 strings. 2 of those strings will be empty, and won't be added to theStringVector if theIncludeEmptyStrings is false.
Delimiter is not any element in the set, but rather matches that exact string.
inline void
StringSplit( vector<string> * theStringVector, /* Altered/returned value */
const string & theString,
const string & theDelimiter,
bool theIncludeEmptyStrings = false )
{
UASSERT( theStringVector, !=, (vector<string> *) NULL );
UASSERT( theDelimiter.size(), >, 0 );
size_t start = 0, end = 0, length = 0;
while ( end != string::npos )
{
end = theString.find( theDelimiter, start );
// If at end, use length=maxLength. Else use length=end-start.
length = (end == string::npos) ? string::npos : end - start;
if ( theIncludeEmptyStrings
|| ( ( length > 0 ) /* At end, end == length == string::npos */
&& ( start < theString.size() ) ) )
theStringVector -> push_back( theString.substr( start, length ) );
// If at end, use start=maxSize. Else use start=end+delimiter.
start = ( ( end > (string::npos - theDelimiter.size()) )
? string::npos : end + theDelimiter.size() );
}
}
inline vector<string>
StringSplit( const string & theString,
const string & theDelimiter,
bool theIncludeEmptyStrings = false )
{
vector<string> v;
StringSplit( & v, theString, theDelimiter, theIncludeEmptyStrings );
return v;
}
There is no common way doing this.
I prefer the boost::tokenizer, its header only and easy to use.
C strings
Simply insert a \0 where you wish to split. This is about as built-in as you can get with standard C functions.
This function splits on the first occurance of a char separator, returning the second string.
char *split_string(char *str, char separator) {
char *second = strchr(str, separator);
if(second == NULL)
return NULL;
*second = '\0';
++second;
return second;
}
A fairly simple method would be to use the c_str() method of std::string to get a C-style character array, then use strtok() to tokenize the string. Not quite as eloquent as some of the other solutions listed here, but it's easy and works.