finding substring c++ [duplicate] - c++

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
How to split a string?
Hi,
I have a string, say "1,0,1". How can I get the substrings separated by the commas?

C++ doesn't have a built-in function for doing exactly this. However, it can be implemented using either the std::string::find_first_of member function or the non-member std::find.
Here's an example using the latter:
#include <string>
#include <vector>
#include <algorithm>
// given a string str, split it on every occurrence of the character delim
std::vector<std::string> tokenize(std::string str, char delim) {
// store the results in a vector of strings
std::vector<std::string> tokens;
std::string::iterator end = str.end();
std::string::iterator left = str.begin();
for (;;) {
// find the next occurrence of the delimiter
std::string::iterator right = std::find(left, end, delim);
// create a string from the end of the last token up to the delimiter we just found
tokens.push_back(std::string(left, right));
// if we reached the end of the string, exit the loop
if (right == end) { break; }
// otherwise, start the next iteration just past the delimiter we just found
left = right + 1;
}
return tokens;
}
// test program
int main() {
std::string str = "foo, bar, baz";
std::string str2 = "foo, bar, baz,";
std::string str3 = "foo";
std::string str4 = "";
std::string str5 = ",";
std::vector<std::string> tokens = tokenize(str, ',');
std::vector<std::string> tokens2 = tokenize(str2, ',');
std::vector<std::string> tokens3 = tokenize(str3, ',');
std::vector<std::string> tokens4 = tokenize(str4, ',');
std::vector<std::string> tokens5 = tokenize(str5, ',');
}
Of course there are a lot of border cases to handle, and this implementation might not do exactly what you want, but it should give you a starting point.
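The answer above names std::string::find_first_of as the other option but only shows std::find. A minimal sketch of the find_first_of variant might look like the following; the name tokenize_any and the choice to accept several delimiter characters at once are assumptions made for illustration, not part of the original answer.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split str on any character contained in delims.
// Empty tokens are kept, as in the std::find version above.
std::vector<std::string> tokenize_any(const std::string& str, const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type left = 0;
    for (;;) {
        // index of the next delimiter at or after left, or npos if there is none
        std::string::size_type right = str.find_first_of(delims, left);
        if (right == std::string::npos) {
            // no more delimiters: the rest of the string is the last token
            tokens.push_back(str.substr(left));
            break;
        }
        tokens.push_back(str.substr(left, right - left));
        left = right + 1;  // continue just past the delimiter
    }
    return tokens;
}
```

Called as tokenize_any("1,0,1", ","), this should yield the three tokens 1, 0 and 1. Working with indices instead of iterators means the string can be taken by const reference.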

Another way of doing this is with strtok. It is an old C approach, but it still applies to the problem.
#include <cstring>
#include <string>
#include <vector>
char* token;
char line[512];
std::string lineStr = "0, 1, 2";
std::vector<std::string> commaSplit;
// strtok modifies its input, so work on a copy (lineStr must fit into line)
strcpy(line, lineStr.c_str());
// Treat both ' ' and ',' as delimiters and find the first token
token = strtok(line, " ,");
while (token != NULL)
{
// Add the token to the vector (the char* is converted to a std::string)
commaSplit.push_back(token);
// Find the next token
token = strtok(NULL, " ,");
}

Search google for an algorithm to explode or tokenize your string. It's trivial.
You can also check out the documentation and use available tools : http://www.cplusplus.com/reference/string/string/
A simple implementation could be :
void tokenize(const string & text, vector<string> & tokens, char delim)
{
size_t length = text.size();
string token = "";
for(size_t i=0;i<length;i++)
{
if(text[i] != delim)
{
token += text[i];
}
else
{
if(token.size() > 0)
{
tokens.push_back(token);
}
token = "";
}
}
tokens.push_back(token);
}

Related

How to break up a string into a vector fast?

I am processing CSV and using the following code to process a single line.
std::vector<std::string> string_to_vector(const std::string& s, const char delimiter, const char escape) {
std::stringstream sstr{s};
std::vector<std::string> result;
while (sstr.good()) {
std::string substr;
getline(sstr, substr, delimiter);
while (substr.back() == escape) {
std::string tmp;
getline(sstr, tmp, delimiter);
substr += "," + tmp;
}
result.emplace_back(substr);
}
return result;
}
What it does: Function breaks up string s based on delimiter. If the delimiter is escaped with escape the delimiter will be ignored.
This code works but is super slow. How can I speed it up?
Do you know any existing csv processing implementation that does exactly this and which I could use?
The fastest way to do something is to not do it at all.
If you can ensure that your source string s will outlive the use of the returned vector, you could replace your std::vector<std::string> with std::vector<char*> which would point to the beginning of each substring. You then replace your identified delimiters with zeroes.
[EDIT] I have not moved up to C++17, so no string_view for me :)
NOTE: typical CSV is different from what you imply; it doesn't use escape for the comma, but surrounds entries with comma in it with double quotes. But I assume you know your data.
Implementation:
#include <iostream>
#include <vector>
#include <string>
std::vector<char*> string_to_vector(std::string& s,
const char delimiter, const char escape)
{
size_t prev(0), pos(0), from(0);
std::vector<char*> v;
while ((pos = s.find(delimiter, from)) != s.npos)
{
if (pos == 0 || s[pos - 1] != escape)
{
s[pos] = 0;
v.push_back(&s[prev]);
prev = pos + 1;
}
from = pos + 1;
}
v.push_back(&s[prev]);
return v;
}
int main() {
std::string test("this,is,a\\,test");
std::vector<char*> v = string_to_vector(test, ',', '\\');
for (auto& s : v)
std::cout << s << " ";
}
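The [EDIT] note above mentions std::string_view. As a rough sketch, assuming C++17 is available, the same idea can be written without modifying the source string at all. The name split_views is made up for this example, and, as with the char* version, the source string must outlive the returned vector (passing a temporary std::string would leave dangling views).

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Sketch (C++17): same splitting rule as the char* version above, but each
// element is a view into s rather than a pointer into a modified string.
std::vector<std::string_view> split_views(std::string_view s, char delimiter, char escape) {
    std::vector<std::string_view> v;
    std::size_t prev = 0, from = 0, pos = 0;
    while ((pos = s.find(delimiter, from)) != std::string_view::npos) {
        // split only on delimiters that are not preceded by the escape character
        if (pos == 0 || s[pos - 1] != escape) {
            v.push_back(s.substr(prev, pos - prev));
            prev = pos + 1;
        }
        from = pos + 1;
    }
    v.push_back(s.substr(prev));  // last field (may be empty)
    return v;
}
```

Like the original, this leaves the escape character inside the field: splitting "this,is,a\\,test" on ',' with escape '\\' gives three fields, the last of which still contains the backslash.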

Getting the substring of string with multiple instances of the same delimiter [duplicate]

This question already has answers here:
How do I iterate over the words of a string?
(84 answers)
Closed 4 years ago.
I have a string of data that looks like
string line = "Number:Description:Price:Weight";
I want to separate the string into 4 different variables via the delimiter ":". I was trying this via the substring method
char delimiter = ':';
string number = line.substr(0, line.find(delimiter));
It works fine for the first variable. However, I am unable to figure out how to iterate to the next instance of the ":" for the other variables.
std::string::find() takes a starting index as an optional parameter:
string line = "Number:Description:Price:Weight";
string::size_type start, end;
char delimiter = ':';
end = line.find(delimiter);
string number = line.substr(0, end);
start = end + 1;
end = line.find(delimiter, start);
string desc = line.substr(start, end-start);
start = end + 1;
end = line.find(delimiter, start);
string price = line.substr(start, end-start);
string weight = line.substr(end + 1);
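When the number of fields is not fixed at four, the same start/end bookkeeping can be put in a loop. This is only a sketch; the helper name split_fields is an assumption for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collect every field of line, separated by delimiter.
std::vector<std::string> split_fields(const std::string& line, char delimiter) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    std::string::size_type end = line.find(delimiter);
    while (end != std::string::npos) {
        fields.push_back(line.substr(start, end - start));
        start = end + 1;
        end = line.find(delimiter, start);  // resume the search after the last match
    }
    fields.push_back(line.substr(start));  // text after the last delimiter
    return fields;
}
```

For "Number:Description:Price:Weight" and ':' this should return four strings, in order.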
Alternatively, you can split the string on delimiters by using std::istringstream with std::getline():
string line = "Number:Description:Price:Weight";
string number, desc, price, weight;
char delimiter = ':';
istringstream iss(line);
getline(iss, number, delimiter);
getline(iss, desc, delimiter);
getline(iss, price, delimiter);
getline(iss, weight);
It is much easier with a simple for loop:
#include <iostream>
#include <string>
#include <vector>
int main()
{
std::string line = "Number:Description:Price:Weight";
std::vector<std::string> vecStrings;
std::string word;
size_t count = 0;
for (auto const& it : line)
{
if (it != ':') word += it;
if ((it == ':') || (count + 1 == line.size()))
{
vecStrings.emplace_back(word);
word.clear();
}
++count;
}
for(const auto& it: vecStrings)
std::cout << it << "\n";
return 0;
}
See live action: https://www.ideone.com/DiAvjO
Update: If you need something different, you can do the same thing with std::for_each() and a lambda:
#include <algorithm>
std::for_each(std::begin(line), std::end(line), [&](char &letter)
{
if (letter != ':') word += letter;
if ((letter == ':') || (count + 1 == line.size()))
{
vecStrings.emplace_back(word);
word.clear();
}
++count;
});
However, a much simpler solution would be to use a string stream with std::getline. Thanks to RemyLebeau for pointing this out:
#include <sstream>
std::stringstream sstr(line);
std::string word;
while (std::getline(sstr, word, ':'))
{
vecStrings.emplace_back(word);
}
You can make use of strtok, as below:
#include <cstring>
#include <iostream>
int main()
{
char line[] = "Number:Description:Price:Weight";
// strtok modifies line in place, replacing each delimiter with '\0'
char * token = std::strtok (line,":");
while (token != NULL)
{
std::cout << token << '\n';
token = std::strtok(NULL, ":");
}
return 0;
}
Make any necessary changes to suit your needs.
cstring has strtok that does exactly what you want.

Split strings into tokens with delimiter (/ and -) in c++ [duplicate]

This question already has answers here:
Right way to split an std::string into a vector<string>
(12 answers)
Closed 11 months ago.
I have some text (meaningful text or arithmetical expression) and I want to split it into words.
If I had a single delimiter, I'd use:
std::stringstream stringStream(inputString);
std::string word;
while(std::getline(stringStream, word, delimiter))
{
wordVector.push_back(word);
}
How can I break the string into tokens with several delimiters?
Assuming one of the delimiters is newline, the following reads the line and further splits it by the delimiters. For this example I've chosen the delimiters space, apostrophe, and semi-colon.
std::stringstream stringStream(inputString);
std::string line;
while(std::getline(stringStream, line))
{
std::size_t prev = 0, pos;
while ((pos = line.find_first_of(" ';", prev)) != std::string::npos)
{
if (pos > prev)
wordVector.push_back(line.substr(prev, pos-prev));
prev = pos+1;
}
if (prev < line.length())
wordVector.push_back(line.substr(prev, std::string::npos));
}
If you have boost, you could use:
#include <boost/algorithm/string.hpp>
std::string inputString("One!Two,Three:Four");
std::string delimiters("|,:");
std::vector<std::string> parts;
boost::split(parts, inputString, boost::is_any_of(delimiters));
Using std::regex
A std::regex can do string splitting in a few lines:
std::regex re("[\\|,:]");
std::sregex_token_iterator first{input.begin(), input.end(), re, -1}, last;//the '-1' is what makes the regex split (-1 := what was not matched)
std::vector<std::string> tokens{first, last};
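For a version that can be compiled on its own, the three lines above can be wrapped in a function. This is a sketch; the name regex_split and the pattern parameter are assumptions for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical wrapper around the snippet above.
// pattern is a regular expression that matches one delimiter.
std::vector<std::string> regex_split(const std::string& input, const std::string& pattern) {
    const std::regex re(pattern);
    // submatch index -1 selects the text between matches, i.e. the tokens
    std::sregex_token_iterator first(input.begin(), input.end(), re, -1), last;
    return std::vector<std::string>(first, last);
}
```

For example, regex_split("One|Two,Three:Four", "[|,:]") should give One, Two, Three and Four. Constructing a std::regex is comparatively expensive, so code that splits many strings with the same pattern would probably want to build the regex once and reuse it.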
I don't know why nobody pointed out the manual way, but here it is:
const std::string delims(";,:. \n\t");
inline bool isDelim(char c) {
for (int i = 0; i < delims.size(); ++i)
if (delims[i] == c)
return true;
return false;
}
and in function:
std::stringstream stringStream(inputString);
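std::string word; int c; // int rather than char, so EOF is distinguishable from any character
while (stringStream) {
word.clear();
// Read word (stop at end of input as well as at a delimiter)
while ((c = stringStream.get()) != EOF && !isDelim(static_cast<char>(c)))
word.push_back(static_cast<char>(c));
if (c != EOF)
stringStream.unget();
wordVector.push_back(word);
// Read delims
while ((c = stringStream.get()) != EOF && isDelim(static_cast<char>(c)));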
if (c != EOF)
stringStream.unget();
}
This way you can do something useful with the delims if you want.
And here, ages later, a solution using C++20:
constexpr std::string_view words{"Hello-_-C++-_-20-_-!"};
constexpr std::string_view delimeters{"-_-"};
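for (const auto word : std::views::split(words, delimeters)) {
// build the string_view explicitly from the subrange's iterators
std::cout << std::quoted(std::string_view(word.begin(), word.end())) << ' ';
}
// outputs: "Hello" "C++" "20" "!"
Required headers:
#include <iomanip>
#include <iostream>
#include <ranges>
#include <string_view>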
Reference: https://en.cppreference.com/w/cpp/ranges/split_view
If you are interested in how to do it yourself without using Boost:
Suppose the delimiter string is long, say of length M. Checking each character of the input against every delimiter costs O(M), so doing that in a loop over an input of length N costs O(M*N).
I would use a dictionary instead (such as a map from delimiter to boolean; here I use a simple boolean array that holds true at the index equal to each delimiter's character code).
Checking whether a character is a delimiter is then O(1) while iterating over the string, which gives O(N) overall.
Here is my sample code:
const int dictSize = 256;
vector<string> tokenizeMyString(const string &s, const string &del)
{
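// not static: a static table would keep the delimiters from earlier calls
bool dict[dictSize] = { false };
vector<string> res;
for (size_t i = 0; i < del.size(); ++i) {
// index through unsigned char so that negative char values stay in range
dict[static_cast<unsigned char>(del[i])] = true;
}
string token("");
for (auto &i : s) {
if (dict[static_cast<unsigned char>(i)]) {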
if (!token.empty()) {
res.push_back(token);
token.clear();
}
}
else {
token += i;
}
}
if (!token.empty()) {
res.push_back(token);
}
return res;
}
int main()
{
string delString = "MyDog:Odie, MyCat:Garfield MyNumber:1001001";
//the delimiters are " " (space) and "," (comma)
vector<string> res = tokenizeMyString(delString, " ,");
for (auto &i : res) {
cout << "token: " << i << endl;
}
return 0;
}
Note: tokenizeMyString creates the vector locally and returns it by value, so we rely on the compiler here: return value optimization (RVO) can elide the copy :)
Using Eric Niebler's range-v3 library:
https://godbolt.org/z/ZnxfSa
#include <string>
#include <iostream>
#include "range/v3/all.hpp"
int main()
{
std::string s = "user1:192.168.0.1|user2:192.168.0.2|user3:192.168.0.3";
auto words = s
| ranges::view::split('|')
| ranges::view::transform([](auto w){
return w | ranges::view::split(':');
});
ranges::for_each(words, [](auto i){ std::cout << i << "\n"; });
}

Program gets "Expression: string subscript out of range"

#include <iostream>
#include <string>
using namespace std;
string Latin(string words)
{
string strWord, strSentence = "";
int length = 0, index = 0;
while (words[index] != '\0')
{
if(words.find(' ', index) != -1)
{
length = words.find(' ', index);
length -= index;
strWord = words.substr(index,length);
strWord.insert(length, "ay");
strWord.insert(length, 1, words[index]);
strWord.erase(0,1);
index += length +1;
}
else
{
strWord = words.substr(index);
length = strWord.length();
strWord.insert(length, "ay");
strWord.insert(length,1,words[index]);
strWord.erase(0,1);
index = words.length();
}
strSentence += (strWord + " ");
}
return strSentence;
}
int main()
{
string str;
getline(cin,str);
str = Latin(str);
cout<<str<<endl;
return 0;
}
I get this error that says "Expression: string subscript out of range".
I have no clue what to do. I am new to this. The program is supposed to ask the user for a line of words and translate them into Pig Latin. Any help would be greatly appreciated.
Unless I really wanted to make my own life difficult, I'd do this quite a bit differently. First, I'd use a std::stringstream to break the input string into words to process. Then, I'd use std::rotate to move the first character of the string to the end. Finally, I'd wrap that all in std::transform to manage applying the function to each word in succession.
std::string line;
std::getline(std::cin, line);
std::stringstream buffer(line);
std::stringstream result;
std::transform(std::istream_iterator<std::string>(buffer),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(result, " "),
[](std::string s) {
std::rotate(s.begin(), s.begin() + 1, s.end());
s += "ay";
return s;
});
Of course, this doesn't know the special rules for things like words that start with vowels or letter pairs like sh or ch, but it looks like that's outside the scope of the task at hand.
For more on std::rotate, I recommend watching some of Sean Parent's videos.
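To try the std::transform approach above in isolation, the snippet can be wrapped in a function with the headers it needs. This is a sketch under the same simplifying rules as the answer (no special handling of vowels or letter pairs); the function name pig_latin is an assumption for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>

// Sketch: the answer's snippet wrapped in a function.
std::string pig_latin(const std::string& line) {
    std::istringstream buffer(line);
    std::ostringstream result;
    std::transform(std::istream_iterator<std::string>(buffer),
                   std::istream_iterator<std::string>(),
                   std::ostream_iterator<std::string>(result, " "),
                   [](std::string s) {
                       // operator>> never yields an empty word, so begin() + 1 is valid
                       std::rotate(s.begin(), s.begin() + 1, s.end());
                       s += "ay";
                       return s;
                   });
    return result.str();  // each word is followed by a space, including the last
}
```

Because the stream extraction skips whitespace and stops at the end of the input, no manual index arithmetic is involved, which avoids the kind of out-of-range subscript the question ran into.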

Splitting a string

I have this code to split a string. For some reason, it just sits there doing nothing. I am not sure what the problem is. By the way, delim = ' ' here.
vector<string> split( const string &str, const char &delim )
{
typedef string::const_iterator iter;
iter beg = str.begin();
vector<string> tokens;
while(beg != str.end())
{
iter temp = find(beg, str.end(), delim);
if(beg != str.end())
tokens.push_back(string(beg, temp));
beg = temp;
}
return tokens;
}
Here is another nice and short Boost-based version that uses a whole string as delimiter:
std::vector<std::string> result;
boost::iter_split(result, str, boost::first_finder(delim));
Or case-insensitive:
std::vector<std::string> result;
boost::iter_split(result, str,
boost::first_finder(delim, boost::is_iequal()));
I could debug it for you, I guess but that won't help you in the long run. Here's what you do.
After every line, put a printf() or cout statement dumping the changed variables to standard output. Then run your code, passing a simple set of parameters to it:
vector<string> x = split ("Hello there, Bob.", ' ');
Then, examine the output to see why your implementation isn't working. You'll probably have to break out of the code since, if it's just sitting there, you've probably got yourself one of those new-fangled infinite loops.
Give a man a fish and he'll eat for a day, teach a man to fish, he'll never be hungry again.
Or the Terry Pratchett version:
Give a man some fire and he'll be warm for a day, set a man on fire, he'll be warm for the rest of his life.
Update:
Since you've stated that you've actually done what I suggested, here's what I found out from doing it. It's evident that when you set beg to temp at the end of the while loop, it's pointing at the space. That was discovered by printing the beg string at the top of the while loop - it never changed after the first word was extracted.
Then, when you do the next find, it finds that exact same space rather than first skipping spaces then calling find properly. You need to skip the spaces after each find, making sure you don't iterate beyond the end of the string.
This is my solution. Use it as you wish.
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
vector<string> split( const string &str, const char &delim ) {
typedef string::const_iterator iter;
iter beg = str.begin();
vector<string> tokens;
while(beg != str.end()) {
//cout << ":" << beg._Myptr << ":" << endl;
iter temp = find(beg, str.end(), delim);
if(beg != str.end())
tokens.push_back(string(beg, temp));
beg = temp;
while ((beg != str.end()) && (*beg == delim))
beg++;
}
return tokens;
}
int main () {
vector<string> x = split ("Hello, my name is Bob. ", ' ');
return 0;
}
Without that space-skipping code at the end of the while loop, the output was:
:Hello, my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
and so on, ad infinitum. With the skipping code, you get:
:Hello, my name is Bob. :
:my name is Bob. :
:name is Bob. :
:is Bob. :
:Bob. :
I've got to love Boost, as it's providing a handy solution to this one as well:
std::vector<std::string> Split(const std::string &s, const std::string &d)
{
std::vector<std::string> v;
for (boost::split_iterator<std::string::const_iterator> i = boost::make_split_iterator(s, boost::first_finder(d, boost::is_iequal()));
i != boost::split_iterator<std::string::const_iterator>();
++i) {
v.push_back(boost::copy_range<std::string>(*i));
}
return v;
}
There is a problem in your while loop in that if the delimiter is found then temp will point to the first delimiter after the first find call.
At the end of the while loop you set beg to the value of temp.
Now beg also points to the first delimiter.
When find is next called it will return the current value of beg again as it does point to a delimiter.
temp hasn't moved on from its previous value, so you are in an infinite loop.
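The behaviour described above can be checked in isolation with a small sketch; the function name find_stays_put is made up for this illustration.

```cpp
#include <algorithm>
#include <string>

// Returns true when std::find, started on a delimiter, returns its starting point.
bool find_stays_put(const std::string& str, char delim) {
    std::string::const_iterator beg = std::find(str.begin(), str.end(), delim);
    if (beg == str.end()) return false;  // no delimiter in the string at all
    // beg now points at the delimiter; searching again from beg finds it immediately
    return std::find(beg, str.end(), delim) == beg;
}
```

For any string that contains the delimiter this returns true, which is why the loop in the question makes no progress unless the iterator is moved past the delimiter before the next search.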
Maybe this one:
std::vector<std::string> &mysplit(const std::string &s, char delim, std::vector<std::string> &elems) {
std::stringstream ss(s);
std::string item;
while(std::getline(ss, item, delim)) {
elems.push_back(item);
}
return elems;
}
vector<string> split( const string &str, const char &delim )
{
typedef string::const_iterator iter;
iter beg = str.begin();
vector<string> tokens;
while(beg != str.end())
{
iter temp = find(beg, str.end(), delim);
if(beg != str.end())
tokens.push_back(string(beg, temp));
if(temp != str.end())
temp++;
beg = temp;
}
return tokens;
}
find() will return the position X of the next token. When you then assign this to beg and go into the next iteration, it will start searching at position X again - and again - and again ... i.e. you are stuck in an endless loop.
Try this code:
vector<string> split( const string &str, const char &delim )
{
typedef string::const_iterator iter;
vector<string> tokens;
iter pos = str.begin(), last = str.begin();
while(pos != str.end()) {
last = pos;
pos = find(pos, str.end(), delim);
if (pos != str.end()) {
string token = string(last, pos);
if (token.length() > 0)
tokens.push_back(token);
last = ++pos;
}
}
string lastToken = string(last, pos);
if (lastToken.length() > 0)
tokens.push_back(lastToken);
return tokens;
}
This has the added benefit that it will include the last token in the list (e.g. when splitting on space, the string "a b c" will now return tokens a, b and c instead of only a and b) and that multiple delims will not lead to empty tokens.
You don't have to reinvent the wheel, boost provides a string splitting function for you.
Example code:
string stringtobesplit = "AA/BB-CC";
vector<string> tokens;
boost::split(tokens, stringtobesplit, boost::is_any_of("/-"));
// tokens now holds 3 items: AA BB CC
The simplest way to debug this code is to print every position that beg takes. If beg doesn't increase, then that's your problem.
Apart from the beg needing to be incremented with the size of the delimiter, one special case is missed: the case where no delimiters are in the string.
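Both points can be handled in one loop. The following is a sketch using index-based std::string::find, assuming a whole-string delimiter; the name split_on and the treatment of an empty delimiter are assumptions for illustration, not taken from the answers above.

```cpp
#include <string>
#include <vector>

// Sketch: split str on a whole-string delimiter.
std::vector<std::string> split_on(const std::string& str, const std::string& delim) {
    std::vector<std::string> tokens;
    if (delim.empty()) {  // assumption: an empty delimiter means "do not split"
        tokens.push_back(str);
        return tokens;
    }
    std::string::size_type beg = 0, pos;
    while ((pos = str.find(delim, beg)) != std::string::npos) {
        tokens.push_back(str.substr(beg, pos - beg));
        beg = pos + delim.size();  // step over the whole delimiter
    }
    tokens.push_back(str.substr(beg));  // also covers the no-delimiter case
    return tokens;
}
```

With no delimiter in the input, the loop body never runs and the whole string is returned as the single token. Unlike some of the versions above, this sketch keeps empty tokens between adjacent delimiters.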