How to split the string vector array in C++ - c++

How to split the string vector in C++?
The input string values are read from the file, and it's format is like below.
P00 ARRIVAL:3 CPU:1 I/O:3 CPU:7 I/O:1 CPU:3
P01 ARRIVAL:2 CPU:9
P02 ARRIVAL:0 CPU:6
P03 ARRIVAL:4 CPU:1
P04 ARRIVAL:0 CPU:5
However, I need just the value like,
p00 3 1 3 7 1 3
p01 2 9
p02 0 6
...
And this is my part of code. The file is read by a line by line, and each line is saved on string vector array.
vector<string> procList; // file
void readProc(char *filename) {
std::ifstream file(filename);
std::string str;
while (std::getline(file, str))
{
procList.push_back(str);
}
file.close();
for (int i = 0, size = procList.size(); i < size; ++i) {
_tprintf(TEXT("[%s] \n"), procList[i]);
}
}
Thanks.

Using regexps for parsing:
live
#include <regex>
std::vector<std::string> example = {
"P00 ARRIVAL:3 CPU:1 I/O:3 CPU:7 I/O:1 CPU:3",
"P01 ARRIVAL:2 CPU:9",
"P02 ARRIVAL:0 CPU:6",
"P03 ARRIVAL:4 CPU:1",
"P04 ARRIVAL:0 CPU:5"
};
std::regex re{"[\\w\\d]+:(\\d+)"};
for (auto s : example) {
std::cout << "\n";
std::regex_token_iterator<std::string::iterator> it{s.begin(), s.end(), re, 1};
decltype(it) end{};
while (it != end) std::cout << *it++ << ":";
}
will output:
3:1:3:7:1:3:
2:9:
0:6:
4:1:
0:5:
With regex_token_iterator you could also specify to iterate over name and value in your data. You specify in regexp two groups (using parentheses), and then specify in constructor to regex_token_iterator which groups (sub matches) to include during iteration using: {1, 2}.
live
std::regex re{R"(([\w\d/]+):(\d+))"};
for (auto s : example) {
std::cout << "\n";
using reg_itr = std::regex_token_iterator<std::string::iterator>;
for (reg_itr it{s.begin(), s.end(), re, {1,2}}, end{}; it != end;) {
std::cout << *it++ << ":"; std::cout << std::stoi(*it++) << " ";
}
}
will output:
ARRIVAL:3 CPU:1 I/O:3 CPU:7 I/O:1 CPU:3
ARRIVAL:2 CPU:9
ARRIVAL:0 CPU:6
ARRIVAL:4 CPU:1
ARRIVAL:0 CPU:5

Assuming the format is id {key:value}. The following code utilizes stringstream and getline. Might have problems, I haven't tested.
vector<string> procList; // file
void readProc(char *filename) {
std::ifstream file(filename);
std::string str;
while (std::getline(file, str))
{
std::stringstream in(str);
std::stringstream out;
std::string temp;
in>>temp;
out<<temp;
while(getline(in, temp, ':')) {
in>>temp;
out<<"\t"<<temp;
}
procList.push_back(out.str());
}
file.close();
for (int i = 0, size = procList.size(); i < size; ++i) {
_tprintf(TEXT("[%s] \n"), procList[i]);
}
}

Related

How to remove an item of Vectors c++

i have this code and i want to find a word in my vector and delete the item that includes that word but, my code will delete from the first line until the item that i want, how can i fix that?
std::string s;
std::vector<std::string> lines;
while (std::getline(theFile, s))
{
lines.push_back(s);
}
//finding item in vector and changing it
for (unsigned i = 0; i < lines.size(); i++)
{
std::size_t found = lines[i].find(name);
if (found != std::string::npos)
{
lines.erase(lines.begin() + i);
}
}
Update 1:
this is my full Code:
I'm opening a file that contains some elements in this format
( David, 2002 , 1041 , 1957 )
( cleve, 2003 , 1071 , 1517 )
( Ali, 2005 , 1021 , 1937 )
i'm getting a user input and finding the line that contains it. then i want to remove that line completely so i import it to a vector and then i can't modify it
#include <iostream>
#include <string>
#include <vector>
#include <stream>
#include <algorithm>
using namespace std;
using std::vector;
int main(){
string srch;
string line;
fstream Myfile;
string name;
int counter;
Myfile.open("Patientlist.txt", ios::in | ios::out);
cout <<"Deleting your Account";
cout << "\nEnter your ID: ";
cin.ignore();
getline(cin, srch);
if (Myfile.is_open())
{
while (getline(Myfile, line))
{
if (line.find(srch) != string::npos)
{
cout << "\nYour details are: \n"
<< line << endl;
}
break;
}
}
else
{
cout << "\nSearch Failed... Patient not found!" << endl;
}
Myfile.close();
ifstream theFile("Patientlist.txt");
//using vectors to store value of file
std::string s;
std::vector<std::string> lines;
while (std::getline(theFile, s))
{
lines.push_back(s);
}
//finding item in vector and changing it
for (unsigned i = 0; i < lines.size(); i++)
{
std::size_t found = lines[i].find(name);
if (found != std::string::npos)
{
lines.erase(lines.begin() + i);
}
}
//writing new vector on file
ofstream file;
file.open("Patientlist.txt");
for (int i = 0; i < lines.size(); ++i)
{
file << lines[i] << endl;
}
file.close();
cout << "Done!";
}
The erasing loop is broken. The proper way is to use iterators and use the iterator returned by erase. Like so:
// finding item in vector and changing it
for (auto it = lines.begin(); it != lines.end();) {
if (it->find(name) != std::string::npos) {
it = lines.erase(it);
} else {
++it;
}
}
Or using the erase–remove idiom:
lines.erase(std::remove_if(lines.begin(), lines.end(),
[&name](const std::string& line) {
return line.find(name) != std::string::npos;
}), lines.end());
Or since C++20 std::erase_if(std::vector):
std::erase_if(lines, [&name](const std::string& line) {
return line.find(name) != std::string::npos;
});
You can use remove_if for this. The predicate argument should check if you can find a word in a line. Then remember to use erase to "shrink" the vector to its new size.
[Demo]
#include <algorithm> // remove_if
#include <iostream> // cout
#include <string>
#include <vector>
int main()
{
std::vector<std::string> lines{"one two three", "three four five", "five six one"};
lines.erase(
std::remove_if(std::begin(lines), std::end(lines), [](auto& line) {
return line.find("one") != std::string::npos;}),
std::end(lines)
);
for (auto& line : lines) { std::cout << line << "\n"; }
}

i.m trying to split string by whitespace using c++, where the data from database [duplicate]

What would be easiest method to split a string using c++11?
I've seen the method used by this post, but I feel that there ought to be a less verbose way of doing it using the new standard.
Edit: I would like to have a vector<string> as a result and be able to delimitate on a single character.
std::regex_token_iterator performs generic tokenization based on a regex. It may or may not be overkill for doing simple splitting on a single character, but it works and is not too verbose:
std::vector<std::string> split(const string& input, const string& regex) {
// passing -1 as the submatch index parameter performs splitting
std::regex re(regex);
std::sregex_token_iterator
first{input.begin(), input.end(), re, -1},
last;
return {first, last};
}
Here is a (maybe less verbose) way to split string (based on the post you mentioned).
#include <string>
#include <sstream>
#include <vector>
std::vector<std::string> split(const std::string &s, char delim) {
std::stringstream ss(s);
std::string item;
std::vector<std::string> elems;
while (std::getline(ss, item, delim)) {
elems.push_back(item);
// elems.push_back(std::move(item)); // if C++11 (based on comment from #mchiasson)
}
return elems;
}
Here's an example of splitting a string and populating a vector with the extracted elements using boost.
#include <boost/algorithm/string.hpp>
std::string my_input("A,B,EE");
std::vector<std::string> results;
boost::algorithm::split(results, my_input, boost::is_any_of(","));
assert(results[0] == "A");
assert(results[1] == "B");
assert(results[2] == "EE");
Another regex solution inspired by other answers but hopefully shorter and easier to read:
std::string s{"String to split here, and here, and here,..."};
std::regex regex{R"([\s,]+)"}; // split on space and comma
std::sregex_token_iterator it{s.begin(), s.end(), regex, -1};
std::vector<std::string> words{it, {}};
I don't know if this is less verbose, but it might be easier to grok for those more seasoned in dynamic languages such as javascript. The only C++11 features it uses is auto and range-based for loop.
#include <string>
#include <cctype>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
string s = "hello how are you won't you tell me your name";
vector<string> tokens;
string token;
for (const auto& c: s) {
if (!isspace(c))
token += c;
else {
if (token.length()) tokens.push_back(token);
token.clear();
}
}
if (token.length()) tokens.push_back(token);
return 0;
}
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
using namespace std;
vector<string> split(const string& str, int delimiter(int) = ::isspace){
vector<string> result;
auto e=str.end();
auto i=str.begin();
while(i!=e){
i=find_if_not(i,e, delimiter);
if(i==e) break;
auto j=find_if(i,e, delimiter);
result.push_back(string(i,j));
i=j;
}
return result;
}
int main(){
string line;
getline(cin,line);
vector<string> result = split(line);
for(auto s: result){
cout<<s<<endl;
}
}
My choice is boost::tokenizer but I didn't have any heavy tasks and test with huge data.
Example from boost doc with lambda modification:
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>
#include <vector>
int main()
{
using namespace std;
using namespace boost;
string s = "This is, a test";
vector<string> v;
tokenizer<> tok(s);
for_each (tok.begin(), tok.end(), [&v](const string & s) { v.push_back(s); } );
// result 4 items: 1)This 2)is 3)a 4)test
return 0;
}
This is my answer. Verbose, readable and efficient.
std::vector<std::string> tokenize(const std::string& s, char c) {
auto end = s.cend();
auto start = end;
std::vector<std::string> v;
for( auto it = s.cbegin(); it != end; ++it ) {
if( *it != c ) {
if( start == end )
start = it;
continue;
}
if( start != end ) {
v.emplace_back(start, it);
start = end;
}
}
if( start != end )
v.emplace_back(start, end);
return v;
}
#include <string>
#include <vector>
#include <sstream>
inline vector<string> split(const string& s) {
vector<string> result;
istringstream iss(s);
for (string w; iss >> w; )
result.push_back(w);
return result;
}
Here is a C++11 solution that uses only std::string::find(). The delimiter can be any number of characters long. Parsed tokens are output via an output iterator, which is typically a std::back_inserter in my code.
I have not tested this with UTF-8, but I expect it should work as long as the input and delimiter are both valid UTF-8 strings.
#include <string>
template<class Iter>
Iter splitStrings(const std::string &s, const std::string &delim, Iter out)
{
if (delim.empty()) {
*out++ = s;
return out;
}
size_t a = 0, b = s.find(delim);
for ( ; b != std::string::npos;
a = b + delim.length(), b = s.find(delim, a))
{
*out++ = std::move(s.substr(a, b - a));
}
*out++ = std::move(s.substr(a, s.length() - a));
return out;
}
Some test cases:
void test()
{
std::vector<std::string> out;
size_t counter;
std::cout << "Empty input:" << std::endl;
out.clear();
splitStrings("", ",", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, empty delimiter:" << std::endl;
out.clear();
splitStrings("Hello, world!", "", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", no delimiter in string:" << std::endl;
out.clear();
splitStrings("abxycdxyxydefxya", "xyz", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", delimiter exists string:" << std::endl;
out.clear();
splitStrings("abxycdxy!!xydefxya", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", delimiter exists string"
", input contains blank token:" << std::endl;
out.clear();
splitStrings("abxycdxyxydefxya", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", delimiter exists string"
", nothing after last delimiter:" << std::endl;
out.clear();
splitStrings("abxycdxyxydefxy", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", only delimiter exists string:" << std::endl;
out.clear();
splitStrings("xy", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
}
Expected output:
Empty input:
0:
Non-empty input, empty delimiter:
0: Hello, world!
Non-empty input, non-empty delimiter, no delimiter in string:
0: abxycdxyxydefxya
Non-empty input, non-empty delimiter, delimiter exists string:
0: ab
1: cd
2: !!
3: def
4: a
Non-empty input, non-empty delimiter, delimiter exists string, input contains blank token:
0: ab
1: cd
2:
3: def
4: a
Non-empty input, non-empty delimiter, delimiter exists string, nothing after last delimiter:
0: ab
1: cd
2:
3: def
4:
Non-empty input, non-empty delimiter, only delimiter exists string:
0:
1:
One possible way of doing this is finding all occurrences of the split string and storing locations to a list. Then count input string characters and when you get to a position where there is a 'search hit' in the position list then you jump forward by 'length of the split string'. This approach takes a split string of any length. Here is my tested and working solution.
#include <iostream>
#include <string>
#include <list>
#include <vector>
using namespace std;
vector<string> Split(string input_string, string search_string)
{
list<int> search_hit_list;
vector<string> word_list;
size_t search_position, search_start = 0;
// Find start positions of every substring occurence and store positions to a hit list.
while ( (search_position = input_string.find(search_string, search_start) ) != string::npos) {
search_hit_list.push_back(search_position);
search_start = search_position + search_string.size();
}
// Iterate through hit list and reconstruct substring start and length positions
int character_counter = 0;
int start, length;
for (auto hit_position : search_hit_list) {
// Skip over substrings we are splitting with. This also skips over repeating substrings.
if (character_counter == hit_position) {
character_counter = character_counter + search_string.size();
continue;
}
start = character_counter;
character_counter = hit_position;
length = character_counter - start;
word_list.push_back(input_string.substr(start, length));
character_counter = character_counter + search_string.size();
}
// If the search string is not found in the input string, then return the whole input_string.
if (word_list.size() == 0) {
word_list.push_back(input_string);
return word_list;
}
// The last substring might be still be unprocessed, get it.
if (character_counter < input_string.size()) {
word_list.push_back(input_string.substr(character_counter, input_string.size() - character_counter));
}
return word_list;
}
int main() {
vector<string> word_list;
string search_string = " ";
// search_string = "the";
string text = "thetheThis is some text to test with the split-thethe function.";
word_list = Split(text, search_string);
for (auto item : word_list) {
cout << "'" << item << "'" << endl;
}
cout << endl;
}

Extracting Numbers from Mixed String using stringstream

I am trying to extract numbers from a string like Hello1234 using stringstream. I have written the code which works for extracting numbers when entered as apart from the string like:
Hello 1234 World 9876 Hello1234
gives 1234 9876 as output
but it doesn't read the mixed string which has both string and number. How can we extract it?
- For example: Hello1234 should give 1234.
Here is my code until now:
cout << "Welcome to the string stream program. " << endl;
string string1;
cout << "Enter a string with numbers and words: ";
getline(cin, string1);
stringstream ss; //intiazling string stream
ss << string1; //stores the string in stringstream
string temp; //string for reading words
int number; //int for reading integers
while(!ss.eof()) {
ss >> temp;
if (stringstream(temp) >> number) {
cout << "A number found is: " << number << endl;
}
}
If you're not limited to a solution that uses std::stringstream, I suggest you take a look at regular expressions. Example:
int main() {
std::string s = "Hello 123 World 456 Hello789";
std::regex regex(R"(\d+)"); // matches a sequence of digits
std::smatch match;
while (std::regex_search(s, match, regex)) {
std::cout << std::stoi(match.str()) << std::endl;
s = match.suffix();
}
}
The output:
123
456
789
Simply replace any alpha characters in the string with white-space before you do the stream extraction.
std::string str = "Hello 1234 World 9876 Hello1234";
for (char& c : str)
{
if (isalpha(c))
c = ' ';
}
std::stringstream ss(str);
int val;
while (ss >> val)
std::cout << val << "\n";
Output:
1234
9876
1234
Question itself is very trivial and as programmer most of us solving this kind of problem everyday. And we know there are many solution for any give problem but as programmer we try to find out best possible for any given problem.
When I came across this question there are already many useful and correct answer, but to satisfy my curiosity I try to benchmark all other solution, to find out best one.
I found best one out of all above, and feel that there is still some room for improvement.
So I am posting here my solution along with benchmark code.
#include <chrono>
#include <iostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
#define REQUIER_EQUAL(x, y) \
if ((x) != (y)) { \
std::cout << __PRETTY_FUNCTION__ << " failed at :" << __LINE__ \
<< std::endl \
<< "\tx:" << (x) << "\ty:" << (y) << std::endl; \
; \
}
#define RUN_FUNCTION(func, in, out) \
auto start = std::chrono::system_clock::now(); \
func(in, out); \
auto stop = std::chrono::system_clock::now(); \
std::cout << "Time in " << __PRETTY_FUNCTION__ << ":" \
<< std::chrono::duration_cast<std::chrono::microseconds>(stop - \
start) \
.count() \
<< " usec" << std::endl;
//Solution by #Evg
void getNumbers1(std::string input, std::vector<int> &output) {
std::regex regex(R"(\d+)"); // matches a sequence of digits
std::smatch match;
while (std::regex_search(input, match, regex)) {
output.push_back(std::stoi(match.str()));
input = match.suffix();
}
}
//Solution by #n314159
void getNumbers2(std::string input, std::vector<int> &output) {
std::stringstream ss;
int number;
for (const char c : input) {
if (std::isdigit(static_cast<unsigned char>(c))) { // Thanks to Aconcagua
ss << c;
} else if (ss >> number) {
output.push_back(number);
}
}
}
//Solution by #The Failure by Design
void getNumbers3(std::string input, std::vector<int> &output) {
istringstream is{input};
char c;
int n;
while (is.get(c)) {
if (!isdigit(static_cast<unsigned char>(c)))
continue;
is.putback(c);
is >> n;
output.push_back(n);
}
}
//Solution by #acraig5075
void getNumbers4(std::string input, std::vector<int> &output) {
for (char &c : input) {
if (isalpha(c))
c = ' ';
}
std::stringstream ss(input);
int val;
while (ss >> val)
output.push_back(val);
}
//Solution by me
void getNumbers5(std::string input, std::vector<int> &output) {
std::size_t start = std::string::npos, stop = std::string::npos;
for (auto i = 0; i < input.size(); ++i) {
if (isdigit(input.at(i))) {
if (start == std::string::npos) {
start = i;
}
} else {
if (start != std::string::npos) {
output.push_back(std::stoi(input.substr(start, i - start)));
start = std::string::npos;
}
}
}
if (start != std::string::npos)
output.push_back(std::stoi(input.substr(start, input.size() - start)));
}
void test1_getNumbers1() {
std::string input = "Hello 123 World 456 Hello789 ";
std::vector<int> output;
RUN_FUNCTION(getNumbers1, input, output);
REQUIER_EQUAL(output.size(), 3);
REQUIER_EQUAL(output[0], 123);
REQUIER_EQUAL(output[1], 456);
REQUIER_EQUAL(output[2], 789);
}
void test1_getNumbers2() {
std::string input = "Hello 123 World 456 Hello789";
std::vector<int> output;
RUN_FUNCTION(getNumbers2, input, output);
REQUIER_EQUAL(output.size(), 3);
REQUIER_EQUAL(output[0], 123);
REQUIER_EQUAL(output[1], 456);
REQUIER_EQUAL(output[2], 789);
}
void test1_getNumbers3() {
std::string input = "Hello 123 World 456 Hello789";
std::vector<int> output;
RUN_FUNCTION(getNumbers3, input, output);
REQUIER_EQUAL(output.size(), 3);
REQUIER_EQUAL(output[0], 123);
REQUIER_EQUAL(output[1], 456);
REQUIER_EQUAL(output[2], 789);
}
void test1_getNumbers4() {
std::string input = "Hello 123 World 456 Hello789";
std::vector<int> output;
RUN_FUNCTION(getNumbers4, input, output);
REQUIER_EQUAL(output.size(), 3);
REQUIER_EQUAL(output[0], 123);
REQUIER_EQUAL(output[1], 456);
REQUIER_EQUAL(output[2], 789);
}
void test1_getNumbers5() {
std::string input = "Hello 123 World 456 Hello789";
std::vector<int> output;
RUN_FUNCTION(getNumbers5, input, output);
REQUIER_EQUAL(output.size(), 3);
REQUIER_EQUAL(output[0], 123);
REQUIER_EQUAL(output[1], 456);
REQUIER_EQUAL(output[2], 789);
}
int main() {
test1_getNumbers1();
// test1_getNumbers2();
test1_getNumbers3();
test1_getNumbers4();
test1_getNumbers5();
return 0;
}
Sample output on my platform
Time in void test1_getNumbers1():703 usec
Time in void test1_getNumbers3():17 usec
Time in void test1_getNumbers4():10 usec
Time in void test1_getNumbers5():6 usec
Adding my version:
#include <iostream>
#include <string>
#include <sstream>
int main(){
std::string s;
std::getline(std::cin, s);
std::stringstream ss;
int number;
for(const char c: s){
if( std::isdigit(static_cast<unsigned char>(c)) ){ //Thanks to Aconcagua
ss << c;
} else if ( ss >> number ) {
std::cout << number << " found\n";
}
ss.clear();
}
if(ss >> number)
{
std::cout << number << " found\n";
}
return 0;
}
You can use the code below with any type of stream - stringstream included. It reads from stream to first digit. The digit is put back in the stream and then the number is read as usually. Live code.
#include <iostream>
using namespace std;
istream& get_number( istream& is, int& n )
{
while ( is && !isdigit( static_cast<unsigned char>( is.get() ) ) )
;
is.unget();
return is >> n;
}
int main()
{
int n;
while ( get_number( cin, n ) )
cout << n << ' ';
}
Notes
Regarding regex - It seems people are forgetting/ignoring the basics and, for some reason (c++ purism?), prefer the sledgehammer for even the most basic problems.
Regarding speed - If you take the stream out of the picture, you cannot beat fundamental c. The code below is tens of times faster than the regex solution and at least a couple of times faster than any answer so far.
const char* get_number( const char*& s, int& n )
{
// end of string
if ( !*s )
return 0;
// skip to first digit
while ( !isdigit( static_cast<unsigned char>( *s ) ) )
++s;
// convert
char* e;
n = strtol( s, &e, 10 );
return s = e;
}
//...
while ( get_number( s, n ) )
//...

C++ alternative of Java's split(str, -1) [duplicate]

What would be easiest method to split a string using c++11?
I've seen the method used by this post, but I feel that there ought to be a less verbose way of doing it using the new standard.
Edit: I would like to have a vector<string> as a result and be able to delimitate on a single character.
std::regex_token_iterator performs generic tokenization based on a regex. It may or may not be overkill for doing simple splitting on a single character, but it works and is not too verbose:
std::vector<std::string> split(const string& input, const string& regex) {
// passing -1 as the submatch index parameter performs splitting
std::regex re(regex);
std::sregex_token_iterator
first{input.begin(), input.end(), re, -1},
last;
return {first, last};
}
Here is a (maybe less verbose) way to split string (based on the post you mentioned).
#include <string>
#include <sstream>
#include <vector>
std::vector<std::string> split(const std::string &s, char delim) {
std::stringstream ss(s);
std::string item;
std::vector<std::string> elems;
while (std::getline(ss, item, delim)) {
elems.push_back(item);
// elems.push_back(std::move(item)); // if C++11 (based on comment from #mchiasson)
}
return elems;
}
Here's an example of splitting a string and populating a vector with the extracted elements using boost.
#include <boost/algorithm/string.hpp>
std::string my_input("A,B,EE");
std::vector<std::string> results;
boost::algorithm::split(results, my_input, boost::is_any_of(","));
assert(results[0] == "A");
assert(results[1] == "B");
assert(results[2] == "EE");
Another regex solution inspired by other answers but hopefully shorter and easier to read:
std::string s{"String to split here, and here, and here,..."};
std::regex regex{R"([\s,]+)"}; // split on space and comma
std::sregex_token_iterator it{s.begin(), s.end(), regex, -1};
std::vector<std::string> words{it, {}};
I don't know if this is less verbose, but it might be easier to grok for those more seasoned in dynamic languages such as javascript. The only C++11 features it uses is auto and range-based for loop.
#include <string>
#include <cctype>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
string s = "hello how are you won't you tell me your name";
vector<string> tokens;
string token;
for (const auto& c: s) {
if (!isspace(c))
token += c;
else {
if (token.length()) tokens.push_back(token);
token.clear();
}
}
if (token.length()) tokens.push_back(token);
return 0;
}
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
using namespace std;
vector<string> split(const string& str, int delimiter(int) = ::isspace){
vector<string> result;
auto e=str.end();
auto i=str.begin();
while(i!=e){
i=find_if_not(i,e, delimiter);
if(i==e) break;
auto j=find_if(i,e, delimiter);
result.push_back(string(i,j));
i=j;
}
return result;
}
int main(){
string line;
getline(cin,line);
vector<string> result = split(line);
for(auto s: result){
cout<<s<<endl;
}
}
My choice is boost::tokenizer but I didn't have any heavy tasks and test with huge data.
Example from boost doc with lambda modification:
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>
#include <vector>
int main()
{
using namespace std;
using namespace boost;
string s = "This is, a test";
vector<string> v;
tokenizer<> tok(s);
for_each (tok.begin(), tok.end(), [&v](const string & s) { v.push_back(s); } );
// result 4 items: 1)This 2)is 3)a 4)test
return 0;
}
This is my answer. Verbose, readable and efficient.
std::vector<std::string> tokenize(const std::string& s, char c) {
auto end = s.cend();
auto start = end;
std::vector<std::string> v;
for( auto it = s.cbegin(); it != end; ++it ) {
if( *it != c ) {
if( start == end )
start = it;
continue;
}
if( start != end ) {
v.emplace_back(start, it);
start = end;
}
}
if( start != end )
v.emplace_back(start, end);
return v;
}
#include <string>
#include <vector>
#include <sstream>
inline vector<string> split(const string& s) {
vector<string> result;
istringstream iss(s);
for (string w; iss >> w; )
result.push_back(w);
return result;
}
Here is a C++11 solution that uses only std::string::find(). The delimiter can be any number of characters long. Parsed tokens are output via an output iterator, which is typically a std::back_inserter in my code.
I have not tested this with UTF-8, but I expect it should work as long as the input and delimiter are both valid UTF-8 strings.
#include <string>
template<class Iter>
Iter splitStrings(const std::string &s, const std::string &delim, Iter out)
{
if (delim.empty()) {
*out++ = s;
return out;
}
size_t a = 0, b = s.find(delim);
for ( ; b != std::string::npos;
a = b + delim.length(), b = s.find(delim, a))
{
*out++ = std::move(s.substr(a, b - a));
}
*out++ = std::move(s.substr(a, s.length() - a));
return out;
}
Some test cases:
void test()
{
std::vector<std::string> out;
size_t counter;
std::cout << "Empty input:" << std::endl;
out.clear();
splitStrings("", ",", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, empty delimiter:" << std::endl;
out.clear();
splitStrings("Hello, world!", "", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", no delimiter in string:" << std::endl;
out.clear();
splitStrings("abxycdxyxydefxya", "xyz", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", delimiter exists string:" << std::endl;
out.clear();
splitStrings("abxycdxy!!xydefxya", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", delimiter exists string"
", input contains blank token:" << std::endl;
out.clear();
splitStrings("abxycdxyxydefxya", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", delimiter exists string"
", nothing after last delimiter:" << std::endl;
out.clear();
splitStrings("abxycdxyxydefxy", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", only delimiter exists string:" << std::endl;
out.clear();
splitStrings("xy", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
}
Expected output:
Empty input:
0:
Non-empty input, empty delimiter:
0: Hello, world!
Non-empty input, non-empty delimiter, no delimiter in string:
0: abxycdxyxydefxya
Non-empty input, non-empty delimiter, delimiter exists string:
0: ab
1: cd
2: !!
3: def
4: a
Non-empty input, non-empty delimiter, delimiter exists string, input contains blank token:
0: ab
1: cd
2:
3: def
4: a
Non-empty input, non-empty delimiter, delimiter exists string, nothing after last delimiter:
0: ab
1: cd
2:
3: def
4:
Non-empty input, non-empty delimiter, only delimiter exists string:
0:
1:
One possible way of doing this is finding all occurrences of the split string and storing locations to a list. Then count input string characters and when you get to a position where there is a 'search hit' in the position list then you jump forward by 'length of the split string'. This approach takes a split string of any length. Here is my tested and working solution.
#include <iostream>
#include <string>
#include <list>
#include <vector>
using namespace std;
vector<string> Split(string input_string, string search_string)
{
list<int> search_hit_list;
vector<string> word_list;
size_t search_position, search_start = 0;
// Find start positions of every substring occurence and store positions to a hit list.
while ( (search_position = input_string.find(search_string, search_start) ) != string::npos) {
search_hit_list.push_back(search_position);
search_start = search_position + search_string.size();
}
// Iterate through hit list and reconstruct substring start and length positions
int character_counter = 0;
int start, length;
for (auto hit_position : search_hit_list) {
// Skip over substrings we are splitting with. This also skips over repeating substrings.
if (character_counter == hit_position) {
character_counter = character_counter + search_string.size();
continue;
}
start = character_counter;
character_counter = hit_position;
length = character_counter - start;
word_list.push_back(input_string.substr(start, length));
character_counter = character_counter + search_string.size();
}
// If the search string is not found in the input string, then return the whole input_string.
if (word_list.size() == 0) {
word_list.push_back(input_string);
return word_list;
}
// The last substring might be still be unprocessed, get it.
if (character_counter < input_string.size()) {
word_list.push_back(input_string.substr(character_counter, input_string.size() - character_counter));
}
return word_list;
}
int main() {
vector<string> word_list;
string search_string = " ";
// search_string = "the";
string text = "thetheThis is some text to test with the split-thethe function.";
word_list = Split(text, search_string);
for (auto item : word_list) {
cout << "'" << item << "'" << endl;
}
cout << endl;
}

C++ split string using a list of words as separators

I would like to split a string like this one
“this1245is#g$0,therhsuidthing345”
using a list of words like the one bellow
{“this”, “is”, “the”, “thing”}
into this list
{“this”, “1245”, “is”, “#g$0,”, “the”, “rhsuid”, “thing”, “345”}
// ^--------------^---------------^------------------^-- these were the delimiters
The delimiters are allowed to appear more than once in the string to split, and it can be done using regular expressions
The precedence is in the order in which the delimiters appear in the array
The platform I'm developing for has no support for the Boost library
Update
This is what I have for the moment
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("this1245is#g$0,therhsuidthing345");
std::string delimiters[] = {"this", "is", "the", "thing"};
for (int i=0; i<4; i++) {
std::string delimiter = "(" + delimiters[i] + ")(.*)";
std::regex e (delimiter); // matches words beginning by the i-th delimiter
// default constructor = end-of-sequence:
std::sregex_token_iterator rend;
std::cout << "1st and 2nd submatches:";
int submatches[] = { 1, 2 };
std::sregex_token_iterator c ( s.begin(), s.end(), e, submatches );
while (c!=rend) std::cout << " [" << *c++ << "]";
std::cout << std::endl;
}
return 0;
}
output:
1st and 2nd submatches:[this][x1245fisA#g$0,therhsuidthing345]
1st and 2nd submatches:[is][x1245fisA#g$0,therhsuidthing345]
1st and 2nd submatches:[the][rhsuidthing345]
1st and 2nd submatches:[thing][345]
I think I need to make some recursive thing to call on each iteration
Build the expression you want for matches only (re), then pass in {-1, 0} to your std::sregex_token_iterator to return all non-matches (-1) and matches (0).
#include <iostream>
#include <regex>
int main() {
std::string s("this1245is#g$0,therhsuidthing345");
std::regex re("(this|is|the|thing)");
std::sregex_token_iterator iter(s.begin(), s.end(), re, { -1, 0 });
std::sregex_token_iterator end;
while (iter != end) {
//Works in vc13, clang requires you increment separately,
//haven't gone into implementation to see if/how ssub_match is affected.
//Workaround: increment separately.
//std::cout << "[" << *iter++ << "] ";
std::cout << "[" << *iter << "] ";
++iter;
}
}
I don't know how to perform the precedence requirement. This seems to work on the given input:
std::vector<std::string> parse (std::string s)
{
std::vector<std::string> out;
std::regex re("\(this|is|the|thing).*");
std::string word;
auto i = s.begin();
while (i != s.end()) {
std::match_results<std::string::iterator> m;
if (std::regex_match(i, s.end(), m, re)) {
if (!word.empty()) {
out.push_back(word);
word.clear();
}
out.push_back(std::string(m[1].first, m[1].second));
i += out.back().size();
} else {
word += *i++;
}
}
if (!word.empty()) {
out.push_back(word);
}
return out;
}
vector<string> strs;
boost::split(strs,line,boost::is_space());