extracting last 2 words from a sequence of strings, space-separated - c++

I have an arbitrary sequence of strings (a sentence, say) and I want to extract the last two.
For example,
sdfsdfds sdfs dfsd fgsd 3 dsfds should produce: 3 dsfds
sdfsd (dfgdg)gfdg fg 6 gg should produce: 6 gg

You can use std::string::find_last_of function to find spaces.
#include <iostream>
#include <string>
int main()
{
    std::string test = "sdfsdfds sdfs dfsd fgsd 3 dsfds";
    // position of the last space
    size_t found1 = test.find_last_of( " " );
    if ( found1 != std::string::npos ) {
        // position of the space before it
        size_t found2 = test.find_last_of( " ", found1-1 );
        if ( found2 != std::string::npos )
            std::cout << test.substr(found2+1, found1-found2-1) << std::endl;
        std::cout << test.substr(found1+1) << std::endl;
    }
    return 0;
}

The following will work if your strings are whitespace separated.
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
using namespace std;
int main()
{
string str = "jfdf fhfeif shfowejef dhfojfe";
stringstream sstr(str);
vector<string> vstr;
while(sstr >> str)
{
vstr.push_back(str);
}
if (vstr.size() >= 2)
cout << vstr[vstr.size()-2] << ' ';
if (vstr.size())
cout << vstr[vstr.size()-1] << endl;
return 0;
}

This prints the two words in reverse order, but if that doesn't matter:
std::string s ("some words here");
std::string::size_type j;
for(int i=0; i<2; ++i) {
if((j = s.find_last_of(' ')) == std::string::npos) {
// there aren't two strings, throw, return, or do something else
return 0;
}
std::cout << s.c_str()+j+1;
s = " " + s.substr(0,j);
}
Alternatively,
struct extract_two_words {
    friend std::istream& operator>> (std::istream& in, extract_two_words& etw);
    std::string word1; // second-to-last word
    std::string word2; // last word
};
std::istream& operator>> (std::istream& in, extract_two_words& etw) {
    std::string word;
    // keep a sliding window over the last two words read
    while (in >> word) {
        etw.word1 = etw.word2;
        etw.word2 = word;
    }
    return in;
}

I would encourage you to have a look at the Boost library. It has algorithms and data structures that help you tremendously. Here's how to solve your problem using Boost.StringAlgo:
#include <boost/algorithm/string/split.hpp>
#include <iostream>
#include <vector>
#include <string>
int main()
{
std::string test = "sdfsdfds sdfs dfsd fgsd 3 dsfds";
std::vector<std::string> v;
boost::algorithm::split(v, test, [](char c) { return c==' ';});
std::cout << "Second to last: " << v.at(v.size()-2) << std::endl;
std::cout << "Last: " << v.at(v.size()-1) << std::endl;
}
I would also encourage you to use the vector::at method instead of []. at checks the index and throws std::out_of_range when it is invalid, whereas [] with a bad index is undefined behavior.
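To make the difference concrete, here is a minimal sketch. The helper name second_to_last is hypothetical and not part of the answer above; it only illustrates that at reports a bad index as an exception you can handle.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the second-to-last element, or `fallback`
// when the vector holds fewer than two elements.
std::string second_to_last(const std::vector<std::string>& v,
                           const std::string& fallback) {
    try {
        // With fewer than two elements, v.size() - 2 wraps around to a huge
        // unsigned value. at() detects that and throws; [] would not check.
        return v.at(v.size() - 2);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

Checking v.size() >= 2 up front is usually cleaner than catching the exception; the try/catch is used here only to show what at does.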

#include <iostream>
#include <string>
int main()
{
    std::string test = "sdfsdfds sdfs dfsd fgsd 3 dsfds";
    size_t pos = test.length();
    // step back over the last two spaces
    for (int i=0; i < 2; i++)
        pos = test.find_last_of(" ", pos-1);
    std::cout << test.substr(pos+1) << std::endl;
}
Simpler :)
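The find_last_of snippets above assume that the input has no trailing spaces and contains at least two words. A minimal sketch that covers those cases as well, assuming the space character is the only separator; last_two_words is a hypothetical helper name:

```cpp
#include <string>
#include <utility>

// Hypothetical helper: returns {second-to-last word, last word}.
// Missing words come back as empty strings. Only ' ' separates words.
std::pair<std::string, std::string> last_two_words(const std::string& text) {
    std::pair<std::string, std::string> result;
    // End of the last word: skip any trailing spaces.
    std::string::size_type end = text.find_last_not_of(' ');
    if (end == std::string::npos) return result;   // empty or all spaces
    std::string::size_type sep = text.find_last_of(' ', end);
    // If sep is npos the word starts at index 0: npos + 1 wraps to 0 and
    // end - npos wraps to end + 1, so the same expression still works.
    result.second = text.substr(sep + 1, end - sep);
    if (sep == std::string::npos) return result;   // only one word
    // Repeat the search for the word before that separator.
    end = text.find_last_not_of(' ', sep);
    if (end == std::string::npos) return result;   // only spaces before it
    sep = text.find_last_of(' ', end);
    result.first = text.substr(sep + 1, end - sep);
    return result;
}
```

For the question's first example this returns "3" and "dsfds"; for a single word such as "one  " it returns an empty first element and "one".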

Related

C++ String Stream

I'm just learning how to use streams in C++ and I have one question.
I thought that each stream has a state that converts to true or false. I want to print each word from the string below, followed by 1 for as long as a word could be read, but I get an error:
cannot convert 'std::istringstream {aka std::__cxx11::basic_istringstream<char>}' to 'bool' in initialization
bool canReadMore = textIn;
It should be like:
antilope
1
ant
1
antagonist
1
antidepressant
1
What am I doing wrong?
int main() {
std:: string text = "antilope ant antagonist antidepressant";
std:: istringstream textIn(text);
for(int i = 0; i < 5; i++ ){
std:: string s;
textIn >> s;
bool canReadMore = textIn;
std::cout << s << std:: endl;
std::cout << canReadMore << std:: endl;
}
return 0;
}
Since C++11, the stream's operator bool is explicit, so a copy-initialization such as bool canReadMore = textIn; no longer compiles. You must write the conversion yourself:
#include <iostream>
#include <sstream>
#include <string>
int main() {
std::string text = "antilope ant antagonist antidepressant";
std::istringstream textIn(text);
for (int i = 0; i < 5; i++) {
std::string s;
textIn >> s;
bool canReadMore = bool(textIn);
std::cout << s << std::endl;
std::cout << canReadMore << std::endl;
}
return 0;
}
Output:
./a.out
antilope
1
ant
1
antagonist
1
antidepressant
1
0
If you use the stream in a boolean context, such as the condition of an if or while, the conversion is applied automatically. This is the idiomatic use:
#include <iostream>
#include <sstream>
#include <string>
int main() {
std::string text = "antilope ant antagonist antidepressant";
std::istringstream textIn(text);
std::string s;
while (textIn >> s) {
std::cout << s << "\n";
}
}
Output:
antilope
ant
antagonist
antidepressant
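The two situations can be put side by side. This is a sketch with hypothetical helper names (count_words, can_read_word): a loop condition converts the stream implicitly, while a return statement, like the initialization in the question, needs the conversion spelled out.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts whitespace-separated words.
std::size_t count_words(const std::string& text) {
    std::istringstream in(text);
    std::string word;
    std::size_t n = 0;
    while (in >> word)   // condition: contextual conversion, no cast needed
        ++n;
    return n;
}

// Hypothetical helper: tries to read one word and reports success.
bool can_read_word(std::istringstream& in) {
    std::string word;
    in >> word;
    // `return in;` would not compile because operator bool is explicit.
    return static_cast<bool>(in);   // same result as !in.fail()
}
```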

I'm trying to split a string by whitespace using C++, where the data comes from a database [duplicate]

What would be the easiest method to split a string using C++11?
I've seen the method used by this post, but I feel that there ought to be a less verbose way of doing it using the new standard.
Edit: I would like to have a vector<string> as a result and be able to delimit on a single character.
std::regex_token_iterator performs generic tokenization based on a regex. It may or may not be overkill for doing simple splitting on a single character, but it works and is not too verbose:
#include <regex>
#include <string>
#include <vector>
std::vector<std::string> split(const std::string& input, const std::string& regex) {
    // passing -1 as the submatch index parameter performs splitting
    std::regex re(regex);
    std::sregex_token_iterator
        first{input.begin(), input.end(), re, -1},
        last;
    return {first, last};
}
Here is a (maybe less verbose) way to split a string (based on the post you mentioned).
#include <string>
#include <sstream>
#include <vector>
std::vector<std::string> split(const std::string &s, char delim) {
std::stringstream ss(s);
std::string item;
std::vector<std::string> elems;
while (std::getline(ss, item, delim)) {
elems.push_back(item);
// elems.push_back(std::move(item)); // if C++11 (based on comment from @mchiasson)
}
return elems;
}
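One detail of the getline loop worth knowing: it keeps empty fields between adjacent delimiters, but it drops the empty field after a trailing delimiter. If that last field matters, a sketch along these lines could restore it. The name split_keep_trailing is hypothetical and not from the answer:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Sketch: the same getline loop, plus the trailing empty field that
// std::getline does not report when the input ends with the delimiter.
std::vector<std::string> split_keep_trailing(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::istringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    if (!s.empty() && s.back() == delim) {
        elems.push_back(std::string());   // field after the final delimiter
    }
    return elems;
}
```

With this, the input "a,,b," yields four fields: "a", an empty string, "b", and another empty string. An empty input still yields no fields at all.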
Here's an example of splitting a string and populating a vector with the extracted elements using boost.
#include <boost/algorithm/string.hpp>
std::string my_input("A,B,EE");
std::vector<std::string> results;
boost::algorithm::split(results, my_input, boost::is_any_of(","));
assert(results[0] == "A");
assert(results[1] == "B");
assert(results[2] == "EE");
Another regex solution inspired by other answers but hopefully shorter and easier to read:
std::string s{"String to split here, and here, and here,..."};
std::regex regex{R"([\s,]+)"}; // split on space and comma
std::sregex_token_iterator it{s.begin(), s.end(), regex, -1};
std::vector<std::string> words{it, {}};
I don't know if this is less verbose, but it might be easier to grok for those more seasoned in dynamic languages such as JavaScript. The only C++11 features it uses are auto and the range-based for loop.
#include <string>
#include <cctype>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
string s = "hello how are you won't you tell me your name";
vector<string> tokens;
string token;
for (const auto& c: s) {
if (!isspace(static_cast<unsigned char>(c)))
token += c;
else {
if (token.length()) tokens.push_back(token);
token.clear();
}
}
if (token.length()) tokens.push_back(token);
return 0;
}
#include <cctype>
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
using namespace std;
vector<string> split(const string& str, int delimiter(int) = ::isspace){
vector<string> result;
auto e=str.end();
auto i=str.begin();
while(i!=e){
i=find_if_not(i,e, delimiter);
if(i==e) break;
auto j=find_if(i,e, delimiter);
result.push_back(string(i,j));
i=j;
}
return result;
}
int main(){
string line;
getline(cin,line);
vector<string> result = split(line);
for(auto s: result){
cout<<s<<endl;
}
}
My choice is boost::tokenizer, though I have not used it for heavy workloads or tested it with large inputs.
Example from the Boost documentation, modified to use a lambda:
#include <algorithm>
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>
#include <vector>
int main()
{
using namespace std;
using namespace boost;
string s = "This is, a test";
vector<string> v;
tokenizer<> tok(s);
for_each (tok.begin(), tok.end(), [&v](const string & s) { v.push_back(s); } );
// result 4 items: 1)This 2)is 3)a 4)test
return 0;
}
This is my answer. Verbose, readable and efficient.
std::vector<std::string> tokenize(const std::string& s, char c) {
auto end = s.cend();
auto start = end;
std::vector<std::string> v;
for( auto it = s.cbegin(); it != end; ++it ) {
if( *it != c ) {
if( start == end )
start = it;
continue;
}
if( start != end ) {
v.emplace_back(start, it);
start = end;
}
}
if( start != end )
v.emplace_back(start, end);
return v;
}
#include <string>
#include <vector>
#include <sstream>
inline std::vector<std::string> split(const std::string& s) {
    std::vector<std::string> result;
    std::istringstream iss(s);
    // operator>> skips any run of whitespace between words
    for (std::string w; iss >> w; )
        result.push_back(w);
    return result;
}
Here is a C++11 solution that uses only std::string::find(). The delimiter can be any number of characters long. Parsed tokens are output via an output iterator, which is typically a std::back_inserter in my code.
I have not tested this with UTF-8, but I expect it should work as long as the input and delimiter are both valid UTF-8 strings.
#include <string>
template<class Iter>
Iter splitStrings(const std::string &s, const std::string &delim, Iter out)
{
if (delim.empty()) {
*out++ = s;
return out;
}
size_t a = 0, b = s.find(delim);
for ( ; b != std::string::npos;
a = b + delim.length(), b = s.find(delim, a))
{
*out++ = s.substr(a, b - a);
}
*out++ = s.substr(a, s.length() - a);
return out;
}
Some test cases:
void test()
{
std::vector<std::string> out;
size_t counter;
std::cout << "Empty input:" << std::endl;
out.clear();
splitStrings("", ",", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, empty delimiter:" << std::endl;
out.clear();
splitStrings("Hello, world!", "", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", no delimiter in string:" << std::endl;
out.clear();
splitStrings("abxycdxyxydefxya", "xyz", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", delimiter exists string:" << std::endl;
out.clear();
splitStrings("abxycdxy!!xydefxya", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", delimiter exists string"
", input contains blank token:" << std::endl;
out.clear();
splitStrings("abxycdxyxydefxya", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", delimiter exists string"
", nothing after last delimiter:" << std::endl;
out.clear();
splitStrings("abxycdxyxydefxy", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
std::cout << "Non-empty input, non-empty delimiter"
", only delimiter exists string:" << std::endl;
out.clear();
splitStrings("xy", "xy", std::back_inserter(out));
counter = 0;
for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
std::cout << counter << ": " << *i << std::endl;
}
}
Expected output:
Empty input:
0:
Non-empty input, empty delimiter:
0: Hello, world!
Non-empty input, non-empty delimiter, no delimiter in string:
0: abxycdxyxydefxya
Non-empty input, non-empty delimiter, delimiter exists string:
0: ab
1: cd
2: !!
3: def
4: a
Non-empty input, non-empty delimiter, delimiter exists string, input contains blank token:
0: ab
1: cd
2:
3: def
4: a
Non-empty input, non-empty delimiter, delimiter exists string, nothing after last delimiter:
0: ab
1: cd
2:
3: def
4:
Non-empty input, non-empty delimiter, only delimiter exists string:
0:
1:
One possible way of doing this is finding all occurrences of the split string and storing locations to a list. Then count input string characters and when you get to a position where there is a 'search hit' in the position list then you jump forward by 'length of the split string'. This approach takes a split string of any length. Here is my tested and working solution.
#include <iostream>
#include <string>
#include <list>
#include <vector>
using namespace std;
vector<string> Split(string input_string, string search_string)
{
list<int> search_hit_list;
vector<string> word_list;
size_t search_position, search_start = 0;
// Find start positions of every substring occurrence and store positions to a hit list.
while ( (search_position = input_string.find(search_string, search_start) ) != string::npos) {
search_hit_list.push_back(search_position);
search_start = search_position + search_string.size();
}
// Iterate through hit list and reconstruct substring start and length positions
int character_counter = 0;
int start, length;
for (auto hit_position : search_hit_list) {
// Skip over substrings we are splitting with. This also skips over repeating substrings.
if (character_counter == hit_position) {
character_counter = character_counter + search_string.size();
continue;
}
start = character_counter;
character_counter = hit_position;
length = character_counter - start;
word_list.push_back(input_string.substr(start, length));
character_counter = character_counter + search_string.size();
}
// If the search string is not found in the input string, then return the whole input_string.
if (word_list.size() == 0) {
word_list.push_back(input_string);
return word_list;
}
// The last substring might still be unprocessed, get it.
if (character_counter < input_string.size()) {
word_list.push_back(input_string.substr(character_counter, input_string.size() - character_counter));
}
return word_list;
}
int main() {
vector<string> word_list;
string search_string = " ";
// search_string = "the";
string text = "thetheThis is some text to test with the split-thethe function.";
word_list = Split(text, search_string);
for (auto item : word_list) {
cout << "'" << item << "'" << endl;
}
cout << endl;
}

Extracting Numbers from Mixed String using stringstream

I am trying to extract numbers from a string like Hello1234 using stringstream. I have written code that works when the numbers are separated from the words, for example:
Hello 1234 World 9876 Hello1234
gives 1234 9876 as output
but it doesn't read the mixed string which has both string and number. How can we extract it?
- For example: Hello1234 should give 1234.
Here is my code until now:
cout << "Welcome to the string stream program. " << endl;
string string1;
cout << "Enter a string with numbers and words: ";
getline(cin, string1);
stringstream ss; //initializing string stream
ss << string1; //stores the string in stringstream
string temp; //string for reading words
int number; //int for reading integers
while(!ss.eof()) {
ss >> temp;
if (stringstream(temp) >> number) {
cout << "A number found is: " << number << endl;
}
}
If you're not limited to a solution that uses std::stringstream, I suggest you take a look at regular expressions. Example:
#include <iostream>
#include <regex>
#include <string>
int main() {
    std::string s = "Hello 123 World 456 Hello789";
    std::regex regex(R"(\d+)"); // matches a sequence of digits
    std::smatch match;
    while (std::regex_search(s, match, regex)) {
        std::cout << std::stoi(match.str()) << std::endl;
        s = match.suffix(); // continue after the current match
    }
}
The output:
123
456
789
Simply replace any alpha characters in the string with white-space before you do the stream extraction.
std::string str = "Hello 1234 World 9876 Hello1234";
for (char& c : str)
{
if (isalpha(c))
c = ' ';
}
std::stringstream ss(str);
int val;
while (ss >> val)
std::cout << val << "\n";
Output:
1234
9876
1234
The question itself is fairly trivial, and as programmers most of us solve this kind of problem every day. There are many solutions to any given problem, but as programmers we try to find the best one.
When I came across this question there were already many useful and correct answers, but to satisfy my curiosity I benchmarked the other solutions to find the best one.
I found the best of those above, and felt that there was still some room for improvement.
So I am posting my solution here along with the benchmark code.
#include <chrono>
#include <iostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
#define REQUIER_EQUAL(x, y) \
if ((x) != (y)) { \
std::cout << __PRETTY_FUNCTION__ << " failed at :" << __LINE__ \
<< std::endl \
<< "\tx:" << (x) << "\ty:" << (y) << std::endl; \
; \
}
#define RUN_FUNCTION(func, in, out) \
auto start = std::chrono::system_clock::now(); \
func(in, out); \
auto stop = std::chrono::system_clock::now(); \
std::cout << "Time in " << __PRETTY_FUNCTION__ << ":" \
<< std::chrono::duration_cast<std::chrono::microseconds>(stop - \
start) \
.count() \
<< " usec" << std::endl;
//Solution by @Evg
void getNumbers1(std::string input, std::vector<int> &output) {
std::regex regex(R"(\d+)"); // matches a sequence of digits
std::smatch match;
while (std::regex_search(input, match, regex)) {
output.push_back(std::stoi(match.str()));
input = match.suffix();
}
}
//Solution by @n314159
void getNumbers2(std::string input, std::vector<int> &output) {
std::stringstream ss;
int number;
for (const char c : input) {
if (std::isdigit(static_cast<unsigned char>(c))) { // Thanks to Aconcagua
ss << c;
} else if (ss >> number) {
output.push_back(number);
}
}
}
//Solution by @The Failure by Design
void getNumbers3(std::string input, std::vector<int> &output) {
istringstream is{input};
char c;
int n;
while (is.get(c)) {
if (!isdigit(static_cast<unsigned char>(c)))
continue;
is.putback(c);
is >> n;
output.push_back(n);
}
}
//Solution by @acraig5075
void getNumbers4(std::string input, std::vector<int> &output) {
for (char &c : input) {
if (isalpha(c))
c = ' ';
}
std::stringstream ss(input);
int val;
while (ss >> val)
output.push_back(val);
}
//Solution by me
void getNumbers5(std::string input, std::vector<int> &output) {
std::size_t start = std::string::npos, stop = std::string::npos;
for (auto i = 0; i < input.size(); ++i) {
if (isdigit(input.at(i))) {
if (start == std::string::npos) {
start = i;
}
} else {
if (start != std::string::npos) {
output.push_back(std::stoi(input.substr(start, i - start)));
start = std::string::npos;
}
}
}
if (start != std::string::npos)
output.push_back(std::stoi(input.substr(start, input.size() - start)));
}
void test1_getNumbers1() {
std::string input = "Hello 123 World 456 Hello789 ";
std::vector<int> output;
RUN_FUNCTION(getNumbers1, input, output);
REQUIER_EQUAL(output.size(), 3);
REQUIER_EQUAL(output[0], 123);
REQUIER_EQUAL(output[1], 456);
REQUIER_EQUAL(output[2], 789);
}
void test1_getNumbers2() {
std::string input = "Hello 123 World 456 Hello789";
std::vector<int> output;
RUN_FUNCTION(getNumbers2, input, output);
REQUIER_EQUAL(output.size(), 3);
REQUIER_EQUAL(output[0], 123);
REQUIER_EQUAL(output[1], 456);
REQUIER_EQUAL(output[2], 789);
}
void test1_getNumbers3() {
std::string input = "Hello 123 World 456 Hello789";
std::vector<int> output;
RUN_FUNCTION(getNumbers3, input, output);
REQUIER_EQUAL(output.size(), 3);
REQUIER_EQUAL(output[0], 123);
REQUIER_EQUAL(output[1], 456);
REQUIER_EQUAL(output[2], 789);
}
void test1_getNumbers4() {
std::string input = "Hello 123 World 456 Hello789";
std::vector<int> output;
RUN_FUNCTION(getNumbers4, input, output);
REQUIER_EQUAL(output.size(), 3);
REQUIER_EQUAL(output[0], 123);
REQUIER_EQUAL(output[1], 456);
REQUIER_EQUAL(output[2], 789);
}
void test1_getNumbers5() {
std::string input = "Hello 123 World 456 Hello789";
std::vector<int> output;
RUN_FUNCTION(getNumbers5, input, output);
REQUIER_EQUAL(output.size(), 3);
REQUIER_EQUAL(output[0], 123);
REQUIER_EQUAL(output[1], 456);
REQUIER_EQUAL(output[2], 789);
}
int main() {
test1_getNumbers1();
// test1_getNumbers2();
test1_getNumbers3();
test1_getNumbers4();
test1_getNumbers5();
return 0;
}
Sample output on my platform
Time in void test1_getNumbers1():703 usec
Time in void test1_getNumbers3():17 usec
Time in void test1_getNumbers4():10 usec
Time in void test1_getNumbers5():6 usec
Adding my version:
#include <iostream>
#include <string>
#include <sstream>
int main(){
std::string s;
std::getline(std::cin, s);
std::stringstream ss;
int number;
for(const char c: s){
if( std::isdigit(static_cast<unsigned char>(c)) ){ //Thanks to Aconcagua
ss << c;
} else if ( ss >> number ) {
std::cout << number << " found\n";
}
ss.clear();
}
if(ss >> number)
{
std::cout << number << " found\n";
}
return 0;
}
You can use the code below with any type of stream, stringstream included. It reads from the stream up to the first digit. The digit is put back in the stream and then the number is read as usual.
#include <cctype>
#include <iostream>
using namespace std;
istream& get_number( istream& is, int& n )
{
while ( is && !isdigit( static_cast<unsigned char>( is.get() ) ) )
;
is.unget();
return is >> n;
}
int main()
{
int n;
while ( get_number( cin, n ) )
cout << n << ' ';
}
Notes
Regarding regex - It seems people are forgetting/ignoring the basics and, for some reason (c++ purism?), prefer the sledgehammer for even the most basic problems.
Regarding speed - If you take the stream out of the picture, you cannot beat fundamental c. The code below is tens of times faster than the regex solution and at least a couple of times faster than any answer so far.
const char* get_number( const char*& s, int& n )
{
    // skip to first digit, stopping at the end of the string
    while ( *s && !isdigit( static_cast<unsigned char>( *s ) ) )
        ++s;
    // no digits left
    if ( !*s )
        return 0;
    // convert
    char* e;
    n = strtol( s, &e, 10 );
    return s = e;
}
//...
while ( get_number( s, n ) )
//...
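The elided loop could be filled in roughly as follows. This is a sketch only: sum_numbers is a hypothetical driver, and the scanning function is restated here, using long and an explicit end-of-string check, so that the block stands on its own.

```cpp
#include <cctype>
#include <cstdlib>

// Advances s to just past the next run of digits and stores its value in n.
// Returns nullptr when no digits remain.
const char* get_number(const char*& s, long& n) {
    // skip to the first digit, stopping at the terminator
    while (*s && !std::isdigit(static_cast<unsigned char>(*s)))
        ++s;
    if (!*s)
        return nullptr;
    char* e;
    n = std::strtol(s, &e, 10);
    return s = e;
}

// Hypothetical driver: adds up every number embedded in the text.
long sum_numbers(const char* s) {
    long total = 0, n = 0;
    while (get_number(s, n))
        total += n;
    return total;
}
```

Note that a leading minus sign is skipped like any other non-digit, so this variant only ever sees non-negative values.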

C++ alternative of Java's split(str, -1) [duplicate]


Reading from file separated with semicolons and storing into array

I am completely lost and have been trying for hours to read from a file named "movies.txt" and store its contents in arrays; the semicolon separators are what trip me up. Any help? Thanks.
movies.txt:
The Avengers ; 2012 ; 89 ; 623357910.79
Guardians of the Galaxy ; 2014 ; 96 ; 333130696.46
Code:
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
struct Movie {
std::string name;
int year;
int rating;
double earnings;
};
int main()
{
const int MAX_SIZE = 100;
Movie movieList[MAX_SIZE];
std::string line;
int i = 0;
std::ifstream movieFile;
movieFile.open("movies.txt");
while (getline(movieFile, line, ';'))
{
movieFile >> movieList[i].name >> movieList[i].year >> movieList[i].rating >> movieList[i].earnings;
i++;
}
movieFile.close();
std::cout << movieList[0].name << " " << movieList[0].year << " " << movieList[0].rating << " " << movieList[0].earnings << std::endl;
std::cout << movieList[1].name << " " << movieList[1].year << " " << movieList[1].rating << " " << movieList[1].earnings << std::endl;
return 0;
}
What I want is to have:
movieList[0].name = "The Avengers";
movieList[0].year = 2012;
movieList[0].rating = 89;
movieList[0].earnings = 623357910.79;
movieList[1].name = "Guardians of the Galaxy";
movieList[1].year = 2014;
movieList[1].rating = 96;
movieList[1].earnings = 333130696.46;
I amended your code.
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
struct Movie {
std::string name;
int year;
int rating;
double earnings;
};
std::vector<std::string>
split(const std::string &s, char delim = ',')
{
std::vector<std::string> elems;
std::stringstream ss(s);
std::string item;
while (std::getline(ss, item, delim))
{
elems.push_back(item);
}
return elems;
}
int main()
{
std::vector<Movie> movieList;
std::string line;
std::ifstream movieFile;
movieFile.open("movies.txt");
while (getline(movieFile, line))
{
std::vector<std::string> columns = split(line,';');
Movie movie;
movie.name = columns[0];
movie.year = std::stoi(columns[1]);
movie.rating = std::stoi(columns[2]);
movie.earnings = std::stod(columns[3]);
movieList.push_back(movie);
}
movieFile.close();
for (const Movie & m: movieList)
{
std::cout << m.name << " " << m.year << " " << m.rating << " " << m.earnings << std::endl;
}
return 0;
}
Basically, I added a split function that splits each line on ';'. I also use a std::vector to store the movies rather than a fixed-size array, which removes the hard-coded limit.
P.S. Second version without vectors
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
struct Movie {
std::string name;
int year;
int rating;
double earnings;
};
void split(const std::string &s, char delim, std::string elems[])
{
std::stringstream ss(s);
std::string item;
int i = 0;
while (std::getline(ss, item, delim))
{
elems[i++] = item;
}
}
int main()
{
//std::vector<Movie> movieList;
const int MAX_SIZE = 100;
Movie movieList[MAX_SIZE];
int movieNo = 0;
std::string line;
std::ifstream movieFile;
movieFile.open("movies.txt");
std::string columns[4];
while (getline(movieFile, line))
{
split(line,';', columns);
movieList[movieNo].name = columns[0];
movieList[movieNo].year = std::stoi(columns[1]);
movieList[movieNo].rating = std::stoi(columns[2]);
movieList[movieNo].earnings = std::stod(columns[3]);
++movieNo;
}
movieFile.close();
for (int i =0; i < movieNo; ++i) {
std::cout << movieList[i].name
<< " "
<< movieList[i].year
<< " "
<< movieList[i].rating
<< " "
<< movieList[i].earnings
<< std::endl;
}
return 0;
}
Use getline(movieFile, movie_name, ';') to get the name of the movie up to the ;.
You'll need to remove the trailing whitespace from the name if necessary; examples are easy to find.
Read the rest of the line using getline(movieFile, line).
Use std::replace to replace every ; in line with a space.
Put line into a std::stringstream.
Then extract the remaining fields from the stringstream using the >> operator.
Put this in a loop: do { ... } while (movieFile);
Also, don't hardcode an arbitrary number of movies. Use a std::vector<Movie> and push_back to add new ones.
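The steps above can be sketched as one function. The name read_movies is hypothetical, the Movie struct is repeated from the question so the block is self-contained, and a while loop is used in place of do/while so that a failed read never produces a record.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Movie {
    std::string name;
    int year;
    int rating;
    double earnings;
};

// Hypothetical helper following the steps listed above.
std::vector<Movie> read_movies(std::istream& in) {
    std::vector<Movie> movies;
    Movie m;
    std::string rest;
    // The name is everything up to the first ';'.
    while (std::getline(in, m.name, ';')) {
        // Trim trailing spaces and tabs from the name.
        const std::string::size_type last = m.name.find_last_not_of(" \t");
        m.name.erase(last == std::string::npos ? 0 : last + 1);
        // Read the rest of the line.
        if (!std::getline(in, rest))
            break;
        // Turn the remaining ';' separators into spaces.
        std::replace(rest.begin(), rest.end(), ';', ' ');
        // Extract the numeric fields with >>.
        std::istringstream fields(rest);
        if (fields >> m.year >> m.rating >> m.earnings)
            movies.push_back(m);
    }
    return movies;
}
```

Passing a std::ifstream opened on movies.txt works the same way as the string stream used for testing, since both are std::istream objects.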
I think you want to break your line into tokens using something like std::strtok. The example in its reference documentation uses a blank as the separator; you would use a semicolon.
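A minimal sketch of that idea, assuming one line of the file has already been read into a std::string. The name tokenize_line is hypothetical. std::strtok overwrites delimiters in the buffer it scans, so it is given a mutable copy of the line.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits a line on ';' using std::strtok.
std::vector<std::string> tokenize_line(const std::string& line) {
    // strtok writes '\0' over each delimiter, so work on a copy.
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), ";"); tok != nullptr;
         tok = std::strtok(nullptr, ";")) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

The tokens keep the spaces that surround each semicolon in movies.txt, so they still need trimming before use. std::strtok also skips empty fields and keeps its position in internal static state, so it is not safe to interleave two tokenizations.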