How can I read separate integers from the code below?
while (getline(cin, line)) {
// for each integer in line do something.....
// myVector.push_back(each integer)
}
The input is like this: 1, 2, 3, 5 (separated by comma except the last integer).
Sample Input (ignore the line # part):
line1: 1, 2, 3, 4, 5
line2: 6, 7, 8, 9, 10
line3: 3, 3, 3, 3, 3
/// and so on...
I need to read the integers one by one, and let's say increment and print them.
I use a handy utility to split a string into pieces on a char delimiter:
std::vector<std::string> split(const std::string& str, char delim) {
std::vector<std::string> strings;
size_t start;
size_t end = 0;
while ((start = str.find_first_not_of(delim, end)) != std::string::npos) {
end = str.find(delim, start);
strings.push_back(str.substr(start, end - start));
}
return strings;
}
and then do something like this:
while (getline(cin, line)) {
std::vector<std::string> strings = split(line, ',');
for (const auto& str : strings) {
const int i = std::stoi(str);
// do something w i
}
}
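By default, '\n' is the delimiter for std::getline(). You can specify ',' as the delimiter instead (line breaks are then no longer delimiters, so with multi-line input a token such as "5\n6" spans two lines and std::stoi() converts only its first number), eg: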
string value;
while (getline(cin, value, ',')) {
int num = stoi(value);
...
}
Otherwise, you can use std::getline() with '\n' as the delimiter to read an entire line, and then use a separate std::istringstream to read values from that line, such as by using std::getline() with ',' as the delimiter, eg:
string line;
if (getline(cin, line)) {
istringstream iss(line);
string value;
while (getline(iss, value, ',')) {
int num = stoi(value);
...
}
}
Alternatively, you can use streaming extraction via operator>>, eg:
string line;
if (getline(cin, line)) {
istringstream iss(line);
int num;
while (iss >> num) {
...
iss.ignore(); // skip terminating comma/whitespace
}
}
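The line-then-token approach above can be wrapped into a small helper. This is only a minimal sketch: the names parse_ints and increment_and_print are made up for illustration, and skipping tokens that std::stoi cannot convert is an assumption about the desired error handling, not something the question specifies.

```cpp
#include <cassert>
#include <exception>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse one comma-separated line into integers.
// Tokens that std::stoi cannot convert are skipped (an assumption).
std::vector<int> parse_ints(const std::string& line) {
    std::vector<int> values;
    std::istringstream iss(line);
    std::string token;
    while (std::getline(iss, token, ',')) {
        try {
            values.push_back(std::stoi(token)); // std::stoi skips leading whitespace
        } catch (const std::exception&) {
            // not a number, or out of range for int: ignore this token
        }
    }
    return values;
}

// Hypothetical driver: reads lines from `in`, increments each integer,
// and prints the results, one output line per input line.
void increment_and_print(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        for (int v : parse_ints(line)) out << v + 1 << ' ';
        out << '\n';
    }
}
```

Calling increment_and_print(std::cin, std::cout) would process standard input; taking the streams as parameters also lets the function be tried with a std::istringstream.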
I will show you several different approaches on how to tokenize a string:
Splitting a string into tokens is a very old task, and many solutions exist. They differ in how hard they are to understand and develop, and in complexity, speed, and flexibility.
Alternatives
1. Handcrafted: many variants, using pointers or iterators; can be hard to develop and error-prone.
2. The old-style std::strtok function: potentially unsafe, and arguably should no longer be used.
3. std::getline: the most commonly used implementation, but actually a "misuse" and not very flexible.
4. A dedicated modern facility developed specifically for this purpose (std::sregex_token_iterator): the most flexible and a good fit for the STL and its algorithms, but slower.
Please see the 4 examples in one piece of code.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <regex>
#include <algorithm>
#include <iterator>
#include <type_traits>
#include <cstring>
#include <forward_list>
#include <deque>
using Container = std::vector<std::string>;
std::regex delimiter{ "," };
int main() {
// Some function to print the contents of an STL container
auto print = [](const auto& container) -> void { std::copy(container.begin(), container.end(),
std::ostream_iterator<typename std::decay<decltype(*container.begin())>::type>(std::cout, " ")); std::cout << '\n'; };
// Example 1: Handcrafted -------------------------------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Search for comma, then take the part and add to the result
for (size_t i{ 0U }, startpos{ 0U }; i <= stringToSplit.size(); ++i) {
// So, if there is a comma or the end of the string
if ((stringToSplit[i] == ',') || (i == (stringToSplit.size()))) {
// Copy substring
c.push_back(stringToSplit.substr(startpos, i - startpos));
startpos = i + 1;
}
}
print(c);
}
// Example 2: Using very old strtok function ----------------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Split string into parts in a simple for loop
#pragma warning(suppress : 4996)
for (char* token = std::strtok(const_cast<char*>(stringToSplit.data()), ","); token != nullptr; token = std::strtok(nullptr, ",")) {
c.push_back(token);
}
print(c);
}
// Example 3: Very often used std::getline with additional istringstream ------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Put string in an std::istringstream
std::istringstream iss{ stringToSplit };
// Extract string parts in simple for loop
for (std::string part{}; std::getline(iss, part, ','); c.push_back(part))
;
print(c);
}
// Example 4: Most flexible iterator solution ------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
//
// Everything done already with range constructor. No additional code needed.
//
print(c);
// Works also with other containers in the same way
std::forward_list<std::string> c2(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
print(c2);
// And works with algorithms
std::deque<std::string> c3{};
std::copy(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {}, std::back_inserter(c3));
print(c3);
}
return 0;
}
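The flexibility claimed for the fourth alternative comes from the pattern itself. As a rough sketch, Example 4 could be wrapped in a function that takes the delimiter pattern as a parameter. The name split_regex is hypothetical, and compiling the regex on every call is a simplification chosen for brevity.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical wrapper around Example 4: split `text` wherever `pattern` matches.
std::vector<std::string> split_regex(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    // Submatch index -1 selects the text between the matches, i.e. the tokens.
    std::sregex_token_iterator first(text.begin(), text.end(), re, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

With the pattern R"(\s*,\s*)" the same function also swallows whitespace around each comma, which the std::getline variant cannot do without extra trimming.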
I want a function that splits text by an array of delimiters. I have a demo that works perfectly, but it is really, really slow. Here is an example of the parameters.
text:
"pop-pap-bab bob"
vector of delimiters:
"-"," "
the result:
"pop", "-", "pap", "-", "bab", "bob"
So the function loops through the string and tries to find delimiters. If it finds one, it pushes the text and the delimiter that was found to the result array; if the text contains only spaces or is empty, the text is not pushed.
std::string replace(std::string str,std::string old,std::string new_str){
size_t pos = 0;
while ((pos = str.find(old)) != std::string::npos) {
str.replace(pos, old.length(), new_str);
}
return str;
}
std::vector<std::string> split_with_delimeter(std::string str,std::vector<std::string> delimeters){
std::vector<std::string> result;
std::string token;
int flag = 0;
for(int i=0;i<(int)str.size();i++){
for(int j=0;j<(int)delimeters.size();j++){
if(str.substr(i,delimeters.at(j).size()) == delimeters.at(j)){
if(token != ""){
result.push_back(token);
token = "";
}
if(replace(delimeters.at(j)," ","") != ""){
result.push_back(delimeters.at(j));
}
i += delimeters.at(j).size()-1;
flag = 1;
break;
}
}
if(flag == 0){token += str.at(i);}
flag = 0;
}
if(token != ""){
result.push_back(token);
}
return result;
}
My issue is that the function is really slow since it has 3 loops. I am wondering if anyone knows how to make the function faster. I am sorry if I wasn't clear enough; my English isn't the best.
It might be a good idea to use Boost.Xpressive. It is a powerful tool for various string operations, and can be less of a struggle than string::find_xx, hand-written for loops, or regex.
Concise explanation:
+as_xpr(" ") matches one or more repetitions, like + in a regex, and the "-" prefix makes it the shortest (non-greedy) match.
If you define a regex parser as sregex rex = "(" >> (+_w | +"_") >> ":" >> +_d >> ")", it would match (port_num:8080). In this case, ">>" means concatenation of parsers and (+_w | +"_") means that it matches word characters or "_" repeatedly.
#include <vector>
#include <string>
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
using namespace std;
using namespace boost::xpressive;
int main() {
string source = "Nigeria is a multi&&national state in--habited by more than 2;;50 ethnic groups speak###ing 500 distinct languages";
vector<string> delimiters{ " ", " ", "&&", "-", ";;", "###"};
vector<sregex> pss{ -+as_xpr(delimiters.front()) };
for (const auto& d : delimiters) pss.push_back(pss.back() | -+as_xpr(d));
vector<string> ret;
size_t pos = 0;
auto push = [&](auto s, auto e) { ret.push_back(source.substr(s, e)); };
for_each(sregex_iterator(source.begin(), source.end(), pss.back()), {}, [&](smatch const& m) {
if (m.position() - pos) push(pos, m.position() - pos);
pos = m.position() + m.str().size();
}
);
push(pos, source.size() - pos);
for (auto& s : ret) printf("%s\n", s.c_str());
}
The output is split by multiple string delimiters:
Nigeria
is
a
multi
national
state
in
habited
by
more
than
2
50
ethnic
groups
speak
ing
500
distinct
languages
As an alternative, you could use a regex, though that may also be too slow for you.
With a regex, life would be very simple.
Please see the following example:
#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <iterator>
const std::regex re(R"((\w+|[\- ]))");
int main() {
std::string s{"pop-pap-bab bob"};
std::vector<std::string> part{std::sregex_token_iterator(s.begin(),s.end(),re),{}};
for (const std::string& p : part) std::cout << p << '\n';
}
We use the std::sregex_token_iterator in combination with the std::vector's range constructor to extract everything specified in the regex and put it all into the std::vector.
The regex itself is also simple. It specifies words or delimiters. Note that the space delimiter is matched as well, so it appears as a token of its own.
Maybe it's worth a try.
NOTE: You've complained that your code is slow, but keep in mind that most answers only offer options that might speed up the program. Even if the author of an option measured a speed-up, that option may be slower on your machine, so do not forget to measure the execution speed yourself.
If I were you, I would create a separate function that receives an array of strings and one delimiter and outputs an array of the split strings. The problem with this approach is that if one delimiter contains another, the result may not be what you expect, but it makes it easier to iterate through different string-splitting options to find the best one.
My solution would look like this (though it requires C++20):
#include <iomanip>
#include <iostream>
#include <ranges>
#include <string>
#include <string_view>
#include <vector>
std::vector<std::string> split_elems_of_array(const std::vector<std::string>& array, const std::string& delim)
{
std::vector<std::string> result;
for(const auto& str: array) // iterate by reference to avoid copying each string
{
for (const auto word : std::views::split(str, delim))
{
std::string chunk(word.begin(), word.end());
if(!chunk.empty() && chunk != " ")
result.push_back(chunk + delim);
}
}
return result;
}
std::vector<std::string> split_string(std::string str, std::vector<std::string> delims)
{
std::vector<std::string> result = {std::string(str)};
for(const auto&delim: delims)
result = split_elems_of_array(result, delim);
return {result.begin(), result.end()};
}
On my machine, my approach is 56 times faster: 67 ms versus 5112 ms. The string length is 1,000,000, and there are 100 delimiters of length 100.
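For comparison, the behaviour described in the question (tokens and delimiters both kept, space-only delimiters and empty tokens dropped) can also be sketched as a single pass that compares in place instead of building a temporary substring per position. The name split_keep_delims is hypothetical, and whether this is actually faster on a given input is an assumption that should be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical single-pass splitter: emits tokens and the delimiters between them,
// dropping empty tokens and delimiters that consist only of spaces.
std::vector<std::string> split_keep_delims(std::string_view text,
                                           const std::vector<std::string>& delims) {
    std::vector<std::string> result;
    std::size_t token_start = 0;
    std::size_t i = 0;
    while (i < text.size()) {
        // Find the first delimiter (in list order) that matches at position i.
        const std::string* hit = nullptr;
        for (const auto& d : delims) {
            if (!d.empty() && text.compare(i, d.size(), d) == 0) { hit = &d; break; }
        }
        if (hit == nullptr) { ++i; continue; }
        // Flush the token collected so far, if any.
        if (i > token_start) result.emplace_back(text.substr(token_start, i - token_start));
        // Keep the delimiter unless it is made of spaces only.
        if (hit->find_first_not_of(' ') != std::string::npos) result.push_back(*hit);
        i += hit->size();
        token_start = i;
    }
    if (token_start < text.size()) result.emplace_back(text.substr(token_start));
    return result;
}
```

Because the first matching delimiter in list order wins, a delimiter that is a prefix of another should be listed after the longer one.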
Here is the standard splitting algorithm. If you split pop-pap-bab bob by {'-', ' '}, it gives you ["pop", "pap", "bab", "bob"]. It does not store delimiters and does not check for empty text, but you can change it to do those things too.
1. Define a vector of strings named result.
2. Define a string variable named buffer.
3. Loop over your string; if the current character is not a delimiter, append it to buffer.
4. If the current character is a delimiter, append buffer to result and clear buffer.
5. After the loop, append any remaining buffer to result and return result.
std::vector<std::string> split(std::string str, std::vector<char> delimiters)
{
std::vector<std::string> result;
std::string buffer;
for (const auto ch : str)
{
if (std::find(delimiters.begin(), delimiters.end(), ch) == delimiters.end())
buffer += ch;
else
{
result.insert(result.end(), buffer);
buffer.clear();
}
}
if (buffer.length())
result.insert(result.end(), buffer);
return result;
}
Its time complexity is O(n·m), where n is the length of the string and m is the number of delimiters.
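When the delimiters are single characters, the factor m can be removed with a lookup table, giving O(n + m). The following is a sketch of that idea, not part of the answer above: split_chars is a made-up name, and unlike the version above it skips empty tokens.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical variant of split(): membership test via a 256-entry table.
std::vector<std::string> split_chars(std::string_view str, std::string_view delimiters) {
    // One flag per possible byte value: filling it is O(m), each lookup is O(1).
    std::array<bool, 256> is_delim{};
    for (unsigned char d : delimiters) is_delim[d] = true;

    std::vector<std::string> result;
    std::string buffer;
    for (unsigned char ch : str) {
        if (!is_delim[ch]) {
            buffer += static_cast<char>(ch);
        } else if (!buffer.empty()) { // skip empty tokens between adjacent delimiters
            result.push_back(buffer);
            buffer.clear();
        }
    }
    if (!buffer.empty()) result.push_back(buffer);
    return result;
}
```

For a handful of delimiters the std::find version may well be just as fast, so this only pays off when the delimiter set is large.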
I'm a student and this C++ subject is really hard for me. We learned about files and were given a file that has 50 rows with 4 columns. I tried to display the file using my lecturer's notes. This is what I tried:
#include <cstdio>
#include <iostream>
using namespace std;
int main() {
FILE* stream = fopen("student.csv", "r");
char line[1024];
while (fgets(line, 1024, stream))
{
printf(" %s ",line);
}
}
I managed to display the file even though I don't really understand it. Can someone explain what char line is for? Does it represent the 50 rows? And if I want to find the smallest value in one column, do I have to declare a new variable?
In C++, you would normally use a std::string to read a file and split it into columns.
I am sorry, I cannot "downgrade" to using char arrays in C++. So I will assume that you open the file with a std::ifstream and then read it line by line with std::getline in a loop. Then you have each line in a std::string.
Then:
Splitting a string into parts is a very old task, and many solutions exist. They differ in how hard they are to understand and develop, and in complexity, speed, and flexibility.
Alternatives
1. Handcrafted: many variants, using pointers or iterators; can be hard to develop and error-prone.
2. The old-style std::strtok function: potentially unsafe, and arguably should no longer be used.
3. std::getline: the most commonly used implementation, but actually a "misuse" and not very flexible.
4. A dedicated modern facility developed specifically for this purpose (std::sregex_token_iterator): the most flexible and a good fit for the STL and its algorithms, but slower.
Please see the 4 examples in one piece of code.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <regex>
#include <algorithm>
#include <iterator>
#include <type_traits>
#include <cstring>
#include <forward_list>
#include <deque>
using Container = std::vector<std::string>;
std::regex delimiter{ "," };
int main() {
// Some function to print the contents of an STL container
auto print = [](const auto& container) -> void { std::copy(container.begin(), container.end(),
std::ostream_iterator<typename std::decay<decltype(*container.begin())>::type>(std::cout, " ")); std::cout << '\n'; };
// Example 1: Handcrafted -------------------------------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Search for comma, then take the part and add to the result
for (size_t i{ 0U }, startpos{ 0U }; i <= stringToSplit.size(); ++i) {
// So, if there is a comma or the end of the string
if ((stringToSplit[i] == ',') || (i == (stringToSplit.size()))) {
// Copy substring
c.push_back(stringToSplit.substr(startpos, i - startpos));
startpos = i + 1;
}
}
print(c);
}
// Example 2: Using very old strtok function ----------------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Split string into parts in a simple for loop
#pragma warning(suppress : 4996)
for (char* token = std::strtok(const_cast<char*>(stringToSplit.data()), ","); token != nullptr; token = std::strtok(nullptr, ",")) {
c.push_back(token);
}
print(c);
}
// Example 3: Very often used std::getline with additional istringstream ------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Put string in an std::istringstream
std::istringstream iss{ stringToSplit };
// Extract string parts in simple for loop
for (std::string part{}; std::getline(iss, part, ','); c.push_back(part))
;
print(c);
}
// Example 4: Most flexible iterator solution ------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
//
// Everything done already with range constructor. No additional code needed.
//
print(c);
// Works also with other containers in the same way
std::forward_list<std::string> c2(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
print(c2);
// And works with algorithms
std::deque<std::string> c3{};
std::copy(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {}, std::back_inserter(c3));
print(c3);
}
return 0;
}
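To tie this back to the question about the smallest value in one column: one line is read per loop iteration, so a single extra variable that remembers the smallest value seen so far is enough. The sketch below assumes the chosen column holds numbers and that non-numeric fields (such as a header row) can be skipped. The name min_of_column is hypothetical, and the real layout of student.csv is not known.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: smallest numeric value found in the given zero-based column.
// Returns an empty optional if no numeric value was found.
std::optional<double> min_of_column(std::istream& in, std::size_t column) {
    std::optional<double> smallest;
    std::string line;
    while (std::getline(in, line)) {          // one row per iteration
        std::istringstream fields(line);
        std::string field;
        std::size_t index = 0;
        bool found = false;
        while (std::getline(fields, field, ',')) {
            if (index++ == column) { found = true; break; }
        }
        if (!found) continue;                 // row has too few columns
        try {
            const double value = std::stod(field);
            if (!smallest || value < *smallest) smallest = value;
        } catch (const std::exception&) {
            // header or other non-numeric field: skip (an assumption)
        }
    }
    return smallest;
}
```

With a file, this would be called as std::ifstream file("student.csv"); auto smallest = min_of_column(file, 2); where the column index 2 is only an example.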
How to convert:
string x = "1+2+3";
to:
char y[] = {'1', '2', '3'};
What approach should I take?
The task is to split a string separated by '+'. In the example below, the delimiter ',' is used.
Splitting a string into tokens is a very old task, and many solutions exist. They differ in how hard they are to understand and develop, and in complexity, speed, and flexibility.
Alternatives
1. Handcrafted: many variants, using pointers or iterators; can be hard to develop and error-prone.
2. The old-style std::strtok function: potentially unsafe, and arguably should no longer be used.
3. std::getline: the most commonly used implementation, but actually a "misuse" and not very flexible.
4. A dedicated modern facility developed specifically for this purpose (std::sregex_token_iterator): the most flexible and a good fit for the STL and its algorithms, but slower.
Please see the 4 examples in one piece of code.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <regex>
#include <algorithm>
#include <iterator>
#include <type_traits>
#include <cstring>
#include <forward_list>
#include <deque>
using Container = std::vector<std::string>;
std::regex delimiter{ "," };
int main() {
// Some function to print the contents of an STL container
auto print = [](const auto& container) -> void { std::copy(container.begin(), container.end(),
std::ostream_iterator<typename std::decay<decltype(*container.begin())>::type>(std::cout, " ")); std::cout << '\n'; };
// Example 1: Handcrafted -------------------------------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Search for comma, then take the part and add to the result
for (size_t i{ 0U }, startpos{ 0U }; i <= stringToSplit.size(); ++i) {
// So, if there is a comma or the end of the string
if ((stringToSplit[i] == ',') || (i == (stringToSplit.size()))) {
// Copy substring
c.push_back(stringToSplit.substr(startpos, i - startpos));
startpos = i + 1;
}
}
print(c);
}
// Example 2: Using very old strtok function ----------------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Split string into parts in a simple for loop
#pragma warning(suppress : 4996)
for (char* token = std::strtok(const_cast<char*>(stringToSplit.data()), ","); token != nullptr; token = std::strtok(nullptr, ",")) {
c.push_back(token);
}
print(c);
}
// Example 3: Very often used std::getline with additional istringstream ------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Put string in an std::istringstream
std::istringstream iss{ stringToSplit };
// Extract string parts in simple for loop
for (std::string part{}; std::getline(iss, part, ','); c.push_back(part))
;
print(c);
}
// Example 4: Most flexible iterator solution ------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
//
// Everything done already with range constructor. No additional code needed.
//
print(c);
// Works also with other containers in the same way
std::forward_list<std::string> c2(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
print(c2);
// And works with algorithms
std::deque<std::string> c3{};
std::copy(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {}, std::back_inserter(c3));
print(c3);
}
return 0;
}
You can use an std::vector<std::string> instead of char[], that way, it would work with more than one-digit numbers. Try this:
#include <iostream>
#include <vector>
#include <string>
#include <sstream>
int main() {
using namespace std;
std::string str("1+2+3");
std::string buff;
std::stringstream ss(str);
std::vector<std::string> result;
while(getline(ss, buff, '+')){
result.push_back(buff);
}
for(std::string num : result){
std::cout << num << std::endl;
}
}
Here is a coliru link to show it works with numbers having more than one digit.
Here are my steps:
1. Convert the original string into a writable char*.
2. Split the obtained char* on the delimiter + using the function strtok, storing the first character of each token in a vector<char>.
3. Convert this vector<char> into a C char array (char*).
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
    string line = "1+2+3";
    std::vector<char> vectChar;
    // convert the original string into a writable char array to allow splitting
    // (+1 for the terminating '\0' that strcpy writes)
    char* input = (char*) malloc(line.size() + 1);
    strcpy(input, line.c_str());
    // splitting the string
    char* token = strtok(input, "+");
    while (token) {
        std::cout << *token;
        vectChar.push_back(*token); // keeps only the first character of each token
        token = strtok(NULL, "+");
    }
    // end of splitting step
    std::cout << std::endl;
    // test: display the content of the vector<char> = {'1', '2', ...}
    for (size_t i = 0; i < vectChar.size(); i++)
    {
        std::cout << vectChar[i];
    }
    // Now that the vector contains the needed list of char,
    // we need to convert it to a char array (char*)
    // first malloc (+1 so that the array can be null-terminated)
    char* buffer = (char*) malloc(vectChar.size() + 1);
    // then copy the vector into the char* and terminate it
    std::copy(vectChar.begin(), vectChar.end(), buffer);
    buffer[vectChar.size()] = '\0';
    std::cout << std::endl;
    // now buffer = {'1', '2', ..., '\0'}
    // let us test by displaying it
    for (char* p = buffer; *p != '\0'; ++p)
    {
        printf("%c", *p);
    }
    // release what was allocated with malloc
    free(buffer);
    free(input);
}
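If every token is exactly one character, as in "1+2+3", the three steps above collapse into a single standard algorithm call. This sketch uses std::remove_copy rather than strtok. The name strip_delimiter is made up, and the single-character-token assumption matters: for "12+3" it would yield '1', '2', '3', not one entry per token.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: copy every character except `delim` into a vector<char>.
std::vector<char> strip_delimiter(const std::string& text, char delim) {
    std::vector<char> out;
    out.reserve(text.size());
    std::remove_copy(text.begin(), text.end(), std::back_inserter(out), delim);
    return out;
}
```

The vector's data() member then gives access to the underlying char array, which is not null-terminated.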
You can run/check this code in https://repl.it/#JomaCorpFX/StringSplit#main.cpp
Code
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> Split(const std::string &data, const std::string &toFind)
{
std::vector<std::string> v;
if (data.empty() || toFind.empty())
{
v.push_back(data);
return v;
}
size_t ini = 0;
size_t pos;
while ((pos = data.find(toFind, ini)) != std::string::npos)
{
std::string s = data.substr(ini, pos - ini);
if (!s.empty())
{
v.push_back(s);
}
ini = pos + toFind.length();
}
if (ini < data.length())
{
v.push_back(data.substr(ini));
}
return v;
}
int main()
{
std::string x = "1+2+3";
for (const auto& value : Split(x, "+")) // plain literal: u8"+" is char8_t-based since C++20
{
std::cout << "Value: " << value << std::endl;
}
std::cout << "Press enter to continue... ";
std::cin.get();
return EXIT_SUCCESS;
}
Output
Value: 1
Value: 2
Value: 3
Press enter to continue...
I want to store words separated by spaces as individual string elements in a vector.
The input is a string that may or may not end in a symbol (comma, period, etc.).
All symbols will be separated by spaces too.
I created this function, but it doesn't return a vector of words.
vector<string> single_words(string sentence)
{
vector<string> word_vector;
string result_word;
for (size_t character = 0; character < sentence.size(); ++character)
{
if (sentence[character] == ' ' && result_word.size() != 0)
{
word_vector.push_back(result_word);
result_word = "";
}
else
result_word += character;
}
return word_vector;
}
What did I do wrong?
Your problem has already been resolved by the answers and comments.
I would like to add that such functionality already exists in C++.
You could take advantage of the fact that the extraction operator (>>) extracts space-separated tokens from a stream. Because a std::string is not a stream, we first put the string into a std::istringstream and then extract from this stream via the std::istream_iterator.
We could make life even easier.
For about 10 years now (since C++11), we have had dedicated functionality for splitting strings into tokens, explicitly designed for this purpose: the std::sregex_token_iterator. And because we have such a dedicated function, we should simply use it.
The idea behind it is the iterator concept. In C++ we have many containers, and always iterators to iterate over the similar elements in these containers. A string with similar elements (tokens) separated by a delimiter can also be seen as such a container. With the std::sregex_token_iterator, we can iterate over the elements/tokens/substrings of the string, splitting it up effectively.
This iterator is very powerful and you can do much more fancy stuff with it, but that is too much for here. The important point is that splitting a string into tokens is a one-liner: for example, a variable definition that uses a range constructor to iterate over the tokens.
See some examples below:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>
#include <regex>
const std::regex delimiter{ " " };
const std::regex reWord{ "(\\w+)" };
int main() {
// Some debug print function
auto print = [](const std::vector<std::string>& sv) -> void {
std::copy(sv.begin(), sv.end(), std::ostream_iterator<std::string>(std::cout, "\n")); std::cout << "\n"; };
// The test string
std::string test{ "word1 word2 word3 word4." };
//-----------------------------------------------------------------------------------------
// Solution 1: use istringstream and then extract from there
std::istringstream iss1(test);
// Define a vector (CTAD), use its range constructor and, the std::istream_iterator as iterator
std::vector words1(std::istream_iterator<std::string>(iss1), {});
print(words1); // Show debug output
//-----------------------------------------------------------------------------------------
// Solution 2: directly use dedicated function sregex_token iterator
std::vector<std::string> words2(std::sregex_token_iterator(test.begin(), test.end(), delimiter, -1), {});
print(words2); // Show debug output
//-----------------------------------------------------------------------------------------
// Solution 3: directly use dedicated function sregex_token iterator and look for words only
std::vector<std::string> words3(std::sregex_token_iterator(test.begin(), test.end(), reWord, 1), {});
print(words3); // Show debug output
//-----------------------------------------------------------------------------------------
// Solution 4: Use such iterator in an algorithm, to copy data to a vector
std::vector<std::string> words4{};
std::copy(std::sregex_token_iterator(test.begin(), test.end(), reWord, 1), {}, std::back_inserter(words4));
print(words4); // Show debug output
//-----------------------------------------------------------------------------------------
// Solution 5: Use such iterator in an algorithm for direct output
std::copy(std::sregex_token_iterator(test.begin(), test.end(), reWord, 1), {}, std::ostream_iterator<std::string>(std::cout,"\n"));
return 0;
}
You added the index instead of the character:
vector<string> single_words(string sentence)
{
vector<string> word_vector;
string result_word;
for (size_t i = 0; i < sentence.size(); ++i)
{
char character = sentence[i];
if (character == ' ' && result_word.size() != 0)
{
word_vector.push_back(result_word);
result_word = "";
}
else
result_word += character;
}
return word_vector;
}
Your mistake happened only because you named your loop variable character even though it is not a character but an index. I would therefore suggest a range-based for loop here, since it avoids this kind of confusion. The clean solution is obviously to do what @ArminMontigny said, but I assume you are not allowed to use stringstreams. The code would look like this:
#include <iostream>
#include <string>
#include <vector>
using namespace std;
vector<string> single_words(string sentence)
{
vector<string> word_vector;
string result_word;
for (char& character: sentence) // Now `character` is actually a character.
{
if (character==' ' && result_word.size() != 0)
{
word_vector.push_back(result_word);
result_word = "";
}
else
result_word += character;
}
word_vector.push_back(result_word); // In your solution, you forgot to push the last word into the vector.
return word_vector;
}
int main() {
string sentence="Maybe try range based loops";
vector<string> result= single_words(sentence);
for(string& word: result)
cout<<word<<" ";
return 0;
}
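One caveat about the loop above: when a space arrives while result_word is still empty (a leading space or two spaces in a row), the space itself is appended to the word, and the final push_back also stores an empty string for empty input or input ending in a space. A possible variant that handles these cases is sketched below. It is not the original author's code, and treating runs of spaces as one separator is an assumption about the desired behaviour.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch: split on spaces, ignoring leading, trailing, and repeated spaces.
std::vector<std::string> single_words(const std::string& sentence) {
    std::vector<std::string> words;
    std::string current;
    for (char ch : sentence) {
        if (ch != ' ') {
            current += ch;            // still inside a word
        } else if (!current.empty()) {
            words.push_back(current); // a space ends the current word
            current.clear();
        }                             // a space with no pending word is ignored
    }
    if (!current.empty()) words.push_back(current); // last word, if any
    return words;
}
```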
I'm building a small utility method that parses a line (a string) and returns a vector of all the words. The istringstream code below works fine except when there is punctuation, so my natural fix is to "sanitize" the line before I run it through the while loop.
I would appreciate some help with using the regex library in C++ for this. My initial solution was to use substr() and go to town, but that seems complicated, as I would have to iterate over the line, test each character to see what it is, and then perform some operations.
vector<string> lineParser(Line * ln)
{
vector<string> result;
string word;
string line = ln->getLine();
istringstream iss(line);
while(iss)
{
iss >> word;
result.push_back(word);
}
return result;
}
You don't need regular expressions just for punctuation:
// Replace all punctuation with the space character.
// (std::ptr_fun was removed in C++17, so a lambda is used; needs <algorithm> and <cctype>.)
std::replace_if(line.begin(), line.end(),
[](unsigned char c) { return std::ispunct(c) != 0; },
' '
);
Or if you want everything but letters and numbers turned into spaces:
std::replace_if(line.begin(), line.end(),
[](unsigned char c) { return std::isalnum(c) == 0; },
' '
);
While we are here:
Your while loop is broken and will push the last value into the vector twice.
It should be:
while(iss)
{
iss >> word;
if (iss) // If reading a word failed, the state of iss is bad.
{
result.push_back(word); // Only push_back() if the state is not bad.
}
}
Or the more common version:
while(iss >> word) // Loop is only entered if the read of the word worked.
{
result.push_back(word);
}
Or you can use the stl:
std::copy(std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>(),
std::back_inserter(result)
);
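Putting the two pieces of this answer together, sanitizing first and then extracting with the common loop form, might look like the sketch below. It takes a std::string instead of the question's Line* (that class is not shown, so this is an assumption), and the name parse_words is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Sketch: replace punctuation with spaces, then extract whitespace-separated words.
std::vector<std::string> parse_words(std::string line) {
    std::replace_if(line.begin(), line.end(),
                    [](unsigned char c) { return std::ispunct(c) != 0; },
                    ' ');
    std::istringstream iss(line);
    std::vector<std::string> result;
    std::string word;
    while (iss >> word) {   // entered only if the read of a word worked
        result.push_back(word);
    }
    return result;
}
```

Note that an apostrophe counts as punctuation, so a word like "don't" would come out as "don" and "t".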
[^A-Za-z\s] should do what you need if you replace the matching characters with nothing. It removes all characters that are not letters or whitespace. Use [^A-Za-z0-9\s] if you want to keep digits too.
You can use online tools like this one: http://gskinner.com/RegExr/ to test your patterns (Replace tab). Some modifications may be required depending on the regex library you are using.
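With the C++ standard library, the replacement described above could be sketched with std::regex_replace, using the default ECMAScript grammar. The function name strip_non_letters is made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Sketch: delete every character that is neither an ASCII letter nor whitespace.
std::string strip_non_letters(const std::string& line) {
    static const std::regex unwanted("[^A-Za-z\\s]");
    return std::regex_replace(line, unwanted, "");
}
```

The cleaned string can then be fed to the istringstream loop from the question. Because matches are deleted rather than replaced by a space, "a,b" becomes "ab" and not two words.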
I'm not positive, but I think this is what you're looking for:
#include<iostream>
#include<regex>
#include<string>
#include<vector>
int
main()
{
std::string line("some words: with some punctuation.");
std::regex words("[\\w]+");
std::sregex_token_iterator i(line.begin(), line.end(), words);
std::vector<std::string> list(i, std::sregex_token_iterator());
for (auto j = list.begin(), e = list.end(); j != e; ++j)
std::cout << *j << '\n';
}
some
words
with
some
punctuation
The simplest solution is probably to create a filtering
streambuf to convert all non alphanumeric characters to space,
then to read using std::copy:
class StripPunct : public std::streambuf
{
std::streambuf* mySource;
char myBuffer;
protected:
virtual int underflow()
{
int result = mySource->sbumpc();
if ( result != EOF ) {
if ( !::isalnum( result ) )
result = ' ';
myBuffer = result;
setg( &myBuffer, &myBuffer, &myBuffer + 1 );
}
return result;
}
public:
explicit StripPunct( std::streambuf* source )
: mySource( source )
{
}
};
std::vector<std::string>
LineParser( std::istream& source )
{
StripPunct sb( source.rdbuf() );
std::istream src( &sb );
return std::vector<std::string>(
(std::istream_iterator<std::string>( src )),
(std::istream_iterator<std::string>()) );
}