myStr = input("Enter something - ")
# say I enter "Hi there"
arrayStr = myStr.split()
print(arrayStr)
# Output: ['Hi', 'there']
What is the exact C++ equivalent of this code? (My aim is to further iterate over the array and perform comparisons with other arrays).
One way of doing this would be using std::vector and std::istringstream as shown below:
#include <iostream>
#include <string>
#include<sstream>
#include <vector>
int main()
{
std::string input, temp;
//take input from user
std::getline(std::cin, input);
//create a vector that will hold the individual words
std::vector<std::string> vectorOfString;
std::istringstream ss(input);
//go word by word
while(ss >> temp)
{
vectorOfString.emplace_back(temp);
}
//iterate over all elements of the vector and print them out
for(const std::string& element: vectorOfString)
{
std::cout<<element<<std::endl;
}
return 0;
}
You can use std::string_view to avoid copying the input string, which saves memory: each element is literally a view onto a word inside the original string. The views are only valid while the original string is alive. Like this:
#include <iostream>
#include <string_view>
#include <vector>
inline bool is_delimiter(const char c)
{
// order by frequency in your input for optimal performance
return (c == ' ') || (c == ',') || (c == '.') || (c == '\n') || (c == '!') || (c == '?');
}
auto split_view(const char* line)
{
const char* word_start_pos = line;
const char* p = line;
std::size_t letter_count{ 0 };
std::vector<std::string_view> words;
// while parsing hasn't seen the terminating 0
while(*p != '\0')
{
// if it is a character from a word then start counting the letters in the word
if (!is_delimiter(*p))
{
letter_count++;
}
else
{
//delimiter reached and word detected
if (letter_count > 0)
{
//add another string view to the characters in the input string
// this will call the constructor of string_view with arguments const char* and size
words.emplace_back(word_start_pos, letter_count);
// skip to the next word
word_start_pos += letter_count;
}
// skip delimiters for as long as you encounter them
word_start_pos++;
letter_count = 0ul;
}
// move on to the next character
++p;
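}
// a final word that is not followed by a delimiter still has to be added
if (letter_count > 0)
{
words.emplace_back(word_start_pos, letter_count);
}
return words;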
}
int main()
{
auto words = split_view("the quick brown fox is fast. And the lazy dog is asleep!");
for (const auto& word : words)
{
std::cout << word << "\n";
}
return 0;
}
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
template <typename Out>
void split(const std::string &s, char delim, Out result) {
std::istringstream iss(s);
std::string item;
while (std::getline(iss, item, delim)) {
*result++ = item;
}
}
std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
split(s, delim, std::back_inserter(elems));
return elems;
}
std::vector<std::string> x = split("one:two::three", ':');
Here x is the resulting vector with 4 elements: "one", "two", an empty string, and "three".
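The fourth element comes from the empty field between the two adjacent colons, which std::getline keeps. If empty fields should be dropped instead, a minimal sketch could filter them out. The name split_skip_empty is hypothetical and not part of the answer above:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: same getline-based technique, but empty fields are skipped.
std::vector<std::string> split_skip_empty(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        if (!item.empty()) {   // "one:two::three" contains one empty field
            elems.push_back(item);
        }
    }
    return elems;
}
```

With this variant, split_skip_empty("one:two::three", ':') yields three elements instead of four.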
Basically @AnoopRana's solution, but using STL algorithms and removing punctuation marks from the words:
#include <cctype> // ispunct
#include <algorithm> // copy, transform
#include <iostream> // cout
#include <iterator> // istream_iterator, ostream_iterator
#include <sstream> // istringstream
#include <string>
#include <vector>
int main() {
const std::string s{"In the beginning, there was simply the event and its consequences."};
std::vector<std::string> ws{};
std::istringstream iss{s};
std::transform(std::istream_iterator<std::string>{iss}, {},
std::back_inserter(ws), [](std::string w) {
w.erase(std::remove_if(std::begin(w), std::end(w),
[](unsigned char c) { return std::ispunct(c); }),
std::end(w));
return w;
});
std::copy(std::cbegin(ws), std::cend(ws), std::ostream_iterator<std::string>{std::cout, "\n"});
}
// Outputs:
//
// In
// the
// beginning
// there
// was
// simply
// the
// event
// and
// its
// consequences
Related
I've been looking for ways to count the number of words in a string, but specifically for strings that may contain typos (i.e. "_This_is_a___test" as opposed to "This_is_a_test"). Most of the pages I've looked at only handle single spaces.
This is actually my first time programming in C++, and I don't have much other programming experience to speak of (2 years of college in C and Java). Although what I have is functional, I'm also aware it's complex, and I'm wondering if there is a more efficient way to achieve the same results?
This is what I have currently. Before I run the string through numWords(), I run it through a trim function that removes leading whitespace, then check that there are still characters remaining.
int numWords(string str) {
int count = 1;
for (int i = 0; i < str.size(); i++) {
if (str[i] == ' ' || str[i] == '\t' || str[i] == '\n') {
bool repeat = true;
int j = 1;
while (j < (str.size() - i) && repeat) {
if (str[i + j] != ' ' && str[i + j] != '\t' && str[i + j] != '\n') {
repeat = false;
i = i + j;
count++;
}
else
j++;
}
}
}
return count;
}
Also, I wrote mine to take a string argument, but most of the examples I've seen used (char* str) instead, which I wasn't sure how to use with my input string.
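On the last point: a function declared to take a C string can still be fed from a std::string through c_str(). A minimal sketch, assuming a hypothetical count_words_c that is not taken from any of the examples mentioned:

```cpp
#include <cctype>
#include <string>

// Hypothetical C-string variant: counts runs of non-whitespace characters.
int count_words_c(const char* str) {
    int count = 0;
    bool in_word = false;
    for (; *str != '\0'; ++str) {
        // cast to unsigned char: std::isspace is undefined for negative values
        if (std::isspace(static_cast<unsigned char>(*str))) {
            in_word = false;
        } else if (!in_word) {
            in_word = true;   // first character of a new word
            ++count;
        }
    }
    return count;
}

// A std::string is passed by handing over its null-terminated buffer.
int count_words(const std::string& s) {
    return count_words_c(s.c_str());
}
```

Taking const std::string& in the wrapper avoids copying the input, and the C-string version needs no trimming beforehand because leading whitespace never starts a word.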
You don't need a stringstream just to count word boundaries:
#include <string>
#include <cctype>
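int numWords(const std::string& str)
{
bool space = true; // true while we are not inside a word
int count = 0;
for (unsigned char c : str) { // unsigned char: std::isspace is undefined for negative values
if (std::isspace(c)) space = true;
else {
if (space) ++count; // a word starts at each transition from whitespace to non-whitespace
space = false;
}
}
return count;
}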
One solution is to utilize std::istringstream to count the number of words and to skip over spaces automatically.
#include <sstream>
#include <string>
#include <iostream>
int numWords(std::string str)
{
int count = 0;
std::istringstream strm(str);
std::string word;
while (strm >> word)
++count;
return count;
}
int main()
{
std::cout << numWords(" This is a test ");
}
Output:
4
That said, as mentioned, std::istringstream is heavier in terms of performance than writing your own loop.
Sam's comment made me write a function that does not allocate strings for the words but just creates string_views on the input string.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>
std::vector<std::string_view> get_words(const std::string& input)
{
std::vector<std::string_view> words;
// take unsigned char: std::isalpha is undefined for negative char values
auto is_alpha = [](unsigned char c) { return std::isalpha(c) != 0; };
// the first word begins at the first alpha character
auto begin_of_word = std::find_if(input.begin(), input.end(), is_alpha);
// start scanning at the first word so leading non-alpha characters are skipped
auto end_of_word = begin_of_word;
auto end_of_input = input.end();
// parse the whole string
while (end_of_word != end_of_input)
{
// as long as you see text characters, move end_of_word one forward
while ((end_of_word != end_of_input) && is_alpha(*end_of_word)) end_of_word++;
// create a string view from begin of word to end of word.
// no new string memory will be allocated
// std::vector will do some dynamic memory allocation to store string_view (metadata of word positions)
// (the iterator-pair constructor of std::string_view requires C++20)
words.emplace_back(begin_of_word, end_of_word);
// then skip all non-alpha characters.
while ((end_of_word != end_of_input) && !is_alpha(*end_of_word)) end_of_word++;
// and if we haven't reached the end then we are at the beginning of a new word.
if (end_of_word != end_of_input) begin_of_word = end_of_word;
}
return words;
}
int main()
{
std::string input{ "This, this is a test!" };
auto words = get_words(input);
for (const auto& word : words)
{
std::cout << word << "\n";
}
return 0;
}
You can use the standard function std::distance with std::istringstream in the following way:
#include <iostream>
#include <sstream>
#include <string>
#include <iterator>
int main()
{
std::string s( " This is a test" );
std::istringstream iss( s );
auto count = std::distance( std::istream_iterator<std::string>( iss ),
std::istream_iterator<std::string>() );
std::cout << count << '\n';
}
The program output is
4
If you want you can place the call of std::distance in a separate function like
#include <iostream>
#include <sstream>
#include <string>
#include <iterator>
size_t numWords( const std::string &s )
{
std::istringstream iss( s );
return std::distance( std::istream_iterator<std::string>( iss ),
std::istream_iterator<std::string>() );
}
int main()
{
std::string s( " This is a test" );
std::cout << numWords( s ) << '\n';
}
If separators can include characters other than whitespace, for example punctuation, then you should use the find_first_of and find_first_not_of member functions of std::string or std::string_view.
Here is a demonstration program.
#include <iostream>
#include <string>
#include <string_view>
size_t numWords( const std::string_view s, std::string_view delim = " \t" )
{
size_t count = 0;
for ( std::string_view::size_type pos = 0;
( pos = s.find_first_not_of( delim, pos ) ) != std::string_view::npos;
pos = s.find_first_of( delim, pos ) )
{
++count;
}
return count;
}
int main()
{
std::string s( "Is it a test ? Yes ! Now we will run it ..." );
std::cout << numWords( s, " \t!?.," ) << '\n';
}
The program output is
10
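The same find_first_not_of / find_first_of loop can also collect the words instead of only counting them. The following is a sketch under that assumption. The name split_any is hypothetical, and the returned views point into the input, so the input must outlive the result:

```cpp
#include <string_view>
#include <vector>

// Hypothetical helper: returns views into `s`, so `s` must outlive the result.
std::vector<std::string_view> split_any(std::string_view s, std::string_view delim = " \t") {
    std::vector<std::string_view> words;
    std::string_view::size_type pos = 0;
    while ((pos = s.find_first_not_of(delim, pos)) != std::string_view::npos) {
        const auto end = s.find_first_of(delim, pos);   // npos if the word runs to the end
        words.push_back(s.substr(pos, end - pos));      // substr clamps the count to the end of s
        pos = end;                                       // npos here ends the loop on the next search
    }
    return words;
}
```

Calling split_any(s, " \t!?.,").size() on the string from the demonstration program should give the same count as numWords.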
You can do it easily with a regex:
#include <iterator>
#include <regex>
#include <string>
int numWords(const std::string& str)
{
std::regex re("\\S+"); // or `[^ \t\n]+` to exactly match the question
return static_cast<int>(std::distance(
std::sregex_iterator(str.begin(), str.end(), re),
std::sregex_iterator()
));
}
How to convert:
string x = "1+2+3";
to:
char y[] = {'1', '2', '3'};
What approach should I take?
The task is to split a string separated by '+'. In the example below, the delimiter ',' is used instead.
Splitting a string into tokens is a very old task, and there are many solutions available. They all have different properties: some are difficult to understand, some are hard to develop, and they differ in complexity, speed, and flexibility.
Alternatives
1. Handcrafted, many variants, using pointers or iterators; may be hard to develop and error-prone.
2. Using the old-style std::strtok function. It may be unsafe and arguably should no longer be used.
3. std::getline. The most used implementation, but actually a "misuse" and not so flexible.
4. Using a dedicated modern function, specifically developed for this purpose: most flexible and a good fit for the STL environment and algorithm landscape, but slower.
Please see the 4 examples in one piece of code.
#include <iostream>
#include <sstream>
#include <string>
#include <regex>
#include <algorithm>
#include <iterator>
#include <cstring>
#include <forward_list>
#include <deque>
#include <type_traits>
#include <vector>
using Container = std::vector<std::string>;
std::regex delimiter{ "," };
int main() {
// Some function to print the contents of an STL container
// (std::decay_t avoids the `typename` that std::decay<...>::type needs inside a generic lambda)
auto print = [](const auto& container) -> void { std::copy(container.begin(), container.end(),
std::ostream_iterator<std::decay_t<decltype(*container.begin())>>(std::cout, " ")); std::cout << '\n'; };
// Example 1: Handcrafted -------------------------------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Search for comma, then take the part and add to the result
for (size_t i{ 0U }, startpos{ 0U }; i <= stringToSplit.size(); ++i) {
// So, if there is a comma or the end of the string
if ((stringToSplit[i] == ',') || (i == (stringToSplit.size()))) {
// Copy substring
c.push_back(stringToSplit.substr(startpos, i - startpos));
startpos = i + 1;
}
}
print(c);
}
// Example 2: Using very old strtok function ----------------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Split string into parts in a simple for loop
#pragma warning(suppress : 4996)
for (char* token = std::strtok(const_cast<char*>(stringToSplit.data()), ","); token != nullptr; token = std::strtok(nullptr, ",")) {
c.push_back(token);
}
print(c);
}
// Example 3: Very often used std::getline with additional istringstream ------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Put string in an std::istringstream
std::istringstream iss{ stringToSplit };
// Extract string parts in simple for loop
for (std::string part{}; std::getline(iss, part, ','); c.push_back(part))
;
print(c);
}
// Example 4: Most flexible iterator solution ------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
//
// Everything done already with range constructor. No additional code needed.
//
print(c);
// Works also with other containers in the same way
std::forward_list<std::string> c2(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
print(c2);
// And works with algorithms
std::deque<std::string> c3{};
std::copy(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {}, std::back_inserter(c3));
print(c3);
}
return 0;
}
You can use an std::vector<std::string> instead of char[], that way, it would work with more than one-digit numbers. Try this:
#include <iostream>
#include <vector>
#include <string>
#include <sstream>
int main() {
using namespace std;
std::string str("1+2+3");
std::string buff;
std::stringstream ss(str);
std::vector<std::string> result;
while(getline(ss, buff, '+')){
result.push_back(buff);
}
for(std::string num : result){
std::cout << num << std::endl;
}
}
Here is a coliru link to show it works with numbers having more than one digit.
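If the single-character result asked for in the question is really what is needed, one possible sketch copies every character except the delimiter with std::remove_copy. The helper name chars_without is hypothetical, and a std::vector<char> stands in for the raw char array:

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: copies every character of `s` except `delim`.
std::vector<char> chars_without(const std::string& s, char delim) {
    std::vector<char> out;
    out.reserve(s.size());
    std::remove_copy(s.begin(), s.end(), std::back_inserter(out), delim);
    return out;
}
```

Note that this flattens multi-digit numbers into separate digits ("12+3" gives '1', '2', '3'), which is exactly the limitation the std::vector<std::string> version avoids.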
Here are my steps:
1. Convert the original string into a char*.
2. Split the obtained char* on the delimiter + using the function strtok, storing the first character of each token in a vector<char>.
3. Convert this vector<char> into a C char array (char*).
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
string line = "1+2+3";
std::vector<char> vectChar;
// convert the original string into a char array to allow splitting
// (+1 leaves room for the terminating '\0' that strcpy writes)
char* input = (char*) malloc(line.size() + 1);
strcpy(input, line.c_str());
// splitting the string
char *token = strtok(input, "+");
while(token) {
std::cout << *token;
// keep only the first character of each token
vectChar.push_back(*token);
token = strtok(NULL, "+");
}
// end of splitting step
std::cout << std::endl;
// test display of the content of the vector<char>={'1', '2', ...}
for (size_t i=0; i< vectChar.size(); i++)
{
std::cout << vectChar[i];
}
// Now that the vector contains the needed list of char
// we need to convert it to a char array (char*)
// first malloc (+1 for the terminating '\0')
char* buffer = (char*) malloc(vectChar.size() + 1);
// then copy the vector into the char* and terminate it
std::copy(vectChar.begin(), vectChar.end(), buffer);
buffer[vectChar.size()] = '\0';
std::cout << std::endl;
// now buffer={'1', '2', ...}
// let us test by displaying
for (const char* p = buffer; *p != '\0'; ++p)
{
printf("%c", *p);
}
free(buffer);
free(input);
}
You can run/check this code in https://repl.it/#JomaCorpFX/StringSplit#main.cpp
Code
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> Split(const std::string &data, const std::string &toFind)
{
std::vector<std::string> v;
if (data.empty() || toFind.empty())
{
v.push_back(data);
return v;
}
size_t ini = 0;
size_t pos;
while ((pos = data.find(toFind, ini)) != std::string::npos)
{
std::string s = data.substr(ini, pos - ini);
if (!s.empty())
{
v.push_back(s);
}
ini = pos + toFind.length();
}
if (ini < data.length())
{
v.push_back(data.substr(ini));
}
return v;
}
int main()
{
std::string x = "1+2+3";
for (const auto& value : Split(x, "+")) // plain literal: u8"+" is a char8_t string since C++20 and no longer converts to std::string
{
std::cout << "Value: " << value << std::endl;
}
std::cout << "Press enter to continue... ";
std::cin.get();
return EXIT_SUCCESS;
}
Output
Value: 1
Value: 2
Value: 3
Press enter to continue...
I have a string of items (see code). I want to say when a specific item from that list is found. In my example I want the output to be 3 since the item is found after the first two items. I can print out the separate items to the console but I cannot figure out how to do a count on these two items. I think it is because of the while loop... I always get numbers like 11 instead of two separate 1s. Any tips? :)
#include <iostream>
#include <string>
using namespace std;
int main() {
string items = "box,cat,dog,cat";
string delim = ",";
size_t pos = 0;
string token;
string item1 = "dog";
int count = 0;
while ((pos = items.find(delim)) != string::npos)
{
token = items.substr(0, pos);
if (token != item1)
{
cout << token << endl; //here I would like to increment count for every
//item before item1 (dog) is found
items.erase(0, pos + 1);
}
else if (token == item1)
return 0;
}
return 0; //output: box cat
}
I replaced your search algorithm with the method explode, that separates your string by a delimiter and returns a vector, which is better suited for searching and getting the element count:
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
#include <algorithm>
std::vector<std::string> explode(const std::string& s, char delim)
{
std::vector<std::string> result;
std::istringstream iss(s);
for (std::string token; std::getline(iss, token, delim); )
{
result.push_back(std::move(token));
}
return result;
}
int main()
{
std::string items = "box,cat,dog,cat";
std::string item1 = "dog";
char delim = ',';
auto resultVec = explode(items, delim);
auto itResult = std::find_if(resultVec.begin(), resultVec.end()
, [&item1](const auto& resultString)
{
return item1 == resultString;
});
if (itResult != resultVec.end())
{
auto index(std::distance(resultVec.begin(), itResult) + 1); // index is zero based
std::cout << index;
}
return 0;
}
By using std::find_if you can get the position of item1 by iterator, which you can use with std::distance to get the count of elements that are in front of it.
Credits for the explode method go to this post: Is there an equivalent in C++ of PHP's explode() function?
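If only the position is needed, the intermediate vector can be skipped by comparing each field in place while scanning. This is a sketch under that assumption, not the approach of the answer above. The name position_of is hypothetical:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: 1-based position of `wanted` among the fields of `items`, or 0 if absent.
std::size_t position_of(const std::string& items, const std::string& wanted, char delim) {
    std::size_t index = 1;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = items.find(delim, start);
        // compare(pos, len, str) clamps len, so npos - start is fine for the last field
        if (items.compare(start, end - start, wanted) == 0) {
            return index;
        }
        if (end == std::string::npos) {
            return 0;   // last field checked, no match
        }
        start = end + 1;
        ++index;
    }
}
```

For the string from the question, position_of("box,cat,dog,cat", "dog", ',') returns 3. Returning 0 for "not found" is a design choice of this sketch; an std::optional would work as well.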
All roads lead to Rome. Here is an additional solution using std::regex.
The main approach is the same as in the accepted answer, but using modern C++17 language elements makes it a little more compact.
#include <algorithm>
#include <iostream>
#include <string>
#include <regex>
#include <iterator>
#include <vector>
const std::regex re{ "," };
int main() {
std::string items{ "box,cat,dog,cat" };
// Split String and put all sub-items in a vector
std::vector subItems(std::sregex_token_iterator(items.begin(), items.end(), re, -1), {});
// Search and check if found and show result
if (auto it = std::find(subItems.begin(), subItems.end(), "dog"); it != subItems.end())
std::cout << "Found at position: " << std::distance(subItems.begin(), it) + 1 << '\n';
else
std::cout << "Not found.\n";
return 0;
}
The function takes a string of comma-separated numbers and converts them into integers. Sometimes it produces a garbage value at the end.
vector<int> parseInts(string str)
{
int as[200]={0};
int i=0,j=0;
for(;str[i]!='\0';i++)
{
while(str[i]!=','&&str[i]!='\0')
{as[j]= as[j]*10 +str[i] -'0';
i++;}
j++;
}
vector<int>rr;
for(int i=0;i<j;i++)
rr.push_back(as[i]);
return rr;
}
If you're writing in C++, use C++ features instead of C-style string manipulation. You can combine std::istringstream, std::getline(), and std::stoi() into a very short solution. (Also note that you should take the argument by const reference since you do not modify it.)
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
std::vector<int> parseInts(std::string const & str) {
std::vector<int> values;
std::istringstream src{str};
std::string buf;
while (std::getline(src, buf, ',')) {
// Note no error checking on this conversion -- exercise for the reader.
values.push_back(std::stoi(buf));
}
return values;
}
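The error checking left as an exercise above could be sketched with std::from_chars (C++17), which reports failures through an error code instead of throwing. The name parseIntsChecked and the choice to reject the whole input on the first bad field are assumptions of this sketch:

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical checked variant: returns std::nullopt if any field is not a plain int.
std::optional<std::vector<int>> parseIntsChecked(const std::string& str) {
    std::vector<int> values;
    std::size_t start = 0;
    while (start <= str.size()) {
        std::size_t end = str.find(',', start);
        if (end == std::string::npos) end = str.size();
        const char* first = str.data() + start;
        const char* last = str.data() + end;
        int value = 0;
        const auto res = std::from_chars(first, last, value);
        // reject empty fields, overflow, and trailing junk such as "12x"
        if (res.ec != std::errc{} || res.ptr != last) {
            return std::nullopt;
        }
        values.push_back(value);
        start = end + 1;
    }
    return values;
}
```

Unlike std::stoi, std::from_chars does not skip leading whitespace, so a field such as " 3" is rejected here; an empty input string is rejected too, because it counts as one empty field.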
The code doesn't handle whitespace and inputs with more than 200 numbers.
An alternative working solution:
#include <iostream>
#include <sstream>
#include <iterator>
#include <algorithm>
#include <vector>
std::vector<int> parseInts(std::string s) {
std::replace(s.begin(), s.end(), ',', ' ');
std::istringstream ss(std::move(s));
return std::vector<int>{
std::istream_iterator<int>{ss},
std::istream_iterator<int>{}
};
}
int main() {
auto v = parseInts("1,2 , 3 ,,, 4,5,,,");
for(auto i : v)
std::cout << i << '\n';
}
Output:
1
2
3
4
5
You never really asked a question. If you are looking for an elegant method, then I provide that below. If you are asking us to debug the code, then that is a different matter.
First here is a nice utility for splitting a string
std::vector<std::string> split(const std::string& str, char delim) {
std::vector<std::string> strings;
size_t start;
size_t end = 0;
while ((start = str.find_first_not_of(delim, end)) != std::string::npos) {
end = str.find(delim, start);
strings.push_back(str.substr(start, end - start));
}
return strings;
}
First split the string on commas:
std::vector<std::string> strings = split(str, ',');
Then convert each to an int:
std::vector<int> ints;
for (const auto& s : strings)
ints.push_back(std::stoi(s));
I need to split a string by line.
I used to do it in the following way:
int doSegment(char *sentence, int segNum)
{
assert(pSegmenter != NULL);
Logger &log = Logger::getLogger();
char delims[] = "\n";
char *line = NULL;
if (sentence != NULL)
{
line = strtok(sentence, delims);
while(line != NULL)
{
cout << line << endl;
line = strtok(NULL, delims);
}
}
else
{
log.error("....");
}
return 0;
}
I input "we are one.\nyes we are." and invoke the doSegment method. But when debugging, I found that the sentence parameter is "we are one.\\nyes we are", and the split failed. Can somebody tell me why this happened and what I should do? Is there any other way to split a string in C++? Thanks!
I'd use std::getline or std::string::find to go through the string.
The code below demonstrates the getline approach:
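#include <iostream>
#include <sstream>
#include <string>
int doSegment(const char *sentence)
{
// check for NULL before building the stream: constructing a std::string from a null pointer is undefined behavior
if (sentence != NULL)
{
std::stringstream ss(sentence);
std::string to;
while(std::getline(ss,to,'\n')){
std::cout << to << std::endl;
}
}
return 0;
}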
You can call std::string::find in a loop and then use std::string::substr.
std::vector<std::string> split_string(const std::string& str,
const std::string& delimiter)
{
std::vector<std::string> strings;
std::string::size_type pos = 0;
std::string::size_type prev = 0;
while ((pos = str.find(delimiter, prev)) != std::string::npos)
{
strings.push_back(str.substr(prev, pos - prev));
prev = pos + delimiter.size();
}
// To get the last substring (or only, if delimiter is not found)
strings.push_back(str.substr(prev));
return strings;
}
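The debugger output in the question ("we are one.\\nyes we are") suggests that the input holds a backslash followed by the letter n rather than a real newline character. That reading is an assumption, since the question does not show how the string is produced. Under that assumption, a find-based splitter with a multi-character delimiter can split on the two-character sequence. The function names below are hypothetical:

```cpp
#include <string>
#include <vector>

// Hypothetical helper using the same find/substr technique as above.
std::vector<std::string> split_on(const std::string& str, const std::string& delimiter) {
    std::vector<std::string> parts;
    if (delimiter.empty()) {          // an empty delimiter would never advance the search
        parts.push_back(str);
        return parts;
    }
    std::string::size_type prev = 0;
    std::string::size_type pos = 0;
    while ((pos = str.find(delimiter, prev)) != std::string::npos) {
        parts.push_back(str.substr(prev, pos - prev));
        prev = pos + delimiter.size();
    }
    parts.push_back(str.substr(prev));   // last (or only) piece
    return parts;
}

// "\\n" in source code is the two characters '\' and 'n', not a newline.
std::vector<std::string> split_on_escaped_newline(const std::string& str) {
    return split_on(str, "\\n");
}
```

If the input can contain both forms, splitting on "\n" first and then on "\\n" (or the other way round) would cover both cases.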
#include <sstream>
#include <string>
#include <vector>
std::vector<std::string> split_string_by_newline(const std::string& str)
{
auto result = std::vector<std::string>{};
auto ss = std::stringstream{str};
for (std::string line; std::getline(ss, line, '\n');)
result.push_back(line);
return result;
}
#include <iostream>
#include <string>
#include <regex>
#include <algorithm>
#include <iterator>
#include <vector>
using namespace std;
vector<string> splitter(string in_pattern, string& content){
vector<string> split_content;
regex pattern(in_pattern);
copy( sregex_token_iterator(content.begin(), content.end(), pattern, -1),
sregex_token_iterator(),back_inserter(split_content));
return split_content;
}
int main()
{
string sentence = "This is the first line\n";
sentence += "This is the second line\n";
sentence += "This is the third line\n";
vector<string> lines = splitter(R"(\n)", sentence);
for (string line: lines){cout << line << endl;}
}
We have a string with multiple lines.
We split those into an array (vector).
We print out those elements in a for loop.
Using the library range-v3:
#include <range/v3/all.hpp>
#include <string>
#include <string_view>
#include <vector>
std::vector<std::string> split_string_by_newline(const std::string_view str) {
return str | ranges::views::split('\n')
| ranges::to<std::vector<std::string>>();
}
Using C++23 ranges:
#include <ranges>
#include <string>
#include <string_view>
#include <vector>
std::vector<std::string> split_string_by_newline(const std::string_view str) {
return str | std::ranges::views::split('\n')
| std::ranges::to<std::vector<std::string>>();
}
This fairly inefficient approach loops through the string and, each time it encounters a '\n' newline character, creates a substring and adds it to a vector.
std::vector<std::string> Loader::StringToLines(std::string string)
{
std::vector<std::string> result;
std::string temp;
int markbegin = 0;
int markend = 0;
for (int i = 0; i < string.length(); ++i) {
if (string[i] == '\n') {
markend = i;
result.push_back(string.substr(markbegin, markend - markbegin));
markbegin = (i + 1);
}
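}
// keep the text after the last newline, which the loop above does not add
if (static_cast<std::size_t>(markbegin) < string.length())
result.push_back(string.substr(markbegin));
return result;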
}