Parse a C-string of floating numbers

Parse a C-string of floating numbers - c++

I have a C-string which contains a list of floating numbers separated by commas and spaces. Each pair of numbers is separated by one (or more) spaces and represents a point where the x and y fields are separated by a comma (and optionally by spaces).
" 10,9 2.5, 3 4 ,150.32 "
I need to parse this string in order to fill a list of Point(x, y).
Following is my current implementation:
const char* strPoints = getString();
std::istringstream sstream(strPoints);
float x, y;
char comma;
while (sstream >> x >> comma >> y)
{
myList.push(Point(x, y));
}
Since I need to parse a lot (up to 500,000) of these strings I'm wondering if there is a faster solution.

Look at Boost Spirit:
How to parse space-separated floats in C++ quickly?
It supports NaN, positive and negative infinity just fine. Also it allows you to express the constraining grammar succinctly.
Simple adaptation of the code
Here is the adapted sample for your grammar:
struct Point { float x,y; };
typedef std::vector<Point> data_t;
// And later:
bool ok = phrase_parse(f,l,*(double_ > ',' > double_), space, data);
The iterators can be any iterators. So you can hook it up with your C string just fine.
Here's a straight adaptation of the linked benchmark case. This shows you how to parse from any std::istream or directly from a memory mapped file.
Live On Coliru
Further optimizations (strictly for C strings)
Here's a version that doesn't need to know the length of the string up front (this is neat because it avoids the strlen call in case you didn't have the length available):
template <typename OI>
static inline void parse_points(OI out, char const* it, char const* last = std::numeric_limits<char const*>::max()) {
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
bool ok = qi::phrase_parse(it, last,
*(qi::double_ >> ',' >> qi::double_) [ *phx::ref(out) = phx::construct<Point>(qi::_1, qi::_2) ],
qi::space);
if (!ok || !(it == last || *it == '\0')) {
throw it; // TODO proper error reporting?
}
}
Note how I made it take an output iterator so that you get to decide how to accumulate the results. The obvious wrapper to /just/ parse to a vector would be:
static inline data_t parse_points(char const* szInput) {
data_t pts;
parse_points(back_inserter(pts), szInput);
return pts;
}
But you can also do different things (like append to an existing container, that could have reserved a known capacity up front etc.). Things like this often allow truly optimized integration in the end.
Here's that code fully demo-ed in ~30 lines of essential code:
Live On Coliru
Extra Awesome Bonus
To show off the flexibility of this parser; if you just wanted to check the input and get a count of the points, you can replace the output iterator with a simple lambda function that increments a counter instead of adds a newly constructed point.
int main() {
int count = 0;
parse_points( " 10,9 2.5, 3 4 ,150.32 ", boost::make_function_output_iterator([&](Point const&){count++;}));
std::cout << "elements in sample: " << count << "\n";
}
Live On Coliru
Since everything is inlined the compiler will notice that the whole Point doesn't need to be constructed here and eliminate that code: http://paste.ubuntu.com/9781055/
The main function is seen directly invoking the very parser primitives. Handcoding the parser won't get you better tuning here, at least not without a lot of effort.

I got much better performance parsing out the points using a combination of std::find and std::strtof and the code wasn't much more complicated. Here's the test I ran:
#include <iostream>
#include <sstream>
#include <random>
#include <chrono>
#include <cctype>
#include <algorithm>
#include <cstdlib>
#include <forward_list>
struct Point { float x; float y; };
using PointList = std::forward_list<Point>;
using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;
std::string generate_points(int n) {
static auto random_generator = std::mt19937{std::random_device{}()};
std::ostringstream oss;
std::uniform_real_distribution<float> distribution(-1, 1);
for (int i=0; i<n; ++i) {
oss << distribution(random_generator) << " ," << distribution(random_generator) << "\t \n";
}
return oss.str();
}
PointList parse_points1(const char* s) {
std::istringstream iss(s);
PointList points;
float x, y;
char comma;
while (iss >> x >> comma >> y)
points.push_front(Point{x, y});
return points;
}
inline
std::tuple<Point, const char*> parse_point2(const char* x_first, const char* last) {
auto is_whitespace = [](char c) { return std::isspace(c); };
auto x_last = std::find(x_first, last, ',');
auto y_first = std::find_if_not(std::next(x_last), last, is_whitespace);
auto y_last = std::find_if(y_first, last, is_whitespace);
auto x = std::strtof(x_first, (char**)&x_last);
auto y = std::strtof(y_first, (char**)&y_last);
auto next_x_first = std::find_if_not(y_last, last, is_whitespace);
return std::make_tuple(Point{x, y}, next_x_first);
}
PointList parse_points2(const char* i, const char* last) {
PointList points;
Point point;
while (i != last) {
std::tie(point, i) = parse_point2(i, last);
points.push_front(point);
}
return points;
}
int main() {
auto s = generate_points(500000);
auto time0 = Clock::now();
auto points1 = parse_points1(s.c_str());
auto time1 = Clock::now();
auto points2 = parse_points2(s.data(), s.data() + s.size());
auto time2 = Clock::now();
std::cout << "using stringstream: "
<< std::chrono::duration_cast<milliseconds>(time1 - time0).count() << '\n';
std::cout << "using strtof: "
<< std::chrono::duration_cast<milliseconds>(time2 - time1).count() << '\n';
return 0;
}
outputs:
using stringstream: 1262
using strtof: 120

You can first try to disable the sychronization with the C I/O:
std::ios::sync_with_stdio(false);
Source: Using scanf() in C++ programs is faster than using cin?
You can also try to use alternatives to iostream:
boost_lexical_cast and define BOOST_LEXICAL_CAST_ASSUME_C_LOCALE
scanf
I think you should give the sync_with_stdio(false) a try. The other alternatives require more coding, and I'm not sure that you will win much (if any).

Related

Reverse string and Palindrome

input : "A man, a plan, a canal: Panama"
Why does output capital char become small not capital?
output :
x: amanaPlanacanalpanam
y: AmanaplanacanalPanama
#include <iostream>
#include <stdbool.h>
#include <stdio.h>
#include<string.h>
using namespace std;
int isPalindrome(char* x) {
int i, j = 0;
char y[50];
int n = strlen(x);
for (i = 0; i < n; i++)
if ((x[i] >= 'a' && x[i] <= 'z') || (x[i] >= 'A' && x[i] <= 'Z'))
y[j++] = x[i];
y[j] = '\0';
n = strlen(y);
int c = n - 1;
for (i = 0; i < n - 1; i++) {
x[i] = y[c];
c--;
}
x[i] = '\0';
when i print x and y :
cout << "x: " << x << "\ny: " << y;
why char are small
output : x: amanaPlanacanalpanam
y: AmanaplanacanalPanama
if (strcmp(x, y) == 0)
return 1;
return 0;
}
int main() {
char x[50];
cout << "Enter : ";
scanf_s("%[^\n]", x, sizeof(x));
if (isPalindrome(x))
printf("\n true\n");
else
printf("\nfalse\n");
}
screen for output :
output screen

If you are using C++, you should use std::string instead and that will make your life easier and code much cleaner. So let's first create a function, that returns a string reversed:
std::string reverse( const std::string &s )
{
return std::string( s.rbegin(), s.rend() );
}
very simple, right? Now we need a function, that removes all non letters from a string, here it is:
std::string removeNonLetters( std::string s )
{
auto it = std::remove_if( s.begin(), s.end(), []( char c ) { return not std::isalpha( c ); } );
return std::string( s.begin(), it );
}
It is little bit more complicated, as we use remove-erase idiom from `std::remove_if() but it is a 2 line function. And last task - we need to make our string all lowercase:
std::string lowerCase( std::string s )
{
std::transform( s.begin(), s.end(), s.begin(), []( char c ) { return std::tolower( c ); } );
return s;
}
which uses another algo from standard library - std::transform(), here we can test them all live example, results:
amanaP :lanac a ,nalp a ,nam A
AmanaplanacanalPanama
a man, a plan, a canal: panama
Now combining these 3 functions you can easily achieve your task - to check if a string is a palindrome.
Note: if using standard algorithms is too complicated for you or it is too early for your class, you can easily rewrite them using loops. Point is you should write simple functions, test that they do work and then combine them into more complex task.

Your code is not really C++. It's C with cin and cout mixed in. C++ lets you do this in a much nicer way! You don't need stdbool header in C++, by the way.
First, let's see how it could be done in C++98, using built-in algorithms. Try it on online!: https://godbolt.org/z/j5s4r91s1
// C++98, v1
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
// We adapt the C-compatible signatures to C++
char my_tolower(char c) { return std::tolower(c); }
// We pass the string by const reference, thus avoiding the copy.
// Passing by value would look like is_palindrome(const std::string str)
// and would be wasteful, since that copy is not needed at all.
bool is_palindrome(const std::string &input)
{
std::string filtered;
// For each character of the input string (within the {begin(), end()} range),
// insert at the back of the filtered string only if isalpha returns true
// for that character. This removes non-letters - space and punctuation.
for (std::string::const_iterator it = input.begin(); it != input.end(); ++it)
if (std::isalpha(*it))
filtered.push_back(*it);
if (input == "A man, a plan, a canal: Panama")
assert (filtered == "AmanaplanacanalPanama");
// Transform the filtered string to lower case: for each character
// in the {begin(), end()} range, apply the my_tolower function to it, and
// write the result back to the same range {begin(), ...}
std::transform (filtered.begin(), filtered.end(), filtered.begin(), my_tolower);
if (input == "A man, a plan, a canal: Panama")
assert (filtered == "amanaplanacanalpanama");
// Is the string in forward direction {begin(), end()}
// equal to the string in backward direction {rbegin(), ...}?
return std::equal (filtered.begin(), filtered.end(), filtered.rbegin());
}
int main()
{
// For testing, input can be provided directly in the code.
const char input[] = "A man, a plan, a canal: Panama";
// We now can plainly state that `input` contains a palindrome.
assert (is_palindrome(input));
}
Note how assert is used as a crude form of automated testing: we don't have to be entering inputs manually before we get the code working. Up till then, an assertion will suffice. Assertions are a means of conveying to both humans and machines that some condition is true. If the condition is not satisfied, the program terminates, usually with an error message.
Instead of the for loop, we should have used copy_if, but that's only available in C++11 and later:
std::copy_if (input.begin(), input.end(), std::back_inserter(filtered), my_isalpha);
But one shouldn't use the algorithms blindly. In our particular case, they don't improve performance nor make the code more generic. We can simplify things dramatically by explicitly iterating over the input string.
https://godbolt.org/z/s8YYMK3a4
// C++98, v2
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
std::string filter_for_palindrome_check(const std::string &input)
{
std::string result;
for (std::string::const_iterator it = input.begin(); it != input.end(); ++it)
if (std::isalpha(*it))
result.push_back(std::tolower(*it));
return result;
}
bool is_palindrome(const std::string &input)
{
const std::string filtered = filter_for_palindrome_check(input);
return std::equal (filtered.begin(), filtered.end(), filtered.rbegin());
}
int main()
{
const char input[] = "A man, a plan, a canal: Panama";
assert (filter_for_palindrome_check(input) == "amanaplanacanalpanama");
assert (is_palindrome(input));
}
In C++11, we could have used the range-for.
https://godbolt.org/z/WGaezj3oq
// C++11, v1
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
std::string filter_for_palindrome_check(const std::string &input)
{
std::string result;
for (char c : input)
if (std::isalpha(c))
result.push_back(std::tolower(c));
return result;
}
bool is_palindrome(const std::string &input)
{
const std::string filtered = filter_for_palindrome_check(input);
return std::equal (filtered.begin(), filtered.end(), filtered.rbegin());
}
int main()
{
const char input[] = "A man, a plan, a canal: Panama";
assert (filter_for_palindrome_check(input) == "amanaplanacanalpanama");
assert (is_palindrome(input));
}
Now we can begin to experiment with C++20. Note how the algorithm becomes less verbose even after this initial transformation. The pipe syntax is reminiscent of Unix.
https://godbolt.org/z/cM4ova8a5
// C++20, v1
#include <algorithm>
#include <cassert>
#include <cctype>
#include <ranges>
#include <string>
#include <string_view>
// Saves us from typing std:: every time we use these namespaces.
namespace views = std::ranges::views;
// We adapt the C-compatible signatures to C++
bool my_isalpha(char c) { return std::isalpha(c); }
char my_tolower(char c) { return std::tolower(c); }
// It's more general to use a string_view - always passed by value!
// - in place of const string &.
bool is_palindrome(std::string_view input)
{
// C++17 lets us do it all in a single `using` statement :)
using views::filter, views::transform, views::reverse;
std::string filtered;
std::ranges::copy(input | filter(my_isalpha) | transform(my_tolower),
std::back_inserter(filtered));
return std::ranges::equal(filtered, filtered | reverse);
}
int main()
{
const char input[] = "A man, a plan, a canal: Panama";
assert (is_palindrome(input));
}
The C++ standard that's coming after C++20 will likely make it easier to convert a range to a concrete value. In the meantime, Eric Niebler's range-v3 library does the trick.
https://godbolt.org/z/ojo9zWPr9
// C++20, v2 (using Eric Niebler's range-v3)
#include <algorithm>
#include <cassert>
#include <cctype>
#include <ranges>
#include <range/v3/to_container.hpp>
#include <string>
#include <string_view>
namespace views = std::ranges::views;
bool my_isalpha(char c) { return std::isalpha(c); }
char my_tolower(char c) { return std::tolower(c); }
bool is_palindrome(std::string_view input)
{
using views::filter, views::transform, views::reverse, ranges::to;
auto const filtered = input
| filter(my_isalpha) | transform(my_tolower) | to<std::string>();
return std::ranges::equal(filtered, filtered | reverse);
}
int main()
{
const char input[] = "A man, a plan, a canal: Panama";
assert (is_palindrome(input));
}
Note how the C++ code is now approximating a plain-language description of the algorithm: take the input, only alphanumeric part of it, in lower case, put it into a string. Call the input a palindrome if the string is equal to its reverse.
As you may have guessed, the type of filtered is const std::string.
You can of course ask: why not avoid the intermediate string? Say:
bool is_palindrome(std::string_view input)
{
using views::filter, views::transform, views::reverse;
auto const filtered = input | filter(my_isalpha) | transform(my_tolower);
return std::ranges::equal(filtered, filtered | reverse);
}
This won't compile until we make filtered non-const. That's a bit curious: are the compiler and library designers perhaps trying to tell us something? Yes!
Without said const, this approach works, but filtered here is a lazily evaluated range. Lazy evaluation means that the filtered result is computed only as needed. And here, it's needed twice: for the 1st and 2nd argument to std::ranges::equal. So this will, in general, make things worse even if it appears "simpler". Let's not write such things.
Hopefully this gives you some taste for how much C++ is a different language than C. There're some nifty things in C++, but not all of them are in the standard library. There is IMHO at least one somewhat better alternative to C++20 ranges, e.g. ast-al/rangeless. It's like C# LINQ, but in C++ :)
C++20 ranges work "well" when things are trivial, but become hard to use when you're expecting more complex expressions to "just work" - and they don't. So - the above C++20 examples must be read with this in mind. C++20 ranges aren't an answer to the ultimate question for sure.

The piece of code in which you fill x with the contents of y is not correct, since it does not consider the first element of y. If you start at index c = n - 1, but in the for loop you stop at i < n - 1, the last index you get from y is 2, not 1.
If you want to keep the original upper and lower cases, a possible solution would be to update both i and c in the loop conditions, and to assure that all the indices are correctly swapped:
for (i = 0, c = n - 1; i < n; i++, c--) {
x[i] = y[c];
}

How can you split string in C++ and store them in variables? [duplicate]

Java has a convenient split method:
String str = "The quick brown fox";
String[] results = str.split(" ");
Is there an easy way to do this in C++?

The Boost tokenizer class can make this sort of thing quite simple:
#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;
int main(int, char**)
{
string text = "token, test string";
char_separator<char> sep(", ");
tokenizer< char_separator<char> > tokens(text, sep);
BOOST_FOREACH (const string& t, tokens) {
cout << t << "." << endl;
}
}
Updated for C++11:
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;
int main(int, char**)
{
string text = "token, test string";
char_separator<char> sep(", ");
tokenizer<char_separator<char>> tokens(text, sep);
for (const auto& t : tokens) {
cout << t << "." << endl;
}
}

Here's a real simple one:
#include <vector>
#include <string>
using namespace std;
vector<string> split(const char *str, char c = ' ')
{
vector<string> result;
do
{
const char *begin = str;
while(*str != c && *str)
str++;
result.push_back(string(begin, str));
} while (0 != *str++);
return result;
}

C++ standard library algorithms are pretty universally based around iterators rather than concrete containers. Unfortunately this makes it hard to provide a Java-like split function in the C++ standard library, even though nobody argues that this would be convenient. But what would its return type be? std::vector<std::basic_string<…>>? Maybe, but then we’re forced to perform (potentially redundant and costly) allocations.
Instead, C++ offers a plethora of ways to split strings based on arbitrarily complex delimiters, but none of them is encapsulated as nicely as in other languages. The numerous ways fill whole blog posts.
At its simplest, you could iterate using std::string::find until you hit std::string::npos, and extract the contents using std::string::substr.
A more fluid (and idiomatic, but basic) version for splitting on whitespace would use a std::istringstream:
auto iss = std::istringstream{"The quick brown fox"};
auto str = std::string{};
while (iss >> str) {
process(str);
}
Using std::istream_iterators, the contents of the string stream could also be copied into a vector using its iterator range constructor.
Multiple libraries (such as Boost.Tokenizer) offer specific tokenisers.
More advanced splitting require regular expressions. C++ provides the std::regex_token_iterator for this purpose in particular:
auto const str = "The quick brown fox"s;
auto const re = std::regex{R"(\s+)"};
auto const vec = std::vector<std::string>(
std::sregex_token_iterator{begin(str), end(str), re, -1},
std::sregex_token_iterator{}
);

Another quick way is to use getline. Something like:
stringstream ss("bla bla");
string s;
while (getline(ss, s, ' ')) {
cout << s << endl;
}
If you want, you can make a simple split() method returning a vector<string>, which is
really useful.

Use strtok. In my opinion, there isn't a need to build a class around tokenizing unless strtok doesn't provide you with what you need. It might not, but in 15+ years of writing various parsing code in C and C++, I've always used strtok. Here is an example
char myString[] = "The quick brown fox";
char *p = strtok(myString, " ");
while (p) {
printf ("Token: %s\n", p);
p = strtok(NULL, " ");
}
A few caveats (which might not suit your needs). The string is "destroyed" in the process, meaning that EOS characters are placed inline in the delimter spots. Correct usage might require you to make a non-const version of the string. You can also change the list of delimiters mid parse.
In my own opinion, the above code is far simpler and easier to use than writing a separate class for it. To me, this is one of those functions that the language provides and it does it well and cleanly. It's simply a "C based" solution. It's appropriate, it's easy, and you don't have to write a lot of extra code :-)

You can use streams, iterators, and the copy algorithm to do this fairly directly.
#include <string>
#include <vector>
#include <iostream>
#include <istream>
#include <ostream>
#include <iterator>
#include <sstream>
#include <algorithm>
int main()
{
std::string str = "The quick brown fox";
// construct a stream from the string
std::stringstream strstr(str);
// use stream iterators to copy the stream to the vector as whitespace separated strings
std::istream_iterator<std::string> it(strstr);
std::istream_iterator<std::string> end;
std::vector<std::string> results(it, end);
// send the vector to stdout.
std::ostream_iterator<std::string> oit(std::cout);
std::copy(results.begin(), results.end(), oit);
}

A solution using regex_token_iterators:
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main()
{
string str("The quick brown fox");
regex reg("\\s+");
sregex_token_iterator iter(str.begin(), str.end(), reg, -1);
sregex_token_iterator end;
vector<string> vec(iter, end);
for (auto a : vec)
{
cout << a << endl;
}
}

No offense folks, but for such a simple problem, you are making things way too complicated. There are a lot of reasons to use Boost. But for something this simple, it's like hitting a fly with a 20# sledge.
void
split( vector<string> & theStringVector, /* Altered/returned value */
const string & theString,
const string & theDelimiter)
{
UASSERT( theDelimiter.size(), >, 0); // My own ASSERT macro.
size_t start = 0, end = 0;
while ( end != string::npos)
{
end = theString.find( theDelimiter, start);
// If at end, use length=maxLength. Else use length=end-start.
theStringVector.push_back( theString.substr( start,
(end == string::npos) ? string::npos : end - start));
// If at end, use start=maxSize. Else use start=end+delimiter.
start = ( ( end > (string::npos - theDelimiter.size()) )
? string::npos : end + theDelimiter.size());
}
}
For example (for Doug's case),
#define SHOW(I,X) cout << "[" << (I) << "]\t " # X " = \"" << (X) << "\"" << endl
int
main()
{
vector<string> v;
split( v, "A:PEP:909:Inventory Item", ":" );
for (unsigned int i = 0; i < v.size(); i++)
SHOW( i, v[i] );
}
And yes, we could have split() return a new vector rather than passing one in. It's trivial to wrap and overload. But depending on what I'm doing, I often find it better to re-use pre-existing objects rather than always creating new ones. (Just as long as I don't forget to empty the vector in between!)
Reference: http://www.cplusplus.com/reference/string/string/.
(I was originally writing a response to Doug's question: C++ Strings Modifying and Extracting based on Separators (closed). But since Martin York closed that question with a pointer over here... I'll just generalize my code.)

Boost has a strong split function: boost::algorithm::split.
Sample program:
#include <vector>
#include <boost/algorithm/string.hpp>
int main() {
auto s = "a,b, c ,,e,f,";
std::vector<std::string> fields;
boost::split(fields, s, boost::is_any_of(","));
for (const auto& field : fields)
std::cout << "\"" << field << "\"\n";
return 0;
}
Output:
"a"
"b"
" c "
""
"e"
"f"
""

This is a simple STL-only solution (~5 lines!) using std::find and std::find_first_not_of that handles repetitions of the delimiter (like spaces or periods for instance), as well leading and trailing delimiters:
#include <string>
#include <vector>
void tokenize(std::string str, std::vector<string> &token_v){
size_t start = str.find_first_not_of(DELIMITER), end=start;
while (start != std::string::npos){
// Find next occurence of delimiter
end = str.find(DELIMITER, start);
// Push back the token found into vector
token_v.push_back(str.substr(start, end-start));
// Skip all occurences of the delimiter to find new start
start = str.find_first_not_of(DELIMITER, end);
}
}
Try it out live!

I know you asked for a C++ solution, but you might consider this helpful:
Qt
#include <QString>
...
QString str = "The quick brown fox";
QStringList results = str.split(" ");
The advantage over Boost in this example is that it's a direct one to one mapping to your post's code.
See more at Qt documentation

Here is a sample tokenizer class that might do what you want
//Header file
class Tokenizer
{
public:
static const std::string DELIMITERS;
Tokenizer(const std::string& str);
Tokenizer(const std::string& str, const std::string& delimiters);
bool NextToken();
bool NextToken(const std::string& delimiters);
const std::string GetToken() const;
void Reset();
protected:
size_t m_offset;
const std::string m_string;
std::string m_token;
std::string m_delimiters;
};
//CPP file
const std::string Tokenizer::DELIMITERS(" \t\n\r");
Tokenizer::Tokenizer(const std::string& s) :
m_string(s),
m_offset(0),
m_delimiters(DELIMITERS) {}
Tokenizer::Tokenizer(const std::string& s, const std::string& delimiters) :
m_string(s),
m_offset(0),
m_delimiters(delimiters) {}
bool Tokenizer::NextToken()
{
return NextToken(m_delimiters);
}
bool Tokenizer::NextToken(const std::string& delimiters)
{
size_t i = m_string.find_first_not_of(delimiters, m_offset);
if (std::string::npos == i)
{
m_offset = m_string.length();
return false;
}
size_t j = m_string.find_first_of(delimiters, i);
if (std::string::npos == j)
{
m_token = m_string.substr(i);
m_offset = m_string.length();
return true;
}
m_token = m_string.substr(i, j - i);
m_offset = j;
return true;
}
Example:
std::vector <std::string> v;
Tokenizer s("split this string", " ");
while (s.NextToken())
{
v.push_back(s.GetToken());
}

pystring is a small library which implements a bunch of Python's string functions, including the split method:
#include <string>
#include <vector>
#include "pystring.h"
std::vector<std::string> chunks;
pystring::split("this string", chunks);
// also can specify a separator
pystring::split("this-string", chunks, "-");

I posted this answer for similar question.
Don't reinvent the wheel. I've used a number of libraries and the fastest and most flexible I have come across is: C++ String Toolkit Library.
Here is an example of how to use it that I've posted else where on the stackoverflow.
#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>
const char *whitespace = " \t\r\n\f";
const char *whitespace_and_punctuation = " \t\r\n\f;,=";
int main()
{
{ // normal parsing of a string into a vector of strings
std::string s("Somewhere down the road");
std::vector<std::string> result;
if( strtk::parse( s, whitespace, result ) )
{
for(size_t i = 0; i < result.size(); ++i )
std::cout << result[i] << std::endl;
}
}
{ // parsing a string into a vector of floats with other separators
// besides spaces
std::string s("3.0, 3.14; 4.0");
std::vector<float> values;
if( strtk::parse( s, whitespace_and_punctuation, values ) )
{
for(size_t i = 0; i < values.size(); ++i )
std::cout << values[i] << std::endl;
}
}
{ // parsing a string into specific variables
std::string s("angle = 45; radius = 9.9");
std::string w1, w2;
float v1, v2;
if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
{
std::cout << "word " << w1 << ", value " << v1 << std::endl;
std::cout << "word " << w2 << ", value " << v2 << std::endl;
}
}
return 0;
}

Adam Pierce's answer provides an hand-spun tokenizer taking in a const char*. It's a bit more problematic to do with iterators because incrementing a string's end iterator is undefined. That said, given string str{ "The quick brown fox" } we can certainly accomplish this:
auto start = find(cbegin(str), cend(str), ' ');
vector<string> tokens{ string(cbegin(str), start) };
while (start != cend(str)) {
const auto finish = find(++start, cend(str), ' ');
tokens.push_back(string(start, finish));
start = finish;
}
Live Example
If you're looking to abstract complexity by using standard functionality, as On Freund suggests strtok is a simple option:
vector<string> tokens;
for (auto i = strtok(data(str), " "); i != nullptr; i = strtok(nullptr, " ")) tokens.push_back(i);
If you don't have access to C++17 you'll need to substitute data(str) as in this example: http://ideone.com/8kAGoa
Though not demonstrated in the example, strtok need not use the same delimiter for each token. Along with this advantage though, there are several drawbacks:
strtok cannot be used on multiple strings at the same time: Either a nullptr must be passed to continue tokenizing the current string or a new char* to tokenize must be passed (there are some non-standard implementations which do support this however, such as: strtok_s)
For the same reason strtok cannot be used on multiple threads simultaneously (this may however be implementation defined, for example: Visual Studio's implementation is thread safe)
Calling strtok modifies the string it is operating on, so it cannot be used on const strings, const char*s, or literal strings, to tokenize any of these with strtok or to operate on a string who's contents need to be preserved, str would have to be copied, then the copy could be operated on
c++20 provides us with split_view to tokenize strings, in a non-destructive manner: https://topanswers.xyz/cplusplus?q=749#a874
The previous methods cannot generate a tokenized vector in-place, meaning without abstracting them into a helper function they cannot initialize const vector<string> tokens. That functionality and the ability to accept any white-space delimiter can be harnessed using an istream_iterator. For example given: const string str{ "The quick \tbrown \nfox" } we can do this:
istringstream is{ str };
const vector<string> tokens{ istream_iterator<string>(is), istream_iterator<string>() };
Live Example
The required construction of an istringstream for this option has far greater cost than the previous 2 options, however this cost is typically hidden in the expense of string allocation.
If none of the above options are flexable enough for your tokenization needs, the most flexible option is using a regex_token_iterator of course with this flexibility comes greater expense, but again this is likely hidden in the string allocation cost. Say for example we want to tokenize based on non-escaped commas, also eating white-space, given the following input: const string str{ "The ,qu\\,ick ,\tbrown, fox" } we can do this:
const regex re{ "\\s*((?:[^\\\\,]|\\\\.)*?)\\s*(?:,|$)" };
const vector<string> tokens{ sregex_token_iterator(cbegin(str), cend(str), re, 1), sregex_token_iterator() };
Live Example

Check this example. It might help you..
#include <iostream>
#include <sstream>
using namespace std;
int main ()
{
string tmps;
istringstream is ("the dellimiter is the space");
while (is.good ()) {
is >> tmps;
cout << tmps << "\n";
}
return 0;
}

If you're using C++ ranges - the full ranges-v3 library, not the limited functionality accepted into C++20 - you could do it this way:
auto results = str | ranges::views::tokenize(" ",1);
... and this is lazily-evaluated. You can alternatively set a vector to this range:
auto results = str | ranges::views::tokenize(" ",1) | ranges::to<std::vector>();
this will take O(m) space and O(n) time if str has n characters making up m words.
See also the library's own tokenization example, here.

MFC/ATL has a very nice tokenizer. From MSDN:
CAtlString str( "%First Second#Third" );
CAtlString resToken;
int curPos= 0;
resToken= str.Tokenize("% #",curPos);
while (resToken != "")
{
printf("Resulting token: %s\n", resToken);
resToken= str.Tokenize("% #",curPos);
};
Output
Resulting Token: First
Resulting Token: Second
Resulting Token: Third

If you're willing to use C, you can use the strtok function. You should pay attention to multi-threading issues when using it.

For simple stuff I just use the following:
unsigned TokenizeString(const std::string& i_source,
const std::string& i_seperators,
bool i_discard_empty_tokens,
std::vector<std::string>& o_tokens)
{
unsigned prev_pos = 0;
unsigned pos = 0;
unsigned number_of_tokens = 0;
o_tokens.clear();
pos = i_source.find_first_of(i_seperators, pos);
while (pos != std::string::npos)
{
std::string token = i_source.substr(prev_pos, pos - prev_pos);
if (!i_discard_empty_tokens || token != "")
{
o_tokens.push_back(i_source.substr(prev_pos, pos - prev_pos));
number_of_tokens++;
}
pos++;
prev_pos = pos;
pos = i_source.find_first_of(i_seperators, pos);
}
if (prev_pos < i_source.length())
{
o_tokens.push_back(i_source.substr(prev_pos));
number_of_tokens++;
}
return number_of_tokens;
}
Cowardly disclaimer: I write real-time data processing software where the data comes in through binary files, sockets, or some API call (I/O cards, camera's). I never use this function for something more complicated or time-critical than reading external configuration files on startup.

You can simply use a regular expression library and solve that using regular expressions.
Use expression (\w+) and the variable in \1 (or $1 depending on the library implementation of regular expressions).

Many overly complicated suggestions here. Try this simple std::string solution:
using namespace std;
string someText = ...
string::size_type tokenOff = 0, sepOff = tokenOff;
while (sepOff != string::npos)
{
sepOff = someText.find(' ', sepOff);
string::size_type tokenLen = (sepOff == string::npos) ? sepOff : sepOff++ - tokenOff;
string token = someText.substr(tokenOff, tokenLen);
if (!token.empty())
/* do something with token */;
tokenOff = sepOff;
}

I thought that was what the >> operator on string streams was for:
string word; sin >> word;

Here's an approach that allows you control over whether empty tokens are included (like strsep) or excluded (like strtok).
#include <string.h> // for strchr and strlen
/*
* want_empty_tokens==true : include empty tokens, like strsep()
* want_empty_tokens==false : exclude empty tokens, like strtok()
*/
std::vector<std::string> tokenize(const char* src,
char delim,
bool want_empty_tokens)
{
std::vector<std::string> tokens;
if (src and *src != '\0') // defensive
while( true ) {
const char* d = strchr(src, delim);
size_t len = (d)? d-src : strlen(src);
if (len or want_empty_tokens)
tokens.push_back( std::string(src, len) ); // capture token
if (d) src += len+1; else break;
}
return tokens;
}

Seems odd to me that with all us speed conscious nerds here on SO no one has presented a version that uses a compile time generated look up table for the delimiter (example implementation further down). Using a look up table and iterators should beat std::regex in efficiency, if you don't need to beat regex, just use it, its standard as of C++11 and super flexible.
Some have suggested regex already but for the noobs here is a packaged example that should do exactly what the OP expects:
std::vector<std::string> split(std::string::const_iterator it, std::string::const_iterator end, std::regex e = std::regex{"\\w+"}){
std::smatch m{};
std::vector<std::string> ret{};
while (std::regex_search (it,end,m,e)) {
ret.emplace_back(m.str());
std::advance(it, m.position() + m.length()); //next start position = match position + match length
}
return ret;
}
std::vector<std::string> split(const std::string &s, std::regex e = std::regex{"\\w+"}){ //comfort version calls flexible version
return split(s.cbegin(), s.cend(), std::move(e));
}
int main ()
{
std::string str {"Some people, excluding those present, have been compile time constants - since puberty."};
auto v = split(str);
for(const auto&s:v){
std::cout << s << std::endl;
}
std::cout << "crazy version:" << std::endl;
v = split(str, std::regex{"[^e]+"}); //using e as delim shows flexibility
for(const auto&s:v){
std::cout << s << std::endl;
}
return 0;
}
If we need to be faster and accept the constraint that all chars must be 8 bits we can make a look up table at compile time using metaprogramming:
template<bool...> struct BoolSequence{}; //just here to hold bools
template<char...> struct CharSequence{}; //just here to hold chars
template<typename T, char C> struct Contains; //generic
template<char First, char... Cs, char Match> //not first specialization
struct Contains<CharSequence<First, Cs...>,Match> :
Contains<CharSequence<Cs...>, Match>{}; //strip first and increase index
template<char First, char... Cs> //is first specialization
struct Contains<CharSequence<First, Cs...>,First>: std::true_type {};
template<char Match> //not found specialization
struct Contains<CharSequence<>,Match>: std::false_type{};
template<int I, typename T, typename U>
struct MakeSequence; //generic
template<int I, bool... Bs, typename U>
struct MakeSequence<I,BoolSequence<Bs...>, U>: //not last
MakeSequence<I-1, BoolSequence<Contains<U,I-1>::value,Bs...>, U>{};
template<bool... Bs, typename U>
struct MakeSequence<0,BoolSequence<Bs...>,U>{ //last
using Type = BoolSequence<Bs...>;
};
template<typename T> struct BoolASCIITable;
template<bool... Bs> struct BoolASCIITable<BoolSequence<Bs...>>{
/* could be made constexpr but not yet supported by MSVC */
static bool isDelim(const char c){
static const bool table[256] = {Bs...};
return table[static_cast<int>(c)];
}
};
using Delims = CharSequence<'.',',',' ',':','\n'>; //list your custom delimiters here
using Table = BoolASCIITable<typename MakeSequence<256,BoolSequence<>,Delims>::Type>;
With that in place making a getNextToken function is easy:
template<typename T_It>
std::pair<T_It,T_It> getNextToken(T_It begin,T_It end){
begin = std::find_if(begin,end,std::not1(Table{})); //find first non delim or end
auto second = std::find_if(begin,end,Table{}); //find first delim or end
return std::make_pair(begin,second);
}
Using it is also easy:
int main() {
std::string s{"Some people, excluding those present, have been compile time constants - since puberty."};
auto it = std::begin(s);
auto end = std::end(s);
while(it != std::end(s)){
auto token = getNextToken(it,end);
std::cout << std::string(token.first,token.second) << std::endl;
it = token.second;
}
return 0;
}
Here is a live example: http://ideone.com/GKtkLQ

I know this question is already answered but I want to contribute. Maybe my solution is a bit simple but this is what I came up with:
vector<string> get_words(string const& text, string const& separator)
{
vector<string> result;
string tmp = text;
size_t first_pos = 0;
size_t second_pos = tmp.find(separator);
while (second_pos != string::npos)
{
if (first_pos != second_pos)
{
string word = tmp.substr(first_pos, second_pos - first_pos);
result.push_back(word);
}
tmp = tmp.substr(second_pos + separator.length());
second_pos = tmp.find(separator);
}
result.push_back(tmp);
return result;
}
Please comment if there is a better approach to something in my code or if something is wrong.
UPDATE: added generic separator

you can take advantage of boost::make_find_iterator. Something similar to this:
template<typename CH>
inline vector< basic_string<CH> > tokenize(
const basic_string<CH> &Input,
const basic_string<CH> &Delimiter,
bool remove_empty_token
) {
typedef typename basic_string<CH>::const_iterator string_iterator_t;
typedef boost::find_iterator< string_iterator_t > string_find_iterator_t;
vector< basic_string<CH> > Result;
string_iterator_t it = Input.begin();
string_iterator_t it_end = Input.end();
for(string_find_iterator_t i = boost::make_find_iterator(Input, boost::first_finder(Delimiter, boost::is_equal()));
i != string_find_iterator_t();
++i) {
if(remove_empty_token){
if(it != i->begin())
Result.push_back(basic_string<CH>(it,i->begin()));
}
else
Result.push_back(basic_string<CH>(it,i->begin()));
it = i->end();
}
if(it != it_end)
Result.push_back(basic_string<CH>(it,it_end));
return Result;
}

Here's my Swiss® Army Knife of string-tokenizers for splitting up strings by whitespace, accounting for single and double-quote wrapped strings as well as stripping those characters from the results. I used RegexBuddy 4.x to generate most of the code-snippet, but I added custom handling for stripping quotes and a few other things.
#include <string>
#include <locale>
#include <regex>
std::vector<std::wstring> tokenize_string(std::wstring string_to_tokenize) {
std::vector<std::wstring> tokens;
std::wregex re(LR"(("[^"]*"|'[^']*'|[^"' ]+))", std::regex_constants::collate);
std::wsregex_iterator next( string_to_tokenize.begin(),
string_to_tokenize.end(),
re,
std::regex_constants::match_not_null );
std::wsregex_iterator end;
const wchar_t single_quote = L'\'';
const wchar_t double_quote = L'\"';
while ( next != end ) {
std::wsmatch match = *next;
const std::wstring token = match.str( 0 );
next++;
if (token.length() > 2 && (token.front() == double_quote || token.front() == single_quote))
tokens.emplace_back( std::wstring(token.begin()+1, token.begin()+token.length()-1) );
else
tokens.emplace_back(token);
}
return tokens;
}

I wrote a simplified version (and maybe a little bit efficient) of https://stackoverflow.com/a/50247503/3976739 for my own use. I hope it would help.
void StrTokenizer(string& source, const char* delimiter, vector<string>& Tokens)
{
size_t new_index = 0;
size_t old_index = 0;
while (new_index != std::string::npos)
{
new_index = source.find(delimiter, old_index);
Tokens.emplace_back(source.substr(old_index, new_index-old_index));
if (new_index != std::string::npos)
old_index = ++new_index;
}
}

If the maximum length of the input string to be tokenized is known, one can exploit this and implement a very fast version. I am sketching the basic idea below, which was inspired by both strtok() and the "suffix array"-data structure described Jon Bentley's "Programming Perls" 2nd edition, chapter 15. The C++ class in this case only gives some organization and convenience of use. The implementation shown can be easily extended for removing leading and trailing whitespace characters in the tokens.
Basically one can replace the separator characters with string-terminating '\0'-characters and set pointers to the tokens withing the modified string. In the extreme case when the string consists only of separators, one gets string-length plus 1 resulting empty tokens. It is practical to duplicate the string to be modified.
Header file:
class TextLineSplitter
{
public:
TextLineSplitter( const size_t max_line_len );
~TextLineSplitter();
void SplitLine( const char *line,
const char sep_char = ',',
);
inline size_t NumTokens( void ) const
{
return mNumTokens;
}
const char * GetToken( const size_t token_idx ) const
{
assert( token_idx < mNumTokens );
return mTokens[ token_idx ];
}
private:
const size_t mStorageSize;
char *mBuff;
char **mTokens;
size_t mNumTokens;
inline void ResetContent( void )
{
memset( mBuff, 0, mStorageSize );
// mark all items as empty:
memset( mTokens, 0, mStorageSize * sizeof( char* ) );
// reset counter for found items:
mNumTokens = 0L;
}
};
Implementattion file:
TextLineSplitter::TextLineSplitter( const size_t max_line_len ):
mStorageSize ( max_line_len + 1L )
{
// allocate memory
mBuff = new char [ mStorageSize ];
mTokens = new char* [ mStorageSize ];
ResetContent();
}
TextLineSplitter::~TextLineSplitter()
{
delete [] mBuff;
delete [] mTokens;
}
void TextLineSplitter::SplitLine( const char *line,
const char sep_char /* = ',' */,
)
{
assert( sep_char != '\0' );
ResetContent();
strncpy( mBuff, line, mMaxLineLen );
size_t idx = 0L; // running index for characters
do
{
assert( idx < mStorageSize );
const char chr = line[ idx ]; // retrieve current character
if( mTokens[ mNumTokens ] == NULL )
{
mTokens[ mNumTokens ] = &mBuff[ idx ];
} // if
if( chr == sep_char || chr == '\0' )
{ // item or line finished
// overwrite separator with a 0-terminating character:
mBuff[ idx ] = '\0';
// count-up items:
mNumTokens ++;
} // if
} while( line[ idx++ ] );
}
A scenario of usage would be:
// create an instance capable of splitting strings up to 1000 chars long:
TextLineSplitter spl( 1000 );
spl.SplitLine( "Item1,,Item2,Item3" );
for( size_t i = 0; i < spl.NumTokens(); i++ )
{
printf( "%s\n", spl.GetToken( i ) );
}
output:
Item1
Item2
Item3

boost::spirit::x3 parsing is slower than strsep parsing

I wrote a x3 parser to parse a structured text file, here is the demo code:
int main() {
char buf[10240];
type_t example; // def see below
FILE* fp = fopen("text", "r");
while (fgets(buf, 10240, fp)) // read to the buffer
{
int n = strlen(buf);
example.clear();
if (client::parse_numbers(buf, buf+n, example)) // def see below
{ // do nothing here, only parse the buf and fill into the example }
}
}
struct type_t {
int id;
std::vector<int> fads;
std::vector<int> fbds;
std::vector<float> fvalues;
float target;
void clear() {
fads.clear();
fbds.clear();
fvalues.clear();
}
};
template <typename Iterator>
bool parse_numbers(Iterator first, Iterator last, type_t& example)
{
using x3::int_;
using x3::double_;
using x3::phrase_parse;
using x3::parse;
using x3::_attr;
using ascii::space;
auto fn_id = [&](auto& ctx) { example.id = _attr(ctx); };
auto fn_fad = [&](auto& ctx) { example.fads.push_back(_attr(ctx)); };
auto fn_fbd = [&](auto& ctx) { example.fbds.push_back(_attr(ctx)); };
auto fn_value = [&](auto& ctx) { example.fvalues.push_back(_attr(ctx)); };
auto fn_target = [&](auto& ctx) { example.target = _attr(ctx); };
bool r = phrase_parse(first, last,
// Begin grammar
(
int_[fn_id] >>
double_[fn_target] >>
+(int_[fn_fad] >> ':' >> int_[fn_fbd] >> ':' >> double_[fn_value])
)
,
// End grammar
space);
if (first != last) // fail if we did not get a full match
return false;
return r;
}
//]
}
Am I doing it the right way or how to improve? I'd like to see if any optimization could be done before I switch back to my strsep parsing implementation, since it's much faster than this x3 version.

Why do you use semantic actions for this? An interesting point to read about is sehe's article Boost Spirit: “Semantic actions are evil”? and other notes about.
Parsing into an AST structure as shown by the X3 examples, e.g. Employee - Parsing into structs is IMO much more natural. You need the visitor pattern to evaluate the data later on.
One solution is shown here:
#include <iostream>
#include <sstream>
#include <fstream>
#include <vector>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>
namespace ast {
struct triple {
double fad;
double fbd;
double value;
};
struct data {
int id;
double target;
std::vector<ast::triple> triple;
};
}
BOOST_FUSION_ADAPT_STRUCT(ast::triple, fad, fbd, value)
BOOST_FUSION_ADAPT_STRUCT(ast::data, id, target, triple)
namespace x3 = boost::spirit::x3;
namespace parser {
using x3::int_; using x3::double_;
auto const triple = x3::rule<struct _, ast::triple>{ "triple" } =
int_ >> ':' >> int_ >> ':' >> double_;
auto const data = x3::rule<struct _, ast::data>{ "data" } =
int_ >> double_ >> +triple;
}
int main()
{
std::stringstream buffer;
std::ifstream file{ R"(C:\data.txt)" };
if(file.is_open()) {
buffer << file.rdbuf();
file.close();
}
auto iter = std::begin(buffer.str());
auto const end = std::cend(buffer.str());
ast::data data;
bool parse_ok = x3::phrase_parse(iter, end, parser::data, x3::space, data);
if(parse_ok && (iter == end)) return true;
return false;
}
It does compile (see Wandbox), but isn't tested due to missing input data (which you can generate by you own inside the main() of course), but you are interested in benchmarking only.
Also note the use of stringstream to read the rdbuf. The are several ways to skin the cat, I refer here to How to read in a file in C++ where the rdbufreading approach is fast.
Further, how did you benchmark? Simply measure the time required by x3::phrase_parse() resp. strsep part only or the hole binary? file loading time inclusive? It must be compareable! Also consider OS filesystem caching etc.
BTW, it would be interesting to see the results and the test environment (data file size, strsep implementation etc).
Addendum:
If you approximately know how much data you can expect, you can pre-allocate memory for the vector using data.triple.reserve(10240); (or write an own constructor with this as arg). This prevents re-allocating during parsing (don't forget to enclose this into try/catch block to capture std::bad_alloc etc.). IIR the default capacity is 1000 on older gcc.

How to generate 'consecutive' c++ strings?

I would like to generate consecutive C++ strings like e.g. in cameras: IMG001, IMG002 etc. being able to indicate the prefix and the string length.
I have found a solution where I can generate random strings from concrete character set: link
But I cannot find the thing I want to achieve.

A possible solution:
#include <iostream>
#include <string>
#include <sstream>
#include <iomanip>
std::string make_string(const std::string& a_prefix,
size_t a_suffix,
size_t a_max_length)
{
std::ostringstream result;
result << a_prefix <<
std::setfill('0') <<
std::setw(a_max_length - a_prefix.length()) <<
a_suffix;
return result.str();
}
int main()
{
for (size_t i = 0; i < 100; i++)
{
std::cout << make_string("IMG", i, 6) << "\n";
}
return 0;
}
See online demo at http://ideone.com/HZWmtI.

Something like this would work
#include <string>
#include <iomanip>
#include <sstream>
std::string GetNextNumber( int &lastNum )
{
std::stringstream ss;
ss << "IMG";
ss << std::setfill('0') << std::setw(3) << lastNum++;
return ss.str();
}
int main()
{
int x = 1;
std::string s = GetNextNumber( x );
s = GetNextNumber( x );
return 0;
}
You can call GetNextNumber repeatedly with an int reference to generate new image numbers. You can always use sprintf but it won't be the c++ way :)

const int max_size = 7 + 1; // maximum size of the name plus one
char buf[max_size];
for (int i = 0 ; i < 1000; ++i) {
sprintf(buf, "IMG%.04d", i);
printf("The next name is %s\n", buf);
}

char * seq_gen(char * prefix) {
static int counter;
char * result;
sprintf(result, "%s%03d", prefix, counter++);
return result;
}
This would print your prefix with 3 digit padding string. If you want a lengthy string, all you have to do is provide the prefix as much as needed and change the %03d in the above code to whatever length of digit padding you want.

Well, the idea is rather simple. Just store the current number and increment it each time new string is generated. You can implement it to model an iterator to reduce the fluff in using it (you can then use standard algorithms with it). Using Boost.Iterator (it should work with any string type, too):
#include <boost/iterator/iterator_facade.hpp>
#include <sstream>
#include <iomanip>
// can't come up with a better name
template <typename StringT, typename OrdT>
struct ordinal_id_generator : boost::iterator_facade<
ordinal_id_generator<StringT, OrdT>, StringT,
boost::forward_traversal_tag, StringT
> {
ordinal_id_generator(
const StringT& prefix = StringT(),
typename StringT::size_type suffix_length = 5, OrdT initial = 0
) : prefix(prefix), suffix_length(suffix_length), ordinal(initial)
{}
private:
StringT prefix;
typename StringT::size_type suffix_length;
OrdT ordinal;
friend class boost::iterator_core_access;
void increment() {
++ordinal;
}
bool equal(const ordinal_id_generator& other) const {
return (
ordinal == other.ordinal
&& prefix == other.prefix
&& suffix_length == other.suffix_length
);
}
StringT dereference() const {
std::basic_ostringstream<typename StringT::value_type> ss;
ss << prefix << std::setfill('0')
<< std::setw(suffix_length) << ordinal;
return ss.str();
}
};
And example code:
#include <string>
#include <iostream>
#include <iterator>
#include <algorithm>
typedef ordinal_id_generator<std::string, unsigned> generator;
int main() {
std::ostream_iterator<std::string> out(std::cout, "\n");
std::copy_n(generator("IMG"), 5, out);
// can even behave as a range
std::copy(generator("foo", 1, 2), generator("foo", 1, 4), out);
return 0;
}

Take a look at the standard library's string streams. Have an integer that you increment, and insert into the string stream after every increment. To control the string length, there's the concept of fill characters, and the width() member function.

You have many ways of doing that.
The generic one would be to, like the link that you showed, have an array of possible characters. Then after each iteration, you start from right-most character, increment it (that is, change it to the next one in the possible characters list) and if it overflowed, set it to the first one (index 0) and go the one on the left. This is exactly like incrementing a number in base, say 62.
In your specific example, you are better off with creating the string from another string and a number.
If you like *printf, you can write a string with "IMG%04d" and have the parameter go from 0 to whatever.
If you like stringstream, you can similarly do so.

What exactly do you mean by consecutive strings ?
Since you've mentioned that you're using C++ strings, try using the .string::append method.
string str, str2;
str.append("A");
str.append(str2);
Lookup http://www.cplusplus.com/reference/string/string/append/ for more overloaded calls of the append function.

it's pseudo code. you'll understand what i mean :D
int counter = 0, retval;
do
{
char filename[MAX_PATH];
sprintf(filename, "IMG00%d", counter++);
if(retval = CreateFile(...))
//ok, return
}while(!retval);

You have to keep a counter that is increased everytime you get a new name. This counter has to be saved when your application is ends, and loaded when you application starts.
Could be something like this:
class NameGenerator
{
public:
NameGenerator()
: m_counter(0)
{
// Code to load the counter from a file
}
~NameGenerator()
{
// Code to save the counter to a file
}
std::string get_next_name()
{
// Combine your preferred prefix with your counter
// Increase the counter
// Return the string
}
private:
int m_counter;
}
NameGenerator my_name_generator;
Then use it like this:
std::string my_name = my_name_generator.get_next_name();

How to implode a vector of strings into a string (the elegant way)

I'm looking for the most elegant way to implode a vector of strings into a string. Below is the solution I'm using now:
static std::string& implode(const std::vector<std::string>& elems, char delim, std::string& s)
{
for (std::vector<std::string>::const_iterator ii = elems.begin(); ii != elems.end(); ++ii)
{
s += (*ii);
if ( ii + 1 != elems.end() ) {
s += delim;
}
}
return s;
}
static std::string implode(const std::vector<std::string>& elems, char delim)
{
std::string s;
return implode(elems, delim, s);
}
Is there any others out there?

Use boost::algorithm::join(..):
#include <boost/algorithm/string/join.hpp>
...
std::string joinedString = boost::algorithm::join(elems, delim);
See also this question.

std::vector<std::string> strings;
const char* const delim = ", ";
std::ostringstream imploded;
std::copy(strings.begin(), strings.end(),
std::ostream_iterator<std::string>(imploded, delim));
(include <string>, <vector>, <sstream> and <iterator>)
If you want to have a clean end (no trailing delimiter) have a look here

You should use std::ostringstream rather than std::string to build the output (then you can call its str() method at the end to get a string, so your interface need not change, only the temporary s).
From there, you could change to using std::ostream_iterator, like so:
copy(elems.begin(), elems.end(), ostream_iterator<string>(s, delim));
But this has two problems:
delim now needs to be a const char*, rather than a single char. No big deal.
std::ostream_iterator writes the delimiter after every single element, including the last. So you'd either need to erase the last one at the end, or write your own version of the iterator which doesn't have this annoyance. It'd be worth doing the latter if you have a lot of code that needs things like this; otherwise the whole mess might be best avoided (i.e. use ostringstream but not ostream_iterator).

Because I love one-liners (they are very useful for all kinds of weird stuff, as you'll see at the end), here's a solution using std::accumulate and C++11 lambda:
std::accumulate(alist.begin(), alist.end(), std::string(),
[](const std::string& a, const std::string& b) -> std::string {
return a + (a.length() > 0 ? "," : "") + b;
} )
I find this syntax useful with stream operator, where I don't want to have all kinds of weird logic out of scope from the stream operation, just to do a simple string join. Consider for example this return statement from method that formats a string using stream operators (using std;):
return (dynamic_cast<ostringstream&>(ostringstream()
<< "List content: " << endl
<< std::accumulate(alist.begin(), alist.end(), std::string(),
[](const std::string& a, const std::string& b) -> std::string {
return a + (a.length() > 0 ? "," : "") + b;
} ) << endl
<< "Maybe some more stuff" << endl
)).str();
Update:
As pointed out by #plexando in the comments, the above code suffers from misbehavior when the array starts with empty strings due to the fact that the check for "first run" is missing previous runs that have resulted in no additional characters, and also - it is weird to run a check for "is first run" on all runs (i.e. the code is under-optimized).
The solution for both of these problems is easy if we know for a fact that the list has at least one element. OTOH, if we know for a fact that the list does not have at least one element, then we can shorten the run even more.
I think the resulting code isn't as pretty, so I'm adding it here as The Correct Solution, but I think the discussion above still has merrit:
alist.empty() ? "" : /* leave early if there are no items in the list */
std::accumulate( /* otherwise, accumulate */
++alist.begin(), alist.end(), /* the range 2nd to after-last */
*alist.begin(), /* and start accumulating with the first item */
[](auto& a, auto& b) { return a + "," + b; });
Notes:
For containers that support direct access to the first element, its probably better to use that for the third argument instead, so alist[0] for vectors.
As per the discussion in the comments and chat, the lambda still does some copying. This can be minimized by using this (less pretty) lambda instead: [](auto&& a, auto&& b) -> auto& { a += ','; a += b; return a; }) which (on GCC 10) improves performance by more than x10. Thanks to #Deduplicator for the suggestion. I'm still trying to figure out what is going on here.

I like to use this one-liner accumulate (no trailing delimiter):
(std::accumulate defined in <numeric>)
std::accumulate(
std::next(elems.begin()),
elems.end(),
elems[0],
[](std::string a, std::string b) {
return a + delimiter + b;
}
);

what about simple stupid solution?
std::string String::join(const std::vector<std::string> &lst, const std::string &delim)
{
std::string ret;
for(const auto &s : lst) {
if(!ret.empty())
ret += delim;
ret += s;
}
return ret;
}

With fmt you can do.
#include <fmt/format.h>
auto s = fmt::format("{}",fmt::join(elems,delim));
But I don't know if join will make it to std::format.

string join(const vector<string>& vec, const char* delim)
{
stringstream res;
copy(vec.begin(), vec.end(), ostream_iterator<string>(res, delim));
return res.str();
}

Especially with bigger collections, you want to avoid having to check if youre still adding the first element or not to ensure no trailing separator...
So for the empty or single-element list, there is no iteration at all.
Empty ranges are trivial: return "".
Single element or multi-element can be handled perfectly by accumulate:
auto join = [](const auto &&range, const auto separator) {
if (range.empty()) return std::string();
return std::accumulate(
next(begin(range)), // there is at least 1 element, so OK.
end(range),
range[0], // the initial value
[&separator](auto result, const auto &value) {
return result + separator + value;
});
};
Running sample (require C++14): http://cpp.sh/8uspd

A version that uses std::accumulate:
#include <numeric>
#include <iostream>
#include <string>
struct infix {
std::string sep;
infix(const std::string& sep) : sep(sep) {}
std::string operator()(const std::string& lhs, const std::string& rhs) {
std::string rz(lhs);
if(!lhs.empty() && !rhs.empty())
rz += sep;
rz += rhs;
return rz;
}
};
int main() {
std::string a[] = { "Hello", "World", "is", "a", "program" };
std::string sum = std::accumulate(a, a+5, std::string(), infix(", "));
std::cout << sum << "\n";
}

While I would normally recommend using Boost as per the top answer, I recognise that in some projects that's not desired.
The STL solutions suggested using std::ostream_iterator will not work as intended - it'll append a delimiter at the end.
There is now a way to do this with modern C++ using std::experimental::ostream_joiner:
std::ostringstream outstream;
std::copy(strings.begin(),
strings.end(),
std::experimental::make_ostream_joiner(outstream, delimiter.c_str()));
return outstream.str();

Here's what I use, simple and flexible
string joinList(vector<string> arr, string delimiter)
{
if (arr.empty()) return "";
string str;
for (auto i : arr)
str += i + delimiter;
str = str.substr(0, str.size() - delimiter.size());
return str;
}
using:
string a = joinList({ "a", "bbb", "c" }, "!##");
output:
a!##bbb!##c

Here is another one that doesn't add the delimiter after the last element:
std::string concat_strings(const std::vector<std::string> &elements,
const std::string &separator)
{
if (!elements.empty())
{
std::stringstream ss;
auto it = elements.cbegin();
while (true)
{
ss << *it++;
if (it != elements.cend())
ss << separator;
else
return ss.str();
}
}
return "";

Using part of this answer to another question gives you a joined this, based on a separator without a trailing comma,
Usage:
std::vector<std::string> input_str = std::vector<std::string>({"a", "b", "c"});
std::string result = string_join(input_str, ",");
printf("%s", result.c_str());
/// a,b,c
Code:
std::string string_join(const std::vector<std::string>& elements, const char* const separator)
{
switch (elements.size())
{
case 0:
return "";
case 1:
return elements[0];
default:
std::ostringstream os;
std::copy(elements.begin(), elements.end() - 1, std::ostream_iterator<std::string>(os, separator));
os << *elements.rbegin();
return os.str();
}
}

Another simple and good solution is using ranges v3. The current version is C++14 or greater, but there are older versions that are C++11 or greater. Unfortunately, C++20 ranges don't have the intersperse function.
The benefits of this approach are:
Elegant
Easily handle empty strings
Handles the last element of the list
Efficiency. Because ranges are lazily evaluated.
Small and useful library
Functions breakdown(Reference):
accumulate = Similar to std::accumulate but arguments are a range and the initial value. There is an optional third argument that is the operator function.
filter = Like std::filter, filter the elements that don't fit the predicate.
intersperse = The key function! Intersperses a delimiter between range input elements.
#include <iostream>
#include <string>
#include <vector>
#include <range/v3/numeric/accumulate.hpp>
#include <range/v3/view/filter.hpp>
#include <range/v3/view/intersperse.hpp>
int main()
{
using namespace ranges;
// Can be any std container
std::vector<std::string> a{ "Hello", "", "World", "is", "", "a", "program" };
std::string delimiter{", "};
std::string finalString =
accumulate(a | views::filter([](std::string s){return !s.empty();})
| views::intersperse(delimiter)
, std::string());
std::cout << finalString << std::endl; // Hello, World, is, a, program
}

A possible solution with ternary operator ?:.
std::string join(const std::vector<std::string> & v, const std::string & delimiter = ", ") {
std::string result;
for (size_t i = 0; i < v.size(); ++i) {
result += (i ? delimiter : "") + v[i];
}
return result;
}
join({"2", "4", "5"}) will give you 2, 4, 5.

If you are already using a C++ base library (for commonly used tools), string-processing features are typically included. Besides Boost mentioned above, Abseil provides:
std::vector<std::string> names {"Linus", "Dennis", "Ken"};
std::cout << absl::StrJoin(names, ", ") << std::endl;
Folly provides:
std::vector<std::string> names {"Linus", "Dennis", "Ken"};
std::cout << folly::join(", ", names) << std::endl;
Both give the string "Linus, Dennis, Ken".

Slightly long solution, but doesn't use std::ostringstream, and doesn't require a hack to remove the last delimiter.
http://www.ideone.com/hW1M9
And the code:
struct appender
{
appender(char d, std::string& sd, int ic) : delim(d), dest(sd), count(ic)
{
dest.reserve(2048);
}
void operator()(std::string const& copy)
{
dest.append(copy);
if (--count)
dest.append(1, delim);
}
char delim;
mutable std::string& dest;
mutable int count;
};
void implode(const std::vector<std::string>& elems, char delim, std::string& s)
{
std::for_each(elems.begin(), elems.end(), appender(delim, s, elems.size()));
}

This can be solved using boost
#include <boost/range/adaptor/filtered.hpp>
#include <boost/algorithm/string/join.hpp>
#include <boost/algorithm/algorithm.hpp>
std::vector<std::string> win {"Stack", "", "Overflow"};
const std::string Delimitor{","};
const std::string combined_string =
boost::algorithm::join(win |
boost::adaptors::filtered([](const auto &x) {
return x.size() != 0;
}), Delimitor);
Output:
combined_string: "Stack,Overflow"

I'm using the following approach that works fine in C++17. The function starts checking if the given vector is empty, in which case returns an empty string. If that's not the case, it takes the first element from the vector, then iterates from the second one until the end and appends the separator followed by the vector element.
template <typename T>
std::basic_string<T> Join(std::vector<std::basic_string<T>> vValues,
std::basic_string<T> strDelim)
{
std::basic_string<T> strRet;
typename std::vector<std::basic_string<T>>::iterator it(vValues.begin());
if (it != vValues.end()) // The vector is not empty
{
strRet = *it;
while (++it != vValues.end()) strRet += strDelim + *it;
}
return strRet;
}
Usage example:
std::vector<std::string> v1;
std::vector<std::string> v2 { "Hello" };
std::vector<std::string> v3 { "Str1", "Str2" };
std::cout << "(1): " << Join<char>(v1, ",") << std::endl;
std::cout << "(2): " << Join<char>(v2, "; ") << std::endl;
std::cout << "(3): [" << Join<char>(v3, "] [") << "]" << std::endl;
Output:
(1):
(2): Hello
(3): [Str1] [Str2]

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Parse a C-string of floating numbers - c++

Related

Reverse string and Palindrome

How can you split string in C++ and store them in variables? [duplicate]

boost::spirit::x3 parsing is slower than strsep parsing

How to generate 'consecutive' c++ strings?

How to implode a vector of strings into a string (the elegant way)

Categories

Resources