Splitting a vector into values - c++

Say I have a vector of values from a tokenizing function, tokenize(). I know it will only have two values. I want to store the first value in a and the second in b. In Python, I would do:
a, b = string.split(' ')
I could do it as such in an ugly way:
vector<string> tokens = tokenize(string);
string a = tokens[0];
string b = tokens[1];
But that requires two extra lines of code and an extra variable, and it is less readable.
How would I do such a thing in C++ in a clean and efficient way?
EDIT: I must emphasize that efficiency is very important. Too many answers don't satisfy this. This includes modifying my tokenization function.
EDIT 2: I am using C++11 for reasons outside of my control and I also cannot use Boost.

With structured bindings (available since C++17), you can write something like:
auto [a,b] = as_tuple<2>(tokenize(str));
where as_tuple<N> is some to-be-declared function that converts a vector<string> to a tuple<string, string, ... N times ...>, probably throwing if the sizes don't match. You can't destructure a std::vector since its size isn't known at compile time. This will necessarily do extra moves of the strings, so you're losing some efficiency in order to gain some code clarity. Maybe that's ok.
Or maybe you write a tokenize<N> that returns a tuple<string, string, ... N times ...> directly, avoiding the extra move. In that case:
auto [a, b] = tokenize<2>(str);
is great.
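For illustration, the as_tuple<N> helper described above could be sketched as follows. This is only a sketch under assumptions: the names as_tuple and detail::as_tuple_impl are hypothetical, the tokens are std::string, and it needs C++14 (std::index_sequence and deduced return types), so it would not build as C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

namespace detail {
// Hypothetical helper: expands v[0], v[1], ..., v[N-1] into a tuple,
// moving each string out of the vector.
template <std::size_t... I>
auto as_tuple_impl(std::vector<std::string>& v, std::index_sequence<I...>) {
    assert(v.size() == sizeof...(I));  // checked by the caller below
    return std::make_tuple(std::move(v[I])...);
}
}  // namespace detail

// Converts a vector of exactly N strings into a tuple of N strings.
// Throws std::length_error if the vector has a different size.
template <std::size_t N>
auto as_tuple(std::vector<std::string> v) {
    if (v.size() != N) {
        throw std::length_error("as_tuple: unexpected number of tokens");
    }
    return detail::as_tuple_impl(v, std::make_index_sequence<N>{});
}
```

With C++17 this allows auto [a, b] = as_tuple<2>(tokenize(str)); in C++14 the result can be unpacked with std::tie(a, b) = as_tuple<2>(tokenize(str)).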
Before C++17, what you have is what you can do. But just make your variables references:
std::vector<std::string> tokens = tokenize(str);
std::string& a = tokens[0];
std::string& b = tokens[1];
Yeah, it's a couple extra lines of code. That's not the end of the world. It's easy to understand.

If you "know it will only have two values", you could write something like:
#include <cassert>
#include <iostream>
#include <string>
#include <tuple>
std::pair<std::string, std::string> tokenize(const std::string &text)
{
const auto pos(text.find(' '));
assert(pos != std::string::npos);
return {text.substr(0, pos), text.substr(pos + 1)};
}
Your code is a great example of the power of the STL, but it's probably a bit slower.
int main()
{
std::string a, b;
std::tie(a, b) = tokenize("first second");
std::cout << a << " " << b << '\n';
}
Unfortunately without structured bindings (C++17) you have to use the std::tie hack and the variables a and b have to exist.

Ideally you'd rewrite the tokenize() function so that it returns a pair of strings rather than a vector:
std::pair<std::string, std::string> tokenize(const std::string& str);
Or you would pass two references to empty strings to the function as parameters.
void tokenize(const std::string& str, std::string& result_1, std::string& result_2);
If you have no control over the tokenize function the best you can do is move the strings out of the vector in an optimal way.
std::vector<std::string> tokens = tokenize(str);
std::string a = std::move(tokens.front());
std::string b = std::move(tokens.back());
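The out-parameter variant mentioned above might be implemented along these lines. It is a minimal sketch, assuming a single space separates exactly two tokens; the name tokenize_two and the bool return value are illustrative choices, not part of the original signature. It compiles as C++11 and uses no Boost.

```cpp
#include <cassert>
#include <string>

// Splits str at the first space into first and second.
// Returns false (leaving the outputs untouched) if there is no space.
// Precondition: the outputs must not alias the input string.
bool tokenize_two(const std::string& str, std::string& first, std::string& second) {
    assert(&str != &first && &str != &second);
    const std::string::size_type pos = str.find(' ');
    if (pos == std::string::npos) {
        return false;
    }
    // assign() writes into the existing strings, so their capacity can be reused.
    first.assign(str, 0, pos);
    second.assign(str, pos + 1, std::string::npos);
    return true;
}
```

Because the results are assigned into existing strings, calling this in a loop with the same a and b can avoid repeated allocations, which may matter if efficiency is the main concern.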

Related

How to efficiently get a `string_view` for a substring of `std::string`

Using http://en.cppreference.com/w/cpp/string/basic_string_view as a reference, I see no way to do this more elegantly:
std::string s = "hello world!";
std::string_view v = s;
v = v.substr(6, 5); // "world"
Worse, the naive approach is a pitfall and leaves v a dangling reference to a temporary:
std::string s = "hello world!";
std::string_view v(s.substr(6, 5)); // OOPS!
I seem to remember something like there might be an addition to the standard library to return a substring as a view:
auto v(s.substr_view(6, 5));
I can think of the following workarounds:
std::string_view(s).substr(6, 5);
std::string_view(s.data()+6, 5);
// or even "worse" (remove_prefix and remove_suffix return void, so they cannot be chained):
std::string_view w(s); w.remove_prefix(6); w.remove_suffix(1);
Frankly, I don't think any of these are very nice. Right now the best thing I can think of is using aliases to simply make things less verbose.
using sv = std::string_view;
sv(s).substr(6, 5);
There's the free-function route, but unless you also provide overloads for std::string it's a snake-pit.
#include <string>
#include <string_view>
std::string_view sub_string(
std::string_view s,
std::size_t p,
std::size_t n = std::string_view::npos)
{
return s.substr(p, n);
}
int main()
{
using namespace std::literals;
auto source = "foobar"s;
// this is fine and elegant...
auto bar = sub_string(source, 3);
// but uh-oh...
bar = sub_string("foobar"s, 3);
}
IMHO the whole design of string_view is a horror show which will take us back to a world of segfaults and angry customers.
update:
Even adding overloads for std::string is a horror show. See if you can spot the subtle segfault timebomb...
#include <string>
#include <string_view>
std::string_view sub_string(std::string_view s,
std::size_t p,
std::size_t n = std::string_view::npos)
{
return s.substr(p, n);
}
std::string sub_string(std::string&& s,
std::size_t p,
std::size_t n = std::string::npos)
{
return s.substr(p, n);
}
std::string sub_string(std::string const& s,
std::size_t p,
std::size_t n = std::string::npos)
{
return s.substr(p, n);
}
int main()
{
using namespace std::literals;
auto source = "foobar"s;
auto bar = sub_string(std::string_view(source), 3);
// but uh-oh...
bar = sub_string("foobar"s, 3);
}
The compiler found nothing to warn about here. I am certain that a code review would not either.
I've said it before and I'll say it again, in case anyone on the c++ committee is watching, allowing implicit conversions from std::string to std::string_view is a terrible error which will only serve to bring c++ into disrepute.
Update
Having raised this (to me) rather alarming property of string_view on the cpporg message board, my concerns have been met with indifference.
The consensus of advice from this group is that std::string_view must never be returned from a function, which means that my first offering above is bad form.
There is of course no compiler help to catch times when this happens by accident (for example through template expansion).
As a result, std::string_view should be used with the utmost care, because from a memory management point of view it is equivalent to a copyable pointer pointing into the state of another object, which may no longer exist. However, it looks and behaves in all other respects like a value type.
Thus code like this:
auto s = get_something().get_suffix();
Is safe when get_suffix() returns a std::string (either by value or reference)
but is UB if get_suffix() is ever refactored to return a std::string_view.
Which in my humble view means that any user code that stores returned strings using auto will break if the libraries they are calling are ever refactored to return std::string_view in place of std::string const&.
So from now on, at least for me, "almost always auto" will have to become, "almost always auto, except when it's strings".
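One partial mitigation for the sub_string problem shown earlier, sketched here under the hypothetical name sub_view, is to delete the overload that takes an rvalue std::string, so that passing a temporary fails to compile instead of returning a dangling view. This is only a sketch (C++17, narrow char strings) and it does not help with views obtained in other ways.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Base case: a view of a view is as safe as the original view was.
inline std::string_view sub_view(std::string_view s, std::size_t p,
                                 std::size_t n = std::string_view::npos) {
    assert(p <= s.size());  // substr would throw std::out_of_range otherwise
    return s.substr(p, n);
}

// Named (lvalue) strings: the caller keeps the string alive.
inline std::string_view sub_view(const std::string& s, std::size_t p,
                                 std::size_t n = std::string_view::npos) {
    return sub_view(std::string_view(s), p, n);
}

// String literals and C strings: avoids an ambiguity between the two overloads above.
inline std::string_view sub_view(const char* s, std::size_t p,
                                 std::size_t n = std::string_view::npos) {
    return sub_view(std::string_view(s), p, n);
}

// Temporaries: refuse to compile, since the result would dangle immediately.
std::string_view sub_view(std::string&&, std::size_t,
                          std::size_t = std::string_view::npos) = delete;
```

With these overloads, sub_view("foobar"s, 3) is rejected at compile time, while sub_view(source, 3) on a named string still works. The const char* overload is there because a string literal would otherwise be ambiguous between the std::string_view and std::string overloads.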
You can use the conversion operator from std::string to std::string_view:
std::string s = "hello world!";
std::string_view v = std::string_view(s).substr(6, 5);
This is how you can efficiently create a sub-string string_view.
#include <algorithm>
#include <iostream>
#include <limits>
#include <string>
#include <string_view>
inline std::string_view substr_view(const std::string& source, size_t offset = 0,
std::string_view::size_type count =
std::numeric_limits<std::string_view::size_type>::max()) {
if (offset < source.size())
return std::string_view(source.data() + offset,
std::min(source.size() - offset, count));
return {};
}
int main(void) {
using namespace std::literals; // needed for the "..."s suffix below
std::cout << substr_view("abcd",3,11) << "\n";
std::string s {"0123456789"};
std::cout << substr_view(s,3,2) << "\n";
// be cautious about lifetime, as illustrated at https://en.cppreference.com/w/cpp/string/basic_string_view
std::string_view bad = substr_view("0123456789"s, 3, 2); // "bad" holds a dangling pointer
std::cout << bad << "\n"; // possible access violation
return 0;
}
I realize that the question is about C++17, but it's worth noting that C++20 introduced a string_view constructor that accepts two iterators to char (or whatever the base type is) which allows writing
std::string_view v{ s.begin() +6, s.begin()+6 +5 };
Not sure if there is a cleaner syntax, but it's not difficult to
#define RANGE(_container,_start,_length) (_container).begin() + (_start), (_container).begin() + (_start) + (_length)
for a final
std::string_view v{ RANGE(s,6,5) };
PS: I called RANGE's first parameter _container instead of _string for a reason: the macro can be used with any Container (or class supporting at least begin() and end()), even as part of a function call like
auto pisPosition= std::find( RANGE(myDoubleVector,11,23), std::numbers::pi );
PPS: When possible, prefer C++20's actual ranges library to this poor person's solution.

Trouble using find function in C++

I am trying to track down a difference in behaviour in my code when I use std::find.
For my test code, I made a vector called Test:
std::vector<const char*> Test;
To test the find function, I filled the Test vector with dummy data using the push_back function:
Test.push_back("F_S");
Test.push_back("FC");
Test.push_back("ID");
Test.push_back("CD");
Test.push_back("CT");
Test.push_back("DS");
Test.push_back("CR");
Test.push_back("5K_2");
Test.push_back("10K_5");
Test.push_back("10K_1");
Test.push_back("10K_2");
Test.push_back("10K_3");
Test.push_back("10K_4");
Test.push_back("10K_5");
What I want to do with the find function is to go through Test and see if there is any repeated data. The first time I encounter a value, I save it to a vector called Unique_Data.
std::vector<const char*> Unique_Data;
So for the 14 data points above, only 13 will be saved because 10K_5 is repeated.
The code I am using looks like this:
for(int i = 0; i < Test.size(); i++)
{
if( Unique_Data.empty())
{
Unique_Data.push_back(Test[i]);
}
else if (std::find(Unique_Data.begin(), Unique_Data.end(), Test[i]) != Unique_Data.end())
{
// Move on to next index
}
else
{
Unique_Data.push_back(Test[i]);
}
}
The problem is this: when I use the dummy data, I get the correct contents in Unique_Data.
However, if I fill the Test vector with the actual data, which is stored in a linked list, find treats every entry as unique.
The code looks like this:
p_curr = List.p_root;
while(p_curr != NULL)
{
// id starts from 0
if(atoi(p_curr->id) == 14) break;
Test.push_back(p_curr->Descriptor);
p_curr = p_curr->p_next;
}
I tested with the same 14 values. They are all of type const char*. However, when I use the linked list data, the find function thinks all the data is unique.
Can anyone tell me what is wrong with this?
Using C-style strings is a bit tricky: they are just pointers, and pointers are compared by address. Two C strings with the same sequence of characters but different addresses will compare unequal.
const char first[] = "Hi";
const char second[] = "Hi";
assert(first == second); // will fail!
There are two solutions to this problem. The simple one is using std::string in your container, as std::string provides value comparisons. The alternative is to pass a comparison predicate to std::find_if (std::find itself takes a value to compare against, not a predicate). But this will still leave the problem of managing the lifetime of the const char*-s stored in the vector.
This is a pointers problem. You're not storing strings in your array, you're storing the memory address of the data in the string.
This strange behaviour is probably because in your example case you have literal strings that cannot be changed, so the compiler is optimising the storage, and when two strings are the same then it stores the same address for all strings that have the same text.
In your real data example, you have a bunch of strings that hold the same data, but each of these strings lives at a different memory address, so the find function is saying that all strings have a different address.
In summary, your find function is looking at the memory address of the string, not the data (text) in the string. If you use std::strings then this problem will disappear.
I would highly recommend using strings, as performance is going to be more than good enough and they eliminate a vast number of problems.
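As a rough illustration of the std::string approach, the de-duplication loop from the question could look like the sketch below. The function name unique_in_order is made up for this example, and it assumes every pointer in the input refers to a valid null-terminated string.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Copies each C string into a std::string and keeps only the first
// occurrence of each value, preserving the original order.
std::vector<std::string> unique_in_order(const std::vector<const char*>& raw) {
    std::vector<std::string> unique;
    for (const char* p : raw) {
        assert(p != nullptr);
        std::string s(p);
        // std::find now compares string contents, not addresses.
        if (std::find(unique.begin(), unique.end(), s) == unique.end()) {
            unique.push_back(std::move(s));
        }
    }
    return unique;
}
```

Since the strings are copied, the result no longer depends on the lifetime of the linked list nodes that own the original Descriptor buffers.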
As David Rodriguez mentions in his answer, you're only comparing pointers, and not the contents of the strings themselves. Your solution will work as is if you were storing std::strings instead of char const *. With the latter, you need to resort to std::find_if and a predicate that calls strcmp to determine whether the strings are identical.
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstring>
int main()
{
std::vector<const char*> Test;
Test.push_back("F_S");
Test.push_back("FC");
Test.push_back("ID");
Test.push_back("CD");
Test.push_back("CT");
Test.push_back("DS");
Test.push_back("CR");
Test.push_back("5K_2");
Test.push_back("10K_5");
Test.push_back("10K_1");
Test.push_back("10K_2");
Test.push_back("10K_3");
Test.push_back("10K_4");
Test.push_back("10K_5");
std::vector<const char*> Unique_Data;
for(auto const& s1 : Test) {
if(std::find_if(Unique_Data.cbegin(), Unique_Data.cend(),
[&](const char *s2) { return std::strcmp(s1, s2) == 0; })
== Unique_Data.cend()) {
Unique_Data.push_back(s1);
}
}
for(auto const& s : Unique_Data) {
std::cout << s << '\n';
}
}

Can we split, manipulate and rejoin a string in c++ in one statement?

This is a bit of a daft question, but out of curiosity, would it be possible to split a string on commas, perform a function on each piece, and then rejoin it on commas in one statement with C++?
This is what I have so far:
string dostuff(const string& a) {
return string("Foo");
}
int main() {
string s("a,b,c,d,e,f");
vector<string> foobar(100);
transform(boost::make_token_iterator<string>(s.begin(), s.end(), boost::char_separator<char>(",")),
boost::make_token_iterator<string>(s.end(), s.end(), boost::char_separator<char>(",")),
foobar.begin(),
boost::bind(&dostuff, _1));
string result = boost::algorithm::join(foobar, ",");
}
So this would result in turning "a,b,c,d,e,f" into "Foo,Foo,Foo,Foo,Foo,Foo"
I realise this is OTT, but was just looking to expand my boost wizardry.
First, note that your program writes "Foo,Foo,Foo,Foo,Foo,Foo,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,," to your result string -- as already mentioned in comments, you wanted to use back_inserter there.
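To show the back_inserter point in isolation, here is a small sketch that deliberately avoids Boost and uses std::getline for splitting instead (a substitution made for this example, not what the question uses). The helper names split and transform_and_join are hypothetical. Unlike boost::char_separator with its defaults, this split keeps empty tokens between adjacent separators.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

std::string dostuff(const std::string&) { return "Foo"; }

// Splits s on sep using std::getline.
std::vector<std::string> split(const std::string& s, char sep) {
    std::vector<std::string> out;
    std::istringstream in(s);
    std::string token;
    while (std::getline(in, token, sep)) {
        out.push_back(token);
    }
    return out;
}

std::string transform_and_join(const std::string& s, char sep) {
    const std::vector<std::string> parts = split(s, sep);
    std::vector<std::string> mapped;  // starts empty, not pre-sized to 100
    // back_inserter appends one element per input token.
    std::transform(parts.begin(), parts.end(), std::back_inserter(mapped), dostuff);
    assert(mapped.size() == parts.size());
    std::string result;
    for (std::size_t i = 0; i < mapped.size(); ++i) {
        if (i > 0) result += sep;
        result += mapped[i];
    }
    return result;
}
```

Because the destination vector grows only as elements are written, the joined result has no trailing run of separators.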
As for the answer, whenever there's a single value resulting from a range, I look at std::accumulate (since that is the C++ version of fold/reduce)
#include <string>
#include <iostream>
#include <numeric>
#include <boost/tokenizer.hpp>
#include <boost/algorithm/string.hpp>
#include <boost/bind.hpp>
std::string dostuff(const std::string& a) {
return std::string("Foo");
}
int main() {
std::string s("a,b,c,d,e,f");
std::string result =
accumulate(
++boost::make_token_iterator<std::string>(s.begin(), s.end(), boost::char_separator<char>(",")),
boost::make_token_iterator<std::string>(s.end(), s.end(), boost::char_separator<char>(",")),
dostuff(*boost::make_token_iterator<std::string>(s.begin(), s.end(), boost::char_separator<char>(","))),
boost::bind(std::plus<std::string>(), _1,
bind(std::plus<std::string>(), ",",
bind(dostuff, _2)))); // or lambda, for slightly better readability
std::cout << result << '\n';
}
Except now it's way over the top and repeats make_token_iterator twice. I guess boost.range wins.
void dostuff(string& a) {
a = "Foo";
}
int main()
{
string s("a,b,c,d,e,f");
vector<string> tmp;
s = boost::join(
(
boost::for_each(
boost::split(tmp, s, boost::is_any_of(",")),
dostuff
),
tmp
),
","
);
return 0;
}
Unfortunately I can't eliminate mentioning tmp twice. Maybe I'll think of something later.
I am actually working on a library to allow writing code in a more readable fashion than iterators alone... don't know if I'll ever finish the project though, seems dead projects tend to accumulate on my computer...
Anyway the main reproach I have here is obviously the use of iterators. I tend to think of iterators as low-level implementation details, when coding you rarely want to use them at all.
So, let's assume that we have a proper library:
struct DoStuff { std::string operator()(std::string const&); };
int main(int argc, char* argv[])
{
std::string const reference = "a,b,c,d,e,f";
std::string const result = boost::join(
view::transform(
view::split(reference, ","),
DoStuff()
),
","
);
}
The idea of a view is to be a light wrapper around another container:
from the user's point of view, it behaves like a container (minus the operations that actually modify the container's structure)
from the implementation's point of view, it's a lightweight object containing as little data as possible; the value is ephemeral here and only lives as long as the iterator lives.
I already have the transform part working, I am wondering how the split could work (generally), but I think I'll get into it ;)
Okay, I guess it's possible, but please please don't really do this in production code.
Much better would be something like
std::string MakeCommaEdFoo(const std::string& input)
{
// The number of tokens is one more than the number of commas.
const std::size_t commas = static_cast<std::size_t>(
std::count(input.begin(), input.end(), ','));
std::string output("foo");
output.reserve((commas+1)*4-1);
// one ",foo" for each comma in the input
for(std::size_t idx = 0; idx < commas; ++idx)
output.append(",foo");
return output;
}
Not only will it perform better, it is also much easier for the next guy to read and understand.

String handling in C++

How do I write a function in C++ that takes a string s and an integer n as input and gives at output a string that has spaces placed every n characters in s?
For example, if the input is s = "abcdefgh" and n = 3 then the output should be "abc def gh"
EDIT:
I could have used loops for this, but I am looking for a concise and idiomatic C++ solution (i.e. one that uses algorithms from the STL).
EDIT:
Here's how I would do it in Scala (which happens to be my primary language):
def drofotize(s: String, n: Int) = s.grouped(n).toSeq.flatMap(_ + " ").mkString
Is this level of conciseness possible with C++? Or do I have to use explicit loops after all?
Copy each character in a loop and, when i > 0 && i % n == 0 (with i the index into the source string), add an extra space to the destination string before copying the character.
As for the Standard Library, you could write your own std::back_inserter-like inserter which adds the extra spaces, and then use it as follows:
std::copy( str1.begin(), str1.end(), my_back_inserter(str2, n) );
but I would say that writing such an inserter is just a waste of your time. It is much simpler to write a function copy_with_spaces with a good old for-loop in it.
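The copy_with_spaces function suggested above could be sketched like this. Taking i as the index into the source string, a space goes in front of every character whose index is a non-zero multiple of n. The exact signature is an assumption; the answer only names the function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns a copy of s with a space inserted after every n characters.
// Precondition: n > 0.
std::string copy_with_spaces(const std::string& s, std::size_t n) {
    assert(n > 0);
    std::string out;
    out.reserve(s.size() + s.size() / n);  // room for the added spaces
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i > 0 && i % n == 0) {
            out += ' ';
        }
        out += s[i];
    }
    return out;
}
```

For the input from the question, copy_with_spaces("abcdefgh", 3) yields "abc def gh", and no trailing space is added when the length is an exact multiple of n.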
STL algorithms don't really provide anything like this. Best I can think of:
#include <string>
using namespace std;
string drofotize(const string &s, size_t n)
{
if (s.size() <= n)
{
return s;
}
return s.substr(0,n) + " " + drofotize(s.substr(n), n);
}

Abusing the comma operator

I'm looking for an easy way to build an array of strings at compile time. For a test, I put together a class named Strings that has the following members:
Strings();
Strings(const Strings& that);
Strings(const char* s1);
Strings& operator=(const char* s1);
Strings& operator,(const char* s2);
Using this, I can successfully compile code like this:
Strings s;
s="Hello","World!";
The s="Hello" part invokes operator=, which returns a Strings&, and then operator, gets called for "World!".
What I can't get to work (in MSVC, haven't tried any other compilers yet) is
Strings s="Hello","World!";
I'd assume here that Strings s="Hello" would call the copy constructor and then everything would behave the same as the first example. But I get the error: error C2059: syntax error : 'string'
However, this works fine:
Strings s="Hello";
So I know that the copy constructor does at least work for one string. Any ideas? I'd really like to have the second method work just to make the code a little cleaner.
I think that the comma in your second example is not the comma operator but rather the grammar element for multiple variable declarations.
e.g., the same way that you can write:
int a=3, b=4
It seems to me that you are essentially writing:
Strings s="Hello", stringliteral
So the compiler expects the item after the comma to be the name of a variable, and instead it sees a string literal and announces an error. In other words, the constructor is applied to "Hello", but the comma afterwards is not the comma operator of Strings.
By the way, the constructor is not really a copy constructor. It creates a Strings object from a literal string parameter... The term copy constructor is typically applied to the same type.
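To make the distinction concrete, here is a minimal sketch of a Strings class with the members listed in the question. The storage (a std::vector<std::string>) and the size() and operator[] accessors are assumptions added for the example; the question does not show the real implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class Strings {
    std::vector<std::string> items_;  // assumed storage, not from the question
public:
    Strings() {}
    Strings(const char* s1) { items_.push_back(s1); }
    Strings& operator=(const char* s1) {
        items_.clear();
        items_.push_back(s1);
        return *this;
    }
    // Overloaded comma: appends and returns *this so calls can be chained.
    Strings& operator,(const char* s2) {
        items_.push_back(s2);
        return *this;
    }
    std::size_t size() const { return items_.size(); }
    const std::string& operator[](std::size_t i) const {
        assert(i < items_.size());
        return items_[i];
    }
};
```

With this class, s = "Hello", "World!"; is an expression statement, so operator, is used after operator=. In Strings s = "Hello", "World!"; the comma separates declarators instead. Writing the initializer as (Strings("Hello"), "World!") turns it back into an expression whose left operand is a Strings, so the overloaded operator applies again.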
I wouldn't recommend this kind of an API. You are going to continue discovering cases that don't work as expected, since comma is the operator with the lowest precedence. For example, this case won't work either:
if ("Hello","world" == otherStrings) { ... }
You may be able to get things working if you use brackets every time around the set of strings, like this:
Strings s=("Hello","World!");
And my example above would look like this:
if (("Hello","world") == otherStrings) { ... }
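Note, however, that an overloaded operator, is only considered when at least one operand has class type; with two plain string literals the built-in comma operator applies and the expression simply evaluates to its right-hand operand. At least one operand would have to be a Strings, and either way the shorthand syntax is probably not worth the tricky semantics that come with it.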
Use boost::list_of.
It's possible to make this work, for a sufficiently loose definition of "work." Here's a working example I wrote in response to a similar question some years ago. It was fun as a challenge, but I wouldn't use it in real code:
#include <iostream>
#include <algorithm>
#include <iterator>
#include <vector>
void f0(std::vector<int> const &v) {
std::copy(v.begin(), v.end(),
std::ostream_iterator<int>(std::cout, "\t"));
std::cout << "\n";
}
template<class T>
class make_vector {
std::vector<T> data;
public:
make_vector(T const &val) {
data.push_back(val);
}
make_vector<T> &operator,(T const &t) {
data.push_back(t);
return *this;
}
operator std::vector<T>() { return data; }
};
template<class T>
make_vector<T> makeVect(T const &t) {
return make_vector<T>(t);
}
int main() {
f0((makeVect(1), 2, 3, 4, 5));
f0((makeVect(1), 2, 3));
return 0;
}
You could use an array of character pointers
Strings::Strings(const char* input[]);
const char* input[] = {
"string one",
"string two",
0};
Strings s(input);
and inside the constructor, iterate through the pointers until you hit the null.
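The iteration described above might look like the following sketch. It is written as a free function with the hypothetical name from_null_terminated rather than as the Strings constructor, and it assumes the array really does end with a null pointer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Collects strings from an array of C strings terminated by a null pointer.
std::vector<std::string> from_null_terminated(const char* const input[]) {
    assert(input != nullptr);
    std::vector<std::string> result;
    for (const char* const* p = input; *p != nullptr; ++p) {
        result.push_back(*p);
    }
    return result;
}
```

A Strings constructor could run the same loop over its argument. Forgetting the terminating null makes the loop read past the end of the array, which is the main hazard of this design.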
If the only job of Strings is to store a list of strings, then boost::assign could do the job better with standard containers, I think :)
using namespace boost::assign;
vector<string> listOfThings;
listOfThings += "Hello", "World!";
If you can use C++0x (C++11), it has new initializer lists for this! I wish you could use those. For example:
std::vector<std::string> v = { "xyzzy", "plugh", "abracadabra" };
std::vector<std::string> v{ "xyzzy", "plugh", "abracadabra" };