Efficient construction of std::string from std::unordered_set<char> - c++

I have an unordered_set of chars
std::unordered_set<char> u_setAlphabet;
Then I want to get the contents of the set as a std::string. My implementation currently looks like this:
std::string getAlphabet() {
std::string strAlphabet;
for (const char& character : u_setAlphabet)
strAlphabet += character;
return strAlphabet;
}
Is this a good way to solve this task? Appending single chars to the string does not seem optimal for a large u_setAlphabet (multiple reallocations?). Is there any other way to do it?

The simplest, most readable and most efficient answer is:
return std::string(s.begin(), s.end());
The implementation may choose to detect the length of the range up-front and only allocate once; both libc++ and libstdc++ do this when given a forward iterator range.
The string class also offers you reserve, just like vector, to manage the capacity:
std::string result;
result.reserve(s.size());
for (unsigned char c : s) result.push_back(c); // or std::copy
return result;
It also offers assign, append and insert member functions, but since those offer the strong exception guarantee, they may have to allocate a new buffer before destroying the old one (thanks to @T.C. for pointing out this crucial detail!). The libc++ implementation does not reallocate if the existing capacity suffices, while GCC5's libstdc++ implementation reallocates unconditionally.
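A minimal sketch of the two approaches above, side by side. The function names from_range and with_reserve are made up for illustration, and the set is passed in as a parameter rather than being a member or global as in the question:

```cpp
#include <string>
#include <unordered_set>

// Range constructor: the implementation can measure the range and allocate once.
std::string from_range(const std::unordered_set<char>& chars) {
    return std::string(chars.begin(), chars.end());
}

// Manual variant: reserve the final size up front, then append one by one.
std::string with_reserve(const std::unordered_set<char>& chars) {
    std::string result;
    result.reserve(chars.size());
    for (char c : chars) {
        result.push_back(c);
    }
    return result;
}
```

Both return the characters in the set's iteration order, which is unspecified, so sort the result if a stable order matters.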

std::string has a constructor for that:
auto s = std::string(begin(u_setAlphabet), end(u_setAlphabet));

It is better to use the constructor that accepts iterators. For example:
std::string getAlphabet() {
return { u_setAlphabet.begin(), u_setAlphabet.end() };
}

Both return std::string(u_setAlphabet.begin(), u_setAlphabet.end()); and return { u_setAlphabet.begin(), u_setAlphabet.end() }; are the same in C++11. I prefer @VladfromMoscow's solution because we do not need to make any assumption about the returned type of the temporary object.

Related

What's the easiest way to convert part of a vector to a string?

I want to convert part of a vector to a string, I found this
std::string myString(Buffer.begin(), Buffer.end()); (Buffer is the vector)
But here I converted the whole vector. What's the easiest way if I want to skip the first 5 chars of the vector and convert the rest? Like ''.join(Buffer[5::]) if it were Python.
Just add your offset to begin. Since a vector has random access iterators, you can just use
std::string myString(Buffer.begin() + 5, Buffer.end());
If you are not sure about what type of iterators your container is using, you can use std::next like
std::string myString(std::next(Buffer.begin(), 5), Buffer.end());
which handles all iterator types, although for non-random-access iterators it is an O(N) operation.
You also need to make sure that the container holds at least 5 elements before you do this; otherwise the behavior is undefined.
Like this for example:
if (Buffer.size() >= 5) {
std::string myString(
Buffer.begin() + 5,
Buffer.end());
}
A more general approach that works with non-random access iterators as well (such as iterators of std::list for example):
std::string myString(
std::next(Buffer.begin(), 5),
Buffer.end());
Or using ranges:
auto subrange = Buffer | std::ranges::views::drop(5);
std::string myString(
std::ranges::begin(subrange),
std::ranges::end(subrange));
A safe (avoiding accesses out of bounds), general approach could be
std::string myString(std::next(Buffer.cbegin(), std::min<std::size_t>(5, Buffer.size())),
Buffer.cend()); // the offset is clamped to the container size
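The clamped variant can be wrapped in a small helper. This is only a sketch: tail_to_string is a hypothetical name, and it assumes a container of char that offers begin(), end(), size() and a difference_type, as the standard containers do.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper: copies everything after the first `skip` elements.
template <typename Container>
std::string tail_to_string(const Container& c, std::size_t skip) {
    // Clamp the offset so we never advance past end().
    const std::size_t offset = std::min<std::size_t>(skip, c.size());
    using diff_t = typename Container::difference_type;
    return std::string(std::next(c.begin(), static_cast<diff_t>(offset)),
                       c.end());
}
```

With this, tail_to_string(Buffer, 5) yields an empty string when Buffer has fewer than five elements, instead of invoking undefined behavior.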

Erase inside a std::string by std::string_view

I need to find and then erase a portion of a string (a substring). string_view seems such a good idea, but I cannot make it work with string::erase:
// guaranteed to return a view into `str`
auto gimme_gimme_gimme(const std::string& str) -> std::string_view;
auto after_midnight(std::string& str)
{
auto man = gimme_gimme_gimme(str);
str.erase(man); // way too hopeful, not a chance though
str.erase(man.begin(), man.end()); // nope
str.erase(std::distance(str.begin(), man.begin()), man.size()); // nope
str.erase(std::distance(str.data(), man.data()), man.size()); // nope again
// for real???
}
Am I overthinking this? Given a std::string_view into a std::string how to erase that part of the string? Or am I misusing string_view?
The string view could indeed be empty, or it could be a view into something outside the container. Your suggested erase overload, as well as the implementation of the function in your answer, relies on the pre-condition that the string view refers to the same string object.
Of course, the iterator overloads are very much analogous and rely on the same pre-condition. Such a pre-condition is conventional for iterators, but not for string views.
I don't think that string view is an ideal way to represent the sub range in this case. Instead, I would suggest using a relative sub range based on the indices. For example:
struct sub_range {
size_t begin;
size_t count;
constexpr size_t past_end() const noexcept {
return begin + count;
}
};
It is a matter of taste whether to use end (i.e. past_end) or count for the second member, and to provide the other as a function. Regardless, there should be no confusion because the member will have a name. Using count is somewhat more conventional with indices.
Another choice is whether to use signed or unsigned indices. Signed indices can be used to represent backwards ranges. std::string interface doesn't understand such ranges however.
Example usage:
auto gimme_gimme_gimme(const std::string& str) -> sub_range;
auto after_midnight(std::string& str)
{
auto man = gimme_gimme_gimme(str);
str.erase(man.begin, man.count);
}
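To make the index-based idea concrete, here is a small sketch. The finder find_sub_range and the wrapper erase_sub_range are hypothetical names, and returning an empty range at the end of the string for "not found" is just one possible convention:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct sub_range {
    std::size_t begin;
    std::size_t count;
};

// Hypothetical finder: locates `needle` in `str`, or returns an empty
// range positioned at the end of the string if it is absent.
sub_range find_sub_range(const std::string& str, const std::string& needle) {
    const std::size_t pos = str.find(needle);
    if (pos == std::string::npos) {
        return {str.size(), 0};
    }
    return {pos, needle.size()};
}

void erase_sub_range(std::string& str, sub_range range) {
    // std::string::erase throws std::out_of_range if begin > size().
    assert(range.begin <= str.size());
    str.erase(range.begin, range.count);
}
```

Because the range stores indices rather than pointers, it carries no hidden tie to one particular string object, and an empty range is harmless to erase.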
Am I overthinking this?
You're underthinking it, unless I'm missing something obvious. To make the code compile you need this:
auto gimme_gimme_gimme(const std::string& str) -> std::string_view;
auto after_midnight(std::string& str)
{
auto man = gimme_gimme_gimme(str);
str.erase(std::distance(std::as_const(str).data(), man.data()), man.size()); // urrr... growling in pain
}
But wait!! There's more! Notice I said "to make it compile". The code is error prone!! Because...
std::string::data cannot be nullptr, but an empty string_view can be represented as (valid pointer inside the string + size 0) or as (nullptr + size 0). The problem arises if string_view::data is nullptr, because of the std::distance used.
So you need to make sure that the string_view always points inside the string, even if the view is empty. Or do extra checks on the erase side.
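One way to do the extra checks on the erase side is sketched below. The helper erase_view is a hypothetical name, not part of the standard library, and it needs C++17 for std::string_view:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical helper: erases `part` from `str` only if `part` really is a
// view into `str`. Returns false and leaves `str` untouched otherwise.
bool erase_view(std::string& str, std::string_view part) {
    const char* const first = str.data();
    const char* const last = first + str.size();
    const char* const p = part.data();

    // std::less gives a total order even for pointers into different
    // objects, so this range check is safe for foreign pointers.
    const std::less<const char*> before{};
    if (p == nullptr || before(p, first) || before(last, p)) {
        return false;
    }

    // p lies within [first, last], so the subtraction is well defined.
    const std::size_t offset = static_cast<std::size_t>(p - first);
    if (part.size() > str.size() - offset) {
        return false;
    }
    str.erase(offset, part.size());
    return true;
}
```

A default-constructed (null) view and a view into some other string are both rejected, while an empty view that points inside the string is accepted and erases nothing.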

Erase element from set with const key

Unordered set keys are read only, so why in this case I can erase element:
std::unordered_set<std::string> s;
s.emplace("sth");
s.erase("sth");
and in this not:
std::unordered_set<std::string const> s;
const std::string str("sth");
s.emplace(str);
s.erase(str);
If the set itself were const it would make sense, but with const keys I don't quite understand it. This assertion fails:
static_assert(!is_reference<_Tp>::value && !is_const<_Tp>::value, "");
Why would somebody who wrote that assertion, check if key is not const?
EDIT:
In fact, the code above compiles fine for std::set. For std::unordered_set, the failure is directly at instantiation. A minimal example to reproduce:
// define a customized hash ...
int main() { sizeof(std::unordered_set<int const>); }
The reason that you cannot erase the element is not because of const-correctness.
It is because you cannot have a container of const things. It is not permitted.
You broke the contract of unordered_set.
The static_assert detected that.
I think the thing that you're missing is that std::set actually allocates its own strings. So when you say s.emplace("sth"), it creates a new "sth" string; it doesn't use yours (it uses yours to construct a new one).
Why are these newly allocated strings const? Because you're not supposed to modify them directly, or else you'll break the set. If you change "aaa" to "zzz" directly then the set will still think that "zzz" is the first element in the set, before "bbb".
So why is that assert there?
Because std does not have an allocator for const objects - so when it tries to allocate a const object it will fail.
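In practice the element type stays non-const, and the container itself keeps the keys read-only. A small sketch of that (the alias StringSet and the helper erase_key are names chosen here for illustration):

```cpp
#include <string>
#include <type_traits>
#include <unordered_set>
#include <utility>

using StringSet = std::unordered_set<std::string>;

// Dereferencing a set iterator already yields a const reference, so the
// stored keys are read-only without declaring the element type const.
static_assert(
    std::is_same<decltype(*std::declval<StringSet&>().begin()),
                 const std::string&>::value,
    "set elements are exposed as const");

// erase takes the key by const reference, so a const key object is fine.
bool erase_key(StringSet& set, const std::string& key) {
    return set.erase(key) == 1;
}
```

So a const std::string can be used both to insert into and to erase from a std::unordered_set<std::string>.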
The VS2017 error is more obvious:
"The C++ Standard forbids containers of const elements because allocator<const T> is ill-formed."

Interpret a std::string as a std::vector of char_type?

I have a template<typename T> function that takes a const vector<T>&. In said function, I use the vector's cbegin(), cend(), size(), and operator[].
As far as I understand it, both string and vector use contiguous space, so I was wondering if I could reuse the function for both data types in an elegant manner.
Can a std::string be reinterpreted as a std::vector of (the appropriate) char_type? If so, what would the limitations be?
If you make your template just for type const T& and use the begin(), end(), etc, functions which both vector and string share then your code will work with both types.
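For instance, a sketch along these lines works for both types (count_value is a made-up function standing in for whatever the real function does):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Templated on the container instead of the element type. It only uses
// cbegin(), cend() and value_type, which string and vector both provide.
template <typename Container>
std::size_t count_value(const Container& c,
                        typename Container::value_type wanted) {
    std::size_t n = 0;
    for (auto it = c.cbegin(); it != c.cend(); ++it) {
        if (*it == wanted) {
            ++n;
        }
    }
    return n;
}

// Explicit instantiations, just to show both containers satisfy the template.
template std::size_t count_value<std::string>(const std::string&, char);
template std::size_t count_value<std::vector<char>>(const std::vector<char>&,
                                                    char);
```

No cast between the two types is involved; the compiler simply generates one version of the function per container type.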
Go STL way and use iterators. Accept iterator to begin and iterator to end. It will work with all possible containers, including non-containers like streams.
There is no guarantee the layout of string and vector will be the same. They theoretically could be, but they probably aren't in any common implementation. Therefore, you can't do this safely. See Zan's answer for a better solution.
Let me explain: If I am a standard library implementer and decide to implement std::string like so....
template ...
class basic_string {
public:
...
private:
CharT* mData;
size_t mSize;
};
and decide to implement std::vector like so...
template ...
class vector {
public:
...
private:
T* mEnd;
T* mBegin;
};
When you reinterpret_cast<string*>(&myVector) you wind up interpreting the pointer to the end of your data as the pointer to the start of your data, and the pointer to the start of your data as the size of your data. If the padding between members is different, or there are extra members, it could get even weirder and more broken than that too.
So yes, in order for this to possibly work they both need to store contiguous data, but they also need quite a bit else to be the same between the implementations for it to work.
std::experimental::array_view<const char> n4512 represents a contiguous buffer of chars.
Writing your own is not hard, and it solves this problem and (in my experience) many more.
Both string and vector are compatible with an array view.
This lets you move your implementation into a .cpp file (and not expose it), gives you the same performance as doing it with std::vector<T> const& and probably the same implementation, avoids duplicating code, and uses light weight contiguous buffer type erasure (which is full of tasty keywords).
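A hand-rolled view for char data might look roughly like the sketch below. The class name char_view and the function count_char are invented here and do not follow the interface of the proposed array_view:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Non-owning view over a contiguous run of chars.
class char_view {
    const char* ptr_;
    std::size_t len_;

public:
    char_view(const char* p, std::size_t n) : ptr_(p), len_(n) {}
    // Implicit on purpose, so strings and vectors can be passed directly.
    char_view(const std::string& s) : ptr_(s.data()), len_(s.size()) {}
    char_view(const std::vector<char>& v) : ptr_(v.data()), len_(v.size()) {}

    const char* begin() const { return ptr_; }
    const char* end() const { return ptr_ + len_; }
    std::size_t size() const { return len_; }
    char operator[](std::size_t i) const {
        assert(i < len_);
        return ptr_[i];
    }
};

// A non-template function that can live in a .cpp file.
std::size_t count_char(char_view view, char wanted) {
    std::size_t n = 0;
    for (char c : view) {
        if (c == wanted) {
            ++n;
        }
    }
    return n;
}
```

The view does not own the data, so it must not outlive the string or vector it was created from.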
If the key point is that you want to access a contiguous area in memory where instances of a specific char type are stored then you could define your function as
void myfunc(const CType *p, int size) {
...
}
to make it clear that you assume they must be adjacent in memory.
Then for example to pass the content of a vector the code is simply
myfunc(&myvect[0], myvect.size());
and for a string
myfunc(mystr.data(), mystr.size());
or
myfunc(buffer, n);
for an array.
You can't directly typecast a std::vector to a std::string or vice versa. But using the iterators that STL containers provide does allow you to iterate both a vector and a string in the same way. And if your function requires random access of the container in question then either would work.
std::vector<char> str1 {'a', 'b', 'c'};
std::string str2 = "abc";
template<typename Iterator>
void iterator_function(Iterator begin, Iterator end)
{
for(Iterator it = begin; it != end; ++it)
{
std::cout << *it << std::endl;
}
}
iterator_function(str1.begin(), str1.end());
iterator_function(str2.begin(), str2.end());
Both of those last two function calls would print the same thing.
Now if you wanted to write a non-template version that handles only characters, whether stored in a string or in a vector, you could write something that iterates over the internal array.
void array_function(const char * array, unsigned length)
{
for(unsigned i = 0; i < length; ++i)
{
std::cout << array[i] << std::endl;
}
}
Both functions would do the same thing in the following scenarios.
std::vector<char> str1 {'a', 'b', 'c'};
std::string str2 = "abc";
iterator_function(str1.begin(), str1.end());
iterator_function(str2.begin(), str2.end());
array_function(str1.data(), str1.size());
array_function(str2.data(), str2.size());
There are always multiple ways to solve a problem. Depending on what you have available any number of solutions might work. Try both and see which works better for your application. If you don't know the iterator type then the char typed array iteration is useful. If you know you will always have the template type to pass in then the template iterator method might be more useful.
The way your question is put at the moment is a bit confusing. If you mean to be asking "is it safe to cast a std::vector type to a std::string type or vice versa if the vector happens to contain char values of the appropriate type?", the answer is: no way, don't even think about it! If you're asking: "can I access the contiguous memory of non-empty sequences of char type if they're of the type std::vector or std::string?" then the answer is, yes you can (with the data() member function).

How can one implement a custom string class using the STL?

In C++ - The Complete Reference, the author gives us a challenge after showing how he implements a custom C++ string class. Excerpt from the book:
A Challenge:
Try implementing StrType (the string class) using the STL. That is, use a container to store the characters that comprise a string. Use iterators to operate on the strings, and use the algorithms to perform the various string manipulations.
I understand the basic concept here, but am having trouble implementing it. Should I use std::vector<char> and push_back for every char, or something like that? What about the string manipulations? Need some help. Sample code will be accepted gratefully, or you can explain how I may be able to implement this.
Yes, std::vector<char> sounds like a great idea. It will save you from the troubles of writing a custom destructor, copy constructor and copy assignment operator. Plus all the iterator member functions (begin, end and co.) can just delegate to the std::vector<char> versions.
Can you give some code showing how to do string manipulations, e.g. concatenation?
Sure thing, here is how I would overload operator+= and operator+ for the string type:
class StrType
{
std::vector<char> vec;
public:
// ...
StrType& operator+=(const StrType& rhs)
{
vec.insert(vec.end(), rhs.vec.begin(), rhs.vec.end());
return *this;
}
};
StrType operator+(StrType lhs, const StrType& rhs)
{
lhs += rhs;
return lhs;
}
There's probably a more efficient version of operator+, but you can figure that out on your own.
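In the spirit of the book's challenge, here is a slightly fuller sketch that also delegates iteration to the vector and uses a standard algorithm for searching. The constructor from a C string, the find member and its "returns size() when absent" convention are choices made for this example, not something the book prescribes:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

class StrType {
    std::vector<char> vec;

public:
    using const_iterator = std::vector<char>::const_iterator;

    StrType() = default;
    // Implicit on purpose, so C string literals convert as with std::string.
    StrType(const char* s) : vec(s, s + std::strlen(s)) {}

    const_iterator begin() const { return vec.begin(); }
    const_iterator end() const { return vec.end(); }
    std::size_t size() const { return vec.size(); }

    StrType& operator+=(const StrType& rhs) {
        if (this == &rhs) {
            // insert() must not be given iterators into the vector it modifies.
            const std::vector<char> copy(vec);
            vec.insert(vec.end(), copy.begin(), copy.end());
        } else {
            vec.insert(vec.end(), rhs.vec.begin(), rhs.vec.end());
        }
        return *this;
    }

    // Index of the first occurrence of needle, or size() if there is none.
    std::size_t find(const StrType& needle) const {
        const auto it = std::search(vec.begin(), vec.end(),
                                    needle.vec.begin(), needle.vec.end());
        return static_cast<std::size_t>(it - vec.begin());
    }

    friend bool operator==(const StrType& a, const StrType& b) {
        return a.vec == b.vec;
    }
};
```

Note the self-append case: passing a vector's own iterators to its insert is not allowed, so a += a goes through a temporary copy.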
Using std::vector<char> would probably be the best container to use in this case (random access iterators and low overhead make it an attractive choice for a string).
Further to your comment on FredOverflow's answer, you can perform a string concatenation as follows:
std::vector<char> firstString;
firstString.push_back('A');
firstString.push_back('B');
std::vector<char> secondString;
secondString.push_back('X');
secondString.push_back('Y');
firstString.insert( firstString.end(), secondString.begin(), secondString.end() );
for( auto it = firstString.begin(); it != firstString.end(); ++it )
{
std::cout << (*it);
}
In this case this would print out: ABXY. You can see it here: http://ideone.com/OmdoU