As an exercise, I'm trying to write a class like a std::vector without using a template. The only type it holds is std::string.
Below is the strvec.h file:
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

class StrVec
{
public:
//! Big 3
StrVec():
element(nullptr), first_free(nullptr), cap(nullptr)
{}
StrVec(const StrVec& s);
StrVec&
operator =(const StrVec& rhs);
~StrVec();
//! public members
void push_back(const std::string &s);
std::size_t size() const { return first_free - element; }
std::size_t capacity() const { return cap - element; }
std::string* begin() const { return element; }
std::string* end() const { return first_free; }
void reserve(std::size_t n);
void resize(std::size_t n);
//^^^^^^^^^^^^^^^^^^^^^^^^^^^
private:
//! data members
std::string* element; // pointer to the first element
std::string* first_free; // pointer to the first free element
std::string* cap; // pointer to one past the end
std::allocator<std::string> alloc;
//! utilities
void reallocate();
void chk_n_alloc() { if (size() == capacity()) reallocate(); }
void free();
void wy_alloc_n_move(std::size_t n);
std::pair<std::string*, std::string*>
alloc_n_copy (std::string* b, std::string* e);
};
The three std::string* members, element, first_free and cap, can be thought of as:
[0][1][2][3][unconstructed elements]
^           ^                       ^
element     first_free              cap
When implementing the member resize(size_t n), I have a problem. Say v.resize(3) is called. As a result, the pointer first_free must be moved back one place so that it points to [3]. Something like:
[0][1][2][3][unconstructed elements]
^        ^                          ^
element  first_free                 cap
My question is: how should I deal with [3]? Leave it there untouched, or destroy it like this:
if(n < size())
{
for(auto p = element + n; p != first_free; /* empty */)
alloc.destroy(p++);
first_free = element + n;
}
Is the call alloc.destroy(somePointer) necessary here?
Yes, definitely, you should call destroy on elements that are removed from the vector when resize() is called with an argument smaller than the current size of the vector. That's what std::vector does, too.
Note that destroy only calls the destructor on those elements; it does not deallocate any space (which would be wrong).
Since you are dealing with std::string, you probably think you could do without destruction if you are sure that you re-initialize the same std::string object later with a new value. But firstly, you can't be sure that a new string will be stored in the same place later, and secondly, for the new string a new object would be created (using placement-new, not copy-assignment), leaking the memory of the previous string (whose destructor would never have been called).
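To make this concrete, here is a minimal sketch of how resize(n) could be written for a trimmed-down StrVec, assuming the std::allocator<std::string> member from the question. It goes through std::allocator_traits rather than calling alloc.destroy/alloc.construct directly, because those allocator members were deprecated in C++17 and removed in C++20. The simple reserve and the deleted copy operations are simplifications of this sketch, not part of the question's design.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Trimmed-down StrVec: only what resize() needs. Copy operations are omitted.
class StrVec {
public:
    StrVec() = default;
    StrVec(const StrVec&) = delete;
    StrVec& operator=(const StrVec&) = delete;
    ~StrVec() { free(); }

    std::size_t size() const { return first_free - element; }
    std::size_t capacity() const { return cap - element; }
    std::string* begin() const { return element; }
    std::string* end() const { return first_free; }

    void resize(std::size_t n) {
        if (n < size()) {
            // Shrinking: run the destructors of the surplus strings (freeing any
            // heap memory they own) but keep the raw storage for later reuse.
            while (first_free != element + n)
                Traits::destroy(alloc, --first_free);
        } else {
            reserve(n);
            // Growing: value-initialize new strings in the unconstructed storage.
            while (first_free != element + n)
                Traits::construct(alloc, first_free++);
        }
        assert(size() == n);
    }

    void reserve(std::size_t n) {
        if (n <= capacity()) return;
        std::string* fresh = Traits::allocate(alloc, n);
        std::string* dest = fresh;
        // Move each string into the new block, then destroy the moved-from original.
        for (std::string* p = element; p != first_free; ++p) {
            Traits::construct(alloc, dest++, std::move(*p));
            Traits::destroy(alloc, p);
        }
        if (element) Traits::deallocate(alloc, element, capacity());
        element = fresh;
        first_free = dest;
        cap = fresh + n;
    }

private:
    using Traits = std::allocator_traits<std::allocator<std::string>>;

    void free() {
        if (!element) return;
        while (first_free != element)
            Traits::destroy(alloc, --first_free);
        Traits::deallocate(alloc, element, capacity());
    }

    std::string* element = nullptr;     // first element
    std::string* first_free = nullptr;  // first unconstructed slot
    std::string* cap = nullptr;         // one past the end of the storage
    std::allocator<std::string> alloc;
};
```

Shrinking destroys the surplus strings but keeps the memory, which is exactly the destroy-versus-deallocate distinction made above.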
What you should do depends on how you've initialised element, as you need your code to be consistent.
If you use new std::string[n] to create the array of strings, then they will all be pre-initialised, and when you necessarily use delete[] to deallocate them later, their destructors will all be run. For that reason, you must not call the destructors manually in the intervening time unless you are certain you'll placement-new a valid object there again.
If you use something like static_cast<std::string*>(new char[sizeof(std::string) * n]) to create a buffer of uninitialised memory, then you must take full responsibility for calling the constructor and destructor of every element at appropriate times.
With the first option, you wouldn't need to do anything for resize(3), but you could call .clear() on [3] (perhaps followed by shrink_to_fit()) to potentially free up some memory if you wanted.
With the second option, you must trigger the destructor for [3] (unless you're keeping some other record of which elements eventually need destruction, which seems a clumsy model).
The issues are identical to just having memory for a single string that is "in use" at different times during the program. Do you spend the time to construct it before first use and then assign to it, or do you leave it uninitialised and then copy-construct it with placement new? Do you clear it when unused, or destroy it? Either model can work with careful implementation. The first approach tends to be easier to implement correctly, the second slightly more efficient when the array capacity is much greater than the number of elements that end up being used.
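Both models for a single string slot can be sketched as follows. ConstructedSlot and RawSlot are hypothetical names, and the second one assumes C++17 for std::destroy_at and std::launder.

```cpp
#include <cassert>
#include <memory>   // std::destroy_at (C++17)
#include <new>      // placement new, std::launder (C++17)
#include <string>

// Model 1: the string always exists; "unused" just means "cleared".
struct ConstructedSlot {
    std::string s;                                 // constructed up front
    void use(const std::string& v) { s = v; }      // copy-assignment
    void release() { s.clear(); }                  // object stays alive
};

// Model 2: raw storage; a string exists only while the slot is in use.
class RawSlot {
public:
    RawSlot() = default;
    RawSlot(const RawSlot&) = delete;
    RawSlot& operator=(const RawSlot&) = delete;
    ~RawSlot() { release(); }

    void use(const std::string& v) {
        release();                                       // never construct over a live string
        ::new (static_cast<void*>(buf)) std::string(v);  // copy-construct in place
        live = true;
    }
    void release() {
        if (live) {
            std::destroy_at(get());                      // end the string's lifetime
            live = false;
        }
    }
    bool in_use() const { return live; }
    std::string* get() {
        assert(live);
        return std::launder(reinterpret_cast<std::string*>(buf));
    }

private:
    alignas(std::string) unsigned char buf[sizeof(std::string)];
    bool live = false;
};
```

The extra live flag in RawSlot is the "record of which elements need destruction" that the first model gets for free.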
Does the range constructor of std::vector copy the data, or does it just reference it?
Have a look at this example:
#include <vector>

std::vector<int> getVector() {
    int arr[10];
    for (int i = 0; i < 10; ++i) arr[i] = i;
    return std::vector<int>(arr, arr + 10);
}
Would this cause a bug (due to handing out a reference to the stack which is destroyed later) or is it fine, since it copies the data in the constructor?
Edit #1
For clarification: I'm looking for a more or less official resource that points out which of the following pseudocode implementations of the constructor is valid. I know the signature of the real constructor is different, but you should get the idea.
Version A (just uses the given data internally)
template<typename T>
class vector {
private:
    T* data;
    int size;
public:
    vector(T* start, T* end) {
        data = start;                           // aliases the caller's buffer, no copy
        size = static_cast<int>(end - start);
    }
};
Version B (explicitly copies the data)
template<typename T>
class vector {
private:
    T* data = nullptr;
    int size = 0;
public:
    vector(T* start, T* end) {
        // copies each element (push_back assumed to exist)
        for (T* it = start; it < end; ++it) push_back(*it);
    }
};
When in doubt, check the reference. The answer can be derived from the Complexity section, although I'd agree there is no explicit confirmation:
Complexity: Makes only N calls to the copy constructor of T (where N is the distance between first and last) and no reallocations if iterators first and last are of forward, bidirectional, or random access categories. It makes order N calls to the copy constructor of T and order log N reallocations if they are just input iterators.
Like all constructors of std::vector<int> apart from the move constructor, this copies the integers. The same holds for member functions like push_back and insert.
This is why std::vector actually has two template arguments. The second one defaults to std::allocator; it's the allocator used to allocate memory for the 10 integers (and perhaps a few more so that the vector can grow; see capacity()).
[Edit]
The actual code is closest to Version B, but probably more like this:
#include <memory>   // std::uninitialized_copy

template<typename T>
class vector {
private:
    T* _Data = nullptr;
    size_t _Capacity = 0;
    size_t _Used = 0;
public:
    vector(T* start, T* end) {
        const size_t n = end - start;
        reserve(n);                                  // sets _Data, _Capacity
        std::uninitialized_copy(start, end, _Data);  // copy-constructs into raw memory
        _Used = n;
    }
};
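A quick check that the range constructor copies, using only the standard library: getVector mirrors the question, and firstElementAfterOverwrite (a hypothetical helper) overwrites the source array after construction to show that the vector is unaffected.

```cpp
#include <cassert>
#include <vector>

// Same shape as the question's getVector(): built from a local array.
std::vector<int> getVector() {
    int arr[10];
    for (int i = 0; i < 10; ++i) arr[i] = i;
    return std::vector<int>(arr, arr + 10);   // copies arr[0..9] into the vector's own buffer
}

// Overwriting the source array after construction does not affect the vector.
int firstElementAfterOverwrite() {
    int arr[3] = {1, 2, 3};
    std::vector<int> v(arr, arr + 3);
    assert(v.data() != arr);                  // separate storage
    arr[0] = 42;                              // changes only the array
    return v[0];                              // still 1
}
```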
The C++ standard library is specified in a somewhat strange way.
It is specified saying what each method requires and what each method guarantees. It is not specified as in "vector is a container of values that it owns", even though that is the real underlying abstraction here.
Formally, what you are doing is safe not because "the vector copies", but because none of the preconditions of any of std::vector's member functions are violated in the copy of the std::vector your function returns.
Similarly, the elements have the expected values because of the postconditions of the constructor, and then the pre- and postconditions of the copy constructor and/or the C++17 guaranteed copy elision rules for prvalues.
But trying to reason about C++ code in this way is madness.
A std::vector is semantically a regular type with value semantics that owns its own elements. Regular types can be copied, and the copies behave sanely even if the original object is destroyed.
Unless you make a std::vector<std::reference_wrapper<int>>, you are safe; with reference_wrapper you would be unsafe, because then the stored elements merely refer to objects owned elsewhere instead of being regular value types.
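The contrast can be seen directly. In this sketch (observeAfterWrite is a hypothetical helper), a std::vector<int> holds its own copies, while a std::vector<std::reference_wrapper<int>> only refers to the array's elements, so it would dangle if it outlived the array.

```cpp
#include <cassert>
#include <functional>   // std::reference_wrapper
#include <utility>      // std::pair
#include <vector>

// Writes to the array after building both vectors and reports what each one sees.
std::pair<int, int> observeAfterWrite() {
    int arr[3] = {1, 2, 3};
    std::vector<int> copies(arr, arr + 3);                        // owns copies of the ints
    std::vector<std::reference_wrapper<int>> refs(arr, arr + 3);  // refers to arr's elements
    assert(&refs[1].get() == &arr[1]);
    arr[0] = 42;
    // Fine here because arr is still alive; returning refs from this
    // function would hand out dangling references.
    return {copies[0], refs[0].get()};
}
```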
A vector cannot be defined as a vector of references, for example std::vector<int&>, so the code is valid. The vector does not contain references to the elements of the array; it creates new elements of type int (the vector's template argument).
I'm currently refactoring existing code to C++11 and I wonder if I have a memory leak. My code has a struct with a std::vector in it, as well as a method shrink() that shrinks this vector down to its negative elements.
#include <vector>

struct mystruct_t {
    int other_stuff;
    std::vector<int> loc;
    // Adds elements to loc vector
    void add(int pos) {
        loc.push_back(pos);
    }
    // Shrink the list
    void shrink() {
        std::vector<int> tmp;
        for (unsigned int i = 0; i < loc.size(); ++i) {
            if (loc[i] < 0) tmp.push_back(loc[i]);
        }
        loc = tmp;
        std::vector<int>(loc).swap(loc);
    }
    mystruct_t() : other_stuff(0) {}
};
In another function I create a new instance of this struct like this:
mystruct_t* c = new mystruct_t;
c->add(2);
c->add(3);
...
And later I call the shrink() method of this struct.
c->shrink();
Now I'm not sure what's happening with the "old" loc vector after the shrink function?
Will it get destroyed automatically, or do I have to destroy it by hand? And if the latter, how would I do that?
I also tried to make shrink() more C++11-style by changing it to:
void shrink (){
std::vector<int> tmp;
for (auto &currLoc : loc) {
if (currLoc < 0) tmp.push_back (currLoc);
}
loc = std::move(tmp);
}
But the question remains the same: what happens to the "old" loc vector? Additionally, this seems to increase memory usage. I'm new to C++11 and not sure whether I've totally misunderstood the concept.
Now I'm not sure what's happening with the "old" loc vector after the shrink function?
There is no "old" loc vector. Through the lifetime of a mystruct_t object, it has exactly one member vector loc. You never get a new member or throw away an old one.
When you copy-assign to the member (loc = tmp;), the buffer contained within the vector is replaced. The vector owns the buffer, and the vector takes care that it is destroyed properly. The same applies when you move-assign in the C++11 version.
Will it get destroyed automatically
If you refer to the memory allocated by the vector, then yes.
or do I have to destroy it by hand?
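You have to destroy by hand only whatever you created by hand. You didn't call new for the vector's buffer, so you don't call delete for it. (The mystruct_t object itself, created with new mystruct_t, does need a matching delete c; eventually.)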
additionally this seems to increase the memory usage.
Your C++11 version lacks the "shrink to fit" part of the original (std::vector<int>(loc).swap (loc);). In C++11 you can do:
loc = std::move(tmp);
loc.shrink_to_fit();
In the pre-C++11 version, you can get rid of the copy assignment and simply construct the temporary from tmp and swap it with loc:
std::vector<int> tmp;
// copy the objects you want
std::vector<int>(tmp).swap(loc);
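Putting the C++11 suggestions together, a sketch of the struct might look like this. It keeps the question's names, but the usage helper keepNegatives is hypothetical and creates the object as a local instead of with new, so there is nothing to delete. Keep in mind that shrink_to_fit is only a non-binding request.

```cpp
#include <cassert>
#include <initializer_list>
#include <utility>
#include <vector>

struct mystruct_t {
    int other_stuff = 0;
    std::vector<int> loc;

    void add(int pos) { loc.push_back(pos); }

    // Keep only the negative elements, then ask the vector to drop spare capacity.
    void shrink() {
        std::vector<int> tmp;
        for (int currLoc : loc)
            if (currLoc < 0) tmp.push_back(currLoc);
        loc = std::move(tmp);   // loc releases its old buffer and takes over tmp's
        loc.shrink_to_fit();    // a non-binding request
    }
};

// Usage: a local object, so there is no new and therefore no delete.
std::vector<int> keepNegatives(std::initializer_list<int> values) {
    mystruct_t c;
    for (int v : values) c.add(v);
    c.shrink();
    assert(c.loc.capacity() >= c.loc.size());  // shrink_to_fit may leave spare capacity
    return c.loc;
}   // c and its vector are destroyed here automatically
```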
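std::move itself is just a cast (to an rvalue reference), so it causes no additional memory usage.
When you move-assign a vector, it is the vector's move assignment operator (not the compiler) that takes over the source's heap buffer: the target releases its old buffer and adopts the source's pointer, and the source is typically left empty. So it's a very fast operation, essentially just reassigning a few pointers.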
There are some quite similar questions around here, but they didn't help me get my head around it.
Also, I'm giving a full example code, so it might be easier for others to understand.
I have made a vector container (couldn't use stl for memory reasons) that used to use only operator= for push_back*, and once I came across placement new, I decided to add an "emplace_back" to it**.
*(T::operator= is expected to deal with memory management)
**(the name is taken from a similar function in std::vector that I encountered later; the original name I gave it was a mess).
I read some stuff about the danger of using placement new over operator new[] but couldn't figure out if the following is ok or not, and if not, what's wrong with it, and what should I replace it with, so I'd appreciate your help.
This is of course simplified code, with no iterators and no extended functionality, but it makes the point:
template <class T>
class myVector {
public :
myVector(int capacity_) {
_capacity = capacity_;
_data = new T[_capacity];
_size = 0;
}
~myVector() {
delete[] _data;
}
bool push_back(T const & t) {
if (_size >= _capacity) { return false; }
_data[_size++] = t;
return true;
}
template <class... Args>
bool emplace_back(Args const & ... args) {
if (_size >= _capacity) { return false; }
_data[_size].~T();
new (&_data[_size++]) T(args...);
return true;
}
T * erase (T * p) {
//assert(/*p is not aligned*/);
if (p < begin() || p >= end()) { return end(); }
if (p == &back()) { --_size; return end(); }
*p = back();
--_size;
return p;
}
// The usual stuff (and more)
int capacity() { return _capacity; }
int size() { return _size; }
T * begin() { return _data; }
T * end() { return _data + _size; }
T const * begin() const { return _data; }
T const * end() const { return _data + _size; }
T & front() { return *begin(); }
T & back() { return *(end() - 1); }
T const & front() const { return *begin(); }
T const & back() const { return *(end() - 1); }
T & operator[] (int i) { return _data[i]; }
T const & operator[] (int i) const { return _data[i]; }
private:
T * _data;
int _capacity;
int _size;
};
I read some stuff about the danger of using placement new over operator new[] but couldn't figure out if the following is ok or not, and if not, what's wrong with it [...]
For operator new[] vs. placement new, it's only really bad (as in typically-crashy type of undefined behavior) if you mix the two strategies together.
The main choice you typically have to make is to use one or the other. If you use operator new[], then you construct all the elements for the entire capacity of the container in advance and overwrite them in methods like push_back. You don't destroy them on removal in methods like erase; you just keep them there, adjust the size, overwrite elements, and so forth. You allocate and construct multiple elements in one go with operator new[], and destroy and deallocate them all in one go with operator delete[].
Why Placement New is Used For Standard Containers
The first thing to understand, if you want to start rolling your own vectors or other standard-compliant sequences (that aren't simply linked structures with one element per node) in a way that actually destroys elements when they are removed and constructs elements (rather than merely overwriting them) when they are added, is that you need to separate allocating the memory for the container from constructing its elements in place. So, quite to the contrary, in this case placement new isn't bad. It's a fundamental necessity to achieve the general qualities of the standard containers. But we can't mix it with operator new[] and operator delete[] in this context.
For example, you might allocate the memory to hold 100 instances of T in reserve, but you don't want to default construct them as well. You want to construct them in methods like push_back, insert, resize, the fill ctor, range ctor, copy ctor, etc. -- methods that actually add elements and not merely the capacity to hold them. That's why we need placement new.
Otherwise we lose the generality of std::vector, which avoids constructing elements that aren't there, can copy-construct in push_back rather than overwriting existing elements with operator=, etc.
So let's start with the constructor:
_data = new T[_capacity];
... this will invoke the default constructors for all the elements. We don't want that (neither the default ctor requirement nor this expense), as the whole point of using placement new is to construct elements in-place of allocated memory, and this would have already constructed all elements. Otherwise any use of placement new anywhere will try to construct an already-constructed element a second time, and will be UB.
Instead you want something like this:
_data = static_cast<T*>(malloc(_capacity * sizeof(T)));
This just gives us a raw chunk of bytes.
Second, for push_back, you're doing:
_data[_size++] = t;
That's trying to use the assignment operator, and, after our previous modification, on an uninitialized/invalid element which hasn't been constructed yet. So we want:
new(_data + _size) T(t);
++_size;
... that makes it use the copy constructor. It makes it match up with what push_back is actually supposed to do: creating new elements in the sequence instead of simply overwriting existing ones.
Your erase method needs some work even at the basic logic level if you want to handle removals from the middle of the container. But just from the resource management standpoint, if you use placement new, you want to manually invoke destructors for removed elements. For example:
if (p == &back()) { --_size; return end(); }
... should be more like:
if (p == &back())
{
    --_size;
    (_data + _size)->~T();
    return end();
}
Your emplace_back manually invokes a destructor, but it shouldn't. emplace_back should only add elements, not remove (and destroy) existing ones. It should be quite similar to push_back, but construct the new element in place by forwarding its arguments to T's constructor.
Your destructor does this:
~myVector() {
delete[] _data;
}
But again, that's UB when we take this approach. We want something more like:
~myVector() {
for (int j=0; j < _size; ++j)
(_data + j)->~T();
free(_data);
}
There's still a whole lot more to cover like exception-safety which is a whole different can of worms.
But this should get you started with respect to proper usage of placement new in a data structure against some memory allocator (malloc/free in this exemplary case).
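Putting these fixes together, a sketch of the question's myVector on top of malloc/free might look like this. It is still simplified: copying the container is deleted, there is no exception safety beyond the allocation check, erase keeps the question's unordered fill-the-hole-with-the-last-element behaviour, and emplace_back forwards its arguments (Args&&... with std::forward) instead of taking them by const reference. demo() is a hypothetical usage example.

```cpp
#include <cassert>
#include <cstdlib>   // std::malloc, std::free
#include <new>       // placement new, std::bad_alloc
#include <string>
#include <utility>   // std::forward, std::move

template <class T>
class myVector {
public:
    explicit myVector(int capacity_)
        : _data(static_cast<T*>(std::malloc(capacity_ * sizeof(T)))),  // raw bytes, no T constructed
          _capacity(capacity_), _size(0) {
        if (!_data && capacity_ > 0) throw std::bad_alloc();
    }
    myVector(const myVector&) = delete;              // container copy/move not covered here
    myVector& operator=(const myVector&) = delete;
    ~myVector() {
        for (int j = 0; j < _size; ++j) (_data + j)->~T();   // destroy the live elements
        std::free(_data);                                     // then release the bytes
    }

    bool push_back(T const& t) {
        if (_size >= _capacity) return false;
        ::new (static_cast<void*>(_data + _size)) T(t);      // copy-construct in raw memory
        ++_size;
        return true;
    }

    template <class... Args>
    bool emplace_back(Args&&... args) {
        if (_size >= _capacity) return false;
        ::new (static_cast<void*>(_data + _size)) T(std::forward<Args>(args)...);
        ++_size;
        return true;
    }

    // Unordered erase, as in the question: the last element fills the hole,
    // and the (now surplus) last slot is destroyed.
    T* erase(T* p) {
        if (p < begin() || p >= end()) return end();
        if (p != &back()) *p = std::move(back());
        --_size;
        (_data + _size)->~T();
        return p;   // equals end() if p was the last element
    }

    int capacity() const { return _capacity; }
    int size() const { return _size; }
    T* begin() { return _data; }
    T* end() { return _data + _size; }
    T& back() { assert(_size > 0); return _data[_size - 1]; }
    T& operator[](int i) { assert(i >= 0 && i < _size); return _data[i]; }

private:
    T* _data;
    int _capacity;
    int _size;
};

// Example: three std::string slots, then erase the first one.
inline std::string demo() {
    myVector<std::string> v(3);
    v.push_back("a");
    v.emplace_back(3, 'b');   // constructs std::string(3, 'b'), i.e. "bbb", in place
    v.push_back("c");
    v.erase(v.begin());       // "c" is moved into slot 0
    return v[0] + v[1];       // "cbbb"
}
```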
Last but not least:
(couldn't use stl for memory reasons)
... this might be an unusual reason. Your implementation doesn't necessarily use any less memory than a vector with reserve called in advance to give it the appropriate capacity. You might shave off a few bytes on a per-container level (not on a per-element level) with the choice of 32-bit integers and no need to store an allocator, but it's going to be a very small memory saving in exchange for a boatload of work.
This kind of thing can be a useful learning exercise though to help you build some data structures outside the standard in a more standard-compliant way (ex: unrolled lists which I find quite useful).
I ended up having to reinvent some vectors and vector-like containers for ABI reasons (we wanted a container we could pass through our API that was guaranteed to have the same ABI regardless of what compiler was used to build a plugin). Even then, I would have much preferred simply using std::vector.
Note that if you just want to take control of how vector allocates memory, you can do that by specifying your own allocator with a compliant interface. This might be useful, for example, if you want a vector which allocates 128-bit aligned memory for use with aligned move instructions using SIMD.
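A minimal sketch of such an allocator, assuming C++17's aligned operator new/delete (std::align_val_t). AlignedAllocator, AlignedFloats and checkAlignment are hypothetical names. The explicit rebind is there because the non-type Align parameter stops std::allocator_traits from deriving it automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>      // aligned operator new/delete, std::align_val_t (C++17)
#include <vector>

template <class T, std::size_t Align>
struct AlignedAllocator {
    static_assert(Align >= alignof(T) && (Align & (Align - 1)) == 0,
                  "Align must be a power of two and at least alignof(T)");
    using value_type = T;

    // Needed explicitly: the non-type parameter Align defeats the default rebind.
    template <class U>
    struct rebind { using other = AlignedAllocator<U, Align>; };

    AlignedAllocator() noexcept = default;
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Align>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_array_new_length();
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Align}));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{Align});
    }
};

template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

// 128-bit (16-byte) aligned storage, e.g. for aligned SIMD loads/stores of floats.
using AlignedFloats = std::vector<float, AlignedAllocator<float, 16>>;

// Example check: the buffer address is a multiple of 16.
inline void checkAlignment(const AlignedFloats& v) {
    assert(reinterpret_cast<std::uintptr_t>(v.data()) % 16 == 0);
}
```

Everything else about the vector (growth, element lifetimes) stays standard; only where the bytes come from changes.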
So I'm trying to implement a pop_back() function for my Vector class, but I'm not getting the expected results:
Here's my current function:
template <typename T>
void Vector<T>::pop_back() {
if(vsize > 0){
array[vsize].~T();
--vsize;
}
}
Why doesn't this delete the last element in the array?
Here's my .h:
template <typename T>
class Vector {
public:
Vector();
~Vector();
void push_back(const T &e);
int size() const;
void pop_back();
void allocate_new();
T operator[](int index);
private:
Vector(const Vector<T> & v);
Vector<T> & operator=(const Vector<T> &);
int vsize;
int capacity;
T* array;
};
Calling the destructor of an object in an array won't do anything to the array other than putting that element of the array into a funny state. In particular, if you do something like this:
T* array = new T[2];
array[1].~T();
delete[] array; // ERROR: double destruction
you can't really get rid of the array at all without first restoring the destroyed array element.
The way a "real" implementation deals with the situation is to allocate raw memory, e.g., using void* memory = operator new[](size); (with a suitable size) or, if it is a standard C++ library container, using the appropriate allocator functions. The container then constructs and destroys objects in the memory as needed. The actual representation in the destroyed memory may still not really change, as most destructors just get rid of any resources held and leave the bits in the object otherwise unchanged, giving the appearance that there is a live object in the memory although there isn't.
The last element of the array is array[vsize - 1], not array[vsize].
You shouldn't call the destructor of an object unless it was created using a placement-new expression. What you should do instead is reduce the size and leave the actual object untouched, unless it has dynamic storage duration, in which case call delete/delete[].
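Following the raw-memory approach described above, a sketch of the question's Vector<T> might look like this. It assumes std::allocator<T> used through std::allocator_traits rather than operator new[] directly, keeps the question's member names, returns const T& from operator[] instead of T, and shows only the members needed for pop_back. Note that pop_back decrements vsize first, so it destroys the real last element.

```cpp
#include <cassert>
#include <memory>
#include <utility>

template <typename T>
class Vector {
public:
    Vector() = default;
    Vector(const Vector&) = delete;              // kept non-copyable, as in the question
    Vector& operator=(const Vector&) = delete;
    ~Vector() {
        while (vsize > 0) pop_back();            // destroy every live element
        if (array) Traits::deallocate(alloc, array, capacity);
    }

    void push_back(const T& e) {
        // (Sketch limitation: e must not refer to an element of this Vector.)
        if (vsize == capacity) allocate_new();
        Traits::construct(alloc, array + vsize, e);   // copy-construct in raw memory
        ++vsize;
    }

    void pop_back() {
        assert(vsize > 0);
        --vsize;                                  // the old last element is now array[vsize]
        Traits::destroy(alloc, array + vsize);    // end its lifetime; memory is kept
    }

    int size() const { return vsize; }
    const T& operator[](int index) const { return array[index]; }

private:
    using Traits = std::allocator_traits<std::allocator<T>>;

    // Grow to twice the capacity (at least 4), moving the live elements over.
    void allocate_new() {
        int new_cap = capacity == 0 ? 4 : capacity * 2;
        T* fresh = Traits::allocate(alloc, new_cap);
        for (int i = 0; i < vsize; ++i) {
            Traits::construct(alloc, fresh + i, std::move(array[i]));
            Traits::destroy(alloc, array + i);
        }
        if (array) Traits::deallocate(alloc, array, capacity);
        array = fresh;
        capacity = new_cap;
    }

    std::allocator<T> alloc;
    int vsize = 0;
    int capacity = 0;
    T* array = nullptr;
};
```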
What exactly is the purpose of this "move" semantic? I understand that if you don't pass by reference, a copy is made of non-primitive types, but how does "move" change anything? Why would we want to "move" the data? Why can't it just be kept at the same address and not copied? If it is sent to another address, isn't this just a "copy and delete"?
In short, I don't really get what move semantics is achieving exactly.
Move semantics combines the advantages of passing by value and passing by reference. You create objects with automatic storage duration (for example local variables), so you don't have to take responsibility for their lifetime, and you can pass them as parameters and return them from functions easily. On the other hand, in cases where objects would ordinarily have been copied, they are moved instead: only their internals (such as a pointer to a heap buffer) are transferred. This can be implemented much more cheaply than copying, because you know that the rhs object won't be used anymore.
MyObj* f1()
{
    // OK, but the caller has to take care of
    // freeing the result
    return new MyObj();
}
MyObj f2()
{
    // If MyObj has no move constructor and the compiler
    // does not elide the copy, this may be costly
    MyObj result;
    return result;
}
MyObj f3()
{
    // This is both fast and safe if MyObj has a move constructor
    MyObj result;
    return std::move(result);
    // Note: a local returned by name is already treated as an rvalue,
    // so std::move is unnecessary here, and it prevents the compiler
    // from eliding the move altogether (NRVO); plain return result; is better.
}
Why can't it just be kept at the same address and not copied
This is actually what move semantics generally does. It often keeps the resource (often memory, but could be file handles, etc.) in the exact same state, but it updates the references in the objects.
Imagine two vectors, src and dest. The src vector contains a large block of data which is allocated on the heap, and dest is empty. When src is moved to dest all that happens is that dest is updated to point to the block of memory on the heap, whilst src is updated to point to whatever dest was pointing to, in this case, nothing.
Why is this useful? Because it means that vector can be written with the confidence that only one vector will ever point to the block of memory it allocates. This means that the destructor can ensure that it cleans up the memory that has been allocated.
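This is easy to observe with std::vector itself. Because the move constructor has to run in constant time, it cannot copy the elements; in this sketch (moveKeepsBuffer is a hypothetical helper) the destination ends up owning the very same heap block. Since dest is newly constructed here, there is nothing of its own to hand back to src.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Moves a large vector and reports whether dest took over src's heap block.
bool moveKeepsBuffer() {
    std::vector<int> src(1000000, 7);       // large block on the heap
    const int* block = src.data();
    std::vector<int> dest(std::move(src));  // no element is copied or moved individually
    assert(dest.size() == 1000000 && dest[0] == 7);
    return dest.data() == block;            // same block, now owned by dest
}
```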
This can be extended to objects that manage other resources, such as file handles. It is now possible to write objects that own a file handle. These objects can be movable but not copyable. Because STL containers support movable objects, these can be put into containers far more easily than they could in C++03. The file handle, or other resource, is guaranteed to have only one reference to it, and the destructor can close it appropriately.
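A sketch of such a movable but non-copyable owner. To keep it self-contained, it wraps a hypothetical integer handle managed by hypothetical open_resource/close_resource functions that stand in for something like std::fopen/std::fclose; it assumes C++17 for the inline variable.

```cpp
#include <cassert>
#include <utility>   // std::exchange, std::move
#include <vector>

// Hypothetical resource API standing in for e.g. fopen/fclose; counts open handles.
inline int open_count = 0;
inline int open_resource() { static int next = 1; ++open_count; return next++; }
inline void close_resource(int h) { assert(h != 0 && open_count > 0); --open_count; }

// Move-only owner: exactly one Handle refers to a given resource at any time.
class Handle {
public:
    Handle() : h_(open_resource()) {}
    ~Handle() { if (h_ != 0) close_resource(h_); }

    Handle(const Handle&) = delete;              // not copyable
    Handle& operator=(const Handle&) = delete;

    Handle(Handle&& other) noexcept : h_(std::exchange(other.h_, 0)) {}  // take ownership
    Handle& operator=(Handle&& other) noexcept {
        if (this != &other) {
            if (h_ != 0) close_resource(h_);     // release the resource we owned
            h_ = std::exchange(other.h_, 0);
        }
        return *this;
    }

    int get() const { return h_; }   // 0 means "owns nothing" (e.g. moved-from)

private:
    int h_;
};
```

Because the move operations are noexcept, std::vector<Handle> moves existing handles when it reallocates, and each resource is closed exactly once.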
I'd answer with a simple example for vector algebra:
#include <algorithm>
#include <cstddef>
#include <stdexcept>

class Vector {
    std::size_t dim_;
    double *data_;
public:
    Vector(const Vector &arg)
        : dim_(arg.dim_)
        , data_(new double[dim_])
    {
        std::copy_n(arg.data_, dim_, data_);   // deep copy
    }
    Vector(Vector &&arg) noexcept
        : dim_(arg.dim_)
        , data_(arg.data_)                     // steal the buffer
    {
        arg.data_ = nullptr;                   // arg's destructor now frees nothing
        arg.dim_ = 0;
    }
    ~Vector()
    {
        delete[] data_;
    }
    Vector& operator+= (const Vector &arg)
    {
        if (arg.dim_ != dim_) throw std::invalid_argument("dimension mismatch");
        for (std::size_t idx = 0; idx < dim_; ++idx) data_[idx] += arg.data_[idx];
        return *this;
    }
};
Vector operator+ (Vector a, const Vector &b)
{
a += b;
return a;
}
extern Vector v1, v2;
int main()
{
Vector v(v1 + v2);
}
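The addition returns a new vector by value. Inside operator+, returning the parameter a moves it into the result (a parameter cannot be elided, but it is returned as an rvalue), and because v1 + v2 is a prvalue it initializes v directly (guaranteed since C++17; before that the move was typically elided). So no extra copy of the potentially huge array data_ is made beyond the one needed to pass v1 by value into a.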
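To check this, here is a trimmed, self-contained variant of the example. The sizing constructor, operator[], and the copies/moves counters are additions for illustration only. Computing v1 + v2 should perform exactly one copy (v1 into the parameter a), with everything after that being a move or elided.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>

class Vector {
    std::size_t dim_;
    double* data_;
public:
    static inline int copies = 0;   // instrumentation added for this sketch
    static inline int moves = 0;

    explicit Vector(std::size_t dim, double value = 0.0)   // added: sizing constructor
        : dim_(dim), data_(new double[dim]) { std::fill_n(data_, dim_, value); }
    Vector(const Vector& arg) : dim_(arg.dim_), data_(new double[arg.dim_]) {
        std::copy_n(arg.data_, dim_, data_);
        ++copies;
    }
    Vector(Vector&& arg) noexcept : dim_(arg.dim_), data_(arg.data_) {
        arg.data_ = nullptr;
        arg.dim_ = 0;
        ++moves;
    }
    Vector& operator=(const Vector&) = delete;   // not needed here
    ~Vector() { delete[] data_; }

    Vector& operator+=(const Vector& arg) {
        if (arg.dim_ != dim_) throw std::invalid_argument("dimension mismatch");
        for (std::size_t idx = 0; idx < dim_; ++idx) data_[idx] += arg.data_[idx];
        return *this;
    }
    double operator[](std::size_t i) const { assert(i < dim_); return data_[i]; }
};

Vector operator+(Vector a, const Vector& b) {   // a: copy of the left operand
    a += b;
    return a;                                   // a parameter: moved into the result
}
```

Under C++17 the move count for Vector v(v1 + v2) is exactly one (the return of a); the copy count is one in any standard version.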