So I'm trying to implement a pop_back() function for my Vector class, but I'm not getting the expected results:
Here's my current function:
template <typename T>
void Vector<T>::pop_back() {
if(vsize > 0){
array[vsize].~T();
--vsize;
}
}
Why doesn't this delete the last element in the array?
Here's my .h:
template <typename T>
class Vector {
public:
Vector();
~Vector();
void push_back(const T &e);
int size() const;
void pop_back();
void allocate_new();
T operator[](int index);
private:
Vector(const Vector<T> & v);
Vector<T> & operator=(const Vector<T> &);
int vsize;
int capacity;
T* array;
};
Calling the destructor of an object in an array won't do anything to the array other than putting that element into a funny state. In particular, if you do something like this:
T* array = new T[2];
array[1].~T();
delete[] array; // ERROR: double destruction
you can't really get rid of the array at all without first restoring the destroyed array element.
The way a "real" implementation deals with the situation is to allocate raw memory, e.g., using void* memory = operator new[](size); (with a suitable size) or, if it is a standard C++ library container, using the appropriate allocator functions. The container then constructs and destroys objects in the memory as needed. The actual representation in the memory of a destroyed object may still not really change, as most destructors just release any resources held and leave the bits in the object otherwise unchanged, giving the appearance that there is a live object in the memory although there is not.
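To make the raw-memory approach concrete, here is a minimal sketch, not the standard library's implementation. RawVector and its members are hypothetical names. It assumes a fixed capacity, no copying, and element types without extended alignment. Memory comes from operator new, objects are created with placement new, and pop_back() destroys the last live element, which sits at index size - 1.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical fixed-capacity container: raw memory plus a count of live objects.
template <typename T>
class RawVector {
public:
    explicit RawVector(std::size_t capacity)
        : data_(static_cast<T*>(::operator new(capacity * sizeof(T)))),
          capacity_(capacity), size_(0) {}

    ~RawVector() {
        while (size_ > 0) pop_back();  // destroy only the live objects
        ::operator delete(data_);      // then release the raw memory
    }

    RawVector(const RawVector&) = delete;
    RawVector& operator=(const RawVector&) = delete;

    void push_back(const T& value) {
        assert(size_ < capacity_);
        ::new (static_cast<void*>(data_ + size_)) T(value);  // construct in place
        ++size_;
    }

    void pop_back() {
        assert(size_ > 0);
        --size_;
        data_[size_].~T();  // the last live element was at index size - 1
    }

    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) { return data_[i]; }

private:
    T* data_;
    std::size_t capacity_;
    std::size_t size_;
};
```

Because the memory was never created with new T[n], there is no delete[] that would run the destructors a second time.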
The last element of the array would be in array[vsize - 1].
You shouldn't call the destructor of an object explicitly unless it was created using a placement-new expression. What you should do instead is reduce the size and leave the actual object untouched, unless the object has dynamic storage duration, in which case release it with delete/delete[].
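For storage created with new T[n], that advice could look like the sketch below. ArrayVector is a hypothetical name, and the sketch assumes T is default-constructible and the capacity is fixed. pop_back() only decrements the size, and delete[] in the destructor runs each element's destructor exactly once.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container backed by new T[n]: every slot always holds a live T.
template <typename T>
class ArrayVector {
public:
    explicit ArrayVector(std::size_t capacity)
        : array_(new T[capacity]), capacity_(capacity), size_(0) {}

    ~ArrayVector() { delete[] array_; }  // destroys all capacity_ elements once

    ArrayVector(const ArrayVector&) = delete;
    ArrayVector& operator=(const ArrayVector&) = delete;

    void push_back(const T& value) {
        assert(size_ < capacity_);
        array_[size_++] = value;  // assignment is fine: the slot already holds a T
    }

    void pop_back() {
        if (size_ > 0) --size_;   // no explicit destructor call
    }

    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) { return array_[i]; }

private:
    T* array_;
    std::size_t capacity_;
    std::size_t size_;
};
```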
I wrote a template class with one int member and a generic-typed pointer member. The pointer member will be used as an array, and it will store sideCount amount of elements. When instantiating an object, it will take an int value for sideCount and an array of any type that is the size of sideCount and is already initialized with values.
I wrote 2 versions of this class. For the first one, I didn't specifically allocate any memory for my pointer member:
template <class T>
class die_v1
{
private:
int sideCount;
T* valuesOfSides;
public:
die_v1() : sideCount(0), valuesOfSides(nullptr) {}
die_v1(int sc, T* vos) : sideCount(sc), valuesOfSides(vos) {}
};
For the second one, I dynamically allocated memory for valuesOfSides using new[]:
template <class T>
class die_v2
{
private:
int sideCount;
T* valuesOfSides;
public:
die_v2(int sc, T* vos) : sideCount(sc)
{
valuesOfSides = new T[sideCount];
for (int i = 0; i < sideCount; ++i)
valuesOfSides[i] = vos[i];
}
~die_v2() { delete[] valuesOfSides; }
};
My question is, do I need to specifically allocate memory for valuesOfSides as I did in the second version?
The reason I am asking is that, for the second version, I know exactly what is going on: I'm allocating memory for valuesOfSides with the size of sideCount and filling it with the data from the parameter array. For the first version, I just know it works when I initialize it in the member initializer list. Does it have a size? If so, is it the size of the array that I passed in from main()? It looks cleaner and more efficient, since I don't have to allocate any memory and I don't need a destructor.
Does the die_v2 provide any code stability compared to die_v1? Or is it the same thing with more steps?
I wonder if the range constructor of std::vector does copy the data, or does it just reference it?
Have a look at this example:
vector<int> getVector() {
int arr[10];
for(int i=0; i<10; ++i) arr[i] = i;
return vector<int>(arr, arr+10);
}
Would this cause a bug (due to handing out a reference to the stack which is destroyed later) or is it fine, since it copies the data in the constructor?
Edit #1
For clarification: I'm looking for a more or less official resource that points out which of the following pseudocode implementations of the constructor is valid. I know the signature of the real constructor is different, but you should get the idea.
Version A (just uses the given data internally)
template<typename T>
class vector {
private:
T* data;
int size;
public:
vector<T>(T* start, T* end) {
data = start;
size = (end - start);
}
};
Version B (explicitly copies the data)
template<typename T>
class vector {
private:
T* data;
int size;
public:
vector<T>(T* start, T* end) {
for(T* it = start; it < end; ++it) push_back(*it);
}
};
When in doubt, check the reference. The answer can be derived from the Complexity section, although I agree there is no explicit confirmation:
Complexity: Makes only N calls to the copy constructor of T (where N
is the distance between first and last) and no reallocations if
iterators first and last are of forward, bidirectional, or random
access categories. It makes order N calls to the copy constructor of T
and order logN reallocations if they are just input iterators.
Like all constructors of std::vector<int>, this copies the integers. The same holds for methods like push_back and insert.
This is why std::vector actually has two template arguments. The second one defaults to std::allocator; it's the allocator used to allocate memory for the 10 integers (and perhaps a few more so that the vector can grow, see capacity).
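A quick way to convince yourself of the copy is the sketch below. range_constructor_copies is a hypothetical helper, not part of any library. It changes the source array after construction and checks that the vector is unaffected and lives at a different address.

```cpp
#include <cassert>
#include <vector>

// Same shape as getVector() in the question: the local array dies on return,
// while the returned vector keeps its own copies of the values.
inline std::vector<int> getVector() {
    int arr[10];
    for (int i = 0; i < 10; ++i) arr[i] = i;
    return std::vector<int>(arr, arr + 10);
}

// Hypothetical check: true if the vector holds copies rather than the array itself.
inline bool range_constructor_copies() {
    int arr[3] = {1, 2, 3};
    std::vector<int> v(arr, arr + 3);
    assert(v.size() == 3);
    arr[0] = 42;                          // change the source afterwards
    return v[0] == 1 && v.data() != arr;  // the vector is unaffected
}
```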
[Edit]
The actual code is most like Version B, but probably similar to
template<typename T>
class vector {
private:
T* _Data = nullptr;
size_t _Capacity = 0;
size_t _Used = 0;
public:
vector<T>(T* start, T* end) {
_Used = (end - start);
reserve(_Used); // Sets _Data, _Capacity
std::uninitialized_copy(start, end, _Data);
}
};
The C++ standard library is specified in a somewhat strange way.
It is specified saying what each method requires and what each method guarantees. It is not specified as in "vector is a container of values that it owns", even though that is the real underlying abstraction here.
Formally, what you are doing is safe not because "the vector copies", but because none of the preconditions of any of the methods of std::vector are violated in the copy of the std::vector your function returns.
Similarly, the values are set to be certain ones because of the postconditions of the constructor, and then the pre and post conditions of the copy constructor and/or C++17 prvalue "elision" rules.
But trying to reason about C++ code in this way is madness.
Semantically, a std::vector is a regular type with value semantics that owns its own elements. Regular types can be copied, and the copies behave sanely even if the original object is destroyed.
Unless you make a std::vector<std::reference_wrapper<int>> you are safe, and you are unsafe for the reference wrapper because you stored elements which are not regular value types.
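The difference between the two element types can be seen in a small sketch. Seen and modify_source_then_read are hypothetical names. Both vectors are built from the same array and the array is modified afterwards. Everything stays inside one scope here, so nothing dangles; returning the reference_wrapper vector from the function would leave it referring to a dead array.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical result type: what each vector reports after the source changed.
struct Seen {
    int by_value;
    int by_reference;
};

inline Seen modify_source_then_read() {
    int source[2] = {1, 2};
    std::vector<int> values(source, source + 2);                        // owns copies
    std::vector<std::reference_wrapper<int>> refs(source, source + 2);  // refers to source
    assert(values.size() == refs.size());
    source[0] = 99;  // modify the original array
    return Seen{values[0], refs[0].get()};
}
```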
A vector cannot be declared with a reference element type, as in std::vector<int &>. So the code is valid. The vector does not contain references to elements of the array. It creates new elements of type int (the template argument of the vector), not references.
My teammates are writing a fixed-size implementation of std::vector for a safety-critical application. We're not allowed to use heap allocation, so they created a simple array wrapper like this:
template <typename T, size_t NUM_ITEMS>
class Vector
{
public:
void push_back(const T& val);
...more vector methods
private:
// Internal storage
T storage_[NUM_ITEMS];
...implementation
};
A problem we encountered with this implementation is that it requires element types to have default constructors (which is not a requirement of std::vector, and which created porting difficulties). I decided to hack on their implementation to make it behave more like std::vector and came up with this:
template <typename T, size_t NUM_ITEMS>
class Vector
{
public:
void push_back(const T& val);
...more vector methods
private:
// Internal storage
typedef T StorageType[NUM_ITEMS];
alignas(T) char storage_[NUM_ITEMS * sizeof(T)];
// Get correctly typed array reference
StorageType& get_storage() { return reinterpret_cast<T(&)[NUM_ITEMS]>(storage_); }
const StorageType& get_storage() const { return reinterpret_cast<const T(&)[NUM_ITEMS]>(storage_); }
};
I was then able to just search and replace storage_ with get_storage() and everything worked. An example implementation of push_back might then look like:
template <typename T, size_t NUM_ITEMS>
void Vector<T, NUM_ITEMS>::push_back(const T& val)
{
get_storage()[size_++] = val;
}
In fact, it worked so easily that it got me thinking.. Is this a good/safe use of reinterpret_cast? Is the code directly above a suitable alternative to placement new, or are there risks associated with copy/move assignment to an uninitialized object?
EDIT: In response to a comment by NathanOliver, I should add that we cannot use the STL, because we cannot compile it for our target environment, nor can we certify it.
The code you've shown is only safe for POD types (Plain Old Data), where the object's representation is trivial and thus assignment to an unconstructed object is ok.
If you want this to work in full generality (which I assume you do, since you're using a template), then for a type T it is undefined behavior to use the object prior to constructing it. That is, you must construct the object before, e.g., assigning to that location. That means you need to call the constructor explicitly on demand. The following code block demonstrates an example of this:
template <typename T, size_t NUM_ITEMS>
void Vector<T, NUM_ITEMS>::push_back(const T& val)
{
// potentially an overflow test here
// explicitly call copy constructor to create the new object in the buffer
new (reinterpret_cast<T*>(storage_) + size_) T(val);
// in case that throws, only inc the size after that succeeds
++size_;
}
The above example demonstrates placement new, which takes the form new (void*) T(args...). It calls the constructor but does not actually perform an allocation. The visual difference is the inclusion of the void* argument to operator new itself, which is the address of the object to act on and call the constructor for.
And of course when you remove an element you'll need to destroy that explicitly as well. To do this for a type T, simply call the pseudo-method ~T() on the object. Under templated context the compiler will work out what this means, either an actual destructor call, or no-op for e.g. int or double. This is demonstrated below:
template<typename T, size_t NUM_ITEMS>
void Vector<T, NUM_ITEMS>::pop_back()
{
if (size_ > 0) // safety test, you might rather this throw, idk
{
// explicitly destroy the last item and dec count
// canonically, destructors should never throw (very bad)
reinterpret_cast<T*>(storage_)[--size_].~T();
}
}
Also, I would avoid returning a reference to an array in your get_storage() method, as it has length information and would seem to imply that all elements are valid (constructed) objects, which of course they're not. I suggest you provide methods for getting a pointer to the start of the contiguous array of constructed objects, and another method for getting the number of constructed objects. These are the .data() and .size() methods of e.g. std::vector<T>, which would make use of your class less jarring to seasoned C++ users.
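Putting the pieces of this answer together, a heap-free container might look roughly like the sketch below. StaticVector, the bool return from push_back, and the private slot() helper are my own hypothetical choices, not something from the question. It follows the reinterpret_cast approach used above; strictly conforming C++17 code would additionally pass the pointer through std::launder or keep the pointer returned by placement new.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical fixed-capacity vector: no heap, no default-constructor requirement.
template <typename T, std::size_t NUM_ITEMS>
class StaticVector {
    static_assert(NUM_ITEMS > 0, "capacity must be non-zero");

public:
    StaticVector() : size_(0) {}
    ~StaticVector() { while (size_ > 0) pop_back(); }  // destroy live elements only

    StaticVector(const StaticVector&) = delete;
    StaticVector& operator=(const StaticVector&) = delete;

    // Returns false instead of overflowing when the buffer is full.
    bool push_back(const T& val) {
        if (size_ == NUM_ITEMS) return false;
        ::new (static_cast<void*>(slot(size_))) T(val);  // copy-construct in place
        ++size_;  // only after the constructor succeeded
        return true;
    }

    void pop_back() {
        if (size_ > 0) slot(--size_)->~T();  // explicitly destroy the last element
    }

    T* data() { return slot(0); }              // start of the constructed range
    const T* data() const { return slot(0); }
    std::size_t size() const { return size_; } // number of constructed objects

private:
    T* slot(std::size_t i) { return reinterpret_cast<T*>(storage_) + i; }
    const T* slot(std::size_t i) const {
        return reinterpret_cast<const T*>(storage_) + i;
    }

    alignas(T) unsigned char storage_[NUM_ITEMS * sizeof(T)];
    std::size_t size_;
};
```

Callers only ever see data() and size(), so nothing in the interface suggests that all NUM_ITEMS slots hold constructed objects.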
Is this a good/safe use of reinterpret_cast?
Is the code directly above a suitable alternative to placement new
No. No.
or are there risks associated with copy/move assignment to an uninitialized object?
Yes. The behaviour is undefined.
1. Assuming memory is uninitialised, copying the vector has undefined behaviour.
2. No object of type T has started its lifetime at the memory location. This is super bad when T is not trivial.
3. The reinterpretation violates the strict aliasing rules.
First is fixed by value-initialising the storage. Or by making the vector non-copyable and non-movable.
Second is fixed by using placement new.
Third is technically fixed by using the pointer returned by placement new, but you can avoid storing that pointer by std::laundering after reinterpreting the storage.
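A minimal sketch of the two ways to reach the object, assuming C++17 for std::launder. Slot and its member names are hypothetical. create() hands back the pointer that placement new returns, while get() recomputes the pointer from the storage and launders it, so no pointer has to be stored.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical single-element storage illustrating both access paths.
template <typename T>
struct Slot {
    alignas(T) unsigned char bytes[sizeof(T)];

    // Starts the lifetime of a T in the buffer; the returned pointer is valid to use.
    T* create(const T& value) {
        return ::new (static_cast<void*>(bytes)) T(value);
    }

    // Recovers a usable pointer from the storage without having stored one.
    // Precondition: create() has been called and destroy() has not.
    T* get() {
        return std::launder(reinterpret_cast<T*>(bytes));
    }

    // Ends the lifetime of the T; the bytes are plain storage again.
    void destroy() {
        get()->~T();
    }
};
```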
Consider this implementation of std::vector::reserve() from the book "The C++ Programming Language", 4th ed., by Bjarne Stroustrup:
template<class T, class A>
void vector<T,A>::reserve(size_type newalloc)
{
if (newalloc<=capacity()) return;
vector_base<T,A> b {vb.alloc,newalloc}; // get new storage
// (see PS of question for details on vb data member)
T* src = elem; // ptr to the start of old storage
T* dest = b.elem; // ptr to the start of new storage
T* end = elem+size(); // past-the-end ptr to old storage
for (; src!=end; ++src, ++dest) {
new(static_cast<void*>(dest)) T{move(*src)}; // move construct
src->~T(); // destroy
}
swap(vb,b); // install new base (see PS if needed)
} // implicitly release old space(when b goes out of scope)
Note that in the loop, for each element in the vector, at least one call is made to a ctor and a dtor (possibly triggering more such calls if the element's class has bases, or if the class or its bases have data members with ctors). (In the book, the for-loop is actually a separate function, but I inlined it into reserve() here for simplicity.)
Now consider my suggested alternative:
template<class T, class A>
void vector<T,A>::reserve(size_type newalloc)
{
if (newalloc<=capacity()) return;
vector_base<T,A> b {vb.alloc,newalloc}; // get new space
memcpy(b.elem, elem, sz); // copy raw memory
// (no calls to ctors or dtors)
swap(vb,b); // install new base
} // implicitly release old space(when b goes out of scope)
To me, it seems like the end result is the same, minus the calls to ctors/dtors.
Is there a situation where this alternative would fail, and if so, where is the flaw?
P.S. I don't think it is much relevant, but here are the data members of vector and vector_base classes:
// used as a data member in std::vector
template<class T, class A = allocator<T> >
struct vector_base { // memory structure for vector
A alloc; // allocator
T* elem; // start of allocation
T* space; // end of element sequence, start of space allocated for possible expansion
T* last; // end of allocated space
vector_base(const A& a, typename A::size_type n)
: alloc{a}, elem{alloc.allocate(n)}, space{elem+n}, last{elem+n} { }
~vector_base() { alloc.deallocate(elem,last-elem); } // releases storage only, no calls
// to dtors: vector's responsibility
//...
};
// std::vector
template<class T, class A = allocator<T> >
class vector {
vector_base<T,A> vb; // the data is here
void destroy_elements();
public:
//...
};
This might fail:
memcpy() will work only if you have a vector of POD.
It will fail for all other kind of objects as it doesn't respect the semantic of the objects it copies (copy construction).
Example of issues:
If the constructor of the object sets some internal pointers to internal members, your memcpy() will copy the value of the original pointer, which will not be updated correctly and continue to point to a memory region that will be released.
If the object contains a shared_ptr, the object count will become inconsistent (memcpy() will duplicate the pointer without incrementing its reference count, then swap() will make sure that the original shared pointer will be in b, which will be released, so that the shared pointer reference count will be decremented).
As pointed out by T.C in the comments, as soon as your vector stores non-POD data the memcpy() results in UB (undefined behaviour).
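One way to get the memcpy() speed-up only where it is safe is to dispatch on std::is_trivially_copyable. The sketch below is an assumption-laden illustration, not how any particular standard library does it: relocate and the detail namespace are hypothetical names, and exception safety is ignored. Note that memcpy() takes a byte count, so the element count has to be multiplied by sizeof(T).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

namespace detail {

// Trivially copyable types: copying the bytes is a valid way to copy the objects.
template <typename T>
void relocate(T* src, std::size_t n, T* dest, std::true_type) {
    if (n != 0) std::memcpy(static_cast<void*>(dest), src, n * sizeof(T));
}

// Everything else: move-construct into the new storage, then destroy the source.
template <typename T>
void relocate(T* src, std::size_t n, T* dest, std::false_type) {
    for (std::size_t i = 0; i < n; ++i) {
        ::new (static_cast<void*>(dest + i)) T(std::move(src[i]));
        src[i].~T();
    }
}

}  // namespace detail

// Hypothetical helper: moves n objects from src into uninitialized memory at dest.
template <typename T>
void relocate(T* src, std::size_t n, T* dest) {
    detail::relocate(src, n, dest, std::is_trivially_copyable<T>{});
}
```

For std::string or anything holding a shared_ptr, the second overload is chosen, so constructors and destructors still run and the problems listed above do not arise.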
As an exercise, I'm trying to write a class like a std::vector without using a template. The only type it holds is std::string.
Below is the strvec.h file:
class StrVec
{
public:
//! Big 3
StrVec():
element(nullptr), first_free(nullptr), cap(nullptr)
{}
StrVec(const StrVec& s);
StrVec&
operator =(const StrVec& rhs);
~StrVec();
//! public members
void push_back(const std::string &s);
std::size_t size() const { return first_free - element; }
std::size_t capacity() const { return cap - element; }
std::string* begin() const { return element; }
std::string* end() const { return first_free; }
void reserve(std::size_t n);
void resize(std::size_t n);
//^^^^^^^^^^^^^^^^^^^^^^^^^^^
private:
//! data members
std::string* element; // pointer to the first element
std::string* first_free; // pointer to the first free element
std::string* cap; // pointer to one past the end
std::allocator<std::string> alloc;
//! utilities
void reallocate();
void chk_n_alloc() { if (size() == capacity()) reallocate(); }
void free();
void wy_alloc_n_move(std::size_t n);
std::pair<std::string*, std::string*>
alloc_n_copy (std::string* b, std::string* e);
};
The three string*, element, first_free, and cap can be thought of as:
[0][1][2][3][unconstructed elements]
^           ^                       ^
element     first_free              cap
When implementing the member resize(size_t n), I have a problem. Say v.resize(3) is called. As a result, the pointer first_free must be moved back one place to point to [3]. Something like:
[0][1][2][3][unconstructed elements]
^        ^                          ^
element  first_free                 cap
My question is how should I deal with [3]? Leave it there untouched? Or destroy it like:
if(n < size())
{
for(auto p = element + n; p != first_free; /* empty */)
alloc.destroy(p++);
first_free = element + n;
}
Is the code alloc.destroy( somePointer) necessary here?
Yes, definitely, you should call destroy on elements that are removed from the vector when resize() is called with an argument smaller than the current size of the vector. That's what std::vector does, too.
Note that destroy only calls the destructor on those elements; it does not deallocate any space (which would be wrong).
Since you are dealing with std::string, you probably think you could do without destruction if you are sure that you re-initialize the same std::string object later with a new value. But firstly, you can't be sure that a new string will be stored in the same place later, and secondly, for the new string a new object would be created (using placement-new, not copy-assignment), leaking the memory of the previous string (whose destructor would never have been called).
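The shrinking case described in this answer can be sketched as follows. MiniStrVec and shrink_to are hypothetical, cut-down stand-ins for StrVec and the n < size() branch of resize(), with a fixed capacity. The sketch goes through std::allocator_traits because the member function std::allocator::destroy was deprecated in C++17 and removed in C++20.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical cut-down StrVec: just enough to show destroy without deallocate.
class MiniStrVec {
    using Alloc = std::allocator<std::string>;
    using Traits = std::allocator_traits<Alloc>;

public:
    explicit MiniStrVec(std::size_t n)
        : element(alloc.allocate(n)), first_free(element), cap(element + n) {}

    ~MiniStrVec() {
        shrink_to(0);                           // destroy remaining strings
        alloc.deallocate(element, capacity());  // only now release the memory
    }

    MiniStrVec(const MiniStrVec&) = delete;
    MiniStrVec& operator=(const MiniStrVec&) = delete;

    void push_back(const std::string& s) {
        assert(first_free != cap);
        Traits::construct(alloc, first_free, s);
        ++first_free;
    }

    // Shrinking branch of resize(): run destructors on the tail, keep the memory.
    void shrink_to(std::size_t n) {
        while (size() > n)
            Traits::destroy(alloc, --first_free);
    }

    std::size_t size() const { return first_free - element; }
    std::size_t capacity() const { return cap - element; }

private:
    Alloc alloc;              // declared first: used to initialize element
    std::string* element;     // pointer to the first element
    std::string* first_free;  // pointer to the first free element
    std::string* cap;         // pointer to one past the end
};
```

After shrink_to(3) on a vector of four strings, size() drops to 3 while capacity() is unchanged, which matches the point that destroy does not deallocate.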
What you should do depends on how you've initialised element, as you need your code to be consistent.
If you use new std::string[n] to create the array of strings, then they will all be pre-initialised, and when you later use delete[] to deallocate them (as you must), their destructors will all be run. For that reason, you must not call the destructors manually in the intervening time unless you are certain you'll placement-new a valid object there again.
If you use something like reinterpret_cast<std::string*>(new char[sizeof(std::string) * n]) to create a buffer of uninitialised memory, then you must take full responsibility for calling the constructor and destructor of every element at the appropriate times.
With the first option, you wouldn't need to do anything for resize(3), but could call .clear() on the string to potentially free up some memory if you wanted.
With the second option, you must trigger the destructor for [3] (unless you're keeping some other record of which elements eventually need destruction, which seems a clumsy model).
The issues are identical to just having memory for a single string that is "in use" at different times during the program. Do you spend the time to construct it before first use and then assign to it, or do you leave it uninitialised and then copy-construct it with placement new? Do you clear it when unused or destroy it? Either model can work with careful implementation. The first approach tends to be easier to implement correctly; the second is slightly more efficient when the array capacity is much greater than the number of elements that end up being used.