How is vector implemented in C++ - c++

I am thinking of how I can implement std::vector from the ground up.
How does it resize the vector?
realloc only seems to work for plain old stucts, or am I wrong?

it is a simple templated class which wraps a native array. It does not use malloc/realloc. Instead, it uses the passed allocator (which by default is std::allocator).
Resizing is done by allocating a new array and copy constructing each element in the new array from the old one (this way it is safe for non-POD objects). To avoid frequent allocations, often they follow a non-linear growth pattern.
UPDATE: in C++11, the elements will be moved instead of copy constructed if it is possible for the stored type.
In addition to this, it will need to store the current "size" and "capacity". Size is how many elements are actually in the vector. Capacity is how many could be in the vector.
So as a starting point a vector will need to look somewhat like this:
template <class T, class A = std::allocator<T> >
class vector {
public:
// public member functions
private:
T* data_;
typename A::size_type capacity_;
typename A::size_type size_;
A allocator_;
};
The other common implementation is to store pointers to the different parts of the array. This cheapens the cost of end() (which no longer needs an addition) ever so slightly at the expense of a marginally more expensive size() call (which now needs a subtraction). In which case it could look like this:
template <class T, class A = std::allocator<T> >
class vector {
public:
// public member functions
private:
T* data_; // points to first element
T* end_capacity_; // points to one past internal storage
T* end_; // points to one past last element
A allocator_;
};
I believe gcc's libstdc++ uses the latter approach, but both approaches are equally valid and conforming.
NOTE: This is ignoring a common optimization where the empty base class optimization is used for the allocator. I think that is a quality of implementation detail, and not a matter of correctness.

Resizing the vector requires allocating a new chunk of space, and copying the existing data to the new space (thus, the requirement that items placed into a vector can be copied).
Note that it does not use new [] either -- it uses the allocator that's passed, but that's required to allocate raw memory, not an array of objects like new [] does. You then need to use placement new to construct objects in place. [Edit: well, you could technically use new char[size], and use that as raw memory, but I can't quite imagine anybody writing an allocator like that.]
When the current allocation is exhausted and a new block of memory needs to be allocated, the size must be increased by a constant factor compared to the old size to meet the requirement for amortized constant complexity for push_back. Though many web sites (and such) call this doubling the size, a factor around 1.5 to 1.6 usually works better. In particular, this generally improves chances of re-using freed blocks for future allocations.

From Wikipedia, as good an answer as any.
A typical vector implementation consists, internally, of a pointer to
a dynamically allocated array,[2] and possibly data members holding
the capacity and size of the vector. The size of the vector refers to
the actual number of elements, while the capacity refers to the size
of the internal array. When new elements are inserted, if the new size
of the vector becomes larger than its capacity, reallocation
occurs.[2][4] This typically causes the vector to allocate a new
region of storage, move the previously held elements to the new region
of storage, and free the old region. Because the addresses of the
elements change during this process, any references or iterators to
elements in the vector become invalidated.[5] Using an invalidated
reference causes undefined behaviour

Like this:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_vector.h
(official gcc mirror on github)

///Implement Vector class
class MyVector {
int *int_arr;
int capacity;
int current;
public:
MyVector() {
int_arr = new int[1];
capacity = 1;
current = 0;
}
void Push(int nData);
void PushData(int nData, int index);
void PopData();
int GetData(int index);
int GetSize();
void Print();
};
void MyVector::Push(int data)
{
if (current == capacity){
int *temp = new int[2 * capacity];
for (int i = 0; i < capacity; i++)
{
temp[i] = int_arr[i];
}
delete[] int_arr;
capacity *= 2;
int_arr = temp;
}
int_arr[current] = data;
current++;
}
void MyVector::PushData(int data, int index)
{
if (index == capacity){
Push(index);
}
else
int_arr[index] = data;
}
void MyVector::PopData(){
current--;
}
int MyVector::GetData(int index)
{
if (index < current){
return int_arr[index];
}
}
int MyVector::GetSize()
{
return current;
}
void MyVector::Print()
{
for (int i = 0; i < current; i++) {
cout << int_arr[i] << " ";
}
cout << endl;
}
int main()
{
MyVector vect;
vect.Push(10);
vect.Push(20);
vect.Push(30);
vect.Push(40);
vect.Print();
std::cout << "\nTop item is "
<< vect.GetData(3) << std::endl;
vect.PopData();
vect.Print();
cout << "\nTop item is "
<< vect.GetData(1) << endl;
return 0;
}

It allocates a new array and copies everything over. So, expanding it is quite inefficient if you have to do it often. Use reserve() if you have to use push_back().

You'd need to define what you mean by "plain old structs."
realloc by itself only creates a block of uninitialized memory. It does no object allocation. For C structs, this suffices, but for C++ it does not.
That's not to say you couldn't use realloc. But if you were to use it (note you wouldn't be reimplementing std::vector exactly in this case!), you'd need to:
Make sure you're consistently using malloc/realloc/free throughout your class.
Use "placement new" to initialize objects in your memory chunk.
Explicitly call destructors to clean up objects before freeing your memory chunk.
This is actually pretty close to what vector does in my implementation (GCC/glib), except it uses the C++ low-level routines ::operator new and ::operator delete to do the raw memory management instead of malloc and free, rewrites the realloc routine using these primitives, and delegates all of this behavior to an allocator object that can be replaced with a custom implementation.
Since vector is a template, you actually should have its source to look at if you want a reference – if you can get past the preponderance of underscores, it shouldn't be too hard to read. If you're on a Unix box using GCC, try looking for /usr/include/c++/version/vector or thereabouts.

You can implement them with resizing array implementation.
When the array becomes full, create an array with twice as much the size and copy all the content to the new array. Do not forget to delete the old array.
As for deleting the elements from vector, do resizing when your array becomes a quarter full. This strategy makes prevents any performance glitches when one might try repeated insertion and deletion at half the array size.
It can be mathematically proved that the amortized time (Average time) for insertions is still linear for n insertions which is asymptotically the same as you will get with a normal static array.

realloc only works on heap memory. In C++ you usually want to use the free store.

Related

Fixed allocation std::vector

I'm an embedded software developer and as such I can't always use all the nice C++ features. One of the most difficult things is avoiding dynamic memory allocation as it is somewhat universal with all STL containers.
The std::vector is however very useful when working with variable datasets. The problem though is that the allocation(e.g. std::reserve) isn't done at initialization or fixed. This means that memory fragmentation can occur when a copy occurs.
It would be great to have every vector have an allocated memory space which is the max size the vector can grow to. This would create deterministic behaviour and make it possible to map the memory usage of the microcontroller at compilation time. A call to push_back when the vector is at it's max size would create a std::bad_alloc.
I have read that an alternative version of std::allocator can be written to create new allocation behaviour. Would it be possible to create this kind of behaviour with std::allocator or would an alternative solution be a better fit?
I would really like to keep using the STL libraries and amend to them instead of recreating my own vector as I'm more likely to make mistakes than their implementation.
sidenote #1:
I can't use std::array as 1: it isn't provided by my compiler and 2: it does have a static allocation but I then still have to manage the boundary between my data and buffer inside the std::array. This means rewriting a std::vector with my allocation properties which is what I'm trying to get away from.
You can implement or reuse boost's static_vector; A variable-size array container with fixed capacity.
And also: LLVM's small vector without LLVM dependencies here. This creates objects at the stack until a compile-time constant is reached, then it moves to the heap.
You can always use a C-style array (same as underlying in std::array) as vectors aren't supposed to be static
int arr[5]; // static array of 5 integers
To have it more useful you can wrap it in a class template to hide the C-style
Example:
template<class type, std::size_t capacaty>
class StaticVector {
private:
type arr[capacaty];
std::size_t m_size;
public:
StaticVector() : m_size(0) {}
type at(std::size_t index) {
if (index >=0 && index < m_size) {
return arr[index];
}
return type();
}
void remove(std::size_t index) {
if (index >=0 && index < m_size) {
for (std::size_t i=index; i < m_size-1; i++) {
arr[i] = arr[i+1];
}
m_size--;
}
}
void push_back(type val) {
if (m_size < capacaty) {
arr[m_size] = val;
m_size++;
}
}
std::size_t size() {
return m_size;
}
};
Example with it in use: https://onlinegdb.com/BkBgSTlZH

How can I correctly push back a series of objects in a vector in C++?

The scope of the program is to create a Container object which stores in a vector Class objects. Then I want to print, starting from a precise Class object of the vector all its predecessors.
class Class{
public:
Class(){
for (int i = 0; i < 10; ++i) {
Class c;
c.setName(i);
if (i > 0) {
c.setNext(_vec,i-1);
}
_vec.push_back(c);
}
}
};
~Class();
void setName(const int& n);
void setNext( vector<Class>& vec, const int& pos);
Class* getNext();
string getName();
void printAllNext(){ //print all next Class objects including himself
cout << _name <<endl;
if (_next != nullptr) {
(*_next).printAllNext();
}
}
private:
Class* _next;
string _name;
};
class Container{
public:
Container(){
for (int i = 0; i < 10; ++i) {
Class c;
c.setName(i);
if (i > 0) {
c.setNext(_vec,i-1);
}
_vec.push_back(c);
};
~Container();
void printFromVec(const int& n){//print all objects of _vec starting from n;
_vec[n].printAllNext();
};
private:
vector<Class> _vec;
};
int main() {
Container c;
c.printFromVec(5);
}
The problem is that all _next pointers of Class objects are undefined or random.
I think the problem is with this part of code:
class Container{
public:
Container(){
for (int i = 0; i < 10; ++i) {
Class c;
c.setName(i);
if (i > 0) {
c.setNext(_vec,i-1);
}
_vec.push_back(c);
};
Debugging I noticed that pointers of already created objects change their values.
What is the problem? How can I make it work?
Although there is really error in the code (likely wrong copypaste), the problem is really following: std::vector maintains inside dynamically allocated array of objects. It starts with certain initial size. When you push to vector, it fills entries of array. When all entries are filled but you attempt pushing more elements, vector allocates bigger chunk of memory and moves or copies (whichever you element data type supports) objects to a new memory location. That's why address of object changes.
Now some words on what to do.
Solution 1. Use std::list instead of std::vector. std::list is double linked list, and element, once added to list, will be part of list item and will not change its address, there is no reallocation.
Solution 2. Use vector of shared pointers. In this case you will need to allocate each object dynamically and put address into shared pointer object, you can do both at once by using function std::make_shared(). Then you push shared pointer to vector, and store std::weak_ptr as pointer to previous/next one.
Solution 3. If you know maximum number of elements in vector you may ever have, you can leave all as is, but do one extra thing before pushing very first time - call reserve() on vector with max number of elements as parameters. Vector will allocate array of that size and keep it until it is filled and more space needed. But since you allocated maximum possible size you expect to ever have, reallocation should never happen, and so addresses of objects will remain same.
Choose whichever solution you think fits most for your needs.
#ivan.ukr Offered a number of solutions for keeping the pointers stable. However, I believe that is the wrong problem to solve.
Why do we need stable pointers? So that Class objects can point to the previous object in a container.
Why do we need the pointers to previous? So we can iterate backwards.
That’s the real problem: iterating backwards from a point in the container. The _next pointer is an incomplete solution to the real problem which is iteration.
If you want to iterate a vector, use iterators. You can read about them on the cppreference page for std::vector. I don’t want to write the code for you but I’ll give you some hints.
To get an iterator referring to the ith element, use auto iter = _vec.begin() + i;.
To print the object that this iterator refers to, use iter->print() (you’ll have to rename printAllNext to print and have it just print this object).
To move an iterator backwards, use --iter.
To check if an iterator refers to the first element, use iter == _vec.begin().
You could improve this further by using reverse iterators but I’ll leave that up to you.

"Moving" sequential containers to pointers

I'm building a buffer for network connections where you can explicitly allocate memory or you can supply it on your own via some sequential container(eg.:std::vector,std::array)these memory chunks are stored in a list what we use later for read/write operations. (the chunks are needed for handle multiple read/write requests)
I have a question about the last part, I want to make a pointer to the container's data and then tell the container to not care about it's data anymore.
So something like move semantics.
std::vector<int> v = {9,8,7,6,5,4,3,2,1,0};
std::vector<int> _v(std::move(v));
Where _v has all the values of v and v left in a safe state.
The problem is if I just make a pointer for v.data() after the lifetime of the container ends, the data pointed by the pointer releases with the container.
For example:
// I would use span to make sure It's a sequential container
// but for simplicity i use a raw pointer
// gsl::span<int> s;
int *p;
{
std::vector<int> v = {9,8,7,6,5,4,3,2,1,0};
// s = gsl::make_span(v);
p = v.data();
}
for(int i = 0; i < 10; ++i)
std::cout << p[i] << " ";
std::cout << std::endl;
Now p contains some memory trash and i would need the memory previously owned by the vector.
I also tried v.data() = nullptr but v.data() is rvalue so it's not possible to assign it. Do you have any suggestions, or is this possible?
edit.:
To make it more clear what i'm trying to achieve:
class readbuf_type
{
struct item_type // representation of a chunk
{
uint8_t * const data;
size_t size;
inline item_type(size_t psize)
: size(psize)
, data(new uint8_t[psize])
{}
template <std::ptrdiff_t tExtent = gsl::dynamic_extent>
inline item_type(gsl::span<uint8_t,tExtent> s)
: size(s.size())
, data(s.data())
{}
inline ~item_type()
{ delete[] data; }
};
std::list<item_type> queue; // contains the memory
public:
inline size_t read(uint8_t *buffer, size_t size); // read from queue
inline size_t write(const uint8_t *buffer, size_t size); // write to queue
inline void *get_chunk(size_t size)
{
queue.emplace_back(size);
return queue.back().data;
}
template <std::ptrdiff_t tExtent = gsl::dynamic_extent>
inline void put_chunk(gsl::span<uint8_t,tExtent> arr)
{
queue.emplace_back(arr);
}
} readbuf;
I have the get_chunkfunction what basically just allocates memory with the size, and I have put_chunk what I'm struggling with, the reason i need this because before you can write to this queue you need to allocate memory and then copy all the elements from the buffer(vector,array) you're trying to write from to the queue.
Something like:
std::vector<int> v = {9,8,7,6,5,4,3,2,1,0};
// instead of this
readbuf.get_chunk(v.size);
readbuf.write(v.data(), v.size());
// we want this
readbuf.put_chunk({v});
Since we're developing for distributed systems memory is crucial and that's why we want to avoid the unnecessary allocation, copying.
ps.This is my first post, so sorry if i wasn't precise in the first place..
No, it is not possible to "steal" the buffer of the standard vector in the manner that you suggest - or any other standard container for that matter.
You've already shown one solution: Move the buffer into another vector, instead of merely taking the address (or another non-owning reference) of the buffer. Moving from the vector transfers the ownership of the internal buffer.
It would be possible to implement such custom vector class, whose buffer could be stolen, but there is a reason why vector doesn't make it possible. It can be quite difficult to prove the correctness of your program if you release resources willy-nilly. Have you considered how to prevent the data from leaking? The solution above is much simpler and easier to verify for correctness.
Another approach is to re-structure your program in such way that no references to the data of your container outlive the container itself (or any invalidating operation).
Unfortunately the memory area of the vector cannot be detached from the std::vector object. The memory area can be deleted even if you insert some data to the std::vector object. Therefore use of this memory area later is not safe, unless you are sure that this particular std::vector object exists and is not modified.
The solution to this problem is to allocate a new memory area and copy the content of the vector to this newly allocated memory area. The newly allocated memory area can be safely accessed without worrying about the state of the std::vector object.
std::vector<int> v = {1, 2, 3, 4};
int* p = new int[v.size()];
memcpy(p, v.data(), sizeof(int) * v.size());
Don't forget to delete the memory area after you are finished using this memory area.
delete [] p;
Your mistake is in thinking that the pointer "contains" memory. It doesn't contain anything, trash or ints or otherwise. It is a pointer. It points to stuff. You have deleted that stuff and not transferred it anywhere else, so it can't work any more.
In general, you will need a container to put this information in, be it another vector, or even your own hand-made array. Just having a pointer to data does not mean you have data.
Furthermore, since it is impossible to ask a vector to relinquish its buffer to a non-vector thing, a vector is really your only chance in this particular case. It's not quite clear why that's not a good enough solution for you. :)
Not sure what you try to achieve but I would use moving semantic like this:
#include <iostream>
#include <memory>
#include <vector>
int main() {
std::unique_ptr<std::vector<int>> p;
{
std::vector<int> v = {9,8,7,6,5,4,3,2,1,0};
p = std::move(make_unique<std::vector<int>>(v));
}
for(int i = 0; i < 10; ++i)
std::cout << (*p)[i] << " ";
std::cout << std::endl;
return 0;
}

Is there a way to print the amount of heap memory an object has allocated?

In a running program, how can I track/print the amount of heap memory an object has allocated?
For example:
#include <iostream>
#include <vector>
int main(){
std::vector<int> v;
std::cout << heap_sizeof(v) << '\n';
for (int i = 0; i < 1000; ++i){
v.push_back(0);
}
std::cout << heap_sizeof(v) << '\n';
}
Is there an implementation that could substitute heap_sizeof()?
With everything as it's designed out of the box, no, that's not possible. You do have a couple of options for doing that on your own though.
If you need this exclusively for standard containers, you can implement an allocator that tracks the memory that's been allocated (and not freed) via that allocator.
If you want this capability for everything allocated via new (whether a container or not) you can provide your own implementation of operator new on a global and/or class-specific basis, and have it (for example) build an unordered map from pointers to block sizes to tell you the size of any block it's allocated (and with that, you'll have to provide a function to retrieve that size). Depending on the platform, this might also be implemented using platform-specific functions. For example, when you're building for Microsoft's compiler (well, library, really) your implementation of operator new wouldn't have to do anything special at all, and the function to retrieve a block's size would look something like this:
size_t block_size(void const *block) {
return _msize(block);
}
Yet another possibility would be to increase the allocation size of each requested block by the size of an integer large enough to hold the size. In this case, you'd allocate a bigger chunk of data than the user requested, and store the size of that block at the beginning of the block that was returned. When the user requests the size of a block, you take the correct (negative) offset from the pointer they pass, and return the value you stored there.
First, v is allocated on the stack, not on the heap.
To get the total amount of space used by it, I suggest using this function: (Found on this article, and modified a bit)
template <typename T>
size_t areaof (const vector<T>& x)
{
return sizeof (vector<T>) + x.capacity () * sizeof (T);
}
If you want not to count the size of the std::vector object itself, the delete the part with sizeof:
template <typename T>
size_t heap_sizeof (const vector<T>& x)
{
return x.capacity () * sizeof (T);
}
If you are not concerned with accounting for what each object allocates and are more concerned with how much memory has been allocated/freed between to point in time, you can use the malloc statistics functions. Each malloc has its own version. On linux you can usemallocinfo().

Deleting array without calling destructors

I am working on an application with high performance and memory needs. With that I mean 80 cores and 500 GB of RAM. To save some memory, I use my own dynamic array (16 B overhead) as opposed to std::vector (24 B overhead), which matters if you have billions of them.
My question relates to expanding that array which looks like this:
//private
template <class ArrType>
void DynamicArray<ArrType>::reallocate(unsigned newCapacity) {
if (newCapacity < _size) return;
if (capacity == newCapacity) return;
ArrType * newArray = new ArrType[newCapacity];
capacity = newCapacity;
//for (unsigned i = 0; i < _size; i++) {
// newArray[i] = array[i];
//}
memcpy(newArray, array, _size * sizeof(ArrType));
if(array) delete [] array;
array = newArray;
}
As you can see, pretty standard reallocation, but I tested memcpy and it was about 10 times faster than using a for cycle. The problem is when I call delete, it will call destructors for objects of ArrType, which is a problem when ArrType has its own dynamic allocations. The copy in newArray will use deleted memory. Is there any way to delete the old array without calling destructors?
Replace your memcpy with:
std::move(array, array + _size, newArray);
And require that the type ArrType must have a correct move or copy assignment operator.
But in real life, just use vector<ArrType>.
In fact vector is better than this: rather than allocating an array (which runs a constructor if the type has one) and then move-assigning (which over-writes what new just did) it allocates raw memory and then uses the move constructor with placement new.
So, if you absolutely positively need a version of vector that uses a smaller type for size_type than the one in your implementation I suppose the thing to do is to re-implement vector under a new name with that change. You can use the source in your implementation to help you: that way you will have solutions in front of you to this problem and all the other problems involved.