Can I forbid vector to reallocate objects? - c++

Let's say we have a vector of double (values) and a vector containing objects that store pointers to the elements of the vector of double (vars):
class Var
{
public:
explicit Var(double* v) : _value(v) {};
~Var() {};
const double get() const { return *_value; };
void set(const double v) { *_value = v; };
private:
double* _value;
};
struct Vars
{
Vars()
{
//values.reserve(1000'000);
}
vector<double> values{};
vector<Var> vars{};
void push(const double v)
{
values.push_back(v);
vars.emplace_back(&values.back());
}
};
(objects after adding are never deleted)
It is known that when a vector reallocates objects, all pointers break.
I can pre-call reserve(), but the problem is that I don't know how many objects will be stored, maybe 5, or maybe 500'000 and shrink_to_fit() will break pointers anyway.
I can use a deque in this case, but I'm wondering if I can prevent the vector from reallocating memory when I call shrink_to_fit(), or whether there is some other way?

The key is
objects after adding are never deleted
Rather than store pointers/iterators/references to elements of the vector, store a pointer (see the note below the code) to the vector and an index.
class Var
{
public:
explicit Var(std::vector<double> * vec, size_t index)
: vec(vec), index(index) {};
double get() const { return vec->at(index); };
void set(double v) { vec->at(index) = v; };
private:
std::vector<double> * vec;
size_t index;
};
struct Vars
{
vector<double> values;
vector<Var> vars;
void push(double v)
{
vars.emplace_back(&values, values.size());
values.push_back(v);
}
};
Note: a pointer and not a reference, so that Var is a SemiRegular type (and we can easily define == to make it Regular).
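To see why the (vector, index) pair survives reallocation, here is a minimal sketch. IndexedVar and handle_survives_reallocation are hypothetical names made up for this illustration, and it uses operator[] instead of at() only for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical handle: refers to an element by (container, index), not by address.
class IndexedVar {
public:
    IndexedVar(std::vector<double>* vec, std::size_t index) : vec_(vec), index_(index) {}
    double get() const { return (*vec_)[index_]; }
    void set(double v) { (*vec_)[index_] = v; }
private:
    std::vector<double>* vec_;
    std::size_t index_;
};

// Creates a handle early, grows and shrinks the vector, then uses the handle.
// Returns the value of element 0 as seen through the vector itself.
inline double handle_survives_reallocation() {
    std::vector<double> values;
    values.push_back(1.5);
    IndexedVar first(&values, 0);
    for (int i = 0; i < 100000; ++i) values.push_back(i);  // growth reallocates the buffer
    values.shrink_to_fit();                                 // may reallocate once more
    first.set(first.get() + 1.0);                           // handle still finds element 0
    assert(first.get() == 2.5);
    return values[0];
}
```

The handle stays usable only as long as the vector object itself is neither destroyed nor moved, since the handle stores the vector's address.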

std::vector does not allow you to return part of its buffer to the heap while keeping the rest; there is no API in the std::allocator "concept" to do that.
std::vector guarantees contiguous storage, so it cannot be allocated in pieces.
If you don't need contiguous storage, use either std::deque or a hand-rolled duplicate (deque leaves some performance-critical parameters free for implementations to choose and doesn't expose the ability to configure them to end-users; so if your deque isn't appropriate for your use case, you may have to roll your own).
std::deque is a dynamic array of fixed sized buffers, plus a small amount of end-management state. It allows for stable storage with a dynamic sized container and O(1) random access, and the container overhead is a small percentage of the amount stored (the scaling overhead is a pointer per block, and block size is many times larger than a pointer on every implementation I have seen).
It has the disadvantage compared to vector that the storage isn't contiguous, and that iterator/[] lookup has an extra pointer indirection, but it would be a strange contiguous storage container that fit the rest of your requirements.
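A small sketch of the stable-storage property described above: inserting at either end of a std::deque invalidates iterators but leaves pointers and references to existing elements valid. The function name is hypothetical. Note that std::deque::shrink_to_fit is deliberately not called here, because it is allowed to invalidate pointers.

```cpp
#include <cassert>
#include <deque>

// Takes the address of the first element, appends many more elements,
// and checks that the address still refers to the first element.
inline bool deque_keeps_addresses() {
    std::deque<double> values;
    values.push_back(3.0);
    double* first = &values.front();
    for (int i = 0; i < 100000; ++i) values.push_back(i);  // new blocks are added, old ones stay put
    *first = 4.0;                                           // write through the old pointer
    assert(values.size() == 100001);
    return first == &values.front() && values[0] == 4.0;
}
```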

Can I forbid vector to reallocate objects?
You can avoid reallocation by not performing operations which cause the vector to (potentially) reallocate.
Other than making the vector const, there is no way to prevent those operations from being performed. One solution is to make the vector a private member (and never return a pointer / reference to non-const pointing to that member), in which case you only need to be concerned that your member functions do not perform those operations.
the problem is that I don't know how many objects will be stored, maybe 5, or maybe 500'000
If you don't need the pointers in between insertions, you can perform all insertions in one go (regardless of how many), then get the pointers afterwards, and then no longer modify the vector.
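A rough sketch of that two-phase approach, assuming all inputs are available up front. fill_then_point is a hypothetical helper name, not part of the question's code.

```cpp
#include <cassert>
#include <vector>

// Phase 1 copies every input into `values`; phase 2 collects element addresses.
// The returned pointers are valid only while `values` is not resized or reallocated.
inline std::vector<double*> fill_then_point(std::vector<double>& values,
                                            const std::vector<double>& inputs) {
    values.assign(inputs.begin(), inputs.end());  // all insertions happen here
    values.shrink_to_fit();                       // harmless: no pointers exist yet
    std::vector<double*> ptrs;
    ptrs.reserve(values.size());
    for (double& v : values) ptrs.push_back(&v);  // addresses taken after the last modification
    assert(ptrs.size() == values.size());
    return ptrs;
}
```

After this call, any further push_back, resize, or shrink_to_fit on values would put the pointers at risk again.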
In the general case where the above is not an option, if you need the pointers to remain valid, vector is not an appropriate choice of data structure. A list or deque would work, depending on how you use the container.
Regarding the example: Having a vector of objects, and vector of (wrappers containing) pointers to those objects seems to be rather pointless.

Related

How do you restrict `resize()` from being called after constructing a vector?

I'm building a class that exposes a sequential container, with a fixed length, but the length isn't known at compile-time.
So when an instance of my class is constructed, a parameter is passed in, to indicate how big the vector needs to be.
But the length needs to be fixed after construction.
I need to guarantee that the resize() function cannot be invoked, while still allowing other parts of my code to modify individual elements within the vector.
(In other words, I can't simply expose the vector as vector<T> const&)
The same goes for any other function which modifies the length, such as insert(), push_back(), etc.
These functions need to be restricted or hidden.
Is this possible?
Or do I really need to build my own fixed_vector wrapper class to hide the undesired functions?
Since C++20 you can return a std::span to the range in the vector. This allows access to the size and modifiable access to the elements, but not the vector's modifiers.
For example:
#include<vector>
#include<span>
class A {
std::vector<int> vec;
public:
/*...*/
auto getVec() {
return std::span(vec);
}
};
The return value can be used as a range, but there is no access to the container interface.
Depending on the types and initialization required, you may also be able to use an array std::unique_ptr instead of a std::vector if you know the size won't change. However that doesn't store the size, which you would then need to store yourself:
#include<cstddef>
#include<memory>
#include<span>
class A {
std::size_t vec_size;
std::unique_ptr<int[]> vec;
public:
A(std::size_t size) : vec_size(size), vec(std::make_unique<int[]>(size)) { }
auto getVec() {
// std::span needs the raw pointer, not the unique_ptr itself
return std::span(vec.get(), vec_size);
}
};
This may be slightly more space efficient since it doesn't require accounting for a difference in vector size and capacity.
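If C++20 is not available, the wrapper mentioned in the question is the usual fallback. The following is only a sketch under that assumption; the member set of fixed_vector is made up for illustration and is not a standard or library type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: the size is chosen at construction, and no
// size-changing member of the underlying vector is exposed.
template <typename T>
class fixed_vector {
public:
    explicit fixed_vector(std::size_t n, const T& init = T()) : data_(n, init) {}
    // Assignment is disabled so that the size cannot change by assigning another object.
    fixed_vector& operator=(const fixed_vector&) = delete;

    std::size_t size() const { return data_.size(); }
    T& operator[](std::size_t i) { assert(i < data_.size()); return data_[i]; }
    const T& operator[](std::size_t i) const { assert(i < data_.size()); return data_[i]; }

    typename std::vector<T>::iterator begin() { return data_.begin(); }
    typename std::vector<T>::iterator end() { return data_.end(); }
    typename std::vector<T>::const_iterator begin() const { return data_.begin(); }
    typename std::vector<T>::const_iterator end() const { return data_.end(); }

private:
    std::vector<T> data_;
};
```

Elements can be modified through operator[] or a range-based for loop, while resize(), insert() and push_back() are simply not reachable from outside.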

What does std::vector::swap actually do?

What triggered this question is some code along the line of:
std::vector<int> x(500);
std::vector<int> y;
std::swap(x,y);
And I was wondering if swapping the two requires twice the amount of memory that x needs.
On cppreference I found for std::vector::swap (which is the method that the last line effectively calls):
Exchanges the contents of the container with those of other. Does not invoke any move, copy, or swap operations on individual elements.
All iterators and references remain valid. The past-the-end iterator is invalidated.
And now I am more confused than before. What does std::vector::swap actually do when it does not move, copy or swap elements?
How is it possible that iterators stay valid?
Does that mean that something like this is valid code?
std::vector<int> x(500);
std::vector<int> y;
auto it = x.begin();
std::swap(x,y);
std::sort(it , y.end()); // iterators from different containers !!
vector internally stores (at least) a pointer to the actual storage for the elements, a size and a capacity.† std::swap just swaps the pointers, size and capacity (and ancillary data if any) around; no doubling of memory or copies of the elements are made because the pointer in x becomes the pointer in y and vice-versa, without any new memory being allocated.
The iterators for vector are generally lightweight wrappers around pointers to the underlying allocated memory (that's why capacity changes generally invalidate iterators), so iterators produced for x before the swap seamlessly continue to refer to y after the swap; your example use of sort is legal, and sorts y.
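A minimal check of that behaviour, assuming the default allocator. The function name is hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Swaps a filled vector with an empty one and checks that the buffer and an
// iterator obtained before the swap now belong to the other vector.
inline bool iterator_follows_buffer() {
    std::vector<int> x(500, 1);
    std::vector<int> y;
    const int* buffer = x.data();
    std::vector<int>::iterator it = x.begin();
    std::swap(x, y);                 // exchanges the internal pointers, no element is touched
    assert(x.empty());
    return y.size() == 500 && y.data() == buffer && it == y.begin();
}
```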
If you wanted to swap the elements themselves without swapping storage (a much more expensive operation, but one that leaves preexisting iterators for x referring to x), you could use std::swap_ranges, but that's a relatively uncommon use case. The memory usage for that would depend on the implementation of swap for the underlying object; by default, it would often involve a temporary, but only for one of the objects being swapped at a time, not the whole of one vector.
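For contrast, a short sketch of the element-wise alternative using std::swap_ranges from <algorithm>. It assumes the second range is at least as long as the first, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Swaps the elements of two equally sized vectors one by one; each vector
// keeps its own buffer, so an old iterator into x still refers to x.
inline bool swap_ranges_keeps_storage() {
    std::vector<int> x(3, 1);
    std::vector<int> y(3, 2);
    const int* x_buffer = x.data();
    std::vector<int>::iterator it = x.begin();
    std::swap_ranges(x.begin(), x.end(), y.begin());
    assert(x.data() == x_buffer);    // storage was not exchanged
    return it == x.begin() && *it == 2 && y[0] == 1;
}
```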
† Per the comments, it could equivalently use pointers to the end of the used space and the end of the capacity, but either approach is logically equivalent, and just microoptimizes in favor of slightly different expected use cases; storing all pointers optimizes for use of iterators (a reasonable choice), while storing size_type optimizes for .size()/.capacity() calls.
I will write a toy vector.
struct toy_vector {
int * buffer = 0;
std::size_t valid_count = 0;
std::size_t buffer_size = 0;
int* begin() { return buffer; }
int* end() { return buffer+valid_count; }
std::size_t capacity() const { return buffer_size; }
std::size_t size() const { return valid_count; }
void swap( toy_vector& other ) {
std::swap( buffer, other.buffer );
std::swap( valid_count, other.valid_count );
std::swap( buffer_size, other.buffer_size );
}
That is basically it. I'll implement a few methods so you see we have enough tools to work with:
int& operator[](std::size_t i) { return buffer[i]; }
void reserve(std::size_t capacity) {
if (capacity <= buffer_size)
return;
toy_vector tmp;
tmp.buffer = new int[capacity];
for (std::size_t i = 0; i < valid_count; ++i)
tmp.buffer[i] = std::move(buffer[i]);
tmp.valid_count = valid_count;
tmp.buffer_size = capacity;
swap( tmp );
}
void push_back(int x) {
if (valid_count+1 > buffer_size) {
reserve( (std::max)((buffer_size*3/2), buffer_size+1) );
}
buffer[valid_count] = std::move(x);
++valid_count;
}
// real dtor is more complex.
~toy_vector() { delete[] buffer; }
};
actual vectors have exception safety issues, more concerns about object lifetime and allocators (I'm using ints, so don't care if I properly construct/destroy them), and might store 3 pointers instead of a pointer and 2 sizes. But swapping those 3 pointers is just as easy as swapping the pointer and 2 sizes.
Iterators in real vectors tend not to be raw pointers (but they can be). As you can see above, the raw pointer iterators to the vector a become raw pointer iterators into vector b when you do an a.swap(b); non-raw pointer vector iterators are basically fancy wrapped pointers, and have to follow the same semantics.
The C++ standard does not explicitly mandate an implementation that looks like this, but it was based off an implementation that looks like this, and it implicitly requires one that is almost identical to this (I'm sure someone could come up with a clever standard compliant vector that doesn't look like this; but every vector in every standard library I have seen has looked like this.)

Is it possible to have an array of smart pointers that automatically updates its values with their index?

In C++, is there a way to write an array of smart pointers that automatically updates the pointed-to values with their index in the array? The pointed-to values have a member to store the index, similar to an intrusive refcount.
I am interested in writing a heap with updatable priorities. If the values in the heap were always updated to point to their index inside the heap storage, without special knowledge inside the heap algorithm, it would be easy to follow that link back into the heap when changing the value's priority. Knowing the position of the changed item, the heap invariant could then be quickly restored.
This is my attempt at a basic implementation. I would prefer to parameterize Container's reference to the global array without making instances larger than one pointer, and it would be good to improve safety. It would be more useful if it was also a random access iterator.
class Contained {
public:
uintptr_t index;
};
class Container {
public:
Contained *value;
Container& operator=(Container& other);
};
Container foobars[4];
Container& Container::operator=(Container& other) {
this->value = other.value;
this->value->index = ((uintptr_t)this - (uintptr_t)foobars) / sizeof(this->value);
return *this;
}

Generic vector class, make 0 size reference to generic array members;

I have a simple template vector class like this:
template <typename T, size_t N>
class Vec {
public:
T v[N];
//T const& x = v[0];
...
}
Can I make references to the array members without size cost? Because if I write the commented-out code, it will allocate space for the pointer. Is there a workaround, a #define, or some other kind of magic?
No, there is no way to add a reference-type member to a class for 0 size cost. A reference is just a fancier, safer, and more convenient pointer. It still points to some specific memory location and needs to store the address of that location.
Can I make references to the array members without size cost?
Yes. References with automatic storage duration do not (always) need to require storage. Depending on the case, they may need to be stored on the stack, but will not grow the size of Vec. So, you can use a function that returns the reference:
T const& first() const { return v[0]; } // declared inside the Vec class template
Incidentally, std::vector and other containers also provide similar functionality.
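Both answers can be illustrated with one sketch that compares sizes. VecWithAccessor and VecWithRefMember are hypothetical stand-ins for the question's class, and the size comparison reflects what mainstream implementations do rather than a guarantee in the standard.

```cpp
#include <cassert>
#include <cstddef>

// Named access through member functions: no extra data member is needed.
template <typename T, std::size_t N>
struct VecWithAccessor {
    T v[N];
    T const& x() const { return v[0]; }
    T& x() { return v[0]; }
};

// Named access through a reference member: the reference is stored in every object.
template <typename T, std::size_t N>
struct VecWithRefMember {
    T v[N];
    T const& x = v[0];
};

inline bool accessor_is_free() {
    VecWithAccessor<float, 3> a = {{1.0f, 2.0f, 3.0f}};
    a.x() = 5.0f;                    // writes v[0] through the accessor
    assert(a.v[0] == 5.0f);
    return sizeof(a) == sizeof(float[3]) &&
           sizeof(VecWithRefMember<float, 3>) > sizeof(float[3]);
}
```

The only visible cost of the accessor is the pair of parentheses at the call site.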

STL: Stores references or values?

I've always been a bit confused about how STL containers (vector, list, map...) store values. Do they store references to the values I pass in, or do they copy/copy construct +store the values themselves?
For example,
int i;
vector<int> vec;
vec.push_back(i);
// does &(vec[0]) == &i;
and
class abc;
abc inst;
vector<abc> vec;
vec.push_back(inst);
// does &(vec[0]) == &inst;
Thanks
STL Containers copy-construct and store values that you pass in. If you want to store objects in a container without copying them, I would suggest storing a pointer to the object in the container:
class abc;
abc inst;
vector<abc *> vec;
vec.push_back(&inst);
This is the most logical way to implement the container classes to prevent accidentally storing references to variables on defunct stack frames. Consider:
class Widget {
public:
void AddToVector(int i) {
v.push_back(i);
}
private:
vector<int> v;
};
Storing a reference to i would be dangerous as you would be referencing the memory location of a local variable after returning from the method in which it was defined.
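The question's first snippet can be checked directly. This sketch (with a hypothetical function name) shows that the element is an independent copy with its own address.

```cpp
#include <cassert>
#include <vector>

// Pushes a local variable into a vector, then changes the variable.
inline bool push_back_copies() {
    int i = 1;
    std::vector<int> vec;
    vec.push_back(i);    // copies the value of i into the vector's own storage
    i = 42;              // the stored element is unaffected
    assert(vec[0] == 1);
    return &vec[0] != &i;
}
```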
That depends on your type. If it's a simple value type, and cheap to copy, then storing values is probably the answer. On the other hand, if it's a reference type, or expensive to copy, you'd better store a smart pointer (not auto_ptr, since its special copy semantics prevent it from being stored in a container. Go for a shared_ptr). With a plain pointer you're risking memory leakage and access to freed memory, while with references you're risking the latter. A smart pointer avoids both.
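A minimal sketch of the shared_ptr suggestion. Abc is a hypothetical stand-in for the question's class abc. The container copies the smart pointer, not the object, so both handles refer to the same instance.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Abc { int value; };  // hypothetical example type

inline bool shared_ptr_shares_object() {
    std::shared_ptr<Abc> inst = std::make_shared<Abc>();
    inst->value = 1;
    std::vector<std::shared_ptr<Abc>> vec;
    vec.push_back(inst);             // copies the pointer, bumps the reference count
    inst->value = 7;                 // visible through the vector as well
    assert(vec[0].get() == inst.get());
    return vec[0]->value == 7 && inst.use_count() == 2;
}
```

The object is destroyed only when the last shared_ptr referring to it goes away, which addresses both the leak and the dangling-access risks mentioned above.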