"Moving" sequential containers to pointers - c++

I'm building a buffer for network connections where you can explicitly allocate memory or you can supply it on your own via some sequential container(eg.:std::vector,std::array)these memory chunks are stored in a list what we use later for read/write operations. (the chunks are needed for handle multiple read/write requests)
I have a question about the last part, I want to make a pointer to the container's data and then tell the container to not care about it's data anymore.
So something like move semantics.
std::vector<int> v = {9,8,7,6,5,4,3,2,1,0};
std::vector<int> _v(std::move(v));
Where _v has all the values of v and v left in a safe state.
The problem is if I just make a pointer for v.data() after the lifetime of the container ends, the data pointed by the pointer releases with the container.
For example:
// I would use span to make sure It's a sequential container
// but for simplicity i use a raw pointer
// gsl::span<int> s;
int *p;
{
std::vector<int> v = {9,8,7,6,5,4,3,2,1,0};
// s = gsl::make_span(v);
p = v.data();
}
for(int i = 0; i < 10; ++i)
std::cout << p[i] << " ";
std::cout << std::endl;
Now p contains some memory trash and i would need the memory previously owned by the vector.
I also tried v.data() = nullptr but v.data() is rvalue so it's not possible to assign it. Do you have any suggestions, or is this possible?
edit.:
To make it more clear what i'm trying to achieve:
class readbuf_type
{
struct item_type // representation of a chunk
{
uint8_t * const data;
size_t size;
inline item_type(size_t psize)
: size(psize)
, data(new uint8_t[psize])
{}
template <std::ptrdiff_t tExtent = gsl::dynamic_extent>
inline item_type(gsl::span<uint8_t,tExtent> s)
: size(s.size())
, data(s.data())
{}
inline ~item_type()
{ delete[] data; }
};
std::list<item_type> queue; // contains the memory
public:
inline size_t read(uint8_t *buffer, size_t size); // read from queue
inline size_t write(const uint8_t *buffer, size_t size); // write to queue
inline void *get_chunk(size_t size)
{
queue.emplace_back(size);
return queue.back().data;
}
template <std::ptrdiff_t tExtent = gsl::dynamic_extent>
inline void put_chunk(gsl::span<uint8_t,tExtent> arr)
{
queue.emplace_back(arr);
}
} readbuf;
I have the get_chunkfunction what basically just allocates memory with the size, and I have put_chunk what I'm struggling with, the reason i need this because before you can write to this queue you need to allocate memory and then copy all the elements from the buffer(vector,array) you're trying to write from to the queue.
Something like:
std::vector<int> v = {9,8,7,6,5,4,3,2,1,0};
// instead of this
readbuf.get_chunk(v.size);
readbuf.write(v.data(), v.size());
// we want this
readbuf.put_chunk({v});
Since we're developing for distributed systems memory is crucial and that's why we want to avoid the unnecessary allocation, copying.
ps.This is my first post, so sorry if i wasn't precise in the first place..

No, it is not possible to "steal" the buffer of the standard vector in the manner that you suggest - or any other standard container for that matter.
You've already shown one solution: Move the buffer into another vector, instead of merely taking the address (or another non-owning reference) of the buffer. Moving from the vector transfers the ownership of the internal buffer.
It would be possible to implement such custom vector class, whose buffer could be stolen, but there is a reason why vector doesn't make it possible. It can be quite difficult to prove the correctness of your program if you release resources willy-nilly. Have you considered how to prevent the data from leaking? The solution above is much simpler and easier to verify for correctness.
Another approach is to re-structure your program in such way that no references to the data of your container outlive the container itself (or any invalidating operation).

Unfortunately the memory area of the vector cannot be detached from the std::vector object. The memory area can be deleted even if you insert some data to the std::vector object. Therefore use of this memory area later is not safe, unless you are sure that this particular std::vector object exists and is not modified.
The solution to this problem is to allocate a new memory area and copy the content of the vector to this newly allocated memory area. The newly allocated memory area can be safely accessed without worrying about the state of the std::vector object.
std::vector<int> v = {1, 2, 3, 4};
int* p = new int[v.size()];
memcpy(p, v.data(), sizeof(int) * v.size());
Don't forget to delete the memory area after you are finished using this memory area.
delete [] p;

Your mistake is in thinking that the pointer "contains" memory. It doesn't contain anything, trash or ints or otherwise. It is a pointer. It points to stuff. You have deleted that stuff and not transferred it anywhere else, so it can't work any more.
In general, you will need a container to put this information in, be it another vector, or even your own hand-made array. Just having a pointer to data does not mean you have data.
Furthermore, since it is impossible to ask a vector to relinquish its buffer to a non-vector thing, a vector is really your only chance in this particular case. It's not quite clear why that's not a good enough solution for you. :)

Not sure what you try to achieve but I would use moving semantic like this:
#include <iostream>
#include <memory>
#include <vector>
int main() {
std::unique_ptr<std::vector<int>> p;
{
std::vector<int> v = {9,8,7,6,5,4,3,2,1,0};
p = std::move(make_unique<std::vector<int>>(v));
}
for(int i = 0; i < 10; ++i)
std::cout << (*p)[i] << " ";
std::cout << std::endl;
return 0;
}

Related

How to deallocate a portion of a continuous memory block?

let's assume I have a structure MyStruct and i want to allocate a "big" chunk of memory like this:
std::size_t memory_chunk_1_size = 10;
MyStruct * memory_chunk_1 = reinterpret_cast <MyStruct *> (new char[memory_chunk_1_size * sizeof(MyStruct)]);
and because of the "arbitrary reasons" I would like to "split" this chunk of memory into two smaller chunks without; moving data, resizing the "dynamic array", deallocating/allocating/reallocating memory, etc.
so I am doing this:
std::size_t memory_chunk_2_size = 5; // to remember how many elements there are in this chunk;
MyStruct * memory_chunk_2 = &memory_chunk_1[5]; // points to the 6th element of memory_chunk_1;
memory_chunk_1_size = 5; // to remember how many elements there are in this chunk;
memory_chunk_1 = memory_chunk_1; // nothing changes still points to the 1st element.
Unfortunately, when I try to deallocate the memory, I'm encountering an error:
// release memory from the 2nd chunk
for (int i = 0; i < memory_chunk_2_size ; i++)
{
memory_chunk_2[i].~MyStruct();
}
delete[] reinterpret_cast <char *> (memory_chunk_2); // deallocates memory from both "memory_chunk_2" and "memory_chunk_1"
// release memory from the 1st chunk
for (int i = 0; i < memory_chunk_1_size ; i++)
{
memory_chunk_1[i].~MyStruct(); // Throws exception.
}
delete[] reinterpret_cast <char *> (memory_chunk_1); // Throws exception. This part of the memory was already dealocated.
How can I delete only a selected number of elements (to solve this error)?
Compilable example:
#include <iostream>
using namespace std;
struct MyStruct
{
int first;
int * second;
void print()
{
cout << "- first: " << first << endl;
cout << "- second: " << *second << endl;
cout << endl;
}
MyStruct() :
first(-1), second(new int(-1))
{
cout << "constructor #1" << endl;
print();
}
MyStruct(int ini_first, int ini_second) :
first(ini_first), second(new int(ini_second))
{
cout << "constructor #2" << endl;
print();
}
~MyStruct()
{
cout << "destructor" << endl;
print();
delete second;
}
};
int main()
{
// memory chunk #1:
std::size_t memory_chunk_1_size = 10;
MyStruct * memory_chunk_1 = reinterpret_cast <MyStruct *> (new char[memory_chunk_1_size * sizeof(MyStruct)]);
// initialize:
for (int i = 0; i < memory_chunk_1_size; i++)
{
new (&memory_chunk_1[i]) MyStruct(i,i);
}
// ...
// Somewhere here I decided I want to have two smaller chunks of memory instead of one big,
// but i don't want to move data nor reallocate the memory:
std::size_t memory_chunk_2_size = 5; // to remember how many elements there are in this chunk;
MyStruct * memory_chunk_2 = &memory_chunk_1[5]; // points to the 6th element of memory_chunk_1;
memory_chunk_1_size = 5; // to remember how many elements there are in this chunk;
memory_chunk_1 = memory_chunk_1; // nothing changes still points to the 1st element.
// ...
// some time later i want to free memory:
// release memory from the 2nd chunk
for (int i = 0; i < memory_chunk_2_size ; i++)
{
memory_chunk_2[i].~MyStruct();
}
delete[] reinterpret_cast <char *> (memory_chunk_2); // deallocates memory from both "memory_chunk_2" and "memory_chunk_1"
// release memory from the 1st chunk
for (int i = 0; i < memory_chunk_1_size ; i++)
{
memory_chunk_1[i].~MyStruct(); // Throws exception.
}
delete[] reinterpret_cast <char *> (memory_chunk_1); // Throws exception. This part of the memory was already dealocated.
// exit:
return 0;
}
This kind of selective deallocation is not supported by the C++ programming language, and is probably never going to be supported.
If you intend to deallocate individual portions of memory, those individual portions need to be individually allocated in the first place.
It is possible that a specific OS or platform might support this kind of behavior, but it would be with OS-specific system calls, not through C++ standard language syntax.
Memory allocated with malloc or new cannot be partially deallocate. Many heaps use bins of different sized allocations for performance and to prevent fragmentation so allowing partial frees would make such a strategy impossible.
That of course does not prevent you writing your own allocator.
The simplest way I could think of by means of standard c++ would follow this idiomatic code:
std::vector<int> v1(1000);
auto block_start = v1.begin() + 400;
auto block_end = v1.begin() + 500;
std::vector<int> v2(block_start,block_end);
v1.erase(block_start,block_end);
v1.shrink_to_fit();
If a compiler is intelligent enough to translate such pattern to the most efficient low level OS and CPU memory management operations, is an implementation detail.
Here's the working example.
Let's be honest: this is a very bad practice ! Trying to cast new and delete and in addition call yourself destructor between the two is an evidence of low-level manual memory management.
Alternatives
The proper way to manage dynamic memory structures in contiguous blocks in C++ is to use std::vector instead of manual arrays or manual memory management, and let the library proceed. You can resize() a vector to delete the unneeded elements. You can shrink_to_fit() to say that you no longer need the extra free capacity, although it's not guaranteed that unneeded memory is released.
The use of C memory allocation functions and in particular realloc() is to be avoided, as it is very error prone, and it works only with trivially copiable objects.
Edit: Your own container
If you want to implement your own special container and must allows this kind of dynamic behaviour, due to unusual special constraints, then you should consider writing your own memory management function that would manage a kind of "private heap".
Heap management is often implement via a linked list of free chunks.
One strategy could be to allocate a new chunk when there's no sufficient contiguous memory left in your private heap. You could then offer a more permissive myfree() function that reinserts a freed or partially freed chunk into that linked list. Of course, this requires to iterate through the linked list to find if the released memory is contiguous to any other chunk of free memory in the private heap, and merge the blocks if adjacent.
I see that MyStruct is very small. Another approach could then be to write a special allocation function optimised for small fixed size blocks. There is an example in Loki's small object library that is described in depth in "Modern C++ Design".
Finally, you could perhaps have a look at the Pool library of Boost, which also offers a chunk based approach.

Delete a vector without deleting the array

I'm trying to populate a vector of doubles in C++ and pass the associated array to Fortran. But I'm having trouble freeing the rest of the memory associated with the vector. I'd like to avoid copying. Here's what I have:
std::vector<double> *vec = new std::vector<double>();
(*vec).push_back(1.0);
(*vec).push_back(2.0);
*arr = (*vec).data(); //arr goes to Fortran
How do I delete vec while keeping arr intact? Is there a way to nullify the pointer to arr in vec so that I can then delete vec?
Update
I see that I didn't give enough information here. A couple things:
I'm actually calling a C++ function in Fortran using iso_c_binding
I don't know how large the vec needs to be. The vector class looks good for this situation
I might try Guillaume's suggestion eventually, but for now, I'm passing vec to the Fortran and calling another C++ function to delete it once I'm done with the data
You need to rethink your program design.
Somehow, somewhere, you need to keep an array alive while Fortran is using it. So whatever context you're using to access Fortran should probably be responsible for ownership of this array.
class fortran_context {
/*Blah blah blah whatever API you're using to access Fortran*/
void * arr;
std::vector<double> vec; //Don't allocate this as a pointer; makes no sense!
public:
fortran_context() {
arr = //Do whatever is necessary to setup Fortran stuff. I'm assuming your
//api has some kind of "get_array_pointer" function that you'll use.
}
~fortran_context() {
//Do the cleanup for the fortran stuff
}
//If you want to spend time figuring out a robust copy constructor, you may.
//Personally, I suspect it's better to just delete it, and make this object non-copyable.
fortran_context(fortran_context const&) = delete;
std::vector<double> & get_vector() {
return vec;
}
std::vector<double> const& get_vector() const {
return vec;
}
void assign_vector_to_array() {
*arr = vec.data();
}
void do_stuff_with_fortran() {
assign_vector_to_array();
//???
}
};
int main() {
fortran_context context;
auto & vec = context.get_vector();
vec.push_back(1.0);
vec.push_back(2.0);
context.do_stuff_with_fortran();
return 0;
} //Cleanup happens automatically due to correct implementation of ~fortran_context()
I've abstracted a lot of this because I don't know what API you're using to access Fortran, and I don't know what kind of work you're doing with this array. But this is, by far, the safest way to ensure that
The vector's allocated memory exists so long as you are doing stuff in Fortran
The memory associated with the vector will be cleaned up properly when you're done.
How do I delete vec while keeping arr intact? Is there a way to nullify the pointer to arr in vec so that I can then delete vec?
The library does not provide any built-in capability to do that. You have to do the bookkeeping work yourself.
Allocate memory for the data and copy data from the vector.
Send the data to FORTRAN.
Decide when it is safe to deallocate the data and then delete them.
// Fill up data in vec
std::vector<double> vec;
vec.push_back(1.0);
vec.push_back(2.0);
// Allocate memory for arr and copy the data from vec
double* arr = new double[vec.size()];
std::copy(vec.begin(), vec.end(), arr);
// use arr
// delete arr
delete [] arr;
What you are asking for is not possible. std::vector, being a well behaved template class will release the internals that it owns and manages when it is destroyed.
You will have to keep vector alive while you are using its contents, which makes perfect sense. Otherwise, you will have to make a copy.
Also, I don't see why you are allocating the vector on the heap, it doesn't seem needed at all.
How do I delete vec while keeping arr intact? Is there a way to nullify the pointer to arr in vec so that I can then delete vec?
You don't.
I think you misuse or misunderstood what vector is for. It is not meant to expose memory management of the underlying array, but to represent a dynamically sized array as a regular type.
If you need to explicitly manage memory, I'd suggest you to use std::unique_ptr<T[]>. Since unique pointers offers a way to manage memory and to release it's resource without deleting it, I think it's a good candidate to meet your needs.
auto arr = std::make_unique<double[]>(2);
arr[0] = 1.;
arr[1] = 2.;
auto data = arr.release();
// You have to manage `data` memory manually,
// since the unique pointer released it's resource.
// arr is null here
// data is a pointer to an array, must be deleted manually later.
delete[] data;

Does Pointer to Dynamic Container Persist?

I know dynamic data structures like vectors, lists, maps and sets allocate their elements on the heap autonomously, therefore I cannot take a pointer to an element and expect it to stay valid, if the data structure gets modified. But can I make a pointer to the structure itself and know that it will always stay valid? I would assume the structure has some kind of anchor in the stack, that will always have the same address or something, regardless of where its elements are allocated... ?
so, can I safely do something like this with STL dynamic containers?
int main()
{
std::set<int> s;
std::set<int>* s_ptr = &s;
for (int i = 0; i < 1000000; ++i)
{
s.insert(i);
}
std::cout << s_ptr->size() << std::endl;
}
In my test this did work. But because of UB I cannot rely on that.
Your usage of the pointer is safe. The pointer will be valid as long as s is alive. In this case, s will be alive until the end of the function.

How does an allocator create and destroy an array?

How does an allocator create and destroy and array, for example
int* someInt = someAllocator(3);
Where without the allocator it would just be
int* someInt = new int[3];
Where the allocator is responsible for create each element and ensuring the constructor will be called.
How is the internals for an allocator written without the use of new? Could someone provide and example of the function?
I do not want to just use std::vector as I am trying to learn how an allocator will create an array.
The problem of general memory allocation is a surprisingly tricky one. Some consider it solved and some unsolvable ;) If you are interested in internals, start by taking a look at Doug Lea's malloc.
The specialized memory allocators are typically much simpler - they trade the generality (e.g. by making the size fixed) for simplicity and performance. Be careful though, using general memory allocation is usually better than a hodge-podge of special allocators in realistic programs.
Once a block of memory is allocated through the "magic" of the memory allocator, it can be initialized at container's pleasure using placement new.
--- EDIT ---
The placement new is not useful for "normal" programming - you'd only need it when implementing your own container to separate memory allocation from object construction. That being said, here is a slightly contrived example for using placement new:
#include <new> // For placement new.
#include <cassert>
#include <iostream>
class A {
public:
A(int x) : X(x) {
std::cout << "A" << std::endl;
}
~A() {
std::cout << "~A" << std::endl;
}
int X;
};
int main() {
// Allocate a "dummy" block of memory large enough for A.
// Here, we simply use stack, but this could be returned from some allocator.
char memory_block[sizeof(A)];
// Construct A in that memory using placement new.
A* a = new(memory_block) A(33);
// Yup, it really is constructed!
assert(a->X == 33);
// Destroy the object, wihout freeing the underlying memory
// (which would be disaster in this case, since it is on stack).
a->~A();
return 0;
}
This prints:
A
~A
--- EDIT 2 ---
OK, here is how you do it for the array:
int main() {
// Number of objects in the array.
const size_t count = 3;
// Block of memory big enough to fit 'count' objects.
char memory_block[sizeof(A) * count];
// To make pointer arithmetic slightly easier.
A* arr = reinterpret_cast<A*>(memory_block);
// Construct all 3 elements, each with different parameter.
// We could have just as easily skipped some elements (e.g. if we
// allocated more memory than is needed to fit the actual objects).
for (int i = 0; i < count; ++i)
new(arr + i) A(i * 10);
// Yup, all of them are constructed!
for (int i = 0; i < count; ++i) {
assert(arr[i].X == i * 10);
}
// Destroy them all, without freeing the memory.
for (int i = 0; i < count; ++i)
arr[i].~A();
return 0;
}
BTW, if A had a default constructor, you could try call it on all elements like this...
new(arr) A[count];
...but this would open a can of worms you really wouldn't want to deal with.
I've written about it in my second example here:
How to create an array while potentially using placement new
The difference is that the t_allocator::t_array_record would be managed by the allocator rather than the client.

How is vector implemented in C++

I am thinking of how I can implement std::vector from the ground up.
How does it resize the vector?
realloc only seems to work for plain old stucts, or am I wrong?
it is a simple templated class which wraps a native array. It does not use malloc/realloc. Instead, it uses the passed allocator (which by default is std::allocator).
Resizing is done by allocating a new array and copy constructing each element in the new array from the old one (this way it is safe for non-POD objects). To avoid frequent allocations, often they follow a non-linear growth pattern.
UPDATE: in C++11, the elements will be moved instead of copy constructed if it is possible for the stored type.
In addition to this, it will need to store the current "size" and "capacity". Size is how many elements are actually in the vector. Capacity is how many could be in the vector.
So as a starting point a vector will need to look somewhat like this:
template <class T, class A = std::allocator<T> >
class vector {
public:
// public member functions
private:
T* data_;
typename A::size_type capacity_;
typename A::size_type size_;
A allocator_;
};
The other common implementation is to store pointers to the different parts of the array. This cheapens the cost of end() (which no longer needs an addition) ever so slightly at the expense of a marginally more expensive size() call (which now needs a subtraction). In which case it could look like this:
template <class T, class A = std::allocator<T> >
class vector {
public:
// public member functions
private:
T* data_; // points to first element
T* end_capacity_; // points to one past internal storage
T* end_; // points to one past last element
A allocator_;
};
I believe gcc's libstdc++ uses the latter approach, but both approaches are equally valid and conforming.
NOTE: This is ignoring a common optimization where the empty base class optimization is used for the allocator. I think that is a quality of implementation detail, and not a matter of correctness.
Resizing the vector requires allocating a new chunk of space, and copying the existing data to the new space (thus, the requirement that items placed into a vector can be copied).
Note that it does not use new [] either -- it uses the allocator that's passed, but that's required to allocate raw memory, not an array of objects like new [] does. You then need to use placement new to construct objects in place. [Edit: well, you could technically use new char[size], and use that as raw memory, but I can't quite imagine anybody writing an allocator like that.]
When the current allocation is exhausted and a new block of memory needs to be allocated, the size must be increased by a constant factor compared to the old size to meet the requirement for amortized constant complexity for push_back. Though many web sites (and such) call this doubling the size, a factor around 1.5 to 1.6 usually works better. In particular, this generally improves chances of re-using freed blocks for future allocations.
From Wikipedia, as good an answer as any.
A typical vector implementation consists, internally, of a pointer to
a dynamically allocated array,[2] and possibly data members holding
the capacity and size of the vector. The size of the vector refers to
the actual number of elements, while the capacity refers to the size
of the internal array. When new elements are inserted, if the new size
of the vector becomes larger than its capacity, reallocation
occurs.[2][4] This typically causes the vector to allocate a new
region of storage, move the previously held elements to the new region
of storage, and free the old region. Because the addresses of the
elements change during this process, any references or iterators to
elements in the vector become invalidated.[5] Using an invalidated
reference causes undefined behaviour
Like this:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_vector.h
(official gcc mirror on github)
///Implement Vector class
class MyVector {
int *int_arr;
int capacity;
int current;
public:
MyVector() {
int_arr = new int[1];
capacity = 1;
current = 0;
}
void Push(int nData);
void PushData(int nData, int index);
void PopData();
int GetData(int index);
int GetSize();
void Print();
};
void MyVector::Push(int data)
{
if (current == capacity){
int *temp = new int[2 * capacity];
for (int i = 0; i < capacity; i++)
{
temp[i] = int_arr[i];
}
delete[] int_arr;
capacity *= 2;
int_arr = temp;
}
int_arr[current] = data;
current++;
}
void MyVector::PushData(int data, int index)
{
if (index == capacity){
Push(index);
}
else
int_arr[index] = data;
}
void MyVector::PopData(){
current--;
}
int MyVector::GetData(int index)
{
if (index < current){
return int_arr[index];
}
}
int MyVector::GetSize()
{
return current;
}
void MyVector::Print()
{
for (int i = 0; i < current; i++) {
cout << int_arr[i] << " ";
}
cout << endl;
}
int main()
{
MyVector vect;
vect.Push(10);
vect.Push(20);
vect.Push(30);
vect.Push(40);
vect.Print();
std::cout << "\nTop item is "
<< vect.GetData(3) << std::endl;
vect.PopData();
vect.Print();
cout << "\nTop item is "
<< vect.GetData(1) << endl;
return 0;
}
It allocates a new array and copies everything over. So, expanding it is quite inefficient if you have to do it often. Use reserve() if you have to use push_back().
You'd need to define what you mean by "plain old structs."
realloc by itself only creates a block of uninitialized memory. It does no object allocation. For C structs, this suffices, but for C++ it does not.
That's not to say you couldn't use realloc. But if you were to use it (note you wouldn't be reimplementing std::vector exactly in this case!), you'd need to:
Make sure you're consistently using malloc/realloc/free throughout your class.
Use "placement new" to initialize objects in your memory chunk.
Explicitly call destructors to clean up objects before freeing your memory chunk.
This is actually pretty close to what vector does in my implementation (GCC/glib), except it uses the C++ low-level routines ::operator new and ::operator delete to do the raw memory management instead of malloc and free, rewrites the realloc routine using these primitives, and delegates all of this behavior to an allocator object that can be replaced with a custom implementation.
Since vector is a template, you actually should have its source to look at if you want a reference – if you can get past the preponderance of underscores, it shouldn't be too hard to read. If you're on a Unix box using GCC, try looking for /usr/include/c++/version/vector or thereabouts.
You can implement them with resizing array implementation.
When the array becomes full, create an array with twice as much the size and copy all the content to the new array. Do not forget to delete the old array.
As for deleting the elements from vector, do resizing when your array becomes a quarter full. This strategy makes prevents any performance glitches when one might try repeated insertion and deletion at half the array size.
It can be mathematically proved that the amortized time (Average time) for insertions is still linear for n insertions which is asymptotically the same as you will get with a normal static array.
realloc only works on heap memory. In C++ you usually want to use the free store.