I want to implement an array that can grow as new values are added, just like in Java. I have no idea how to do this. Can anyone show me a way?
This is for learning purposes, so I cannot use std::vector.
Here's a starting point: you only need three variables: nelems, capacity, and a pointer to the actual array. So your class would start off as
class dyn_array
{
    T *data;
    size_t nelems, capacity;
};
where T is the type of data you want to store; for extra credit, make this a template class. Now implement the algorithms discussed in your textbook or on the Wikipedia page on dynamic arrays.
Note that the new/delete allocation mechanism does not support growing an array the way C's realloc does, so you'll actually be moving data's contents around when growing the capacity.
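For instance, the growth step of push_back might look like this; a rough sketch, where the doubling policy and the requirement that T be default-constructible and copyable are my choices, not part of the exercise:
#include <cstddef>

template <typename T>
class dyn_array
{
    T *data;
    std::size_t nelems, capacity;

public:
    dyn_array() : data(0), nelems(0), capacity(0) {}

    void push_back(const T& value)
    {
        if (nelems == capacity) {
            std::size_t newcap = capacity == 0 ? 1 : capacity * 2; // classic doubling
            T *newdata = new T[newcap];        // no realloc for new/delete...
            for (std::size_t i = 0; i != nelems; ++i)
                newdata[i] = data[i];          // ...so move the contents by hand
            delete[] data;
            data = newdata;
            capacity = newcap;
        }
        data[nelems++] = value;
    }
};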
I would like to take the opportunity to interest you in a fascinating but somewhat difficult topic: exceptions.
If you start allocating memory yourself and subsequently playing with raw pointers, you will find yourself in the difficult position of having to avoid memory leaks.
Even if you entrust the book-keeping of the memory to a dedicated class (say, std::unique_ptr<char[]>), you still have to ensure that operations that change the object leave it in a consistent state should they fail.
For example, here is a simple class with an incorrect resize method (which is at the heart of most of the code):
template <typename T>
class DynamicArray {
public:
    // Constructor
    DynamicArray(): size(0), capacity(0), buffer(0) {}

    // Destructor
    ~DynamicArray() {
        if (buffer == 0) { return; }
        for (size_t i = 0; i != size; ++i) {
            T* t = buffer + i;
            t->~T();
        }
        free(buffer); // using delete[] would require all objects to be built
    }

private:
    size_t size;
    size_t capacity;
    T* buffer;
};
Okay, so that's the easy part (although already a bit tricky).
Now, how do you grow the array when pushing a new element at the end?
template <typename T>
void DynamicArray<T>::resize(size_t n) {
    // The *easy* case: shrinking only destroys the tail elements
    if (n <= size) {
        while (size > n) {
            --size;
            (buffer + size)->~T();
        }
        return;
    }

    // The *hard* case: growing
    // new size
    size_t const oldsize = size;
    size = n;

    // new capacity
    if (capacity == 0) { capacity = 1; }
    while (capacity < n) { capacity *= 2; }

    // new buffer (the malloc result goes unchecked; this is only a sketch)
    T* newbuffer = (T*)malloc(capacity*sizeof(T));
    size_t constructed = 0;
    try {
        // copy the existing elements...
        for (; constructed != oldsize; ++constructed) {
            new (newbuffer + constructed) T(*(buffer + constructed));
        }
        // ...and default-construct the new tail
        for (; constructed != n; ++constructed) {
            new (newbuffer + constructed) T();
        }
        free(buffer);
        buffer = newbuffer;
    } catch(...) {
        // destroy whatever was built before the exception, then release the memory
        while (constructed != 0) {
            --constructed;
            (newbuffer + constructed)->~T();
        }
        free(newbuffer);
        throw;
    }
}
Feels right, no?
I mean, we even take care of a possible exception raised by T's copy constructor! Yeah!
Do note the subtle issue we have though: if an exception is thrown, we have changed the size and capacity members but still have the old buffer.
The fix is obvious, of course: we should first change the buffer, and then the size and capacity. Of course...
But it is "difficult" to get it right.
I would recommend using an alternative approach: create an immutable array class (the capacity should be immutable, not the rest), and implement an exception-less swap method.
Then, you'll be able to implement the "transaction-like" semantics much more easily.
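Here is a minimal sketch of that design (raw_buffer, grow, and all member names are mine, not an established API): the capacity is fixed at construction, and swap only exchanges three scalars, so it cannot fail.
#include <cstddef>
#include <new>
#include <algorithm> // std::swap

// Fixed-capacity storage; only the size may change after construction.
template <typename T>
class raw_buffer {
public:
    explicit raw_buffer(std::size_t capacity)
        : data_(static_cast<T*>(::operator new(capacity * sizeof(T)))),
          capacity_(capacity), size_(0) {}

    ~raw_buffer() {
        while (size_ != 0) { --size_; (data_ + size_)->~T(); }
        ::operator delete(data_);
    }

    // Precondition: size() < capacity. May throw (T's copy constructor).
    void push_back(const T& t) { new (data_ + size_) T(t); ++size_; }

    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) { return data_[i]; }

    // The key operation: exchanging three scalars cannot throw.
    void swap(raw_buffer& other) /* throw() */ {
        std::swap(data_, other.data_);
        std::swap(capacity_, other.capacity_);
        std::swap(size_, other.size_);
    }

private:
    raw_buffer(const raw_buffer&);            // not copyable
    raw_buffer& operator=(const raw_buffer&); // not assignable

    T* data_;
    std::size_t capacity_;
    std::size_t size_;
};

// Growing is now transactional: prepare a copy on the side, then commit.
// Precondition: new_capacity >= storage.size().
template <typename T>
void grow(raw_buffer<T>& storage, std::size_t new_capacity) {
    raw_buffer<T> tmp(new_capacity);
    for (std::size_t i = 0; i != storage.size(); ++i)
        tmp.push_back(storage[i]); // may throw: storage is still intact
    tmp.swap(storage);             // commit: cannot fail
}                                  // tmp's destructor cleans up the old buffer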
An array which grows dynamically as we add elements is called a dynamic array or growable array; here is a complete implementation of a dynamic array.
In C and C++, array notation is basically just shorthand pointer arithmetic.
So given this array:
int fib[] = { 1, 1, 2, 3, 5, 8, 13 };
This:
int position5 = fib[5];
Is the same thing as saying this:
int position5 = *(int *)((char *)fib + (5 * sizeof(int)));
So arrays and pointers are intimately related; in most expressions an array name decays to a pointer to its first element.
So if you want automatic growth, you will need to write some wrapper functions that call malloc() or new (C and C++ respectively).
Although you might find vectors are what you are looking for...
Say, for example, I have a function that takes some arguments and a size_t length to initialize an array on the stack inside the function.
Considering the following:
Strictly, the length can only be in the range of 1 to 30 (using a fixed max buffer length of 30 is not allowed).
The array only stays inside the function and is only used to compute a result.
int foo(/*some argument, ..., ... */ size_t length) {
    uint64_t array[length]; // variable-length array: a compiler extension, not standard C++
    int some_result = 0;
    // some code that uses the array to compute something ...
    return some_result;
}
In normal cases I would use std::vector, new, or the *alloc functions for this, but... I'm trying to optimize, since said function is being called repeatedly throughout the lifetime of the program, making the heap allocations a large overhead.
Initially, using an array on the stack with a fixed size was the solution I came up with, but I cannot do this, for some reasons that I cannot tell, since it would be rude.
Anyway, I wonder if I can get away with this approach without encountering any problems in the future?
In the rare cases where I've done some image processing with large fixed-size temp buffers, or just wanted to avoid the runtime cost of redundant alloc/free calls, I've made my own heap.
It doesn't make a lot of sense for small allocations, where you could just use the stack, but you indicated your instructor said not to do this. So you could try something like this:
#include <cstddef>
#include <list>
#include <memory>
#include <unordered_map>
#include <utility>
using namespace std;

template<typename T>
struct ArrayHeap {
    // free buffers, keyed by their length
    unordered_map<size_t, list<shared_ptr<T[]>>> available;
    // buffers currently handed out: raw pointer -> (length, owning pointer)
    // note: shared_ptr<T[]> needs C++17
    unordered_map<T*, pair<size_t, shared_ptr<T[]>>> inuse;

    T* Allocate(size_t length) {
        auto &l = available[length];
        shared_ptr<T[]> ptr;
        if (l.size() == 0) {
            ptr.reset(new T[length]); // first request for this length: really allocate
        } else {
            ptr = l.front();          // reuse a buffer returned earlier
            l.pop_front();
        }
        inuse[ptr.get()] = {length, ptr};
        return ptr.get();
    }

    void Deallocate(T* allocation) {
        auto itor = inuse.find(allocation);
        if (itor == inuse.end()) {
            // assert: this pointer did not come from Allocate
        } else {
            auto &p = itor->second;
            size_t length = p.first;
            shared_ptr<T[]> ptr = p.second;
            inuse.erase(allocation);
            // optional - you can choose not to push the pointer back onto the available
            // list if you have some criteria by which you want to reduce memory usage
            available[length].push_back(ptr);
        }
    }
};
In the above code, you can Allocate a buffer of a specific length. The first time it is invoked for a given length value, it will incur the overhead of a real allocation with new. But once the buffer has been returned to the heap, a second allocation for a buffer of the same length will be fast.
Then your function can be implemented like this:
ArrayHeap<uint64_t> global_heap;

int foo(/*some argument, ..., ... */ size_t length) {
    uint64_t* array = global_heap.Allocate(length);
    int some_result = 0;
    // some code that uses the array to compute something ...
    global_heap.Deallocate(array);
    return some_result;
}
Personally, I would use a fixed-size array on the stack, but if there are reasons to prohibit that, then check whether any of them also apply to the alloca() method.
man 3 alloca
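For what it's worth, here is a sketch of the alloca() variant. It relies on the 1..30 bound from the question, since alloca gives no way to detect stack exhaustion, and it is a non-standard function:
#include <alloca.h> // non-standard; declared in <malloc.h> on some platforms
#include <cstddef>
#include <cstdint>

int foo(/*some argument, ..., ... */ std::size_t length) {
    // freed automatically when foo returns; only safe because length <= 30
    std::uint64_t* array =
        static_cast<std::uint64_t*>(alloca(length * sizeof(std::uint64_t)));
    int some_result = 0;
    // some code that uses the array to compute something ...
    return some_result;
}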
I've searched for similar questions but I couldn't find any satisfactory for my needs.
I'm a Computer Science student currently studying Algorithms and Data Structures. For my exam, I had to implement a collection of templatized data structures in C++. I was not allowed to use the STL, as the exam question was precisely how to implement a library similar to the STL.
My implementation works, however I would like to ask you for advice about dynamic memory allocation.
Some of these data structures use a dynamic array (actually a raw pointer) to store elements, which automatically grows when full and shrinks under a certain load-factor threshold (doubling and halving its size, respectively). For the sake of simplicity (and also because I'm not supposed to use them), I didn't use any of the "modern stuff" such as smart pointers or move constructors/assignment operators; basically I relied on C++98 features. I used new[] and delete[], but I read everywhere that this is bad practice.
My question is: what is the proper way to handle dynamic memory allocation for array-based data structures in C++?
Here's an example of what I did (the array has been previously allocated by new[]):
template <typename T>
void ArrayList<T>::pushBack(T item)
{
    if (size < capacity) { // if there's room in the array
        array[size] = item; // simply add the new item
    } else { // otherwise allocate a bigger array
        capacity *= 2;
        T *temp = new T[capacity];
        // copy elements from the old array to the new one
        for (int i = 0; i < size; ++i)
            temp[i] = array[i];
        delete [] array;
        temp[size] = item;
        array = temp;
    }
    ++size;
}
No, you still don't need new and delete. The only reason to still use new in C++ is to perform aggregate initialization, which std::make_unique does not support, and you never need delete at all.
Your code sample then becomes:
template <typename T>
void ArrayList<T>::pushBack(T item)
{
    if (size < capacity) { // if there's room in the array
        array[size] = item; // simply add the new item
    } else { // otherwise allocate a bigger array
        capacity *= 2;
        auto temp = std::make_unique<T[]>(capacity);
        // copy elements from the old array to the new one
        for (int i = 0; i < size; ++i)
            temp[i] = array[i];
        temp[size] = item;
        array = std::move(temp);
    }
    ++size;
}
Which can also be factored down by swapping the two sections:
template <typename T>
void ArrayList<T>::pushBack(T item)
{
    if (size >= capacity) { // if there's no room in the array, reallocate
        capacity *= 2;
        auto temp = std::make_unique<T[]>(capacity);
        // copy elements from the old array to the new one
        for (int i = 0; i < size; ++i)
            temp[i] = array[i];
        array = std::move(temp);
    }
    array[size] = item; // simply add the new item
    ++size;
}
Further possible improvements: move the elements when reallocating instead of copying them, use a standard algorithm instead of the manual for loop.
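Applying both suggestions, a sketch (still assuming, as above, that array is a std::unique_ptr<T[]> member):
#include <algorithm> // std::move (the range overload)
#include <memory>    // std::make_unique
#include <utility>   // std::move (the cast)

template <typename T>
void ArrayList<T>::pushBack(T item)
{
    if (size >= capacity) { // no room: reallocate
        capacity *= 2;
        auto temp = std::make_unique<T[]>(capacity);
        // move the old elements instead of copying them
        std::move(array.get(), array.get() + size, temp.get());
        array = std::move(temp);
    }
    array[size] = std::move(item); // item was taken by value, so move it into place
    ++size;
}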
I believe, for this project, that it is indeed proper to use new and delete; my Data Structures teacher uses the very same style of memory allocation. Seemingly, the general reason people disapprove of manually managing allocated memory is that it can be hard to manage properly. It's just important to remember to delete all memory that you are no longer using -- you don't want any orphaned RAM on your hands!
I am trying to implement a thread-safe lockless container, analogous to std::vector, following this article: https://software.intel.com/en-us/blogs/2008/07/24/tbbconcurrent_vector-secrets-of-memory-organization
From what I understood, to prevent re-allocations and the invalidation of all iterators on all threads, instead of a single contiguous array, they add new contiguous blocks.
Each block they add has a size that is an increasing power of 2, so they can use log(index) to find the proper segment where the item at [index] is supposed to be.
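For example, with block sizes 2, 4, 8, ..., the segment and offset for an index can be computed with shifts alone; this is my sketch of the arithmetic, not TBB's actual code:
#include <cstddef>

// Given block sizes 2, 4, 8, ... (so segment k holds 2^(k+1) elements),
// adding 2 to the index lands segment boundaries exactly on powers of two.
void locate(std::size_t index, std::size_t& segment, std::size_t& offset)
{
    std::size_t n = index + 2;
    segment = 0;
    while (n >> (segment + 2) != 0) // segment = floor(log2(n)) - 1
        ++segment;
    offset = n - (std::size_t(1) << (segment + 1));
}
// e.g. index 0 -> segment 0, offset 0; index 5 -> segment 1, offset 3;
//      index 6 -> segment 2, offset 0.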
From what I gather, they have a static array of pointers to segments, so they can quickly access them; however, they don't know how many segments the user wants, so they made a small initial one, and if the number of segments exceeds the current count, they allocate a huge one and switch to using that.
The problem is, adding a new segment can't be done in a lockless thread-safe manner, or at least I haven't figured out how. I can atomically increment the current size, but only that.
And also, switching from the small to the large array of segment pointers involves a big allocation and memory copies, so I can't understand how they are doing it.
They have some code posted online, but all the important functions are without available source code; they are in their Threading Building Blocks DLL. Here is some code that demonstrates the issue:
template<typename T>
class concurrent_vector
{
private:
    int size = 0;
    int lastSegmentIndex = 0;

    union
    {
        T* segmentsSmall[3];
        T** segmentsLarge;
    };

    void switch_to_large()
    {
        //Bunch of allocations; creates a T* segmentsLarge[32] basically and reassigns all old entries into it
    }

public:
    concurrent_vector()
    {
        //The initial array is contiguous just for the sake of cache optimization
        T* initialContiguousBlock = new T[2 + 4 + 8]; //2^1 + 2^2 + 2^3
        segmentsSmall[0] = initialContiguousBlock;
        segmentsSmall[1] = initialContiguousBlock + 2;
        segmentsSmall[2] = initialContiguousBlock + 2 + 4;
    }

    void push_back(T& item)
    {
        if(size > 2 + 4 + 8)
        {
            //This is the problem part: there is no possible way to make this
            //thread-safe without a mutex lock. I don't understand how Intel does it.
            //It includes a bunch of allocations and memory copies.
            switch_to_large();
        }

        InterlockedIncrement(&size); //Ok, so size is atomically increased

        //afterwards adds the item to the appropriate slot in the appropriate segment
    }
};
I would not try to make segmentsLarge and segmentsSmall a union. Yes, this wastes one more pointer. But then the pointer, let's call it just segments, can initially point to segmentsSmall.
On the other hand, the other methods can always use the same pointer, which makes them simpler.
And switching from small to large can be accomplished by one compare-exchange of a pointer.
I am not sure how this could be accomplished safely with a union.
The idea would look something like this (note that I used C++11, which the Intel library predates, so they likely did it with their atomic intrinsics).
This probably misses quite a few details which I am sure the Intel people have thought more about, so you will likely have to check this against the implementations of all other methods.
#include <atomic>
#include <array>
#include <algorithm>
#include <cstddef>
#include <climits>

template<typename T>
class concurrent_vector
{
private:
    // enough segment slots to cover every power of two a size_t index can reach
    static const unsigned numSegments = sizeof(size_t) * CHAR_BIT;

    std::atomic<size_t> size;
    std::atomic<T**> segments;
    std::array<T*, 3> segmentsSmall;
    unsigned lastSegmentIndex = 0;

    void switch_to_large()
    {
        T** segmentsOld = segments;
        if (segmentsOld == segmentsSmall.data()) {
            // not yet switched
            T** segmentsLarge = new T*[numSegments];
            // note that we leave the original segment allocations alone and just copy the pointers
            std::copy(segmentsSmall.begin(), segmentsSmall.end(), segmentsLarge);
            for (unsigned i = segmentsSmall.size(); i < numSegments; ++i) {
                segmentsLarge[i] = nullptr;
            }
            // now both the old and the new segments array are valid
            if (segments.compare_exchange_strong(segmentsOld, segmentsLarge)) {
                // success!
                return;
            } else {
                // another thread already switched, just clean up
                delete[] segmentsLarge;
            }
        }
    }

public:
    concurrent_vector() : size(0), segments(segmentsSmall.data())
    {
        //The initial array is contiguous just for the sake of cache optimization
        T* initialContiguousBlock = new T[2 + 4 + 8]; //2^1 + 2^2 + 2^3
        segmentsSmall[0] = initialContiguousBlock;
        segmentsSmall[1] = initialContiguousBlock + 2;
        segmentsSmall[2] = initialContiguousBlock + 2 + 4;
    }

    void push_back(T& item)
    {
        if (size > 2 + 4 + 8) {
            switch_to_large();
        }
        // here we may have to allocate more segments atomically
        ++size;
        //afterwards adds the item to the appropriate slot in the appropriate segment
    }
};
In order to use placement new instead of automatically attempting to call the default constructor, I'm allocating an array using reinterpret_cast<Object*>(new char[num_elements * sizeof(Object)]) instead of new Object[num_elements].
However, I'm not sure how I should be deleting the array so that the destructors get called correctly. Should I loop through the elements, call the destructor manually for each element, and then cast the array to a char* and use delete[] on that, like this:
for (size_t i = 0; i < num_elements; ++i) {
    array[i].~Object();
}
delete[] reinterpret_cast<char*>(array);
Or is it sufficient if I don't call the destructor manually for each element, and simply rely on delete[] to do that since the type of the array is Object*, like delete[] array?
What I'm worried about is that not every platform might be able to determine the number of elements in the array correctly that way, because I didn't allocate the array using a type of the right size. An answer to a question about "how delete[] knows the size of the operand" suggests that a possible implementation of delete[] is to store the number of allocated elements (rather than the number of bytes).
If delete[] is indeed implemented that way, that would suggest that using just delete[] array would try to delete too many elements, because the array was created with more char elements than how many Object elements fit in it. So in that case, the only reliable way to delete the array would be to manually call the destructors, cast the array to a char*, and then use delete[].
However, another logical way to implement it would be to store the size of the array in bytes, rather than the amount of elements, and then when calling delete[], divide the size of the array by the size of the type to get the amount of elements to call the destructor of. If this method is used, then just using delete[] array where array has a type of Object* would be sufficient.
So my question is: can I rely on delete[] to correctly call the destructors of the elements in the operand array, if the array was originally not allocated with the right type?
This is the code I'm using:
template <typename NumberType>
NeuronLayer<NumberType>::NeuronLayer(size_t num_inputs, size_t num_neurons, const NumberType *weights)
    : neurons(reinterpret_cast<Neuron<NumberType>*>(new char[num_neurons * sizeof(Neuron<NumberType>)])),
      num_neurons(num_neurons), num_weights(0) {
    for (size_t i = 0; i < num_neurons; ++i) {
        Neuron<NumberType> &neuron = neurons[i];
        new(&neuron) Neuron<NumberType>(num_inputs, weights + num_weights);
        num_weights += neuron.GetNumWeights();
    }
}
and
template <typename NumberType>
NeuronLayer<NumberType>::~NeuronLayer() {
    delete[] neurons;
}
or
template <typename NumberType>
NeuronLayer<NumberType>::~NeuronLayer() {
    for (size_t i = 0; i < num_neurons; ++i) {
        neurons[i].~Neuron();
    }
    delete[] reinterpret_cast<char*>(neurons);
}
Calling delete[] on an Object* will call the destructor once for every object allocated by new[]. new Object[N] typically stores N before the actual array, and delete[] certainly knows where to look.
Your code doesn't store that count. And it can't, since it's an unspecified implementation detail where and how the count is stored. As you speculate, there are two obvious ways: element count and array size, and one obvious location (before the array). Even so, there could be alignment issues, and you can't predict what type is used for the size.
Also, new unsigned char[N] is a special case, since delete[] doesn't need to call destructors of char; in that case new[] doesn't need to store N at all. So you can't even count on a size being stored, even though new Object[N] would have stored one.
Here is portable code that manages a dynamic array of objects. It's essentially what std::vector does:
// allocate raw memory only; no constructors run
void * addr = ::operator new(sizeof(Object) * num_elements);
Object * p = static_cast<Object *>(addr);

// construct each element in place
for (std::size_t i = 0; i != num_elements; ++i)
{
    ::new (p + i) Object(/* some initializer */);
}

// ... use the array ...

// destroy the elements, in reverse order of construction
for (std::size_t i = 0; i != num_elements; ++i)
{
    std::size_t ri = num_elements - i - 1;
    (p + ri)->~Object();
}

// release the raw memory; no destructors run
::operator delete(addr);
This is the general pattern for how you should organize dynamic storage if you want very low-level control. The upshot is that dynamic arrays arguably should never have been a language feature and are much better implemented as a library. As I said above, this code is pretty much identical to the existing standard library gadget called std::vector<Object>.
I wrote a program which solves the flow shop scheduling problem.
I need help with optimizing the slowest parts of my program:
Firstly, there is the 2D array allocation:
this->_perm = new Chromosome*[f];
//... for (...)
this->_perm[i] = new Chromosome[fM1];
It works just fine, but a problem occurs when I try to delete the array:
delete [] _perm[i];
It takes extremely long to execute the line above. Each _perm[i] is an array of about 300k Chromosome elements - allocating it takes less than a second, but deleting it takes far more than a minute.
I would appreciate any suggestions for improving the delete part.
On a general note, you should never manually manage memory in C++. This will lead to leaks, double-deletions and all kinds of nasty inconveniences. Use proper resource-handling classes for this. For example, std::vector is what you should use for managing a dynamically allocated array.
To get back to your problem at hand, you first need to know what delete [] _perm[i] does: It calls the destructor for every Chromosome object in that array and then frees the memory. Now you do this in a loop, which means this will call all Chromosome destructors and perform f deallocations. As was already mentioned in a comment to your question, it is very likely that the Chromosome destructor is the actual culprit. Try to investigate that.
You can, however, change your memory handling to improve the speed of allocation and deallocation. As Nawaz has shown, you could allocate one big chunk of memory and use that. I'd use a std::vector for a buffer:
void f(std::size_t row, std::size_t col)
{
    std::size_t const sizeMemory = sizeof(Chromosome) * row * col;
    std::vector<unsigned char> buffer(sizeMemory); // allocation of all the memory at once!
    std::vector<Chromosome*> chromosomes(row);
    // construct the Chromosomes into buffer with placement new, as shown by Nawaz
    std::size_t j = 0;
    for (std::size_t i = 0; i < row; i++)
    {
        //...
    }
    make_baby(chromosomes); // use chromosomes
    // destroy the objects constructed in the raw buffer (not the pointers!)
    Chromosome* const first = reinterpret_cast<Chromosome*>(&buffer[0]);
    in_place_destruct(first, first + row * col);
    // automatic freeing of memory holding pointers in chromosomes
    // automatic freeing of buffer memory
}

template< typename InpIt >
void in_place_destruct(InpIt begin, InpIt end)
{
    typedef typename std::iterator_traits<InpIt>::value_type value_type; // needed to name the dtor
    while (begin != end)
        (begin++)->~value_type(); // explicit destructor call, no deallocation
}
However, despite handling all memory through std::vector this still is not fully exception-safe, as it needs to call the Chromosome destructors explicitly. (If make_baby() throws an exception, the function f() will be aborted early. While the destructors of the vectors will delete their content, one only contains pointers, and the other treats its content as raw memory. No guard is watching over the actual objects created in that raw memory.)
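One way to close that gap, sketched here as an illustration (the guard type is my own, not part of the original suggestion):
// Destroys the Chromosomes constructed in the raw buffer when it goes out
// of scope, whether f() returns normally or via an exception.
struct destruct_guard
{
    Chromosome* first;
    std::size_t count;
    destruct_guard(Chromosome* f, std::size_t n) : first(f), count(n) {}
    ~destruct_guard() { in_place_destruct(first, first + count); }
};
Declaring such a guard right after the construction loop would make f() exception-safe without giving up the flat buffer.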
The best solution I can see is to use a one-dimensional array wrapped in a class that allows two-dimensional access to the elements in that array. (Memory is one-dimensional, after all, on current hardware, so the system is already doing this.) Here's a sketch of that:
class chromosome_matrix {
public:
    chromosome_matrix(std::size_t row, std::size_t col)
        : row_(row), col_(col), data_(row*col)
    {
        // data_ contains row*col constructed Chromosome objects
    }

    // not needed, the compiler-generated dtor will do the right thing
    //~chromosome_matrix()

    // these rely on pointer arithmetic to access a row
    Chromosome* operator[](std::size_t row) {return &data_[row*col_];}
    const Chromosome* operator[](std::size_t row) const {return &data_[row*col_];}

private:
    std::size_t row_;
    std::size_t col_;
    std::vector<Chromosome> data_;
};

void f(std::size_t row, std::size_t col)
{
    chromosome_matrix cm(row, col);

    Chromosome* row0 = cm[0];           // get a whole row
    Chromosome& chromosome1 = row0[0];  // get one object from it
    Chromosome& chromosome2 = cm[1][2]; // access an object directly
    // make baby
}
Check your destructors.
If you were allocating a built-in type (e.g. an int), then allocating 300,000 of them would be more expensive than the corresponding delete. But that's a relative term; 300k allocated in a single block is pretty fast.
As you're allocating 300k Chromosomes, the allocator has to allocate 300k * sizeof(Chromosome) bytes, and as you say it's fast - I can't see it doing much besides just that (i.e. the constructor calls are optimised into nothingness).
However, when you come to delete, it not only frees up all that memory, but also calls the destructor for each object, and if it's slow, I would guess that the destructor of each object takes a small, but noticeable, time when you have 300k of them.
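For example (purely hypothetical, since the question doesn't show the Chromosome class), a per-object heap allocation is enough to turn delete[] into 300k extra free calls:
class Chromosome
{
public:
    Chromosome() : genes(new double[64]) {}
    ~Chromosome() { delete[] genes; } // one extra heap free per object
    // (copy constructor and assignment omitted for brevity)
private:
    double* genes;
};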
I would suggest you use placement new. The allocation and the deallocation can each be done in just one statement!
std::size_t const sizeMemory = sizeof(Chromosome) * row * col;
char* buffer = new char[sizeMemory]; // allocation of all the memory at once!

std::vector<Chromosome*> chromosomes;
chromosomes.reserve(row);

std::size_t j = 0;
for (int i = 0; i < row; i++)
{
    // only construction of objects - no allocation!
    // (constructing element by element avoids the unspecified bookkeeping
    // overhead that the array form of placement new may add)
    Chromosome* pChromosome = reinterpret_cast<Chromosome*>(&buffer[j]);
    for (int k = 0; k < col; k++)
        new (pChromosome + k) Chromosome();
    chromosomes.push_back(pChromosome);
    j = j + sizeof(Chromosome) * col;
}

for (int i = 0; i < row; i++)
{
    for (int j = 0; j < col; j++)
    {
        // only destruction of objects - no deallocation!
        chromosomes[i][j].~Chromosome();
    }
}

delete[] buffer; // actual deallocation of all the memory at once!
std::vector can help.
Special memory allocators too.