Difference between an allocator and a built in array in c++? - c++

I have lately been trying to create custom containers that are similar to some of the library containers (i.e vector, list). and while I was using an allocator to allocate dynamic memory I noticed that the idea behind allocators and built in arrays are the same. allocators reserve a certain amount of raw, unconstructed dynamic memory and return a pointer to the first free location in that pool of memory. and built in arrays pretty much do the same thing. so if we have an std::allocator for strings called alloc
this codealloc.allocate(7) and this code string* array = new string[7] should have the same effect. and if we want to construct the raw memory we can call std::allocator::construct passing it the pointer returned from the allocate function, or we can have something like array[0] = string("something") to do the same thing. correct?
so what is there a difference between how an allocator work and how a built in array work?

You're right that they're fundamentally related, but not in that way. new string[7] could indeed be decomposed into allocate and construct (with a few extra bits for EH and other details).
Separating them out in the allocator interface allows much more fine-grained control for containers so that they can, for example, have memory with non-constructed objects in them, which is often vital for correct performance guarantees or semantics.
Additionally, The allocator interface is, of course, an interface with many possible implementations, such as memory arenas or object pools, which new string[7] really doesn't offer.
Finally, new T[] is shit and don't ever use it. The allocator interface is designed to be used only by fairly experienced programmers in quite limited ways- as a component of a better library component. new T[] is a language feature that everybody can just use, with terrible results.

An array is collinear container of slots for items in memory. The array is a range.
An allocator is an function object (or function) that reserves memory. The allocator can designate space from an array, stack, heap, or other areas of memory. The allocator can also be used to allocate space outside of the memory area, such as a hard drive or other device (maybe a server, cloud, etc.)
The space allocated for an array is usually determined by the compiler during the build phase.
An allocator is used for dynamic (during run-time) allocation of objects.

Related

Is it safe to "dissolve" c++ arrays on the heap?

I am currently implementing my own vector container and I encountered a pretty interesting Issue(At leas for me). It may be a stupid question but idk.
My vector uses an heap array of pointers to heap allocated objects of unknown type (T**).
I did this because I wanted the pointers and references to individual elements to stay same, even after resizing.
This comes at performance cost when constructing and copying, because I need to create the array on the heap and each object of the array on the heap too. (Heap allocation is slower than on the stack, right?)
T** arr = new *T[size]{nullptr};
and then for each element
arr[i] = new T{data};
Now I wonder if it would be safe, beneficial(faster) and possible, if instead of allocating each object individually, I could create a second array on the heap and save the pointer of each object in the first one.Then use (and delete) these objects later as if they were allocated separately.
=> Is allocating arrays on the heap faster than allocating each object individually?
=> Is it safe to allocate objects in an array and forgetting about the array later? (sounds pretty dumb i think)
Link to my github repo: https://github.com/LinuxGameGeek/personal/tree/main/c%2B%2B/vector
Thanks for your help :)
First a remark, you should not think of the comparison heap/stack in terms of efficiency, but on object lifetime:
automatic arrays (what you call on stack) end their life at the end of the block where they are defined
dynamic arrays (whay you call on heap) exists until they are explicitly deleted
Now it is always more efficient to allocate a bunch of objects in an array than to allocate them separately. You save a number of internal calls and various data structure to maintain the heap. Simply you can only deallocate the array and not the individual objects.
Finally, except for trivially copyable objects, only the compiler and not the programmer knows about the exact allocation detail. For example (and for common implementations) an automatic string (so on stack) contains a pointer to a dynamic char array (so on heap)...
Said differently, unless you plan to only use you container for POD or trivially copyable objects, do not expect to handle all the allocation and deallocation yourself: non trivial objects have internal allocations.
Heap allocation is slower than on the stack, right?
Yes. Dynamic allocation has a cost.
Is allocating arrays on the heap faster than allocating each object individually?
Yes. Multiple allocations have that cost multiplied.
I wonder if it would be ... possible, if instead of allocating each object individually, I could create a second array on the heap and save the pointer of each object in the first one
It would be possible, but not trivial. Think hard how you would implement element erasure. And then think about how you would implement other features such as random access correctly into the container with arrays that contain indices from which elements have been erased.
... safe
It can be implemented safely.
... beneficial(faster)
Of course, reducing allocations from N to 1 would be beneficial by itself. But it comes at the cost of some scheme to implement the erasure. Whether this cost is greater than the benefit of reduced allocations depends on many things such as how the container is used.
Is it safe to allocate objects in an array and forgetting about the array later?
"Forgetting" about an allocation seems like a way to say "memory leak".
You could achieve similar advantages with a custom "pool" allocator. Implementing support for custom allocators to your container might be more generally useful.
P.S. Boost already has a "ptr_vector" container that supports custom allocators. No need to reinvent the wheel.
I did this because I wanted the pointers and references to individual
elements to stay same, even after resizing.
You should just use std::vector::reserve to prevent reallocation of vector data when it is resized.
Vector is quite primitive, but is is highly optimized. It will be extremely hard for you to beat it with your code. Just inspect its API and try its all functionalities. To create something better advanced knowledge of template programing is required (which apparently you do not have yet).
What you are trying to come up with is a use of placement new allocation for a deque-like container. It's a viable optimization, but usually its done to reduce allocation calls and memory fragmentation, e.g. on some RT or embedded systems. The array maybe even a static array in that case. But if you also require that instances of T would occupy adjacent space, that's a contradicting requirement, resorting them would kill any performance gains.
... beneficial(faster)
Depends on T. E.g. there is no point to do that to something like strings or shared pointers. Or anything that actually allocates resources elsewhere, unless T allows to change that behaviour too.
I wonder if it would be ... possible, if instead of allocating each
object individually, I could create a second array on the heap and
save the pointer of each object in the first one
Yes it is possible, even with standard ISO containers, thanks to allocators.
There is concern of thread safety or awareness if this "array" appears to be shared resource between multiple writer and reader threads. You might want to implement thread-local storages instead of using shared one and implement semaphores for crossover cases.
Usual application for that is to allocate not on heap but in statically allocated array, predetermined. Or in array that was allocated once at start of program.
Note that if you use placement new you should not use delete on created objects, you have to call destructor directly. placement new overload is not a true new as far as delete concerned. You may or may not cause error but you certainly will cause an crash if you used static array and you will cause heap corruption when deleting element that got same address as dynamically allocated array beginning
This comes at performance cost when constructing and copying, because I need to create the array on the heap and each object of the array on the heap too.
Copying a POD is extremely cheap. If you research perfect forwarding you can achieve the zero cost abstraction for constructors and the emplace_back() function. When copying, use std::copy() as it is very fast.
Is allocating arrays on the heap faster than allocating each object individually?
Each allocation requires you to ask the operating system for memory. Unless you are asking for a particularly large amount of memory you can assume each request will be a constant amount of time. Instead of asking for a parking space 10 times, ask for 10 parking spaces.
Is it safe to allocate objects in an array and forgetting about the array later? (sounds pretty dumb i think)
Depends what you mean by safe. If you can't answer this question on your own, then you must cleanup the memory and not leak under any circumstance.
An example of a time you might ignore cleaning up memory is when you know the program is going to end and cleaning up memory just to exit is kinda pointless. Still, you should clean it up. Read Serge Ballesta answer for more information about lifetime.

Why do fancy pointers exist?

I've always seen pointers as a specialization of iterators, used in the particular case where objects are stored contiguously in memory. However, I found out that the Allocator member type pointer (which then defines the same member type on iterators and containers) is not required to be a real pointer but it may be what's called a fancy pointer that, according to cppreference,
is used to access objects allocated in address spaces that differ from the homogeneous virtual address space.
This is what I used to call an iterator but not a pointer, but that's not the point. I'm wondering how an Allocator is not required to allocate contiguous memory. Imagine passing a custom allocator to std::vector that does not allocate objects in a contiguous manner. For me that's not a vector anymore. If I don't want objects to be contiguous in memory I use a list rather than a vector with a custom allocator. They just look like a big source of confusion, why were they introduced?
So, C++ as a language has a concept of a pointer. For any type T*, the language requires certain specific things out of it. In particular, sizeof(T*) is statically-determined, and all language pointers have the same size.
In the long-before time however, the things that T* could work with did not reflect hardware capabilities. Imagine if hardware had two pools of memory, but these pools of memory have different maximum sizes. So if the compiler mapped T* to the smaller pool, then one could not have a T* that points to the larger pool of memory. After all, the address space of the larger memory would exceed the size of T*.
This led to the use of things like LONG_POINTER and the like for "pointing" into memory outside of the T* accessible space.
The goal of the allocator::pointer concept is to be able to write an allocator that can allow any allocator-aware container to work with such non-T*-compatible pools of memory transparently.
Of course... that is no longer relevant for most hardware. Virtual memory, 32 and 64-bit addressing, and the like are now standard. As such, if there's memory that's not directly accessible and you want to address it, the expectation is that you'll convert such memory into virtual addresses via OS calls which can then be used with language pointers just fine. See mapping of files and GPU memory.
It was intended to solve a problem at the library level, but the problem was instead solved at the hardware/OS level. Thus, it's an unnecessary complication of allocators and allocator-aware containers, but it remains present because of decisions made nearly 25 years ago.
I've always seen pointers as a specialization of iterators
Iteration of arrays is one of the uses of pointers. It is not their only purpose. They are also used as a "handle" to identify dynamic allocations.
why were [fancy pointers] introduced?
The page that you linked explains a reason for their introduction:
Such pointers were introduced to support segmented memory architectures ...
These days, such architectures are quite rare, but programmers have found other use cases:
... and are used today to access objects allocated in address spaces that differ from the homogeneous virtual address space that is accessed by raw pointers. An example of a fancy pointer is the mapping address-independent pointer boost::interprocess::offset_ptr, which makes it possible to allocate node-based data structures such as std::set in shared memory and memory mapped files mapped in different addresses in every process.
The standard paper P0773R0 linked in the linked page has a more detailed list of purposes:
Scenario A. "Offset" pointers with a modified bit-level representation.
Scenario B. Small-data-area "near" pointers.
Scenario C. High-memory-area "far" pointers.
Scenario D. "Fat" pointers carrying metadata for dereference-time.
Scenario E. "Segmented" pointers carrying metadata for deallocate-time.
Note that not all standard library implementations support all use cases of fancy pointers for all standard containers as explored in P0773R0.

Does a vector (container) need to use an "allocator"?

I was looking into how custom containers are created, such as eastl's container and several other models and I see that they all use an "allocator", much like std::vector does with std::allocator. Which got me thinking, why do new implementations of a vector container use an allocator when they typically have an underlying memory management override for new and delete?
Being able to replace operator new() and operator delete() (and their array versions) at program level may be sufficient for small program. If you have programs consisting of many millions lines of code, running many different threads this isn't at all suitable. You often want or even need better control. To make the use of custom allocators effective, you also need to be able to allocate subobjects using the same objects as the outer allocator.
For example, consider the use of memory arena to be used when answering a request in some sort of a server which is probably running multiple threads. Getting memory from operator new() is probably fairly expensive because it involves allocating a lock and finding a suitable chunk of memory in a heap which is getting more and more fragmented. To avoid this, you just want to allocate a few chunks of memory (ideally just one but you may not know the needed size in advance) and put all objects there. An allocator can do this. To do so, you need to inform all entities allocating memory about this chunk of memory, i.e. you need to pass the allocator to everything possibly allocating memory. If you allocate e.g. a std::vector<std::string, A> the std::string objects should know about the allocator: just telling the std::vector<std::string, A> where and how to allocate memory isn't enough to avoid most memory allocations: you also need to tell it to the std::string (well, actually the std::basic_string<char, std::char_traits<char>, B> for a suitable allocator type B which is related to A).
That is, if you really mean to take control of your memory allocations, you definitely want to pass allocators to everything which allocates memory. Using replaced versions of the global memory management facilities may help you but it is fairly constrained. If you just want to write a custom container and memory allocation isn't much of your concern you don't necessarily need to bother. In big systems which are running for extensive periods of time memory allocation is one of the many concerns, however.
Allocators are classes that define memory models to be used by Standard Library containers.
Every Standard Library container has its own default allocator, However the users of the container can provide their own allocators over the default.
This is for additional flexibility.
It ensures that users can provide their own allocator which provides an alternate form of memory management(eg: Memory Pools) apart from the regular heap.
If you want to produce a standard-compatible container then the answer is of course yes... allocators are described in the standard so they are required.
In my personal experience however allocators are not that useful... therefore if you are developing a container for a specific use to overcome some structural limitation of the standard containers then I'd suggest to forget about allocators unless you really see a reason for using them.
If instead you are developing a container just because you think you can do better than the standard vector then my guess is that you are wasting your time. I don't like the allocator idea design (dropping on the type something that shouldn't be there) but luckily enough they can be just ignored. The only annoyance with allocators when you don't need them (i.e. always) is probably some more confusion in error messages.. that however are a mess anyway.

Does STL Vector use 'new' and 'delete' for memory allocation by default?

I am working on a plugin for an application, where the memory should be allocated by the Application and keep track of it. Hence, memory handles should be obtained from the host application in the form of buffers and later on give them back to the application. Now, I am planning on using STL Vectors and I am wondering what sort of memory allocation does it use internally.
Does it use 'new' and 'delete' functions internally? If so, can I just overload 'new' and 'delete' with my own functions? Or should I create my own template allocator which looks like a difficult job for me since I am not that experienced in creating custom templates.
Any suggestions/sample code are welcome. Memory handles can be obtained from the application like this
void* bufferH = NULL;
bufferH = MemReg()->New_Mem_Handle(size_of_buffer);
MemReg()->Dispose_Mem_Handle(bufferH); //Dispose it
vector uses std::allocator by default, and std::allocator is required to use global operator new (that is, ::operator new(size_t)) to obtain the memory (20.4.1.1). However, it isn't required to call it exactly once per call to allocator::allocate.
So yes, if you replace global operator new then vector will use it, although not necessarily in a way that really allows your implementation to manage memory "efficiently". Any special tricks you want to use could, in principle, be made completely irrelevant by std::allocator grabbing memory in 10MB chunks and sub-allocating.
If you have a particular implementation in mind, you can look at how its vector behaves, which is probably good enough if your planned allocation strategy is inherently platform-specific.
STL containers use an allocator they are given at construction time, with a default allocator that uses operator new and operator delete.
If you find the default is not working for you, you can provide a custom allocator that conforms to the container's requirements. There are some real-world examples cited here.
I would measure performance using the default first, and optimize only if you really need to. The allocator abstraction offers you a relatively clean way to fine-tune here without major redesign. How you use the vector could have far more performance impact than the underlying allocator (reserve() in advance, avoid insert and removal in the middle of the range of elements, handle copy construction of elements efficiently - the standard caveats).
std::vector uses the unitialized_* functions to construct its elements from raw memory (using placement new). It allocates storage using whatever allocator it was created with, and by default, that allocator uses ::operator new(size_t) and ::operator delete(void *p) directly (i.e., not a type specific operator new).
From this article, "The concept of allocators was originally introduced to provide an abstraction for different memory models to handle the problem of having different pointer types on certain 16-bit operating systems (such as near, far, and so forth)" ...
"The standard provides an allocator that internally uses the global operators 'new' and 'delete'"
The author also points out the alocator interface isn't that scary. As Neil Buchanan would say, "try it yourself!"
The actual std::allocator has been optimized for a rather large extent of size objects. It isn't the best when it comes to allocating many small objects nor is it the best for many large objects. That being said, it also wasn't written for multi-threaded applications.
May I suggest, before attempting to write your own you check out the Hoard allocator if you're going the multi-threaded route. (Or you can check out the equally appealing Intel TBB page too.)

Second argument to std::vector

Looking at vector, I realized that I have never used the second argument when creating vectors.
std::vector<int> myInts; // this is what I usually do
std::vector<int, ???> myOtherInts; // but is there a second argument there?
Looking at the link above it says that it is for:
Allocator object to be used instead of constructing a new one.
or, as for this one:
Allocator: Type of the allocator object used to define the storage allocation model. By default, the allocator class template for type T is used, which defines the simplest memory allocation model and is value-independent.
I guess it has to do with something with memory management. However, I am not sure how to use that.
Any pointers regarding this?
The default allocator, std::allocator<>, will handle all allocations made by std::vector<> (and others). It will make new allocations from the heap each time a new allocation is needed.
By providing a custom allocator, you can for instance allocate a big chunk of memory up front and then slice it up and hand out smaller pieces when separate allocations are needed. This will increase the allocation speed dramatically, which is good for example in games, at the cost of increased complexity as compared to the default allocator.
Some std type implementations have internal stack-based storage for small amounts of data. For instance, std::basic_string<> might use what is called a small string optimization, where only strings longer than some fixed length, say 16 characters (just an example!), gets an allocation from the allocator, otherwise an internal array is used.
Custom allocators are rarely used in general case. Some examples of where they can be useful:
Optimization for a specific pattern of allocations. For example, a concurrent program can pre-allocate a large chunk of memory via standard means at the beginning of task execution and then shave off pieces off it without blocking on the global heap mutex. When task is completed, entire memory block can be disposed of. To use this technique with STL containers, a custom allocator can be employed.
Embedded software, where a device has several ranges of memory with different properties (cached/noncached, fast/slow, volatile/persistent etc). A custom allocator can be used to place objects stored in an STL container in a specific memory region.
Maybe this will help: http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c4079
You may try google for: stl allocator.
Allocators (STL) help you to manage memory for your objects in vector class. you may use the custom allocator for different memory model( etc).
Hi you can find example of custom allocator http://www.codeproject.com/KB/cpp/allocator.aspx