Storage allocator - what is it? - c++

I know of storage classes in both C and C++ (static, extern, auto, register; C++ also adds mutable and some compiler-specific ones), but I can't figure out what a storage allocator is. I don't think it refers to the memory allocators you can implement for STL containers. What is it, in simple terms?

It's whatever is behind operator new and operator delete (not to be confused with the new operator and the delete operator). operator new allocates memory from the free store, and operator delete releases memory previously allocated by operator new for possible reuse. When code does foo *ptr = new foo (new operator), the compiler generates code that calls operator new to get the right number of bytes of storage, then calls the constructor for foo. When code does delete ptr (delete operator) the compiler calls the destructor for foo, then calls operator delete to release the memory.
Note that this is how the term is used in the C++03 standard. In the C++11 standard it is also used to refer to standard allocators.
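To make the two-step sequence concrete, here is a minimal sketch. It uses a class-specific operator new/operator delete pair only so the calls can be traced; Foo, g_log and trace_new_delete are hypothetical names invented for this illustration, not anything from the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical trace buffer; each step appends its name.
static std::string g_log;

struct Foo {
    Foo()  { g_log += "ctor;"; }
    ~Foo() { g_log += "dtor;"; }

    // Allocation function: only obtains raw storage.
    static void* operator new(std::size_t size) {
        assert(size == sizeof(Foo));  // no derived classes in this sketch
        g_log += "operator new;";
        return ::operator new(size);
    }

    // Deallocation function: only releases raw storage.
    static void operator delete(void* p) noexcept {
        g_log += "operator delete;";
        ::operator delete(p);
    }
};

// Runs one new-expression and one delete-expression and returns the trace.
std::string trace_new_delete() {
    g_log.clear();
    Foo* ptr = new Foo;  // new-expression: Foo::operator new, then Foo::Foo()
    delete ptr;          // delete-expression: Foo::~Foo(), then Foo::operator delete
    return g_log;
}
```

Calling trace_new_delete() should show the allocation function running before the constructor and the deallocation function running after the destructor.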

In the C++ standard, that term is used to refer to the allocator class used by STL-style containers - either std::allocator, or a user-defined custom allocator that meets the requirements given by C++11 17.6.3.5.
However, it's not a formally defined term, and also appears once referring to the implementation of the free store - that is, the dynamic storage allocated by new.
[NOTE: I'm referring to the current (2011) language specification. As noted in the comments, historical versions of the specification apparently only used the term (informally) to refer to the free store]
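For the container sense of the term, a rough sketch of a user-defined allocator might look like the following. CountingAllocator, g_bytes_requested and bytes_requested_by_reserve are made-up names for illustration, and the class only implements the small subset of the allocator requirements that std::allocator_traits cannot fill in by default.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical counter of bytes requested through the allocator.
static std::size_t g_bytes_requested = 0;

template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    // Hands out raw, uninitialized storage for n objects of type T.
    T* allocate(std::size_t n) {
        assert(n <= static_cast<std::size_t>(-1) / sizeof(T));  // no overflow
        g_bytes_requested += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

// All instances are interchangeable, so they always compare equal.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

// Returns how many bytes a vector<int> requested while reserving n elements.
std::size_t bytes_requested_by_reserve(std::size_t n) {
    g_bytes_requested = 0;
    std::vector<int, CountingAllocator<int>> v;
    v.reserve(n);
    return g_bytes_requested;
}
```

The container never calls new T itself; it asks the allocator for raw storage and constructs elements in it later.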

Related

How can deleting a void pointer do anything other than invoke the global delete operator?

The C++ standard very clearly and explicitly states that using delete or delete[] on a void-pointer is undefined behavior, as quoted in this answer:
This implies that an object cannot be deleted using a pointer of type void* because there are no objects of type void.
However, as I understand it, delete and delete[] do just two things:
Call the appropriate destructor(s)
Invoke the appropriate operator delete function, typically the global one
There is a single-argument operator delete (as well as operator delete[]), and that single argument is void* ptr.
So, when the compiler encounters a delete-expression with a void* operand, it of course could maliciously do some completely unrelated operation, or simply output no code for that expression. Better yet, it could emit a diagnostic message and refuse to compile, though the versions of MSVS, Clang, and GCC I've tested don't do this. (The latter two emit a warning with -Wall; MSVS with /W3 does not.)
But there's really only one sensible way to deal with each of the above steps in the delete operation:
void* specifies no destructor, so no destructors are invoked.
void is not a type and therefore cannot have a specific corresponding operator delete, so the global operator delete (or the [] version) must be invoked. Since the argument to the function is void*, no type conversion is necessary, and the operator function must behave correctly.
So, can common compiler implementations (which, presumably, are not malicious, or else we could not even trust them to adhere to the standard anyway) be relied on to follow the above steps (freeing memory without invoking destructors) when encountering such delete expressions? If not, why not? If so, is it safe to use delete this way when the actual type of the data has no destructors (e.g. it's an array of primitives, like long[64])?
Can the global delete operator, void operator delete(void* ptr) (and the corresponding array version), be safely invoked directly for void* data (assuming, again, that no destructors ought to be called)?
A void* is a pointer to an object of unknown type. If you do not know the type of something, you cannot possibly know how that something is to be destroyed. So I would argue that, no, there is not "really only one sensible way to deal with such a delete operation". The only sensible way to deal with such a delete operation, is to not deal with it. Because there is simply no way you could possibly deal with it correctly.
Therefore, as the original answer you linked to said: deleting a void* is undefined behavior ([expr.delete] §2). The footnote mentioned in that answer remains essentially unchanged to this day. I'm honestly a bit astonished that this is simply specified as undefined behavior rather than making it ill-formed, since I cannot think of any situation in which this could not be detected at compile time.
Note that, starting with C++14, a new expression does not necessarily imply a call to an allocation function. And neither does a delete expression necessarily imply a call to a deallocation function. The compiler may call an allocation function to obtain storage for an object created with a new expression. In some cases, the compiler is allowed to omit such a call and use storage allocated in other ways. This, e.g., enables the compiler to sometimes pack multiple objects created with new into one allocation.
Is it safe to call the global deallocation function on a void* instead of using a delete expression? Only if the storage was allocated with the corresponding global allocation function. In general, you can't know that for sure unless you called the allocation function yourself. If you got your pointer from a new expression, you generally don't know if that pointer would even be a valid argument to a deallocation function, since it may not even point to storage obtained from calling an allocation function. Note that knowing which allocation function must've been used by a new expression is basically equivalent to knowing the dynamic type of whatever your void* points to. And if you knew that, you could also just static_cast<> to the actual type and delete it…
Is it safe to deallocate the storage of an object with trivial destructor without explicitly calling the destructor first? Based on [basic.life] §1.4, I would say yes. Note that, if that object is an array, you might still have to call the destructors of any array elements first, unless they are also trivial.
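As a small sketch of that last case, assuming you called the allocation function yourself: the storage below comes straight from ::operator new, holds a trivially destructible long, and goes straight back to ::operator delete. store_and_release is a hypothetical helper name.

```cpp
#include <cassert>
#include <new>
#include <type_traits>

static_assert(std::is_trivially_destructible<long>::value,
              "long needs no destructor call");

// Creates a long in storage we allocated ourselves, reads it back,
// and releases the storage without any destructor call.
long store_and_release(long value) {
    void* raw = ::operator new(sizeof(long));  // raw storage, no object yet
    long* obj = ::new (raw) long(value);       // placement new creates the object
    assert(static_cast<void*>(obj) == raw);    // same address as the storage
    long copy = *obj;
    ::operator delete(raw);                    // matches the ::operator new above
    return copy;
}
```

No delete expression is involved here, so the question of what a void* operand would mean never comes up.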
Can you rely on common compiler implementations to produce the behavior you deem reasonable? No. Having a formal definition of what exactly you can rely on is literally the whole point of having a standard in the first place. Assuming you have a standard-conforming implementation, you can rely on the guarantees the standard gives you. You can also rely on any additional guarantees the documentation of a particular compiler may give you, so long as you use that particular version of that particular compiler to compile your code. Beyond that, all bets are off…
If you want to invoke the deallocation function, then just call the deallocation function.
This is good:
void* p = ::operator new(size);
::operator delete(p); // only requires that p was returned by ::operator new()
This is not:
void* p = new long(42);
delete p; // forbidden: static and dynamic type of *p do not match, and static type is not polymorphic
But note, this also is not safe:
void* p = new long[42];
::operator delete(p); // p came from a new[] expression, not from a call to ::operator new()
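If the pointer did come from a new[] expression, the portable fix is to restore the real type before deleting. A minimal sketch, where sum_and_free and demo_typed_delete are hypothetical names and the caller is assumed to know that the void* really points to a long array created with new[]:

```cpp
#include <cassert>
#include <cstddef>

// Assumption: p was produced by `new long[count]` and not yet deleted.
long sum_and_free(void* p, std::size_t count) {
    assert(p != nullptr);
    long* values = static_cast<long*>(p);  // recover the actual type first
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) total += values[i];
    delete[] values;                       // array form matches new[]
    return total;
}

long demo_typed_delete() {
    void* p = new long[3]{1, 2, 3};
    return sum_and_free(p, 3);
}
```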
While the Standard would allow an implementation to use the type passed to delete to decide how to clean up the object in question, it does not require that implementations do so. The Standard would also allow an alternative (and arguably superior) approach based on having the memory-allocating new store cleanup information in the space immediately preceding the returned address, and having delete implemented as a call to something like:
typedef void (*__cleanup_function)(void*);
void __delete(void* p)
{
    // The cleanup function pointer is stored immediately before the object.
    ((__cleanup_function*)p)[-1](p);
}
In most cases, the cost of implementing new/delete in such fashion would be relatively trivial, and the approach would offer some semantic benefit. The only significant downside of such an approach is that implementations that document the inner workings of their new/delete, and that can't support a type-agnostic delete within that documented layout, would have to break any code that relies upon those documented inner workings.
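The idea can be tried out in user code without touching the real new/delete. The following is only a sketch of the scheme described above, not how any particular implementation works; Header, erased_new, erased_delete, Tracker and the other names are invented here, and over-aligned types are deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

using CleanupFn = void (*)(void*);

// Header placed directly before the object; padded so the object stays aligned.
struct alignas(std::max_align_t) Header {
    CleanupFn cleanup;
};

// Destroys the T at obj and releases the whole block (header + object).
template <class T>
void destroy_and_free(void* obj) {
    static_cast<T*>(obj)->~T();
    ::operator delete(static_cast<char*>(obj) - sizeof(Header));
}

// Allocates header + T and remembers how to clean up a T.
template <class T, class... Args>
void* erased_new(Args&&... args) {
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "over-aligned types are not handled in this sketch");
    char* raw = static_cast<char*>(::operator new(sizeof(Header) + sizeof(T)));
    ::new (static_cast<void*>(raw)) Header{&destroy_and_free<T>};
    try {
        return ::new (static_cast<void*>(raw + sizeof(Header)))
            T(std::forward<Args>(args)...);
    } catch (...) {
        ::operator delete(raw);
        throw;
    }
}

// Type-agnostic delete: looks up the cleanup function stored before the object.
inline void erased_delete(void* obj) {
    if (obj == nullptr) return;
    Header* header =
        reinterpret_cast<Header*>(static_cast<char*>(obj) - sizeof(Header));
    assert(header->cleanup != nullptr);
    header->cleanup(obj);
}

static int g_destroyed = 0;
struct Tracker { ~Tracker() { ++g_destroyed; } };

// Deletes one Tracker and one long through void*; returns destructor count.
int demo_erased_delete() {
    g_destroyed = 0;
    void* p = erased_new<Tracker>();
    void* q = erased_new<long>(42L);
    erased_delete(p);
    erased_delete(q);
    return g_destroyed;
}
```

The price is one pointer-sized (here max_align_t-sized) header per allocation, which is the cost the paragraph above calls relatively trivial.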
Note that if passing a void* to delete were a constraint violation, that would forbid implementations from providing a type-agnostic delete even if they would be easily capable of doing so, and even if some code written for them relies upon such ability. The fact that code relies upon such an ability would make it portable only to implementations that can provide it, of course, but allowing implementations to support such abilities if they choose to do so is more useful than making it a constraint violation.
Personally, I would have liked to see the Standard offer implementations two specific choices:
Allow passing a void* to delete and delete the object using whatever type had been passed to new, and define a macro indicating support for such a construct.
Issue a diagnostic if a void* is passed to delete, and define a macro indicating it does not support such a construct.
Programmers whose implementations supported type-agnostic delete could then decide whether the benefit they could receive from such feature would justify the portability limitations imposed by using it, and implementers could decide whether the benefits of supporting a wider range of programs would be sufficient to justify the small cost of supporting the feature.
void* specifies no destructor, so no destructors are invoked.
That is most likely one of the reasons it's not permitted. Deallocating the memory that backs a class instance without calling the destructor for said class is just all around a really really bad idea.
Suppose, for example, the class contains a std::map that has a few hundred thousand elements in it. That represents a significant amount of memory. Doing what you're proposing would leak all of that memory.
A void doesn't have a size, so the compiler has no way of knowing how much memory to deallocate.
How should the compiler handle the following?
struct s
{
int arr[100];
};
void* p1 = new int;
void* p2 = new s;
delete p1;
delete p2;

What are the limitations of overloading, overriding and replacing new/delete?

I understand that there are 3 general ways to modify the behaviour of new and delete in C++:
Replacing the default new/delete and new[]/delete[]
Overriding or overloading the placement versions (overriding the one with a memory location passed to it, overloading when creating versions which pass other types or numbers of arguments)
Overloading class specific versions.
What are the restrictions for performing these modifications to the behaviour of new/delete?
In particular are there limitations on the signatures that new and delete can be used with?
It makes sense if any replacement versions must have the same signature (otherwise they wouldn't be replacement or would break other code, like the STL for example), but is it permissible to have global placement or class specific versions return smart pointers or some custom handle for example?
First off, don't confuse the new/delete expression with the operator new() function.
The expression is a language construct that performs construction and destruction. The operator is an ordinary function that performs memory (de)allocation.
Only the default operators (operator new(size_t) and operator delete(void *)) can be used with the default new and delete expressions. All other forms are summarily called "placement" forms, and for those you can only use new, but you have to destroy objects manually by invoking the destructor. Placement forms are of rather limited and specialised need. By far the most useful placement form is global placement-new, ::new (addr) T, but the behavior of that cannot even be changed (which is presumably why it's the only popular one).
All new operators must return void *. These allocation functions are far more low-level than you might appreciate, so basically you "will know when you need to mess with them".
To repeat: C++ separates the notions of object construction and memory allocation. All you can do is provide alternative implementations for the latter.
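A user-defined placement form could look roughly like this. Arena, Point and demo_arena are hypothetical names for illustration; the arena is a toy bump allocator and nothing here is a standard facility beyond the operator signatures themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-size bump allocator.
struct Arena {
    alignas(std::max_align_t) unsigned char buffer[256];
    std::size_t used = 0;

    void* take(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        std::size_t start = (used + a - 1) / a * a;  // round up to keep alignment
        if (start + n > sizeof(buffer)) throw std::bad_alloc();
        used = start + n;
        return buffer + start;
    }
};

// Placement allocation function: still returns void*, like every operator new.
void* operator new(std::size_t size, Arena& arena) { return arena.take(size); }

// Matching placement deallocation function; it is only called automatically
// if the constructor throws during `new (arena) T(...)`.
void operator delete(void*, Arena&) noexcept {}

struct Point {
    int x, y;
    Point(int x_, int y_) : x(x_), y(y_) {}
};

int demo_arena() {
    Arena arena;
    Point* p = new (arena) Point(3, 4);  // placement form of the new-expression
    assert(arena.used >= sizeof(Point));
    int sum = p->x + p->y;
    p->~Point();  // no delete-expression for placement forms: destroy by hand
    return sum;
}
```

Returning a smart pointer or handle from operator new is not possible; a wrapper function around the new-expression would be the place for that.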
When you overload new and delete within a class you are effectively modifying the way the memory is allocated and released for the class, asking for it to give you this control.
This may be done when a class wants to use some kind of pool to allocate its instances, either for optimisation or for tracking purposes.
The restrictions, as with pretty much any operator overload, are the parameter list you may pass and the behaviour the function is expected to adhere to.
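A pooled class could be sketched along these lines. Node, Slot, kSlots and demo_pool are hypothetical names, the pool is a fixed array with a free list, and the sketch is single-threaded and ignores derived classes.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Node {
    int value;
    explicit Node(int v) : value(v) {}

    // Class-specific allocation and deallocation functions.
    static void* operator new(std::size_t size);
    static void operator delete(void* p) noexcept;
};

namespace {
// A slot is either on the free list or holds the bytes of one Node.
union Slot {
    Slot* next;
    alignas(Node) unsigned char storage[sizeof(Node)];
};

constexpr std::size_t kSlots = 4;
Slot g_slots[kSlots];
Slot* g_free = nullptr;
bool g_initialised = false;
std::size_t g_in_use = 0;
}  // namespace

void* Node::operator new(std::size_t size) {
    if (!g_initialised) {  // build the free list on first use
        for (std::size_t i = 0; i < kSlots; ++i) {
            g_slots[i].next = g_free;
            g_free = &g_slots[i];
        }
        g_initialised = true;
    }
    if (size > sizeof(Slot) || g_free == nullptr) throw std::bad_alloc();
    Slot* slot = g_free;
    g_free = slot->next;
    ++g_in_use;
    return slot;
}

void Node::operator delete(void* p) noexcept {
    if (p == nullptr) return;
    assert(g_in_use > 0);
    Slot* slot = static_cast<Slot*>(p);
    slot->next = g_free;  // push the slot back onto the free list
    g_free = slot;
    --g_in_use;
}

// Returns peak usage * 10 + usage after both nodes are deleted.
std::size_t demo_pool() {
    Node* a = new Node(1);
    Node* b = new Node(2);
    std::size_t peak = g_in_use;
    delete a;
    delete b;
    return peak * 10 + g_in_use;
}
```

Client code keeps writing plain new Node(...) and delete; only where the bytes come from changes.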

Is C++ code allowed to call `::operator new()` directly?

According to C++ Standard paragraph 3.7.3/1 objects should be dynamically created with new expression and the C++ runtime should provide an allocation function ::operator new().
Once in a while it is necessary to call ::operator new() directly.
Does the C++ Standard allow such calls to ::operator new() function or is this (and related) function for internal use only?
It's perfectly acceptable to call operator new and operator delete directly; they are a part of the global namespace and act like a C++-ier version of malloc and free that interact with set_new_handler and the bad_alloc exceptions a bit nicer. The C++ ISO standard even contains a few examples of this. For example, §13.5/4 has this example:
Operator functions are usually not called directly; instead they are invoked to evaluate the operators they implement (13.5.1 - 13.5.7). They can be explicitly called, however, using the operator-function-id as the name of the function in the function call syntax (5.2.2). [Example:
complex z = a.operator+(b); // complex z = a+b;
void* p = operator new(sizeof(int)*n);
—end example]
Yes, it is allowed to call the global operator new function directly – though it's not as often required as you might believe. You must match allocation and deallocation functions, but if you have full control over both, then you can always use new[] and delete[] with char. However, that would be a new-expression and delete-expression, so you are only "required" to use the global functions themselves if you need a function pointer. (You would have to wrap the new-expression to get a function pointer, otherwise.)
If you replace these global functions so that new and new[] use different heaps, for example, then you might also want to explicitly use ::operator new, but this is rare.
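Building on the standard's operator new(sizeof(int)*n) example, a direct call paired with a direct operator delete might be used like this. sum_first is a hypothetical helper, and the ints are created with placement new before they are read.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Allocates room for n ints with operator new, fills it with 1..n,
// sums the values, and releases the storage with operator delete.
int sum_first(int n) {
    assert(n > 0);
    void* p = operator new(sizeof(int) * static_cast<std::size_t>(n));
    int* values = static_cast<int*>(p);
    for (int i = 0; i < n; ++i) {
        ::new (static_cast<void*>(values + i)) int(i + 1);  // create each int
    }
    int total = 0;
    for (int i = 0; i < n; ++i) total += values[i];
    operator delete(p);  // matches the operator new call above
    return total;
}
```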

What's the purpose of having a separate "operator new[]"?

Looks like operator new and operator new[] have exactly the same signature:
void* operator new( size_t size );
void* operator new[]( size_t size );
and do exactly the same: either return a pointer to a big enough block of raw (not initialized in any way) memory or throw an exception.
Also operator new is called internally when I create an object with new and operator new[] - when I create an array of objects with new[]. Still the above two special functions are called by C++ internally in exactly the same manner and I don't see how the two calls can have different meanings.
What's the purpose of having two different functions with exactly the same signatures and exactly the same behavior?
In Design and Evolution of C++ (section 10.3), Stroustrup mentions that if the new operator for object X were itself used for allocating an array of X objects, then the writer of X::operator new() would have to deal with array allocation too, which is not the common usage for operator new() and adds complexity. So using operator new() for array allocation was ruled out. That, in turn, left no easy way to allocate dynamic arrays from different storage areas. The solution was to provide separate allocator and deallocator functions for arrays: new[] and delete[].
The operators can be replaced globally or overloaded for a specific class (they cannot be declared in a namespace other than the global one), and this allows you to provide separate versions if you want to treat object allocations differently from array allocations. For example, you might want to allocate from different memory pools.
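A quick way to see the two entry points being chosen separately is to give a class both and count the calls. Widget, the two counters and count_calls are hypothetical names; a real program would route each function to its own pool instead of forwarding to the global versions.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

static int g_single_calls = 0;
static int g_array_calls = 0;

struct Widget {
    int id = 0;

    // Used by `new Widget`.
    static void* operator new(std::size_t size) {
        ++g_single_calls;
        return ::operator new(size);
    }
    // Used by `new Widget[n]`.
    static void* operator new[](std::size_t size) {
        ++g_array_calls;
        return ::operator new[](size);
    }
    static void operator delete(void* p) noexcept { ::operator delete(p); }
    static void operator delete[](void* p) noexcept { ::operator delete[](p); }
};

// Returns single-object calls * 10 + array calls.
int count_calls() {
    g_single_calls = 0;
    g_array_calls = 0;
    Widget* one = new Widget;
    Widget* many = new Widget[3];
    assert(one != many);
    delete one;
    delete[] many;
    return g_single_calls * 10 + g_array_calls;
}
```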
I've had a reasonably good look at this, and to be blunt there's no reason from an interface standpoint.
The only possible reason that I can think of is to allow an optimization hint for the implementation: operator new[] is likely to be called upon to allocate larger blocks of memory. But that is a really, really tenuous supposition, as you could new a very large structure, or new char[2], which doesn't really count as large.
Note that operator new[] doesn't add any magic extra storage for the array count or anything. It is the job of the new[] operator to work out how much overhead (if any) is needed and to pass the correct byte count to operator new[].
[A test with gcc indicates that no extra storage is needed by new[] unless the type of the array members being constructed has a non-trivial destructor.]
From an interface and contract standpoint (other than requiring the use of the correct corresponding deallocation function), operator new and operator new[] are identical.
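The byte count that the new[] expression passes down can be observed with a class-specific operator new[]. This is just a measuring sketch with hypothetical names (Recorder, Plain, WithDtor, requested_bytes); how much overhead, if any, you see is implementation-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical record of the size most recently passed to operator new[].
static std::size_t g_last_request = 0;

struct Recorder {
    static void* operator new[](std::size_t size) {
        g_last_request = size;  // whatever the new[] expression asked for
        return ::operator new[](size);
    }
    static void operator delete[](void* p) noexcept { ::operator delete[](p); }
};

struct Plain : Recorder { int v; };                     // trivial destructor
struct WithDtor : Recorder { int v; ~WithDtor() {} };   // non-trivial destructor

// Returns the number of bytes requested for an array of `count` T objects.
template <class T>
std::size_t requested_bytes(std::size_t count) {
    T* items = new T[count];
    std::size_t requested = g_last_request;
    assert(requested >= count * sizeof(T));  // overhead can only add bytes
    delete[] items;
    return requested;
}
```

Comparing requested_bytes<Plain>(4) with requested_bytes<WithDtor>(4) against 4 * sizeof(T) shows whether your compiler adds an array cookie for either type.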
One purpose is that they can be separately defined by the user. So if I want to initialize memory in single heap-allocated objects to 0xFEFEFEFE and memory in heap-allocated arrays to 0xEFEFEFEF, because I think it will help me with debugging, then I can.
Whether that's worth it is another matter. I guess if your particular program mostly uses quite small objects, and quite large arrays, then you could allocate off different heaps in the hope that this will reduce fragmentation. But equally you could identify the classes which you allocate large arrays of, and just override operator new[] for those classes. Or operator new could switch between different heaps based on the size.
There is actually a difference in the wording of the requirements. One allocates memory aligned for any object of the specified size, the other allocates memory aligned for any array of the specified size. I don't think there's any difference - an array of size 1 surely has the same alignment as an object - but I could be mistaken. The fact that by default the array version returns the same as the object version strongly suggests there is no difference. Or at least that the alignment requirements on an object are stricter than those on an array, which I can't make any sense of...
The Standard says that new T calls operator new() and new T[] results in a call of operator new[](). You could overload them if you want. I believe that there is no difference between them by default. The Standard says that they are replaceable (3.7.3/2):
The library provides default definitions for the global allocation and deallocation functions. Some global allocation and deallocation functions are replaceable (18.4.1). A C++ program shall provide at most one definition of a replaceable allocation or deallocation function. Any such function definition replaces the default version provided in the library (17.4.3.4). The following allocation and deallocation functions (18.4) are implicitly declared in global scope in each translation unit of a program
void* operator new(std::size_t) throw(std::bad_alloc);
void* operator new[](std::size_t) throw(std::bad_alloc);
void operator delete(void*) throw();
void operator delete[](void*) throw();
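For illustration, replacing the global single-object functions could be sketched as follows. The malloc/free backing, g_allocs and count_direct_allocations are assumptions of this example; it ignores the new-handler loop and the aligned overloads, and it uses noexcept rather than the older throw() specifications quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter of calls to the replaced allocation function.
static std::size_t g_allocs = 0;

// Replacement for the global allocation function (at most one per program).
void* operator new(std::size_t size) {
    ++g_allocs;
    if (size == 0) size = 1;  // must return a unique non-null pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

// Replacements for the matching deallocation functions.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Counts how many times the replacement ran for one direct call.
std::size_t count_direct_allocations() {
    std::size_t before = g_allocs;
    void* p = ::operator new(32);
    assert(p != nullptr);
    ::operator delete(p);
    return g_allocs - before;
}
```

Because the default operator new[] and operator delete[] forward to the single-object versions, array allocations end up in the replacement as well unless those are replaced too.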

STL allocators and operator new[]

Are there STL implementations that use operator new[] as an allocator? On my compiler, making Foo::operator new[] private did not prevent me from creating a vector<Foo>... is that behavior guaranteed by anything?
C++ Standard, section 20.4.1.1. The default allocator allocate() function uses global operator new:
pointer allocate(size_type n, allocator<void>::const_pointer hint=0);
3 Notes: Uses ::operator new(size_t) (18.4.1).
std library implementations won't use T::operator new[] for std::allocator. Most of them use their own memory pooling infrastructure behind the scenes.
In general, if you want to stop Foo objects being dynamically allocated, you'll have to make all the constructors private and provide a function that creates Foo objects. Of course, you won't be able to create them as auto variables either though.
std::vector uses an Allocator that's passed as a template argument, which defaults to std::allocator<T>. The allocator doesn't work like new[] though -- it just allocates raw memory, and placement new is used to actually create the objects in that memory when you tell it to add the objects (e.g. with push_back() or resize()).
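A small sketch of both points, with hypothetical names (this Foo, sum_via_vector, sum_via_allocator): the private Foo::operator new[] is never looked up, because the allocator obtains raw memory and the elements are constructed in place afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

class Foo {
public:
    explicit Foo(int v) : value(v) {}
    int value;

private:
    // Declared private and never defined: new Foo[n] would not compile
    // outside the class, but containers do not use it.
    static void* operator new[](std::size_t);
    static void operator delete[](void*);
};

// The container path: storage from std::allocator, objects built in place.
int sum_via_vector() {
    std::vector<Foo> v;
    v.push_back(Foo(1));
    v.emplace_back(2);
    return v[0].value + v[1].value;
}

// Roughly the same steps done by hand through std::allocator_traits.
int sum_via_allocator() {
    std::allocator<Foo> alloc;
    using Traits = std::allocator_traits<std::allocator<Foo>>;
    Foo* raw = Traits::allocate(alloc, 2);  // raw memory, no Foo objects yet
    Traits::construct(alloc, raw, 1);       // construct each Foo in place
    Traits::construct(alloc, raw + 1, 2);
    int sum = raw[0].value + raw[1].value;
    assert(sum == 3);
    Traits::destroy(alloc, raw + 1);        // destroy before releasing storage
    Traits::destroy(alloc, raw);
    Traits::deallocate(alloc, raw, 2);
    return sum;
}
```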
About the only way you could use new[] in an allocator would be if you abused things a bit, and allocated raw space using something like new char[size];. As abuses go, that one's fairly harmless, but it's still unrelated to your overload of new[] for the class.
If you want to prohibit the creation of your object, make the constructor private rather than operator new.
In addition to the other answers here, if you want to prevent anyone from creating an STL container for your type Foo, then simply make the copy-constructor for Foo private (also the move-constructor if you're working with C++11). All STL-container objects must have a valid copy or move constructor for the container's allocator to properly call placement new and construct a copy of the object in the allocated memory block for the container.