new operator allocation function order contiguity & initial values - c++

C++ standard section 3.7.3.1 says that
"The order, contiguity, and initial value of storage allocated by
successive calls to an allocation function is unspecified."
What is the meaning of order & contiguity? Why is it unspecified? Why is the initial value also unspecified?

order
Means that the allocator is not constrained to returning addresses that are steadily increasing or decreasing (or any other pattern). This makes sense as memory is usually recycled and reused many times during a program's lifetime. It would be possible to ask from an allocator that the order of storage is defined in some way, but there would be very little gain (if any) opposed to considerable overhead, both in code execution and memory efficiency.
In other words, if B is allocated after A, and C is allocated after B, then A, B, and C may appear in any ordering (ABC, BAC, CAB, CBA, ...) in memory. You know that each of the addresses of A, B, and C is valid, but nothing more.
As pointed out by @Deduplicator, there is a well-known issue in concurrent programming called "the ABA problem". This is, indirectly, a consequence of the fact that a newly allocated object could in principle have any address (note that this is somewhat advanced).
The ABA problem occurs when you use a compare-exchange instruction to atomically modify memory locations, for example in a lock-free list or queue. Your assumption is that the compare-exchange detects when someone else has concurrently modified the pointer that you're trying to modify. However, the truth is that it only detects whether the bit pattern of the address is different. Now it may happen that you free an object, push another object (asking the allocator for storage) onto the container, and the allocator gives you back the exact same address -- this is absolutely legal. Another thread uses the compare-exchange instruction, and it passes just fine. Bang!
To guard against this, lock-free structures often use pointers with built-in version counters (reference numbers).
contiguity
This is a similar constraint, which basically says that there may or may not be "holes" between whatever the allocator returns in subsequent allocations.
This means, if you allocate an object of size 8, and then allocate another object, the allocator might return address A and address A+8. But it might also return address A+10 instead for the second allocation at its own discretion.
This is something that regularly happens with almost every allocation because allocators often store metadata in addition to the actual object, and usually organize memory in terms of "buckets" down to a particular size (most often 16 bytes).
So if you allocate an integer (usually 4 bytes) or a pointer (usually 4 or 8 bytes), then there will be a "hole" between this and the next thing you allocate.
Again, it would be possible to require that an allocator returns objects in a contiguous manner, but this would mean a serious performance impact compared to something that is relatively cheap.
initial value
This means no more and no less than that you must properly initialize your objects and cannot assume that they have any particular value. No, memory will not be zero-initialized automatically[1].
Requiring the allocator to zero-initialize memory would be possible, but it would be less efficient than leaving it uninitialized (although not as devastating as the other two constraints).
[1] Well, in some cases, it will be. The operating system will normally zero out all pages before giving a new page to a process for the first time. This is done for security, so no secret/confidential data is leaked. That is, however, an implementation detail which is out of the scope of the C++ standard.

It essentially means that the new operator can allocate memory from wherever in the system it deems fit, and that your program should not rely on any ordering of allocations.

Related

CppCoreGuidelines: What are hot int copies?

I've been reading CppCoreGuidelines F.15 and I don't understand the following sentences from the table of parameter passing:
"Cheap" ≈ a handful of hot int copies
"Moderate cost" ≈ memcpy hot/contiguous ~1KB and no allocation
What does "hot int copy" mean?
"Hot" in this case likely refers to the likelihood of being cached. A particular piece of memory is "cold" if it is likely not in the cache, due to not having been touched recently within this thread of execution. Conversely, a piece of memory is "hot" if it likely has been touched recently, or is contiguous with memory that has been recently touched.
So it's talking about the cost of doing a memory copy of something that is currently in the cache and is therefore cheap in terms of actual memory bandwidth.
For example, consider a function that returns an array<int, 50>. If the values in that array were generated by the function itself, then those integers are "hot", since they're still almost certainly in the cache. So returning it by value is considered OK.
However, if there is some data structure that contains such a type, this function could have simply retrieved a pointer to that object. Returning it by value means doing several uncached memory accesses, since you have to copy to the return value. That is less than ideal from a memory cache perspective, so perhaps returning a pointer to the array would be more appropriate.
Obviously, uncached accesses will happen either way, but in the latter case, the caller gets to decide which accesses to perform and which not to perform.

Why can't we allocate dynamic memory on the stack?

Allocating stuff on the stack is awesome because then we have RAII and don't have to worry about memory leaks and such. However, sometimes we must allocate on the heap:
If the data is really big, because the stack is small.
If the size of the data to be allocated is only known at runtime (dynamic allocation).
Two questions:
Why can't we allocate dynamic memory (i.e. memory of size that is
only known at runtime) on the stack?
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable? I.e. Thing t;.
Edit: I know some compilers support Variable Length Arrays - which is dynamically allocated stack memory. But that's really an exception to the general rule. I'm interested in understanding the fundamental reasons for why generally, we can't allocate dynamic memory on the stack - the technical reasons for it and the rational behind it.
Why can't we allocate dynamic memory (i.e. memory of size that is only known at runtime) on the stack?
It's more complicated to achieve this. The size of each stack frame is burned-in to your compiled program as a consequence of the sort of instructions the finished executable needs to contain in order to work. The layout and whatnot of your function-local variables, for example, is literally hard-coded into your program through the register and memory addresses it describes in its low-level assembly code: "variables" don't actually exist in the executable. To let the quantity and size of these "variables" change between compilation runs greatly complicates this process, though it's not completely impossible (as you've discovered, with non-standard variable-length arrays).
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable
This is just a consequence of the syntax. C++'s "normal" variables happen to be those with automatic or static storage duration. The designers of the language could technically have made it so that you can write something like Thing t = new Thing and just use t all day, but they did not; again, this would have been more difficult to implement. How do you distinguish between the different types of objects, then? Remember, your compiled executable has to remember to auto-destruct one kind and not the other.
I'd love to go into the details of precisely why and why not these things are difficult, as I believe that's what you're after here. Unfortunately, my knowledge of assembly is too limited.
Why can't we allocate dynamic memory (i.e. memory of size that is only known at runtime) on the stack?
Technically, this is possible, but it is not sanctioned by the C++ standard. Variable length arrays (VLAs) let you create dynamically sized constructs in stack memory. Most compilers allow this as a compiler extension.
example:
int array[n]; // where n is only known at run-time
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable? I.e. Thing t;.
We can. Whether you do so depends on the implementation details of the particular task at hand.
example:
int i;
int *ptr = &i;
We can allocate variable-length space dynamically in stack memory using the function _alloca. This function allocates memory from the program stack. It simply takes the number of bytes to be allocated and returns a void* to the allocated space, just like a malloc call. The allocated memory is freed automatically on function exit.
So it need not be freed explicitly. One has to keep the allocation size in mind here, as a stack overflow exception may occur. Stack overflow exception handling can be used for such calls; in case of a stack overflow exception, one can use _resetstkoflw() to restore the stack.
So our new code with _alloca would be :
int NewFunctionA()
{
    char* pszLineBuffer = (char*) _alloca(1024 * sizeof(char));
    // ... program logic ...
    // no need to free pszLineBuffer
    return 1;
}
Every variable that has a name becomes, after compilation, a dereferenced pointer whose address is computed by adding (or, depending on the platform, subtracting) an offset to a stack pointer (a register that contains the address the stack has currently reached; the current function's return address is usually stored there too).
int i,j,k;
becomes
(SP-12) ;i
(SP-8) ;j
(SP-4) ;k
For this addition to be efficient, the offsets have to be constant, so that they can be encoded directly in the instruction opcode:
k=i+j;
becomes
MOV (SP-12),A; i-->>A
ADD A,(SP-8) ; A+=j
MOV A,(SP-4) ; A-->>k
You see here how 4, 8 and 12 are now "code", not "data".
That implies that a variable that comes after another requires that other variable to retain a fixed, compile-time-defined size.
Dynamically sized arrays can be an exception, but they can only be the last variable of a function. Otherwise, all the variables that follow would have offsets that must be adjusted at run time after that array allocation.
This creates the complication that dereferencing the addresses requires arithmetic (not just a plain offset) or the capability to modify the opcode as variables are declared (self-modifying code).
Both solutions are sub-optimal in terms of performance, since both can break the locality of addressing or add more computation for each variable access.
Why can't we allocate dynamic memory (i.e. memory of size that is only known at runtime) on the stack?
You can with Microsoft compilers using _alloca() or _malloca(). For gcc, it's alloca().
I'm not sure it's part of the C/C++ standards, but variations of alloca() are included with many compilers. If you need an aligned allocation, such as n bytes of memory starting on an m-byte boundary (where m is a power of 2), you can allocate n+m bytes of memory, add m to the pointer and mask off the lower bits. Example: allocate hex 1000 bytes of memory on a hex 100 boundary. You don't need to preserve the value returned by _alloca(), since it's stack memory and is automatically freed when the function exits.
char *p;
p = (char *) _alloca(0x1000 + 0x100);
p = (char *)(((size_t)p + (size_t)0x100) & ~(size_t)0xff);
The most important reason is that heap memory can be deallocated in any order, but the stack requires deallocation in a fixed order, i.e. LIFO order. Hence it would be difficult in practice to implement this.
Virtual memory is a virtualization of memory, meaning that it behaves as the resource it is virtualizing (memory). In a system, each process has a different virtual memory space:
32-bit programs: 2^32 bytes (4 gigabytes)
64-bit programs: 2^64 bytes (16 exabytes)
Because the virtual space is so big, only some regions of it are usable (meaning that only some regions can be read/written just as if they were real memory). Virtual memory regions are initialized and made usable through mapping. Virtual memory does not consume resources and can be considered unlimited (for 64-bit programs), BUT usable (mapped) virtual memory is limited and uses up resources.
For every process, some mapping is done by the kernel and some by the user code. For example, before the code even starts executing, the kernel maps specific regions of the virtual memory space of a process for the code instructions, global variables, shared libraries, the stack space, etc. The user code uses dynamic allocation (allocation wrappers such as malloc and free) or garbage collectors (automatic allocation) to manage the virtual memory mapping at application level (for example, if there is not enough free usable virtual memory available when calling malloc, new virtual memory is automatically mapped).
You should differentiate between mapped virtual memory (the total size of the stack, the total current size of the heap...) and allocated virtual memory (the part of the heap that malloc explicitly told the program that can be used)
Regarding this, I reinterpret your first question as:
Why can't we save dynamic data (i.e. data whose size is only known at runtime) on the stack?
First, as others have said, it is possible: variable length arrays are just that (at least in C; I figure also in C++ as an extension). However, they have some technical drawbacks, and maybe that's the reason they are an exception:
The size of the stack used by a function becomes unknown at compile time; this adds complexity to stack management, additional registers (variables) must be used, and it may impede some compiler optimizations.
The stack is mapped at the beginning of the process and has a fixed size. That size would have to be increased greatly if variable-size data were going to be placed there by default. Programs that do not make extensive use of the stack would waste usable virtual memory.
Additionally, data saved on the stack must be saved and deleted in Last-In-First-Out order, which is perfect for local variables within functions but unsuitable if we need a more flexible approach.
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable?
As this answer explains, we can.
Read a bit about Turing Machines to understand why things are the way they are. Everything was built around them as the starting point.
https://en.wikipedia.org/wiki/Turing_machine
Anything outside of this is technically an abomination and a hack.

To heap or stack array. A user-defined type conundrum

UPDATE
Rewrote the question to better fit what I'm trying to ask.
When designing a mostly short-lived container that isn't big enough to warrant a heap allocation, but isn't so small that move semantics make no difference, how should one decide which way to design the container so that it is performant (size/space is a secondary concern)?
As far as I can see, there are two methods:
Method (A) implement internal data as a pointer and use heap allocation. T* data = new T[]
Method (B) implement stack array data structure. T data[]
I am new to C++ and have been reading up on many articles on SO regarding stack vs heap. This is what I currently gathered...
Method (A)
advantages:
No full struct/class memory allocation until required (a default constructor can leave the pointer as nullptr until initialised/assigned), though the pointer itself still requires 32/64 bits.
Possible to make move constructible and move assignable.
disadvantage:
must allocate per object (unless heap memory pooling is used)
Method (B)
advantages:
Stack memory is faster than heap (because acquiring stack memory only involves moving the stack pointer, versus a heap allocation having to find a contiguous block in a contended environment).
More likely to be in the processor cache, because nearby stack memory tends to hold recently touched, "hot" data.
disadvantages:
Not possible to move
Must allocate memory immediately where it's declared.
If for example, given a user designed type that will be used to do a lot of calculations, which means a lot of temporaries. What would be the most performant way?
The heap array's ability to move, and, when combined with nullptr, the ability to do a zero-copy, seems to me like it would eliminate any advantage a stack would give, since moving involves only one pointer substitution versus a medium-sized array copy.
Are there factors I've missed or misunderstood? What would one generally need to know in order to decide which method to take?
Can the stack copy arrays of data faster than a heap move/pointer reassignment?
You shouldn't dynamically allocate if you're not sure whether you need it. Speed and space implications aside, it's simply more code to write and get right. Moreover, should some client code need indirection (e.g. to move efficiently, or to save space by omitting data), they can always add this from the outside: std::unique_ptr<ModeratelyLargeType> works fine.
Regarding the specific int24 type:
You should make it a struct containing three uint8_ts or some similar way of getting 24 bits directly into the object. I assume that's what you mean by "stack array". The "advantages" of method A and the corresponding "disadvantages" of method B you list are nonsense:
[A +] No memory allocation until required (can use nullptr on an empty constructor until initialised / assigned)
[B -] Must allocate memory immediately where it's declared.
Method A always needs to allocate a pointer, and the pointer is larger than the actual data (32 or 64 bits vs. 24 bits). Not only does method A use more than twice as much memory when actually used, it uses more memory even when not used. All of this without considering the space overhead of heap allocator metadata.
For completeness I should add that, depending on how exactly you implement this, alignment may force the compiler to insert padding, making the structure larger than 24 bits but still no larger than a null pointer (so the above still stands). Likewise, when loaded into a 32-bit register it will obviously occupy the whole register, not just three quarters of it.
[A +] Possible to make move constructible and move assignable.
[B -] Not possible to move?
Well, you can move the pointer, but that's no more effective than copying three bytes. Again, the pointer is larger than the data it points at.

Does an allocation hint get used?

I was reading Why is there no reallocation functionality in C++ allocators? and Is it possible to create an array on the heap at run-time, and then allocate more space whenever needed?, which clearly state that reallocation of a dynamic array of objects is impossible.
However, in The C++ Standard Library by Josuttis, it states an Allocator, allocator, has a function allocate with the following syntax
pointer allocator::allocate(size_type num, allocator<void>::pointer hint = 0)
where the hint has an implementation defined meaning, which may be used to help improve performance.
Are there any implementations that take advantage of this?
I have gained significant performance advantages in iteration times for small scalar types in my plf::colony C++ container using hints with std::allocator under Visual Studio 2010-2013 (iteration speed increased by ~21%), and much smaller speedups under GCC 5.1. So it's safe to say that with those compilers and std::allocator, it makes a difference, but the difference will be compiler-dependent. I am not aware of the ratio of hint-ignoring to hint-observing allocators.
I'm not sure about specific implementations, but note that the allocator isn't allowed to return the hint pointer value before it's been passed to deallocate. So that can't be used as a primitive operation to form a reallocate.
The Standard says the hint must have been returned by a previous call to allocate. It says "The use of [the hint] is unspecified, but it is intended as an aid to locality." So if you're allocating and releasing a sequence of similar-sized blocks on one thread, you might pass the previously-freed value to avoid cache contention between microprocessor caches.
Otherwise, when CPU B sees that you're using memory addresses still in CPU A's cache (even if that memory contains objects that were already destroyed as far as C++ is concerned), it must forward the junk data over the bus. Better to let CPU A and CPU B each reuse their own respective cached addresses.
C++11 states, in 20.6.9.1 allocator members:
4 - [ Note: In a container member function, the address of an adjacent element is often a good choice to pass for the hint argument. — end note ]
[...]
6 - [...] The use of hint is unspecified, but intended as an aid to locality if an implementation so desires.
Allocating new elements adjacent or close to existing elements in memory can aid performance by improving locality; because they are usually cached together, nearby elements will tend to travel together up the memory hierarchy and will not evict each other.

Why is it not possible to access the size of a new[]'d array?

When you allocate an array using new [], why can't you find out the size of that array from the pointer? It must be known at run time, otherwise delete [] wouldn't know how much memory to free.
Unless I'm missing something?
In a typical implementation the size of dynamic memory block is somehow stored in the block itself - this is true. But there's no standard way to access this information. (Implementations may provide implementation-specific ways to access it). This is how it is with malloc/free, this is how it is with new[]/delete[].
In fact, in a typical implementation raw memory allocations for new[]/delete[] calls are eventually processed by some implementation-specific malloc/free-like pair, which means that delete[] doesn't really have to care about how much memory to deallocate: it simply calls that internal free (or whatever it is named), which takes care of that.
What delete[] does need to know though is how many elements to destruct in situations when array element type has non-trivial destructor. And this is what your question is about - the number of array elements, not the size of the block (these two are not the same, the block could be larger than really required for the array itself). For this reason, the number of elements in the array is normally also stored inside the block by new[] and later retrieved by delete[] to perform the proper array element destruction. There are no standard ways to access this number either.
(This means that in general case, a typical memory block allocated by new[] will independently, simultaneously store both the physical block size in bytes and the array element count. These values are stored by different levels of C++ memory allocation mechanism - raw memory allocator and new[] itself respectively - and don't interact with each other in any way).
However, note that for the above reasons the array element count is normally only stored when the array element type has non-trivial destructor. I.e. this count is not always present. This is one of the reasons why providing a standard way to access that data is not feasible: you'd either have to store it always (which wastes memory) or restrict its availability by destructor type (which is confusing).
To illustrate the above, when you create an array of ints
int *array = new int[100];
the size of the array (i.e. 100) is not normally stored by new[] since delete[] does not care about it (int has no destructor). The physical size of the block in bytes (like, 400 bytes or more) is normally stored in the block by the raw memory allocator (and used by raw memory deallocator invoked by delete[]), but it can easily turn out to be 420 for some implementation-specific reason. So, this size is basically useless for you, since you won't be able to derive the exact original array size from it.
You most likely can access it, but it would require intimate knowledge of your allocator and would not be portable. The C++ standard doesn't specify how implementations store this data, so there's no consistent method for obtaining it. I believe it's left unspecified because different allocators may wish to store it in different ways for efficiency purposes.
It makes sense, as for example the size of the allocated block may not necessarily be the same as the size of the array. While it is true that new[] may store the number of elements (for calling each element's destructor), it doesn't have to, as that isn't required for a trivial destructor. There is also no standard way (C++ FAQ Lite 1, C++ FAQ Lite 2) of specifying where new[] stores the array length, as each method has its pros and cons.
In other words, it allows allocations to be as fast and cheap as possible by not specifying anything about the implementation. (If the implementation had to store the size of the array as well as the size of the allocated block every time, it would waste memory that you may not need.)
Simply put, the C++ standard does not require support for this. It is possible that if you know enough about the internals of your compiler, you can figure out how to access this information, but that would generally be considered bad practice. Note that there may be a difference in memory layout for heap-allocated arrays and stack-allocated arrays.
Remember that essentially what you are talking about here are C-style arrays, too -- even though new and delete are C++ operators -- and the behavior is inherited from C. If you want a C++ "array" that is sized, you should be using the STL (e.g. std::vector, std::deque).