Why does shrink_to_fit (if the request is fulfilled) cause reallocation? - c++

Given a container v with v.size() == 3 and v.capacity() == 5, my understanding is that a call to v.shrink_to_fit() can be fulfilled and, if it is, it causes v.capacity() to become 3.
However, this comes at the cost of a reallocation.
Why? Isn't it possible to free the unused memory without reallocating a chunk of memory for what remains?
This question probably has its roots in how more primitive facilities like new/delete and malloc/free work.

The underlying memory management system defines what is possible, and typically it does not allow returning parts of an allocation: if you got n bytes, you return either all n bytes or nothing.
Returning the last m bytes (with m < n), or worse, returning m bytes in the middle of the n bytes, would of course be possible to offer, but consider the extra complexity needed to handle this correctly.
Of course, there could be systems out there that do offer it, but your C++ compiler and the language definition don't necessarily know which memory manager runs below them in the OS, so they have to accept the possibility that a reallocation is needed. Note that they don't guarantee that it will be needed - they just allow for it.

The container does not allocate/deallocate the memory itself; its allocator does.
For the (vector's) allocator to be able to deallocate the memory, it needs to be given exactly the same pointer it returned when it allocated the memory for the vector's data.
And that is the beginning of the vector's data, not the beginning of the no-longer-used part.
Basically, we are talking about this allocator's allocation/deallocation methods:
pointer allocate( size_type n, const void * hint = 0 );
void deallocate( T* p, std::size_t n );
The argument T* p of deallocate will be the same as the pointer returned from allocate (== the beginning of the vector's data). This is what the vector's implementation will pass to deallocate.
It is certainly imaginable to have a custom vector implementation that could pass any pointer in the range [data, data+size] to the allocator's deallocate method, and one could construct such an allocator able to deal with that. But then all other allocators would need to conform to this API too, including the standard allocator.
Then something like this would need to be able to "work":
int* p = new int[100];
delete [] (p + 50); // imagine making this work
This would add extra complexity, performance and other issues.
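For contrast, here is a minimal sketch (assuming C++17 helpers, and not any particular library's actual code) of what a conforming shrink_to_fit can do through exactly the allocate/deallocate interface shown above: acquire a new exact-sized block, move the elements over, and hand the old block back in its entirety.
#include <cstddef>
#include <memory>

// Sketch: shrink by reallocating, because the old block can only be
// returned whole.
template <typename T, typename Alloc>
T* shrink_buffer(Alloc& a, T* old_data, std::size_t size, std::size_t capacity) {
    using Traits = std::allocator_traits<Alloc>;
    T* new_data = Traits::allocate(a, size);                      // the reallocation
    std::uninitialized_move(old_data, old_data + size, new_data); // C++17
    std::destroy(old_data, old_data + size);                      // C++17
    Traits::deallocate(a, old_data, capacity); // must return the whole old block
    return new_data;
}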

In fact, the standard just allows the allocator implementation to re-allocate when shrinking if it wants to. I do not know whether it is common in C++ library implementations, but I can remember some database allocators that used different pools of disk segments, some for smaller blocks and some for larger blocks. The rationale was that there were many small blocks and few large ones, so that segregation helped in reducing fragmentation in the larger-block pools. If this happens in a program, it could make sense to define a custom allocator that implements that rule.
The fact that the standard says that shrink_to_fit may reallocate, just allows the use of such a custom allocator.

Related

What's happening when I do new T[1]?

Quick question here:
template <class T>
T* allocate(std::size_t n) {
    return new T[n];
}
So in the above code, when n==1, we're doing new T[1], so I have two worries:
1. I heard that when allocating an array, extra memory is used to store the length of the array (unsure though), so would this be wasteful when doing new T[1] a lot, wasting a lot of memory?
2. Should I free this using delete[] or simply delete?
Yes, in a typical implementation when you use new T[n] some extra memory will be required to store the exact length of the array, but only for types with non-trivial destructors.
E.g. in a typical implementation new int[1] carries no memory overhead compared to new int, but new std::string[1] will carry memory overhead compared to new std::string.
The extra memory is just an extra size_t field, meaning that percentage-wise it depends on the size of the object you allocated. If sizeof(T) is comparable to sizeof(size_t) the overhead might be considerable.
But it all might also depend on the additional details of the implementation-specific memory allocation mechanism.
In other words, if this a part of application-specific code it makes sense to try it and see whether it has any negative impact on your program's memory consumption. Maybe it is not a problem at all. But if you are writing a generic library, then things like that are worth paying attention to.
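If you are curious, you can make this length field visible on your own implementation by instrumenting the global operator new[]; the following is a sketch (the exact sizes printed are implementation-specific, and a production replacement would need more care):
#include <cstdio>
#include <cstdlib>
#include <new>
#include <string>

// Log every array-new request so the hidden length field becomes visible.
void* operator new[](std::size_t size) {
    std::printf("operator new[] requested %zu bytes\n", size);
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc{};
}

void operator delete[](void* p) noexcept { std::free(p); }

int main() {
    std::printf("sizeof(int) = %zu, sizeof(std::string) = %zu\n",
                sizeof(int), sizeof(std::string));
    delete[] new int[1];         // typically requests exactly sizeof(int)
    delete[] new std::string[1]; // typically requests sizeof(std::string) + a size_t
}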
Yes, you should use delete [].
RAM is cheap. It is true that if all your program does, all the time, is allocate one-element arrays over and over again, then this would be slightly wasteful. But an occasional allocation of a one-element array is not going to be the end of the world.
Use delete[]. If you used new[], then you have to use delete[]. It doesn't matter how much was newed. This is a cut-and-dried rule. No exceptions.
No memory is used to store the length of the array, or at least no memory you are allowed to access. The memory manager that granted you the memory likely has some hidden bookkeeping. Should you find this bookkeeping, leave it alone. Anything you do with it is certainly non-portable. It is also possible you will be provided memory in fixed size blocks. You will be given a block that is at least the size of T. Any extra memory in this block is effectively wasted.
Anything allocated with new[] must be released with delete[] or you invoke undefined behaviour. Once again, do not mess with the low-level memory manager.

`std::string` allocations are my current bottleneck - how can I optimize with a custom allocator?

I'm writing a C++14 JSON library as an exercise and to use it in my personal projects.
By using callgrind I've discovered that the current bottleneck during a continuous value creation from string stress test is an std::string dynamic memory allocation. Precisely, the bottleneck is the call to malloc(...) made from std::string::reserve.
I've read that many existing JSON libraries such as rapidjson use custom allocators to avoid malloc(...) calls during string memory allocations.
I tried to analyze rapidjson's source code but the large amount of additional code and comments, plus the fact that I'm not really sure what I'm looking for, didn't help me much.
How do custom allocators help in this situation?
Is a memory buffer preallocated somewhere (where? statically?) and std::strings take available memory from it?
Are strings using custom allocators "compatible" with normal strings?
They have different types. Do they have to be "converted"? (And does that result in a performance hit?)
Code notes:
Str is an alias for std::string.
By default, std::string allocates memory as needed from the same heap as anything that you allocate with malloc or new. To get a performance gain from providing your own custom allocator, you will need to be managing your own "chunk" of memory in such a way that your allocator can deal out the amounts of memory that your strings ask for faster than malloc does. Your memory manager will make relatively few calls to malloc, (or new, depending on your approach) under the hood, requesting "large" amounts of memory at once, then deal out sections of this (these) memory block(s) through the custom allocator. To actually achieve better performance than malloc, your memory manager will usually have to be tuned based on known allocation patterns of your use cases.
This kind of thing often comes down to the age-old trade-off of memory use versus execution speed. For example: if you have a known upper bound on your string sizes in practice, you can pull tricks with over-allocating to always accommodate the largest case. While this is wasteful of your memory resources, it can alleviate the performance overhead that more generalized allocation runs into with memory fragmentation, as well as making any calls to realloc essentially constant time for your purposes.
@sehe is exactly right. There are many ways.
EDIT:
To finally address your second question, strings using different allocators can play nicely together, and usage should be transparent.
For example:
#include <iostream>
#include <memory>
#include <string>

class myalloc : public std::allocator<char> {};
myalloc customAllocator;

int main()
{
    std::string mystring(customAllocator); // binds to the std::allocator<char> parameter
    std::string regularString = "test string";
    mystring = regularString;
    std::cout << mystring;
    return 0;
}
This is a fairly silly example and, of course, uses the same workhorse code under the hood. However, it shows assignment between strings using allocator classes of "different types". Implementing a useful allocator that supplies the full interface required by the STL, without just disguising the default std::allocator, is not so trivial. This seems to be a decent write-up covering the concepts involved. The key to why this works, in the context of your question at least, is that using different allocators doesn't cause the strings to be of different types. Notice that the custom allocator is given as an argument to the constructor, not as a template parameter. The STL still does fun things with templates (such as rebind and Traits) to homogenize allocator interfaces and tracking.
What often helps is the creation of a GlobalStringTable.
See if you can find portions of the old NiMain library from the now defunct NetImmerse software stack. It contains an example implementation.
Lifetime
What is important to note is that this string table needs to be accessible between different DLL spaces, and that it is not a static object. R. Martinho Fernandes already warned that the object needs to be created when the application or DLL thread is created / attached, and disposed when the thread is destroyed or the DLL is detached, and preferably before any string object is actually used. This sounds easier than it actually is.
Memory allocation
Once you have a single point of access that exports correctly, you can have it allocate a memory buffer up-front. If the memory is not enough, you have to resize it and move the existing strings over. Strings essentially become handles to regions of memory in this buffer.
Placement new
Something that often works well is the placement new() operator, where you specify the memory location at which your new string object is to be constructed. Instead of allocating, the operator simply takes the memory location passed in as an argument (a custom overload could additionally zero that memory) and the object is constructed there. You can also keep track of the allocation, the actual size of the string, etc. in the GlobalStringTable object.
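A minimal sketch of placement new in isolation (slot here stands in for a region handed out by the hypothetical GlobalStringTable):
#include <new>
#include <string>

int main() {
    // A properly aligned region we manage ourselves.
    alignas(std::string) unsigned char slot[sizeof(std::string)];
    // Construct the string object in that region; no allocation happens
    // for the object itself (short contents also stay in place via SSO).
    std::string* s = new (slot) std::string("hello");
    using S = std::string;
    s->~S(); // placement-new'd objects must be destroyed explicitly
}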
SOA
Handling the actual memory scheduling is something that is up to you, but there are many possible ways to approach this. Often, the allocated space is partitioned in several regions so that you have several blocks per possible string size. A block for strings <= 4 bytes, one for <= 8 bytes, and so on. This is called a Small Object Allocator, and can be implemented for any type and buffer.
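As an illustration, here is the size-class lookup such an allocator might perform (the function name and class sizes are hypothetical):
#include <cstddef>
#include <iostream>

// Map a requested byte count to a size-class index: classes of
// 4, 8, 16, 32, ... bytes, as described above.
std::size_t sizeClass(std::size_t n) {
    std::size_t cls = 0;
    for (std::size_t block = 4; block < n; block <<= 1) ++cls;
    return cls; // index into an array of per-class block pools
}

int main() {
    std::cout << sizeClass(3) << ' '   // 0: the <= 4-byte pool
              << sizeClass(9) << '\n'; // 2: the <= 16-byte pool
}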
If you expect many string operations where small strings are grown repeatedly, you may change your strategy and allocate larger buffers from the start, so that the number of memmove operations is reduced. Or you can opt for a different approach and use string streams for those.
String operations
It is not a bad idea to derive from std::basic_string, so that most of the operations still work but the internal storage is actually in the GlobalStringTable, and you can keep using the same STL conventions. This way you also make sure that all the allocations are within a single DLL, so that there can be no heap corruption from linking different kinds of strings between different libraries, since all the allocation operations are essentially in your DLL (and are rerouted to the GlobalStringTable object).
Custom allocators can help because most malloc()/new implementations are designed for maximum flexibility, thread safety and bullet-proof operation. For instance, they must gracefully handle the case where one thread keeps allocating memory and sending the pointers to another thread that deallocates them. Things like these are difficult to handle in a performant way and drive up the cost of malloc() calls.
However, if you know that some things cannot happen in your application (like one thread deallocating stuff another thread allocated, etc.), you can optimize your allocator further than the standard implementation. This can yield significant results, especially when you don't need thread safety.
Also, the standard implementation is not necessarily well optimized: Implementing void* operator new(size_t size) and void operator delete(void* pointer) by simply calling through to malloc() and free() gives an average performance gain of 100 CPU cycles on my machine, which proves that the default implementation is suboptimal.
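For reference, those forwarding overloads look like this (a sketch; a full replacement would also loop on the new-handler and cover the array and nothrow forms):
#include <cstdlib>
#include <new>

void* operator new(std::size_t size) {
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc{}; // a replacement operator new must not return null
}

void operator delete(void* pointer) noexcept {
    std::free(pointer);
}

int main() {
    int* p = new int(42); // now routed through the malloc-based overload
    delete p;
}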
I think you'd be best served by reading up on the EASTL
It has a section on allocators and you might find fixed_string useful.
The best way to avoid a memory allocation is don't do it!
BUT if I remember JSON correctly, all the readStr values get used either as keys or as identifiers, so you will have to allocate them eventually; std::string's move semantics should ensure that the allocated array is not copied around but reused until its final use. The default NRVO/RVO/move machinery should reduce any copying of the data, if not of the string header itself.
Method 1:
Pass result as a ref from the caller, which has reserved SomeReasonableLargeValue chars, then clear it at the start of readStr. This is only usable if the caller actually can reuse the string.
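A sketch of that shape, using the question's Str alias (the body of readStr below is a stand-in, not the asker's actual parsing code):
#include <string>

using Str = std::string; // alias from the question's code notes

// Method 1: the caller owns and reuses the buffer.
void readStr(Str& result /* , parser state ... */) {
    result.clear();           // keeps the capacity the caller reserved
    result.append("decoded"); // stand-in for the real decoding loop
}

int main() {
    Str buf;
    buf.reserve(64); // SomeReasonableLargeValue
    for (int i = 0; i < 1000; ++i)
        readStr(buf); // no further allocations once the capacity suffices
}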
Method 2:
Use the stack.
// Reserve memory for the string (BOTTLENECK)
if (end - idx < SomeReasonableValue) { // 32?
    // Feel free to use std::array if you want bounds checking,
    // but the preceding "if" should ensure it's not a problem.
    char result[SomeReasonableValue] = {0};
    int ridx = 0;
    for (; idx < end; ++idx) {
        // Not an escape sequence
        if (!isC('\\')) { result[ridx++] = getC(); continue; }
        // Escape sequence: skip '\'
        ++idx;
        // Convert escape sequence
        result[ridx++] = getEscapeSequence(getC());
    }
    // Skip closing '"'
    ++idx;
    result[ridx] = 0; // 0-terminated
    // Optional assert here to ensure nothing went wrong.
    return result; // the bottleneck might now move here, as the data is
                   // copied into the receiving string
}
// Fallback code, only if the string is long:
// your original code here
Method 3:
If your string by default can allocate some size to fill its 32/64-byte boundary, you might want to try to use that; construct result like this instead, in case the constructor can optimize it.
Str result(end - idx, 0);
Method 4:
Most systems already have an optimized allocator that likes specific block sizes: 16, 32, 64, etc.
siz = ((end - idx) & ~0xf) + 16; // round up to the allocator's 16-byte chunks
Str result;
result.reserve(siz); // note: std::string has no size-only constructor
Method 5:
Use either the allocator made by Google (tcmalloc) or Facebook (jemalloc) as a global new/delete replacement.
To understand how a custom allocator can help you, you need to understand what malloc and the heap does and why it is quite slow in comparison to the stack.
The Stack
The stack is a large block of memory allocated for your current scope. You can think of it as this
([] means a byte of memory)
[P][][][][][][][][][][][][][][][]
(P is a pointer that points to a specific byte of memory, in this case its pointing at the first byte)
So the stack is a block with only one pointer. When you allocate memory, it performs pointer arithmetic on P, which takes constant time.
So declaring int i = 0; would mean this,
P + sizeof(int).
[i][i][i][i][P][][][][][][][][][][][],
(i in [] is a block of memory occupied by an integer)
This is blazing fast and as soon as you go out of scope, the entire chunk of memory is emptied simply by moving P back to the first position.
The Heap
The heap hands out memory from a pool of bytes managed by the C++ runtime. When you call malloc, the heap finds a stretch of contiguous memory that fits your request, marks it as used so nothing else can use it, and returns it to you as a void*.
So, a theoretical heap with little optimization calling new(sizeof(int)), would do this.
Heap chunk
At first : [][][][][][][][][][][][][][][][][][][][][][][][][]
Allocate 4 bytes (sizeof(int)):
The allocator walks through the bytes of memory, finds a free stretch of the correct length, and returns a pointer to you.
After : [i][i][i][i][][][][][][][][][][][][][][][][][][][][][]
This is not an accurate representation of the heap, but from this you can already see numerous reasons for being slow relative to the stack.
The heap is required to keep track of all already allocated memory and their respective lengths. In our test case above, the heap was already empty and did not require much, but in worst case scenarios, the heap will be populated with multiple objects with gaps in between (heap fragmentation), and this will be much slower.
The heap is required to cycle through all the bytes to find a stretch that fits your length.
The heap can suffer from fragmentation since it will never completely clean itself unless you specify it. So if you allocated an int, a char, and another int, your heap would look like this
[i][i][i][i][c][i2][i2][i2][i2]
(i stands for bytes occupied by an int and c stands for the byte occupied by a char.) When you de-allocate the char, it will look like this:
[i][i][i][i][empty][i2][i2][i2][i2]
So when you want to allocate another object into the heap,
[i][i][i][i][empty][i2][i2][i2][i2][i3][i3][i3][i3]
unless the new object is the size of 1 char, the gap left behind cannot be reused, and the usable heap for that allocation is effectively reduced by 1 byte. In more complex programs with millions of allocations and deallocations, the fragmentation issue becomes severe and the program can become unstable.
There are also cases like thread safety to worry about (someone else said this already).
Custom Heap/Allocator
So, a custom allocator usually needs to address these problems while providing the benefits of the heap, such as personalized memory management and object permanence.
These are usually accomplished with specialized allocators. If you know you don't need to worry about thread safety, or you know exactly how long your strings will be, or you have a predictable usage pattern, you can make your allocator faster than malloc and new by quite a lot.
For example, if your program requires a lot of allocations as fast as possible without lots of deallocations, you could implement a stack allocator, in which you allocate a huge chunk of memory with malloc at startup, e.g.:
#include <cstddef>

// Super simple example, cleaned up so that it compiles.
struct StackAllocator {
    char* stack;
    char* pointer;

    explicit StackAllocator(std::size_t expectedSize)
        : stack(new char[expectedSize]), pointer(stack) {}
    ~StackAllocator() { delete[] stack; }

    // Hand out the next `size` bytes by bumping the pointer: O(1).
    void* allocate(std::size_t size) { char* p = pointer; pointer += size; return p; }

    // "Deallocate" everything at once by rewinding to the start.
    void empty() { pointer = stack; }
};
Get expected size, get a chunk of memory from the heap.
Assign a pointer to the beginning.
[P][][][][][][][][][] ..... [].
then have one pointer that moves for each allocation. When you no longer need the memory, you simply move the pointer back to the beginning of your buffer. This gives you O(1) allocations and deallocations as well as object permanence, at the cost of no flexible per-object deallocation and a large initial memory requirement.
For strings, you could try a chunk allocator. For every allocation, the allocator gives a set chunk of memory.
Compatibility
Compatibility with other strings is almost guaranteed. As long as you are allocating a contiguous chunk of memory and preventing anything else from using that block of memory, it will work.

Does the standard guarantee that the total memory occupied by a std::vector scales as C+N*sizeof(T)?

The C++ standard provides the guarantee that the content of a std::vector is stored contiguously. But does it state that the total occupied memory is:
S = C+N*sizeof(T)
where:
S is the total size on the stack AND on the heap
C is the total size on the stack: C = sizeof(std::vector)
N is the capacity of the vector
T is the type stored
In other words, do I have the guarantee that there is no overhead per element ?
And if I have no such guarantee, is there any reason for that?
EDIT: to be clear, if I take the example of a std::list, it generally stores 2 extra pointers per element. So my question is: would such an implementation of std::vector be standard-compliant?
For there to be any such guarantee, the standard would have to pass the requirement on to the interface of the allocator. It doesn't, so there isn't.
In practice though, as a quality of implementation issue, you expect that memory allocators probably have a constant overhead per allocation but no overhead proportional to the size of the allocation. A counter-example to this would be a memory allocator that always uses a power-of-two-sized block regardless of the size requested. This would be pretty wasteful for large allocations, but not forbidden either as a user-defined allocator or even as the system allocator used by ::operator new[]. It would create an overhead proportional to N on average, assuming that the vector capacities don't happen to fit nicely.
Leaving aside the allocator, I don't believe there's anything in the standard to say that the vector can't allocate (for example) an extra byte per element and use it to store some flags for who-knows-what purpose. As others have remarked, the contiguousness requirement means that those extra bytes cannot lie between the vector elements. They would have to be in a separate allocation or all together at one end of the allocation.
There's at least one good reason that the standard doesn't forbid implementations from "wasting" space by using it to store data used for operations not required by the standard -- doing so would rule out many debugging techniques!
Do I have the guarantee that there is no overhead per element?
Does the standard prohibit it? No.
But would you ever expect to see this in practice? No.
The rule of contiguous data storage and the complexity requirements of vector growth mean that the only possible way for a non-constant-sized data block to be part of the vector would be if it were emplaced directly before the dynamically-allocated element data, or somewhere else entirely. There is no guarantee that this doesn't happen, but, quite simply, no implementation does it because it would be entirely ridiculous and serve no purpose whatsoever.
Does it state that the total occupied memory is:
S = C+N*sizeof(T)
There may be other data members of the vector itself (what you've inaccurately deemed to be "on the stack"), increasing the object's size in constant terms.
The standard gives no guarantee, afaics. But the requirement that the elements be stored contiguously makes it likely that there is no per-element overhead. The whole data must be in a memory area which was allocated in one piece. @aschepler remarked correctly though that typical free store implementations have a (constant) overhead per allocation unit, typically a size variable or an end pointer.
Additionally there may be some padding overhead, e.g. an allocation unit will probably span multiples of the natural word size on a machine. And then the OS call will likely reserve a whole memory page to the program, even if you allocate only 1 byte. Whether you consider that as overhead or not is a matter of taste (from the outside yes, from the inside of the program no; and of course subsequent vectors or resize()s dine from the same page).
So at least it's CM + CV + N*sizeof(T), with CV being the constant overhead in the vector itself (not necessarily on the stack, as Lightness said) and CM the constant overhead of the memory management.
No, the implementation characteristics you suggest would not be standard compliant. The STL specifies that a std::vector support appending individual elements in amortized constant time.
In order for the amortized cost of inserting an element to be O(1), the size of the array must increase in at least a geometric progression when it is reallocated (see here). A geometric progression means that if the size of the array was N, the new size after reallocation must be K * N, for some K > 1. The choice of K is implementation dependent.
To find out how much space a std::vector has allocated, call std::vector::capacity(). With regard to overhead per element, in the best case the capacity() == size(). In the worst case capacity() == K * (size() - 1).
If you must ensure that your vector is absolutely no larger than it has to be, you can call std::vector::reserve() if you know exactly how large your std::vector will be. You may also call std::vector::shrink_to_fit() (C++11) after you are done adding elements to reduce the amount of memory reserved; note that plain std::vector::resize() changes the element count but does not shrink the capacity.
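A quick way to watch both behaviors (the growth factor and whether shrink_to_fit actually shrinks are implementation-specific):
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last = 0;
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) { // report each geometric growth step
            last = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << last << '\n';
        }
    }
    v.shrink_to_fit(); // non-binding request
    std::cout << "after shrink_to_fit: capacity " << v.capacity() << '\n';
}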

Efficient Array Reallocation in C++

How would I efficiently resize an array allocated using some standards-conforming C++ allocator? I know that no facilities for reallocation are provided in the C++ allocator interface, but did the C++11 revision enable us to work with them more easily? Suppose that I have a class foo with a copy-assignment operator foo& operator=(const foo& x) defined. If x.size() > this->size(), I'm forced to:
1. Call allocator.destroy() on all elements in the internal storage of foo.
2. Call allocator.deallocate() on the internal storage of foo.
3. Allocate a new buffer with enough room for x.size() elements.
4. Use std::uninitialized_copy to populate the storage.
Is there some way that I more easily reallocate the internal storage of foo without having to go through all of this? I could provide an actual code sample if you think that it would be useful, but I feel that it would be unnecessary here.
Based on a previous question, the approach that I took for handling large arrays that could grow and shrink with reasonable efficiency was to write a container similar to a deque that broke the array down into multiple pages of smaller arrays. So for example, say we have an array of n elements, we select a page size p, and create 1 + n/p arrays (pages) of p elements. When we want to re-allocate and grow, we simply leave the existing pages where they are, and allocate the new pages. When we want to shrink, we free the totally empty pages.
The downside is that array access is slightly slower: given an index i, you need the page index i / p and the offset into the page i % p to get the element. I find this is still very fast, however, and it provides a good solution. Theoretically, std::deque should do something very similar, but for the cases I tried with large arrays it was very slow. See comments and notes on the linked question for more details.
There is also a memory inefficiency in that given n elements, we are always holding p - n % p elements in reserve. i.e. we only ever allocate or deallocate complete pages. This was the best solution I could come up with in the context of large arrays with the requirement for re-sizing and fast access, while I don't doubt there are better solutions I'd love to see them.
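A compressed sketch of that layout (the page size and the vector-of-pages representation are illustrative choices, not the answerer's actual container):
#include <cstddef>
#include <vector>

// Pages of P elements; element i lives at pages[i / P][i % P].
// Growing or shrinking only ever allocates or frees whole pages.
template <typename T, std::size_t P>
struct PagedArray {
    std::vector<std::vector<T>> pages;

    void resize(std::size_t n) {
        pages.resize((n + P - 1) / P);           // add or drop whole pages
        for (auto& page : pages) page.resize(P); // every page holds P slots
    }

    T& operator[](std::size_t i) { return pages[i / P][i % P]; }
};

int main() {
    PagedArray<int, 256> a;
    a.resize(1000); // 4 pages
    a[999] = 42;
    a.resize(300);  // frees two whole pages; the surviving pages stay put
}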
A similar problem also arises if x.size() > this->size() in foo& operator=(foo&& x).
No, it doesn't. You just swap.
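As a sketch of what "just swap" means for the question's hypothetical foo (the member layout here is invented for illustration):
#include <cstddef>
#include <utility>

struct foo {
    int* data = nullptr;
    std::size_t count = 0;

    void swap(foo& other) noexcept {
        std::swap(data, other.data);
        std::swap(count, other.count);
    }

    // Move assignment needs no reallocation: trade buffers with the source.
    foo& operator=(foo&& x) noexcept {
        swap(x);      // we take x's buffer; x now holds our old one
        return *this; // x's destructor eventually frees the old buffer
    }

    ~foo() { delete[] data; }
};

int main() {
    foo a, b;
    b = std::move(a); // constant time, no element copies
}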
There is no function that will resize in place or return 0 on failure (to resize). I don't know of any operating system that supports that kind of functionality beyond telling you how big a particular allocation actually is.
All operating systems do, however, have support for implementing realloc, which makes a copy if it cannot resize in place.
So, you can't have it because the C++ language would not be implementable on most current operating systems if you had to add a standard function to do it.
There are the C++11 rvalue reference and move constructors.
There's a great video talk on them.
Even if a re-allocate existed, you could only avoid step 2 of the ones you mention in a copy assignment. However, in the case of growing an internal buffer, re-allocation can save all four operations.
Is the internal buffer of your array contiguous? If so, see the answer at your link.
If not, a hashed array tree or an array list may be your choice to avoid reallocation.
Interestingly, the default allocator for g++ is smart enough to use the same address for consecutive deallocations and allocations of larger sizes, as long as there is enough unused space after the end of the initially-allocated buffer. While I haven't tested what I'm about to claim, I doubt that there is much of a time difference between malloc/realloc and allocate/deallocate/allocate.
This leads to a potentially very dangerous, nonstandard shortcut that may work if you know that there is enough room after the current buffer so that a reallocation would not result in a new address:
(1) Deallocate the current buffer without calling alloc.destroy().
(2) Allocate a new, larger buffer and check the returned address.
(3) If the new address equals the old address, proceed happily; otherwise, you lost your data.
(4) Call allocator.construct() for elements in the newly-allocated space.
I wouldn't advocate using this for anything other than satisfying your own curiosity, but it does work on g++ 4.6.

Why is it not possible to access the size of a new[]'d array?

When you allocate an array using new [], why can't you find out the size of that array from the pointer? It must be known at run time, otherwise delete [] wouldn't know how much memory to free.
Unless I'm missing something?
In a typical implementation the size of dynamic memory block is somehow stored in the block itself - this is true. But there's no standard way to access this information. (Implementations may provide implementation-specific ways to access it). This is how it is with malloc/free, this is how it is with new[]/delete[].
In fact, in a typical implementation raw memory allocations for new[]/delete[] calls are eventually processed by some implementation-specific malloc/free-like pair, which means that delete[] doesn't really have to care about how much memory to deallocate: it simply calls that internal free (or whatever it is named), which takes care of that.
What delete[] does need to know though is how many elements to destruct in situations when array element type has non-trivial destructor. And this is what your question is about - the number of array elements, not the size of the block (these two are not the same, the block could be larger than really required for the array itself). For this reason, the number of elements in the array is normally also stored inside the block by new[] and later retrieved by delete[] to perform the proper array element destruction. There are no standard ways to access this number either.
(This means that in general case, a typical memory block allocated by new[] will independently, simultaneously store both the physical block size in bytes and the array element count. These values are stored by different levels of C++ memory allocation mechanism - raw memory allocator and new[] itself respectively - and don't interact with each other in any way).
However, note that for the above reasons the array element count is normally only stored when the array element type has non-trivial destructor. I.e. this count is not always present. This is one of the reasons why providing a standard way to access that data is not feasible: you'd either have to store it always (which wastes memory) or restrict its availability by destructor type (which is confusing).
To illustrate the above, when you create an array of ints
int *array = new int[100];
the size of the array (i.e. 100) is not normally stored by new[] since delete[] does not care about it (int has no destructor). The physical size of the block in bytes (like, 400 bytes or more) is normally stored in the block by the raw memory allocator (and used by raw memory deallocator invoked by delete[]), but it can easily turn out to be 420 for some implementation-specific reason. So, this size is basically useless for you, since you won't be able to derive the exact original array size from it.
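A small experiment makes the hidden count observable, even though it remains inaccessible to the program:
#include <iostream>

// delete[] must run the destructor exactly once per element, so the
// stored count is used, even though the program cannot read it.
struct Tracer {
    ~Tracer() { std::cout << "~Tracer\n"; }
};

int main() {
    Tracer* a = new Tracer[3]; // new[] records the count 3 somewhere
    delete[] a;                // prints "~Tracer" three times
}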
You most likely can access it, but it would require intimate knowledge of your allocator and would not be portable. The C++ standard doesn't specify how implementations store this data, so there's no consistent method for obtaining it. I believe it's left unspecified because different allocators may wish to store it in different ways for efficiency purposes.
It makes sense, as for example the size of the allocated block may not necessarily be the same size as the array. While it is true that new[] may store the number of elements (so that it can call each element's destructor), it doesn't have to, as the count wouldn't be required for an empty destructor. There is also no standard way (C++ FAQ Lite 1, C++ FAQ Lite 2) of specifying where new[] stores the array length, as each method has its pros and cons.
In other words, it allows allocations to be as fast and cheap as possible by not specifying anything about the implementation. (If the implementation had to store the size of the array as well as the size of the allocated block every time, it would waste memory that you may not need.)
Simply put, the C++ standard does not require support for this. It is possible that if you know enough about the internals of your compiler, you can figure out how to access this information, but that would generally be considered bad practice. Note that there may be a difference in memory layout for heap-allocated arrays and stack-allocated arrays.
Remember that essentially what you are talking about here are C-style arrays, too -- even though new and delete are C++ operators -- and the behavior is inherited from C. If you want a C++ "array" that is sized, you should be using the STL (e.g. std::vector, std::deque).