size of dynamically allocated array - c++

Is it true that a pointer assigned the starting address of a dynamically allocated array does not carry any information about the size of the array? So we have to use another variable to store its size for later processing of the array through the pointer.
But when we free the dynamically allocated array, we don't specify the size; instead we just "free ptr" or "delete [] ptr". How could free or delete know the size of the array? Can we use the same scheme to avoid storing the size of the array in another variable?
Thanks!

Yes, this is true.
delete knows the size of the memory chunk because new adds extra information to the chunk (usually before the area returned to the user), containing its size, along with other information. Note that this is all very much implementation specific and shouldn't be used by your code.
So to answer your last question: No - we can't use it - it's an implementation detail that's highly platform and compiler dependent.
For example, in the sample memory allocator demonstrated in K&R2, this is the "header" placed before each allocated chunk:
typedef long Align;              /* for alignment to long boundary */

union header {                   /* block header */
    struct {
        union header *ptr;       /* next block if on free list */
        unsigned size;           /* size of this block */
    } s;
    Align x;                     /* force alignment of blocks */
};

typedef union header Header;
size is the size of the allocated block (that's then used by free, or delete).

The funny thing is that historically it was delete [20] arr; just as it is arr = new int[20]. However practice proved that the information on size can be painlessly stored by the allocator, and since most people using it then stored it anyway, it was added to the standard.
What is funnier, and little known, is that this "extended delete syntax" is actually accepted by a few C++ compilers (despite being invalid even under the C++98 standard), although none require it.
int* arr = new int[20];
delete [20] arr;
The sad part about this all however is, that there's no standard-conforming way to retrieve that passed size for your own use :-/

It is true that the array does not carry its own size; you have to store that information for later. When deleting an array through delete or free, you pass only the pointer to the allocated memory. The memory manager in use (either the system's or your own custom one from overriding new and delete) knows the memory area being freed and keeps track of it. Hope that makes sense.

Yes, it's true. This is part of why you should rarely try to deal with this directly, and use a standard container instead. About the only time it makes sense to deal with it is if you decide to implement a container yourself (in which case you'll normally track the size information in your container's implementation).
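For illustration, a minimal sketch of what "use a standard container" looks like in practice (the names here are arbitrary):
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v(20);          // dynamically allocated; the size travels with the object
    std::printf("%zu\n", v.size());  // prints 20 - no separate size variable needed
    // v's memory is released automatically when it goes out of scope
}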

Related

How does the compiler/program deduce the size of memory to be deleted (released) in the case of delete[] arr? [duplicate]

Foo* set = new Foo[100];
// ...
delete [] set;
You don't pass the array's boundaries to delete[]. But where is that information stored? Is it standardised?
When you allocate memory on the heap, your allocator will keep track of how much memory you have allocated. This is usually stored in a "head" segment just before the memory that you get allocated. That way when it's time to free the memory, the de-allocator knows exactly how much memory to free.
One approach for compilers is to allocate a little more memory and store the count of elements in a head element.
Here is an example of how it could be done. For
int* i = new int[4];
the compiler will allocate sizeof(int)*5 bytes:
int *temp = (int*)malloc(sizeof(int) * 5);
It will store 4 in the first sizeof(int) bytes:
*temp = 4;
and set i just past the count:
i = temp + 1;
So i points to an array of 4 elements, not 5.
The deletion
delete[] i;
would then be processed along these lines:
int *temp = i - 1;
int number_of_elements = *temp; // = 4
// ... call the destructor for number_of_elements elements
// ... stored at temp + 1, temp + 2, ..., temp + 4, if needed
free(temp);
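Below is a hedged, compilable version of the same scheme (the helpers make_int_array/destroy_int_array are made up for illustration; a real compiler does this behind the scenes and may lay things out differently):
#include <cstdlib>

// Illustration only: a hypothetical pair of helpers mimicking what a compiler
// might generate for new int[n] / delete[] under the cookie scheme above.
int* make_int_array(int n) {
    // one extra int-sized slot at the front holds the element count
    int* temp = static_cast<int*>(std::malloc(sizeof(int) * (n + 1)));
    *temp = n;          // store the count in the cookie (error handling omitted)
    return temp + 1;    // the caller gets a pointer just past the cookie
}

void destroy_int_array(int* i) {
    int* temp = i - 1;  // step back to the cookie
    // int has no destructor, so there is nothing to run; *temp says how many there would be
    std::free(temp);
}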
The information is not standardised. However, on the platforms I have worked on, this information is stored in memory just before the first element. So you could theoretically access and inspect it, but it's not worth it.
Also this is why you must use delete [] when you allocated memory with new [], as the array version of delete knows that (and where) it needs to look to free the right amount of memory - and call the appropriate number of destructors for the objects.
The C++ standard leaves this implementation specific, which means compiler magic. It can break with non-trivial alignment restrictions on at least one major platform.
You can think about possible implementations by realizing that delete[] is only defined for pointers returned by new[], which may not be the same pointer as returned by operator new[]. One implementation in the wild is to store the array count in the first int returned by operator new[], and have new[] return a pointer offset past that. (This is why non-trivial alignments can break new[].)
Keep in mind that operator new[]/operator delete[] != new[]/delete[].
Plus, this is orthogonal to how C knows the size of memory allocated by malloc.
Basically it's arranged in memory as:
[info][mem you asked for...]
Where info is the structure used by your compiler to store the amount of memory allocated, and what not.
This is implementation dependent though.
This isn't something that's in the spec -- it's implementation dependent.
Because the array to be 'deleted' should have been created with a single use of the 'new' operator. The 'new' operation should have put that information on the heap. Otherwise, how would additional uses of new know where the heap ends?
This is a more interesting problem than you might think at first. This reply is about one possible implementation.
Firstly, while at some level your system has to know how to 'free' the memory block, the underlying malloc/free (which new/delete/new[]/delete[] generally call) don't always remember exactly how much memory you asked for; it can get rounded up (for example, once you are above 4K it is often rounded up to the next 4K-sized block).
Therefore, even if we could get the size of the memory block, that doesn't tell us how many values are in the new[]ed memory, as the requested amount can be smaller. Therefore, we do have to store an extra integer telling us how many values there are.
Except: if the type being constructed doesn't have a destructor, then delete[] doesn't have to do anything except free the memory block, and therefore doesn't need to store anything!
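To make that concrete, here is a small experiment; it assumes your implementation uses such a count cookie, which is typical but not guaranteed. It replaces the global array allocation functions purely to print the byte counts they are asked for:
#include <cstdio>
#include <cstdlib>
#include <new>

// Replace the global array allocator just to observe the sizes requested.
void* operator new[](std::size_t bytes) {
    std::printf("operator new[] asked for %zu bytes\n", bytes);
    if (void* p = std::malloc(bytes)) return p;
    throw std::bad_alloc();
}
void operator delete[](void* p) noexcept { std::free(p); }

struct Trivial    { int x; };                   // no destructor: no count needed
struct NonTrivial { int x; ~NonTrivial() {} };  // destructor: delete[] must know the count

int main() {
    delete[] new Trivial[10];     // typically asks for exactly 10 * sizeof(Trivial)
    delete[] new NonTrivial[10];  // typically asks for 10 * sizeof(NonTrivial) plus a cookie
}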
It is not standardized. In Microsoft's runtime the new operator uses malloc() and the delete operator uses free(). So, in this setting your question is equivalent to the following: How does free() know the size of the block?
There is some bookkeeping going on behind the scenes, i.e. in the C runtime.

C++ doesn't tell you the size of a dynamic array. But why?

I know that there is no way in C++ to obtain the size of a dynamically created array, such as:
int* a;
a = new int[n];
What I would like to know is: Why? Did people just forget this in the specification of C++, or is there a technical reason for this?
Isn't the information stored somewhere? After all, the command
delete[] a;
seems to know how much memory it has to release, so it seems to me that delete[] has some way of knowing the size of a.
It's a follow on from the fundamental rule of "don't pay for what you don't need". In your example delete[] a; doesn't need to know the size of the array, because int doesn't have a destructor. If you had written:
std::string* a;
a = new std::string[n];
...
delete [] a;
Then the delete has to call destructors (and needs to know how many to call) - in which case the new has to save that count. However, given it doesn't need to be saved on all occasions, Bjarne decided not to give access to it.
(In hindsight, I think this was a mistake ...)
Even with int of course, something has to know about the size of the allocated memory, but:
Many allocators round up the size to some convenient multiple (say 64 bytes) for alignment and convenience reasons. The allocator knows that a block is 64 bytes long - but it doesn't know whether that is because n was 1 ... or 16.
The C++ run-time library may not have access to the size of the allocated block. If for example, new and delete are using malloc and free under the hood, then the C++ library has no way to know the size of a block returned by malloc. (Usually of course, new and malloc are both part of the same library - but not always.)
One fundamental reason is that there is no difference between a pointer to the first element of a dynamically allocated array of T and a pointer to any other T.
Consider a fictitious function that returns the number of elements a pointer points to.
Let's call it "size".
Sounds really nice, right?
If it weren't for the fact that all pointers are created equal:
char* p = new char[10];
size_t ps = size(p+1); // What?
char a[10] = {0};
size_t as = size(a); // Hmm...
size_t bs = size(a + 1); // Wut?
char i = 0;
size_t is = size(&i); // OK?
You could argue that the first should be 9, the second 10, the third 9, and the last 1, but to accomplish this you need to add a "size tag" on every single object.
A char will require 128 bits of storage (because of alignment) on a 64-bit machine. This is sixteen times more than what is necessary.
(Above, the ten-character array a would require at least 168 bytes.)
This may be convenient, but it's also unacceptably expensive.
You could of course envision a version that is only well-defined if the argument really is a pointer to the first element of a dynamic allocation by the default operator new, but this isn't nearly as useful as one might think.
You are right that some part of the system will have to know something about the size. But getting that information is probably not covered by the API of the memory management system (think malloc/free), and the exact size that you requested may not be known, because it may have been rounded up.
You will often find that memory managers will only allocate space in a certain multiple, 64 bytes for example.
So, you may ask for new int[4], i.e. 16 bytes, but the memory manager will allocate 64 bytes for your request. To free this memory it doesn't need to know how much memory you asked for, only that it has allocated you one block of 64 bytes.
The next question may be, can it not store the requested size? This is an added overhead which not everybody is prepared to pay for. An Arduino Uno for example only has 2k of RAM, and in that context 4 bytes for each allocation suddenly becomes significant.
If you need that functionality then you have std::vector (or equivalent), or you have higher-level languages. C/C++ was designed to enable you to work with as little overhead as you choose to make use of, this being one example.
There is a curious case of overloading the operator delete that I found in the form of:
void operator delete[](void *p, size_t size);
The parameter size appears to receive the size (in bytes) of the block of memory to which void *p points. If this is true, it is reasonable to at least hope that it carries the value passed to the invocation of operator new[] and would therefore merely need to be divided by sizeof(type) to give the number of elements stored in the array.
As for the "why" part of your question, Martin's rule of "don't pay for what you don't need" seems the most logical.
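For the curious, here is a small sketch of that sized overload in action (Widget is a made-up name; how much array overhead, if any, shows up in the size argument is implementation specific):
#include <cstdio>
#include <cstdlib>
#include <new>

struct Widget {
    double payload[4];
    ~Widget() {}   // a non-trivial destructor is what usually forces a count cookie

    static void* operator new[](std::size_t bytes) {
        std::printf("new[] requests %zu bytes (sizeof(Widget) == %zu)\n",
                    bytes, sizeof(Widget));
        if (void* p = std::malloc(bytes)) return p;
        throw std::bad_alloc();
    }
    static void operator delete[](void* p, std::size_t bytes) {
        std::printf("delete[] is told the block is %zu bytes\n", bytes);
        std::free(p);
    }
};

int main() {
    Widget* w = new Widget[5];   // bytes is typically 5 * sizeof(Widget) + cookie
    delete[] w;
}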
There's no way to know how you are going to use that array.
The allocation size does not necessarily match the element number so you cannot just use the allocation size (even if it was available).
This is a deep flaw in other languages, not in C++.
You achieve the functionality you desire with std::vector yet still retain raw access to arrays. Retaining that raw access is critical for any code that actually has to do some work.
Many times you will perform operations on subsets of the array and when you have extra book-keeping built into the language you have to reallocate the sub-arrays and copy the data out to manipulate them with an API that expects a managed array.
Just consider the trite case of sorting the data elements.
If you have managed arrays then you can't use recursion without copying data to create new sub-arrays to pass recursively.
Another example is an FFT which recursively manipulates the data starting with 2x2 "butterflies" and works its way back to the whole array.
To fix the managed array you now need "something else" to patch over this defect, and that "something else" is called 'iterators'. (You now have managed arrays but almost never pass them to any functions, because you need iterators 90+% of the time.)
The size of an array allocated with new[] is not visibly stored anywhere, so you can't access it. And the new[] operator doesn't return an array, just a pointer to the array's first element. If you want to know the size of a dynamic array, you must store it manually or use a class such as std::vector.

Getting the size of a struct with dynamic variables?

I have a struct with a dynamic array:
struct test{int* arr;};
After allocating space for the arr array (arr = new int[100]), using sizeof returns 4 bytes, which is the size of the struct without the array elements. Is there another built-in function like sizeof that can return the size while taking the dynamically allocated space into account? Or do I have to do this myself?
I need this because I want to make it easier to save/load the contents of the struct to/from a file.
There's no way to get the memory usage due to an object and the other objects it points to, because it's not a well defined concept.
Two objects might point to the same arr block. Are they both responsible for consuming the memory?
What about recursion if you have an array of structures containing pointers? What about a cycle?
Maybe arr points to the stack. Does that count as using memory?
malloc might round up the requested allocation size, or allocate internal bookkeeping structures. Do such effects count?
Some operating systems do provide a facility to retrieve the argument to malloc (or sometimes a rounded-up value, because the underlying system might genuinely have no use for the original argument), but in standard C and C++, POSIX and in general practice, you are responsible for tracking allocation sizes yourself.
Unfortunately, I think you are out of luck. The size returned by sizeof is the size of the pointer, which says nothing about the size of what it points to (in your case, a dynamic array).
I suggest you to use std::vector. It has a size() member function that returns the number of elements in use:
struct test {
    std::vector<int> arr;
};

test x;
// ...
x.arr.size();
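Since the stated goal is saving/loading to a file, here is a minimal sketch under that assumption (the helper names, file format, and error handling are all made up): write the element count first, then the elements, and reverse the steps when loading.
#include <cstddef>
#include <cstdio>
#include <vector>

struct test {
    std::vector<int> arr;   // the vector-based struct suggested above
};

// Hypothetical helpers; binary format: element count, then the raw ints.
bool save(const test& t, const char* path) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    std::size_t n = t.arr.size();
    std::fwrite(&n, sizeof n, 1, f);
    if (n) std::fwrite(t.arr.data(), sizeof(int), n, f);
    std::fclose(f);
    return true;
}

bool load(test& t, const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    std::size_t n = 0;
    if (std::fread(&n, sizeof n, 1, f) != 1) { std::fclose(f); return false; }
    t.arr.resize(n);
    if (n) std::fread(t.arr.data(), sizeof(int), n, f);
    std::fclose(f);
    return true;
}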

What is the "proper" way to allocate variable-sized buffers in C++?

This is very similar to this question, but the answers don't really answer this, so I thought I'd ask again:
Sometimes I interact with functions that return variable-length structures; for example, FSCTL_GET_RETRIEVAL_POINTERS in Windows returns a variably-sized RETRIEVAL_POINTERS_BUFFER structure.
Using malloc/free is discouraged in C++, and so I was wondering:
What is the "proper" way to allocate variable-length buffers in standard C++ (i.e. no Boost, etc.)?
vector<char> is type-unsafe (and doesn't guarantee anything about alignment, if I understand correctly), new doesn't work with custom-sized allocations, and I can't think of a good substitute. Any ideas?
I would use std::vector<char> buffer(n). There's really no such thing as a variably sized structure in C++, so you have to fake it; throw type safety out the window.
If you like malloc()/free(), you can use
RETRIEVAL_POINTERS_BUFFER* ptr =
    reinterpret_cast<RETRIEVAL_POINTERS_BUFFER*>(new char[/* ...appropriate size... */]);
// ... do stuff ...
delete[] reinterpret_cast<char*>(ptr);
Quotation from the standard regarding alignment (expr.new/10):
For arrays of char and unsigned char, the difference between the
result of the new-expression and the address returned by the
allocation function shall be an integral multiple of the strictest
fundamental alignment requirement (3.11) of any object type whose size
is no greater than the size of the array being created. [ Note:
Because allocation functions are assumed to return pointers to storage
that is appropriately aligned for objects of any type with fundamental
alignment, this constraint on array allocation overhead permits the
common idiom of allocating character arrays into which objects of
other types will later be placed. — end note ]
I don't see any reason why you can't use std::vector<char>:
{
std::vector<char> raii(memory_size);
char* memory = &raii[0];
//Now use `memory` wherever you want
//Maybe, you want to use placement new as:
A *pA = new (memory) A(/*...*/); //assume memory_size >= sizeof(A);
pA->fun();
pA->~A(); //call the destructor, once done!
}//<--- just remember, memory is deallocated here, automatically!
Alright, I understand your alignment problem. It's not that complicated. You can do this:
A *pA = new (&memory[i]) A();
//choose `i` such that `&memory[i]` is multiple of four, or whatever alignment requires
You may consider using a memory pool and, in the specific case of the RETRIEVAL_POINTERS_BUFFER structure, allocate pool memory amounts in accordance with its definition:
sizeof(DWORD) + sizeof(LARGE_INTEGER)
plus
ExtentCount * sizeof(Extents)
(I am sure you are more familiar with this data structure than I am -- the above is mostly for future readers of your question).
A memory pool boils down to "allocate a bunch of memory, then allocate that memory in small pieces using your own fast allocator".
You can build your own memory pool, but it may be worth looking at Boost's memory pool, which is a header-only (no DLLs!) library. Please note that I have not used the Boost memory pool library, but you did ask about Boost, so I thought I'd mention it.
std::vector<char> is just fine. Typically you can call your low-level C function with a zero-size argument, so you know how much is needed. Then you solve your alignment problem: just allocate more than you need, and offset the start pointer:
Say you want the buffer aligned to 4 bytes: allocate the needed size + 4 and add 4 - (reinterpret_cast<std::uintptr_t>(&my_vect[0]) & 0x3) to the start pointer.
Then call your c-function with the requested size and the offsetted pointer.
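Here is a hedged sketch of that over-allocate-and-offset trick (aligned_slot is a made-up helper; the alignment is assumed to be a power of two):
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: over-allocate by one alignment unit, then return the
// first suitably aligned byte inside the buffer.
char* aligned_slot(std::vector<char>& buf, std::size_t needed, std::size_t alignment) {
    buf.resize(needed + alignment);
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(buf.data());
    std::uintptr_t offset = (alignment - (raw & (alignment - 1))) & (alignment - 1);
    return buf.data() + offset;   // pass this (and 'needed') to the low-level call
}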
OK, let's start from the beginning. The ideal way to return a variable-length buffer would be:
MyStruct my_func(int a) { MyStruct s; /* magic here */ return s; }
Unfortunately, this does not work, since sizeof(MyStruct) is calculated at compile time. Anything variable-length just does not fit inside a buffer whose size is calculated at compile time. The thing to notice is that this happens with every variable or type supported by C++, since they all support sizeof. C++ has just one thing that can handle runtime sizes of buffers:
MyStruct *ptr = new MyStruct[count];
So anything that is going to solve this problem is necessarily going to use the array version of new. This includes std::vector and the other solutions proposed earlier. Notice that tricks like placement new into a char array have exactly the same problem with sizeof. Variable-length buffers just need the heap and arrays. There is no way around that restriction if you want to stay within C++. Further, it requires more than one object! This is important. You cannot make a variable-length object in C++. It's just impossible.
The nearest thing to a variable-length object that C++ provides is "jumping from type to type": not every object needs to be of the same type, and you can manipulate objects of different types at runtime. But each part and each complete object still supports sizeof, and their sizes are determined at compile time. The only thing left for the programmer is to choose which type to use.
So what's our solution to the problem? How do you create variable-length objects? std::string provides the answer: it needs to hold more than one character and uses the array form of heap allocation, but this is all handled by the standard library, and the programmer does not need to care. Then you'll have a class that manipulates those std::strings. std::string can do it because it is actually two separate memory areas. sizeof(std::string) returns the size of a fixed block that is known at compile time, but the actual variable-length data lives in a separate memory block allocated by the array version of new.
The array version of new has some restrictions of its own: sizeof(a[0]) == sizeof(a[1]), etc. First allocating an array and then doing placement new for several objects of different types will get around this limitation.

delete & new in c++

This may be a very simple question, but please help me.
I wanted to know what exactly happens when I call new and delete. For example, in the code below:
char * ptr=new char [10];
delete [] ptr;
The call to new returns a memory address. Does it allocate exactly 10 bytes on the heap, and where is the information about the size stored? When I call delete on the same pointer, I see in the debugger that a lot of bytes get changed before and after the 10 bytes.
Is there a header for each new that contains information about the number of bytes allocated by new?
Thanks a lot
Does it allocate exactly 10 bytes?
That's implementation dependent. The guarantee is "at least 10 chars".
Where is the information about the size stored?
That's implementation dependent.
Is there a header for each new that contains information about the number of bytes allocated by new?
That's implementation dependent.
By "implementation dependent" I mean it's not defined in the standard.
That's all up to the compiler and your runtime library. Only the effects new and delete have on your program are exactly defined; how these are achieved is not specified.
In your case it seems a little more memory than requested is allocated, and it will probably store management information such as the size of the current chunk of memory, information about adjacent areas of free space, or information to help the debugger detect buffer overflows and similar problems.
It is completely implementation-dependent. In general case you have to store the number of elements elsewhere. The implementation must allocate enough space for at least the number of elements specified, but it can allocate more.
Is there a header for each new that contains information about the number of bytes allocated by new?
That's platform dependent, but yes, on many platforms there is.
To be precise, according to the standard, new char[10] will allocate at least 10 bytes on the heap.
The internals of new and delete are implementation dependent. So it will vary from compiler to compiler, and platform to platform. Additionally, you can find a variety of allocator algorithms (e.g: TCMalloc).
I'll give you an overview of how it could work internally, but don't take it as absolute truth. It's written solely for the purpose of this explanation.
In short, the new operator internally invokes malloc. malloc uses a long linked list of available memory blocks, a.k.a. the free chain. When malloc is invoked, it searches this list for the first block that's big enough to hold the requested size. After that, it splits the block into two parts: one with the size you requested, and the other with the rest, which is then added back to the free chain. Finally, it returns the block with the requested size.
The inverse occurs in a free call, which is invoked by delete/delete[]. In short, it puts the provided block back to the free chain.
There could be fancy tricks during the processes I described above, like sorting the free chain, rounding the requested size to the next power of two to reduce memory fragmentation, and so on.
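Purely to illustrate that description (and not how any real runtime works), here is a toy first-fit allocator over a fixed arena; every name in it is made up:
#include <cstddef>
#include <cstdio>
#include <cstdlib>

namespace toy {

struct Block {
    std::size_t size;   // payload size in bytes
    Block*      next;   // next block on the free chain
};

alignas(alignof(std::max_align_t)) static char arena[4096];
static Block* free_list = nullptr;
static bool   initialized = false;

static void init() {
    free_list = reinterpret_cast<Block*>(arena);    // one big free block to start
    free_list->size = sizeof(arena) - sizeof(Block);
    free_list->next = nullptr;
    initialized = true;
}

void* allocate(std::size_t n) {
    if (!initialized) init();
    // round up so later headers stay aligned
    n = (n + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
    for (Block** link = &free_list; *link; link = &(*link)->next) {
        Block* b = *link;
        if (b->size < n) continue;                  // too small, keep searching
        if (b->size >= n + sizeof(Block) + alignof(std::max_align_t)) {
            // split: carve the tail off into a new free block
            Block* rest = reinterpret_cast<Block*>(reinterpret_cast<char*>(b + 1) + n);
            rest->size = b->size - n - sizeof(Block);
            rest->next = b->next;
            b->size = n;
            *link = rest;
        } else {
            *link = b->next;                        // use the whole block
        }
        return b + 1;                               // payload starts after the header
    }
    return nullptr;                                 // out of arena
}

void deallocate(void* p) {
    if (!p) return;
    Block* b = reinterpret_cast<Block*>(p) - 1;     // recover the header
    b->next = free_list;                            // push back on the free chain
    free_list = b;                                  // (no coalescing in this sketch)
}

} // namespace toy

int main() {
    void* a = toy::allocate(100);
    void* b = toy::allocate(200);
    std::printf("a=%p b=%p\n", a, b);
    toy::deallocate(a);
    toy::deallocate(b);
}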
char * ptr=new char [10];
You are creating an array of 10 characters on the heap and storing the address of the 0th element in a pointer. This is similar to doing a malloc in C.
delete [] ptr;
You are deleting (freeing) the heap memory that was allocated by the earlier statement. This is similar to doing a free in C.
It is implementation dependent, but the metadata for a block of memory is usually stored in the area just before the address returned. The changes you observed before the 10 bytes were likely metadata being updated for this block (probably the size of the block being written into the metadata), and the changes after the 10 bytes were likely metadata being updated for the next block (still unallocated, probably the pointer to the next chunk on the free list).
It is not a good idea to mess with the heap as it is not portable. However, if you want to do such heap magic, I suggest you implement your own memory pools (just get a large chunk of memory from the heap and manage it yourself). A possible place to start would be to look at libmm.
While the specifics are implementation dependent, one piece of information the implementation will need to store is the number of elements in the array. Or if it does not store it directly, it will need to accurately derive it from the block size allocated.
The reason for this is that if an array of objects is allocated with new[], then when they are deleted with delete[], the destructor of each object in the array will need to be called. delete[] needs to know how many objects to destroy. This is why it is necessary to match new with delete and new[] with delete[].