zero array size not allowed in c++ [duplicate] - c++

A simple test app:
cout << new int[0] << endl;
outputs:
0x876c0b8
So it looks like it works. What does the standard say about this? Is it always legal to "allocate" empty block of memory?

From 5.3.4/7
When the value of the expression in a direct-new-declarator is zero, the allocation function is called to allocate an array with no elements.
From 3.7.3.1/2
The effect of dereferencing a pointer returned as a request for zero size is undefined.
Also
Even if the size of the space requested [by new] is zero, the request can fail.
That means you can do it, but you cannot legally (in a well-defined manner across all platforms) dereference the pointer that you get back. You can only pass it to array delete, and you should delete it.
Here is an interesting footnote (i.e. not a normative part of the standard, but included for expository purposes) attached to the sentence from 3.7.3.1/2:
[32. The intent is to have operator new() implementable by calling malloc() or calloc(), so the rules are substantially the same. C++ differs from C in requiring a zero request to return a non-null pointer.]
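To make the rule concrete, here is a minimal sketch. The function name new_array_is_non_null is made up for illustration. It allocates an array whose length may be zero, never dereferences the result, and hands it back to delete[]:

```cpp
#include <cstddef>

// Hypothetical helper: allocates an array of n ints, releases it again,
// and reports whether the pointer handed back by new[] was non-null.
// The pointer is never dereferenced, so n == 0 is fine.
bool new_array_is_non_null(std::size_t n) {
    int* p = new int[n];                  // throws std::bad_alloc on failure instead of returning null
    const bool non_null = (p != nullptr);
    delete[] p;                           // a zero-length array still has to be released
    return non_null;
}
```

Calling new_array_is_non_null(0) should report true on a conforming implementation, exactly as it does for any positive n.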

Yes, it is legal to allocate a zero-sized array like this. But you must also delete it.

What does the standard say about this? Is it always legal to "allocate" empty block of memory?
Every object has a unique identity, i.e. a unique address, which implies a non-zero length (the actual amount of memory will be silently increased, if you ask for zero bytes).
If you allocated more than one of these objects then you'd find they have different addresses.
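A small sketch of that claim, assuming the default global operator new[] (the function name is hypothetical). Both zero-length arrays are alive at the same time, so the two pointers should compare unequal:

```cpp
// Hypothetical helper: two zero-length arrays that are alive at the same
// time are expected to live at different addresses.
bool zero_length_arrays_are_distinct() {
    int* a = new int[0];
    int* b = new int[0];
    const bool distinct = (a != b);   // comparing the pointers is fine, dereferencing them is not
    delete[] a;
    delete[] b;
    return distinct;
}
```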

Yes, it is completely legal to allocate a 0-sized block with new. You simply can't do anything useful with it, since there is no valid data for you to access. Given int *p = new int[0];, the assignment p[0] = 5; is undefined behaviour.
However, I believe that the standard allows for things like malloc(0) to return NULL.
You will still need to delete [] whatever pointer you get back from the allocation as well.

Curiously, C++ requires that operator new return a legitimate pointer
even when zero bytes are requested. (Requiring this odd-sounding
behavior simplifies things elsewhere in the language.)
I found this in Effective C++, Third Edition, "Item 51: Adhere to convention when writing new and delete".

I guarantee you that new int[0] costs you extra space since I have tested it.
For example,
the memory usage of
int **arr = new int*[1000000000];
is significantly smaller than
int **arr = new int*[1000000000];
for(int i =0; i < 1000000000; i++) {
arr[i]=new int[0];
}
The memory usage of the second code snippet minus that of the first code snippet is the memory used for the numerous new int[0].

Related

Creating an array using new without declaring size [duplicate]

This has been bugging me for quite some time. I have a pointer. I declare an array of type int.
int* data;
data = new int[5];
I believe this creates an array of int with size 5. So I'll be able to store values from data[0] to data[4].
Now I create an array the same way, but without size.
int* data;
data = new int;
I am still able to store values in data[2] or data[3]. But I created an array of size 1. How is this possible?
I understand that data is a pointer pointing to the first element of the array. Though I haven't allocated memory for the next elements, I am still able to access them. How?
Thanks.
Normally, there is no need to allocate an array "manually" with new. It is just much more convenient and also much safer to use std::vector<int> instead. And leave the correct implementation of dynamic memory management to the authors of the standard library.
std::vector<int> optionally provides element access with bounds checking, via the at() method.
Example:
#include <vector>
int main() {
// create resizable array of integers and resize as desired
std::vector<int> data;
data.resize(5);
// element access without bounds checking
data[3] = 10;
// optionally: element access with bounds checking
// attempts to access out-of-range elements trigger runtime exception
data.at(10) = 0;
}
By default, C++ usually lets you shoot yourself in the foot with undefined behavior, as you have seen in your case.
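If you want to react to a bad index instead of letting the exception end the program, at() can be wrapped in a try block. A minimal sketch; the helper name read_or and the fallback parameter are made up for illustration:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns data.at(index), or `fallback` when the
// index is out of range. at() throws std::out_of_range in that case.
int read_or(const std::vector<int>& data, std::size_t index, int fallback) {
    try {
        return data.at(index);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

With a three-element vector, read_or(v, 1, -1) yields the element at index 1, while read_or(v, 10, -1) yields -1.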
For reference:
https://en.cppreference.com/w/cpp/container/vector
https://en.cppreference.com/w/cpp/container/vector/at
https://en.cppreference.com/w/cpp/language/ub
Undefined, unspecified and implementation-defined behavior
What are all the common undefined behaviours that a C++ programmer should know about?
Also, in the second case you don't allocate an array at all, but a single object. Note that you must use the matching delete operator too.
int main() {
// allocate and deallocate an array
int *arr = new int[5];
delete[] arr;
// allocate and deallocate a single object
int *p = new int;
delete p;
}
For reference:
https://en.cppreference.com/w/cpp/language/new
https://en.cppreference.com/w/cpp/language/delete
How does delete[] know it's an array?
When you used new int then accessing data[i] where i!=0 has undefined behaviour.
But that doesn't mean the operation will fail immediately (or every time or even ever).
On most architectures it's very likely that the memory addresses just beyond the end of the block you asked for are mapped to your process, so you can access them.
If you're not writing to them, it's no surprise you can access them (though you shouldn't).
Even if you write to them, most memory allocators have a minimum allocation size, and behind the scenes you may well have been allocated space for more integers (4 is realistic) even though the code only requests 1.
You may also be overwriting some area of memory but never get tripped up. A common consequence of writing beyond the end of an array is to corrupt the free-memory store itself. The consequence may be catastrophe but may only exhibit itself in a later allocation possibly of a similar sized object.
It's a dreadful idea to rely on such behaviour but it's not very surprising that it appears to work.
C++ doesn't (typically or by default) perform strict range checking and accessing invalid array elements may work or at least appear to work initially.
This is why C and C++ can be plagued with bizarre and intermittent errors. Not all code that provokes undefined behaviour fails catastrophically in every execution.
Going outside the bounds of an array in C++ is undefined behavior, so anything can happen, including things that appear to work "correctly".
In practical implementation terms on common systems, you can think of "virtual" memory as a large "flat" address space running from 0 up to the largest address a pointer can hold, and pointers are addresses in this space.
The "virtual" memory for a process is mapped to physical memory, page file, etc. Now, if you access an address that is not mapped, or try to write a read-only part, you will get an error, such as an access violation or segfault.
But this mapping is done in fairly large chunks for efficiency, such as 4 KiB "pages". The allocators in a process, such as new and delete (or the stack), further split up these pages as required. So accessing other parts of a valid page is unlikely to raise an error.
This has the unfortunate result that it can be hard to detect such out of bounds access, use after free, etc. In many cases writes will succeed, only to corrupt some other seemingly unrelated object, which may cause a crash later, or incorrect program output, so best to be very careful about C and C++ memory management.
int *data = new int; // will be some virtual address
data[1000] = 5; // undefined behaviour; possibly the start of a 4 KiB page, potentially allowing a great deal beyond it
int *other_int = new int[5];
other_int[10] = 10; // also out of bounds
data[10000] = 42; // with further pages beyond, so you can really make a mess of your program's memory
// other_int[10] == 42 is now perfectly possible: such writes can overwrite other things in unexpected ways
C++ provides many tools to help, such as std::string, std::vector and std::unique_ptr, and it is generally best to try and avoid manual new and delete entirely.
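As a rough sketch of what that looks like with std::unique_ptr (the function names are made up for illustration): the array form unique_ptr<int[]> releases with delete[], the plain form with delete, so the matching deallocation is chosen for you.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical example: fills a heap array with 0, 1, ..., n-1 and returns the sum.
int sum_of_indices(std::size_t n) {
    std::unique_ptr<int[]> data(new int[n]);       // released with delete[] automatically
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = static_cast<int>(i);
    }
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Hypothetical example: a single heap-allocated int.
int boxed_value(int v) {
    std::unique_ptr<int> single(new int(v));       // released with delete automatically
    return *single;
}
```

Neither function contains a delete, and sum_of_indices(0) works without any special casing because the loops simply do not run.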
new int allocates 1 integer only. If you access an offset larger than 0, e.g. data[1], you overwrite memory that does not belong to that integer.
int * is a pointer to something that's probably an int. When you allocate using new int , you're allocating one int and storing the address to the pointer. In reality, int * is just a pointer to some memory.
We can treat an int * as a pointer to a scalar element (i.e. new int) or an array of elements -- the language has no way of telling you what your pointer is really pointing to; a very good argument to stop using pointers and only using scalar values and std::vector.
When you say a[2], you will access the memory 2 * sizeof(int) bytes after the value pointed to by a. If a is pointing to a scalar value, anything could be after a, and reading it causes undefined behaviour (your program might actually crash; this is an actual risk). Writing to that address will most likely cause problems; it is not merely a risk, but something you should actively guard against, i.e. use std::vector if you need an array and int or int& if you don't.
The expression a[b], where one of the operands is a pointer, is another way to write *(a+b). Let's for the sake of sanity assume that a is the pointer here (but since addition is commutative it can be the other way around! try it!); then the address in a is incremented by b times sizeof(*a), resulting in the address of the bth object after *a.
The resulting pointer is dereferenced, resulting in a "name" for the object whose address is a+b.
Note that a does not have to be an array; if it is one, it "decays" to a pointer before the operator [] is applied. The operation is taking place on a typed pointer. If that pointer is invalid, or if the memory at a+b does not in fact hold an object of the type of *a, or even if that object is unrelated to *a (e.g., because it is not in the same array or structure), the behavior is undefined.
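The equivalence described above can be checked on an in-bounds index. This is only an illustration with a made-up function name and a local four-element array:

```cpp
// Hypothetical check: a[b], *(a + b) and b[a] all name the same object
// as long as the index stays inside the array.
bool subscript_forms_agree() {
    int values[4] = {10, 20, 30, 40};
    int* a = values;      // the array decays to a pointer to its first element
    const int b = 2;
    return a[b] == *(a + b)   // subscript is pointer arithmetic plus dereference
        && b[a] == *(a + b)   // addition is commutative, so this compiles too
        && &a[b] == a + b     // same address
        && a[b] == 30;
}
```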
In the real world, "normal" programs do not do any bounds checking but simply add the offset to the pointer and access that memory location. (Accessing out-of-bounds memory is, of course, one of the more common bugs in C and C++, and one of the reasons these languages are not recommended without restrictions for high-security applications.)
If the index b is small, the memory is probably accessible by your program. For plain old data like int the most likely result is then that you simply read or write the memory in that location. This is what happened to you.
Since you overwrite unrelated data (which may in fact be used by other variables in your program) the results are often surprising in more complex programs. Such errors can be hard to find, and there are tools out there to detect such out-of-bounds access.
For larger indices you'll at some point end up in memory which is not assigned to your program, leading to an immediate crash on modern systems like Windows NT and up, and unpredictable results on architectures without memory management.
I am still able to store values in data[2] or data[3]. But I created an array of size 1. How is this possible?
The behaviour of the program is undefined.
Also, you didn't create an array of size 1, but a single non-array object instead. The difference is subtle.

Why is zero-length array allowed only if it's heap allocated?

I notice that it's not allowed to create non-heap allocated arrays of zero length.
// error: cannot allocate an array of constant length zero
char a[0];
I also notice that it's allowed to create heap allocated arrays of zero length.
// this is okay though
char *pa = new char[0];
I guess they're both guaranteed by the Standard (I don't have a copy of the Standard at hand). If so, why are they so different? Why not just allow a zero-length array on stack (or vice versa)?
This is addressed in the following Sections of the C++ Standard.
3.7.3.1/2:
[32. The intent is to have operator new() implementable by calling malloc() or calloc(), so the rules are substantially the same. C++ differs from C in requiring a zero request to return a non-null pointer.]
And also,
5.3.4, paragraph 7
When the value of the expression in a direct-new-declarator is zero, the allocation function is called to allocate an array with no elements.
An array of size 0 is not allowed by the C++ standard:
8.3.4/1:
"If the constant-expression (5.19) is present, it shall be an integral constant expression and its value shall be greater than zero."
In my understanding, the rationale behind this seems to be that the C++ standard requires every object to have a unique address (this is the very reason even an empty class object has a size of 1). In the case of a non-heap zero-sized array, no objects need to be created, hence no address needs to be given to it, and hence there is no need to allow it in the first place.
As far as C is concerned, zero-length arrays are not part of the C standard either, but several compilers (GCC, for example) accept them as an extension. They are typically used to implement structures of variable size by placing the zero-length array at the end of the structure. If my memory serves me correctly, this is popularly called the C struct hack; C99 standardized the idea as the flexible array member, declared with empty brackets.
A 0 length array isn't very useful. When you're calculating the
dimension, it can occur, and it is useful to not have to treat the case
specially in your code. Outside of new, the dimension of the array
must be a constant, not a calculated value, and if you know that the
constant is 0, why define it?
At least, that's the rationale I've heard (from people who worked on
it). I'm not totally convinced: it's not unusual in code to have to
work with symbolic constants whose value isn't known (from a header
file, or even the command line). So it probably would make sense to
allow arrays of 0 elements. And at least one compiler in the past has
allowed them, although I've forgotten which.
One frequent trick in C++ is to use such an array in a compile time
assert, something like:
char dummyToTestSomeSpecificCondition[ condition ];
This will fail to compile if the condition is false, and will compile if
it isn't. Except for that one compiler (if it still exists); I'll use
something like:
char dummyToTestSomeSpecificCondition[ 2 * condition - 1 ];
, just in case.
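For comparison, since C++11 the language has static_assert, which does the same job without a dummy array. The sketch below shows both forms side by side; the condition and the names are made up for illustration, and the condition was picked because the standard guarantees it holds:

```cpp
#include <climits>

// Illustrative condition: int is guaranteed to have at least 16 bits.
constexpr bool int_holds_16_bits = sizeof(int) * CHAR_BIT >= 16;

// C++11 and later: the compiler reports the message if the condition is false.
static_assert(int_holds_16_bits, "int must have at least 16 bits");

// The older trick from above: the array size is 1 when the condition
// holds and -1 (a compile error) when it does not.
typedef char int_holds_16_bits_check[2 * int_holds_16_bits - 1];
```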
I think that the main reason why they behave different is that the first one:
char a[0];
is an array of size 0, and this is forbidden because its size should be by definition 0*sizeof(char), and in C or C++ you cannot define types of size 0.
But the second one:
char *pa = new char[0];
is not a real array; it is just a chunk of 0 objects of type char put together in memory. Since a sequence of 0 objects may be useful, this is allowed. It just returns a pointer past the last item, and that is perfectly fine.
To add to my argument, consider the following examples:
new int[0][3]; //ok: create 0 arrays of 3 integers
new int[3][0]; //error: create 3 arrays of 0 integers
Although both lines would allocate the same amount of memory (0 bytes), one is allowed and the other is not.
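A sketch of the allowed form with a run-time row count (the function name is hypothetical). Only the first dimension may be a run-time value, and it may be zero; the remaining dimensions must be positive constants:

```cpp
#include <cstddef>

// Hypothetical example: allocates `rows` rows of 3 ints each, fills them
// with 0, 1, 2, ... and returns the sum of all elements.
int sum_rows_of_three(std::size_t rows) {
    int (*grid)[3] = new int[rows][3];   // pointer to arrays of 3 ints
    int total = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < 3; ++c) {
            grid[r][c] = static_cast<int>(r * 3 + c);
            total += grid[r][c];
        }
    }
    delete[] grid;
    return total;
}
```

With rows == 0 the loops never run, nothing is dereferenced, and the allocation is still released with delete[].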
That depends on the compiler implementation and flags. Visual C++ doesn't allow it; GCC allows it as an extension (I don't know how to disable it).
Using this approach, STATIC_ASSERT may be implemented in VC, but not in GCC:
#define STATIC_ASSERT(_cond) { char __dummy[_cond];}
Even intuitively this makes sense.
Since the heap allocated method creates a pointer on the stack, to a piece of memory allocated on the heap, it's still "creating something" of size: sizeof(void*). In the case of allocating a zero length array, the pointer that exists on the stack can point anywhere, but nowhere meaningful.
By contrast, if you allocate the array on the stack, what would a meaningful zero length array object look like? It doesn't really make sense.

Why the allocation succeeds for size zero bytes?

This is similar to What does zero-sized array allocation do/mean?
I have following code
int *p = new int[0];
delete []p;
p gets an address and gets deleted properly.
My question is: why is allocation of zero bytes allowed by the C++ Standard in the first place?
Why doesn't it throw bad_alloc or some special exception?
I think it just postpones the catastrophic failure, making the programmer's life difficult. If the size to be allocated is calculated at run time, and the programmer assumes it was allocated properly and tries to write something to that memory, the program ends up corrupting memory, and the crash may happen somewhere else in the code.
EDIT: How much memory does it allocate for a zero-size request?
Why would you want it to fail? If the programmer tries to read/write to non-existent elements, then that is an error. The initial allocation is not (this is no different to e.g. int *p = new int[1]; p[1] = 5;).
3.7.3.1/2:
[32. The intent is to have operator new() implementable by calling malloc() or calloc(), so the rules are substantially the same. C++ differs from C in requiring a zero request to return a non-null pointer.]
Compare dynamically allocated array to std::vector for example. You can have a vector of size 0, so why not allow the same for the array? And it is always an error to access past the end of the array whether its size is 0 or not.
A long time ago, before exceptions were used, the malloc function returned a NULL pointer if the allocation failed.
If allocating zero bytes also returned a NULL pointer, it would be hard to distinguish between a failed allocation and a successful zero-byte allocation.
On the other hand, if a zero-byte allocation returned a non-NULL pointer without reserving any memory, you could end up with a situation in which two different zero-byte allocations have the same pointer.
Therefore, to keep things simple, many malloc implementations allocate a small non-zero block (for example 1 byte) for a zero-byte request; the C standard leaves it to the implementation whether malloc(0) returns NULL or a unique pointer.
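Because malloc(0) may legitimately return either NULL or a unique pointer, portable code should treat NULL as a failure only when a non-zero size was requested. A minimal sketch with made-up helper names:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: a null result only signals failure when a
// non-zero size was requested.
bool malloc_failed(const void* p, std::size_t requested) {
    return p == nullptr && requested != 0;
}

// Hypothetical helper: requests zero bytes and releases the result.
bool zero_byte_malloc_is_usable() {
    void* p = std::malloc(0);              // may be null or a unique pointer
    const bool ok = !malloc_failed(p, 0);
    std::free(p);                          // free accepts a null pointer as well
    return ok;
}
```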
The same can be said for int[N] where N>0:
Because if size to be allocated is calculated at run time and if programmer assumes its allocated properly and tries to write something past end of that memory, ends up corrupting memory !!! and Crash may happen some where else in the code.
Zero-sized array allocation is covered in the ISO C++ Standard under 5.3.4, paragraph 7:
When the value of the expression in a direct-new-declarator is zero, the allocation function is called to allocate an array with no elements.
This makes code that performs dynamic array allocation easier to write.
In general: if someone calls a function and asks it to return an array with n (0 in your case) elements, the code shouldn't be trying to read the returned array past the n-th element anyway.
So, I don't really see the catastrophic failure, since the code would have been faulty to begin with for any n.
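Here is a minimal sketch of that point, with hypothetical function names. Code that only ever touches indices below n needs no special case for n == 0:

```cpp
#include <cstddef>

// Hypothetical producer: returns a new[]-allocated array holding the
// squares 0, 1, 4, ... of the first n indices. The caller must delete[] it.
int* make_squares(std::size_t n) {
    int* out = new int[n];
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<int>(i * i);
    }
    return out;
}

// Reads exactly n elements, so it never reads anything when n == 0.
int sum_elements(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Allocates, uses and releases the array for any n, including 0.
int sum_of_squares(std::size_t n) {
    int* squares = make_squares(n);
    const int total = sum_elements(squares, n);
    delete[] squares;
    return total;
}
```

If new int[0] threw instead, every caller would have to check for n == 0 before allocating.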
As you say:
Because if size to be allocated is calculated at run time and if programmer assumes its allocated properly
The calculated size would be "0", if he tries to access more than his calculated size then, well.. I am repeating myself ;)
