Why is a zero-length array allowed only if it's heap-allocated? - C++

I notice that it's not allowed to create non-heap allocated arrays of zero length.
// error: cannot allocate an array of constant length zero
char a[0];
I also notice that it's allowed to create heap allocated arrays of zero length.
// this is okay though
char *pa = new char[0];
I guess they're both guaranteed by the Standard (I don't have a copy of the Standard at hand). If so, why are they so different? Why not just allow a zero-length array on stack (or vice versa)?

This is addressed in the following Sections of the C++ Standard.
3.7.3.1/2:
[32. The intent is to have operator new() implementable by calling malloc() or calloc(), so the rules are substantially the same. C++ differs from C in requiring a zero request to return a non-null pointer.]
And also,
5.3.4, paragraph 7
When the value of the expression in a direct-new-declarator is zero, the allocation function is called to allocate an array with no elements.
An array of size 0 is not allowed by the C++ standard:
8.3.4/1:
"If the constant-expression (5.19) is present, it shall be an integral constant expression and its value shall be greater than zero."
My understanding of the rationale is that the C++ standard requires every object to have a unique address (this is the very reason even an empty class object has size 1). In the case of a non-heap zero-sized array, no objects would need to be created, hence no address would need to be given to it, and hence there is no need to allow it in the first place.
As far as C is concerned, C99 allows a flexible array member (an array of unspecified length at the end of a struct), and some compilers allow zero-length arrays as an extension; both are typically used to implement structures of variable size by placing the array at the end of the structure. If my memory serves me correctly, this is popularly known as the C struct hack.

A 0 length array isn't very useful. When you're calculating the
dimension, it can occur, and it is useful to not have to treat the case
specially in your code. Outside of new, the dimension of the array
must be a constant, not a calculated value, and if you know that the
constant is 0, why define it?
At least, that's the rationale I've heard (from people who worked on
it). I'm not totally convinced: it's not unusual in code to have to
work with symbolic constants whose value isn't known (from a header
file, or even the command line). So it probably would make sense to
allow arrays of 0 elements. And at least one compiler in the past
allowed them, although I've forgotten which.
One frequent trick in C++ is to use such an array in a compile time
assert, something like:
char dummyToTestSomeSpecificCondition[ condition ];
This will fail to compile if the condition is false, and will compile if
it is true. Except under that one compiler (if it still exists); so I'll use
something like:
char dummyToTestSomeSpecificCondition[ 2 * condition - 1 ];
just in case.

I think that the main reason why they behave differently is that the first one:
char a[0];
is an array of size 0, and this is forbidden because its size would by definition be 0*sizeof(char), and in C and C++ a complete object cannot have size 0.
But the second one:
char *pa = new char[0];
is not a real array; it is just a chunk of 0 objects of type char laid out together in memory. Since a sequence of 0 objects may be useful, this is allowed. It simply returns a pointer one past where the last item would be, and that is perfectly fine.
To add to my argument, consider the following examples:
new int[0][3]; //ok: create 0 arrays of 3 integers
new int[3][0]; //error: create 3 arrays of 0 integers
Although both lines would allocate the same memory (0 bytes), one is allowed and the other is not.

That depends on compiler implementation/flags. Visual C++ doesn't allow zero-length arrays; GCC allows them as an extension (compile with -pedantic-errors to reject them).
Using this approach, STATIC_ASSERT may be implemented in VC, but not in GCC:
#define STATIC_ASSERT(_cond) { char __dummy[_cond];}

Even intuitively this makes sense.
Since the heap-allocated method creates a pointer on the stack to a piece of memory allocated on the heap, it's still "creating something" of size sizeof(void*). In the case of allocating a zero-length array, the pointer that exists on the stack can point anywhere, just not to anything meaningful.
By contrast, if you allocate the array on the stack, what would a meaningful zero length array object look like? It doesn't really make sense.

Related

zero array size not allowed in c++ [duplicate]

A simple test app:
cout << new int[0] << endl;
outputs:
0x876c0b8
So it looks like it works. What does the standard say about this? Is it always legal to "allocate" empty block of memory?
From 5.3.4/7
When the value of the expression in a direct-new-declarator is zero, the allocation function is called to allocate an array with no elements.
From 3.7.3.1/2
The effect of dereferencing a pointer returned as a request for zero size is undefined.
Also
Even if the size of the space requested [by new] is zero, the request can fail.
That means you can do it, but you cannot legally (in a well-defined manner across all platforms) dereference the memory that you get - you can only pass it to array delete - and you should delete it.
Here is an interesting foot-note (i.e. not a normative part of the standard, but included for expository purposes) attached to the sentence from 3.7.3.1/2
[32. The intent is to have operator new() implementable by calling malloc() or calloc(), so the rules are substantially the same. C++ differs from C in requiring a zero request to return a non-null pointer.]
Yes, it is legal to allocate a zero-sized array like this. But you must also delete it.
What does the standard say about this? Is it always legal to "allocate" empty block of memory?
Every object has a unique identity, i.e. a unique address, which implies a non-zero length (the actual amount of memory will be silently increased, if you ask for zero bytes).
If you allocated more than one of these objects then you'd find they have different addresses.
Yes, it is completely legal to allocate a 0-sized block with new. You simply can't do anything useful with it, since there is no valid data for you to access: p[0] = 5; is illegal.
However, I believe that the standard allows for things like malloc(0) to return NULL.
You will still need to delete [] whatever pointer you get back from the allocation as well.
Curiously, C++ requires that operator new return a legitimate pointer
even when zero bytes are requested. (Requiring this odd-sounding
behavior simplifies things elsewhere in the language.)
I found that Effective C++, Third Edition, says this in "Item 51: Adhere to convention when writing new and delete".
I guarantee you that new int[0] costs you extra space since I have tested it.
For example,
the memory usage of
int **arr = new int*[1000000000];
is significantly smaller than
int **arr = new int*[1000000000];
for (int i = 0; i < 1000000000; i++) {
    arr[i] = new int[0];
}
The memory usage of the second code snippet minus that of the first code snippet is the memory used for the numerous new int[0].

Why does C++ have the array type?

I am learning C++. I found that a pointer can be used like an array, e.g. a[4], where a can be either a pointer or an array. But as C++ defines it, arrays of different lengths are different types. In fact, when we pass an array to a function, it is converted into a pointer automatically; I think that is further evidence that an array can be replaced by a pointer. So, my question is:
Why doesn't C++ replace all arrays with pointers?
In early C it was decided to represent the size of an array as part of its type, available via the sizeof operator. C++ has to be backward compatible with that. There's much wrong with C++ arrays, but having size as part of the type is not one of the wrong things.
Regarding
“pointer has the same function with array, like a[4], a can be both pointer and array”
no, this is just an implicit conversion, from array expression to pointer to first item of that array.
As weird as it sounds, C++ does not provide indexing of built-in arrays. There's indexing for pointers, and p[i] just means *(p+i) by definition, so you can also write that as *(i+p) and hence as i[p]. And thus also i[a], because it's really the pointer (after decay) that's indexed. Weird indeed.
The implicit conversion, called a “decay”, loses information, and is one of the things that are wrong about C++ arrays.
The indexing of pointers is a second thing that's wrong (even if it makes a lot of sense at the assembly language and machine code level).
But it needs to continue to be that way for backward compatibility.
Why array decay is Bad™: this causes an array of T to often be represented by simply a pointer to T.
You can't see from such a pointer (e.g. as a formal argument) whether it points to a single T object or to the first item of an array of T.
But much worse, if T has a derived class TD, where sizeof(TD) > sizeof(T), and you form an array of TD, then you can pass that array to a formal argument that's pointer to T – because that array of TD decays to pointer to TD which converts implicitly to pointer to T. Now using that pointer to T as an array yields incorrect address computations, due to incorrect size assumption for the array items. And bang crash (if you're lucky), or perhaps just incorrect results (if you're not so lucky).
In C and C++, everything of a single type has the same size. An int[4] array is twice as big as an int[2] array, so they can't be of the same type.
But then you might ask, "Why should type imply size?" Well:
A local variable needs to take up a certain amount of memory. When you declare an array, it takes up memory that scales up with its length. When you declare a pointer, it is always the size of pointers on your machine.
Pointer arithmetic is determined by the size of the type it's pointing to: the distance between the address pointed to by p and that pointed to by p+1 is exactly the size of its type. If types didn't have fixed sizes, then p would need to carry around extra information, or C would have to give up pointer arithmetic.
A function needs to know how big its arguments are, because functions are compiled to expect their variables to be in particular places, and having a parameter with an unknown size screws that up.
And you say, "Well, if I pass an array to a function, it just turns into a pointer anyway." True, but you can make new types that have arrays as members, and then you can pass THOSE types around. And in C++, you can in fact pass an array as an array.
int sum10(int (&arr)[10]){ //only takes int arrays of size 10
    int result = 0;
    for(int i=0; i<10; i++)
        result += arr[i];
    return result;
}
You can't use pointers in place of array declarations without having to use malloc/free or new/delete to create and destroy memory on the heap. You can declare an array as a variable, it gets created on the stack, and you do not have to worry about its destruction.
Well, an array is an easier way of dealing with data and manipulating it. However, in order to use pointers you need a valid memory address to point to. The two concepts are also not so different when passing them to a function: the array decays to a pointer, so in both cases the function receives an address. Hope that helps.
I'm not sure if I get your question, but assuming you're new to coding:
when you declare an array int a[4] you let the compiler know you need 4*sizeof(int) bytes of memory, and the compiler makes a refer to the start of that block. When you later use a[x], it means *(a + x): the address is offset by sizeof(int)*x bytes and then dereferenced to get the int.
In other words, it's always a pointer being passed around instead of an 'array', which is just an abstraction that makes it easier for you to code.

Why is new int[n] valid when int array[n] is not?

For the following code:
void foo(int n){
    int array[n];
}
I understand that this is invalid syntax, and that it is invalid because the C++ standard requires the array size to be known at compile time (although some compilers support this syntax as an extension).
However, I also understand that the following is valid syntax:
void bar(int n){
    int *array = new int[n];
}
I don't understand why this is allowed; isn't it the same as creating an array whose size is determined at runtime? Is it good practice to do this, or should I be using a vector instead if I need to do this?
That's because the former is allocated on the stack and the latter on the heap.
When you allocate something on the stack, knowing the size of the object is essential for correctly building it. C99 allows the size to be specified at run time, and this introduces some complications in building and dismantling the aforementioned stack, since you cannot calculate its size at compile time. Machine code must be emitted in order to perform said calculation during the execution of the program. This is probably the main reason why this feature wasn't included in the C++ standard.²
On the contrary, the heap has no fixed structure, as the name implies. Blocks of any size can be allocated with no particular order, as long as they do not overlap and you have enough (virtual) memory¹. In this case, knowing the size at compile time is not that relevant.
Also, remember that the stack has a limited size, mostly to detect infinite recursions before they consume all the available memory. Usually the limit is fixed around 1MB, and you rarely reach that. Unless you allocate large objects, which should be placed in the heap.
As of what you should use, probably a std::vector<int>. But it really depends on what you are trying to do.
Also note that C++11 has a std::array class, whose size must be known at compile time. C++14 should have introduced std::dynarray, but it was postponed because there is still much work to do concerning compile-time unknown size stack allocation.
¹ blocks are usually allocated sequentially for performance reasons, but that's not required.
² as pointed out, knowing the size at compile time is not a hard requirement, but it makes things simpler.
In the first case you are allocating space for the integers on the stack; the layout of the stack frame is fixed when the program is compiled, so the amount of storage is inflexible.
In the latter case you are dynamically allocating space for the integers on the heap. This is done when the program runs, and so the amount of storage required can be flexible.
The second form actually calls a function that asks the operating system (via the allocator) to find a place in memory to use. No such call happens in the first case.
int array[n] allocates a fixed-length array on the call stack, whose length n must be known at compile-time (unless a compiler-specific extension is used to allow the allocation at runtime, but the array is still on the stack).
int *array = new int[n] allocates a dynamic-length array on the heap at run-time, so n does not need to be known at compile-time.
The one and only valid answer to your question is: because the standard says so.
In contrast to C99, C++ never bothered to specify variable length arrays (VLAs), so the only way to get variably sized arrays is using dynamic allocation, with malloc, new or some other memory-manager.
In fairness to C++, having runtime-sized stack-allocations slightly complicates stack-unwinding, which would also make exception-handling for the functions using the feature consequently more bothersome.
Anyway, even if your compiler provides that C99-feature as an extension, it's a good idea to always keep a really tight rein on your stack-usage:
There is no way to recover from blowing the stack-limit, and the error-case is simply left Undefined Behavior for a reason.
The easiest way to simulate VLAs in C++, though without the performance-benefit of avoiding dynamic allocation (and the danger of blowing the limit):
unique_ptr<T[]> array{new T[n]};
In the expression
new int[n]
int[n] is not the type. C++ treats "new with arrays" and "new with non-arrays" differently. The N3337 standard draft has this to say about new:
When the allocated object is an array (that is, the noptr-new-declarator syntax is used or the new-type-id or type-id denotes an array type), the new-expression yields a pointer to the initial element (if any) of the array.
The noptr-new-declarator refers to this special case (evaluate n and create the array of this size), see:
noptr-new-declarator:
    [ expression ] attribute-specifier-seq(opt)
    noptr-new-declarator [ constant-expression ] attribute-specifier-seq(opt)
However you can't use this in the "usual" declarations like
int array[n];
or in the typedef
typedef int variable_array[n];
This is different with C99 VLAs, where both are allowed.
Should I be using vectors instead?
Yes, you should. You should use vectors all the time, unless you have a very strong reason to do otherwise (there was one time during the last 7 years when I used new - when I was implementing vector for a school assignment).
No, the second is not declaring an array. It's using the array form of operator new, and that specifically permits the first dimension to be variable.
This is because the C++ language does not have the C feature introduced in C99 known as "variable length arrays" (VLA).
C++ is lagging in adopting this C feature because the std::vector type from its library fulfills most of the requirements.
Furthermore, the 2011 standard of C backpedaled and made VLA's an optional feature.
VLA's, in a nutshell, allow you to use a run-time value to determine the size of a local array that is allocated in automatic storage:
void func(int variable)
{
    long array[variable]; // VLA feature
    // loop over array
    for (size_t i = 0; i < sizeof array / sizeof array[0]; i++) {
        // the above sizeof is also a VLA feature: it yields the dynamic
        // size of the array, and so is not a compile-time constant,
        // unlike other uses of sizeof!
    }
}
VLA's existed in the GNU C dialect long before C99. In dialects of C without VLA's, array dimensions in a declaration must be constant expressions.
Even in dialects of C with VLAs, only certain arrays can be VLAs. For instance, arrays with static storage duration cannot be, and neither can arrays that are members of a structure (even if instances of that structure are allocated dynamically).
In any case, since you're coding in C++, this is moot!
Note that storage allocated with operator new are not the VLA feature. This is a special C++ syntax for dynamic allocation, which returns a pointer type, as you know:
int *p = new int[variable];
Unlike a VLA, this object will persist until it is explicitly destroyed with delete [], and can be returned from the surrounding scope.
Because it has different semantics:
If n is a compile-time constant (unlike in your example):
int array[n]; //valid iff n is compile-time constant, space known at compile-time
But consider when n is a runtime value:
int array[n]; //Error: cannot have a plain (stack) array with a runtime size in C++
int * array = new int[n]; //This works because it happens at run-time,
// not at compile-time! Different semantics, similar syntax.
In C99 you can have a runtime n for an array and space will be made in the stack at runtime.
There are some proposals for similar extensions in C++, but none of them has made it into the standard yet.
You can allocate memory statically on the stack or dynamically on the heap.
In your first case, your function contains a declaration of an array with a possible variable length, but this is not possible, since raw arrays must have fixed size at compile time, because they are allocated on the stack. For this reason their size must be specified as a constant, for example 5. You could have something like this:
void foo(){
    int array[5]; // raw array with fixed size 5
}
Using pointers you can specify a variable size for the memory that will be pointed, since this memory will be allocated dynamically on the heap. In your second case, you are using the parameter n to specify the space of memory that will be allocated.
Concluding, we can say that pointers are not arrays: the memory allocated using a pointer is allocated on the heap, whereas the memory allocated for a raw array is allocated on the stack.
There are good alternatives to raw arrays, for example the standard container vector, which is basically a container with variable length size.
Make sure you understand well the difference between dynamic and static memory allocation, the difference between memory allocated on the stack and memory allocated on the heap.
