Properties of a pointer to a zero length array - c++

Consider
int main()
{
auto a = new int[0];
delete[] a; // So there's no memory leak
}
Between the copy initialisation and deletion, are you allowed to read the pointer at a + 1?
Furthermore, does the language permit the compiler to set a to nullptr?

Per recent CWG reflector discussion as a result of editorial issue 3178, new int[0] produces what is currently called a "past-the-end" pointer value.
It follows that a cannot be null, and a + 1 is undefined by [expr.add]/4.

auto a = new int[0];
According to [basic.compound.3], the value stored in a must be one of the following:
A pointer to an object (of type int)
A pointer past the end of an object
Null
Invalid
We can rule out the first possibility since there were no objects of type int constructed. The third possibility is ruled out since C++ requires a non-null pointer to be returned (see [basic.stc.dynamic.allocation.2]). Thus we are left with two possibilities: a pointer past the end of an object or an invalid pointer.
I would be inclined to view a as a past-the-end pointer, but I don't have a reputable reference to definitively establish that. (There is, though, a strong implication of this in [basic.stc], seeing how you can delete this pointer.) So I'll entertain both possibilities in this answer.
Between the copy initialisation and deletion, are you allowed to read the pointer at a + 1?
The behavior is undefined, as dictated by [expr.add.4], regardless of which possibility from above applies.
If a is a past-the-end pointer, then it is considered to point to the hypothetical element at index 0 of an array with no elements. Adding the integer j to a is defined only when 0≤0+j≤n, where n is the size of the array. In our case, n is zero, so the sum a+j is defined only when j is 0. In particular, adding 1 is undefined.
If a is invalid, then we cleanly fall into "Otherwise, the behavior is undefined." (Not surprisingly, the cases that are defined cover only valid pointer values.)
Furthermore, does the language permit the compiler to set a to nullptr?
No. From the above-mentioned [basic.stc.dynamic.allocation.2]: "If the request succeeds, the value returned by a replaceable allocation function is a non-null pointer value". There is also a footnote calling out that C++ (but not C) requires a non-null pointer in response to a zero request.
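The two conclusions above can be put side by side in a short sketch. The helper name zero_length_new_is_usable is made up for illustration; it only performs the operations that the quoted rules leave defined, and deliberately omits a + 1 and *a.

```cpp
#include <cassert>

// Hypothetical helper, not taken from the answers above.
bool zero_length_new_is_usable() {
    int* a = new int[0];      // zero-length request; must not yield null
    assert(a != nullptr);     // per the allocation rule quoted above
    int* same = a + 0;        // j == 0 is the only offset that is defined
    const bool unchanged = (same == a);
    // a + 1 and *a are intentionally absent: both would be undefined.
    delete[] a;               // the pointer may, and should, be deleted
    return unchanged;
}
```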

Related

Is checking the value of a dangling pointer safe or Undefined Behavior? [duplicate]

We can only de-reference a valid pointer and we can only check the address that a dangling built-in pointer points to. We cannot access its value (the value in the address of object it is pointing to).
int* ptr = nullptr;
if(ptr) // != 0x00000000
std::cout << *ptr << '\n';
ptr = new int(1000);
if(ptr) // != 0x00000000
std::cout << *ptr << '\n';
delete ptr; // still pointing at the address of that dynamic object but that object has been destroyed.
if(ptr) // succeeds or undefined behavior?
std::cout << *ptr << '\n'; // of course UB here
So that much is clear to me. What matters to me is only whether checking the pointer value itself, as in if(ptr), is safe or yields UB, assuming I never access the value at that address the way std::cout << *ptr does.
It is safe to dereference a non-null pointer only if it is actually pointing at a valid object, and unfortunately there is no way to test for that condition in C/C++ [1].
[1]: unless you manually keep track of the addresses of your valid objects and can thus search for the pointed-at address in your own tracking data.
It is not undefined behavior to test whether a pointer is equal to null or not. However, per [basic.stc.general], once a block of memory has been released, every pointer value referring to any part of that block becomes invalid, and any use of it other than indirection or deallocation (both of which are undefined) is implementation-defined behavior:
When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values.
Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior.
Any other use of an invalid pointer value has implementation-defined behavior.
So, some people may argue that it means a pointer holding the address of a destroyed object MAY OR MAY NOT even be legal to compare against other pointers or even nullptr itself, since the address is invalid. Only the compiler gets to decide if that is legal or not.
Is checking the value of a dangling pointer safe or Undefined Behavior?
It's not UB (since C++14), but "safe" depends on what you expect. There is no guarantee about the result of such check. It could be true or false. Assuming that a pointer is valid based on if(ptr) is not safe in general.
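One common way to make if(ptr) meaningful again is never to keep the dangling value around: null the pointer at the point of deletion. A minimal sketch, where destroy_and_reset is a hypothetical helper name and not part of any library:

```cpp
#include <cassert>

// Hypothetical helper: frees the object and nulls the caller's pointer,
// so this pointer never holds an invalid (dangling) value afterwards.
void destroy_and_reset(int*& ptr) {
    delete ptr;      // deleting a null pointer is a no-op
    ptr = nullptr;
}

// Example use of the helper.
void demo_destroy_and_reset() {
    int* p = new int(1000);
    assert(p != nullptr && *p == 1000);
    destroy_and_reset(p);
    assert(p == nullptr);   // if (p) now reliably reports "no object"
}
```

This only protects the pointer that is passed in; any other copies of the same address still dangle, which is one reason owning smart pointers such as std::unique_ptr are usually preferred.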
Yes, for int* ptr;, the expression ptr is always safe. The behaviour of the program remains defined. But
The specific value of a pointer is implementation-defined. [Ellipsis, emphasis mine]
[6.8.2.3.4] A value of a pointer type that is a pointer to or past the end of an object represents the address of the first byte in memory (6.7.1) occupied by the object or the first byte in memory after the end of the storage occupied by the object, respectively. [...]
The value representation of pointer types is implementation-defined. Pointers to layout-compatible types shall have the same value representation and alignment requirements (6.7.6).
[Note: Pointers to over-aligned types (6.7.6) have no special representation, but their range of valid values is restricted by the extended alignment requirement. — end note]
nullptr is guaranteed to be equal (==) to 0 and NULL.
In boolean context, nullptr evaluates to false, ALL other values, no matter whether they represent an address of valid objects, evaluate to true.
delete does not modify the pointer. EDIT: Or maybe it does, see the discussion in comments.
Pointers have the same initialization rules as integers: local variables and non-static members are not initialized by default. int* ptr; assert(ptr==nullptr); does NOT hold in general. But global pointers are zero-initialized and thus evaluate to false at the start.
All taken together, although the program is safe, it is quite easy to make it nondeterministic at best.
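The initialization rules listed above can be illustrated with a small sketch. The names global_ptr and pointers_start_null are invented for this example; the uninitialized case is shown only as a comment, because reading such a pointer is exactly what should be avoided.

```cpp
#include <cassert>

int* global_ptr;   // static storage duration: zero-initialized, so null

// Hypothetical check covering only the cases that are guaranteed.
bool pointers_start_null() {
    assert(global_ptr == nullptr);   // holds until someone assigns to it
    int* value_initialized{};        // empty braces: initialized to null
    static int* local_static;        // static local: also zero-initialized
    // int* indeterminate;           // no initializer: value is indeterminate
    return value_initialized == nullptr && local_static == nullptr;
}
```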

Calling std::string::assign(const CharT* s, size_type count) with count 0 safe? [duplicate]

I have a function which returns a pointer and a length, and I want to call std::string::assign(pointer, length). Do I have to make a special case (calling clear) when length is zero and the pointer may be nullptr?
The C++ standard says:
21.4.6.3 basic_string::assign
basic_string& assign(const charT* s, size_type n);
Requires: s points to an array of at least n elements of charT.
So what if n is zero? What is an array of zero characters and how does one point to it?
Is it valid to call
s.assign(nullptr, 0);
or is it undefined behavior?
The implementation of libstdc++ appears not to dereference the pointer s when the size n is zero, but that's hardly a guarantee.
Pedantically, a nullptr does not meet the requirements of pointing to an array of size >=0, and therefore the standard does not guarantee the behaviour (it's UB).
On the other hand, the implementation wouldn't be allowed to dereference the pointer if n is zero, because the pointer could be to an array of size zero, and dereferencing such a pointer would have undefined behaviour. Besides, there wouldn't be any need to do so, because nothing is copied.
The above reasoning does not mean that it is OK to ignore the UB. But, if there is no reason to disallow s.assign(nullptr, 0) then it could be preferable to change the wording of the standard to "If n is greater than zero, then s points to ...". I don't know of any good reason to disallow it, but neither can I promise that a good reason doesn't exist.
Note that adding a check is hardly complicated:
s.assign(ptr ? ptr : "", n);
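The same check can be wrapped in a small function. This is a sketch under one assumption: the producing function only ever pairs a null pointer with a length of zero. The name assign_checked is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wrapper: never forwards a null pointer to assign().
// Assumes a null ptr is only ever paired with n == 0.
std::string& assign_checked(std::string& s, const char* ptr, std::size_t n) {
    assert(ptr != nullptr || n == 0);   // null with n > 0 is a caller bug
    if (ptr == nullptr) {
        s.clear();                      // same end state as assign(p, 0)
        return s;
    }
    return s.assign(ptr, n);
}
```

Calling clear() for the null case avoids relying on how a particular library treats assign(nullptr, 0).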
What is an array of zero characters
This is: new char[0]. Arrays of automatic or static storage may not have a zero size.
Well as you point out, the standard says "s points to an array...". A null pointer does not point to an array of any number of elements. Not even 0 elements. Also, note that s points to "an array of at least n elements...". So it's clear that if n is zero, you can still pass a legitimate pointer to an array.
Overall, std::string's API is not well-guarded against null pointers to charT. So you should always make sure that pointers you hand off to it are non-null.
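On the reading given above, a count of zero is fine as long as the pointer itself is non-null and points to some array. The sketch below uses two such pointers: one to a zero-length dynamic array and one to an empty string literal. The function name is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Illustrative only: both pointers are non-null and point to an array
// with at least 0 elements, so the stated precondition is met.
bool assign_zero_from_valid_pointers() {
    std::string s = "old";
    char* empty_array = new char[0];   // an array of zero characters
    s.assign(empty_array, 0);          // nothing is read through the pointer
    delete[] empty_array;
    assert(s.empty());

    s = "old";
    s.assign("", 0);                   // "" is an array of one char ('\0')
    return s.empty();
}
```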
I am not sure why an implementation would dereference any pointer to an array whose length is provided as zero.
That said, I would err on the side of caution. You could argue that you are not meeting the standard's requirement:
21.4.6.3 basic_string::assign
8 Requires: s points to an array of at least n elements of charT
because nullptr is not pointing to an array.
So technically the behaviour is undefined.
From the Standard (2.14.7) [lex.nullptr]:
The pointer literal is the keyword nullptr. It is a prvalue of type std::nullptr_t. [ Note: std::nullptr_t
is a distinct type that is neither a pointer type nor a pointer to member type ... ]
std::nullptr_t can be implicitly converted to any type of null pointer as per 4.10.1 [conv.ptr]. Regardless of the type of null pointer, the fact remains that it points at nothing.
Thus, it doesn't meet the requirement that s points to an array of at least n elements of charT.
It seems to be undefined behavior.
Interestingly, according to this answer, the C++11 Standard clearly stated that s must not be a null pointer in the basic_string constructor, but this wording has since been removed.

Is incrementing a pointer to a 0-sized dynamic array undefined?

AFAIK, although we cannot create a 0-sized array with static or automatic storage, we can create one dynamically:
int a[0]{}; // Compile-time error
int* p = new int[0]; // Is well-defined
As I've read, p acts like a one-past-the-end pointer. I can print the address that p points to.
if(p)
cout << p << endl;
I am sure that we cannot dereference that pointer (past the last element), just as we cannot dereference an off-the-end iterator. What I am not sure of is whether incrementing that pointer p is undefined behaviour (UB), as it is with iterators.
p++; // UB?
Pointers to elements of arrays are allowed to point to a valid element, or one past the end. If you increment a pointer in a way that goes more than one past the end, the behavior is undefined.
For your 0-sized array, p is already pointing one past the end, so incrementing it is not allowed.
See C++17 8.7/4 regarding the + operator (++ has the same restrictions):
If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n; otherwise, the behavior is undefined.
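The range condition in that paragraph is easy to encode as a predicate, which makes the zero-length case explicit. This is only a sketch of the arithmetic, with offset_is_defined as a hypothetical name; it assumes 0 ≤ i ≤ n on entry.

```cpp
#include <cstddef>

// Hypothetical predicate mirroring the quoted rule: for a pointer to the
// (possibly hypothetical) element i of an n-element array, P + j is
// defined only when 0 <= i + j <= n. Assumes 0 <= i <= n.
constexpr bool offset_is_defined(std::ptrdiff_t i, std::ptrdiff_t j,
                                 std::ptrdiff_t n) {
    return j >= -i && j <= n - i;   // rearranged to avoid computing i + j
}

static_assert(offset_is_defined(0, 0, 0), "p + 0 is fine for new int[0]");
static_assert(!offset_is_defined(0, 1, 0), "p + 1 (and ++p) is not");
static_assert(offset_is_defined(9, 1, 10), "last element to one past the end");
static_assert(!offset_is_defined(10, 1, 10), "but no further than that");
```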
I guess you already have the answer. Looking a bit deeper: you said that incrementing an off-the-end iterator is UB, so the question really comes down to what an iterator is.
The iterator is just an object that has a pointer and incrementing that iterator is really incrementing the pointer it has. Thus in many aspects an iterator is handled in terms of a pointer.
int arr[] = {0,1,2,3,4,5,6,7,8,9};
int *p = arr; // p points to the first element in arr
++p; // p points to arr[1]
Just as we can use iterators to traverse the elements in a vector, we can use pointers to traverse the elements in an array. Of course, to do so, we need to obtain pointers to the first and one past the last element. As we’ve just seen, we can obtain a pointer to the first element by using the array itself or by taking the address-of the first element. We can obtain an off-the-end pointer by using another special property of arrays. We can take the address of the nonexistent element one past the last element of an array:
int *e = &arr[10]; // pointer just past the last element in arr
Here we used the subscript operator to index a nonexisting element; arr has ten elements, so the last element in arr is at index position 9. The only thing we can do with this element is take its address, which we do to initialize e. Like an off-the-end iterator (§ 3.4.1, p. 106), an off-the-end pointer does not point to an element. As a result, we may not dereference or increment an off-the-end pointer.
This is from C++ Primer, 5th edition, by Lippman.
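A short sketch of the traversal the passage describes, using std::begin and std::end to obtain the first and off-the-end pointers. The helper name sum_range is an assumption made for this example; the off-the-end pointer is only ever compared against.

```cpp
#include <cassert>
#include <iterator>

// Hypothetical helper: sums the half-open range [first, last).
// last is only compared against, never dereferenced or incremented.
int sum_range(const int* first, const int* last) {
    int total = 0;
    for (const int* p = first; p != last; ++p) {
        total += *p;
    }
    return total;
}

// Example use with the array from the quoted passage.
void demo_sum_range() {
    int arr[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    assert(sum_range(std::begin(arr), std::end(arr)) == 45);
    assert(sum_range(arr, arr) == 0);   // empty range: loop body never runs
}
```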
So it is UB; don't do it.
In the strictest sense, this is not Undefined Behavior, but implementation-defined. So, although inadvisable if you plan to support non-mainstream architectures, you can probably do it.
The standard quote given by interjay is a good one, indicating UB, but it is only the second best hit in my opinion, since it deals with pointer-pointer arithmetic (funnily, one is explicitly UB, while the other isn't). There is a paragraph dealing with the operation in the question directly:
[expr.post.incr] / [expr.pre.incr]
The operand shall be [...] or a pointer to a completely-defined object type.
Oh, wait a moment, a completely-defined object type? That's all? I mean, really, type? So you don't need an object at all?
It takes quite a bit of reading to actually find a hint that something in there might not be quite so well-defined. Because so far, it reads as if you are perfectly allowed to do it, no restrictions.
[basic.compound] 3 makes a statement about what type of pointer one may have, and being none of the other three, the result of your operation would clearly fall under 3.4: invalid pointer.
It however doesn't say that you aren't allowed to have an invalid pointer. On the contrary, it lists some very common, normal conditions (e.g. end of storage duration) where pointers regularly become invalid. So that's apparently an allowable thing to happen. And indeed:
[basic.stc] 4
Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
We are doing an "any other" there, so it's not Undefined Behavior, but implementation-defined, thus generally allowable (unless the implementation explicitly says something different).
Unluckily, that's not the end of the story. Although the net result doesn't change any more from here on, it gets more confusing, the longer you search for "pointer":
[basic.compound]
A valid value of an object pointer type represents either the address of a byte in memory or a null pointer. If an object of type T is located at an address A [...] is said to point to that object, regardless of how the value was obtained.
[ Note: For instance, the address one past the end of an array would be considered to point to an unrelated object of the array's element type that might be located at that address. [...]].
Read as: OK, who cares! As long as a pointer points somewhere in memory, I'm good?
[basic.stc.dynamic.safety]
A pointer value is a safely-derived pointer [blah blah]
Read as: OK, safely-derived, whatever. It doesn't explain what this is, nor does it say I actually need it. Safely-derived-the-heck. Apparently I can still have non-safely-derived pointers just fine. I'm guessing that dereferencing them would probably not be such a good idea, but it's perfectly allowable to have them. It doesn't say otherwise.
An implementation may have relaxed pointer safety, in which case the validity of a pointer value does not depend on whether it is a safely-derived pointer value.
Oh, so it may not matter, just what I thought. But wait... "may not"? That means, it may as well. How do I know?
Alternatively, an implementation may have strict pointer safety, in which case a pointer value that is not a safely-derived pointer value is an invalid pointer value unless the referenced complete object is of dynamic storage duration and has previously been declared reachable
Wait, so it's even possible that I need to call declare_reachable() on every pointer? How do I know?
Now, you can convert to intptr_t, which is well-defined, giving an integer representation of a safely-derived pointer. For which, of course, being an integer, it is perfectly legitimate and well-defined to increment it as you please.
And yes, you can convert the intptr_t back to a pointer, which is also well-defined. Only just, not being the original value, it is no longer guaranteed that you have a safely-derived pointer (obviously). Still, all in all, to the letter of the standard, while being implementation-defined, this is a 100% legitimate thing to do:
[expr.reinterpret.cast] 5
A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size [...] and back to the same pointer type [...] original value; mappings between pointers and integers are otherwise implementation-defined.
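The round trip that this paragraph guarantees looks like the sketch below. It assumes std::uintptr_t is available (the type is optional in the standard), and it leaves the integer unmodified, since that is the only case for which the original value is promised. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sketch assuming std::uintptr_t exists on the target platform.
bool survives_round_trip(int* p) {
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    int* back = reinterpret_cast<int*>(bits);   // unmodified integer
    return back == p;                           // original value restored
}

// Example use.
void demo_round_trip() {
    int x = 42;
    assert(survives_round_trip(&x));
}
```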
The catch
Pointers are just ordinary integers, only you happen to use them as pointers. Oh if only that was true!
Unluckily, there exist architectures where that isn't true at all, and merely generating an invalid pointer (not dereferencing it, just having it in a pointer register) will cause a trap.
So that's the basis of "implementation-defined". That, and the fact that incrementing a pointer whenever and however you please could of course cause overflow, which the standard doesn't want to deal with. The end of the application's address space may not coincide with the point of overflow, and you do not even know whether there is any such thing as overflow for pointers on a particular architecture. All in all it's a nightmarish mess, out of all proportion to the possible benefits.
Dealing with the one-past-the-end condition, on the other hand, is easy: the implementation must simply make sure that no object is ever allocated such that it occupies the last byte of the address space. So that's well-defined, as it's useful and trivial to guarantee.
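The one-past-the-end guarantee applies even to a single object, which pointer arithmetic treats as an array of one element. A minimal sketch, with elements_before_end as an invented name:

```cpp
#include <cassert>
#include <cstddef>

// For pointer arithmetic, a single object counts as an array of one.
std::ptrdiff_t elements_before_end() {
    int x = 0;
    int* first = &x;
    int* last = &x + 1;      // one past the end: may be formed and compared
    assert(first != last);
    return last - first;     // last itself is never dereferenced
}
```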

Will off-the-end pointer overlap with other object?

Considering that C++ does not have bounds checking for built-in arrays, is it possible that:
One array's off-the-end pointer points to another array's first element?
Yes, a pointer beyond the end of an array could point to another object. Dereferencing a pointer beyond the end of an array results in undefined behavior.
My opinion: yes, it is possible in C++. There have been several SO threads on this topic, none of which reached any solid conclusion. Here is one example.
In some cases we can be sure that there is actually a valid object in memory immediately after the end of the old object. One case is standard-layout structs; another is multi-dimensional arrays. I originally wrote this post with a multi-dimensional array, but I have edited it to use the standard layout struct case, to avoid any objections about what the term "array object" means in the Standard.
struct
{
int a[2];
int b[2];
} foo;
if ( sizeof foo == 4 * sizeof(int) )
{
int *p = &foo.a[0];
++p; // (1)
++p; // (2)
*p = 3; // (3)
++p; // (4)
*p = 5; // (5)
}
Which line causes undefined behaviour (if any)? p is (initially, anyway) a pointer into the array of type int[2] which is designated by foo.a.
After line (2), p is now a one-past-the-end pointer. Is this dereferenceable?
The case of incrementing the pointer is covered by the section on the + operator (it is defined to have the same effect on p as p = p + 1). Here is a quote from C++11 [expr.add]#7:
Unless both pointers point to elements of the same array object, or
one past the last element of the array object, the behavior is undefined.
Line (2) does not cause UB by this clause. What about line (3)?
As far as I can see, there is no clause in the C++ standard that says dereferencing a one-past-the-end pointer causes undefined behaviour. In several places it says that iterators "might not be dereferencable", or "the library does not assume that the iterator is dereferenceable". But it carefully avoids saying "the iterator is not dereferenceable".
Since we have established that there is no padding, and the rules for standard-layout structs say that members cannot be reordered, we can conclude that p must now hold the address of the element foo.b[0]. Therefore, p is a pointer into the subobject foo.b, as well as being a one-past-the-end pointer for foo.a.
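The layout argument can be checked at compile time without forming any questionable pointer. The sketch below gives the struct a name, Foo, purely for illustration (the original is unnamed) and uses offsetof to state the implication: if the size check passes, b begins exactly where a ends.

```cpp
#include <cstddef>

// Named copy of the struct from the example; the name Foo is an assumption.
struct Foo {
    int a[2];
    int b[2];
};

// The condition tested by the if statement in the example.
constexpr bool no_padding_in_foo() {
    return sizeof(Foo) == 4 * sizeof(int);
}

// True when b begins at the byte immediately after the end of a.
constexpr bool b_starts_where_a_ends() {
    return offsetof(Foo, b) == offsetof(Foo, a) + sizeof(Foo::a);
}

// No padding implies that the address one past foo.a is the address of foo.b[0].
static_assert(!no_padding_in_foo() || b_starts_where_a_ends(),
              "layout argument from the answer");
```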
Note that in C99 it is different. The text in C99 for the + operator has (emphasis mine):
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
So, in C99 line (3) causes undefined behaviour. However, C++ deliberately omits the last sentence of that passage (the one forbidding use of a one-past-the-end result as the operand of unary *).
Rationale: I don't know what the actual rationale is. However, my "mental model" for C's pointers is that it permits the compiler to implement "fat pointers", i.e. bounds-checked pointers. A pointer may contain the bounds of the (sub-)object that it was pointed to; and so the executable can detect array bounds errors at runtime just based on the pointer value.
I believe the C99 text is compatible with this; and the compiler can produce an executable that aborts on line (3).
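To make the "fat pointer" mental model concrete, here is a rough sketch of a pointer that carries its bounds. FatPtr and its members are invented for illustration and do not describe how any particular compiler works; exceptions stand in for the abort.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical bounds-carrying pointer.
struct FatPtr {
    int* base;          // start of the (sub-)array it was derived from
    std::size_t size;   // number of elements in that (sub-)array
    std::size_t index;  // current position, 0 <= index <= size

    FatPtr next() const {
        assert(index <= size);   // invariant of the sketch
        if (index >= size) throw std::out_of_range("increment past the end");
        return FatPtr{base, size, index + 1};
    }
    int& deref() const {
        if (index >= size) throw std::out_of_range("dereference out of bounds");
        return base[index];
    }
};

// Mirrors lines (1) to (3) of the example for foo.a.
bool line3_is_rejected() {
    int a[2] = {0, 0};
    FatPtr p{a, 2, 0};
    p = p.next();                 // (1)
    p = p.next();                 // (2) one past the end: still allowed
    try { p.deref() = 3; }        // (3) rejected by the bounds check
    catch (const std::out_of_range&) { return true; }
    return false;
}
```

Under this model the pointer derived from foo.a never becomes a pointer into foo.b, no matter what address it holds.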
However, as already stated, C++ does not have equivalent text and I can find no justification in the C++ Standard for considering (3) to cause UB; nor (4) or (5).
Is it possible that:
One array's off-the-end pointer points to another array's first element?
I'm not sure what you mean by off-the-end pointer. As C++ iterators use half-open ranges, I'm assuming you mean the pointer that represents the end position in an iteration. As that is one past the end, yes, it might overlap the start of a following array, and hence it may not be dereferenced.
When using pointers as iterators, addresses and not values are compared. End implies the next address beyond end.
Reading beyond the bounds of an array might result in a dirty read.
You might hit the body of another array,
but you might also hit an unallocated region or,
in the case of an int pointer, a 4-byte region shared by an array of two shorts.
Your pointer may try to access a region which does not belong to your process. Fatal error!
Going beyond the bounds is not recommended.
Regards
Kajal