When are arrays converted to pointers?


Consider the following simple example computing the length of an array:
#include <iostream>
int a[] = {1, 2, 4};
int main(){ std::cout << sizeof(a)/sizeof(a[0]) << std::endl; }
The Standard N4296::8.3.4/7 [dcl.array]
If E is an n-dimensional array of rank i×j×. . .×k, then E appearing
in an expression that is subject to the array-to-pointer conversion
(4.2) is converted to a pointer to an (n−1)-dimensional array with
rank j ×. . .×k.
N4296::4.2/1 [conv.array]
An lvalue or rvalue of type “array of N T” or “array of unknown bound
of T” can be converted to a prvalue of type “pointer to T”. The result
is a pointer to the first element of the array.
So which expressions are subject to the conversion? It looks like unevaluated operands are not.
http://coliru.stacked-crooked.com/a/36a1d02c7feff41c

I know of the following expressions in which an array is not converted/decayed to a pointer.
When used in a sizeof operator: sizeof(array)
When used in an addressof operator: &array
When used to bind a reference to an array: int (&ref)[3] = array;.
When deducing the typename to be used for instantiating templates.
When used in decltype: decltype(array)

I don't know if anyone can name all the rules off the top of their head, so a community wiki may be appropriate.
The array to pointer conversion occurs in the following contexts. All references are to the C++11 standard.
As part of an implicit conversion sequence selected by overload resolution [1]
As part of a standard conversion sequence, in contexts where one is allowed
When initializing an object of non-class type from an array ([dcl.init]/16) [2]
When assigning to an lvalue of non-class type from an array ([expr.ass]/3)
When a prvalue of pointer type is required as the operand to a built-in operator ([expr]/8)
When subscripting into the array ([expr.sub]/1)
When dereferencing a pointer ([expr.unary.op]/1)
With the unary + operator ([expr.unary.op]/7)
With the binary + operator ([expr.add]/1)
With the binary - operator ([expr.add]/2)
With the relational operators ([expr.rel]/1)
With the equality operators ([expr.eq]/1)
When calling a function, if an argument has array type and is passed to an ellipsis ([expr.call]/7)
When converting from a pointer to base class to a pointer to derived class ([expr.static.cast]/11)
In a reinterpret cast to a non-reference type ([expr.reinterpret.cast]/1)
In a const cast to a non-reference type ([expr.const.cast]/1)
In the second or third operand of the conditional operator, under certain circumstances ([expr.cond])
In a template argument, if the corresponding (non-type) template parameter has pointer to object type ([temp.arg.nontype]/5)
The array to pointer conversion does not occur in the following contexts:
Where an lvalue (or glvalue) is required
By the unary & operator ([expr.unary.op]/3)
In a static cast to reference type ([expr.static.cast]/2, [expr.static.cast]/3)
In a reinterpret cast to reference type ([expr.reinterpret.cast]/11)
In a const cast to reference type ([expr.const.cast]/4)
When binding to a reference to the same array type
In a discarded-value expression ([expr]/10)
In the operand to sizeof ([expr.sizeof]/4)
When the second and third operands to the conditional operator have the same array type and are both glvalues of the same value category
In either operand to the built-in comma operator
[1] This includes the case where an array of T is passed to a function expecting cv T*, cv void*, or bool, when a user-defined conversion requires one of those types, etc.
[2] This includes contextual conversions to bool as they occur in if statements and the like.

The rule of thumb I work by is "in any part of an expression that produces a value result that can be stored in a pointer but cannot be stored in an array".
So, for example;
The expression array + 0 converts array to a pointer before doing the addition, and gives a result that is a pointer.
f(array) converts array to a pointer before calling the function f() that accepts a pointer or an array (not a reference).
array[0] is not required to convert array to a pointer (but the compiler is free to, since it makes no difference on the result of that expression).
sizeof array does not convert array to a pointer (since it doesn't evaluate array at all, just its size).
The expression p = array converts array to a pointer and that value is stored in p.
I'm sure there are some cases I've missed, but that simple rule works reasonably well. Of course, it is based on an understanding of what an expression is.

In your example code, a[0] is identical to *(a + 0), and is thus subject to array-to-pointer conversion. See the Built-in subscript operator section here.

Related

Implementing the Linux Kernel's __is_constexpr (ICE_P) macro in pure C++

After reading about the standard C11 version of Martin Uecker's ICE_P predicate, I tried to implement it in pure C++. The C11 version, making use of _Generic selection is as follows:
#define ICE_P(x) _Generic((1? (void *) ((x)*0) : (int *) 0), int*: 1, void*: 0)
The obvious approach for C++ is to replace _Generic by a template and decltype, such as:
template<typename T> struct is_ice_helper;
template<> struct is_ice_helper<void*> { enum { value = false }; };
template<> struct is_ice_helper<int*> { enum { value = true }; };
#define ICE_P(x) (is_ice_helper<decltype(1? (void *) ((x)*0) : (int *) 0)>::value)
However, it fails the simplest test. Why can't it detect integer constant expressions?

The issue is subtle. The specification for determining the composite type of the conditional expression's pointer operands is similar in C++ to the one in C, so it starts off looking promising:
(N4659) [expr.cond]
7 Lvalue-to-rvalue, array-to-pointer, and function-to-pointer
standard conversions are performed on the second and third operands.
After those conversions, one of the following shall hold:
[...]
One or both of the second and third operands have pointer type; pointer conversions, function pointer conversions, and qualification
conversions are performed to bring them to their composite pointer
type (Clause [expr]). The result is of the composite pointer type.
[...]
The reduction to the composite pointer type is specified as follows:
(N4659) [expr]
5 The composite pointer type of two operands p1 and p2 having
types T1 and T2, respectively, where at least one is a pointer or
pointer to member type or std::nullptr_t, is:
if both p1 and p2 are null pointer constants, std::nullptr_t;
if either p1 or p2 is a null pointer constant, T2 or T1, respectively;
if T1 or T2 is “pointer to cv1 void” and the other type is “pointer to cv2 T”, where T is an object type or void, “pointer to cv12 void”,
where cv12 is the union of cv1 and cv2;
[...]
So the result of our ICE_P macro is determined by which of the bullets above we land on after checking each in order. Given how we defined is_ice_helper, we know that the composite type is not nullptr_t; otherwise we'd hit the first bullet and get an error due to the missing template specialization. So we must be hitting bullet number 3, making the predicate report false. It all seems to hinge on the definition of a null pointer constant.
(N4659) [conv.ptr] (emphasis mine)
1 A null pointer constant is an integer literal with value
zero or a prvalue of type std::nullptr_t. A null pointer
constant can be converted to a pointer type; the result is the null
pointer value of that type and is distinguishable from every other
value of object pointer or function pointer type. Such a conversion is
called a null pointer conversion. Two null pointer values of the same
type shall compare equal. The conversion of a null pointer constant to
a pointer to cv-qualified type is a single conversion, and not the
sequence of a pointer conversion followed by a qualification
conversion. A null pointer constant of integral type can be converted
to a prvalue of type std::nullptr_t.
Since (int*)0 is not a null pointer constant by the definition above, we do not qualify for the first bullet of [expr]/5. The composite type is not std::nullptr_t. Neither is (void *) ((x)*0) a null pointer constant, nor can it be turned into one. Removing the cast (something the definition doesn't allow) leaves us with (x)*0. This is an integer constant expression with value zero. But it is not an integer literal with value zero! The definition of a null pointer constant in C++ deviates from the one in C!
(N1570) 6.3.2.3 Pointers
3 An integer constant expression with the value 0, or such an
expression cast to type void *, is called a null pointer constant. If
a null pointer constant is converted to a pointer type, the resulting
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.
C allows arbitrary constant expressions with value zero to form a null pointer constant, while C++ requires integer literals. Given C++'s rich support for computing constant expressions of a variety of literal types, this seems like a needless restriction. And one that makes the above approach to ICE_P a non-starter in C++.

C++ how does cast with reference work?

Can anyone explain what's happening in the following code?
char cd[1024];
unsigned short int & messageSize =reinterpret_cast<unsigned short int&>(*cd);
Does it take the first 2 chars of cd by reference and cast them to a 16-bit int?
When I remove the '&', the compiler complains that it cannot cast from char to unsigned short int.
unsigned short int messageSize =reinterpret_cast<unsigned short int>(*cd);

The "intuitive" meaning of reinterpret_cast is "take a sequence of bits and treat it as if that sequence of bits has a different type". That is not possible to do for types char and unsigned short, because they have different width.
As for the first case, the intuition is: reinterpret_cast treats lvalue reference as if it was a pointer to the type it refers (and applies mentioned conversion to that pointer).
Formally, the standard says:
4.2 Array-to-pointer conversion [conv.array]
An lvalue or rvalue of type “array of N T” or “array of unknown bound of T” can be converted to a prvalue of type “pointer to T”. The result is a pointer to the first element of the array.
and:
5.3.1 Unary operators [expr.unary.op]
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function
to which the expression points. If the type of the expression is “pointer to T”, the type of the result is
“T”.
So, after dereferencing *cd we will get an lvalue of type char (same as if you wrote cd[0]).
5.2.10 Reinterpret cast [expr.reinterpret.cast]
A glvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. The result refers to the same object as the source glvalue, but with the specified type. [ Note: That is, for lvalues, a reference cast reinterpret_cast<T&>(x) has the same effect as the conversion *reinterpret_cast<T*>(&x) with the built-in & and * operators (and similarly for reinterpret_cast<T&&>(x)). — end note ] No temporary is created, no copy is made, and constructors (12.1) or conversion functions (12.3) are not called.
That means, you have got something like
*reinterpret_cast<unsigned short *>(&cd[0])
But what is perhaps more important than all the above:
3.10 Lvalues and rvalues [basic.lval]
If a program attempts to access the stored value of an object through
a glvalue of other than one of the following types the behavior is
undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
...
a char or unsigned char type.
That is, binding a "reference to char" to an object of type "unsigned short" is ok. But doing it vice versa (i.e., as in your example) is sort-of-not-ok, because accessing the object through such a reference would invoke undefined behavior.

A cast with a reference differs from the same cast without a reference in one thing: the cast without a reference creates a new temporary object, while the cast with a reference reinterprets the type of the already existing object. This matters in many cases, for example in yours, since you are assigning the result to a non-const reference. Non-const references cannot be initialized with temporary objects.
On a side note, you do know that what you are doing here is a violation of the type aliasing rule and yields undefined behaviour?

unsigned short int & messageSize means that messageSize is a variable of type unsigned short int, and the memory area where that variable will be stored shall be given as initializer.
The initializer =reinterpret_cast<unsigned short int&>(*cd) says: take the memory at the location being pointed to by cd, and pretend it contains an unsigned short int.
The result is that if you try to read and write messageSize, then you will try to read and write an unsigned short int in a memory location that contains something else. This causes undefined behaviour.
There are a few situations in which it is OK to pretend a memory location contains an object that it actually doesn't; this is not one of them.
If your compiler is not performing aliasing optimizations then it might appear as if your code "works" for now. However the code is broken.

reinterpret_cast<unsigned short int&>(*cd);
is similar to
*reinterpret_cast<unsigned short int*>(cd);

reinterpret_cast an iterator to a pointer

I've got an iterator of Things. If I want to convert the current item to a pointer to the item, why does this work:
thing_pointer = &(*it);
But this not:
thing_pointer = reinterpret_cast<Thing*>(it);
This is the compiler error I'm trying to comprehend: http://msdn.microsoft.com/en-us/library/sy5tsf8z(v=vs.90).aspx
Just in case, the type of the iterator is std::_Vector_iterator<std::_Vector_val<Thing,std::allocator<Thing> > >

In
&(*it);
the * is overloaded to do what you logically mean: convert the iterator type to its pointed-to object. You can then safely take the address of this object.
Whereas in
reinterpret_cast<Thing*>(it);
you are telling the compiler to literally reinterpret the it object as a pointer. But it might not be a pointer at all -- it might be a 50-byte struct, for all you know! In that case, the first sizeof (Thing*) bytes of it will absolutely not happen to point at anything sensible.
Tip: reinterpret_cast<> is nearly always the wrong thing.

Obligatory Standard Quotes, emphasis mine:
5.2.10 Reinterpret cast
1/ [...] Conversions that can be performed explicitly using
reinterpret_cast are listed below. No other conversion can be
performed explicitly using reinterpret_cast.
4/ A pointer can be explicitly converted to any integral type large
enough to hold it. [...]
5/ A value of integral type or enumeration type can be explicitly
converted to a pointer. [...]
6/ A function pointer can be explicitly converted to a function
pointer of a different type. [...]
7/ An object pointer can be explicitly converted to an object pointer
of a different type. [...]
8/ Converting a function pointer to an object pointer type or vice
versa is conditionally-supported. [...]
9/ The null pointer value (4.10) is converted to the null pointer
value of the destination type. [...]
10/ [...] “pointer to member of X of type T1” can be explicitly
converted to [...] “pointer to member of Y of type T2” [...]
11/ A [...] T1 can be cast to the type “reference to T2” if an
expression of type “pointer to T1” can be explicitly converted to the
type “pointer to T2” using a reinterpret_cast. [...]
With the exception of the integral-to-pointer and value-to-reference conversions noted in 4/, 5/ and 11/, the only conversions that can be performed using reinterpret_cast are pointer-to-pointer conversions.
However in:
thing_pointer = reinterpret_cast<Thing*>(it);
it is not a pointer, but an object. It just so happens that this object was designed to emulate a pointer in many ways, but it's still not a pointer.

Because the * operator of the iterator is overloaded, and it returns a reference to the object the iterator points to.
You can force it by thing_pointer = *(reinterpret_cast<Thing**>(&it));. But it's undefined behavior.

Because an iterator is not a pointer. It is a class with an implementation-defined structure, and if you try to reinterpret it as a pointer, the raw data of the iterator class will be taken as a memory address, which may, but probably will not, point to valid memory.

The first gets a reference to the object, then takes the address of it, giving the pointer.
The second tries to cast the iterator to a pointer, which is likely to fail because most types can't be cast to pointers - only other pointers, integers, and class types with a conversion operator.

Reshaping a 1-d array to a multidimensional array

Taking into consideration the entire C++11 standard, is it possible for any conforming implementation to pass the first assertion below but fail the latter?
#include <cassert>
int main(int, char**)
{
    const int I = 5, J = 4, K = 3;
    const int N = I * J * K;
    int arr1d[N] = {0};
    int (&arr3d)[I][J][K] = reinterpret_cast<int (&)[I][J][K]>(arr1d);
    assert(static_cast<void*>(arr1d) ==
           static_cast<void*>(arr3d)); // is this necessary?
    arr3d[3][2][1] = 1;
    assert(arr1d[3 * (J * K) + 2 * K + 1] == 1); // UB?
}
If not, is this technically UB or not, and does that answer change if the first assertion is removed (is reinterpret_cast guaranteed to preserve addresses here?)? Also, what if the reshaping is done in the opposite direction (3d to 1d) or from a 6x35 array to a 10x21 array?
EDIT: If the answer is that this is UB because of the reinterpret_cast, is there some other strictly compliant way of reshaping (e.g., via static_cast to/from an intermediate void *)?

Update 2021-03-20:
This same question was asked on Reddit recently and it was pointed out that my original answer is flawed because it does not take into account this aliasing rule:
If a program attempts to access the stored value of an object through a glvalue whose type is not similar to one of the following types the behavior is undefined:
the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object, or
a char, unsigned char, or std::byte type.
Under the rules for similarity, these two array types are not similar for any of the above cases and therefore it is technically undefined behaviour to access the 1D array through the 3D array. (This is definitely one of those situations where, in practice, it will almost certainly work with most compilers/targets)
Note that the references in the original answer refer to an older C++11 draft standard
Original answer:
reinterpret_cast of references
The standard states that an lvalue of type T1 can be reinterpret_cast to a reference to T2 if a pointer to T1 can be reinterpret_cast to a pointer to T2 (§5.2.10/11):
An lvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast.
So we need to determine if an int(*)[N] can be converted to an int(*)[I][J][K].
reinterpret_cast of pointers
A pointer to T1 can be reinterpret_cast to a pointer to T2 if both T1 and T2 are standard-layout types and T2 has no stricter alignment requirements than T1 (§5.2.10/7):
When a prvalue v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1, or if either type is void.
Are int[N] and int[I][J][K] standard-layout types?
int is a scalar type and arrays of scalar types are considered to be standard-layout types (§3.9/9).
Scalar types, standard-layout class types (Clause 9), arrays of such types and cv-qualified versions of these types (3.9.3) are collectively called standard-layout types.
Does int[I][J][K] have no stricter alignment requirements than int[N]?
The result of the alignof operator gives the alignment requirement of a complete object type (§3.11/2).
The result of the alignof operator reflects the alignment requirement of the type in the complete-object case.
Since the two arrays here are not subobjects of any other object, they are complete objects. Applying alignof to an array gives the alignment requirement of the element type (§5.3.6/3):
When alignof is applied to an array type, the result shall be the alignment of the element type.
So both array types have the same alignment requirement.
That makes the reinterpret_cast valid and equivalent to:
int (&arr3d)[I][J][K] = *reinterpret_cast<int (*)[I][J][K]>(&arr1d);
where * and & are the built-in operators, which is then equivalent to:
int (&arr3d)[I][J][K] = *static_cast<int (*)[I][J][K]>(static_cast<void*>(&arr1d));
static_cast through void*
The static_cast to void* is allowed by the standard conversions (§4.10/2):
A prvalue of type “pointer to cv T,” where T is an object type, can be converted to a prvalue of type “pointer to cv void”. The result of converting a “pointer to cv T” to a “pointer to cv void” points to the start of the storage location where the object of type T resides, as if the object is a most derived object (1.8) of type T (that is, not a base class subobject).
The static_cast to int(*)[I][J][K] is then allowed (§5.2.9/13):
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T,” where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1.
So the cast is fine! But are we okay to access objects through the new array reference?
Accessing array elements
Array subscripting of the form E1[E2] is equivalent to *((E1)+(E2)) (§5.2.1/1). Let's consider the following array subscripting:
arr3d[3][2][1]
Firstly, arr3d[3] is equivalent to *((arr3d)+(3)). The lvalue arr3d undergoes array-to-pointer conversion to give an int(*)[J][K]. There is no requirement that the underlying array must be of the correct type to do this conversion. The pointer's value is then accessed (which is fine by §3.10) and then the value 3 is added to it. This pointer arithmetic is also fine (§5.7/5):
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
This pointer is then dereferenced to give an int[J][K]. This undergoes the same process for the next two subscripts, resulting in the final int lvalue at the appropriate array index. It is an lvalue due to the result of * (§5.3.1/1):
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
It is then perfectly fine to access the actual int object through this lvalue because the lvalue is of type int too (§3.10/10):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object
[...]
So unless I've missed something, I'd say this program is well-defined.

I am under the impression that it will work. You allocate the same piece of contiguous memory. I know the C-standard guarantees it will be contiguous at least. I don't know what is said in the C++11 standard.
However the first assert should always be true. The address of the first element of the array will always be the same. All memory address will be the same since the same piece of memory is allocated.
I would therefore also say that the second assert will always hold true, at least as long as the elements are always laid out in row-major order. This is also guaranteed by the C standard and I would be surprised if the C++11 standard says anything different.

Is using an array as a conditional expression valid in C++?

I have this code:
int main()
{
char buffer[10];
if( buffer ) {
return 1;
}
return 0;
}
which Visual C++ 10 interprets like this: buffer decays to a pointer, then a pointer is compared against null. When this is compiled with /O2 the check gets eliminated and the code gets equivalent to just return 1;.
Is the code above valid? Does Visual C++ compile it right (I mean the decaying part, not the optimization)?

C++11, 6.4/4:
The value of a condition that is an expression is the value of the
expression, contextually converted to bool for statements other than
switch; if that conversion is ill-formed, the program is ill-formed.
So the standard says that the compiler has to perform any implicit conversions at its disposal to convert the array to a boolean. Decaying the array to a pointer and converting the pointer to a boolean by testing it against the null pointer is one way to do that, so yes, the program is well-defined and yes, it does produce the correct result: since the array is allocated on the stack, the pointer it decays to can never be equal to the null pointer.
Update: As to why this chain of two conversions is followed:
C++11, 4.2/1:
An lvalue or rvalue of type “array of N T” or “array of unknown bound
of T” can be converted to a prvalue of type “pointer to T”. The result
is a pointer to the first element of the array.
So, the only legal conversion from an array type is to a pointer to element type. There is no choice in the first step.
C++11, 4.12/1:
A prvalue of arithmetic, unscoped enumeration, pointer, or pointer
to member type can be converted to a prvalue of type bool. A zero
value, null pointer value, or null member pointer value is converted
to false; any other value is converted to true. A prvalue of type
std::nullptr_t can be converted to a prvalue of type bool; the
resulting value is false.
There is an implicit conversion directly from bare pointer to boolean; so the compiler picks that as the second step because it allows the desired result (conversion to boolean) to be immediately reached.

Yes, the conversion from an array type to bool is well-defined by the standard conversions. Quoting C++11, 4/1 (with the relevant conversions highlighted):
A standard conversion sequence is a sequence of standard conversions in the following
order:
— Zero or one conversion from the following set: lvalue-to-rvalue conversion, array-to-pointer conversion,
and function-to-pointer conversion.
— Zero or one conversion from the following set: integral promotions, floating point promotion, integral
conversions, floating point conversions, floating-integral conversions, pointer conversions, pointer to
member conversions, and boolean conversions.
— Zero or one qualification conversion.
A standard conversion sequence will be applied to an expression if necessary to convert it to a required
destination type.

Yes.
if( buffer ) means: check if buffer is not NULL. An array variable points to the start of the array (unless you move it) and is equivalent to a pointer.
The optimization just returns 1 because that buffer is allocated on the stack, so it definitely has a value (pointer to the location on the stack), so it's always true.

You said it yourself :
buffer decays to a pointer
Since the array is on the stack, it can not be NULL (unless something goes wrong, like stack smashing).
