The code is below:
#include <stdio.h>

int main(void) {
    char* str = "12345678";
    int* in = (int*)str;
    printf("%d\n%d\n", in[0], in[1]);
    return 0;
}
What is result? Why?
What is result?
Implementation defined.
Why?
Because the standard says so.
§5.2.10.7
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of
object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)). Converting a prvalue of type “pointer to T1” to the type “pointer to
T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than
those of T1) and back to its original type yields the original pointer value.
BTW, this is illegal in c++: char* str = "12345678"; as string literals are const. To be correct it should be: const char* str = "12345678";
The answer is (for my system)
875770417
943142453
Here is the reason:
str points to "12345678". On my system int is 4 bytes, so in[0] covers the four bytes of "1234" and in[1] covers the four bytes of "5678".
In memory, each char is stored as its binary character code.
Now let's look at in[0], which reads the bytes of "1234". Their memory representation is (in binary):
00110001 00110010 00110011 00110100
(These are the ASCII codes for '1', '2', '3', '4': 49, 50, 51, 52.)
Because the char data is now read through an int pointer, the compiler loads the above 32 bits as one word. How they are evaluated depends on whether the system is big endian or little endian. My system is little endian.
So the evaluation on my system was:
52*(2^24) + 51*(2^16) + 50*(2^8) + 49 = 875770417
Similar interpretation for in[1]. Hope this clears things.
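The arithmetic above can be checked without any pointer cast. Below is a minimal sketch, assuming ASCII and 8-bit bytes; the helper names as_little_endian and as_big_endian are hypothetical and not part of the original post. It combines the four bytes by hand for both byte orders:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: combine the first four bytes of s as a little-endian
// 32-bit value (byte 0 is the least significant byte).
std::uint32_t as_little_endian(const char* s) {
    assert(std::strlen(s) >= 4);
    const unsigned char* b = reinterpret_cast<const unsigned char*>(s);
    return  static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}

// Hypothetical helper: the same four bytes read as a big-endian value
// (byte 0 is the most significant byte).
std::uint32_t as_big_endian(const char* s) {
    assert(std::strlen(s) >= 4);
    const unsigned char* b = reinterpret_cast<const unsigned char*>(s);
    return (static_cast<std::uint32_t>(b[0]) << 24)
         | (static_cast<std::uint32_t>(b[1]) << 16)
         | (static_cast<std::uint32_t>(b[2]) << 8)
         |  static_cast<std::uint32_t>(b[3]);
}
```

With ASCII input, as_little_endian("1234") gives 875770417 and as_little_endian("5678") gives 943142453, the two values printed above; a big-endian machine would instead be expected to show 825373492 for the first one.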
The result is undefined behaviour, which N4296 (a C++ working draft published shortly after C++14) defines as
behavior for which this International Standard imposes no requirements
Anything can happen. Seg fault, hard drive wiped, etc. A quite plausible result is that the ASCII characters "1234" are treated as the bytes of a four byte little endian integer, and the resulting number printed out (and similarly for "5678") - but don't rely on this.
I think I am misreading the standard quotation below, so I do not fully understand the intent of its wording.
Firstly, I already know what an alignment requirement is, but I can't figure out the exact relation between alignment requirements and casting in general, or what I should watch out for regarding alignment when I use static_cast or reinterpret_cast. That is my first question.
Secondly, there are some words in the standard quotation that I have spent two days trying to understand without success. From the paragraph in:
N4885: 7.6.1.9 Static cast [expr.static.cast]
A prvalue of type “pointer to cv1 void” can be converted to a prvalue
of type “pointer to cv2 T”, where T is an object type and cv2 is the
same cv-qualification as, or greater cv-qualification than, cv1. If
the original pointer value represents the address A of a byte in
memory and A does not satisfy the alignment requirement of T, then the
resulting pointer value is unspecified.
Here, they say that if the original pointer value doesn't satisfy the alignment requirement of T, then the resulting pointer value is unspecified. What does that really mean?
What I can't understand is when this "original pointer value" satisfies the alignment requirement of T and when it does not, so that I can avoid such an unspecified pointer value. I just need someone to explain the bold part of the above quote with simple examples: what I have to know, as a programmer, from that bold part and what I have to avoid. For example:
int i = 12;
double *pd = static_cast<double *>(static_cast<void *>(&i)); // does 'pd' have an unspecified address value?
short *ps = static_cast<short *>(static_cast<void *>(&i)); // does 'ps' have an unspecified address value?
Finally, there's a relatively same sentence I need to understand in:
N4885: 7.6.1.10 Reinterpret cast [expr.reinterpret.cast]
An object pointer can be explicitly converted to an object pointer of
a different type. When a prvalue v of object pointer type is
converted to the object pointer type “pointer to cv T”, the result is
static_cast<cv T*>(static_cast<cv void*>(v)). [Note 7: Converting a
pointer of type “pointer to T1” that points to an object of type T1 to
the type “pointer to T2” (where T2 is an object type and the
alignment requirements of T2 are no stricter than those of T1) and
back to its original type yields the original pointer value. — end
note].
What does the standard mean by the phrase "the alignment requirements of T2 are no stricter than those of T1"? What does "stricter than" mean here?
I think that if the following static_assert holds, then the alignment requirements of T2 are not stricter than those of T1: static_assert(alignof(T1) >= alignof(T2)); Or is this assertion wrong in some cases?
int i = 34;
double *pd = reinterpret_cast<double *>(&i); // does 'pd' have an unspecified address value?
short *ps = reinterpret_cast<short *>(&i); // does 'ps' have an unspecified address value?
I added these examples just to make clear where my problem lies, not only to get answers to the questions in the // comments.
While common implementations have integer-like pointers (such that reinterpret_cast behaves like memcpy between pointers and integers and arithmetic on the pointers is reflected in the integer values), the standard as usual provides only weak guarantees to support less common architectures where pointers have other formats and/or special registers. As such, it’s impossible to observe alignment of a dynamic pointer value: the unspecified value applies if alignof(expression_type)<alignof(cast_type) unless the pointer refers to an object declared with alignas or to an object whose actual type is more strongly aligned.
This means that double* is a poor type to use for the sort of “temporary pointer storage” for which reinterpret_cast exists; fortunately, most C++ code uses templates (so this doesn’t come up), old C code uses char* (whose alignment is 1), and other code uses void* (which has no alignment), so there’s rarely an actual issue here.
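The "no stricter than" wording can be made concrete with alignof. The following is only a sketch: no_stricter_than and looks_aligned_for are hypothetical names, and the run-time check assumes a common implementation with integer-like pointers, where the mapping to std::uintptr_t reflects the address. The standard itself does not promise that.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical trait: true when T2's alignment requirement is no stricter
// than T1's, i.e. any address suitable for a T1 is also suitable for a T2.
template <class T2, class T1>
constexpr bool no_stricter_than() {
    return alignof(T2) <= alignof(T1);
}

// Implementation-specific check (assumes integer-like pointers): does the
// address held in p happen to be a multiple of alignof(T)?
template <class T>
bool looks_aligned_for(const void* p) {
    assert(p != nullptr);
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

For the examples in the question, no_stricter_than<short, int>() and no_stricter_than<double, int>() state at compile time which of the two casts is covered by the round-trip guarantee on a given platform; the answers differ between implementations, which is the point of the wording.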
Consider the following code.
#include <stdio.h>
int main() {
typedef int T;
T a[] = { 1, 2, 3, 4, 5, 6 };
T(*pa1)[6] = (T(*)[6])a;
T(*pa2)[3][2] = (T(*)[3][2])a;
T(*pa3)[1][2][3] = (T(*)[1][2][3])a;
T *p = a;
T *p1 = *pa1;
//T *p2 = *pa2; //error in c++
//T *p3 = *pa3; //error in c++
T *p2 = **pa2;
T *p3 = ***pa3;
printf("%p %p %p %p %p %p %p %p\n", a, pa1, pa2, pa3, p, p1, p2, p3);
printf("%d %d %d %d %d %d %d %d\n", a[5], (*pa1)[5],
(*pa2)[2][1], (*pa3)[0][1][2], p[5], p1[5], p2[5], p3[5]);
return 0;
}
The above code compiles and runs in C, producing the expected results. All the pointer values are the same, as are all the int values. I think the result will be the same for any type T, but int is the easiest to work with.
I confess to being initially surprised that dereferencing a pointer-to-array yields an identical pointer value, but on reflection I think that is merely the converse of the array-to-pointer decay we know and love.
[EDIT: The commented out lines trigger errors in C++ and warnings in C. I find the C standard vague on this point, but this is not the real question.]
In this question, it was claimed to be Undefined Behaviour, but I can't see it. Am I right?
Code here if you want to see it.
Right after I wrote the above it dawned on me that those errors are because there is only one level of pointer decay in C++. More dereferencing is needed!
T *p2 = **pa2; //no error in c or c++
T *p3 = ***pa3; //no error in c or c++
And before I managed to finish this edit, @AntonSavin provided the same answer. I have edited the code to reflect these changes.
This is a C-only answer.
C11 (n1570) 6.3.2.3 p7
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned*) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.
*) In general, the concept “correctly aligned” is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.
The standard is a little vague about what happens if we use such a pointer (strict aliasing aside) for anything other than converting it back, but the intent and widespread interpretation is that such pointers should compare equal (and have the same numerical value, e.g. they should also be equal when converted to uintptr_t). As an example, think about (void *)array == (void *)&array (converting to char * instead of void * is explicitly guaranteed to work).
T(*pa1)[6] = (T(*)[6])a;
This is fine, the pointer is correctly aligned (it’s the same pointer as &a).
T(*pa2)[3][2] = (T(*)[3][2])a; // (i)
T(*pa3)[1][2][3] = (T(*)[1][2][3])a; // (ii)
(i) and (ii) are safe iff T[6] has the same alignment requirements as T[3][2] and as T[1][2][3], respectively. It would seem strange to me if they didn't, but I cannot find a guarantee in the standard that they have the same alignment requirements.
T *p = a; // safe, of course
T *p1 = *pa1; // *pa1 has type T[6], after lvalue conversion it's T*, OK
T *p2 = **pa2; // **pa2 has type T[2], or T* after conversion, OK
T *p3 = ***pa3; // ***pa3, has type T[3], T* after conversion, OK
Ignoring the UB caused by passing int * where printf expects void *, let’s look at the expressions in the arguments for the next printf, first the defined ones:
a[5] // OK, of course
(*pa1)[5]
(*pa2)[2][1]
(*pa3)[0][1][2]
p[5] // same as a[5]
p1[5]
Note, that strict aliasing isn’t a problem here, no wrongly-typed lvalue is involved, and we access T as T.
The following expressions depend on the interpretation of out-of-bounds pointer arithmetic. The more relaxed interpretation (allowing container_of, array flattening, the “struct hack” with char[], etc.) allows them as well. The stricter interpretation (allowing a reliable run-time bounds-checking implementation for pointer arithmetic and dereferencing, but disallowing container_of, array flattening (though not necessarily array “lifting”, which is what you did), the struct hack, etc.) renders them undefined:
p2[5] // UB, p2 points to the first element of a T[2] array
p3[5] // UB, p3 points to the first element of a T[3] array
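The remark that (T(*)[6])a is "the same pointer as &a" suggests a cast-free way to get the first of the pointers in question. A small C++ sketch of that idea follows; the function name last_via_array_pointer is hypothetical and not taken from the thread:

```cpp
#include <cassert>

// Hypothetical helper: reach the last element of a 6-element int array
// through a pointer to the whole array, obtained with & instead of a cast.
int last_via_array_pointer(int (&a)[6]) {
    int (*pa)[6] = &a;  // already has type int(*)[6], no conversion needed
    // The array and its first element start at the same address.
    assert(static_cast<void*>(pa) == static_cast<void*>(a));
    return (*pa)[5];    // *pa is the original int[6], so index 5 is in bounds
}
```

Because pa really points to an int[6] object, none of the alignment or bounds questions discussed above arise for this variant; they only come up for the reshaped views such as T(*)[3][2].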
The only reason your code compiles in C is that your default compiler setup allows the compiler to implicitly perform some illegal pointer conversions. Formally, this is not allowed by C language. These lines
T *p2 = *pa2;
T *p3 = *pa3;
are ill-formed in C++ and produce constraint violations in C. In casual parlance, these lines are errors in both C and C++ languages.
Any self-respecting C compiler will issue (is actually required to issue) diagnostic messages for these constraint violations. GCC compiler, for one example, will issue "warnings" telling you that pointer types in the above initializations are incompatible. While "warnings" are perfectly sufficient to satisfy standard requirements, if you really want to use GCC compiler's ability to recognize constraint violating C code, you have to run it with -pedantic-errors switch and, preferably, explicitly select standard language version by using -std= switch.
In your experiment, C compiler performed these implicit conversions for you as a non-standard compiler extension. However, the fact that GCC compiler running under ideone front completely suppressed the corresponding warning messages (issued by the standalone GCC compiler even in its default configuration) means that ideone is a broken C compiler. Its diagnostic output cannot be meaningfully relied upon to tell valid C code from invalid one.
As for the conversion itself... It is not undefined behavior to perform this conversion. But it is undefined behavior to access array data through the converted pointers.
UPDATE: The following applies to C++ only, for C scroll down.
In short, there's no UB in C++ and there is UB in C.
8.3.4/7 says:
A consistent rule is followed for multidimensional arrays. If E is an n-dimensional array of rank i x j x ... x k,
then E appearing in an expression that is subject to the array-to-pointer conversion (4.2) is converted to a
pointer to an (n - 1)-dimensional array with rank j x ... x k. If the * operator, either explicitly or implicitly
as a result of subscripting, is applied to this pointer, the result is the pointed-to (n - 1)-dimensional array,
which itself is immediately converted into a pointer.
So this won't produce error in C++ (and will work as expected):
T *p2 = **pa2;
T *p3 = ***pa3;
Regarding whether this is UB or not. Consider the very first conversion:
T(*pa1)[6] = (T(*)[6])a;
In C++ it's in fact
T(*pa1)[6] = reinterpret_cast<T(*)[6]>(a);
And this is what the standard says about reinterpret_cast:
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue
v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast< cv
T2 * >(static_cast< cv void * >(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment
requirements of T2 are no stricter than those of T1, or if either type is void.
So a is converted to pa1 through static_cast to void* and back. Static cast to void* is guaranteed to return the real address of a, as stated in 4.10/2:
A prvalue of type “pointer to cv T,” where T is an object type, can be converted to a prvalue of type “pointer
to cv void”. The result of converting a non-null pointer value of a pointer to object type to a “pointer to
cv void” represents the address of the same byte in memory as the original pointer value.
Next static cast to T(*)[6] is again guaranteed to return the same address as stated in 5.2.9/13:
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T,” where T is
an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. The null pointer
value is converted to the null pointer value of the destination type. If the original pointer value represents
the address A of a byte in memory and A satisfies the alignment requirement of T, then the resulting pointer
value represents the same address as the original pointer value, that is, A.
So pa1 is guaranteed to point to the same byte in memory as a, and so access to data through it is perfectly valid because the alignment of arrays is the same as the alignment of the underlying type.
What about C?
Consider again:
T(*pa1)[6] = (T(*)[6])a;
In C11 standard, 6.3.2.3/7 states the following:
A pointer to an object type may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced type, the behavior is
undefined. Otherwise, when converted back again, the result shall compare equal to the
original pointer. When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
It means that unless the conversion is to char*, the value of converted pointer is not guaranteed to be equal to value of original pointer, resulting in undefined behavior when accessing data through converted pointer. In order to make it work, the conversion has to be done explicitly through void*:
T(*pa1)[6] = (T(*)[6])(void*)a;
Conversions back to T*
T *p = a;
T *p1 = *pa1;
T *p2 = **pa2;
T *p3 = ***pa3;
All of these are conversions from array of T to pointer to T, which are valid in both C++ and C, and no UB is triggered by accessing the data through converted pointers.
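The level-by-level decay described in 8.3.4/7 can be seen on a genuine two-dimensional array, where no cast is involved at all. This is only an illustrative sketch; first_element is a hypothetical name:

```cpp
#include <cassert>

// Hypothetical helper: obtain an int* to the first element of a 3x2 array
// by dereferencing a pointer to the whole array twice.
int* first_element(int (*pm)[3][2]) {
    assert(pm != nullptr);
    // *pm  has type int[3][2] and decays to int(*)[2]
    // **pm has type int[2]    and decays to int*
    return **pm;
}
```

Called as first_element(&m) for an int m[3][2], the result compares equal to &m[0][0]. A single dereference would leave an int(*)[2], which is why T *p2 = *pa2; is rejected in C++.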
#include <cstdint>
#include <string>

int main()
{
    char buffer[5] = { 0 };
    buffer[0] = 23;
    std::string s(&buffer[0], 4);
    std::uint32_t nb = *reinterpret_cast<const std::uint32_t*>(s.data());
    return 0;
}
For this program, is reinterpret_cast's output implementation dependent? Or will any compiler conforming to the c++ standard always produce the same output?
For your example code, if you're looking for something that "any compiler conforming to the c++ standard always produce the same output", the answer is that there's no such guarantee.
A couple easy examples: alignment issues (as mentioned in several comments) and endianness differences.
C++11 5.2.10/7 "Reinterpret cast" says:
An object pointer can be explicitly converted to an object pointer of
a different type. When a prvalue v of type “pointer to T1” is
converted to the type “pointer to cv T2”, the result is
static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout
types (3.9) and the alignment requirements of T2 are no stricter than
those of T1, or if either type is void. Converting a prvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are
object types and where the alignment requirements of T2 are no
stricter than those of T1) and back to its original type yields the
original pointer value. The result of any other such pointer
conversion is unspecified.
Since uint32_t will generally have a stricter alignment requirement than char[], the standard doesn't make any promises about the behavior (since the above only talks about the situation where the alignment requirements are met). So strictly speaking the behavior is undefined.
Now, let's assume that you're interested only in platforms where the alignment requirements are met (i.e., uint32_t can be aligned on any address, same as char). Then your expression involving the reinterpret cast is equivalent to (note that you'd have to cast away the const from the const char* returned from std::string::data() as well):
std::uint32_t nb = *(static_cast<std::uint32_t*>(static_cast<void*>(const_cast<char*>(s.data()))));
The standard says this about using static_cast with object pointers (other than conversion between pointers in a class hierarchy) in 5.2.9/13 "Static cast":
A prvalue of type “pointer to cv1 void” can be converted to a prvalue
of type “pointer to cv2 T,” where T is an object type and cv2 is the
same cv-qualification as, or greater cv-qualification than, cv1. The
null pointer value is converted to the null pointer value of the
destination type. A value of type pointer to object converted to
“pointer to cv void” and back, possibly with different
cv-qualification, shall have its original value.
So, as far as the standard is concerned, all that you can do with the resulting pointer is cast it back to get the original value. Anything else would be undefined behavior (that an implementation might give a better guarantee on).
3.10/10 "Lvalues and rvalues" allows an object to be accessed through char or unsigned char types as well.
However, to reiterate: the standard does not guarantee that "any compiler conforming to the c++ standard always produce the same output" for the example you posted.
You're casting to std::uint32_t a buffer that is not necessarily properly aligned for such a value.
That's likely to blow up and/or be hugely inefficient.
The unsigned integer type means that any bitpattern for the value representation bits is OK, and on the PC platform for built-in type there are no bits other than the value representation bits; in particular no trap bits or trapping total bitpatterns.
Thus, you can do a memcpy and you'll be fine, technically – provided there are enough bytes, that s.length() >= sizeof(std::uint32_t).
However, such a conversion, if it occurred in ordinary code, would be a strong code-smell, an indication of something fundamentally wrong in the design.
Addendum, regarding “Or a compiler respectfull to the c++ standard will always produce the same output”.
I somehow didn’t see that when I answered. But the short answer is that if the conversion is performed in a way that works, such as using memcpy, then it depends on the endianness, a.k.a. byte order, in practice whether the most significant or least significant part of an integer is placed at lowest address.
In practice you can use network-oriented functions that convert to and from network byte order. Just assume network byte order for the serialized data. Check out ntohl et al. (these are not part of the C++ standard library, but are commonly available).
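The memcpy approach and the byte-order point can be sketched together. The code below assumes 8-bit bytes; load_native_u32 and load_be_u32 are hypothetical names, and the second one decodes network byte order by hand so that the sketch does not depend on the non-standard ntohl:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: copy the first four bytes of s into a uint32_t.
// No alignment or aliasing problem, but the value depends on host byte order.
std::uint32_t load_native_u32(const std::string& s) {
    assert(s.size() >= sizeof(std::uint32_t));
    std::uint32_t v;
    std::memcpy(&v, s.data(), sizeof v);
    return v;
}

// Hypothetical helper: decode the first four bytes as big-endian
// (network byte order), giving the same result on every host.
std::uint32_t load_be_u32(const std::string& s) {
    assert(s.size() >= 4);
    auto byte = [&s](std::size_t i) {
        return static_cast<std::uint32_t>(static_cast<unsigned char>(s[i]));
    };
    return (byte(0) << 24) | (byte(1) << 16) | (byte(2) << 8) | byte(3);
}
```

For the string from the question (first byte 23, then zeros), load_native_u32 would be expected to return 23 on a little-endian host and 23 << 24 on a big-endian one, while load_be_u32 returns 23 << 24 everywhere.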
We can look at the representation of an object of type T by converting a T* that points at that object into a char*. At least in practice:
int x = 511;
unsigned char* cp = (unsigned char*)&x;
std::cout << std::hex << std::setfill('0');
for (int i = 0; i < sizeof(int); i++) {
std::cout << std::setw(2) << (int)cp[i] << ' ';
}
This outputs the representation of 511 on my system: ff 01 00 00.
There is (surely) some implementation defined behaviour occurring here. Which of the casts is allowing me to convert an int* to an unsigned char* and which conversions does that cast entail? Am I invoking undefined behaviour as soon as I cast? Can I cast any T* type like this? What can I rely on when doing this?
Which of the casts is allowing me to convert an int* to an unsigned char*?
That C-style cast in this case is the same as reinterpret_cast<unsigned char*>.
Can I cast any T* type like this?
Yes and no. The yes part: You can safely cast any pointer type to a char* or unsigned char* (with the appropriate const and/or volatile qualifiers). The result is implementation-defined, but it is legal.
The no part: The standard explicitly allows char* and unsigned char* as the target type. However, you cannot (for example) safely cast a double* to an int*. Do this and you've crossed the boundary from implementation-defined behavior to undefined behavior. It violates the strict aliasing rule.
Your cast maps to:
unsigned char* cp = reinterpret_cast<unsigned char*>(&x);
The underlying representation of an int is implementation defined, and viewing it as characters allows you to examine that. In your case, it is 32-bit little endian.
There is nothing special here -- this method of examining the internal representation is valid for any data type.
C++03 5.2.10.7: A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that converting an rvalue of type "pointer to T1" to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
This suggests that the cast results in unspecified behavior. But pragmatically speaking, casting from any pointer type to char* will always allow you to examine (and modify) the internal representation of the referenced object.
The C-style cast in this case is equivalent to reinterpret_cast. The Standard describes the semantics in 5.2.10. Specifically, in paragraph 7:
"A pointer to an object can be explicitly converted to a pointer to a
different object type. When a prvalue v of type “pointer to T1” is
converted to the type “pointer to cv T2”, the result is
static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are
standard-layout types (3.9) and the alignment requirements of T2 are
no stricter than those of T1. Converting a prvalue of type “pointer to
T1” to the type “pointer to T2” (where T1 and T2 are object types and
where the alignment requirements of T2 are no stricter than those of
T1) and back to its original type yields the original pointer value.
The result of any other such pointer conversion is unspecified."
What this means in your case is that the alignment requirements are satisfied and the result is specified.
The implementation-defined part of your example is the endianness of your system; in this case your CPU is little endian.
As for the cast, when you cast an int* to char* all you are doing is telling the compiler to interpret what cp points to as a char, so it will read only the first byte and interpret it as a character.
Casts between pointers are themselves always possible, since all pointers are nothing more than memory addresses, and an object of any type can always be thought of in memory as a sequence of bytes.
But, of course, how that sequence is formed depends on how the type is represented in memory, and that is outside the scope of the C++ specification.
That said, barring very pathological cases, you can expect the representation to be the same in all code produced by the same compiler for all machines of the same platform (or family), and you should not expect the same results on different platforms.
In general, one thing to avoid is treating the relation between type sizes as "predefined":
in your sample you assume sizeof(int) == 4*sizeof(char), which is not necessarily always true.
But it is always true that sizeof(T) == N*sizeof(char), hence any T can always be seen as an integer number of chars.
Unless a user-defined cast operator is involved, a cast simply tells the compiler to "see" that memory area in a different way. Nothing really fancy, I would say.
Then, you are reading the memory area byte by byte; as long as you do not change it, that is just fine. Of course, what you see depends a lot on the platform: think about endianness, word size, padding, and so on.
Just reverse the byte order then it becomes
00 00 01 ff
Which is 256 (01) + 255 (ff) = 511
This is because your platform is little endian.
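The byte-by-byte inspection discussed in this thread can be wrapped in a small function. This is a sketch only; hex_bytes is a hypothetical name, and it reads the object representation through unsigned char, which is the access the answers above describe as allowed:

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format the n bytes starting at p as two-digit hex
// values separated by single spaces, lowest address first.
std::string hex_bytes(const void* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < n; ++i) {
        if (i != 0) out << ' ';
        out << std::setw(2) << static_cast<int>(bytes[i]);
    }
    return out.str();
}
```

For int x = 511, hex_bytes(&x, sizeof x) would be expected to give ff 01 00 00 on a little-endian system with a 32-bit int, matching the output in the question, and 00 00 01 ff on a big-endian one.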
I know this is a bizarre thing to do, and it's not portable. But I have an allocated array of unsigned ints, and I occasionally want to "store" a float in it. I don't want to cast the float or convert it to the closest equivalent int; I want to store the exact bit image of the float in the allocated space of the unsigned int, such that I could later retrieve it as a float and it would retain its original float value.
This can be achieved through a simple copy:
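uint32_t dst;
float src = get_float();
// std::copy(first, last, d_first) copies from [first, last) into d_first,
// so the source bytes come first and the destination last.
const char * const p = reinterpret_cast<const char*>(&src);
std::copy(p, p + sizeof(float), reinterpret_cast<char *>(&dst));
// now read dst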
Copying backwards works similarly.
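Both directions of the copy can be written with std::memcpy. The sketch below uses hypothetical helper names (float_to_bits, bits_to_float, store_float) and assumes that float and std::uint32_t have the same size, which the static_assert checks:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: return the exact bit image of f.
std::uint32_t float_to_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Hypothetical helper: rebuild the float from a stored bit image.
float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: store a float's bit image in slot i of an integer array.
void store_float(std::uint32_t* slots, std::size_t count, std::size_t i, float f) {
    assert(i < count);
    slots[i] = float_to_bits(f);
}
```

A value stored with store_float and read back with bits_to_float(slots[i]) compares equal to the original. On an IEEE 754 platform, float_to_bits(10.0f) would be expected to be 1092616192.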
Just do a reinterpret cast of the respective memory location:
float f = 0.5f;
unsigned int i = *reinterpret_cast<unsigned int*>(&f);
or the more C-like version:
unsigned int i = *(unsigned int*)&f;
From your question text I assume you are aware that this breaks if float and unsigned int don't have the same size, but on most usual platforms both should be 32-bit.
EDIT: As Kerrek pointed out, this seems to be undefined behaviour. But I still stand by my answer, as it is short and precise and should indeed work on any practical compiler (convince me of the opposite). But look at Kerrek's answer if you want a UB-free answer.
You can use reinterpret_cast if you really have to. You don't even need to play with pointers/addresses as other answers mention. For example
int i;
reinterpret_cast<float&>(i) = 10;
std::cout << std::endl << i << " " << reinterpret_cast<float&>(i) << std::endl;
also works (and prints 1092616192 10 if you are curious ;).
EDIT:
From C++ standard (about reinterpret_cast):
5.2.10.7 A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that converting an
rvalue of type “pointer to T1” to the type “pointer to T2” (where T1
and T2 are object types and where the alignment requirements of T2 are
no stricter than those of T1) and back to its original type yields the
original pointer value, the result of such a pointer conversion is
unspecified.
5.2.10.10 An lvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be
explicitly converted to the type “pointer to T2” using a
reinterpret_cast. That is, a reference cast reinterpret_cast<T&>(x)
has the same effect as the conversion
*reinterpret_cast<T*>(&x) with the built-in & and * operators. The result is an lvalue that refers to the same object as the source
lvalue, but with a different type. No temporary is created, no copy is
made, and constructors (12.1) or conversion functions (12.3) are not
called.
So it seems that consistently reinterpreting pointers is not undefined behavior, and using references has the same result as taking the address, reinterpreting it, and dereferencing the obtained pointer. I still claim that this is not undefined behavior.