Cast the object pointed to by a void* to char[] in C++

I'm trying to copy some objects (whose size is known) pointed to by a void* into a char array, bit for bit, in C++. I'm considering using a union with a char array so that I don't need to worry too much about the casting. However, since the type of the object is unknown, I don't know how to define this union.
Is there a better way to deal with this?
PS: edited to avoid confusion. For instance, an integer could be cast to a 4-character array.
Thanks!

In the link I put in the comments, the accepted answer goes into great detail about type punning and why you can't do it in c++.
What you can do is safely inspect any object with a char* (signed or unsigned) by using reinterpret_cast.
char* ptr = reinterpret_cast<char*>(&object);
for (std::size_t x = 0; x < sizeof(object); ++x)
std::cout << ptr[x]; //Or something less slow but this is an example
If you want to actually move the object into a char[], you should use std::memcpy.
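A minimal sketch of that copy for the case in the question, where only a void* and the size are known. The helper name to_bytes is made up for illustration, and the sketch assumes the pointed-to object is trivially copyable:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies the object representation of whatever `obj`
// points to into a byte buffer. Assumes `size` is the real size of that
// object and that the object is trivially copyable.
std::vector<unsigned char> to_bytes(const void* obj, std::size_t size)
{
    std::vector<unsigned char> bytes(size);
    if (size != 0)
        std::memcpy(bytes.data(), obj, size);
    return bytes;
}
```

Calling to_bytes(&i, sizeof i) for an int i yields sizeof(int) bytes; their order depends on the platform's endianness.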

If you are not worried about a bit of extra memory, you can use memcpy.
int i = 10;
char carray[sizeof(i)];
memcpy(carray, &i, sizeof(i));
However, remember that carray won't be a null terminated string. It will be just an array of chars. It will be better to use unsigned char since the value in one of those bytes might be too large for char if char is a signed type on your platform.
int i = 10;
unsigned char carray[sizeof(i)];
memcpy(carray, &i, sizeof(i));

Why do you feel you need to worry about the casting?
Just reinterpret_cast the void pointer to a char* and iterate over each character up to the size of the original object. Keep in mind that the char* pointer is not a null-terminated string and may or may not contain null characters in the middle of the data, so do not process it like a C string.
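As a rough sketch of that loop (hex_dump is an illustrative name, not an existing function), the bytes can be formatted as hex so that embedded zero bytes are harmless. Starting from a void*, static_cast is enough; reinterpret_cast is only needed when starting from a typed pointer such as int*:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats `size` bytes starting at `data` as
// space-separated two-digit hex values (assumes 8-bit bytes).
std::string hex_dump(const void* data, std::size_t size)
{
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    for (std::size_t i = 0; i < size; ++i) {
        char buf[4];
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(bytes[i]));
        if (i != 0)
            out += ' ';
        out += buf;
    }
    return out;
}
```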

From 5.2.10 Reinterpret cast:
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue
v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment
requirements of T2 are no stricter than those of T1, or if either type is void.
So you simply want to use:
char* my_bytes = reinterpret_cast<char*>(my_pointer);
size_t num_bytes = object_size; // the known size of the pointed-to object, not sizeof(my_pointer)
for(size_t i = 0; i < num_bytes; ++i) {
// *(my_bytes + i) is the i-th byte in memory order; which byte is most significant depends on endianness
}

Related

Cast pointers in consteval

I wanted to make a compile-time data encryptor.
I tried this by creating a struct with a buffer, and a constructor for that struct that
would read an array of structures into that buffer. And because I wanted to use a block cipher,
I wanted that buffer length to be divisible by 16.
Here is my code snippet:
template <typename type, size_t len>
struct DataPackage
{
unsigned char buffer[len * sizeof(type) + (16 - ((len * sizeof(type)) % 16))]; //create buffer to fit all types and fill to 16 byte blocks
__forceinline consteval DataPackage(type data[len])
{
for (int i = 0; i < sizeof(buffer); i++)
buffer[i] = i < len * sizeof(type) ? *(reinterpret_cast<unsigned char*>(data) + i) : '\0';
//encrypt buffer here
}
};
It gives me the error:
conversion from 'type*' to 'unsigned char*' is invalid in constant expression evaluation.
I use visual c++ compiler with c++ 20.
Is there a way to cast a pointer to a pointer to a different type in a consteval function?
Is there maybe a dirty compile hack workaround?
What exactly is preventing me from this, could this be fixed in future versions of c++?
It is impossible to reinterpret data as a different type in a constant expression, but you can cast the object representation to a different type if that is your intention:
auto obj_repr = std::bit_cast<std::array<unsigned char, sizeof(type)>>(data[i]);
Now obj_repr is an array of unsigned char containing the object representation of the ith element of the array. You can operate on that copy.
This requires, of course, that type is trivially copyable. Otherwise your reinterpret_cast approach would have undefined behavior as well anyway. Furthermore, to make this work as a constant expression, type may not be or contain as a subobject a pointer type, a union type, a pointer-to-member type, a volatile-qualified type or a reference type.
(There is some disagreement on whether theoretically this is always guaranteed to work because std::array may not be required to be free of padding, but it will work in practice. See e.g. std::bit_cast with std::array)
A consteval function can only be evaluated at compile-time. Therefore I highly doubt that __forceinline has any meaning on it, although of course that is purely implementation-defined and I don't know how MSVC handles it in this situation.
Usually consteval is not required over constexpr. I don't see anything in the code you are showing that needs to be prevented from being evaluated at runtime.

When and how is conversion to char pointer allowed?

We can look at the representation of an object of type T by converting a T* that points at that object into a char*. At least in practice:
int x = 511;
unsigned char* cp = (unsigned char*)&x;
std::cout << std::hex << std::setfill('0');
for (int i = 0; i < sizeof(int); i++) {
std::cout << std::setw(2) << (int)cp[i] << ' ';
}
This outputs the representation of 511 on my system: ff 01 00 00.
There is (surely) some implementation defined behaviour occurring here. Which of the casts is allowing me to convert an int* to an unsigned char* and which conversions does that cast entail? Am I invoking undefined behaviour as soon as I cast? Can I cast any T* type like this? What can I rely on when doing this?
Which of the casts is allowing me to convert an int* to an unsigned char*?
That C-style cast in this case is the same as reinterpret_cast<unsigned char*>.
Can I cast any T* type like this?
Yes and no. The yes part: You can safely cast any pointer type to a char* or unsigned char* (with the appropriate const and/or volatile qualifiers). The result is implementation-defined, but it is legal.
The no part: The standard explicitly allows char* and unsigned char* as the target type. However, you cannot (for example) safely cast a double* to an int*. Do this and you've crossed the boundary from implementation-defined behavior to undefined behavior. It violates the strict aliasing rule.
Your cast maps to:
unsigned char* cp = reinterpret_cast<unsigned char*>(&x);
The underlying representation of an int is implementation defined, and viewing it as characters allows you to examine that. In your case, it is 32-bit little endian.
There is nothing special here -- this method of examining the internal representation is valid for any data type.
C++03 5.2.10.7: A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that converting an rvalue of type "pointer to T1" to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
This suggests that the cast results in unspecified behavior. But pragmatically speaking, casting from any pointer type to char* will always allow you to examine (and modify) the internal representation of the referenced object.
The C-style cast in this case is equivalent to reinterpret_cast. The Standard describes the semantics in 5.2.10. Specifically, in paragraph 7:
"A pointer to an object can be explicitly converted to a pointer to a
different object type. When a prvalue v of type “pointer to T1” is
converted to the type “pointer to cv T2”, the result is
static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are
standard-layout types (3.9) and the alignment requirements of T2 are
no stricter than those of T1. Converting a prvalue of type “pointer to
T1” to the type “pointer to T2” (where T1 and T2 are object types and
where the alignment requirements of T2 are no stricter than those of
T1) and back to its original type yields the original pointer value.
The result of any other such pointer conversion is unspecified."
In your case this means that the alignment requirements are satisfied, and the result is unspecified.
The implementation-defined behaviour in your example is the endianness of your system; in this case your CPU is little endian.
About the type casting: when you cast an int* to char*, all you are doing is telling the compiler to interpret what cp points to as a char, so it will read the first byte only and interpret it as a character.
Casts between pointers are themselves always possible, since all pointers are nothing more than memory addresses, and any type, in memory, can always be thought of as a sequence of bytes.
But, of course, the way the sequence is formed depends on how the decomposed type is represented in memory, and that's outside the scope of the C++ specification.
That said, except in very pathological cases, you can expect that representation to be the same in all code produced by the same compiler for all machines of the same platform (or family), and you should not expect the same results on different platforms.
In general, one thing to avoid is treating the relation between type sizes as "predefined":
in your sample you assume sizeof(int) == 4*sizeof(char), which is not necessarily always true.
But it is always true that sizeof(T) == N*sizeof(char), hence any T can always be seen as an integer number of chars.
Unless you have a cast operator, a cast is simply telling the compiler to "see" that memory area in a different way. Nothing really fancy, I would say.
Then, you are reading the memory area byte by byte; as long as you do not change it, it is just fine. Of course, what you see depends a lot on the platform: think about endianness, word size, padding, and so on.
Just reverse the byte order then it becomes
00 00 01 ff
Which is 256 (01) + 255 (ff) = 511
This is because your platform is little endian.
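A small sketch of how a program could check this at run time, using the same value 511 (0x000001ff). The function name is_little_endian is made up for illustration:

```cpp
#include <cstdint>
#include <cstring>

// Sketch: reports whether the lowest-addressed byte of a multi-byte integer
// holds its least significant bits (little endian).
bool is_little_endian()
{
    const std::uint32_t probe = 0x000001ffu;  // 511, as in the question
    unsigned char first;
    std::memcpy(&first, &probe, 1);           // read the byte at the lowest address
    return first == 0xff;
}
```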

How can I assign a float variable to an unsigned int variable, bit image, not cast

I know this is a bizarre thing to do, and it's not portable. But I have an allocated array of unsigned ints, and I occasionally want to "store" a float in it. I don't want to cast the float or convert it to the closest equivalent int; I want to store the exact bit image of the float in the allocated space of the unsigned int, such that I could later retrieve it as a float and it would retain its original float value.
This can be achieved through a simple copy:
uint32_t dst;
float src = get_float();
char * const p = reinterpret_cast<char*>(&src); // source bytes come from the float
std::copy(p, p + sizeof(float), reinterpret_cast<char *>(&dst));
// now read dst
Copying backwards works similarly.
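The same byte copy in both directions can be sketched with std::memcpy. The function names are illustrative, and the sketch assumes float is exactly 32 bits wide, which the static_assert checks:

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits wide");

// Stores the exact bit image of `f` in an unsigned 32-bit integer.
std::uint32_t float_to_bits(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Recovers the original float from the stored bit image.
float bits_to_float(std::uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because only bytes are copied, no float object is ever accessed through an integer lvalue, so the aliasing rules are not involved.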
Just do a reinterpret cast of the respective memory location:
float f = 0.5f;
unsigned int i = *reinterpret_cast<unsigned int*>(&f);
or the more C-like version:
unsigned int i = *(unsigned int*)&f;
From your question text I assume you are aware that this breaks if float and unsigned int don't have the same size, but on most usual platforms both should be 32-bit.
EDIT: As Kerrek pointed out, this seems to be undefined behaviour. But I still stand by my answer, as it is short and precise and should indeed work on any practical compiler (convince me of the opposite). But look at Kerrek's answer if you want a UB-free answer.
You can use reinterpret_cast if you really have to. You don't even need to play with pointers/addresses as other answers mention. For example
int i;
reinterpret_cast<float&>(i) = 10;
std::cout << std::endl << i << " " << reinterpret_cast<float&>(i) << std::endl;
also works (and prints 1092616192 10 if you are curious ;).
EDIT:
From C++ standard (about reinterpret_cast):
5.2.10.7 A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that converting an
rvalue of type “pointer to T1” to the type “pointer to T2” (where T1
and T2 are object types and where the alignment requirements of T2 are
no stricter than those of T1) and back to its original type yields the
original pointer value, the result of such a pointer conversion is
unspecified.
5.2.10.10 An lvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be
explicitly converted to the type “pointer to T2” using a
reinterpret_cast. That is, a reference cast reinterpret_cast<T&>(x)
has the same effect as the conversion
*reinterpret_cast<T*>(&x) with the built-in & and * operators. The result is an lvalue that refers to the same object as the source
lvalue, but with a different type. No temporary is created, no copy is
made, and constructors (12.1) or conversion functions (12.3) are not
called.
So it seems that consistently reinterpreting pointers is not undefined behavior, and using references has the same result as taking the address, reinterpreting it and dereferencing the obtained pointer. I still claim that this is not undefined behavior.

reinterpret casting to and from unsigned char* and char*

I'm wondering if it is necessary to reinterpret_cast in the function below. ITER_T might be a char*, unsigned char*, std::vector<unsigned char> iterator, or something else like that. It doesn't seem to hurt so far, but does the casting ever affect how the bytes are copied at all?
template<class ITER_T>
char *copy_binary(
unsigned char length,
const ITER_T& begin)
{
// alloc_storage() returns a char*
unsigned char* stg = reinterpret_cast<unsigned char*>(alloc_storage(length));
std::copy(begin, begin + length, stg);
return reinterpret_cast<char*>(stg);
}
reinterpret_casts are used for low-level implementation defined casts. According to the standard, reinterpret_casts can be used for the following conversions (C++03 5.2.10):
Pointer to an integral type
Integral type to Pointer
A pointer to a function can be converted to a pointer to a function of a different type
A pointer to an object can be converted to a pointer to an object of different type
Pointer to member functions or pointer to data members can be converted to functions or objects of a different type. The result of such a pointer conversion is unspecified, except when the pointer is converted back to its original type.
An expression of type A can be converted to a reference to type B if a pointer to type A can be explicitly converted to a pointer to type B using a reinterpret_cast.
That said, using reinterpret_cast is not a good solution in your case, since the result of casting to a different type is unspecified by the standard, though casting from char * to unsigned char * and back should work on most machines.
In your case I would think about using a static_cast or not casting at all by defining stg as type char *:
template<class ITER_T>
char *copy_binary(
unsigned char length,
const ITER_T& begin)
{
// alloc_storage() returns a char*
char* stg = alloc_storage(length);
std::copy(begin, begin + length, stg);
return stg;
}
The code as written is working as intended according to standard 4.7 (2), although this is guaranteed only for machines with two's complement representation.
If alloc_storage returns a char*, and 'char' is signed, then if I understand 4.7 (3) correctly the result would be implementation defined if the iterator's value type is unsigned and you'd drop the cast and pass the char* to copy.
The short answer is yes, it could affect how the bytes are copied.
char and unsigned char are convertible types (3.9.1 in C++ Standard 0x n2800) so you can assign one to the other. You don't need the cast at all.
[3.9.1] ... A char, a signed char, and an unsigned
char occupy the same amount of storage
and have the same alignment
requirements; that is, they have the
same object representation.
[4.7] ...
2 If the destination type is
unsigned, the resulting value is the
least unsigned integer congruent to
the source integer (modulo 2^n where n
is the number of bits used to
represent the unsigned type).
[ Note:
In a two’s complement representation,
this conversion is conceptual and
there is no change in the bit pattern
(if there is no truncation). —end note
]
3 If the destination type is signed,
the value is unchanged if it can be
represented in the destination type
(and bit-field width); otherwise, the
value is implementation-defined.
Therefore, even in the worst case, you will get the better (less implementation-dependent) conversion. Anyway, in most implementations this will not change anything in the bit pattern, and you will not even see a conversion if you look at the generated assembly.
template<class ITER_T>
char *copy_binary( unsigned char length, const ITER_T& begin)
{
char* stg = alloc_storage(length);
std::copy(begin, begin + length, stg);
return stg;
}
Using reinterpret_cast you depend on the compiler:
[5.2.10.3] The mapping performed by
reinterpret_cast is
implementation-defined. [ Note: it
might, or might not, produce a
representation different from the
original value. —end note ]
Note: This is an interesting related post.
So if I get it right, the cast to unsigned char is there to guarantee an unsigned byte-by-byte copy. But then you cast it back for the return. The function looks a bit dodgy; what exactly is the context or reason for setting it up this way? A quick fix might be to replace all this with a memcpy() (but as commented, do not use that on iterator objects). Otherwise, just remove the redundant casts.
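A cast-free variant could look like the sketch below. Here std::vector<char> stands in for the unknown alloc_storage(), which is an assumption made only to keep the example self-contained; each element is converted to char by value, and no pointer is reinterpreted:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch: copies `length` elements starting at `begin` into char storage.
// The per-element static_cast makes the value conversion explicit; on
// two's complement machines it leaves the bit pattern unchanged.
template <class Iter>
std::vector<char> copy_binary(std::size_t length, Iter begin)
{
    std::vector<char> storage(length);
    std::transform(begin, begin + length, storage.begin(),
                   [](auto byte) { return static_cast<char>(byte); });
    return storage;
}
```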

How do C/C++ compilers handle type casting between types with different value ranges?

How does type casting happen without loss of data inside the compiler?
For example:
int i = 10;
UINT k = (UINT) i;
float fl = 10.123;
UINT ufl = (UINT) fl; // data loss here?
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p;
How does the compiler handle this type of typecasting? A low-level example showing the bits would be highly appreciated.
Well, first note that a cast is an explicit request to convert a value of one type to a value of another type. A cast will also always produce a new object, which is a temporary returned by the cast operator. Casting to a reference type, however, will not create a new object. The object referenced by the value is reinterpreted as a reference of a different type.
Now to your question. Note that there are two major types of conversions:
Promotions: This type can be thought of as casting from a possibly narrower type to a wider type. Casting from char to int, short to int, float to double are all promotions.
Conversions: These allow casting from long to int, int to unsigned int and so forth. They can in principle cause loss of information. There are rules for what happens if you assign a -1 to an unsigned typed object for example. In some cases, a wrong conversion can result in undefined behavior. If you assign a double larger than what a float can store to a float, the behavior is not defined.
Let's look at your casts:
int i = 10;
unsigned int k = (unsigned int) i; // :1
float fl = 10.123;
unsigned int ufl = (unsigned int) fl; // :2
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p; // :3
1. This cast causes a conversion to happen. No loss of data happens, since 10 is guaranteed to be representable in an unsigned int. If the integer were negative, the value would basically wrap around the maximal value of an unsigned int (see 4.7/2).
2. The value 10.123 is truncated to 10. Here, it does cause loss of information, obviously. As 10 fits into an unsigned int, the behavior is defined.
3. This actually requires more attention. First, there is a deprecated conversion from a string literal to char*. But let's ignore that here. (see here). More importantly, what does happen if you cast to an unsigned type? Actually, the result of that is unspecified per 5.2.10/7 (note the semantics of that cast is the same as using reinterpret_cast in this case, since that is the only C++ cast being able to do that):
A pointer to an object can be explicitly converted to a pointer to
an object of different type. Except that converting an rvalue of type “pointer to T1” to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
So you are only safe to use the pointer after you cast back to char * again.
The two C-style casts in your example are different kinds of cast. In C++, you'd normally write them
unsigned int ufl = static_cast<unsigned int>(fl);
and
unsigned char* up = reinterpret_cast<unsigned char*>(p);
The first performs an arithmetic cast, which truncates the floating point number, so there is data loss.
The second makes no changes to data - it just instructs the compiler to treat the pointer as a different type. Care needs to be taken with this kind of cast: it can be very dangerous.
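The arithmetic conversions can be tried out with a small sketch like the following. The function names are illustrative. Note that converting a floating-point value to unsigned int is only defined when the truncated value fits in the destination type:

```cpp
// Signed to unsigned: the result is the value modulo 2^n, so -1 becomes
// the largest unsigned int.
unsigned int from_int(int value)
{
    return static_cast<unsigned int>(value);
}

// Floating point to unsigned: the fractional part is discarded.
// Undefined behaviour if the truncated value does not fit.
unsigned int from_float(float value)
{
    return static_cast<unsigned int>(value);
}
```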
"Type" in C and C++ is a property assigned to variables when they're handled in the compiler. The property doesn't exist at runtime anymore, except for virtual functions/RTTI in C++.
The compiler uses the type of variables to determine a lot of things. For instance, in the assignment of a float to an int, it will know that it needs to convert. Both types are probably 32 bits, but with different meanings. It's likely that the CPU has an instruction, but otherwise the compiler would know to call a conversion function. I.e.
& __stack[4] = float_to_int_bits(& __stack[0])
The conversion from char* to unsigned char* is even simpler. That is just a different label. At bit level, p and up are identical. The compiler just needs to remember that *p requires sign-extension while *up does not.
Casts mean different things depending on what they are. They can just be renamings of a data type, with no change in the bits represented (most casts between integral types and pointers are like this), or conversions that don't even preserve length (such as between double and int on most compilers). In many cases, the meaning of a cast is simply unspecified, meaning the compiler has to do something reasonable but doesn't have to document exactly what.
A cast doesn't even need to result in a usable value. Something like
char * cp;
float * fp;
cp = malloc(100);
fp = (float *)(cp + 1);
will almost certainly result in a misaligned pointer to float, which will crash the program on some systems if the program attempts to use it.
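When a float really does live at an arbitrary offset in a byte buffer, one way to read it without forming a misaligned float* is to copy its bytes out first. A minimal sketch, with load_float as a made-up name:

```cpp
#include <cstring>

// Sketch: reads a float stored at an arbitrary, possibly misaligned, address.
// memcpy has no alignment requirement, unlike dereferencing a float*.
float load_float(const void* address)
{
    float value;
    std::memcpy(&value, address, sizeof value);
    return value;
}
```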