I wanted to make a compiletime data encryptor.
I tried this by creating a struct with a buffer, and a constructor for that struct that
would read an array of structures in that buffer. And because I wanted to use blockcypher,
I wanted that buffer length to be devideable by 16.
Here is my codesnippit:
template <typename type, size_t len>
struct DataPackage
{
unsigned char buffer[len * sizeof(type) + (16 - ((len * sizeof(type)) % 16))]; //create buffer to fit all types and fill to 16 byte blocks
__forceinline consteval DataPackage(type data[len])
{
for (int i = 0; i < sizeof(buffer); i++)
buffer[i] = i < len * sizeof(type) ? *(reinterpret_cast<unsigned char*>(data) + i) : '\0';
//encrypt buffer here
}
};
It gives me the error:
conversation from 'type*' to 'unsigned char*' is invalid in constant expression evaluation.
I use visual c++ compiler with c++ 20.
Is there a way to cast a pointer to a pointer to a different type in a consteval function?
Is there maybe a dirty compile hack workaround?
What exactly is preventing me from this, could this be fixed in future versions of c++?
It is impossible to reinterpret data as a different type in a constant expression, but you can cast the object representation to a different type if that is your intention:
auto obj_repr = std::bit_cast<std::array<unsigned char, sizeof(type)>>(data[i]);
Now obj_repr is an array of unsigned char containing the object representation of the ith element of the array. You can operate on that copy.
This requires of course that T is trivially-copyable. Otherwise your reinterpret_cast approach would have undefined behavior as well anyway. Furthermore to make this work as a constant expression type may not be or contain as subobject a pointer type, a union type, a pointer-to-member type, a volatile-qualified type or a reference type.
(There is some disagreement on whether theoretically this is always guaranteed to work because std::array may not be required to be free of padding, but it will work in practice. See e.g. std::bit_cast with std::array)
A consteval function can only be evaluated at compile-time. Therefore I highly doubt that __forceinline has any meaning on it, although of course that is purely implementation-defined and I don't know how MSVC handles it in this situation.
Usually consteval is not required over constexpr. I don't see anything in the code you are showing that needs to be prevented from being evaluated at runtime.
Related
C++17 (expr.add/4) say:
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
expression P points to element x[i] of an array object x with n
elements, the expressions P + J and J + P (where J has the value j)
point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n;
otherwise, the behavior is undefined. Likewise, the expression P - J
points to the (possibly-hypothetical) element x[i−j] if 0≤i−j≤n;
otherwise, the behavior is undefined.
struct Foo {
float x, y, z;
};
Foo f;
char *p = reinterpret_cast<char*>(&f) + offsetof(Foo, z); // (*)
*reinterpret_cast<float*>(p) = 42.0f;
Has the line marked with (*) UB? reinterpret_cast<char*>(&f) doesn't point to a char array, but to a float, so it should UB according to the cited paragraph. But, if it is UB, then offsetof's usefulness would be limited.
Is it UB? If not, why not?
The addition is intended to be valid, but I do not believe the standard manages to say so clearly enough. Quoting N4140 (roughly C++14):
3.9 Types [basic.types]
2 For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array
of char or unsigned char.42 [...]
42) By using, for example, the library functions (17.6.1.2) std::memcpy or std::memmove.
It says "for example" because std::memcpy and std::memmove are not the only ways in which the underlying bytes are intended to be allowed to be copied. A simple for loop which copies byte by byte manually is supposed to be valid as well.
In order for that to work, addition has to be defined for pointers to the raw bytes that make up an object, and the way definedness of expressions works, the addition's definedness cannot depend on whether the addition's result will subsequently be used to copy the bytes into an array.
Whether that means those bytes form an array already or whether this is a special exception to the general rules for the + operator that is somehow omitted in the operator description, is not clear to me (I suspect the former), but either way would make the addition you're performing in your code valid.
Any interpretation that disallows the intended usage of offsetof must be wrong:
#include <assert.h>
#include <stddef.h>
struct S { float a, b, c; };
const size_t idx_S[] = {
offsetof(struct S, a),
offsetof(struct S, b),
offsetof(struct S, c),
};
float read_S(struct S *sp, unsigned int idx)
{
assert(idx < 3);
return *(float *)(((char *)sp) + idx_S[idx]); // intended to be valid
}
However, any interpretation that allows one to step past the end of an explicitly-declared array must also be wrong:
#include <assert.h>
#include <stddef.h>
struct S { float a[2]; float b[2]; };
static_assert(offsetof(struct S, b) == sizeof(float)*2,
"padding between S.a and S.b -- should be impossible");
float read_S(struct S *sp, unsigned int idx)
{
assert(idx < 4);
return sp->a[idx]; // undefined behavior if idx >= 2,
// reading past end of array
}
And we are now on the horns of a dilemma, because the wording in both the C and C++ standards, that was intended to disallow the second case, probably also disallows the first case.
This is commonly known as the "what is an object?" problem. People, including members of the C and C++ committees, have been arguing about this and related issues since the 1990s, and there have been multiple attempts to fix the wording, and to the best of my knowledge none has succeeded (in the sense that all existing "reasonable" code is rendered definitely conforming and all existing "reasonable" optimizations are still allowed).
(Note: All of the above code is written as it would be written in C to emphasize that the same problem exists in both languages, and can be encountered without the use of any C++ constructs.)
See CWG 1314
According to 6.9 [basic.types] paragraph 4,
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
and 4.5 [intro.object] paragraph 5,
An object of trivially copyable or standard-layout type (6.9 [basic.types]) shall occupy contiguous bytes of storage.
Do these passages make pointer arithmetic (8.7 [expr.add] paragraph 5) within a standard-layout object well-defined (e.g., for writing one's own version of memcpy?
Rationale (August, 2011):
The current wording is sufficiently clear that this usage is permitted.
I strongly disagree with CWG's statement that "the current wording is sufficiently clear", but nevertheless, that's the ruling we have.
I interpret CWG's response as suggesting that a pointer to unsigned char into an object of trivially copyable or standard-layout type, for the purposes of pointer arithmetic, ought to be interpreted as a pointer to an array of unsigned char whose size equals the size of the object in question. I don't know whether they intended that it would also work using a char pointer or (as of C++17) a std::byte pointer. (Maybe if they had decided to actually clarify it instead of claiming the existing wording was clear enough, then I would know the answer.)
(A separate issue is whether std::launder is required to make the OP's code well-defined. I won't go into this here; I think it deserves a separate question.)
As far as I know, your code is valid. Aliasing an object as a char array is explicitly allowed as per § 3.10 ¶ 10.8:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
[…]
a char or unsigned char type.
The other question is whether casting the char* pointer back to float* and assigning through it is valid. Since your Foo is a POD type, this is okay. You are allowed to compute the address of a POD's member (given that the computation itself is not UB) and then access the member through that address. You must not abuse this to, for example, gain access to a private member of a non-POD object. Furthermore, it would be UB if you'd, say, cast to int* or write at an address where no object of type float exists. The reasoning behind this can be found in the section quoted above.
Yes, this is undefined. As you have stated in your question,
reinterpret_cast<char*>(&f) doesn't point to a char array, but to a float, ...
... reinterpret_cast<char*>(&f) does even not point to a char, so even if the object representation is a char array, the behavior is still undefined.
For offsetof, you can still use it like
struct Foo {
float x, y, z;
};
Foo f;
auto p = reinterpret_cast<std::uintptr_t>(&f) + offsetof(Foo, z);
// ^^^^^^^^^^^^^^
*reinterpret_cast<float*>(p) = 42.0f;
I'm trying to cast some objects (the size is known) pointed by void* to a char array bitwisely in c++. I'm considering using union with a char array so that I don't need to worry too much about the casting. However, since the type of the object is unknown, I don't know how to define this union.
Just wondering if there is any other better way to deal with this?
PS: edited to avoid confusion. For instance, an integer could be cast to a 4-character array.
Thanks!
In the link I put in the comments, the accepted answer goes into great detail about type punning and why you can't do it in c++.
What you can do is safely inspect any object with a char* (signed or unsigned) by using reinterpret_cast.
char* ptr = reinterpret_cast<char*>(&object);
for (std::size_t x = 0; x < sizeof(object); ++x)
std::cout << ptr[x]; //Or something less slow but this is an example
If you want to actually move the object into a char[], you should use std::memcpy.
If you are not worried about a bit of extra memory, you can use memcpy.
int i = 10;
char carray[sizeof(i)];
memcpy(carray, &i, sizeof(i));
However, remember that carray won't be a null terminated string. It will be just an array of chars. It will be better to use unsigned char since the value in one of those bytes might be too large for char if char is a signed type on your platform.
int i = 10;
unsigned char carray[sizeof(i)];
memcpy(carray, &i, sizeof(i));
Why do you feel you need to worry about the casting?
Just reinterpret_cast the void pointer to a char* and iterate over each character up to the size of the original object. Keep in mind that the char* pointer is not a null-terminated string and may or may not contain null characters in the middle of the data, so do not process it like a C string.
From 5.2.10 Reinterpret cast:
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue
v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast(static_cast(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment
requirements of T2 are no stricter than those of T1, or if either type is void.
So you simply want to use:
char* my_bytes = reinterpret_cast<char*>(my_pointer);
size_t num_bytes = sizeof(my_pointer);
for(size_t i = 0; i < num_bytes; ++i) {
// *(my_bytes + i) has the most significant to least significant bytes
}
Is the following well-defined:
char* charPtr = new char[42];
int* intPtr = (int*)charPtr;
charPtr++;
intPtr = (int*) charPtr;
The intPtr isn't properly aligned (in at least one of the two cases). Is it illegal just having it there? Is it UB using it at any stage? How can you use it and how can't you?
In general, the result is unspecified (5.2.10p7) if the alignment requirements of int are greater than those of char, which they usually will be. The result will be a valid value of the type int * so it can be e.g. printed as a pointer with operator<< or converted to intptr_t.
Because the result has an unspecified value, unless specified by the implementation it is undefined behaviour to indirect it and perform lvalue-to-rvalue conversion on the resulting int lvalue (except in unevaluated contexts). Converting back to char * will not necessarily round-trip.
However, if the original char * was itself the result of a cast from int *, then the cast to int * counts as the second half of a round trip; in that case, the cast is defined.
In particular, in the case above where the char * was the result of a new[] expression, we are guaranteed (5.3.4p10) that the char * pointer is appropriately aligned for int, as long as sizeof(int) <= 42. Because the new[] expression obtains its storage from an allocation function, 3.7.4.1p2 applies; the void * pointer can be converted
to a pointer of any complete object type with a fundamental alignment requirement and then used to access the object [...] which strongly implies, along with the note to 5.3.4p10, that the same holds for the char * pointer returned by the new[] expression. In this case the int * is a pointer to an uninitialised int object, so performing lvalue-to-rvalue conversion on its indirection is undefined (3.8p6), but assigning to its indirection is fully defined. The int object is in the storage allocated (3.7.4.1p2) so converting the int * back to char * will yield the original value per 1.8p6. This does not hold for the incremented char * pointer as unless sizeof(int) == 1 it is not the address of an int object.
First, of course: the pointer is guaranteed to be aligned in the
first case (by §5.3.4/10 and §3.7.4.1/2), and may be correctly
aligned in both cases. (Obviously, if sizeof(int) == 1, but
even when this is not the case, an implementation doesn't
necessarily have alignment requirements.)
And to make things clear: your casts are all reinterpret_cast.
Beyond that, this is an interesting question, because as far as
I can tell, there is no difference in the two casts, as far as
the standard is concerned. The results of the conversion are
unspecified (according to §5.2.10/7); you're not even guaranteed
that converting it back into a char* will result in the
original value. (It obviously won't, for example, on machines
where int* is smaller than a char*.)
In practice, of course: the standard requires that the return
value of new char[N] be sufficiently aligned for any value
which may fit into it, so you are guaranteed to be able to do:
intPtr = new (charPtr) int;
Which has exactly the same effect as your cast, given that the
default constructor for int is a no-op. (And assuming that
sizeof(int) <= 42.) So it's hard to imagine an implementation
in which the first part fails. You should be able to use the
intPtr just like any other legally obtained intPtr. And the
idea that converting it back to a char* would somehow result
in a different value from the original char* seems
preposterous.
In the second part, all bets are off: you definitely can't
dereference the pointer (unless your implementation guarantees
otherwise), and it's also quite possible that converting it back
to char* results in something different. (Imagine a word
addressed machine, for example, where converting a char* to an
int* rounds up. Then converting back would result in
a char* which was sizeof(int) higher than the original. Or
where an attempt to convert a misaligned pointer always resulted
in a null pointer.)
I'm wondering if it is necessary to reinterpret_cast in the function below. ITER_T might be a char*, unsigned char*, std::vector<unsigned char> iterator, or something else like that. It doesn't seem to hurt so far, but does the casting ever affect how the bytes are copied at all?
template<class ITER_T>
char *copy_binary(
unsigned char length,
const ITER_T& begin)
{
// alloc_storage() returns a char*
unsigned char* stg = reinterpret_cast<unsigned char*>(alloc_storage(length));
std::copy(begin, begin + length, stg);
return reinterpret_cast<char*>(stg);
}
reinterpret_casts are used for low-level implementation defined casts. According to the standard, reinterpret_casts can be used for the following conversions (C++03 5.2.10):
Pointer to an integral type
Integral type to Pointer
A pointer to a function can be converted to a pointer to a function of a different type
A pointer to an object can be converted to a pointer to an object of different type
Pointer to member functions or pointer to data members can be converted to functions or objects of a different type. The result of such a pointer conversion is unspecified, except the pointer a converted back to its original type.
An expression of type A can be converted to a reference to type B if a pointer to type A can be explicitly converted to type B using a reinterpret_cast.
That said, using the reinterpret_cast is not a good solution in your case, since casting to different types are unspecified by the standard, though casting from char * to unsigned char * and back should work on most machines.
In your case I would think about using a static_cast or not casting at all by defining stg as type char *:
template<class ITER_T>
char *copy_binary(
unsigned char length,
const ITER_T& begin)
{
// alloc_storage() returns a char*
char* stg = alloc_storage(length);
std::copy(begin, begin + length, stg);
return stg;
}
The code as written is working as intended according to standard 4.7 (2), although this is guaranteed only for machines with two's complement representation.
If alloc_storage returns a char*, and 'char' is signed, then if I understand 4.7 (3) correctly the result would be implementation defined if the iterator's value type is unsigned and you'd drop the cast and pass the char* to copy.
The short answer is yes, it could affect.
char and unsigned char are convertible types (3.9.1 in C++ Standard 0x n2800) so you can assign one to the other. You don't need the cast at all.
[3.9.1] ... A char, a signed char, and an unsigned
char occupy the same amount of storage
and have the same alignment
requirements; that is, they have the
same object representation.
[4.7] ...
2 If the destination type is
unsigned, the resulting value is the
least unsigned integer congruent to
the source integer (modulo 2n where n
is the number of bits used to
represent the unsigned type).
[ Note:
In a two’s complement representation,
this conversion is conceptual and
there is no change in the bit pattern
(if there is no truncation). —end note
]
3 If the destination type is signed,
the value is unchanged if it can be
represented in the destination type
(and bit-field width); otherwise, the
value is implementation-defined.
Therefore even in the worst case you will get the best (less implementation-defined) conversion. Anyway in most implementations this will not change anything in the bit pattern, and you will not even have a conversion if look into the generated assembler.
template<class ITER_T>
char *copy_binary( unsigned char length, const ITER_T& begin)
{
char* stg = alloc_storage(length);
std::copy(begin, begin + length, stg);
return stg;
}
Using reinterpret_cast you depend on the compiler:
[5.2.10.3] The mapping performed by
reinterpret_cast is
implementation-defined. [ Note: it
might, or might not, produce a
representation different from the
original value. —end note ]
Note: This is an interesting related post.
So if I get it right, the cast to unsigned char is to gaurantee an unsigned byte-by-byte copy. But then you cast it back for the return. The function looks a bit dodgy what exactly is the context/reason for setting it up this way? A quick fix might be to replace all this with a memcpy() (but as commented, do not use that on iterator objects) -- otherwise just remove the redundant casts.
How do type casting happen without loss of data inside the compiler?
For example:
int i = 10;
UINT k = (UINT) k;
float fl = 10.123;
UINT ufl = (UINT) fl; // data loss here?
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p;
How does the compiler handle this type of typecasting? A low-level example showing the bits would be highly appreciated.
Well, first note that a cast is an explicit request to convert a value of one type to a value of another type. A cast will also always produce a new object, which is a temporary returned by the cast operator. Casting to a reference type, however, will not create a new object. The object referenced by the value is reinterpreted as a reference of a different type.
Now to your question. Note that there are two major types of conversions:
Promotions: This type can be thought of casting from a possibly more narrow type to a wider type. Casting from char to int, short to int, float to double are all promotions.
Conversions: These allow casting from long to int, int to unsigned int and so forth. They can in principle cause loss of information. There are rules for what happens if you assign a -1 to an unsigned typed object for example. In some cases, a wrong conversion can result in undefined behavior. If you assign a double larger than what a float can store to a float, the behavior is not defined.
Let's look at your casts:
int i = 10;
unsigned int k = (unsigned int) i; // :1
float fl = 10.123;
unsigned int ufl = (unsigned int) fl; // :2
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p; // :3
This cast causes a conversion to happen. No loss of data happens, since 10 is guaranteed to be stored by an unsigned int. If the integer were negative, the value would basically wrap around the maximal value of an unsigned int (see 4.7/2).
The value 10.123 is truncated to 10. Here, it does cause lost of information, obviously. As 10 fits into an unsigned int, the behavior is defined.
This actually requires more attention. First, there is a deprecated conversion from a string literal to char*. But let's ignore that here. (see here). More importantly, what does happen if you cast to an unsigned type? Actually, the result of that is unspecified per 5.2.10/7 (note the semantics of that cast is the same as using reinterpret_cast in this case, since that is the only C++ cast being able to do that):
A pointer to an object can be explicitly converted to a pointer to
an object of different type. Except that converting an rvalue of type “pointer to T1” to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
So you are only safe to use the pointer after you cast back to char * again.
The two C-style casts in your example are different kinds of cast. In C++, you'd normally write them
unsigned int uf1 = static_cast<unsigned int>(fl);
and
unsigned char* up = reinterpret_cast<unsigned char*>(p);
The first performs an arithmetic cast, which truncates the floating point number, so there is data loss.
The second makes no changes to data - it just instructs the compiler to treat the pointer as a different type. Care needs to be taken with this kind of cast: it can be very dangerous.
"Type" in C and C++ is a property assigned to variables when they're handled in the compiler. The property doesn't exist at runtime anymore, except for virtual functions/RTTI in C++.
The compiler uses the type of variables to determine a lot of things. For instance, in the assignment of a float to an int, it will know that it needs to convert. Both types are probably 32 bits, but with different meanings. It's likely that the CPU has an instruction, but otherwise the compiler would know to call a conversion function. I.e.
& __stack[4] = float_to_int_bits(& __stack[0])
The conversion from char* to unsigned char* is even simpeler. That is just a different label. At bit level, p and up are identical. The compiler just needs to remember that *p requires sign-extension while *up does not.
Casts mean different things depending on what they are. They can just be renamings of a data type, with no change in the bits represented (most casts between integral types and pointers are like this), or conversions that don't even preserve length (such as between double and int on most compilers). In many cases, the meaning of a cast is simply unspecified, meaning the compiler has to do something reasonable but doesn't have to document exactly what.
A cast doesn't even need to result in a usable value. Something like
char * cp;
float * fp;
cp = malloc(100);
fp = (float *)(cp + 1);
will almost certainly result in a misaligned pointer to float, which will crash the program on some systems if the program attempts to use it.