Lately I've been doing a lot of exercises with file streams. When I use fstream.write(...)
to e.g. write an array of 10 integers (intArr[10]) I write:
fstream.write((char*)intArr,sizeof(int)*10);
Is the (char*)intArr cast safe? I haven't had any problems with it so far, but I learned about static_cast (the C++ way, right?), tried static_cast<char*>(intArr), and it failed to compile, which I cannot understand. Should I change my methodology?
A static cast simply isn't the right thing. You can only perform a static cast when the types in question are naturally convertible. However, unrelated pointer types are not implicitly convertible; i.e. T* is not convertible to or from U* in general. What you are really doing is a reinterpreting cast:
int intArr[10];
myfile.write(reinterpret_cast<const char *>(intArr), sizeof(int) * 10);
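In C++, the C-style cast (char *) tries const_cast, then static_cast, then reinterpret_cast (each optionally combined with const_cast) and uses the first one that works; for int* to char* only the reinterpreting cast, the most permissive of them, applies. The benefit of using the explicit C++-style casts is that you demonstrate that you understand the sort of conversion that you want. (Also, C has no separate spelling for const_cast; a C-style cast silently removes const along with whatever other conversion it performs.)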
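A minimal sketch of the reinterpret_cast round trip, assuming a std::stringstream opened in binary mode as a stand-in for the file stream so it runs without touching the disk (the function name roundTripInts is hypothetical):

```cpp
#include <array>
#include <sstream>

// Writes ten ints to a stream as raw bytes and reads them back.
std::array<int, 10> roundTripInts(const std::array<int, 10>& in)
{
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);

    // ostream::write takes a const char*, hence the reinterpret_cast.
    ss.write(reinterpret_cast<const char*>(in.data()), sizeof(int) * in.size());

    std::array<int, 10> out{};
    // istream::read takes a char*; filling the ints' bytes through a char
    // pointer is allowed because any object may be accessed as chars.
    ss.read(reinterpret_cast<char*>(out.data()), sizeof(int) * out.size());
    return out;
}
```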
Maybe it's instructive to note the differences:
float q = 1.5;
uint32_t n = static_cast<uint32_t>(q); // == 1, value conversion
uint32_t m1 = reinterpret_cast<uint32_t>(q); // ill-formed: reinterpret_cast cannot convert a float to an integer, so this does not compile
uint32_t m2 = *reinterpret_cast<const uint32_t *>(&q); // compiles, but undefined behaviour (strict aliasing violation)
Off-topic: The correct way of writing the last line is a bit more involved, but uses copious amounts of casting:
uint32_t m;
char * const pm = reinterpret_cast<char *>(&m);
const char * const pq = reinterpret_cast<const char *>(&q);
std::copy(pq, pq + sizeof(float), pm);
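For comparison, a sketch of the same byte copy spelled with std::memcpy, the more common idiom. It assumes a 32-bit IEEE 754 float (checked by the static_asserts), and the function name floatBits is hypothetical. In C++20, std::bit_cast<uint32_t>(q) from <bit> expresses the same thing in a single expression.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE 754");

// Copies the object representation of a float into a uint32_t.
std::uint32_t floatBits(float q)
{
    std::uint32_t m;
    std::memcpy(&m, &q, sizeof m); // byte copy: no aliasing violation
    return m;
}
```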
I'm using a char* array to store different data types, as in the following example:
int main()
{
char* arr = new char[8];
*reinterpret_cast<uint32_t*>(&arr[1]) = 1u;
return 0;
}
Compiling and running with clang's UndefinedBehaviorSanitizer reports the following error:
runtime error: store to misaligned address 0x602000000011 for type 'uint32_t' (aka 'unsigned int'), which requires 4 byte alignment
I suppose I could do it another way, but why is this undefined behavior? What concepts are involved here?
You cannot cast an arbitrary char* to uint32_t*, even if it points to an array large enough to hold a uint32_t.
There are a couple of reasons why.
The practical answer:
uint32_t generally likes 4-byte alignment: its address should be a multiple of 4.
char does not have such a restriction. It can live at any address.
That means that an arbitrary char* is unlikely to be aligned properly for a uint32_t.
The Language Lawyer answer:
Aside from the alignment issue, your code exhibits undefined behavior because you're violating the strict aliasing rules. No uint32_t object exists at the address you're writing to, but you're treating it as if there is one there.
In general, while a char* may be used to point to any object and read its byte representation, a T* for an arbitrary type T cannot be used to point at an array of bytes and write the byte representation of a T into it.
No matter the reason for the error, the way to fix it is the same:
If you don't care about treating the bytes as a uint32_t and are just serializing them (to send over a network, or write to disk, for example), then you can std::copy the bytes into the buffer:
char buffer[BUFFER_SIZE] = {};
char* buffer_pointer = buffer;
uint32_t foo = 123;
char* pfoo = reinterpret_cast<char*>(&foo);
std::copy(pfoo, pfoo + sizeof(foo), buffer_pointer);
buffer_pointer += sizeof(foo);
uint32_t bar = 234;
char* pbar = reinterpret_cast<char*>(&bar);
std::copy(pbar, pbar + sizeof(bar), buffer_pointer);
buffer_pointer += sizeof(bar);
// repeat as needed
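Applied to the question's store at &arr[1], the byte-copy approach might look like the following sketch (storeU32 and loadU32 are hypothetical helper names). std::memcpy has no alignment requirement, so the misaligned offset is harmless, and the value is read back into a properly aligned local uint32_t:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stores a uint32_t at an arbitrary (possibly misaligned) byte offset.
void storeU32(char* buf, std::size_t offset, std::uint32_t value)
{
    std::memcpy(buf + offset, &value, sizeof value);
}

// Reads a uint32_t back from an arbitrary byte offset.
std::uint32_t loadU32(const char* buf, std::size_t offset)
{
    std::uint32_t value;
    std::memcpy(&value, buf + offset, sizeof value);
    return value;
}
```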
If you do want to treat those bytes as a uint32_t (if you're implementing a std::vector-like data structure, for example), then you will need to ensure the buffer is properly aligned and use placement new:
std::aligned_storage_t<sizeof(uint32_t), alignof(uint32_t)> buffer[BUFFER_SIZE];
uint32_t foo = 123;
uint32_t* new_uint = new (&buffer[0]) uint32_t(foo);
uint32_t bar = 234;
uint32_t* another_new_uint = new (&buffer[1]) uint32_t(bar);
// repeat as needed
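Since std::aligned_storage_t is deprecated as of C++23, a similar buffer can be declared with alignas on a plain unsigned char array. A minimal sketch, assuming a fixed capacity of four slots (the class U32Slots is hypothetical); the pointer returned by placement new is the one to use afterwards:

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// A tiny fixed-capacity store of uint32_t built on raw, suitably aligned bytes.
class U32Slots {
public:
    static constexpr std::size_t kCapacity = 4;

    // Begins the lifetime of a uint32_t in slot i (i < kCapacity) and
    // returns the pointer that placement new gives back.
    std::uint32_t* emplace(std::size_t i, std::uint32_t value)
    {
        return new (storage_ + i * sizeof(std::uint32_t)) std::uint32_t(value);
    }

private:
    // Base aligned for uint32_t; each slot offset is a multiple of its size,
    // so every slot is aligned too.
    alignas(std::uint32_t) unsigned char storage_[kCapacity * sizeof(std::uint32_t)];
};
```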
This has been bugging me for a very long time: how to do pointer conversion from anything to char * to dump binary to disk.
In C, you don't even think about it.
double d = 3.14;
char *cp = (char *)&d;
// do whatever you would do to dump it to disk
However, in C++, where everyone says C-style casts are frowned upon, I've been doing this:
double d = 3.14;
auto cp = reinterpret_cast<char *>(&d);
Now, this is copied from cppreference,
so I assume this is the proper way.
However, I've read in multiple sources that this is UB
(e.g. this one).
So I can't help wondering whether there is any "DB" (defined behavior) way at all (according to that post, there's none).
Another scenario I often encounter is to implement an API like this:
void serialize(void *buffer);
where you would dump a lot of things to this buffer. Now, I've been doing this:
void serialize(void *buffer) {
int intToDump;
float floatToDump;
int *ip = reinterpret_cast<int *>(buffer);
ip[0] = intToDump;
float *fp = reinterpret_cast<float *>(&ip[1]);
fp[0] = floatToDump;
}
Well, I guess this is UB as well.
Now, is there truly no "DB" way to accomplish either of these tasks?
I've seen someone use uintptr_t to do something similar to the serialize task, doing integer math on pointer values along with sizeof,
but I'm guessing that's UB as well.
Even though these things are UB, compiler writers usually do the sensible thing to make sure everything works.
And I'm okay with that: it's not an unreasonable thing to ask for.
So my questions really are, for the two common tasks mentioned above:
Is there truly no "DB" way to accomplish them that will satisfy the ultimate C++ freaks?
Any better way to accomplish them other than what I've been doing?
Thanks!
Your serialize implementation's behavior is undefined because you violate the strict aliasing rules. The strict aliasing rules say, in short, that you cannot reference any object via a pointer or reference to a different type. There is one major exception to that rule though: any object may be referenced via a pointer to char, unsigned char, or (since C++17) std::byte. Note that this exception does not apply the other way around; a char array may not be accessed via a pointer to a type other than char.
That means that you can make your serialize function well-defined by changing it as follows:
#include <cstring>
void serialize(char* buffer) {
int intToDump = 42;
float floatToDump = 3.14f;
std::memcpy(buffer, &intToDump, sizeof(intToDump));
std::memcpy(buffer + sizeof(intToDump), &floatToDump, sizeof(floatToDump));
// Or you could do byte-by-byte manual copy loops
// i.e.
//for (std::size_t i = 0; i < sizeof(intToDump); ++i, ++buffer) {
// *buffer = reinterpret_cast<char*>(&intToDump)[i];
//}
//for (std::size_t i = 0; i < sizeof(floatToDump); ++i, ++buffer) {
// *buffer = reinterpret_cast<char*>(&floatToDump)[i];
//}
}
Here, rather than casting buffer to a pointer to an incompatible type, std::memcpy copies the bytes of each object, which is permitted because any object may be accessed as an array of unsigned char. In doing so, the strict aliasing rules are not violated, and the program's behavior remains well-defined. Note that the exact byte layout is still platform-dependent: it depends on your CPU's endianness (and on the sizes and formats of int and float).
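Reading the data back works the same way in reverse. A minimal sketch, assuming the buffer holds an int immediately followed by a float, as the memcpy version writes it; here both functions take their values as parameters, and deserialize is a hypothetical counterpart, not part of the asker's API:

```cpp
#include <cstring>

// Writes an int followed directly by a float, with no padding in between.
void serialize(char* buffer, int i, float f)
{
    std::memcpy(buffer, &i, sizeof i);
    std::memcpy(buffer + sizeof i, &f, sizeof f);
}

// Copies the same bytes back into properly typed, properly aligned objects.
void deserialize(const char* buffer, int& i, float& f)
{
    std::memcpy(&i, buffer, sizeof i);
    std::memcpy(&f, buffer + sizeof i, sizeof f);
}
```

Because the buffer is only ever touched as bytes, it needs no particular alignment, and the same layout is produced regardless of where the buffer lives.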
An API uses void* to store untyped pointer offsets. It's a bit hacky, but okay whatever.
To express my offset arithmetic, I tried doing something like this
int main ()
{
void * foo;
foo = static_cast <int *> (nullptr) + 100;
static_cast <int * &> (foo) += 100;
}
The last line fails to compile (gcc)
x.cpp:7:28: error: invalid static_cast from type ‘void*’ to type ‘int*&’
The fix is simple:
foo = static_cast <int *> (foo) + 100;
But why isn't the first one allowed?
Before you answer "because the standard says so", why does the standard say so? Is the first method somehow dangerous? Or is it just an oversight?
It's not allowed for the same reason that int i; static_cast<long &>(i) = 3L; isn't allowed.
Sure, on a lot of implementations (where int and long have the same size, representation and alignment), it could work. But the rules for which casts are valid are mostly the same for all implementations, and clearly this could never work on platforms where int and long have different sizes, meaning it'd be impossible to allow accessing one as the other on those platforms.
Historically, there have been implementations on which void * and int * have different representations.
Later, once the standard had stated that accessing a void * as if it were an int * is invalid, implementations also started optimising on the assumption that valid programs do not do that:
void *f (void **ppv, int **ppi) {
void *result = *ppv;
*ppi = nullptr;
return result;
}
The implementation is allowed to optimise this to
void *f (void **ppv, int **ppi) {
*ppi = nullptr;
return *ppv;
}
and such optimisations, when they reduce code size or increase efficiency, are commonplace nowadays. If f were allowed to be called as void *pv = &pv; f (&pv, &static_cast<int*&>(pv));, this optimisation would be invalid. Because such optimisations have proved useful, the rules are unlikely to change.
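If the compound-assignment convenience is what's wanted, a small helper can wrap the well-defined form, which reads the void* as a T* value, does the arithmetic, and assigns the result back. The helper name advanceAs is hypothetical, and the usage starts from a real array, because adding a nonzero offset to a null pointer is itself undefined:

```cpp
#include <cstddef>

// Advances p by n objects of type T. The void* is converted to a T* value
// (a permitted static_cast), never accessed as if it were a T* object.
template <typename T>
void advanceAs(void*& p, std::ptrdiff_t n)
{
    p = static_cast<T*>(p) + n;
}
```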
I am using the CUDA API / cuFFT API. In order to move data from host to GPU I am using the cudaMemcpy functions, as shown below. len is the number of elements in dataReal and dataImag.
void foo(const double* dataReal, const double* dataImag, size_t len)
{
cufftDoubleComplex* inputData;
size_t allocSizeInput = sizeof(cufftDoubleComplex)*len;
cudaError_t allocResult = cudaMalloc((void**)&inputData, allocSizeInput);
if (allocResult != cudaSuccess) return;
cudaError_t copyResult;
copyResult = cudaMemcpy2D(static_cast<void*>(inputData),
2 * sizeof (double),
static_cast<const void*>(dataReal),
sizeof(double),
sizeof(double),
len,
cudaMemcpyHostToDevice);
copyResult &= cudaMemcpy2D(static_cast<void*>(inputData) + sizeof(double),
2 * sizeof (double),
static_cast<const void*>(dataImag),
sizeof(double),
sizeof(double),
len,
cudaMemcpyHostToDevice);
//and so on.
}
I am aware that pointer arithmetic on void pointers is actually not possible. The second cudaMemcpy2D still works, though. I get a warning from the compiler, but it works correctly.
I tried using static_cast<char*>, but that doesn't work, as cufftDoubleComplex* cannot be static_cast to char*.
I am a bit confused why the second cudaMemcpy with the pointer arithmetic on void is working, as I understand it shouldn't. Is the compiler implicitly assuming that the datatype behind void* is one byte long?
Should I change something there? Use a reinterpret_cast< char* >(inputData) for example?
Also during the allocation I am using the old C-style (void**) cast. I do this because I am getting a "invalid static_cast from cufftDoubleComplex** to void**". Is there another way to do this correctly?
FYI: Link to cudaMemcpy2D Doc
Link to cudaMalloc Doc
You cannot do arithmetic operations on void* since arithmetic operations on pointer are based on the size of the pointed objects (and sizeof(void) does not really mean anything).
Your code probably compiles thanks to a compiler extension (GCC provides one) that treats arithmetic operations on void* as arithmetic operations on char*.
In your case, you probably do not need arithmetic operations, the following should work (and be more robust):
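copyResult &= cudaMemcpy2D(static_cast<void*>(&inputData->y),
sizeof (cufftDoubleComplex),
static_cast<const void*>(dataImag), sizeof(double),
sizeof(double), len, cudaMemcpyHostToDevice);
Since cufftDoubleComplex is simply a typedef for double2 (via cuDoubleComplex), which is defined as: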
struct __device_builtin__ __builtin_align__(16) double2
{
double x, y;
};
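To see why a destination pitch of sizeof(cufftDoubleComplex) interleaves the two arrays, here is a host-only sketch that mimics the row-by-row copy pattern of cudaMemcpy2D with plain std::memcpy. copy2D and Complex are hypothetical stand-ins so the example runs without a GPU; they are not part of the CUDA API:

```cpp
#include <cstddef>
#include <cstring>

// Copies `height` rows of `width` bytes, advancing the destination by
// `dpitch` bytes and the source by `spitch` bytes after each row.
void copy2D(void* dst, std::size_t dpitch, const void* src, std::size_t spitch,
            std::size_t width, std::size_t height)
{
    auto* d = static_cast<unsigned char*>(dst);
    auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row)
        std::memcpy(d + row * dpitch, s + row * spitch, width);
}

// Same member layout as double2: real part x, imaginary part y.
struct Complex { double x, y; };
```

Usage mirrors the answer: copy the real parts to &out->x and the imaginary parts to &out->y, both with a destination pitch of sizeof(Complex), so each double lands in the right member without any arithmetic on void*.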
I'm a beginner in C++, and I have problem with understanding some code.
I had an exercise: write a function that returns the size of int without using sizeof() or reinterpret_cast. Someone gave me a solution, but I do not understand how it works. Can you please help me understand it? This is the code:
int intSize() {
int intArray[10];
int * intPtr1;
int * intPtr2;
intPtr1 = &intArray[1];
intPtr2 = &intArray[2];
//Why cast int pointer to void pointer?
void* voidPtr1 = static_cast<void*>(intPtr1);
//why cast void pointer to char pointer?
char* charPtr1 = static_cast<char*>(voidPtr1);
void* voidPtr2 = static_cast<void*>(intPtr2);
char* charPtr2 = static_cast<char*>(voidPtr2);
//when I try to print 'charPtr1' there is nothing printed
//when try to print charPtr2 - charPtr1, there is correct value shown - 4, why?
return charPtr2 - charPtr1;
}
To summarize, what I don't understand is why we have to convert int* to void* and then to char* to do this task. And why do we get a result when we subtract charPtr1 from charPtr2, but nothing is shown when we try to print only charPtr1?
First of all, never do this in real-world code. You will blow off your leg, look like an idiot and all the cool kids will laugh at you.
That being said, here's how it works:
The basic idea is that the size of an int is equal to the offset between two elements in an int array in bytes. Ints in an array are tightly packed, so the beginning of the second int comes right after the end of the first one:
int* intPtr1 = &intArray[0];
int* intPtr2 = &intArray[1];
The problem here is that when subtracting two int pointers, you won't get the difference in bytes, but the difference in ints. So intPtr2 - intPtr1 is 1, because they are 1 int apart.
But we are in C++, so we can cast pointers to anything! So instead of using int pointers, we copy the pointer values into char pointers. A char is 1 byte by definition (sizeof(char) == 1 is guaranteed), so arithmetic on char pointers counts bytes.
char* charPtr1 = reinterpret_cast<char*>(intPtr1);
char* charPtr2 = reinterpret_cast<char*>(intPtr2);
The difference charPtr2 - charPtr1 is the size in bytes. The pointers still point to the same location as before (i.e. the start of the second and first int in the array), but the difference will now be calculated in sizes of char, not in sizes of int.
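A small sketch that shows both distances side by side, using reinterpret_cast for brevity (the exercise's detour through void* yields the same char pointers); the names Distances and adjacentDistances are hypothetical:

```cpp
#include <cstddef>

// Distance between two adjacent array elements, in elements and in bytes.
struct Distances {
    std::ptrdiff_t inInts;
    std::ptrdiff_t inBytes;
};

Distances adjacentDistances()
{
    int intArray[2] = {};
    int* intPtr1 = &intArray[0];
    int* intPtr2 = &intArray[1];

    // Same addresses, but now arithmetic is done in units of char (bytes).
    const char* charPtr1 = reinterpret_cast<const char*>(intPtr1);
    const char* charPtr2 = reinterpret_cast<const char*>(intPtr2);

    return {intPtr2 - intPtr1, charPtr2 - charPtr1};
}
```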
Since the exercise did not allow reinterpret_cast you will have to resort to another trick. You cannot static_cast from int* to char* directly. This is C++'s way of protecting you from doing something stupid. The trick is to cast to void* first. You can static_cast any pointer type to void* and from void* to any pointer type.
This is the important bit:
intPtr1 = &intArray[1];
intPtr2 = &intArray[2];
This creates two pointers to adjacent ints in the array. The distance between these two pointers is the size of an int, which is what you're trying to retrieve. However, pointer arithmetic works in units of the pointed-to type, so if you subtract these two pointers the compiler gives you the distance in ints, which will always be 1.
So what you're doing next is converting these to character pointers. A char is 1 byte by definition, so the difference between these two pointers as character pointers gives you an answer in bytes. That's why you're casting to character pointers and subtracting.
As for going via void*: this is to avoid having to use reinterpret_cast. You're not allowed to cast directly from an int* to a char* with static_cast<>, but going via void* removes this restriction, since the compiler no longer knows it started with an int*. You could also just use a C-style cast instead, (char*)(intPtr1).
"do not use sizeof() and reinterpret_cast"... nothing's said about std::numeric_limits, so you could do it like that :)
#include <climits>
#include <limits>
int intSize()
{
// digits counts the non-sign value bits, so add 1 for the sign bit and divide
// by CHAR_BIT (bits per byte). Assumes int has no padding bits.
return (std::numeric_limits<int>::digits + 1) / CHAR_BIT;
}
Pointer subtraction in C++ gives the number of elements between
the pointed to objects. In other words, intPtr2 - intPtr1
would return the number of int between these two pointers.
The program wants to know the number of bytes (char), so it
converts the int* to char*. Apparently, the author doesn't
want to use reinterpret_cast either. And static_cast will
not allow a direct conversion from int* to char*, so he
goes through void* (which is allowed).
Having said all that: judging from the name of the function and
how the pointers are actually initialized, a much simpler
implementation of this would be:
int
intSize()
{
return sizeof( int );
}
There is actually no need to convert to void*, other than avoiding reinterpret_cast.
Converting from a pointer-to-int to a pointer-to-char can be done in one step with a reinterpret_cast, or a C-style cast (which, by the standard, ends up doing a reinterpret_cast). You could do a C-style cast directly, but as that (by the standard) is a reinterpret_cast in that context, you'd violate the requirements. Very tricky!
However, you can convert from an int* to a char* through the void* intermediary using only static_cast. This is a small hole in the C++ type system -- you are doing a two-step reinterpret_cast without ever calling it -- because void* conversion is given special permission to be done via static_cast.
So all of the void* stuff is just to avoid the reinterpret_cast requirement, and would be silly to do in real code. Still, being aware that you can do it might help you understand what happened when someone did it accidentally (i.e., your int* appears to be pointing at a string: how did that happen? Someone must have gone through a hole in the type system, either with a C-style cast (and hence a reinterpret_cast) or by round-tripping through void* via static_cast).
If we ignore that gymnastics, we now have an array of int. We take pointers to adjacent elements. In C++, arrays are packed, with the difference between adjacent elements equal to the sizeof the elements.
We then convert those pointers to pointers-to-char, because we know (by the standard) that sizeof(char)==1. We subtract these char pointers, as that tells us how many multiples-of-sizeof(char) there are between them (if we subtract int pointers, we get how many multiples-of-sizeof(int) there are between them), which ends up being the size of the int.
If we try to print charPtr1 through std::cout, std::cout assumes that our char* points to a \0-terminated buffer of char, following the C/C++ convention. The array was never initialized, so its bytes are indeterminate and reading them this way is undefined behavior; in your run the first byte presumably happened to be \0, so std::cout printed nothing. If we wanted to print the pointer value of the char*, we'd have to cast it to something like void* (e.g. via static_cast<void*>(p)).
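A minimal sketch of printing the address rather than the characters (addressOf is a hypothetical name); converting to const void* makes operator<< select the pointer overload instead of the C-string one:

```cpp
#include <sstream>
#include <string>

// Formats the address a char* holds, not the characters it points to.
std::string addressOf(const char* p)
{
    std::ostringstream os;
    os << static_cast<const void*>(p); // pointer overload: prints the address
    return os.str();
}
```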
Please read this: richly commented.
int intSize()
{
int intArray[2]; // Allocate two elements. We don't need any more than that.
/*intPtr1 and intPtr2 point to the addresses of the zeroth and first array elements*/
int* intPtr1 = &intArray[0]; // Arrays in C++ are zero based
int* intPtr2 = &intArray[1];
/*Note that intPtr2 - intPtr1 measures the distance in memory
between the array elements in units of int*/
/*What we want to do is measure that distance in units of char;
i.e. in bytes, since one char is one byte*/
/*The trick is to cast from int* to char*. In C++ you need to
do this via void* if you are not allowed to use reinterpret_cast*/
void* voidPtr1 = static_cast<void*>(intPtr1);
char* charPtr1 = static_cast<char*>(voidPtr1);
void* voidPtr2 = static_cast<void*>(intPtr2);
char* charPtr2 = static_cast<char*>(voidPtr2);
/*The distance in memory will now be measured in units of char;
that's how pointer arithmetic works*/
/*Since the original array is a contiguous memory block, the
distance will be the size of each element, i.e. sizeof(int) */
return charPtr2 - charPtr1;
}