This has been bugging me for a very long time: how to do pointer conversion from anything to char * to dump binary to disk.
In C, you don't even think about it.
double d = 3.14;
char *cp = (char *)&d;
// do what u would do to dump to disk
However, in C++, where everyone is saying C-cast is frowned upon, I've been doing this:
double d = 3.14;
auto cp = reinterpret_cast<char *>(&d);
Now this is copied from cppreference,
so I assume this is the proper way.
However, I've read from multiple sources saying this is UB.
(e.g. this one)
So I can't help wonder if there is any "DB" way at all (According to that post, there's none).
Another scenario I often encounter is to implement an API like this:
void serialize(void *buffer);
where you would dump a lot of things to this buffer. Now, I've been doing this:
void serialize(void *buffer) {
int intToDump;
float floatToDump;
int *ip = reinterpret_cast<int *>(buffer);
ip[0] = intToDump;
float *fp = reinterpret_cast<float *>(&ip[1]);
fp[0] = floatToDump;
}
Well, I guess this is UB as well.
Now, is there truly no "DB" way to accomplish either of these tasks?
I've seen someone using uintptr_t to accomplish sth similar to serialize task with pointer as integer math along with sizeof,
but I'm guessing here that it's UB as well.
Even though they are UB, compiler writers usually do the rational things to make sure everything is okay.
And I'm okay with that: it's not an unreasonable thing to ask for.
So my questions really are, for the two common tasks mentioned above:
Is there truly no "DB" way to accomplish them that will satisfy the ultimate C++ freaks?
Any better way to accomplish them other than what I've been doing?
Thanks!
Your serialize implementation's behavior is undefined because you violate the strict aliasing rules. The strict aliasing rules say, in short, that you cannot reference any object via a pointer or reference to a different type. There is one major exception to that rule though: any object may be referenced via a pointer to char, unsigned char, or (since C++17) std::byte. Note that this exception does not apply the other way around; a char array may not be accessed via a pointer to a type other than char.
That means that you can make your serialize function well-defined by changing it as so:
void serialize(char* buffer) {
int intToDump = 42;
float floatToDump = 3.14;
std::memcpy(buffer, &intToDump, sizeof(intToDump));
std::memcpy(buffer + sizeof(intToDump), &floatToDump, sizeof(floatToDump));
// Or you could do byte-by-byte manual copy loops
// i.e.
//for (std::size_t i = 0; i < sizeof(intToDump); ++i, ++buffer) {
// *buffer = reinterpret_cast<char*>(&intToDump)[i];
//}
//for (std::size_t i = 0; i < sizeof(floatToDump); ++i, ++buffer) {
// *buffer = reinterpret_cast<char*>(&floatToDump)[i];
//}
}
Here, rather than casting buffer to a pointer to an incompatible type, std::memcpy casts a pointer to the object to serialize to a pointer to unsigned char. In doing so, the strict aliasing rules are not violated, and the program's behavior remains well-defined. Note that the exact representation is still unspecified; as it will depend on your CPU's endianess.
Related
I am learning C++ using C++ Primer 5th edition. In particular, i read about void*. There it is written that:
We cannot use a void* to operate on the object it addresses—we don’t know that object’s type, and the type determines what operations we can perform on that object.
void*: Pointer type that can point to any nonconst type. Such pointers may not
be dereferenced.
My question is that if we're not allowed to use a void* to operate on the object it addressess then why do we need a void*. Also, i am not sure if the above quoted statement from C++ Primer is technically correct because i am not able to understand what it is conveying. Maybe some examples can help me understand what the author meant when he said that "we cannot use a void* to operate on the object it addresses". So can someone please provide some example to clarify what the author meant and whether he is correct or incorrect in saying the above statement.
My question is that if we're not allowed to use a void* to operate on the object it addressess then why do we need a void*
It's indeed quite rare to need void* in C++. It's more common in C.
But where it's useful is type-erasure. For example, try to store an object of any type in a variable, determining the type at runtime. You'll find that hiding the type becomes essential to achieve that task.
What you may be missing is that it is possible to convert the void* back to the typed pointer afterwards (or in special cases, you can reinterpret as another pointer type), which allows you to operate on the object.
Maybe some examples can help me understand what the author meant when he said that "we cannot use a void* to operate on the object it addresses"
Example:
int i;
int* int_ptr = &i;
void* void_ptr = &i;
*int_ptr = 42; // OK
*void_ptr = 42; // ill-formed
As the example demonstrates, we cannot modify the pointed int object through the pointer to void.
so since a void* has no size(as written in the answer by PMF)
Their answer is misleading or you've misunderstood. The pointer has a size. But since there is no information about the type of the pointed object, the size of the pointed object is unknown. In a way, that's part of why it can point to an object of any size.
so how can a int* on the right hand side be implicitly converted to a void*
All pointers to objects can implicitly be converted to void* because the language rules say so.
Yes, the author is right.
A pointer of type void* cannot be dereferenced, because it has no size1. The compiler would not know how much data he needs to get from that address if you try to access it:
void* myData = std::malloc(1000); // Allocate some memory (note that the return type of malloc() is void*)
int value = *myData; // Error, can't dereference
int field = myData->myField; // Error, a void pointer obviously has no fields
The first example fails because the compiler doesn't know how much data to get. We need to tell it the size of the data to get:
int value = *(int*)myData; // Now fine, we have casted the pointer to int*
int value = *(char*)myData; // Fine too, but NOT the same as above!
or, to be more in the C++-world:
int value = *static_cast<int*>(myData);
int value = *static_cast<char*>(myData);
The two examples return a different result, because the first gets an integer (32 bit on most systems) from the target address, while the second only gets a single byte and then moves that to a larger variable.
The reason why the use of void* is sometimes still useful is when the type of data doesn't matter much, like when just copying stuff around. Methods such as memset or memcpy take void* parameters, since they don't care about the actual structure of the data (but they need to be given the size explicitly). When working in C++ (as opposed to C) you'll not use these very often, though.
1 "No size" applies to the size of the destination object, not the size of the variable containing the pointer. sizeof(void*) is perfectly valid and returns, the size of a pointer variable. This is always equal to any other pointer size, so sizeof(void*)==sizeof(int*)==sizeof(MyClass*) is always true (for 99% of today's compilers at least). The type of the pointer however defines the size of the element it points to. And that is required for the compiler so he knows how much data he needs to get, or, when used with + or -, how much to add or subtract to get the address of the next or previous elements.
void * is basically a catch-all type. Any pointer type can be implicitly cast to void * without getting any errors. As such, it is mostly used in low level data manipulations, where all that matters is the data that some memory block contains, rather than what the data represents. On the flip side, when you have a void * pointer, it is impossible to determine directly which type it was originally. That's why you can't operate on the object it addresses.
if we try something like
typedef struct foo {
int key;
int value;
} t_foo;
void try_fill_with_zero(void *destination) {
destination->key = 0;
destination->value = 0;
}
int main() {
t_foo *foo_instance = malloc(sizeof(t_foo));
try_fill_with_zero(foo_instance, sizeof(t_foo));
}
we will get a compilation error because it is impossible to determine what type void *destination was, as soon as the address gets into try_fill_with_zero. That's an example of being unable to "use a void* to operate on the object it addresses"
Typically you will see something like this:
typedef struct foo {
int key;
int value;
} t_foo;
void init_with_zero(void *destination, size_t bytes) {
unsigned char *to_fill = (unsigned char *)destination;
for (int i = 0; i < bytes; i++) {
to_fill[i] = 0;
}
}
int main() {
t_foo *foo_instance = malloc(sizeof(t_foo));
int test_int;
init_with_zero(foo_instance, sizeof(t_foo));
init_with_zero(&test_int, sizeof(int));
}
Here we can operate on the memory that we pass to init_with_zero represented as bytes.
You can think of void * as representing missing knowledge about the associated type of the data at this address. You may still cast it to something else and then dereference it, if you know what is behind it. Example:
int n = 5;
void * p = (void *) &n;
At this point, p we have lost the type information for p and thus, the compiler does not know what to do with it. But if you know this p is an address to an integer, then you can use that information:
int * q = (int *) p;
int m = *q;
And m will be equal to n.
void is not a type like any other. There is no object of type void. Hence, there exists no way of operating on such pointers.
This is one of my favourite kind of questions because at first I was also so confused about void pointers.
Like the rest of the Answers above void * refers to a generic type of data.
Being a void pointer you must understand that it only holds the address of some kind of data or object.
No other information about the object itself, at first you are asking yourself why do you even need this if it's only able to hold an address. That's because you can still cast your pointer to a more specific kind of data, and that's the real power.
Making generic functions that works with all kind of data.
And to be more clear let's say you want to implement generic sorting algorithm.
The sorting algorithm has basically 2 steps:
The algorithm itself.
The comparation between the objects.
Here we will also talk about pointer functions.
Let's take for example qsort built in function
void qsort(void *base, size_t nitems, size_t size, int (*compar)(const void *, const void*))
We see that it takes the next parameters:
base − This is the pointer to the first element of the array to be sorted.
nitems − This is the number of elements in the array pointed by base.
size − This is the size in bytes of each element in the array.
compar − This is the function that compares two elements.
And based on the article that I referenced above we can do something like this:
int values[] = { 88, 56, 100, 2, 25 };
int cmpfunc (const void * a, const void * b) {
return ( *(int*)a - *(int*)b );
}
int main () {
int n;
printf("Before sorting the list is: \n");
for( n = 0 ; n < 5; n++ ) {
printf("%d ", values[n]);
}
qsort(values, 5, sizeof(int), cmpfunc);
printf("\nAfter sorting the list is: \n");
for( n = 0 ; n < 5; n++ ) {
printf("%d ", values[n]);
}
return(0);
}
Where you can define your own custom compare function that can match any kind of data, there can be even a more complex data structure like a class instance of some kind of object you just define. Let's say a Person class, that has a field age and you want to sort all Persons by age.
And that's one example where you can use void * , you can abstract this and create other use cases based on this example.
It is true that is a C example, but I think, being something that appeared in C can make more sense of the real usage of void *. If you can understand what you can do with void * you are good to go.
For C++ you can also check templates, templates can let you achieve a generic type for your functions / objects.
According to Effective C++, "casting object addresses to char* pointers and then using pointer arithemetic on them almost always yields undefined behavior."
Is this true for plain-old-data? for example in this template function I wrote long ago to print the bits of an object. It works splendidly on x86, but... is it portable?
#include <iostream>
template< typename TYPE >
void PrintBits( TYPE data ) {
unsigned char *c = reinterpret_cast<unsigned char *>(&data);
std::size_t i = sizeof(data);
std::size_t b;
while ( i>0 ) {
i--;
b=8;
while ( b > 0 ) {
b--;
std::cout << ( ( c[i] & (1<<b) ) ? '1' : '0' );
}
}
std::cout << "\n";
}
int main ( void ) {
unsigned int f = 0xf0f0f0f0;
PrintBits<unsigned int>( f );
return 0;
}
It certainly is not portable. Even if you stick to fundamental types, there is endianness and there is sizeof, so your function will print different results on big-endian machines, or on machines where sizeof(int) is 16 or 64. Another issue is that not all PODs are fundamental types, structs may be POD, too.
POD struct members may have internal paddings according to the implementation-defined alignment rules. So if you pass this POD struct:
struct PaddedPOD
{
char c;
int i;
}
your code would print the contents of padding, too. And that padding will be different even on the same compiler with different pragmas and options.
On the other side, maybe it's just what you wanted.
So, it's not portable, but it's not UB. There are some standard guarantees: you can copy PODs to and from array of char or unsigned char, and the result of this copying via char buffer will hold the original value. That implies that you can safely traverse that array, so your function is safe. But nobody guarantees that this array (or object representation) of objects with same type and value will be the same on different computers.
BTW, I couldn't find that passage in Effective C++. Would you quote it, pls? I could say, if a part of your code already contains lots of #ifdef thiscompilerversion, sometimes it makes sense to go all-nonstandard and use some hacks that lead to undefined behavior, but work as intended on this compiler version with this pragmas and options. In that sense, yes, casting to char * often leads to UB.
Yes, POD types can always be treated as an array of chars, of size sizeof (TYPE). POD types are just like the corresponding C types (that's what makes them "plain, old"). Since C doesn't have function overloading, writing "generic" functions to do things like write them to files or network streams depends on the ability to access them as char arrays.
What is the most simple and efficient why to copy an int to a boost/std::array?
The following seems to work, but I'm not sure if this is the most appropriate way to do it:
int r = rand();
boost::array<char, sizeof(int)> send_buf;
std::copy(reinterpret_cast<char*>(&r), reinterpret_cast<char*>(&r + sizeof(int)), &send_buf[0]);
Just for comparison, here's the same thing with memcpy:
#include <cstring>
int r = rand();
boost::array<char, sizeof(int)> send_buf;
std::memcpy(&send_buf[0], &r, sizeof(int));
Your call whether an explosion of casts (and the opportunity to get them wrong) is better or worse than the C++ "sin" of using a function also present in C ;-)
Personally I think memcpy is quite a good "alarm" for this kind of operation, for the same reason that C++-style casts are a good "alarm" (easy to spot while reading, easy to search for). But you might prefer to have the same alarm for everything, in which case you can cast the arguments of memcpy to void*.
Btw, I might use sizeof r for both sizes rather than sizeof(int), but it sort of depends whether the context demands that the array "is big enough for r (which happens to be an int)" or "is the same size as an int (which r happens to be)". Since it's a send buffer, I guess the buffer is the size that the wire protocol demands and r is supposed to match the buffer, rather than the other way around. So sizeof(int) is probably appropriate but 4 or PROTOCOL_INTEGER_SIZE might be more appropriate still.
The idea is correct, but you have a bug:
reinterpret_cast<char*>(&r + sizeof(int))
Should be:
reinterpret_cast<char*>(&r) + sizeof(int)
or
reinterpret_cast<char*>(&r+1)
These or a memcpy equivalent are OK. Anything else risks alignment issues.
It's common currency to use reinterpret_cast for those purposes but the Standard makes it pretty clear that static_cast via void* is perfectly acceptable. In fact in the case of a type like int then reinterpret_cast<char*>(&r) is defined to have the semantics of static_cast<char*>(static_cast<void*>(&r)). Why not be explicit and use that outright?
If you get into the habit, you have less chance in the future of using a reinterpret_cast which will end up having implementation-defined semantics rather than a static_cast chain which will always have well-defined semantics.
Do note that you're allowed to treat a pointer to a single object as if it were a pointer into an array of one (cf. 5.7/4). This is convenient for obtaining the second pointer.
int r = rand();
boost::array<char, sizeof(int)> send_buf;
auto cast = [](int* p) { return static_cast<char*>(static_cast<void*>(p)); };
std::copy(cast(&r), cast(&r + 1), &send_buf[0]);
Minor bug as pointed out by Michael Anderson
But you could do this:
#include <iostream>
union U
{
int intVal;
char charVal[sizeof(int)];
};
int main()
{
U val;
val.intVal = 6;
std::cout << (int)val.charVal[0] << ":" << (int)val.charVal[1] << ":" << (int)val.charVal[2] << ":" << (int)val.charVal[3] << "\n";
}
Lately I've been doing a lot of exercises with file streams. When I use fstream.write(...)
to e.g. write an array of 10 integers (intArr[10]) I write:
fstream.write((char*)intArr,sizeof(int)*10);
Is the (char*)intArr-cast safe? I didn't have any problems with it until now but I learned about static_cast (the c++ way right?) and used static_cast<char*>(intArr) and it failed! Which I cannot understand ... Should I change my methodology?
A static cast simply isn't the right thing. You can only perform a static cast when the types in question are naturally convertible. However, unrelated pointer types are not implicitly convertible; i.e. T* is not convertible to or from U* in general. What you are really doing is a reinterpreting cast:
int intArr[10];
myfile.write(reinterpret_cast<const char *>(intArr), sizeof(int) * 10);
In C++, the C-style cast (char *) becomes the most appropriate sort of conversion available, the weakest of which is the reinterpreting cast. The benefit of using the explicit C++-style casts is that you demonstrate that you understand the sort of conversion that you want. (Also, there's no C-equivalent to a const_cast.)
Maybe it's instructive to note the differences:
float q = 1.5;
uint32_t n = static_cast<uint32_t>(q); // == 1, type conversion
uint32_t m1 = reinterpret_cast<uint32_t>(q); // undefined behaviour, but check it out
uint32_t m2 = *reinterpret_cast<const uint32_t *>(&q); // equally bad
Off-topic: The correct way of writing the last line is a bit more involved, but uses copious amounts of casting:
uint32_t m;
char * const pm = reinterpret_cast<char *>(&m);
const char * const pq = reinterpret_cast<const char *>(&q);
std::copy(pq, pq + sizeof(float), pm);
Assume that in my code I have to store a void* as data member and typecast it back to the original class pointer when needed. To test its reliability, I wrote a test program (linux ubuntu 4.4.1 g++ -04 -Wall) and I was shocked to see the behavior.
struct A
{
int i;
static int c;
A () : i(c++) { cout<<"A() : i("<<i<<")\n"; }
};
int A::c;
int main ()
{
void *p = new A[3]; // good behavior for A* p = new A[3];
cout<<"p->i = "<<((A*)p)->i<<endl;
((A*&)p)++;
cout<<"p->i = "<<((A*)p)->i<<endl;
((A*&)p)++;
cout<<"p->i = "<<((A*)p)->i<<endl;
}
This is just a test program; in actual for my case, it's mandatory to store any pointer as void* and then cast it back to the actual pointer (with help of template). So let's not worry about that part. The output of the above code is,
p->i = 0
p->i = 0 // ?? why not 1
p->i = 1
However if you change the void* p; to A* p; it gives expected behavior. WHY ?
Another question, I cannot get away with (A*&) otherwise I cannot use operator ++; but it also gives warning as, dereferencing type-punned pointer will break strict-aliasing rules. Is there any decent way to overcome warning ?
Well, as the compiler warns you, you are violating the strict aliasing rule, which formally means that the results are undefined.
You can eliminate the strict aliasing violation by using a function template for the increment:
template<typename T>
void advance_pointer_as(void*& p, int n = 1) {
T* p_a(static_cast<T*>(p));
p_a += n;
p = p_a;
}
With this function template, the following definition of main() yields the expected results on the Ideone compiler (and emits no warnings):
int main()
{
void* p = new A[3];
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
advance_pointer_as<A>(p);
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
advance_pointer_as<A>(p);
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
}
You have already received the correct answer and it is indeed the violation of the strict aliasing rule that leads to the unpredictable behavior of the code. I'd just note that the title of your question makes reference to "casting back pointer to the original class". In reality your code does not have anything to do with casting anything "back". Your code performs reinterpretation of raw memory content occupied by a void * pointer as a A * pointer. This is not "casting back". This is reinterpretation. Not even remotely the same thing.
A good way to illustrate the difference would be to use and int and float example. A float value declared and initialized as
float f = 2.0;
cab be cast (explicitly or implicitly converted) to int type
int i = (int) f;
with the expected result
assert(i == 2);
This is indeed a cast (a conversion).
Alternatively, the same float value can be also reinterpreted as an int value
int i = (int &) f;
However, in this case the value of i will be totally meaningless and generally unpredictable. I hope it is easy to see the difference between a conversion and a memory reinterpretation from these examples.
Reinterpretation is exactly what you are doing in your code. The (A *&) p expression is nothing else than a reinterpretation of raw memory occupied by pointer void *p as pointer of type A *. The language does not guarantee that these two pointer types have the same representation and even the same size. So, expecting the predictable behavior from your code is like expecting the above (int &) f expression to evaluate to 2.
The proper way to really "cast back" your void * pointer would be to do (A *) p, not (A *&) p. The result of (A *) p would indeed be the original pointer value, that can be safely manipulated by pointer arithmetic. The only proper way to obtain the original value as an lvalue would be to use an additional variable
A *pa = (A *) p;
...
pa++;
...
And there's no legal way to create an lvalue "in place", as you attempted to by your (A *&) p cast. The behavior of your code is an illustration of that.
As others have commented, your code appears like it should work. Only once (in 17+ years of coding in C++) I ran across something where I was looking straight at the code and the behavior, like in your case, just didn't make sense. I ended up running the code through debugger and opening a disassembly window. I found what could only be explained as a bug in VS2003 compiler because it was missing exactly one instruction. Simply rearranging local variables at the top of the function (30 lines or so from the error) made the compiler put the correct instruction back in. So try debugger with disassembly and follow memory/registers to see what it's actually doing?
As far as advancing the pointer, you should be able to advance it by doing:
p = (char*)p + sizeof( A );
VS2003 through VS2010 never give you complaints about that, not sure about g++