C++ and box2d: userdata cast int to void* - c++

i'm pretty new to box2d and i'm trying to use the userdata (of type void*) field in the b2body object to store an int value (an enum value, so i know the type of the object).
right now i'm doing something this:
int number = 1023;
void* data = (void*)(&number);
int rNumber = *(int*)data;
and i get the value correctly, but as i've been reading around casting to void* it's not portable or recommendable... is my code cross-platform? is it's behavior defined or implementation dependent?
Thanks!

Casting to void * is portable. It is not recommended because you are losing type safety. Anything can be put into a void * and anything can be gotten out. It makes it easier to shoot yourself in the foot. Otherwise void * is fine as long as you are careful and extra cautious.

You are actually not casting int to void*, you cast int* to void*, which is totally different.
A pointer to any type can be stored in a void*, and be cast back again to the same type. That is guaranteed to work.
Casting to any other type is not portable, as the language standard doesn't say that different pointers have to be the same size, or work the same way. Just that void* has to be wide enough to contain them.

One of the problems with void* is that you need to know (keep track of) what type it originally was in order to cast it properly. If it originally was a float and you case it to an int the compiler would just take your word for it.
To avoid this you could create a wrapper for your data which contains the type, that way you will be able to always cast it to the right type.
Edit: you should also make a habit of using C++ casting style instead of C i.e. reinterpret_cast

void * is somehow a relic of the past (ISO C), but very convenient. You can use it safely as far as you are careful casting back and forward the type you want. Consider other alternatives like the c++ class system or overloading a function
anyways you will have better cast operators, some times there is no other way around (void*), some other times they are just too convenient.
It can lead to non portable code, not because of the casting system, but because some people are tempted to do non portable operations with them. The biggest problem lies in the fact that (void*) is a as big as a memory address, which in many platforms happens to be also the length of the platform integers.
However in some rare exceptions size(void*) != size(int)
If you try to do some type of operations/magic with them without casting back to the type you want, you might have problems. You might be surprised of how many times I have seen people wanting to store an integer into a void* pointer

To answer your question, yes, it's safe to do.
To answer the question you didn't ask, that void pointer isn't meant for keeping an int in. Box2D has that pointer for you to point back to an Entity object in your game engine, so you can associate an Entity in your game to a b2Body in the physics simulation. It allows you to more easily program your entities interact with one another when one b2Body interacts with another.
So you shouldn't just be putting an enum in that void*. You should be pointing it directly to the game object represented by that b2body, which could have an enum in it.

Related

Should I replace (void*, size) with a GSL span?

Suppose I have
int foo(void* p, size_t size_in_bytes);
and assume it doesn't make sense to make foo typed. I want to be a good coder and apply the C++ core guidelines. Specifically, I want to use spans instead of (*, len) pairs. Well, span<void> won't compile (can't add to a void *); and span<char> or span<uint8_t> etc. would imply foo actually expects chars, which it might not.
So should I use a span<something-with-size-1> in this case, or stick with void*?
There can be no general answer to this question.
For a function to say that it takes a span<T> means that it takes a contiguous array of values, without any form of ownership transference. If that description does not reasonably represent what is going on, then it should not take a span<T>.
For example:
What if the function checks whether the buffer intersects a region in my memory space which is, say, mapped to a file?
That doesn't sound like a span<T>. That sounds like you should have a simple aggregate with a name that makes it clear what it means:
struct memory_region
{
void* p;
size_t size_in_bytes;
};
You could even give it a member function for testing intersections. If you are making a system for dealing with such regions of memory, I might advise a more encapsulated class type with constructors and such.
What type the function takes should explain what the data means. Preferably this meaning would be in a general sense, but at the very least, it should say what it means for the function in question.
One more thing:
or span<uint8_t> etc. would imply foo actually expects chars
No, it would not. While uint8_t will almost certainly have the same size as an unsigned char, that does not mean that one would expect to be able to pass an array of characters to any function which takes span<uint8_t>. If that function wanted to advertise that it accepted characters, it would have used unsigned char.
I meant to say span<whatever> would imply the function expect whatever's.
Yes, the requirement for spans is that it is passed an actual array of Ts of the given size.
The proposal before the C++ standardization committee is that if you want to pass around a pointer to a sequence of bytes (which is often what people want to do when passing void* around), then you would pass a span<std::byte>, which relies upon a new std::byte type. However, that requires a small change to the language standard to be legal.
In today's C++, you could pass span<unsigned char> (typedef'd as you find most descriptive) and get the same effect: access to a sequence of bytes.
What I chose to do, and what I think is, shall we say, sound design-wise, is implement a class named memory_region, which has all of the type-inspecific functionality of gsl::span (hence, for example, it doesn't have a begin() or an end()). It is not the same thing as a span of bytes, IMO - and I can structurally never get them mixed up.
Here's my implementation (it's part of a repository of DBMS-related GPU kernels and a testing framework I'm working on, hence the CUDA-related snippet; and it depends on some GSL, in my case gsl-lite by MS'es should be ok too I think).

cpp Is there a reason to prefer vector<class_type*> to vector<void*>?

I have class A_class, and I need a vector of pointers to objects of this class. Is there a good reason to prefer either of the usages:
vector<void*> my_vector;
vector<A_class*> my_vector;
is there really a difference?
Edit:
Let me rephrase - is there a difference in the way the actual array is stored? I'm not asking about safety, but about performance.
Using void * should be avoided whenever possible.
This is for two reasons:
You lose the type of the object.
The code gets messier, because you have to cast the void * back to the original type.
There are very few cases in C++ (thanks to templates) where void * is actually meaningful. I'm sure there are some, but I can't really think of any case right now (aside from when you need to use it for historical/compatibility reasons, e.g. transferring a pointer to a class to "C" code, where it doesn't understand the concept of class, so you can't give it the correct type).
You can't do a whole lot with void* unless you cast it, so there's really no advantage to using it over the actual type.
Even if the only reason was to avoid the unnecessary cast, that would be enough. It's not though. You can potentially cast the void* to anything and use the result incorrectly. Using the real class itself also has the added benefit that the code is clearer.
is there really a difference?
A vector<void*> would also accept values of type int*, YourClassHere** and any other pointer you can imagine. It's unsafe, so unless you have a very good reason to use it (and I can't think of many) - don't.
Why did you even consider it as an alternative?
It would also require that you explicitly cast the pointers back to A_class* whenever you use them (which is ugly, verbose and error prone).
You should remember that a void pointer cannot be dereferenced directly. You will ultimately have to typecast it to your class.
So if there is not any good reason to go for a void pointer and you are sure that the vector shall store pointers to objects of A_class, then I would prefer to avoid the use of void pointers.
You chose
vector<*void> my_vector;
if you don't know objects of what type will be stored in my_vector. If you know they will be of type A_class then use
vector<A_class*> my_vector;
using void pointers

Why are function pointers and data pointers incompatible in C/C++?

I have read that converting a function pointer to a data pointer and vice versa works on most platforms but is not guaranteed to work. Why is this the case? Shouldn't both be simply addresses into main memory and therefore be compatible?
An architecture doesn't have to store code and data in the same memory. With a Harvard architecture, code and data are stored in completely different memory. Most architectures are Von Neumann architectures with code and data in the same memory but C doesn't limit itself to only certain types of architectures if at all possible.
Some computers have (had) separate address spaces for code and data. On such hardware it just doesn't work.
The language is designed not only for current desktop applications, but to allow it to be implemented on a large set of hardware.
It seems like the C language committee never intended void* to be a pointer to function, they just wanted a generic pointer to objects.
The C99 Rationale says:
6.3.2.3 Pointers
C has now been implemented on a wide range of architectures. While some of these
architectures feature uniform pointers which are the size of some integer type, maximally
portable code cannot assume any necessary correspondence between different pointer types and the integer types. On some implementations, pointers can even be wider than any integer type.
The use of void* (“pointer to void”) as a generic object pointer type is an invention of the C89 Committee. Adoption of this type was stimulated by the desire to specify function prototype arguments that either quietly convert arbitrary pointers (as in fread) or complain if the argument type does not exactly match (as in strcmp). Nothing is said about pointers to functions, which may be incommensurate with object pointers and/or integers.
Note Nothing is said about pointers to functions in the last paragraph. They might be different from other pointers, and the committee is aware of that.
For those who remember MS-DOS, Windows 3.1 and older the answer is quite easy. All of these used to support several different memory models, with varying combinations of characteristics for code and data pointers.
So for instance for the Compact model (small code, large data):
sizeof(void *) > sizeof(void(*)())
and conversely in the Medium model (large code, small data):
sizeof(void *) < sizeof(void(*)())
In this case you didn't have separate storage for code and date but still couldn't convert between the two pointers (short of using non-standard __near and __far modifiers).
Additionally there's no guarantee that even if the pointers are the same size, that they point to the same thing - in the DOS Small memory model, both code and data used near pointers, but they pointed to different segments. So converting a function pointer to a data pointer wouldn't give you a pointer that had any relationship to the function at all, and hence there was no use for such a conversion.
Pointers to void are supposed to be able to accommodate a pointer to any kind of data -- but not necessarily a pointer to a function. Some systems have different requirements for pointers to functions than pointers to data (e.g, there are DSPs with different addressing for data vs. code, medium model on MS-DOS used 32-bit pointers for code but only 16-bit pointers for data).
In addition to what is already said here, it is interesting to look at POSIX dlsym():
The ISO C standard does not require that pointers to functions can be cast back and forth to pointers to data. Indeed, the ISO C standard does not require that an object of type void * can hold a pointer to a function. Implementations supporting the XSI extension, however, do require that an object of type void * can hold a pointer to a function. The result of converting a pointer to a function into a pointer to another data type (except void *) is still undefined, however. Note that compilers conforming to the ISO C standard are required to generate a warning if a conversion from a void * pointer to a function pointer is attempted as in:
fptr = (int (*)(int))dlsym(handle, "my_function");
Due to the problem noted here, a future version may either add a new function to return function pointers, or the current interface may be deprecated in favor of two new functions: one that returns data pointers and the other that returns function pointers.
C++11 has a solution to the long-standing mismatch between C/C++ and POSIX with regard to dlsym(). One can use reinterpret_cast to convert a function pointer to/from a data pointer so long as the implementation supports this feature.
From the standard, 5.2.10 para. 8, "converting a function pointer to an object pointer type or vice versa is conditionally-supported." 1.3.5 defines "conditionally-supported" as a "program construct that an implementation is not required to support".
Depending on the target architecture, code and data may be stored in fundamentally incompatible, physically distinct areas of memory.
undefined doesn't necessarily mean not allowed, it can mean that the compiler implementor has more freedom to do it how they want.
For instance it may not be possible on some architectures - undefined allows them to still have a conforming 'C' library even if you can't do this.
Another solution:
Assuming POSIX guarantees function and data pointers to have the same size and representation (I can't find the text for this, but the example OP cited suggests they at least intended to make this requirement), the following should work:
double (*cosine)(double);
void *tmp;
handle = dlopen("libm.so", RTLD_LAZY);
tmp = dlsym(handle, "cos");
memcpy(&cosine, &tmp, sizeof cosine);
This avoids violating the aliasing rules by going through the char [] representation, which is allowed to alias all types.
Yet another approach:
union {
double (*fptr)(double);
void *dptr;
} u;
u.dptr = dlsym(handle, "cos");
cosine = u.fptr;
But I would recommend the memcpy approach if you want absolutely 100% correct C.
They can be different types with different space requirements. Assigning to one can irreversibly slice the value of the pointer so that assigning back results in something different.
I believe they can be different types because the standard doesn't want to limit possible implementations that save space when it's not needed or when the size could cause the CPU to have to do extra crap to use it, etc...
The only truly portable solution is not to use dlsym for functions, and instead use dlsym to obtain a pointer to data that contains function pointers. For example, in your library:
struct module foo_module = {
.create = create_func,
.destroy = destroy_func,
.write = write_func,
/* ... */
};
and then in your application:
struct module *foo = dlsym(handle, "foo_module");
foo->create(/*...*/);
/* ... */
Incidentally, this is good design practice anyway, and makes it easy to support both dynamic loading via dlopen and static linking all modules on systems that don't support dynamic linking, or where the user/system integrator does not want to use dynamic linking.
A modern example of where function pointers can differ in size from data pointers: C++ class member function pointers
Directly quoted from https://blogs.msdn.microsoft.com/oldnewthing/20040209-00/?p=40713/
class Base1 { int b1; void Base1Method(); };
class Base2 { int b2; void Base2Method(); };
class Derived : public Base1, Base2 { int d; void DerivedMethod(); };
There are now two possible this pointers.
A pointer to a member function of Base1 can be used as a pointer to a
member function of Derived, since they both use the same this
pointer. But a pointer to a member function of Base2 cannot be used
as-is as a pointer to a member function of Derived, since the this
pointer needs to be adjusted.
There are many ways of solving this. Here's how the Visual Studio
compiler decides to handle it:
A pointer to a member function of a multiply-inherited class is really
a structure.
[Address of function]
[Adjustor]
The size of a pointer-to-member-function of a class that uses multiple inheritance is the size of a pointer plus the size of a size_t.
tl;dr: When using multiple inheritance, a pointer to a member function may (depending on compiler, version, architecture, etc) actually be stored as
struct {
void * func;
size_t offset;
}
which is obviously larger than a void *.
On most architectures, pointers to all normal data types have the same representation, so casting between data pointer types is a no-op.
However, it's conceivable that function pointers might require a different representation, perhaps they're larger than other pointers. If void* could hold function pointers, this would mean that void*'s representation would have to be the larger size. And all casts of data pointers to/from void* would have to perform this extra copy.
As someone mentioned, if you need this you can achieve it using a union. But most uses of void* are just for data, so it would be onerous to increase all their memory use just in case a function pointer needs to be stored.
I know that this hasn't been commented on since 2012, but I thought it would be useful to add that I do know an architecture that has very incompatible pointers for data and functions since a call on that architecture checks privilege and carries extra information. No amount of casting will help. It's The Mill.

Dynamic type dereferrencing?

In attempting to answer another question, I was intrigued by a bout of curiousity, and wanted to find out if an idea was possible.
Is it possible to dynamically dereference either a void * pointer (we assume it points to a valid referenced dynamically allocated copy) or some other type during run time to return the correct type?
Is there some way to store a supplied type (as in, the class knows the void * points to an int), if so how?
Can said stored type (if possible) be used to dynamically dereference?
Can a type be passed on it's own as an argument to a function?
Generally the concept (no code available) is a doubly-linked list of void * pointers (or similar) that can dynamically allocated space, which also keep with them a copy of what type they hold for later dereference.
1) Dynamic references:
No. Instead of having your variables hold just pointers, have them hold a struct containing both the actual pointer and a tag defining what type the pointer is pointing to
struct Ref{
int tag;
void *ref;
};
and then, when "dereferencing", first check the tag to find out what you want to do.
2) Storing types in your variables, passing them to functions.
This doesn't really make sense, as types aren't values that can be stored around. Perhaps what you just want is to pass around a class / constructor function and that is certainly feasible.
In the end, C and C++ are bare-bones languages. While a variable assignment in a dynamic language looks a lot like a variable assignment in C (they are just a = after all) in reality the dynamic language is doing a lot of extra stuff behind the scenes (something it is allowed to do, since a new language is free to define its semantics)
Sorry, this is not really possible in C++ due to lack of type reflection and lack of dynamic binding. Dynamic dereferencing is especially impossible due to these.
You could try to emulate its behavior by storing types as enums or std::type_info* pointers, but these are far from practical. They require registration of types, and huge switch..case or if..else statements every time you want to do something with them. A common container class and several wrapper classes might help achieving them (I'm sure this is some design pattern, any idea of its name?)
You could also use inheritance to solve your problem if it fits.
Or perhaps you need to reconsider your current design. What exactly do you need this for?

What use are const pointers (as opposed to pointers to const objects)?

I've often used pointers to const objects, like so...
const int *p;
That simply means that you can't change the integer that p is pointing at through p. But I've also seen reference to const pointers, declared like this...
int* const p;
As I understand it, that means that the pointer variable itself is constant -- you can change the integer it points at all day long, but you can't make it point at something else.
What possible use would that have?
When you're designing C programs for embedded systems, or special purpose programs that need to refer to the same memory (multi-processor applications sharing memory) then you need constant pointers.
For instance, I have a 32 bit MIPs processor that has a little LCD attached to it. I have to write my LCD data to a specific port in memory, which then gets sent to the LCD controller.
I could #define that number, but then I also have to cast it as a pointer, and the C compiler doesn't have as many options when I do that.
Further, I might need it to be volatile, which can also be cast, but it's easier and clearer to use the syntax provided - a const pointer to a volatile memory location.
For PC programs, an example would be: If you design DOS VGA games (there are tutorials online which are fun to go through to learn basic low level graphics) then you need to write to the VGA memory, which might be referenced as an offset from a const pointer.
-Adam
It allows you to protect the pointer from being changed. This means you can protect assumptions you make based on the pointer never changing or from unintentional modification, for example:
int* const p = &i;
...
p++; /* Compiler error, oops you meant */
(*p)++; /* Increment the number */
another example:
if you know where it was initialized, you can avoid future NULL checks.
The compiler guarantees you that the pointer never changed (to NULL)…
In any non-const C++ member function, the this pointer is of type C * const, where C is the class type -- you can change what it points to (i.e. its members), but you can't change it to point to a different instance of a C. For const member functions, this is of type const C * const. There are also (rarely encountered) volatile and const volatile member functions, for which this also has the volatile qualifier.
One use is in low-level (device driver or embedded) code where you need to reference a specific address that's mapped to an input/output device like a hardware pin. Some languages allow you to link variables at specific addresses (e.g. Ada has use at). In C the most idiomatic way to do this is to declare a constant pointer. Note that such usages should also have the volatile qualifier.
Other times it's just defensive coding. If you have a pointer that shouldn't change it's wise to declare it such that it cannot change. This will allow the compiler (and lint tools) to detect erroneous attempts to modify it.
I've always used them when I wanted to avoid unintended modification to the pointer (such as pointer arithmetic, or inside a function). You can also use them for Singleton patterns.
'this' is a hardcoded constant pointer.
Same as a "const int" ... if the compiler knows it's not going to change, it can be optimization assumptions based on that.
struct MyClass
{
char* const ptr;
MyClass(char* str) :ptr(str) {}
void SomeFunc(MyOtherClass moc)
{
for(int i=0; i < 100; ++i)
{
printf("%c", ptr[i]);
moc.SomeOtherFunc(this);
}
}
}
Now, the compiler could do quite a bit to optimize that loop --- provided it knows that SomeOtherFunc() does not change the value of ptr. With the const, the compiler knows that, and can make the assumptions. Without it, the compiler has to assume that SomeOtherFunc will change ptr.
I have seen some OLE code where you there was an object passed in from outside the code and to work with it, you had to access the specific memory that it passed in. So we used const pointers to make sure that functions always manipulated the values than came in through the OLE interface.
Several good reasons have been given as answers to this questions (memory-mapped devices and just plain old defensive coding), but I'd be willing to bet that most instances where you see this it's actually an error and that the intent was to have to item be a pointer-to-const.
I certainly have no data to back up this hunch, but I'd still make the bet.
Think of type* and const type* as types themselves. Then, you can see why you might want to have a const of those types.
always think of a pointer as an int. this means that
object* var;
actually can be thought of as
int var;
so, a const pointer simply means that:
const object* var;
becomes
const int var;
and hence u can't change the address that the pointer points too, and thats all. To prevent data change, u must make it a pointer to a const object.