Found this code example
void *handle;
double (*cosine)(double);
handle = dlopen("libm.so", RTLD_LAZY);
*(void **) (&cosine) = dlsym(handle, "cos");
I use rightmost-to-left reading rule to parse the variable's type:
double (*cosine)(double);
Here I write left-to-right but move LTR: "cosine" -> "*" -> "is a pointer"
then "(" we go outside the innermost () scope -> "(double)" -> "to function taking one double" -> and returning leftmost "double"
but what the hell is THIS? I even don't know where to start the parse from) is "&cosine" a address or reference? what does the (void **) mean? why it has leftmost "*" outside??? is it dereference or type?
*(void **) (&cosine)
Yup, that's a mouthful.
cosine is a function pointer. So &cosine is a pointer to that pointer. And then when we slap a * in front of it, we're changing the original pointer, to make it point somewhere else.
It's sort of like this:
int i = 5;
int *ip = &i;
*ip = 6; /* changes i to 6 */
Or it's more like this:
char a[10], b[10];
char *p = a;
*(&p) = b; /* changes p to point to b */
But in your case it's even trickier, because cosine is a pointer to a function, not a pointer to data. Most of the time, function pointers point to functions you have defined in your program. But here, we're arranging to make cosine point to a dynamically-loaded function, loaded by the dlsym() function.
dlsym is super special because it can return pointers to data, as well as pointers to functions. So it's got an impossible-to-define return type. It's declared as returning void *, of course, because that's the "generic" pointer type in C. (Think malloc.) But in pure C, void * is a generic data pointer type; it's not guaranteed to be able to be used with function pointers.
The straightforward thing to do would be to just say
cosine = dlsym(handle, "cos");
But a modern compiler will complain, because dlsym returns void *, and cosine has type double (*)(double) (that is, pointer to function taking double and returning double), and that's not a portable conversion.
So we go around the barn, and set cosine's value indirectly, not by saying
cosine = something
but rather by saying
*(&cosine) = something
But that's still no good in the dlsym case, because the types still don't match. We've got void * on the right, so we need void * on the left. And the solution is to take the address &cosine, which is otherwise a pointer-to-pointer-to-function, and cast it to a pointer-to-pointer-to-void, or void **, so that when we slap a * in front of it we've got a void * again, which is a proper destination to assign dlsym's return value to. So we end up with the line you were asking about:
* (void **) (&cosine) = dlsym(handle, "cos");
Now, it's important to note that we're on thin ice here. We've used the & and the cast to get around the fact that assigning a pointer-to-void to a `pointer-to-function isn't strictly legal. In the process we've successfully silenced the compiler's warning that what we're doing isn't strictly legal. (Indeed, silencing the warning was precisely the original programmer's intent in employing this dodge.)
The potential problem is, what if data pointers and function pointers have different sizes or representations? This code goes to some length to treat a function pointer, cosine, as if it were a data pointer, jamming the bits of a data pointer into it. If, say, a data pointer were somehow bigger than a function pointer, this would have terrible effects. (And, before you ask "But how could a data pointer ever be bigger than a function pointer?", that's exactly how they were, for example, in the "compact" memory model, back in the days of MS-DOS programming.)
Normally, playing games like this to break the rules and shut off compiler warnings is a bad idea. In the case of dlsym, though, it's fine, I would say perfectly acceptable. dlsym can't exist on a system where function pointers are different from data pointers, so if you're using dlsym at all, you must be on a machine where all pointers are the same, and this code will work.
It's also worth asking, if we have to play games with casts when calling dlsym, why take the extra trip around the barm with the pointer-to-pointer? Why not just say
cosine = (double (*)(double))dlsym(handle, "cos");
And the answer is, I don't know. I'm pretty sure this simpler cast will work just as well (again, as long as we're on a system where dlsym can exist at all). Perhaps there are compilers that warn about this case, that can only be tricked into silence by using the tricker, double-pointer trick.
See also Casting when using dlsym() .
This is nasty stuff. cosine is a pointer to a function that takes an argument of type double and returns double. &cosine is the address of that pointer. The cast says to pretend that that address is a pointer-to-pointer-to-void. The * in front of the cast is the usual dereference operator, so the result is to tell the compiler to pretend that the type of cosine is void*, so that the code can assign the return value from the call to dlsym to cosine. Phew; that hurts.
And, just for fun, a void* and a pointer-to-function are not at all related, which is why the code has to go through all that casting. The C++ language definition does not guarantee that this will work. I.e., the result is undefined behavior.
For C, a pointer to void can be converted to any pointer to an object without a cast. However, the C standard does not guarantee that void * can be converted to a pointer to a function - at all, since functions are not objects.
The dlsym is a POSIX function; and POSIX requires that as an extension, a pointer to a function must be convertable to void * and back again. However C++ wouldn't allow such a conversion without a cast.
In any case the *(void **) (&cosine) = dlsym(handle, "cos"); cast means that the pointer to the pointer to a function (double) returning double is cast as pointer to pointer to void, then dereferenced to get a lvalue of type void *, to which the return value of dlsym is assigned to. This is rather ugly, and should be better written as cosine = (double (*)(double))dlsym(handle, "cos") wherever a cast is required. Both are undefined behaviour when it comes to C, but the latter is not as much dark magic.
Related
From Programming Language Pragmatics, by Scott
For systems programming, or to facilitate the writing of
general-purpose con- tainer (collection) objects (lists, stacks,
queues, sets, etc.) that hold references to other objects, several
languages provide a universal reference type. In C and C++, this
type is called void *. In Clu it is called any; in Modula-2,
address; in Modula-3, refany; in Java, Object; in C#, object.
In C and C++, how does void * work as a universal reference type?
void * is always only a pointer type, while a universal reference type contains all values, both pointers and nonpointers. So I can't see how void * is a universal reference type.
Thanks.
A void* pointer will generally hold any pointer that is not a C++ pointer-to-member. It's rather inconvenient in practice, since you need to cast it to another pointer type before you can use it. You also need to convert it to the same pointer type that it was converted from to make the void*, otherwise you risk undefined behavior.
A good example would be the qsort function. It takes a void* pointer as a parameter, meaning it can point to an array of anything. The comparison function you pass to qsort must know how to cast two void* pointers back to the types of the array elements in order to compare them.
The crux of your confusion is that neither an instance of void * nor an instance of Modula-3's refany, nor an instance of any other language's "can refer to anything" type, contains the object that it refers to. A variable of type void * is always a pointer and a variable of type refany is always a reference. But the object that they refer to can be of any type.
A purist of programming-language theory would tell you that C does not have references at all, because pointers are not references. It has a nearly-universal pointer type, void *, which can point to an object of any type (including integers, aggregates, and other pointers). As a common but not ubiquitous extension, it can also point to any function (functions are not objects).
The purist would also tell you that C++ does not have a (nearly-)universal pointer type, because of its stricter type system, and doesn't have a universal reference type either.
They would also say that the book you are reading is being sloppy with its terminology, and they would caution you to not take any one such book for the gospel truth on terminological matters, or any other matters. You should instead read widely in both books and CS journals and conference proceedings (collectively known as "the literature") until you gain an "ear" for what is generally-agreed-on terminology, what is specific to a subdiscipline or a community of practice, and so on.
And finally they would remind you that C and C++ are two different languages, and anyone who speaks of them in the same breath is either glossing over the distinctions (which may or may not be relevant in context), decades out of date, or both.
Probably the reason is that you can take address of any variable of any type and cast it to void*.
It does by a silent contract that you know the actual type of object.
So you can store different kinds of elements in a container, but you need to somehow know what is what when taking elements back, to interpret them correctly.
The only convenience void* offers is that it's idiomatic for this, i.e. it's clear that dereferencing the pointer makes no sense, and void* is implicitly convertible to any pointer type. That is for c/
In c++ this is called type erasure techniques preferred. Or special types, like any (there is a boost version of this too.)
void* is no more just a pointer. Thus, it holds an address of an object (or an array and stuffs like that)
When your program is running, every variable should have it owns address in memory, right? And pointer is somethings point to that address.
In normal, each type of pointer should be the same type of object int b = 5; int* p = &b; for example. But that is the case you know what the type is, it means the specific type.
But sometimes, you just want to know that it stores somethings somewhere in memory and you know what "type" of that address, you can cast easily. For example, in OpenCV library which I am learning, there are a lot of functions where user can pass the arguments to instead of declaring global variables and most use in callback functions, like this:
void onChange(int v, void *ptr)
Here, the library does not care about what ptr point to, it just know that when you call the function, if you pass an address to like this onChange(5,&b) then you must cast ptr to the same type before dealing with it int b = static_cast<int*>(ptr);
Probably this explanation from Understanding pointers from Richard Reese will help
A pointer to void is a general-purpose pointer used to hold references to any data type.
It has two interesting properties:
A pointer to void will have the same representation and memory alignment as a pointer to char
A pointer to void will never be equal to another pointer. However, two void pointers assigned a NULL value will be equal.
Any pointer can be assigned to a pointer to void. It can then be cast back to its original pointer type. When this happens the value will be equal to the original pointer value.
This is illustrated in the following sequence, where a pointer to
int is assigned to a pointer to void and then back to a pointer to int
#include<stdio.h>
void main()
{
int num = 100;
int *pi = #
printf("value of pi is %p\n", pi);
void* pv = pi;
pi = (int*)pv;
printf("value of pi is %p\n", pi);
}
Pointers to void are used for data pointers, not function pointers
So in "the c++ programming language, 4th edition", there's a paragraph I don't understand about conversion of pointer-to-function types. Here is some of the code sample.
using P1 = int(*)(int*);
using P2 = void(*)(void);
void f(P1 pf) {
P2 pf2 = reinterpret_cast<P2>(pf);
pf2(); // likely serious problem
// other codes
}
When I run this it crashed.
I'm not sure if I am right, but I initially think the "likely serious problem" comment is when pf got casted to P2 in pf2, I think pf2 is not pointing to anything? Because when I created a function that matches P2's type and point pf2 to it, it didn't crash and runs normally.
After the code, I read this:
We need the nastiest of casts, reinterpret_cast, to do conversion of pointer-to-function types. The reason is that the result of using a pointer to function of the wrong type is so unpredictable and system-dependent. For example, in the example above, the called function may write to the object pointed to by its argument, but the call pf2() didn’t supply any argument!
Now I'm completely lost starting from "For example, in the example above" part:
"may write to the object pointed to by its argument" //what object is it exactly?
"but the call pf2() didn’t supply any argument!" //"using P2 = void(*)(void);" doesn't really need an arguement does it?
I think I'm missing something here. Can someone explain this?
For example, in the example above, the called function may write to the object pointed to by its argument (...)
pf is a pointer to a function like this:
int foo(int* intPtr)
{
// ...
}
So it could be implemented to write to its argument:
int foo(int* intPtr)
{
*intPtr = 42; // writing to the address given as argument
return 0;
}
(...) but the call pf2() didn’t supply any argument!
When you call foo through its cast to type P2, it will be called without arguments, so it is unclear what intPtr will be:
P2 pf2 = reinterpret_cast<P2>(pf);
pf2(); // no argument given here, although pf2 really is foo() and expects one!
Writing to it will most likely corrupt something.
Moreover, compilers usually implement calls to functions that return something by reserving space for the return value first, that will then be filled by the function call. When you call a P1 using the signature of P2, the call to P2 won't reserve space (as the return value is void) and the actual call will write an int somewhere it should not, which is another source for corruption.
Now I'm completely lost starting from "For example, in the example
above" part:
"may write to the object pointed to by its argument" //what object is
it exactly?
P1 is a function expecting a non-const pointer-to-int argument. That means it very well may write to the int referenced in its argument.
"but the call pf2() didn’t supply any argument!" //"using P2 =
void(*)(void);" doesn't really need an arguement does it?
When you call the function through another function pointer type passing no argument, the expectations of the called function aren't met. It may try to interpret whatever is on the stack as an int pointer and write to it, causing undefined behavior.
This does fail, but not necessarily in the way one might expect.
The implementation of a function pointer is left up to the compiler (undefined). Even the size of a function pointer can be bigger than a void*.
What is guaranteed about the size of a function pointer?
There is no guarentees about anything in the value of the function pointer. In fact, the only even guarentee that the comparison operators will work between function pointers of the same type.
Comparing function pointers
The standard does provide that function pointers can store the values of other function types.
Casting the function pointer to another type undefined behavior, meaning the compiler can do whatever it wants. Whether or not you supply the argument really doesn't matter, and how that would fail depends on the calling convention of the system. As far as your concerned, it could allow "demons to fly out of your nose".
Casting a function pointer to another type
So that brings us back to the statement by the author:
We need the nastiest of casts, reinterpret_cast, to do conversion of pointer-to-function types. The reason is that the result of using a pointer to function of the wrong type is so unpredictable and system-dependent. For example, in the example above, the called function may write to the object pointed to by its argument, but the call pf2() didn’t supply any argument!
That is trying to make the point that with no argument specified, if the function writes the output, it will write to some uninitialized state. Basically, if you look at the function as
int foo(int* arg) {*arg=10;}
if you didn't initialize arg, the author says you could be writing anywhere. But again, there is no guarentee that this even matters. The system could store functions with the footprint int (*)(int*) and void(*)(void) in completely different memory space, in which case instead of the above problem you'd have a jump into a random location in the program. Undefined behavior is just that: undefined.
Just don't do it man.
Specifically, can it point to int/float etc.?
What about objects like NSString and the like?
Any examples will be greatly appreciated.
void* is such a pointer, that any pointer can be implicitly converted to void*.
For example;
int* p = new int;
void* pv = p; //OK;
p = pv; //Error, the opposite conversion must be explicit in C++ (in C this is OK too)
Also note that pointers to const cannot be converted to void* without a const_cast
E.g.
const int * pc = new const int(4);
void * pv = pc; //Error
const void* pcv = pc; //OK
Hth.
In C any pointer can point to any address in memory, as the type information is in the pointer, not in the target. So an int* is just a pointer to some memory location, which is believed to be an integer. A void* pointer, is just a pointer to a memory location where the type is not defined (could be anything).
Thus any pointer can be cast to void*, but not the other way around, because casting (for example) a void pointer to an int pointer is adding information to it - by performing the cast, you are declaring that the target data is integer, so naturally you have to explicitly say this. The other way around, all you are doing is saying that the int pointer is some kind of pointer, which is fine.
It's probably the same in C++.
A void * can point at any data-like thing in memory, like an integer value, a struct, or whatever.
Do note, however, that you cannot freely convert between void * and function pointers. This is because on some architectures, code is not in the same address space as data, and thus it's possible that address 0x00000000 for code refers to a different set of bits than address 0x00000000 for data does.
It would be possible to implement the compiler so that void * is large enough to remember the difference, but in general I think this is not done, instead the language leaves it undefined.
On typical/mainstream computers, code and data reside in the same address space, and then the compilers typically generate sensible results if you do store a function pointer into a void *, since it can be quite useful.
Besides everything else that was already said by the other users, a void* it's commonly used in callback definitions. This allows your callback to receive user data of any type, including your own defined objects/structs, which should be casted to the appropriate type before using it:
void my_player_cb(int reason, void* data)
{
Player_t* player = (Player_t*)data;
if (reason == END_OF_FILE)
{
if (player->playing)
{
// execute stop(), release allocated resources and
// start() playing the next file on the list
}
}
}
void* can point to an address in memory but the syntax has no type-information. you can cast it to any pointer-type you want but it is your responsibility that that type matches the semantics of the data.
I would look this up, but honestly I wouldn't know where to start because I don't know what it is called. I've seen variables passed to functions like this:
myFunction((void**)&variable);
Which confuses the heck out of me cause all of those look familiar to me; I've just never seen them put together like that before.
What does it mean? I am a newb so the less jargon, the better, thanks!
void* is a "pointer to anything". void ** is another level of indirection - "pointer to pointer to anything". Basically, you pass that in when you want to allow the function to return a pointer of any type.
&variable takes the address of variable. variable should already be some kind of a pointer for that to work, but it's probably not void * - it might be, say int *, so taking its address would result in a int **. If the function takes void ** then you need to cast to that type.
(Of course, it needs to actually return an object of the right type, otherwise calling code will fail down the track when it tries to use it the wrong way.)
Take it apart piece by piece...
myFunction takes a pointer to a pointer of type void (which pretty much means it could point to anything). It might be declared something like this:
myFunction(void **something);
Anything you pass in has to have that type. So you take the address of a pointer, and cast it with (void**) to make it be a void pointer. (Basically stripping it of any idea about what it points to - which the compiler might whine about otherwise.)
This means that &variable is the address (& does this) of a pointer - so variable is a pointer. To what? Who knows!
Here is a more complete snippet, to give an idea of how this fits together:
#include <stdio.h>
int myInteger = 1;
int myOtherInt = 2;
int *myPointer = &myInteger;
myFunction(void **something){
*something = &myOtherInt;
}
main(){
printf("Address:%p Value:%d\n", myPointer, *myPointer);
myFunction((void**)&myPointer);
printf("Address:%p Value:%d\n", myPointer, *myPointer);
}
If you compile and run this, it should give this sort of output:
Address:0x601020 Value:1
Address:0x601024 Value:2
You can see that myFunction changed the value of myPointer - which it could only do because it was passed the address of the pointer.
It's a cast to a pointer to a void pointer.
You see this quite often with functions like CoCreateInstance() on Windows systems.
ISomeInterface* ifaceptr = 0;
HRESULT hr = ::CoCreateInstance(CLSID_SomeImplementation, NULL, CLSCTX_ALL,
IID_ISomeInterface, (void**)&ifaceptr);
if(SUCCEEDED(hr))
{
ifaceptr->DoSomething();
}
The cast converts the pointer to an ISomeInterface pointer into a pointer to a void pointer so that CoCreateInstance() can set ifaceptr to a valid value.
Since it is a pointer to a void pointer, the function can output pointers of any type, depending on the interface ID (such as IID_ISomeInterface).
It's a pointer to a pointer to a variable with an unspecified type. All pointers are the same size, so void* just means "a pointer to something but I have no idea what it is". A void** could also be a 2D array of unspecified type.
That casts &variable to a void** (that is, a pointer to a pointer to void).
For example, if you have something along the lines of
void myFunction(void** arg);
int* variable;
This passes the address of variable (that's what the unary-& does, it takes the address) to myFunction().
The variable is a pointer to something of undefined (void) type. The & operator returns the address of that variable, so you now have a pointer to a pointer of something. The pointer is therefore passed into the function by reference. The function might have a side effect which changes the memory referenced by that pointer. In other words, calling this function might change the something that the original pointer is referencing.
Embarrassing though it may be I know I am not the only one with this problem.
I have been using C/C++ on and off for many years. I never had a problem grasping the concepts of addresses, pointers, pointers to pointers, and references.
I do constantly find myself tripping over expressing them in C syntax, however. Not the basics like declarations or dereferencing, but more often things like getting the address of a pointer-to-pointer, or pointer to reference, etc. Essentially anything that goes a level or two of indirection beyond the norm. Typically I fumble with various semi-logical combinations of operators until I trip upon the correct one.
Clearly somewhere along the line I missed a rule or two that simplifies and makes it all fall into place. So I guess my question is: do you know of a site or reference that covers this matter with clarity and in some depth?
I don't know of any website but I'll try to explain it in very simple terms. There are only three things you need to understand:
variable will contain the contents of the variable. This means that if the variable is a pointer it will contain the memory address it points to.
*variable (only valid for pointers) will contain the contents of the variable pointed to. If the variable it points to is another pointer, ptr2, then *variable and ptr2 will be the same thing; **variable and *ptr2 are the same thing as well.
&variable will contain the memory address of the variable. If it's a pointer, it will be the memory address of the pointer itself and NOT the variable pointed to or the memory address of the variable pointed to.
Now, let's see a complex example:
void **list = (void **)*(void **)info.List;
list is a pointer to a pointer. Now let's examine the right part of the assignment starting from the end: (void **)info.List. This is also a pointer to a pointer.
Then, you see the *: *(void **)info.List. This means that this is the value the pointer info.List points to.
Now, the whole thing: (void **)*(void **)info.List. This is the value the pointer info.List points to casted to (void **).
I found the right-left-right rule to be useful. It tells you how to read a declaration so that you get all the pointers and references in order. For example:
int *foo();
Using the right-left-right rule, you can translate this to English as "foo is a function that returns a pointer to an integer".
int *(*foo)(); // "foo is a pointer to a function returning a pointer to an int"
int (*foo[])(); // "foo is an array of pointers to functions returning ints"
Most explanations of the right-left-right rule are written for C rather than C++, so they tend to leave out references. They work just like pointers in this context.
int &foo; // "foo is a reference to an integer"
Typedefs can be your friend when things get confusing. Here's an example:
typedef const char * literal_string_pointer;
typedef literal_string_pointer * pointer_to_literal_string_pointer;
void GetPointerToString(pointer_to_literal_string_pointer out_param)
{
*out_param = "hi there";
}
All you need to know is that getting the address of an object returns a pointer to that object, and dereferencing an object takes a pointer and turns it into to object that it's pointing to.
T x;
A a = &x; // A is T*
B b = *a; // B is T
C c = &a; // C is T**
D d = *c; // D is T*
Essentially, the & operator takes a T and gives you a T* and the * operator takes a T* and gives you a T, and that applies to higher levels of abstraction equally e.g.
using & on a T* will give you a T**.
Another way of thinking about it is that the & operator adds a * to the type and the * takes one away, which leads to things like &&*&**i == i.
I'm not sure exactly what you're looking for, but I find it helpful to remember the operator precedence and associativity rules. That said, if you're ever confused, you might as well throw in some more parens to disambiguate, even if it's just for your benefit and not the compiler's.
edit: I think I might understand your question a little better now. I like to think of a chain of pointers like a stack with the value at the bottom. The dereferencing operator (*) pops you down the stack, where you find the value itself at the end. The reference operator (&) lets you push another level of indirection onto the stack. Note that it's always legal to move another step away, but attempting to dereference the value is analogous to popping an empty stack.