Related
I am the beginner of learning C++. Today, I saw a pointer function like that
(*(int (*)())a)()
I was very confused with what the meaning of this and how I can understand it easily.
Let's add a typedef, to help make heads or tails out of it:
typedef int (*int_func_ptr)();
(*(int_func_ptr)a)();
So a is being cast to a function pointer of a particular prototype, dereferenced (which is redundant), and then called.
int (*)() is a function pointer type that returns int and accepts no parameters.
I presume that a is a function pointer whose type "erases" the actual type (perhaps so that one can store a bunch of different function pointers in a vector) that we need to cast to this pointer type, so (int(*)())a) will perform that casting.
Afterwards we want to call the function, so the provided code dereferences the pointer * and then calls it with parenthesis ()
Example
I have a function foo that looks like this:
int foo()
{
std::cout << "foo\n";
return 1;
}
And then via reinterpret_cast I get a pointer to a function that instead returns void (for type-erasure reasons):
void(*fptr)() = reinterpret_cast<void(*)()>(&::foo); //§5.2.10/6
Later, I want to call that function, so I need to re-cast it back to its original type, and then call it:
(*(int (*)())fptr)(); // prints `foo`
Demo
De-referencing it is actually unecessary and the following is equivalent:
((int (*)())fptr)();
The explanation for why they're equivalent boils down to "The standard says that both a function type and a function pointer type can be callable"
If you're standard savvy, you can check out §5.2.2[expr.call] that states
A function call is a postfix expression followed by parentheses containing a possibly empty, comma-separated list of initializer-clauses which constitute the arguments to the function. The postfix expression shall have function type or pointer to function type
StoryTeller and Andy have given correct answers. I'll give general rules, additionally.
StoryTeller makes a correct and useful typedef with typedef int (*int_func_ptr)();, which defines a function pointer type. Two things are to remember here.
The general language design for typedefs: it exactly mimics a declaration of an object of the given type! Simply prefixing a declaration with typedef makes the declared identifier a type alias instead of a variable. That is, if int i; declares an integer variable, typedefint i; declares i as a synonym for the type int. So declaring a function pointer variable would simply read int (*int_func_ptr)();. Prefixing this with the typedef, as StoryTeller did, makes it a type alias instead.
Casts of function pointers are notoriously confusing. One reason are the necessary parentheses:
Parentheses serve several unrelated purposes:
They group expressions to indicate subexpression precedence, as in (a+b) * c.
They delimit function arguments, both in declarations and in calls.
They delimit type names used in casts.
We have parentheses for all three purposes here!
The operator precedence is "unnatural" for function pointer declarations. This is so, of course, because they are natural for the much more frequent uses: Without parentheses, the declaration would be the more familiar looking int *int_func();, which declares a function proper which returns an int pointer. The reason is that the argument parentheses have higher priority than the dereferencing asterisk, so that in order to infer the type we mentally execute the call first, and not the dereferencing. And something that can be called is a function.1 The result of the call can be dereferenced, and that result is an int.
Compare that to the original int (*int_func_ptr)();: The additional parentheses force us to dereference first, so the identifier must be a pointer of some kind. The result of the dereferencing can be called, so it must be a function; the result of the call is an int.
Another reason why function pointer declarations or typedefs look unnatural is that the declared identifier tends to be at the center of the expression. The reason is that operators to the left and to the right of the identifier are applied (the dereferencing, the function call, and then there is finally the result type declaration all the way to the left).
The next rule is about constructing casts. The type names used in casts are constructed from corresponding variable declarations simply by omitting the variable name! This is obvious in the simple cases: since int i declares an int variable, (int) without the i is the corresponding cast.
If we apply that to the function pointer type, int (*int_func_ptr)() is transformed to the weird-looking (int (*)()) by omitting the variable name and putting the type name in parentheses as required for a cast. Note that the parentheses which force precedence of the asterisk are still there, even though there is nothing to dereference! Without them, (int *()) would be derived from int *int_func() and therefore denote a function which returns a pointer.2
It is perhaps surprising that there is exactly one place in a declaration where a variable name can syntactically be, so that even very complicated type expressions in casts are well-defined: It is this one place where a variable name fits which defines the cast type.
With these rules, let's re-examine the original expression:
(*(int (*)())a)()
On the outermost level we have two pairs of parentheses. The second pair is empty and thus must be a function call operator. That implies that the operand to the left has function type:
*(int (*)())a
The operand is an expression in parentheses for precedence. It has three parts: The asterisk, an expression in parentheses, and an identifier a. Since there is no operator between the parenthesized expression and the variable a, it must be a cast, actually the one we scrutinized above. * and the type cast have the same precedence and are evaluated right-to-left: first a is cast to a function pointer, and then * dereferences that in order to obtain the "function proper" (which are no real objects in C++). This fits because the function call operator from above will be applied to this result.
1 That C permits calling function pointers directly as well, without dereferencing first, is syntactic sugar and not considered in declarations.
2 While the expression is syntactically valid, such a cast to function is not allowed in C or C++.
So in "the c++ programming language, 4th edition", there's a paragraph I don't understand about conversion of pointer-to-function types. Here is some of the code sample.
using P1 = int(*)(int*);
using P2 = void(*)(void);
void f(P1 pf) {
P2 pf2 = reinterpret_cast<P2>(pf);
pf2(); // likely serious problem
// other codes
}
When I run this it crashed.
I'm not sure if I am right, but I initially think the "likely serious problem" comment is when pf got casted to P2 in pf2, I think pf2 is not pointing to anything? Because when I created a function that matches P2's type and point pf2 to it, it didn't crash and runs normally.
After the code, I read this:
We need the nastiest of casts, reinterpret_cast, to do conversion of pointer-to-function types. The reason is that the result of using a pointer to function of the wrong type is so unpredictable and system-dependent. For example, in the example above, the called function may write to the object pointed to by its argument, but the call pf2() didn’t supply any argument!
Now I'm completely lost starting from "For example, in the example above" part:
"may write to the object pointed to by its argument" //what object is it exactly?
"but the call pf2() didn’t supply any argument!" //"using P2 = void(*)(void);" doesn't really need an arguement does it?
I think I'm missing something here. Can someone explain this?
For example, in the example above, the called function may write to the object pointed to by its argument (...)
pf is a pointer to a function like this:
int foo(int* intPtr)
{
// ...
}
So it could be implemented to write to its argument:
int foo(int* intPtr)
{
*intPtr = 42; // writing to the address given as argument
return 0;
}
(...) but the call pf2() didn’t supply any argument!
When you call foo through its cast to type P2, it will be called without arguments, so it is unclear what intPtr will be:
P2 pf2 = reinterpret_cast<P2>(pf);
pf2(); // no argument given here, although pf2 really is foo() and expects one!
Writing to it will most likely corrupt something.
Moreover, compilers usually implement calls to functions that return something by reserving space for the return value first, that will then be filled by the function call. When you call a P1 using the signature of P2, the call to P2 won't reserve space (as the return value is void) and the actual call will write an int somewhere it should not, which is another source for corruption.
Now I'm completely lost starting from "For example, in the example
above" part:
"may write to the object pointed to by its argument" //what object is
it exactly?
P1 is a function expecting a non-const pointer-to-int argument. That means it very well may write to the int referenced in its argument.
"but the call pf2() didn’t supply any argument!" //"using P2 =
void(*)(void);" doesn't really need an arguement does it?
When you call the function through another function pointer type passing no argument, the expectations of the called function aren't met. It may try to interpret whatever is on the stack as an int pointer and write to it, causing undefined behavior.
This does fail, but not necessarily in the way one might expect.
The implementation of a function pointer is left up to the compiler (undefined). Even the size of a function pointer can be bigger than a void*.
What is guaranteed about the size of a function pointer?
There is no guarentees about anything in the value of the function pointer. In fact, the only even guarentee that the comparison operators will work between function pointers of the same type.
Comparing function pointers
The standard does provide that function pointers can store the values of other function types.
Casting the function pointer to another type undefined behavior, meaning the compiler can do whatever it wants. Whether or not you supply the argument really doesn't matter, and how that would fail depends on the calling convention of the system. As far as your concerned, it could allow "demons to fly out of your nose".
Casting a function pointer to another type
So that brings us back to the statement by the author:
We need the nastiest of casts, reinterpret_cast, to do conversion of pointer-to-function types. The reason is that the result of using a pointer to function of the wrong type is so unpredictable and system-dependent. For example, in the example above, the called function may write to the object pointed to by its argument, but the call pf2() didn’t supply any argument!
That is trying to make the point that with no argument specified, if the function writes the output, it will write to some uninitialized state. Basically, if you look at the function as
int foo(int* arg) {*arg=10;}
if you didn't initialize arg, the author says you could be writing anywhere. But again, there is no guarentee that this even matters. The system could store functions with the footprint int (*)(int*) and void(*)(void) in completely different memory space, in which case instead of the above problem you'd have a jump into a random location in the program. Undefined behavior is just that: undefined.
Just don't do it man.
I'm not sure if this is a proper programming question, but it's something that has always bothered me, and I wonder if I'm the only one.
When initially learning C++, I understood the concept of references, but pointers had me confused. Why, you ask? Because of how you declare a pointer.
Consider the following:
void foo(int* bar)
{
}
int main()
{
int x = 5;
int* y = NULL;
y = &x;
*y = 15;
foo(y);
}
The function foo(int*) takes an int pointer as parameter. Since I've declared y as int pointer, I can pass y to foo, but when first learning C++ I associated the * symbol with dereferencing, as such I figured a dereferenced int needed to be passed. I would try to pass *y into foo, which obviously doesn't work.
Wouldn't it have been easier to have a separate operator for declaring a pointer? (or for dereferencing). For example:
void test(int# x)
{
}
In The Development of the C Language, Dennis Ritchie explains his reasoning thusly:
The second innovation that most clearly distinguishes C from its
predecessors is this fuller type structure and especially its
expression in the syntax of declarations... given an object of any
type, it should be possible to describe a new object that gathers
several into an array, yields it from a function, or is a pointer to
it.... [This] led to a
declaration syntax for names mirroring that of the expression syntax
in which the names typically appear. Thus,
int i, *pi, **ppi; declare an integer, a pointer to an integer, a
pointer to a pointer to an integer. The syntax of these declarations
reflects the observation that i, *pi, and **ppi all yield an int type
when used in an expression.
Similarly, int f(), *f(), (*f)(); declare
a function returning an integer, a function returning a pointer to an
integer, a pointer to a function returning an integer. int *api[10],
(*pai)[10]; declare an array of pointers to integers, and a pointer to
an array of integers.
In all these cases the declaration of a
variable resembles its usage in an expression whose type is the one
named at the head of the declaration.
An accident of syntax contributed to the perceived complexity of the
language. The indirection operator, spelled * in C, is syntactically a
unary prefix operator, just as in BCPL and B. This works well in
simple expressions, but in more complex cases, parentheses are
required to direct the parsing. For example, to distinguish
indirection through the value returned by a function from calling a
function designated by a pointer, one writes *fp() and (*pf)()
respectively. The style used in expressions carries through to
declarations, so the names might be declared
int *fp(); int (*pf)();
In more ornate but still realistic cases,
things become worse: int *(*pfp)(); is a pointer to a function
returning a pointer to an integer.
There are two effects occurring.
Most important, C has a relatively rich set of ways of describing
types (compared, say, with Pascal). Declarations in languages as
expressive as C—Algol 68, for example—describe objects equally hard to
understand, simply because the objects themselves are complex. A
second effect owes to details of the syntax. Declarations in C must be
read in an `inside-out' style that many find difficult to grasp.
Sethi [Sethi 81] observed that many of the nested
declarations and expressions would become simpler if the indirection
operator had been taken as a postfix operator instead of prefix, but
by then it was too late to change.
The reason is clearer if you write it like this:
int x, *y;
That is, both x and *y are ints. Thus y is an int *.
That is a language decision that predates C++, as C++ inherited it from C. I once heard that the motivation was that the declaration and the use would be equivalent, that is, given a declaration int *p; the expression *p is of type int in the same way that with int i; the expression i is of type int.
Because the committee, and those that developed C++ in the decades before its standardisation, decided that * should retain its original three meanings:
A pointer type
The dereference operator
Multiplication
You're right to suggest that the multiple meanings of * (and, similarly, &) are confusing. I've been of the opinion for some years that it they are a significant barrier to understanding for language newcomers.
Why not choose another symbol for C++?
Backwards-compatibility is the root cause... best to re-use existing symbols in a new context than to break C programs by translating previously-not-operators into new meanings.
Why not choose another symbol for C?
It's impossible to know for sure, but there are several arguments that can be — and have been — made. Foremost is the idea that:
when [an] identifier appears in an expression of the same form as the declarator, it yields an object of the specified type. {K&R, p216}
This is also why C programmers tend to[citation needed] prefer aligning their asterisks to the right rather than to the left, i.e.:
int *ptr1; // roughly C-style
int* ptr2; // roughly C++-style
though both varieties are found in programs of both languages, varyingly.
Page 65 of Expert C Programming: Deep C Secrets includes the following: And then, there is the C philosophy that the declaration of an object should look like its use.
Page 216 of The C Programming Language, 2nd edition (aka K&R) includes: A declarator is read as an assertion that when its identifier appears in an expression of the same form as the declarator, it yields an object of the specified type.
I prefer the way van der Linden puts it.
Haha, I feel your pain, I had the exact same problem.
I thought a pointer should be declared as &int because it makes sense that a pointer is an address of something.
After a while I thought for myself, every type in C has to be read backwards, like
int * const a
is for me
a constant something, when dereferenced equals an int.
Something that has to be dereferenced, has to be a pointer.
Let's say I have a function that accepts a void (*)(void*) function pointer for use as a callback:
void do_stuff(void (*callback_fp)(void*), void* callback_arg);
Now, if I have a function like this:
void my_callback_function(struct my_struct* arg);
Can I do this safely?
do_stuff((void (*)(void*)) &my_callback_function, NULL);
I've looked at this question and I've looked at some C standards which say you can cast to 'compatible function pointers', but I cannot find a definition of what 'compatible function pointer' means.
As far as the C standard is concerned, if you cast a function pointer to a function pointer of a different type and then call that, it is undefined behavior. See Annex J.2 (informative):
The behavior is undefined in the following circumstances:
A pointer is used to call a function whose type is not compatible with the pointed-to
type (6.3.2.3).
Section 6.3.2.3, paragraph 8 reads:
A pointer to a function of one type may be converted to a pointer to a function of another
type and back again; the result shall compare equal to the original pointer. If a converted
pointer is used to call a function whose type is not compatible with the pointed-to type,
the behavior is undefined.
So in other words, you can cast a function pointer to a different function pointer type, cast it back again, and call it, and things will work.
The definition of compatible is somewhat complicated. It can be found in section 6.7.5.3, paragraph 15:
For two function types to be compatible, both shall specify compatible return types127.
Moreover, the parameter type lists, if both are present, shall agree in the number of
parameters and in use of the ellipsis terminator; corresponding parameters shall have
compatible types. If one type has a parameter type list and the other type is specified by a
function declarator that is not part of a function definition and that contains an empty
identifier list, the parameter list shall not have an ellipsis terminator and the type of each
parameter shall be compatible with the type that results from the application of the
default argument promotions. If one type has a parameter type list and the other type is
specified by a function definition that contains a (possibly empty) identifier list, both shall
agree in the number of parameters, and the type of each prototype parameter shall be
compatible with the type that results from the application of the default argument
promotions to the type of the corresponding identifier. (In the determination of type
compatibility and of a composite type, each parameter declared with function or array
type is taken as having the adjusted type and each parameter declared with qualified type
is taken as having the unqualified version of its declared type.)
127) If both function types are ‘‘old style’’, parameter types are not compared.
The rules for determining whether two types are compatible are described in section 6.2.7, and I won't quote them here since they're rather lengthy, but you can read them on the draft of the C99 standard (PDF).
The relevant rule here is in section 6.7.5.1, paragraph 2:
For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.
Hence, since a void* is not compatible with a struct my_struct*, a function pointer of type void (*)(void*) is not compatible with a function pointer of type void (*)(struct my_struct*), so this casting of function pointers is technically undefined behavior.
In practice, though, you can safely get away with casting function pointers in some cases. In the x86 calling convention, arguments are pushed on the stack, and all pointers are the same size (4 bytes in x86 or 8 bytes in x86_64). Calling a function pointer boils down to pushing the arguments on the stack and doing an indirect jump to the function pointer target, and there's obviously no notion of types at the machine code level.
Things you definitely can't do:
Cast between function pointers of different calling conventions. You will mess up the stack and at best, crash, at worst, succeed silently with a huge gaping security hole. In Windows programming, you often pass function pointers around. Win32 expects all callback functions to use the stdcall calling convention (which the macros CALLBACK, PASCAL, and WINAPI all expand to). If you pass a function pointer that uses the standard C calling convention (cdecl), badness will result.
In C++, cast between class member function pointers and regular function pointers. This often trips up C++ newbies. Class member functions have a hidden this parameter, and if you cast a member function to a regular function, there's no this object to use, and again, much badness will result.
Another bad idea that might sometimes work but is also undefined behavior:
Casting between function pointers and regular pointers (e.g. casting a void (*)(void) to a void*). Function pointers aren't necessarily the same size as regular pointers, since on some architectures they might contain extra contextual information. This will probably work ok on x86, but remember that it's undefined behavior.
I asked about this exact same issue regarding some code in GLib recently. (GLib is a core library for the GNOME project and written in C.) I was told the entire slots'n'signals framework depends upon it.
Throughout the code, there are numerous instances of casting from type (1) to (2):
typedef int (*CompareFunc) (const void *a,
const void *b)
typedef int (*CompareDataFunc) (const void *b,
const void *b,
void *user_data)
It is common to chain-thru with calls like this:
int stuff_equal (GStuff *a,
GStuff *b,
CompareFunc compare_func)
{
return stuff_equal_with_data(a, b, (CompareDataFunc) compare_func, NULL);
}
int stuff_equal_with_data (GStuff *a,
GStuff *b,
CompareDataFunc compare_func,
void *user_data)
{
int result;
/* do some work here */
result = compare_func (data1, data2, user_data);
return result;
}
See for yourself here in g_array_sort(): http://git.gnome.org/browse/glib/tree/glib/garray.c
The answers above are detailed and likely correct -- if you sit on the standards committee. Adam and Johannes deserve credit for their well-researched responses. However, out in the wild, you will find this code works just fine. Controversial? Yes. Consider this: GLib compiles/works/tests on a large number of platforms (Linux/Solaris/Windows/OS X) with a wide variety of compilers/linkers/kernel loaders (GCC/CLang/MSVC). Standards be damned, I guess.
I spent some time thinking about these answers. Here is my conclusion:
If you are writing a callback library, this might be OK. Caveat emptor -- use at your own risk.
Else, don't do it.
Thinking deeper after writing this response, I would not be surprised if the code for C compilers uses this same trick. And since (most/all?) modern C compilers are bootstrapped, this would imply the trick is safe.
A more important question to research: Can someone find a platform/compiler/linker/loader where this trick does not work? Major brownie points for that one. I bet there are some embedded processors/systems that don't like it. However, for desktop computing (and probably mobile/tablet), this trick probably still works.
The point really isn't whether you can. The trivial solution is
void my_callback_function(struct my_struct* arg);
void my_callback_helper(void* pv)
{
my_callback_function((struct my_struct*)pv);
}
do_stuff(&my_callback_helper);
A good compiler will only generate code for my_callback_helper if it's really needed, in which case you'd be glad it did.
You have a compatible function type if the return type and parameter types are compatible - basically (it's more complicated in reality :)). Compatibility is the same as "same type" just more lax to allow to have different types but still have some form of saying "these types are almost the same". In C89, for example, two structs were compatible if they were otherwise identical but just their name was different. C99 seem to have changed that. Quoting from the c rationale document (highly recommended reading, btw!):
Structure, union, or enumeration type declarations in two different translation units do not formally declare the same type, even if the text of these declarations come from the same include file, since the translation units are themselves disjoint. The Standard thus specifies additional compatibility rules for such types, so that if two such declarations are sufficiently similar they are compatible.
That said - yeah strictly this is undefined behavior, because your do_stuff function or someone else will call your function with a function pointer having void* as parameter, but your function has an incompatible parameter. But nevertheless, i expect all compilers to compile and run it without moaning. But you can do cleaner by having another function taking a void* (and registering that as callback function) which will just call your actual function then.
As C code compiles to instruction which do not care at all about pointer types, it's quite fine to use the code you mention. You'd run into problems when you'd run do_stuff with your callback function and pointer to something else then my_struct structure as argument.
I hope I can make it clearer by showing what would not work:
int my_number = 14;
do_stuff((void (*)(void*)) &my_callback_function, &my_number);
// my_callback_function will try to access int as struct my_struct
// and go nuts
or...
void another_callback_function(struct my_struct* arg, int arg2) { something }
do_stuff((void (*)(void*)) &another_callback_function, NULL);
// another_callback_function will look for non-existing second argument
// on the stack and go nuts
Basically, you can cast pointers to whatever you like, as long as the data continue to make sense at run-time.
Well, unless I understood the question wrong, you can just cast a function pointer this way.
void print_data(void *data)
{
// ...
}
((void (*)(char *)) &print_data)("hello");
A cleaner way would be to create a function typedef.
typedef void(*t_print_str)(char *);
((t_print_str) &print_data)("hello");
If you think about the way function calls work in C/C++, they push certain items on the stack, jump to the new code location, execute, then pop the stack on return. If your function pointers describe functions with the same return type and the same number/size of arguments, you should be okay.
Thus, I think you should be able to do so safely.
Void pointers are compatible with other types of pointer. It's the backbone of how malloc and the mem functions (memcpy, memcmp) work. Typically, in C (Rather than C++) NULL is a macro defined as ((void *)0).
Look at 6.3.2.3 (Item 1) in C99:
A pointer to void may be converted to or from a pointer to any incomplete or object type
Is there some kind of subtle difference between those:
void a1(float &b) {
b=1;
};
a1(b);
and
void a1(float *b) {
(*b)=1;
};
a1(&b);
?
They both do the same (or so it seems from main() ), but the first one is obviously shorter, however most of the code I see uses second notation. Is there a difference? Maybe in case it's some object instead of float?
Both do the same, but one uses references and one uses pointers.
See my answer here for a comprehensive list of all the differences.
Yes. The * notation says that what's being pass on the stack is a pointer, ie, address of something. The & says it's a reference. The effect is similar but not identical:
Let's take two cases:
void examP(int* ip);
void examR(int& i);
int i;
If I call examP, I write
examP(&i);
which takes the address of the item and passes it on the stack. If I call examR,
examR(i);
I don't need it; now the compiler "somehow" passes a reference -- which practically means it gets and passes the address of i. On the code side, then
void examP(int* ip){
*ip += 1;
}
I have to make sure to dereference the pointer. ip += 1 does something very different.
void examR(int& i){
i += 1;
}
always updates the value of i.
For more to think about, read up on "call by reference" versus "call by value". The & notion gives C++ call by reference.
In the first example with references, you know that b can't be NULL. With the pointer example, b might be the NULL pointer.
However, note that it is possible to pass a NULL object through a reference, but it's awkward and the called procedure can assume it's an error to have done so:
a1(*(float *)NULL);
In the second example the caller has to prefix the variable name with '&' to pass the address of the variable.
This may be an advantage - the caller cannot inadvertently modify a variable by passing it as a reference when they thought they were passing by value.
Aside from syntactic sugar, the only real difference is the ability for a function parameter that is a pointer to be null. So the pointer version can be more expressive if it handles the null case properly. The null case can also have some special meaning attached to it. The reference version can only operate on values of the type specified without a null capability.
Functionally in your example, both versions do the same.
The first has the advantage that it's transparent on the call-side. Imagine how it would look for an operator:
cin >> &x;
And how it looks ugly for a swap invocation
swap(&a, &b);
You want to swap a and b. And it looks much better than when you first have to take the address. Incidentally, bjarne stroustrup writes that the major reason for references was the transparency that was added at the call side - especially for operators. Also see how it's not obvious anymore whether the following
&a + 10
Would add 10 to the content of a, calling the operator+ of it, or whether it adds 10 to a temporary pointer to a. Add that to the impossibility that you cannot overload operators for only builtin operands (like a pointer and an integer). References make this crystal clear.
Pointers are useful if you want to be able to put a "null":
a1(0);
Then in a1 the method can compare the pointer with 0 and see whether the pointer points to any object.
One big difference worth noting is what's going on outside, you either have:
a1(something);
or:
a1(&something);
I like to pass arguments by reference (always a const one :) ) when they are not modified in the function/method (and then you can also pass automatic/temporary objects inside) and pass them by pointer to signify and alert the user/reader of the code calling the method that the argument may and probably is intentionally modified inside.