I am a beginner learning C++. Today I saw a function pointer expression like this:
(*(int (*)())a)()
I was very confused about what this means and how I can understand it easily.
Let's add a typedef, to help make heads or tails out of it:
typedef int (*int_func_ptr)();
(*(int_func_ptr)a)();
So a is being cast to a function pointer of a particular prototype, dereferenced (which is redundant), and then called.
int (*)() is a function pointer type that returns int and accepts no parameters.
I presume that a is a function pointer whose type "erases" the actual type (perhaps so that one can store a bunch of different function pointers in a vector) and that we need to cast back to this pointer type, so (int(*)())a performs that cast.
Afterwards we want to call the function, so the provided code dereferences the pointer with * and then calls it with parentheses ().
Example
I have a function foo that looks like this:
int foo()
{
std::cout << "foo\n";
return 1;
}
And then via reinterpret_cast I get a pointer to a function that instead returns void (for type-erasure reasons):
void(*fptr)() = reinterpret_cast<void(*)()>(&::foo); //§5.2.10/6
Later, I want to call that function, so I need to re-cast it back to its original type, and then call it:
(*(int (*)())fptr)(); // prints `foo`
Dereferencing it is actually unnecessary, and the following is equivalent:
((int (*)())fptr)();
The explanation for why they're equivalent boils down to "the standard says that both a function type and a function pointer type are callable".
If you're standard-savvy, you can check out §5.2.2 [expr.call], which states:
A function call is a postfix expression followed by parentheses containing a possibly empty, comma-separated list of initializer-clauses which constitute the arguments to the function. The postfix expression shall have function type or pointer to function type
StoryTeller and Andy have given correct answers. I'll give general rules, additionally.
StoryTeller makes a correct and useful typedef with typedef int (*int_func_ptr)();, which defines a function pointer type. There are two things to remember here.
The general language design for typedefs: it exactly mimics a declaration of an object of the given type! Simply prefixing a declaration with typedef makes the declared identifier a type alias instead of a variable. That is, if int i; declares an integer variable, typedef int i; declares i as a synonym for the type int. So declaring a function pointer variable would simply read int (*int_func_ptr)();. Prefixing this with typedef, as StoryTeller did, makes it a type alias instead.
Casts of function pointers are notoriously confusing. One reason is the necessary parentheses:
Parentheses serve several unrelated purposes:
They group expressions to indicate subexpression precedence, as in (a+b) * c.
They delimit function arguments, both in declarations and in calls.
They delimit type names used in casts.
We have parentheses for all three purposes here!
The operator precedence is "unnatural" for function pointer declarations. This is so, of course, because the precedence is natural for the much more frequent uses: without parentheses, the declaration would be the more familiar-looking int *int_func();, which declares a function proper that returns an int pointer. The reason is that the argument parentheses have higher priority than the dereferencing asterisk, so that in order to infer the type we mentally execute the call first, and not the dereferencing. And something that can be called is a function.1 The result of the call can be dereferenced, and that result is an int.
Compare that to the original int (*int_func_ptr)();: The additional parentheses force us to dereference first, so the identifier must be a pointer of some kind. The result of the dereferencing can be called, so it must be a function; the result of the call is an int.
Another reason why function pointer declarations or typedefs look unnatural is that the declared identifier tends to be at the center of the expression. The reason is that operators to the left and to the right of the identifier are applied (the dereferencing, the function call, and then there is finally the result type declaration all the way to the left).
The next rule is about constructing casts. The type names used in casts are constructed from corresponding variable declarations simply by omitting the variable name! This is obvious in the simple cases: since int i declares an int variable, (int) without the i is the corresponding cast.
If we apply that to the function pointer type, int (*int_func_ptr)() is transformed to the weird-looking (int (*)()) by omitting the variable name and putting the type name in parentheses as required for a cast. Note that the parentheses which force precedence of the asterisk are still there, even though there is nothing to dereference! Without them, (int *()) would be derived from int *int_func() and therefore denote a function which returns a pointer.2
It is perhaps surprising that there is exactly one place in a declaration where a variable name can syntactically be, so that even very complicated type expressions in casts are well-defined: It is this one place where a variable name fits which defines the cast type.
With these rules, let's re-examine the original expression:
(*(int (*)())a)()
On the outermost level we have two pairs of parentheses. The second pair is empty and thus must be a function call operator. That implies that the operand to the left has function type:
*(int (*)())a
The operand is an expression in parentheses for precedence. It has three parts: the asterisk, an expression in parentheses, and an identifier a. Since there is no operator between the parenthesized expression and the variable a, it must be a cast, actually the one we scrutinized above. * and the type cast have the same precedence and are evaluated right-to-left: first a is cast to a function pointer, and then * dereferences that in order to obtain the "function proper" (functions are not objects in C++). This fits because the function call operator from above will be applied to this result.
1 That C permits calling function pointers directly as well, without dereferencing first, is syntactic sugar and not considered in declarations.
2 While the expression is syntactically valid, such a cast to function is not allowed in C or C++.
Related
There's the well known spiral and right-left rules etc. for reading complicated C++ types such as
int (*(*foo)(char *,double))[9][20];
foo is a pointer (change direction, move out of parentheses)
to a function taking a pointer to char and a double, and returning (change direction again)
a pointer to (parentheses again, change direction)
a 2d-array of dimensions 9,20 of (reached right end, spiral outside to the left)
integers.
But how do I deal with types like this if there is no identifier, such as when defining a function parameter's type:
void foo(int *(*(* )(int(* )(int (* )(int))))())
                  ^       ^        ^
                  identifiers omitted
How do I identify the innermost element in an intuitive way?
By the way, even the cdecl tool gives a syntax error on this last example, but it does compile.
Surely, the compiler has a well-defined way of parsing gibberish like this. How does it know where to start?
The rules for reading complex types assume that the type has already been parsed (and you know where the "innermost" point is). The rules for parsing work from the outside in, the same as reading complex expressions in math class. When you hit parentheses, give it a name and come back to it (unless it's simple enough to handle on its own). Disclaimer: I used a text editor to locate matching parentheses. ;)
The other consideration that comes into play with this declaration is that once a type is a function, the mess in the parameter list is a separate parse. For example, when parsing void (*)(big old mess), you have a pointer to a function. The big old mess is needed for the function's signature but not for understanding that you are dealing with a function.
Moving on to the example at hand:
void foo(int *(*(* )(int(* )(int (* )(int))))())
After reading void and foo, you hit parentheses with a complex mess inside. Give that mess a name.
void foo( A )
where A is int *(*(* )(int(* )(int (* )(int))))(). So your outermost parse is a unary function returning void, and we still need to parse the parameter, A. Note that we already know what the overall purpose of this text is: it declares a function named foo. The remaining types have no name because names for parameters are optional.
A: int *( B )()
where B is *(* )(int(* )(int (* )(int))). So the parameter to foo is something whose outermost type is a nullary function that returns a pointer to int. Presumably we will discover that the "something" is a pointer, but we still need to parse B to confirm that. (OK, skip ahead a little and see that B starts with an asterisk. It's a pointer to this nullary function.)
B: *(* )( C )
where C is int(* )(int (* )(int)). This is a pointer to a unary function whose parameter is some complex type, and whose return value is a pointer to what we parsed previously (the nullary function). As with the initial parse, we have discovered another place to start reading, as the mess has been pushed into the parameter list. The parameter to foo is a pointer to a unary function whose parameter is some complex type, and whose return value is a pointer to a nullary function that returns a pointer to int.
C: int(* )( D )
where D is int (* )(int). Again, the mess has moved into the parameter list. The thing at this level is a pointer to a unary function that returns an int.
D: int (* )(int)
Finally, simplicity: a pointer to a unary function that takes an int and returns an int.
So....
This declares foo to be a unary function returning void whose parameter is a pointer to a unary function, returning a pointer to a nullary function returning a pointer to int, whose parameter is a pointer to a unary function returning int whose parameter is a pointer to a unary function returning int whose parameter is int.
The English version is about as comprehensible as the code, no? :) Let's try something more structured.
foo
  Returns: void
  Parameter: pointer to a function
    Returns: pointer to a function
      Returns: pointer to int
    Parameter: pointer to a function
      Returns: int
      Parameter: pointer to a function
        Returns: int
        Parameter: int
Whew. Don't do this at home, kids. Give your intermediate types names and save the programmers some grief.
It's interesting that using the function name as a function pointer is equivalent to applying the address-of operator to the function name!
Here's the example.
typedef bool (*FunType)(int);
bool f(int);
int main() {
FunType a = f;
FunType b = &a; // Sure, here's an error.
FunType c = &f; // This is not an error, though.
// It's equivalent to the statement without "&".
// So we have c equals a.
return 0;
}
Using the bare name as a pointer is something we already know from arrays. But you can't write something like
int a[2];
int * b = &a; // Error!
This seems inconsistent with other parts of the language. What's the rationale for this design?
This question explains the semantics of such behavior and why it works. But I'm interested in why the language was designed this way.
What's more interesting is that a function type is implicitly converted to a pointer to itself when used as a parameter type, but is not converted when used as a return type!
Example:
typedef bool FunctionType(int);
void g(FunctionType); // Implicitly converted to void g(FunctionType *).
FunctionType h(); // Error!
FunctionType * j(); // Return a function pointer to a function
// that has the type of bool(int).
Since you specifically ask for the rationale of this behavior, here's the closest thing I can find (from the ANSI C90 Rationale document - http://www.lysator.liu.se/c/rat/c3.html#3-3-2-2):
3.3.2.2 Function calls
Pointers to functions may be used either as (*pf)() or as pf().
The latter construct, not sanctioned in the Base Document, appears in
some present versions of C, is unambiguous, invalidates no old code,
and can be an important shorthand. The shorthand is useful for
packages that present only one external name, which designates a
structure full of pointers to objects and functions: member
functions can be called as graphics.open(file) instead of
(*graphics.open)(file). The treatment of function designators can
lead to some curious, but valid, syntactic forms. Given the
declarations:
int f(), (*pf)();
then all of the following expressions are valid function calls:
(&f)(); f(); (*f)(); (**f)(); (***f)();
pf(); (*pf)(); (**pf)(); (***pf)();
The first expression on each line was discussed in the previous
paragraph. The second is conventional usage. All subsequent
expressions take advantage of the implicit conversion of a function
designator to a pointer value, in nearly all expression contexts.
The Committee saw no real harm in allowing these forms; outlawing
forms like (*f)(), while still permitting *a (for int a[]),
simply seemed more trouble than it was worth.
Basically, the equivalence between function designators and function pointers was added to make using function pointers a little more convenient.
It's a feature inherited from C.
In C, it's allowed primarily because there's not much else the name of a function, by itself, could mean. All you can do with an actual function is call it. If you're not calling it, the only thing you can do is take the address. Since there's no ambiguity, any time a function name isn't followed by a ( to signify a call to the function, the name evaluates to the address of the function.
That actually is somewhat similar to one other part of the language -- the name of an array evaluates to the address of the first element of the array except in some fairly limited circumstances (being used as the operand of & or sizeof).
Since C allowed it, C++ does as well, mostly because the same remains true: the only things you can do with a function are call it or take its address, so if the name isn't followed by a ( to signify a function call, then the name evaluates to the address with no ambiguity.
For arrays, there is no pointer decay when the address-of operator is used:
int a[2];
int * p1 = a; // No address-of operator, so type is int*
int (*p2)[2] = &a; // Address-of operator used, so type is int (*)[2]
This makes sense because arrays and pointers are different types, and it is possible for example to return references to arrays or pass references to arrays in functions.
However, with functions, what other type could be possible?
void foo(){}
&foo; // #1
foo; // #2
Let's imagine that only #2 gives the type void(*)(), what would the type of &foo be? There is no other possibility.
I'm not sure if this is a proper programming question, but it's something that has always bothered me, and I wonder if I'm the only one.
When initially learning C++, I understood the concept of references, but pointers had me confused. Why, you ask? Because of how you declare a pointer.
Consider the following:
void foo(int* bar)
{
}
int main()
{
int x = 5;
int* y = NULL;
y = &x;
*y = 15;
foo(y);
}
The function foo(int*) takes an int pointer as its parameter. Since I've declared y as an int pointer, I can pass y to foo. But when first learning C++ I associated the * symbol with dereferencing, so I figured a dereferenced int needed to be passed. I would try to pass *y into foo, which obviously doesn't work.
Wouldn't it have been easier to have a separate operator for declaring a pointer? (or for dereferencing). For example:
void test(int# x)
{
}
In The Development of the C Language, Dennis Ritchie explains his reasoning thusly:
The second innovation that most clearly distinguishes C from its
predecessors is this fuller type structure and especially its
expression in the syntax of declarations... given an object of any
type, it should be possible to describe a new object that gathers
several into an array, yields it from a function, or is a pointer to
it.... [This] led to a
declaration syntax for names mirroring that of the expression syntax
in which the names typically appear. Thus,
int i, *pi, **ppi; declare an integer, a pointer to an integer, a
pointer to a pointer to an integer. The syntax of these declarations
reflects the observation that i, *pi, and **ppi all yield an int type
when used in an expression.
Similarly, int f(), *f(), (*f)(); declare
a function returning an integer, a function returning a pointer to an
integer, a pointer to a function returning an integer. int *api[10],
(*pai)[10]; declare an array of pointers to integers, and a pointer to
an array of integers.
In all these cases the declaration of a
variable resembles its usage in an expression whose type is the one
named at the head of the declaration.
An accident of syntax contributed to the perceived complexity of the
language. The indirection operator, spelled * in C, is syntactically a
unary prefix operator, just as in BCPL and B. This works well in
simple expressions, but in more complex cases, parentheses are
required to direct the parsing. For example, to distinguish
indirection through the value returned by a function from calling a
function designated by a pointer, one writes *fp() and (*pf)()
respectively. The style used in expressions carries through to
declarations, so the names might be declared
int *fp(); int (*pf)();
In more ornate but still realistic cases,
things become worse: int *(*pfp)(); is a pointer to a function
returning a pointer to an integer.
There are two effects occurring.
Most important, C has a relatively rich set of ways of describing
types (compared, say, with Pascal). Declarations in languages as
expressive as C—Algol 68, for example—describe objects equally hard to
understand, simply because the objects themselves are complex. A
second effect owes to details of the syntax. Declarations in C must be
read in an `inside-out' style that many find difficult to grasp.
Sethi [Sethi 81] observed that many of the nested
declarations and expressions would become simpler if the indirection
operator had been taken as a postfix operator instead of prefix, but
by then it was too late to change.
The reason is clearer if you write it like this:
int x, *y;
That is, both x and *y are ints. Thus y is an int *.
That is a language decision that predates C++, as C++ inherited it from C. I once heard that the motivation was that the declaration and the use would be equivalent, that is, given a declaration int *p; the expression *p is of type int in the same way that with int i; the expression i is of type int.
Because the committee, and those that developed C++ in the decades before its standardisation, decided that * should retain its original three meanings:
A pointer type
The dereference operator
Multiplication
You're right to suggest that the multiple meanings of * (and, similarly, &) are confusing. I've been of the opinion for some years that they are a significant barrier to understanding for language newcomers.
Why not choose another symbol for C++?
Backwards-compatibility is the root cause... best to re-use existing symbols in a new context than to break C programs by translating previously-not-operators into new meanings.
Why not choose another symbol for C?
It's impossible to know for sure, but there are several arguments that can be — and have been — made. Foremost is the idea that:
when [an] identifier appears in an expression of the same form as the declarator, it yields an object of the specified type. {K&R, p216}
This is also why C programmers tend to[citation needed] prefer aligning their asterisks to the right rather than to the left, i.e.:
int *ptr1; // roughly C-style
int* ptr2; // roughly C++-style
though both varieties are found in programs of both languages, varyingly.
Page 65 of Expert C Programming: Deep C Secrets includes the following: And then, there is the C philosophy that the declaration of an object should look like its use.
Page 216 of The C Programming Language, 2nd edition (aka K&R) includes: A declarator is read as an assertion that when its identifier appears in an expression of the same form as the declarator, it yields an object of the specified type.
I prefer the way van der Linden puts it.
Haha, I feel your pain, I had the exact same problem.
I thought a pointer should be declared as &int because it makes sense that a pointer is an address of something.
After a while I worked out for myself that every type in C has to be read backwards, like
int * const a
is for me
a constant something, when dereferenced equals an int.
Something that has to be dereferenced, has to be a pointer.
static void increment(long long *n){
(*n)++;
}
struct test{
void (*work_fn)(long long *);
};
struct test t1;
t1.work_fn = increment;
How do I actually call the function now? t1.work_fn(&n)?
That'll work just fine.
Function pointers don't need to be explicitly dereferenced. This is because even when calling a function normally (using the actual name of the function), you're really calling it through the pointer to the function. C99 6.5.2.2 "Function calls" says (emphasis mine):
The expression that denotes the called function (footnote 77) shall have type pointer to function returning void or returning an object type other than an array type
Footnote 77:
Most often, this is the result of converting an identifier that is a function designator.
Note that you still can dereference the function pointer (or a normal function name - though I think you'd cause much confusion doing so) to call a function because C99 6.5.3.2/4 "Address and indirection operators" says:
The unary * operator denotes indirection. If the operand points to a function, the result is a function designator
So all of these will end up doing the same thing (though the compiler might not be able to optimize the calls through t1.work_fn as well):
t1.work_fn(&n);
(*t1.work_fn)(&n);
increment(&n);
(*increment)(&n);
You can call it as t1.work_fn(&n) or as (*t1.work_fn)(&n), whichever you prefer.
Symmetrically, when assigning the pointer you can do either t1.work_fn = increment or t1.work_fn = &increment. Again, it is a matter of personal coding style.
One can probably argue that for the sake of consistency one should stick to either "minimalistic" style
t1.work_fn = increment;
t1.work_fn(&n);
or to a "maximalistic" style
t1.work_fn = &increment;
(*t1.work_fn)(&n);
but not a mix of the two, so that we can have well-defined holy wars between two distinctive camps instead of four.
P.S. Of course, the "minimalistic" style is the only proper one. And one must crack eggs on the pointy end.
Yes, that's how to call it. Function names and variables containing function pointers are essentially the same thing.