Related
I've recently decided that I just have to finally learn C/C++, and there is one thing I do not really understand about pointers or more precisely, their definition.
How about these examples:
int* test;
int *test;
int * test;
int* test,test2;
int *test,test2;
int * test,test2;
Now, to my understanding, the first three cases are all doing the same: Test is not an int, but a pointer to one.
The second set of examples is a bit more tricky. In case 4, both test and test2 will be pointers to an int, whereas in case 5, only test is a pointer, whereas test2 is a "real" int. What about case 6? Same as case 5?
4, 5, and 6 are the same thing, only test is a pointer. If you want two pointers, you should use:
int *test, *test2;
Or, even better (to make everything clear):
int* test;
int* test2;
White space around asterisks have no significance. All three mean the same thing:
int* test;
int *test;
int * test;
The "int *var1, var2" is an evil syntax that is just meant to confuse people and should be avoided. It expands to:
int *var1;
int var2;
Many coding guidelines recommend that you only declare one variable per line. This avoids any confusion of the sort you had before asking this question. Most C++ programmers I've worked with seem to stick to this.
A bit of an aside I know, but something I found useful is to read declarations backwards.
int* test; // test is a pointer to an int
This starts to work very well, especially when you start declaring const pointers and it gets tricky to know whether it's the pointer that's const, or whether its the thing the pointer is pointing at that is const.
int* const test; // test is a const pointer to an int
int const * test; // test is a pointer to a const int ... but many people write this as
const int * test; // test is a pointer to an int that's const
Use the "Clockwise Spiral Rule" to help parse C/C++ declarations;
There are three simple steps to follow:
Starting with the unknown element, move in a spiral/clockwise
direction; when encountering the following elements replace them with
the corresponding english statements:
[X] or []: Array X size of... or Array undefined size of...
(type1, type2): function passing type1 and type2 returning...
*: pointer(s) to...
Keep doing this in a spiral/clockwise direction until all tokens have been covered.
Always resolve anything in parenthesis first!
Also, declarations should be in separate statements when possible (which is true the vast majority of times).
There are three pieces to this puzzle.
The first piece is that whitespace in C and C++ is normally not significant beyond separating adjacent tokens that are otherwise indistinguishable.
During the preprocessing stage, the source text is broken up into a sequence of tokens - identifiers, punctuators, numeric literals, string literals, etc. That sequence of tokens is later analyzed for syntax and meaning. The tokenizer is "greedy" and will build the longest valid token that's possible. If you write something like
inttest;
the tokenizer only sees two tokens - the identifier inttest followed by the punctuator ;. It doesn't recognize int as a separate keyword at this stage (that happens later in the process). So, for the line to be read as a declaration of an integer named test, we have to use whitespace to separate the identifier tokens:
int test;
The * character is not part of any identifier; it's a separate token (punctuator) on its own. So if you write
int*test;
the compiler sees 4 separate tokens - int, *, test, and ;. Thus, whitespace is not significant in pointer declarations, and all of
int *test;
int* test;
int*test;
int * test;
are interpreted the same way.
The second piece to the puzzle is how declarations actually work in C and C++1. Declarations are broken up into two main pieces - a sequence of declaration specifiers (storage class specifiers, type specifiers, type qualifiers, etc.) followed by a comma-separated list of (possibly initialized) declarators. In the declaration
unsigned long int a[10]={0}, *p=NULL, f(void);
the declaration specifiers are unsigned long int and the declarators are a[10]={0}, *p=NULL, and f(void). The declarator introduces the name of the thing being declared (a, p, and f) along with information about that thing's array-ness, pointer-ness, and function-ness. A declarator may also have an associated initializer.
The type of a is "10-element array of unsigned long int". That type is fully specified by the combination of the declaration specifiers and the declarator, and the initial value is specified with the initializer ={0}. Similarly, the type of p is "pointer to unsigned long int", and again that type is specified by the combination of the declaration specifiers and the declarator, and is initialized to NULL. And the type of f is "function returning unsigned long int" by the same reasoning.
This is key - there is no "pointer-to" type specifier, just like there is no "array-of" type specifier, just like there is no "function-returning" type specifier. We can't declare an array as
int[10] a;
because the operand of the [] operator is a, not int. Similarly, in the declaration
int* p;
the operand of * is p, not int. But because the indirection operator is unary and whitespace is not significant, the compiler won't complain if we write it this way. However, it is always interpreted as int (*p);.
Therefore, if you write
int* p, q;
the operand of * is p, so it will be interpreted as
int (*p), q;
Thus, all of
int *test1, test2;
int* test1, test2;
int * test1, test2;
do the same thing - in all three cases, test1 is the operand of * and thus has type "pointer to int", while test2 has type int.
Declarators can get arbitrarily complex. You can have arrays of pointers:
T *a[N];
you can have pointers to arrays:
T (*a)[N];
you can have functions returning pointers:
T *f(void);
you can have pointers to functions:
T (*f)(void);
you can have arrays of pointers to functions:
T (*a[N])(void);
you can have functions returning pointers to arrays:
T (*f(void))[N];
you can have functions returning pointers to arrays of pointers to functions returning pointers to T:
T *(*(*f(void))[N])(void); // yes, it's eye-stabby. Welcome to C and C++.
and then you have signal:
void (*signal(int, void (*)(int)))(int);
which reads as
signal -- signal
signal( ) -- is a function taking
signal( ) -- unnamed parameter
signal(int ) -- is an int
signal(int, ) -- unnamed parameter
signal(int, (*) ) -- is a pointer to
signal(int, (*)( )) -- a function taking
signal(int, (*)( )) -- unnamed parameter
signal(int, (*)(int)) -- is an int
signal(int, void (*)(int)) -- returning void
(*signal(int, void (*)(int))) -- returning a pointer to
(*signal(int, void (*)(int)))( ) -- a function taking
(*signal(int, void (*)(int)))( ) -- unnamed parameter
(*signal(int, void (*)(int)))(int) -- is an int
void (*signal(int, void (*)(int)))(int); -- returning void
and this just barely scratches the surface of what's possible. But notice that array-ness, pointer-ness, and function-ness are always part of the declarator, not the type specifier.
One thing to watch out for - const can modify both the pointer type and the pointed-to type:
const int *p;
int const *p;
Both of the above declare p as a pointer to a const int object. You can write a new value to p setting it to point to a different object:
const int x = 1;
const int y = 2;
const int *p = &x;
p = &y;
but you cannot write to the pointed-to object:
*p = 3; // constraint violation, the pointed-to object is const
However,
int * const p;
declares p as a const pointer to a non-const int; you can write to the thing p points to
int x = 1;
int y = 2;
int * const p = &x;
*p = 3;
but you can't set p to point to a different object:
p = &y; // constraint violation, p is const
Which brings us to the third piece of the puzzle - why declarations are structured this way.
The intent is that the structure of a declaration should closely mirror the structure of an expression in the code ("declaration mimics use"). For example, let's suppose we have an array of pointers to int named ap, and we want to access the int value pointed to by the i'th element. We would access that value as follows:
printf( "%d", *ap[i] );
The expression *ap[i] has type int; thus, the declaration of ap is written as
int *ap[N]; // ap is an array of pointer to int, fully specified by the combination
// of the type specifier and declarator
The declarator *ap[N] has the same structure as the expression *ap[i]. The operators * and [] behave the same way in a declaration that they do in an expression - [] has higher precedence than unary *, so the operand of * is ap[N] (it's parsed as *(ap[N])).
As another example, suppose we have a pointer to an array of int named pa and we want to access the value of the i'th element. We'd write that as
printf( "%d", (*pa)[i] );
The type of the expression (*pa)[i] is int, so the declaration is written as
int (*pa)[N];
Again, the same rules of precedence and associativity apply. In this case, we don't want to dereference the i'th element of pa, we want to access the i'th element of what pa points to, so we have to explicitly group the * operator with pa.
The *, [] and () operators are all part of the expression in the code, so they are all part of the declarator in the declaration. The declarator tells you how to use the object in an expression. If you have a declaration like int *p;, that tells you that the expression *p in your code will yield an int value. By extension, it tells you that the expression p yields a value of type "pointer to int", or int *.
So, what about things like cast and sizeof expressions, where we use things like (int *) or sizeof (int [10]) or things like that? How do I read something like
void foo( int *, int (*)[10] );
There's no declarator, aren't the * and [] operators modifying the type directly?
Well, no - there is still a declarator, just with an empty identifier (known as an abstract declarator). If we represent an empty identifier with the symbol λ, then we can read those things as (int *λ), sizeof (int λ[10]), and
void foo( int *λ, int (*λ)[10] );
and they behave exactly like any other declaration. int *[10] represents an array of 10 pointers, while int (*)[10] represents a pointer to an array.
And now the opinionated portion of this answer. I am not fond of the C++ convention of declaring simple pointers as
T* p;
and consider it bad practice for the following reasons:
It's not consistent with the syntax;
It introduces confusion (as evidenced by this question, all the duplicates to this question, questions about the meaning of T* p, q;, all the duplicates to those questions, etc.);
It's not internally consistent - declaring an array of pointers as T* a[N] is asymmetrical with use (unless you're in the habit of writing * a[i]);
It cannot be applied to pointer-to-array or pointer-to-function types (unless you create a typedef just so you can apply the T* p convention cleanly, which...no);
The reason for doing so - "it emphasizes the pointer-ness of the object" - is spurious. It cannot be applied to array or function types, and I would think those qualities are just as important to emphasize.
In the end, it just indicates confused thinking about how the two languages' type systems work.
There are good reasons to declare items separately; working around a bad practice (T* p, q;) isn't one of them. If you write your declarators correctly (T *p, q;) you are less likely to cause confusion.
I consider it akin to deliberately writing all your simple for loops as
i = 0;
for( ; i < N; )
{
...
i++;
}
Syntactically valid, but confusing, and the intent is likely to be misinterpreted. However, the T* p; convention is entrenched in the C++ community, and I use it in my own C++ code because consistency across the code base is a good thing, but it makes me itch every time I do it.
I will be using C terminology - the C++ terminology is a little different, but the concepts are largely the same.
As others mentioned, 4, 5, and 6 are the same. Often, people use these examples to make the argument that the * belongs with the variable instead of the type. While it's an issue of style, there is some debate as to whether you should think of and write it this way:
int* x; // "x is a pointer to int"
or this way:
int *x; // "*x is an int"
FWIW I'm in the first camp, but the reason others make the argument for the second form is that it (mostly) solves this particular problem:
int* x,y; // "x is a pointer to int, y is an int"
which is potentially misleading; instead you would write either
int *x,y; // it's a little clearer what is going on here
or if you really want two pointers,
int *x, *y; // two pointers
Personally, I say keep it to one variable per line, then it doesn't matter which style you prefer.
#include <type_traits>
std::add_pointer<int>::type test, test2;
In 4, 5 and 6, test is always a pointer and test2 is not a pointer. White space is (almost) never significant in C++.
The rationale in C is that you declare the variables the way you use them. For example
char *a[100];
says that *a[42] will be a char. And a[42] a char pointer. And thus a is an array of char pointers.
This because the original compiler writers wanted to use the same parser for expressions and declarations. (Not a very sensible reason for a langage design choice)
I would say that the initial convention was to put the star on the pointer name side (right side of the declaration
in the c programming language by Dennis M. Ritchie the stars are on the right side of the declaration.
by looking at the linux source code at https://github.com/torvalds/linux/blob/master/init/main.c
we can see that the star is also on the right side.
You can follow the same rules, but it's not a big deal if you put stars on the type side.
Remember that consistency is important, so always but the star on the same side regardless of which side you have choose.
In my opinion, the answer is BOTH, depending on the situation.
Generally, IMO, it is better to put the asterisk next to the pointer name, rather than the type. Compare e.g.:
int *pointer1, *pointer2; // Fully consistent, two pointers
int* pointer1, pointer2; // Inconsistent -- because only the first one is a pointer, the second one is an int variable
// The second case is unexpected, and thus prone to errors
Why is the second case inconsistent? Because e.g. int x,y; declares two variables of the same type but the type is mentioned only once in the declaration. This creates a precedent and expected behavior. And int* pointer1, pointer2; is inconsistent with that because it declares pointer1 as a pointer, but pointer2 is an integer variable. Clearly prone to errors and, thus, should be avoided (by putting the asterisk next to the pointer name, rather than the type).
However, there are some exceptions where you might not be able to put the asterisk next to an object name (and where it matters where you put it) without getting undesired outcome — for example:
MyClass *volatile MyObjName
void test (const char *const p) // const value pointed to by a const pointer
Finally, in some cases, it might be arguably clearer to put the asterisk next to the type name, e.g.:
void* ClassName::getItemPtr () {return &item;} // Clear at first sight
The pointer is a modifier to the type. It's best to read them right to left in order to better understand how the asterisk modifies the type. 'int *' can be read as "pointer to int'. In multiple declarations you must specify that each variable is a pointer or it will be created as a standard variable.
1,2 and 3) Test is of type (int *). Whitespace doesn't matter.
4,5 and 6) Test is of type (int *). Test2 is of type int. Again whitespace is inconsequential.
I have always preferred to declare pointers like this:
int* i;
I read this to say "i is of type int-pointer". You can get away with this interpretation if you only declare one variable per declaration.
It is an uncomfortable truth, however, that this reading is wrong. The C Programming Language, 2nd Ed. (p. 94) explains the opposite paradigm, which is the one used in the C standards:
The declaration of the pointer ip,
int *ip;
is intended as a mnemonic; it says that the expression *ip is an
int. The syntax of the declaration for a variable mimics the syntax
of expressions in which the variable might appear. This reasoning
applies to function declarations as well. For example,
double *dp, atof(char *);
says that in an expression *dp and atof(s) have values of type
double, and that the argument of atof is a pointer to char.
So, by the reasoning of the C language, when you declare
int* test, test2;
you are not declaring two variables of type int*, you are introducing two expressions that evaluate to an int type, with no attachment to the allocation of an int in memory.
A compiler is perfectly happy to accept the following:
int *ip, i;
i = *ip;
because in the C paradigm, the compiler is only expected to keep track of the type of *ip and i. The programmer is expected to keep track of the meaning of *ip and i. In this case, ip is uninitialized, so it is the programmer's responsibility to point it at something meaningful before dereferencing it.
A good rule of thumb, a lot of people seem to grasp these concepts by: In C++ a lot of semantic meaning is derived by the left-binding of keywords or identifiers.
Take for example:
int const bla;
The const applies to the "int" word. The same is with pointers' asterisks, they apply to the keyword left of them. And the actual variable name? Yup, that's declared by what's left of it.
So, I always knew that the array "objects" that are passed around in C/C++ just contained the address of the first object in the array.
How can the pointer to the array "object" and it's contained value be the same?
Could someone point me towards more information maybe about how all that works in assembly, maybe.
Short answer: A pointer to an array is defined to have the same value as a pointer to the first element of the array. That's how arrays in C and C++ work.
Pedantic answer:
C and C++ have rvalue and lvalue expressions. An lvalue is something to which the & operator may be applied. They also have implicit conversions. An object may be converted to another type before being used. (For example, if you call sqrt( 9 ) then 9 is converted to double because sqrt( int ) is not defined.)
An lvalue of array type implicitly converts to a pointer. The implicit conversion changes array to &array[0]. This may also be written out explicitly as static_cast< int * >( array ), in C++.
Doing that is OK. Casting to void* is another story. void* is a bit ugly. And casting with the parentheses as (void*)array is also ugly. So please, avoid (void*) a in actual code.
You are mixing two unrelated (and, actually, mutually exclusive) things, which creates more confusion.
Firstly, you are correctly stating that "array objects that are passed around in C/C++ just contained the address of the first object in the array". The key words here are "passed around". In reality arrays cannot be passed around as array objects. Arrays are not copyable. Whenever you are using an array-style declaration in function parameter list it is actually interpreted as pointer declaration, i.e. it is a pointer that you are "passing around", not the array. However, in such situations your equality does not hold
void foo(int a[]) {
assert((void *) &a == (void *) a); // FAIL!!!
}
The above assertion is guaranteed to fail - the equality does not hold. So, within the context of this question you have to forget about arrays that you "pass around" (at least for the syntax used in the above example). Your equality does not hold for arrays that have been replaced by pointer objects.
Secondly, actual array objects are not pointers. And there's no need to take the term object into quotation markes. Arrays are full-fledged objects, albeit with some peculiar properties. The equality in question does indeed hold for the actual arrays that have not lost their "arrayness", i.e. array object that have not been replaced by pointer objects. For example
int a[10];
assert((void *) &a == (void *) a); // Never fails
What it means is that numerically the address of the entire array is the same as the address of its first element. Nothing unusual here. In fact, the very same (in nature) equality can be observed with struct types in C/C++
struct S { int x; } a;
assert((void *) &a == (void *) &a.x); // Never fails
I.e. the address of the entire struct object is the same as the address of its first field.
How can the pointer to the array "object" and it's contained value be the same?
An array is a contiguous block of memory which stores several elements.
Obviously, the first element in the array is located at some address.
There's no data "in between" the first element and the beginning of the actual array.
Therefore, the first element has the same address as the array.
Please read the following thread
http://www.cplusplus.com/forum/beginner/29595/
It basically explains that (&a != a) due to the type difference (since &a returns the pointer to the array and a to the first element) even though they both point to the same address.
Since you are casting them both to (void*) only the address value is compared and found to be equal, meaning that ((void*) a == (void*)&a) as you've stated. This makes sense since the array's address has to be the same as the first elements.
Let's look at these two declarations:
int a[4];
int * b;
Both a and b have a type compatible with int * and can, for example, be passed as an argument to a function expecting int *:
void f(int * p);
f(a); // OK
f(b); // OK
In case of a, the compiler allocates space for 4 int values. When you use the name a, such as when calling f(a), the compiler just substitutes the address of where it allocated the first of those int values, since it knows.
In case of b, the compiler allocates space for one pointer. When you use the name b, such as when calling f(b), the compiler generates code for retrieveing the pointer value from the allocated storage.
When it comes to &, that's when the difference between a and b becomes apparent. & always means the address of the storage the compiler has allocated for your variable: &a is the address of those four int values (therefore coinciding with just a), while &b is the address of the pointer value. They have different types, too.
&a is not exactly the same as a, though, even though they compare as equal. They have a different type: &a is a pointer and a is an array. You can notice the difference, for example, if you apply the sizeof operator to these expressions: sizeof(a) will evaluate to the size of four int values, while sizeof(&a) is the size of a pointer.
Ok, So what I thought happened is that when you created an array, you allocated space for the array somewhere and you created a pointer to its first object somewhere else, and what you passed around in your code was the pointer.
This is actually the behavior of what happens when you create an array with new in C++ or with malloc in C/C++. As such,
int * a = new a[SIZE];
assert((void*)&a==(void*)a); // Always fails
What I learned is that for arrays declared in the style of int a[SIZE];, a pointer to the first element is created when you try to pass the array to a function (this is called array-pointer decay). It's interesting to note that, indeed, as AndreyT writes,
void foo(int a[]) {
assert((void *) &a == (void *) a); // Always fails
}
This shows that it's only when you try to pass arrays around that a pointer is created for arrays in the style of int a[SIZE];.
#include<iostream>
using namespace std;
int main()
{
int *p,*c;
p=(int*)10;
c=(int*)20;
cout<<(int)p<<(int)c;
}
Somebody asked me "What is wrong with the above code?" and I couldn't figure it out. Someone please help me.
The fact that int and pointer data types are not required to have the same number of bits, according to the C++ standard, is one thing - that means you could lose precision.
In addition, casting an int to an int pointer then back again is silly. Why not just leave it as an int?
I actually did try to compile this under gcc and it worked fine but that's probably more by accident than good design.
Some wanted a quote from the C++ standard (I'd have put this in the comments of that answer if the format of comments wasn't so restricted), here are two from the 1999 one:
5.2.10/3
The mapping performed by reinterpret_cast is implementation defined.
5.2.10/5
A value of integral type or enumeration type can be explicitly converted to a pointer.
A pointer converted to an integer of sufficient size (if ant such exists on the implementation)
and back to the same pointer type will have its original value; mappings between pointers and
integers are otherwise implementation-defined.
And I see nothing mandating that such implementation-defined mapping must give a valid representation for all input. Otherwise said, an implementation on an architecture with address registers can very well trap when executing
p = (int*)10;
if the mapping does not give a representation valid at that time (yes, what is a valid representation for a pointer may depend of time. For instance delete may make invalid the representation of the deleted pointer).
Assuming I'm right about what this is supposed to be, it should look like this:
int main()
{
int *p, *c;
// Something that creates whatever p and c point to goes here, a trivial example would be.
int pValue, cValue;
p = &pValue;
c = &cValue;
// The & operator retrieves the memory address of pValue and cValue.
*p = 10;
*c = 20;
cout << *p << *c;
}
In order to assign or retrieve a value to a variable referenced by a pointer, you need to dereference it.
What your code is doing is casting 10 into pointer to int (which is the memory address where the actual int resides).
addresses p and c may be larger than int.
The problem on some platforms you need
p = (int*) (long) 10;
See GLIB documentation on type conversion macros.
And for the people who might not find a use for this type of expressions, it is possible to return data inside pointer value returning functions. You can find real-world examples, where this case it is better to use this idiom, instead of allocating a new integer on the heap, and return it back - poor performance, memory fragmentation, just ugly.
You're assigning values (10 and 20) to the pointers which obviously is a potential problem if you try to read the data at those addresses. Casting the pointer to an integer is also really ugly. And your main function does not have a return statement. That is just a few things.
there is more or less everything wrong with it:
int *p,*c;
p=(int*)10;
c=(int*)20;
afterwards p is pointing to memory address 10
afterwards c is pointing to memory address 20
This doesn't look very intentional.
And I suppose that the whole program will simply crash.
I'm relatively new to C++ (about one year of experience, on and off). I'm curious about what led to the decision of type * name as the syntax for defining pointers. It seems to me that the syntax should be type & name as the & symbol is used everywhere else in code to refer to the variable's memory address. So, to use the traditional example of int pointers:
int a = 1;
int * b = &a;
would become
int a = 1;
int & b = &a
I'm sure there's some reason for this that I'm just not seeing, and I'd love to hear some input from C++ veterans.
Thanks,
-S
C++ adopts the C syntax. As revealed in "The Development of the C Language" (by Dennis Ritchie) C uses * for pointers in type declarations because it was decided that type syntax should follow use.
For each object of [a compound type], there was already a way to mention the underlying object: index the array, call the function, use the indirection operator [*] on the pointer. Analogical reasoning led to a declaration syntax for names mirroring that of the expression syntax in which the names typically appear. Thus,
int i, *pi, **ppi;
declare an integer, a pointer to an integer, a pointer to a pointer to an integer. The syntax of these declarations reflects the observation that i, *pi, and **ppi all yield an int type when used in an expression.
Here's a more complex example:
int *(*foo)[4][];
This declaration means an expression *(*foo)[4][0] has type int, and from that (and that [] has higher precedence than unary *) you can decode the type: foo is a pointer to an array of size 4 of array of pointers to ints.
This syntax was adopted in C++ for compatibility with C. Also, don't forget that C++ has a use for & in declarations.
int & b = a;
The above line means a reference variable refering to another variable of type int. The difference between a reference and pointer roughly is that references are initialized only, and you can not change where they point, and finally they are always dereferenced automatically.
int x = 5, y = 10;
int& r = x;
int sum = r + y; // you do not need to say '*r' automatically dereferenced.
r = y; // WRONG, 'r' can only have one thing pointing at during its life, only at its infancy ;)
I think that Dennis Ritchie answered this in The Development of the C Language:
For each object of such a composed
type, there was already a way to
mention the underlying object: index
the array, call the function, use the
indirection operator on the pointer.
Analogical reasoning led to a
declaration syntax for names mirroring
that of the expression syntax in which
the names typically appear. Thus,
int i, *pi, **ppi;
declare an integer, a pointer to an
integer, a pointer to a pointer to an
integer. The syntax of these
declarations reflects the observation
that i, *pi, and **ppi all yield an
int type when used in an expression.
Similarly,
int f(), *f(), (*f)();
declare a function returning an
integer, a function returning a
pointer to an integer, a pointer to a
function returning an integer;
int *api[10], (*pai)[10];
declare an array of pointers to
integers, and a pointer to an array of
integers. In all these cases the
declaration of a variable resembles
its usage in an expression whose type
is the one named at the head of the
declaration.
So we use type * var to declare a pointer because this allows the declaration to mirror the usage (dereferencing) of the pointer.
In this article, Ritchie also recounts that in "NB", an extended version of the "B" programming language, he used int pointer[] to declare a pointer to an int, as opposed to int array[10] to declare an array of ints.
If you are a visual thinker, it may help to imagine the asterisk as a black hole leading to the data value. Hence, it is a pointer.
The ampersand is the opposite end of the hole, think of it as an unraveled asterisk or a spaceship wobbling about in an erratic course as the pilot gets over the transition coming out of the black hole.
I remember being very confused by C++ overloading the meaning of the ampersand, to give us references. In their desperate attempt to avoid using any more characters, which was justified by the international audience using C and known issues with keyboard limitations, they added a major source of confusion.
One thing that may help in C++ is to think of references as pre-prepared dereferenced pointers. Rather than using &someVariable when you pass in an argument, you've already used the trailing ampersand when you defined someVariable. Then again, that might just confuse you further!
One of my pet hates, which I was unhappy to see promulgated in Apple's Objective-C samples, is the layout style int *someIntPointer instead of int* someIntPointer
IMHO, keeping the asterisk with the variable is an old-fashioned C approach emphasizing the mechanics of how you define the variable, over its data type.
The data type of someIntPointer is literally a pointer to an integer and the declaration should reflect that. This does lead to the requirement that you declare one variable per line, to avoid subtle bugs such as:
int* a, b; // b is a straight int, was that our intention?
int *a, *b; // old-style C declaring two pointers
int* a;
int* b; // b is another pointer to an int
Whilst people argue that the ability to declare mixed pointers and values on the same line, intentionally, is a powerful feature, I've seen it lead to subtle bugs and confusion.
Your second example is not valid C code, only C++ code. The difference is that one is a pointer, whereas the other is a reference.
On the right-hand side the '&' always means address-of. In a definition it indicates that the variable is a reference.
On the right-hand side the '*' always means value-at-address. In a definition it indicates that the variable is a pointer.
References and pointers are similar, but not the same. This article addresses the differences.
Instead of reading int* b as "b is a pointer to int", read it as int *b: "*b is an int". Then, you have & as an anti-*: *b is an int. The address of *b is &*b, or just b.
I think the answer may well be "because that's the way K&R did it."
K&R are the ones who decided what the C syntax for declaring pointers was.
It's not int & x; instead of int * x; because that's the way the language was defined by the guys who made it up -- K&R.
I've recently decided that I just have to finally learn C/C++, and there is one thing I do not really understand about pointers or more precisely, their definition.
How about these examples:
int* test;
int *test;
int * test;
int* test,test2;
int *test,test2;
int * test,test2;
Now, to my understanding, the first three cases are all doing the same: Test is not an int, but a pointer to one.
The second set of examples is a bit more tricky. In case 4, both test and test2 will be pointers to an int, whereas in case 5, only test is a pointer, whereas test2 is a "real" int. What about case 6? Same as case 5?
4, 5, and 6 are the same thing, only test is a pointer. If you want two pointers, you should use:
int *test, *test2;
Or, even better (to make everything clear):
int* test;
int* test2;
White space around asterisks have no significance. All three mean the same thing:
int* test;
int *test;
int * test;
The "int *var1, var2" is an evil syntax that is just meant to confuse people and should be avoided. It expands to:
int *var1;
int var2;
Many coding guidelines recommend that you only declare one variable per line. This avoids any confusion of the sort you had before asking this question. Most C++ programmers I've worked with seem to stick to this.
A bit of an aside I know, but something I found useful is to read declarations backwards.
int* test; // test is a pointer to an int
This starts to work very well, especially when you start declaring const pointers and it gets tricky to know whether it's the pointer that's const, or whether its the thing the pointer is pointing at that is const.
int* const test; // test is a const pointer to an int
int const * test; // test is a pointer to a const int ... but many people write this as
const int * test; // test is a pointer to an int that's const
Use the "Clockwise Spiral Rule" to help parse C/C++ declarations;
There are three simple steps to follow:
Starting with the unknown element, move in a spiral/clockwise
direction; when encountering the following elements replace them with
the corresponding english statements:
[X] or []: Array X size of... or Array undefined size of...
(type1, type2): function passing type1 and type2 returning...
*: pointer(s) to...
Keep doing this in a spiral/clockwise direction until all tokens have been covered.
Always resolve anything in parenthesis first!
Also, declarations should be in separate statements when possible (which is true the vast majority of times).
There are three pieces to this puzzle.
The first piece is that whitespace in C and C++ is normally not significant beyond separating adjacent tokens that are otherwise indistinguishable.
During the preprocessing stage, the source text is broken up into a sequence of tokens - identifiers, punctuators, numeric literals, string literals, etc. That sequence of tokens is later analyzed for syntax and meaning. The tokenizer is "greedy" and will build the longest valid token that's possible. If you write something like
inttest;
the tokenizer only sees two tokens - the identifier inttest followed by the punctuator ;. It doesn't recognize int as a separate keyword at this stage (that happens later in the process). So, for the line to be read as a declaration of an integer named test, we have to use whitespace to separate the identifier tokens:
int test;
The * character is not part of any identifier; it's a separate token (punctuator) on its own. So if you write
int*test;
the compiler sees 4 separate tokens - int, *, test, and ;. Thus, whitespace is not significant in pointer declarations, and all of
int *test;
int* test;
int*test;
int * test;
are interpreted the same way.
The second piece to the puzzle is how declarations actually work in C and C++1. Declarations are broken up into two main pieces - a sequence of declaration specifiers (storage class specifiers, type specifiers, type qualifiers, etc.) followed by a comma-separated list of (possibly initialized) declarators. In the declaration
unsigned long int a[10]={0}, *p=NULL, f(void);
the declaration specifiers are unsigned long int and the declarators are a[10]={0}, *p=NULL, and f(void). The declarator introduces the name of the thing being declared (a, p, and f) along with information about that thing's array-ness, pointer-ness, and function-ness. A declarator may also have an associated initializer.
The type of a is "10-element array of unsigned long int". That type is fully specified by the combination of the declaration specifiers and the declarator, and the initial value is specified with the initializer ={0}. Similarly, the type of p is "pointer to unsigned long int", and again that type is specified by the combination of the declaration specifiers and the declarator, and is initialized to NULL. And the type of f is "function returning unsigned long int" by the same reasoning.
This is key - there is no "pointer-to" type specifier, just like there is no "array-of" type specifier, just like there is no "function-returning" type specifier. We can't declare an array as
int[10] a;
because the operand of the [] operator is a, not int. Similarly, in the declaration
int* p;
the operand of * is p, not int. But because the indirection operator is unary and whitespace is not significant, the compiler won't complain if we write it this way. However, it is always interpreted as int (*p);.
Therefore, if you write
int* p, q;
the operand of * is p, so it will be interpreted as
int (*p), q;
Thus, all of
int *test1, test2;
int* test1, test2;
int * test1, test2;
do the same thing - in all three cases, test1 is the operand of * and thus has type "pointer to int", while test2 has type int.
Declarators can get arbitrarily complex. You can have arrays of pointers:
T *a[N];
you can have pointers to arrays:
T (*a)[N];
you can have functions returning pointers:
T *f(void);
you can have pointers to functions:
T (*f)(void);
you can have arrays of pointers to functions:
T (*a[N])(void);
you can have functions returning pointers to arrays:
T (*f(void))[N];
you can have functions returning pointers to arrays of pointers to functions returning pointers to T:
T *(*(*f(void))[N])(void); // yes, it's eye-stabby. Welcome to C and C++.
and then you have signal:
void (*signal(int, void (*)(int)))(int);
which reads as
signal -- signal
signal( ) -- is a function taking
signal( ) -- unnamed parameter
signal(int ) -- is an int
signal(int, ) -- unnamed parameter
signal(int, (*) ) -- is a pointer to
signal(int, (*)( )) -- a function taking
signal(int, (*)( )) -- unnamed parameter
signal(int, (*)(int)) -- is an int
signal(int, void (*)(int)) -- returning void
(*signal(int, void (*)(int))) -- returning a pointer to
(*signal(int, void (*)(int)))( ) -- a function taking
(*signal(int, void (*)(int)))( ) -- unnamed parameter
(*signal(int, void (*)(int)))(int) -- is an int
void (*signal(int, void (*)(int)))(int); -- returning void
and this just barely scratches the surface of what's possible. But notice that array-ness, pointer-ness, and function-ness are always part of the declarator, not the type specifier.
One thing to watch out for - const can modify both the pointer type and the pointed-to type:
const int *p;
int const *p;
Both of the above declare p as a pointer to a const int object. You can write a new value to p setting it to point to a different object:
const int x = 1;
const int y = 2;
const int *p = &x;
p = &y;
but you cannot write to the pointed-to object:
*p = 3; // constraint violation, the pointed-to object is const
However,
int * const p;
declares p as a const pointer to a non-const int; you can write to the thing p points to
int x = 1;
int y = 2;
int * const p = &x;
*p = 3;
but you can't set p to point to a different object:
p = &y; // constraint violation, p is const
Which brings us to the third piece of the puzzle - why declarations are structured this way.
The intent is that the structure of a declaration should closely mirror the structure of an expression in the code ("declaration mimics use"). For example, let's suppose we have an array of pointers to int named ap, and we want to access the int value pointed to by the i'th element. We would access that value as follows:
printf( "%d", *ap[i] );
The expression *ap[i] has type int; thus, the declaration of ap is written as
int *ap[N]; // ap is an array of pointer to int, fully specified by the combination
// of the type specifier and declarator
The declarator *ap[N] has the same structure as the expression *ap[i]. The operators * and [] behave the same way in a declaration that they do in an expression - [] has higher precedence than unary *, so the operand of * is ap[N] (it's parsed as *(ap[N])).
As another example, suppose we have a pointer to an array of int named pa and we want to access the value of the i'th element. We'd write that as
printf( "%d", (*pa)[i] );
The type of the expression (*pa)[i] is int, so the declaration is written as
int (*pa)[N];
Again, the same rules of precedence and associativity apply. In this case, we don't want to dereference the i'th element of pa, we want to access the i'th element of what pa points to, so we have to explicitly group the * operator with pa.
The *, [] and () operators are all part of the expression in the code, so they are all part of the declarator in the declaration. The declarator tells you how to use the object in an expression. If you have a declaration like int *p;, that tells you that the expression *p in your code will yield an int value. By extension, it tells you that the expression p yields a value of type "pointer to int", or int *.
So, what about things like cast and sizeof expressions, where we use things like (int *) or sizeof (int [10]) or things like that? How do I read something like
void foo( int *, int (*)[10] );
There's no declarator, aren't the * and [] operators modifying the type directly?
Well, no - there is still a declarator, just with an empty identifier (known as an abstract declarator). If we represent an empty identifier with the symbol λ, then we can read those things as (int *λ), sizeof (int λ[10]), and
void foo( int *λ, int (*λ)[10] );
and they behave exactly like any other declaration. int *[10] represents an array of 10 pointers, while int (*)[10] represents a pointer to an array.
And now the opinionated portion of this answer. I am not fond of the C++ convention of declaring simple pointers as
T* p;
and consider it bad practice for the following reasons:
It's not consistent with the syntax;
It introduces confusion (as evidenced by this question, all the duplicates to this question, questions about the meaning of T* p, q;, all the duplicates to those questions, etc.);
It's not internally consistent - declaring an array of pointers as T* a[N] is asymmetrical with use (unless you're in the habit of writing * a[i]);
It cannot be applied to pointer-to-array or pointer-to-function types (unless you create a typedef just so you can apply the T* p convention cleanly, which...no);
The reason for doing so - "it emphasizes the pointer-ness of the object" - is spurious. It cannot be applied to array or function types, and I would think those qualities are just as important to emphasize.
In the end, it just indicates confused thinking about how the two languages' type systems work.
There are good reasons to declare items separately; working around a bad practice (T* p, q;) isn't one of them. If you write your declarators correctly (T *p, q;) you are less likely to cause confusion.
I consider it akin to deliberately writing all your simple for loops as
i = 0;
for( ; i < N; )
{
...
i++;
}
Syntactically valid, but confusing, and the intent is likely to be misinterpreted. However, the T* p; convention is entrenched in the C++ community, and I use it in my own C++ code because consistency across the code base is a good thing, but it makes me itch every time I do it.
I will be using C terminology - the C++ terminology is a little different, but the concepts are largely the same.
As others mentioned, 4, 5, and 6 are the same. Often, people use these examples to make the argument that the * belongs with the variable instead of the type. While it's an issue of style, there is some debate as to whether you should think of and write it this way:
int* x; // "x is a pointer to int"
or this way:
int *x; // "*x is an int"
FWIW I'm in the first camp, but the reason others make the argument for the second form is that it (mostly) solves this particular problem:
int* x,y; // "x is a pointer to int, y is an int"
which is potentially misleading; instead you would write either
int *x,y; // it's a little clearer what is going on here
or if you really want two pointers,
int *x, *y; // two pointers
Personally, I say keep it to one variable per line, then it doesn't matter which style you prefer.
#include <type_traits>
std::add_pointer<int>::type test, test2;
In 4, 5 and 6, test is always a pointer and test2 is not a pointer. White space is (almost) never significant in C++.
The rationale in C is that you declare the variables the way you use them. For example
char *a[100];
says that *a[42] will be a char. And a[42] a char pointer. And thus a is an array of char pointers.
This because the original compiler writers wanted to use the same parser for expressions and declarations. (Not a very sensible reason for a langage design choice)
I would say that the initial convention was to put the star on the pointer name side (right side of the declaration
in the c programming language by Dennis M. Ritchie the stars are on the right side of the declaration.
by looking at the linux source code at https://github.com/torvalds/linux/blob/master/init/main.c
we can see that the star is also on the right side.
You can follow the same rules, but it's not a big deal if you put stars on the type side.
Remember that consistency is important, so always but the star on the same side regardless of which side you have choose.
In my opinion, the answer is BOTH, depending on the situation.
Generally, IMO, it is better to put the asterisk next to the pointer name, rather than the type. Compare e.g.:
int *pointer1, *pointer2; // Fully consistent, two pointers
int* pointer1, pointer2; // Inconsistent -- because only the first one is a pointer, the second one is an int variable
// The second case is unexpected, and thus prone to errors
Why is the second case inconsistent? Because e.g. int x,y; declares two variables of the same type but the type is mentioned only once in the declaration. This creates a precedent and expected behavior. And int* pointer1, pointer2; is inconsistent with that because it declares pointer1 as a pointer, but pointer2 is an integer variable. Clearly prone to errors and, thus, should be avoided (by putting the asterisk next to the pointer name, rather than the type).
However, there are some exceptions where you might not be able to put the asterisk next to an object name (and where it matters where you put it) without getting undesired outcome — for example:
MyClass *volatile MyObjName
void test (const char *const p) // const value pointed to by a const pointer
Finally, in some cases, it might be arguably clearer to put the asterisk next to the type name, e.g.:
void* ClassName::getItemPtr () {return &item;} // Clear at first sight
The pointer is a modifier to the type. It's best to read them right to left in order to better understand how the asterisk modifies the type. 'int *' can be read as "pointer to int'. In multiple declarations you must specify that each variable is a pointer or it will be created as a standard variable.
1,2 and 3) Test is of type (int *). Whitespace doesn't matter.
4,5 and 6) Test is of type (int *). Test2 is of type int. Again whitespace is inconsequential.
I have always preferred to declare pointers like this:
int* i;
I read this to say "i is of type int-pointer". You can get away with this interpretation if you only declare one variable per declaration.
It is an uncomfortable truth, however, that this reading is wrong. The C Programming Language, 2nd Ed. (p. 94) explains the opposite paradigm, which is the one used in the C standards:
The declaration of the pointer ip,
int *ip;
is intended as a mnemonic; it says that the expression *ip is an
int. The syntax of the declaration for a variable mimics the syntax
of expressions in which the variable might appear. This reasoning
applies to function declarations as well. For example,
double *dp, atof(char *);
says that in an expression *dp and atof(s) have values of type
double, and that the argument of atof is a pointer to char.
So, by the reasoning of the C language, when you declare
int* test, test2;
you are not declaring two variables of type int*, you are introducing two expressions that evaluate to an int type, with no attachment to the allocation of an int in memory.
A compiler is perfectly happy to accept the following:
int *ip, i;
i = *ip;
because in the C paradigm, the compiler is only expected to keep track of the type of *ip and i. The programmer is expected to keep track of the meaning of *ip and i. In this case, ip is uninitialized, so it is the programmer's responsibility to point it at something meaningful before dereferencing it.
A good rule of thumb, a lot of people seem to grasp these concepts by: In C++ a lot of semantic meaning is derived by the left-binding of keywords or identifiers.
Take for example:
int const bla;
The const applies to the "int" word. The same is with pointers' asterisks, they apply to the keyword left of them. And the actual variable name? Yup, that's declared by what's left of it.