Why is the Dereference operator used to declare pointers? - c++

Why is the * used to declare pointers?
It removes indirection, but it doesn't remove any when you declare a pointer like int *a = &b. Shouldn't it remove the indirection of &b?

Many symbols in C and C++ are overloaded. That is, their meanings depend on the context where they are used. For example, the symbol & can denote the address-of operator and the binary bitwise AND operator.
The symbol * used in a declaration denotes a pointer:
int b = 10;
int *a = &b;
but used in expressions, when applied to a variable of a pointer type, denotes the dereference operator, for example:
printf( "%d\n", *a );
It also can denote the multiplication operator, for example you can write:
printf( "%d\n", b ** a );
that is the same as
printf( "%d\n", b * *a );
Similarly, the pair of square brackets can be used in a declaration of an array, like:
int a[10];
and as the subscript operator:
a[5] = 5;
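As a minimal sketch of the overloading described above (the function names times_pointee and fifth_element are made up for illustration), the same symbols appear below once in a declaration and once in an expression:

```cpp
#include <cassert>

// '*' in the parameter declaration makes a a pointer; in the body, the first
// '*' is multiplication and the second '*' is the dereference operator.
int times_pointee(int b, const int *a) {
    assert(a != nullptr);
    return b * *a;
}

// '[]' is used once to declare an array and once as the subscript operator.
int fifth_element() {
    int arr[10] = {};   // declaration: array of 10 int, zero-initialized
    arr[5] = 5;         // expression: subscript operator
    return arr[5];
}
```

Calling times_pointee(b, &b) with b equal to 10 multiplies b by the value found through the pointer.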

Any time you have a pointer declaration with initialization like this:
type *x = expr;
it is equivalent to a separate declaration followed by an assignment:
type *x;
x = expr;
It is not equivalent to
type *x;
*x = expr; /* WRONG */
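A small sketch of that equivalence, assuming C++ and two hypothetical helper functions; both forms store the address in the pointer itself, never in the pointed-to object:

```cpp
#include <cassert>

// Declaration with initializer: the initializer sets x, not *x.
int read_via_initialized_pointer(int &target) {
    int *x = &target;
    return *x;
}

// Equivalent two-step form: declare the pointer, then assign to it.
int read_via_assigned_pointer(int &target) {
    int *x;
    x = &target;        // correct: assigns an address to the pointer x
    // *x = &target;    // wrong: in C++ this does not compile, because an
    //                  // int* cannot be assigned to the int that *x names
    assert(x == &target);
    return *x;
}
```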

It's part of the declaration, so it's not a dereference operator at all. Yet it still means the same thing.
int *a
can be read as
The value pointed by a is an int.
just like
printf("%d\n", *a);
can be read as
Print the value pointed by a.
See this answer to Correct way of declaring pointer variables in C/C++ for a bit on background on this.

Back when C was invented, a somewhat interesting choice was made. Variables would be declared in a kind of echo of how they would be used.
So
int a;
means "you can get an int out of a". Then,
int *a;
means you can get an int out of *a.
And,
int a[3];
means you can get an int out of a[index]; here, the size is put in where the index would be.
This can be chained to complex cases
int *a[3];
vs
int (*a)[3];
now if you don't know how C parsing works, this is opaque; but at least you only have to learn it once!
This is why some people think int* a is bad form, because the * is really attached to a not the int.
Initialization of the named variable is a different thing.
(some declaration) = (some expression)
The symbols in the declaration never act as expressions. They are the same symbols, just with a different meaning.
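To make the two chained cases concrete, here is a sketch in C++ (the variable and function names are invented for this example). Each variable is used in exactly the shape in which it was declared:

```cpp
#include <cassert>
#include <type_traits>

int sum_both_ways() {
    int x = 1, y = 2, z = 3;
    int row[3] = {10, 20, 30};

    int *arr_of_ptr[3] = {&x, &y, &z};   // *arr_of_ptr[i] is an int
    int (*ptr_to_arr)[3] = &row;         // (*ptr_to_arr)[i] is an int

    static_assert(std::is_same<decltype(arr_of_ptr), int *[3]>::value,
                  "array of 3 pointers to int");
    static_assert(std::is_same<decltype(ptr_to_arr), int (*)[3]>::value,
                  "pointer to an array of 3 int");

    // Use mirrors declaration: 3 + 20.
    return *arr_of_ptr[2] + (*ptr_to_arr)[1];
}
```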

Related

Meaning of references, address-of, dereference and pointer

Here is the way I understand * and & symbols in C and C++.
In C, * serves two purposes. First it can be used to declare a pointer variable like so int* pointerVariable
It can however be used as a dereference operator like so *pointerVariable which returns value saved at that address, it understands how to interpret bytes at that address based on what data type we have declared that pointer is pointing to. In our case int* therefore it reads bytes saved at that address and returns back whole number.
We also have address-of operator in C like so &someVariable which returns address of bytes saved underneath someVariable name.
However in C++ (not in C), we also get a possibility to use & in declaration of reference like so int& someReference. This will turn variable someReference into a reference, which means that whatever value we pass into that variable, it will automatically get address of the value we are passing into it and it will hold it.
Do I get this correctly?
Yes, but it is better to think about pointers and references in terms of what you want to do.
References are very useful for all those cases where you need to refer to some object without copying it. References are simple: they are always valid and there is no change in syntax when you use the object.
Pointers are for the rest of cases. Pointers allow you to work with addresses (pointer arithmetic), require explicit syntax to refer to the object behind them (*, &, -> operators), are nullable (NULL, nullptr), can be modified, etc.
In summary, references are simpler and easier to reason about. Use pointers when a reference does not cut it.
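A rough illustration of that trade-off, with two hypothetical functions: the reference version needs no special syntax at the call site or in the body, while the pointer version can express "no object" and therefore has to check for it:

```cpp
#include <cassert>   // assert is used in the usage example

// Reference parameter: bound to an object, used with ordinary syntax.
void increment(int &value) { value += 1; }

// Pointer parameter: may be null, so it is checked and explicitly dereferenced.
bool increment_if_present(int *value) {
    if (value == nullptr) return false;
    *value += 1;
    return true;
}
```

For an int n, increment(n) and increment_if_present(&n) both add one to n, whereas increment_if_present(nullptr) does nothing and returns false.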
General Syntax for defining a pointer:
data-type * pointer-name = &variable-name
The data-type of the pointer must be the same as that of the variable to which it is pointing.
A void pointer (void *) can hold the address of an object of any data type.
General Syntax for defining a reference variable:
data-type & reference-name = variable-name
The data-type of the reference variable must be the same as that of the variable of which it is an alias.
Let's look at each one of them, for the purpose of explanation, I will go with a simple Swap Program both in C and C++.
Swapping two variables by pass by reference in C (passing pointers)
#include <stdio.h>
void swap(int *, int *); // Function prototype
int main()
{
    int a = 10;
    int b = 20;
    printf("Before Swap: a=%d, b=%d\n", a, b);
    swap(&a, &b); // The addresses of a and b are passed
    printf("After Swap: a=%d, b=%d\n", a, b);
    return 0;
}
void swap(int *ptra, int *ptrb)
{
    int temp = *ptra; // Save the value ptra points to
    *ptra = *ptrb;
    *ptrb = temp;
}
In the code above, we declare variables a and b and initialize them to 10 and 20 respectively. We then pass the addresses of a and b to the swap function by using the address-of (&) operator, which gives the address of a variable. These arguments are assigned to the corresponding formal parameters, which in this case are the int pointers ptra and ptrb.
To swap the variables, we first need to temporarily store the value of one of them. We store the value pointed to by ptra in a variable temp, by dereferencing the pointer with the dereference (*) operator and assigning the result to temp. The dereference (*) operator accesses the value stored in the memory location a pointer points to.
Once the value pointed to by ptra is saved, we can assign that location a new value, which in this case is the value of variable b (again with the help of the dereference (*) operator). Then the location pointed to by ptrb is assigned the value saved in temp (the original value of a). This swaps the values of a and b by writing to the memory locations of those variables.
Note: We can use the dereference (*) operator and the address-of (&) operator together, like *&a; they cancel each other out, resulting in just a.
We can write a similar program in C++ by using pointers to swap two numbers as well, but the language supports another kind of variable known as the reference variable. It provides an alias (alternative name) for a previously defined variable.
Swapping two variables by call by reference in C++
#include <iostream>
using namespace std;
void swap(int &, int &); // Function prototype
int main()
{
    int a = 10;
    int b = 20;
    cout << "Before Swap: a= " << a << " b= " << b << endl;
    swap(a, b);
    cout << "After Swap: a= " << a << " b= " << b << endl;
    return 0;
}
void swap(int &refa, int &refb)
{
    int temp = refa;
    refa = refb;
    refb = temp;
}
In the code above, when we pass the variables a and b to the function swap, a and b get their respective reference variables refa and refb inside swap. It's like giving a variable an alias.
Now we can swap the variables directly through the reference variables, without the dereference (*) operator.
The rest of the logic remains the same.
So before we get into the differences between pointers and references, I feel like we need to talk a little bit about declaration syntax, partly to explain why pointer and reference declarations are written that way and partly because the way many C++ programmers write pointer and reference declarations misrepresent that syntax (get comfortable, this is going to take a while).
In both C and C++, declarations are composed of a sequence of declaration specifiers followed by a sequence of declarators [1]. In a declaration like
static unsigned long int a[10], *p, f(void);
the declaration specifiers are static unsigned long int and the declarators are a[10], *p, and f(void).
Array-ness, pointer-ness, function-ness, and in C++ reference-ness are all specified as part of the declarator, not the declaration specifiers. This means when you write something like
int* p;
it’s parsed as
int (*p);
Since the unary * operator is a unique token, the compiler doesn't need whitespace to distinguish it from the int type specifier or the p identifier. You can write it as int *p;, int* p;, int * p;, or even int*p;
It also means that in a declaration like
int* p, q;
only p is declared as a pointer - q is a regular int.
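This can be checked at compile time. The following is a sketch that assumes C++11 or later; the helper function p_can_point_at_q is made up for the example:

```cpp
#include <cassert>       // assert is used in the usage example
#include <type_traits>

int* p, q;   // parsed as: int (*p), q;

static_assert(std::is_same<decltype(p), int *>::value, "p is a pointer to int");
static_assert(std::is_same<decltype(q), int>::value, "q is a plain int");

// q is an ordinary int, so p can be made to point at it.
bool p_can_point_at_q() {
    q = 7;
    p = &q;
    return *p == 7;
}
```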
The idea is that the declaration of a variable closely matches its use in the code ("declaration mimics use"). If you have a pointer to int named p and you want to access the pointed-to value, you use the * operator to dereference it:
printf( "%d\n", *p );
The expression *p has type int, so the declaration of p is written
int *p;
This tells us that the variable p has type "pointer to int" because the combination of p and the unary operator * give us an expression of type int. Most C programmers will write the pointer declaration as shown above, with the * visibly grouped with p.
Now, Bjarne and the couple of generations of C++ programmers who followed thought it was more important to emphasize the pointer-ness of p rather than the int-ness of *p, so they introduced the
int* p;
convention. However, this convention falls down for anything but a simple pointer (or pointer to pointer). It doesn't work for pointers to arrays:
int (*a)[N];
or pointers to functions
int (*f)(void);
or arrays of pointers to functions
int (*p[N])(void);
etc. Declaring an array of pointers as
int* a[N];
just indicates confused thinking. Since [] and () are postfix, you cannot associate the array-ness or function-ness with the declaration specifiers by writing
int[N] a;
int(void) f;
like you can with the unary * operator, but the unary * operator is bound to the declarator in exactly the same way as the [] and () operators are. [2]
C++ references break the rule about "declaration mimics use" hard. In a non-declaration statement, an expression &x always yields a pointer type. If x has type int, &x has type int *. So & has a completely different meaning in a declaration than in an expression.
So that's syntax, let's talk about pointers vs. references.
A pointer is just an address value (although with additional type information). You can do (some) arithmetic on pointers, you can initialize them to arbitrary values (or NULL), you can apply the [] subscript operator to them as though they were an array (indeed, the array subscript operation is defined in terms of pointer operations). A pointer is not required to be valid (that is, contain the address of an object during that object's lifetime) when it's first created.
A reference is another name for an object or function, not just that object's or function's address (this is why you don't use the * operator when working with references). You can't do pointer arithmetic on references, you can't assign arbitrary values to a reference, etc. When instantiated, a reference must refer to a valid object or function. How exactly references are represented internally isn't specified.
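Two of the points above lend themselves to a short check. This is only a sketch with invented function names: subscripting a pointer is the same operation as pointer arithmetic followed by a dereference, and a reference shares the address of the object it names:

```cpp
#include <cassert>   // assert is used in the usage example

// p[i] is defined as *(p + i), so both the value and the address agree.
bool subscript_matches_pointer_arithmetic(const int *p, int i) {
    return p[i] == *(p + i) && &p[i] == p + i;
}

// A reference is another name for the object: same address, and writing
// through the reference changes the object itself.
bool reference_is_alias(int &object) {
    int &ref = object;
    ref += 1;
    return &ref == &object;
}
```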
[1] This is the C terminology - the C++ terminology is a little different.
[2] In case it isn't clear by now, I consider the T* p; idiom to be poor practice and responsible for no small amount of confusion about pointer declaration syntax; however, since that's how the C++ community has decided to do things, that's how I write my C++ code. I don't like it and it makes me itch, but it's not worth the heartburn to argue over it or to have inconsistently formatted code.
Simple answer:
Reference variables are an alias to the data passed to them, another label.
int var = 0;
int& refVar = var;
In practical terms, var and refVar are the same object.
It's worth noting that a reference to heap-allocated data cannot be passed to delete directly, because it is an alias of the data, not of the pointer:
int* var = new int{0};
int& refVar = *var;
delete refVar; // error: refVar is an int, not a pointer
A reference to the pointer itself can be used to delete the data, because it is an alias of the pointer:
int* var = new int{0};
int*& refVar = var;
delete refVar; // OK: same as delete var

The difference between int* ptr; and int *ptr; C++ pointers [duplicate]

I've recently decided that I just have to finally learn C/C++, and there is one thing I do not really understand about pointers or more precisely, their definition.
How about these examples:
int* test;
int *test;
int * test;
int* test,test2;
int *test,test2;
int * test,test2;
Now, to my understanding, the first three cases are all doing the same: Test is not an int, but a pointer to one.
The second set of examples is a bit more tricky. In case 4, both test and test2 will be pointers to an int, whereas in case 5, only test is a pointer, whereas test2 is a "real" int. What about case 6? Same as case 5?
4, 5, and 6 are the same thing, only test is a pointer. If you want two pointers, you should use:
int *test, *test2;
Or, even better (to make everything clear):
int* test;
int* test2;
White space around asterisks has no significance. All three mean the same thing:
int* test;
int *test;
int * test;
The "int *var1, var2" is an evil syntax that is just meant to confuse people and should be avoided. It expands to:
int *var1;
int var2;
Many coding guidelines recommend that you only declare one variable per line. This avoids any confusion of the sort you had before asking this question. Most C++ programmers I've worked with seem to stick to this.
A bit of an aside I know, but something I found useful is to read declarations backwards.
int* test; // test is a pointer to an int
This starts to work very well, especially when you start declaring const pointers and it gets tricky to know whether it's the pointer that's const, or whether it's the thing the pointer is pointing at that is const.
int* const test; // test is a const pointer to an int
int const * test; // test is a pointer to a const int ... but many people write this as
const int * test; // test is a pointer to an int that's const
Use the "Clockwise Spiral Rule" to help parse C/C++ declarations;
There are three simple steps to follow:
1. Starting with the unknown element, move in a spiral/clockwise direction; when encountering the following elements, replace them with the corresponding English statements:
   [X] or []: Array X size of... or Array undefined size of...
   (type1, type2): function passing type1 and type2 returning...
   *: pointer(s) to...
2. Keep doing this in a spiral/clockwise direction until all tokens have been covered.
3. Always resolve anything in parentheses first!
Also, declarations should be in separate statements when possible (which is true the vast majority of times).
There are three pieces to this puzzle.
The first piece is that whitespace in C and C++ is normally not significant beyond separating adjacent tokens that are otherwise indistinguishable.
During the preprocessing stage, the source text is broken up into a sequence of tokens - identifiers, punctuators, numeric literals, string literals, etc. That sequence of tokens is later analyzed for syntax and meaning. The tokenizer is "greedy" and will build the longest valid token that's possible. If you write something like
inttest;
the tokenizer only sees two tokens - the identifier inttest followed by the punctuator ;. It doesn't recognize int as a separate keyword at this stage (that happens later in the process). So, for the line to be read as a declaration of an integer named test, we have to use whitespace to separate the identifier tokens:
int test;
The * character is not part of any identifier; it's a separate token (punctuator) on its own. So if you write
int*test;
the compiler sees 4 separate tokens - int, *, test, and ;. Thus, whitespace is not significant in pointer declarations, and all of
int *test;
int* test;
int*test;
int * test;
are interpreted the same way.
The second piece to the puzzle is how declarations actually work in C and C++ [1]. Declarations are broken up into two main pieces - a sequence of declaration specifiers (storage class specifiers, type specifiers, type qualifiers, etc.) followed by a comma-separated list of (possibly initialized) declarators. In the declaration
unsigned long int a[10]={0}, *p=NULL, f(void);
the declaration specifiers are unsigned long int and the declarators are a[10]={0}, *p=NULL, and f(void). The declarator introduces the name of the thing being declared (a, p, and f) along with information about that thing's array-ness, pointer-ness, and function-ness. A declarator may also have an associated initializer.
The type of a is "10-element array of unsigned long int". That type is fully specified by the combination of the declaration specifiers and the declarator, and the initial value is specified with the initializer ={0}. Similarly, the type of p is "pointer to unsigned long int", and again that type is specified by the combination of the declaration specifiers and the declarator, and is initialized to NULL. And the type of f is "function returning unsigned long int" by the same reasoning.
This is key - there is no "pointer-to" type specifier, just like there is no "array-of" type specifier, just like there is no "function-returning" type specifier. We can't declare an array as
int[10] a;
because the operand of the [] operator is a, not int. Similarly, in the declaration
int* p;
the operand of * is p, not int. But because the indirection operator is unary and whitespace is not significant, the compiler won't complain if we write it this way. However, it is always interpreted as int (*p);.
Therefore, if you write
int* p, q;
the operand of * is p, so it will be interpreted as
int (*p), q;
Thus, all of
int *test1, test2;
int* test1, test2;
int * test1, test2;
do the same thing - in all three cases, test1 is the operand of * and thus has type "pointer to int", while test2 has type int.
Declarators can get arbitrarily complex. You can have arrays of pointers:
T *a[N];
you can have pointers to arrays:
T (*a)[N];
you can have functions returning pointers:
T *f(void);
you can have pointers to functions:
T (*f)(void);
you can have arrays of pointers to functions:
T (*a[N])(void);
you can have functions returning pointers to arrays:
T (*f(void))[N];
you can have functions returning pointers to arrays of pointers to functions returning pointers to T:
T *(*(*f(void))[N])(void); // yes, it's eye-stabby. Welcome to C and C++.
and then you have signal:
void (*signal(int, void (*)(int)))(int);
which reads as
signal -- signal
signal( ) -- is a function taking
signal( ) -- unnamed parameter
signal(int ) -- is an int
signal(int, ) -- unnamed parameter
signal(int, (*) ) -- is a pointer to
signal(int, (*)( )) -- a function taking
signal(int, (*)( )) -- unnamed parameter
signal(int, (*)(int)) -- is an int
signal(int, void (*)(int)) -- returning void
(*signal(int, void (*)(int))) -- returning a pointer to
(*signal(int, void (*)(int)))( ) -- a function taking
(*signal(int, void (*)(int)))( ) -- unnamed parameter
(*signal(int, void (*)(int)))(int) -- is an int
void (*signal(int, void (*)(int)))(int); -- returning void
and this just barely scratches the surface of what's possible. But notice that array-ness, pointer-ness, and function-ness are always part of the declarator, not the type specifier.
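One way to convince yourself of these readings is to build the same types step by step with aliases and compare them. This is a sketch in C++11; T, N, my_signal and all alias names are placeholders chosen for this example (my_signal has the same shape as signal but is renamed to avoid clashing with the real one):

```cpp
#include <cassert>       // assert is used in the usage example
#include <type_traits>

constexpr int N = 4;   // hypothetical array size
using T = int;         // hypothetical element type

// Declarations only; nothing here is ever called.
T *(*(*f(void))[N])(void);
void (*my_signal(int, void (*)(int)))(int);

// The same types, built from the inside out.
using FnReturningPtrToT = T *(void);            // function returning pointer to T
using ArrayOfFnPtrs = FnReturningPtrToT *[N];   // array of N pointers to such functions
using Handler = void (*)(int);                  // pointer to function taking int

constexpr bool declarators_match_aliases() {
    return std::is_same<decltype(f), ArrayOfFnPtrs *(void)>::value &&
           std::is_same<decltype(my_signal), Handler(int, Handler)>::value;
}

static_assert(declarators_match_aliases(), "both spellings name the same types");
```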
One thing to watch out for - const can modify both the pointer type and the pointed-to type:
const int *p;
int const *p;
Both of the above declare p as a pointer to a const int object. You can write a new value to p setting it to point to a different object:
const int x = 1;
const int y = 2;
const int *p = &x;
p = &y;
but you cannot write to the pointed-to object:
*p = 3; // constraint violation, the pointed-to object is const
However,
int * const p;
declares p as a const pointer to a non-const int; you can write to the thing p points to
int x = 1;
int y = 2;
int * const p = &x;
*p = 3;
but you can't set p to point to a different object:
p = &y; // constraint violation, p is const
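The two kinds of const can also be told apart with type traits. A sketch, assuming C++11; the alias and function names are made up for the example:

```cpp
#include <cassert>       // assert is used in the usage example
#include <type_traits>

using PtrToConst = const int *;   // same type as int const *
using ConstPtr = int *const;

static_assert(std::is_same<PtrToConst, int const *>::value, "two spellings, one type");
static_assert(!std::is_const<PtrToConst>::value, "the pointer itself is not const");
static_assert(std::is_const<std::remove_pointer<PtrToConst>::type>::value, "the pointee is const");
static_assert(std::is_const<ConstPtr>::value, "the pointer itself is const");
static_assert(!std::is_const<std::remove_pointer<ConstPtr>::type>::value, "the pointee is not const");

int reseat_and_write() {
    const int x = 1, y = 2;
    int z = 0;
    PtrToConst p = &x;
    p = &y;           // allowed: p can point to a different object
    ConstPtr q = &z;
    *q = 3;           // allowed: the pointed-to int is writable
    return *p + *q;   // 2 + 3
}
```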
Which brings us to the third piece of the puzzle - why declarations are structured this way.
The intent is that the structure of a declaration should closely mirror the structure of an expression in the code ("declaration mimics use"). For example, let's suppose we have an array of pointers to int named ap, and we want to access the int value pointed to by the i'th element. We would access that value as follows:
printf( "%d", *ap[i] );
The expression *ap[i] has type int; thus, the declaration of ap is written as
int *ap[N]; // ap is an array of pointer to int, fully specified by the combination
// of the type specifier and declarator
The declarator *ap[N] has the same structure as the expression *ap[i]. The operators * and [] behave the same way in a declaration that they do in an expression - [] has higher precedence than unary *, so the operand of * is ap[N] (it's parsed as *(ap[N])).
As another example, suppose we have a pointer to an array of int named pa and we want to access the value of the i'th element. We'd write that as
printf( "%d", (*pa)[i] );
The type of the expression (*pa)[i] is int, so the declaration is written as
int (*pa)[N];
Again, the same rules of precedence and associativity apply. In this case, we don't want to dereference the i'th element of pa, we want to access the i'th element of what pa points to, so we have to explicitly group the * operator with pa.
The *, [] and () operators are all part of the expression in the code, so they are all part of the declarator in the declaration. The declarator tells you how to use the object in an expression. If you have a declaration like int *p;, that tells you that the expression *p in your code will yield an int value. By extension, it tells you that the expression p yields a value of type "pointer to int", or int *.
So, what about things like cast and sizeof expressions, where we use things like (int *) or sizeof (int [10]) or things like that? How do I read something like
void foo( int *, int (*)[10] );
There's no declarator, aren't the * and [] operators modifying the type directly?
Well, no - there is still a declarator, just with an empty identifier (known as an abstract declarator). If we represent an empty identifier with the symbol λ, then we can read those things as (int *λ), sizeof (int λ[10]), and
void foo( int *λ, int (*λ)[10] );
and they behave exactly like any other declaration. int *[10] represents an array of 10 pointers, while int (*)[10] represents a pointer to an array.
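A short sketch of abstract declarators in use, assuming C++11 (the function first_plus_row is hypothetical). The prototype omits the parameter names and the definition fills them in at exactly the positions the λ placeholder marks:

```cpp
#include <cassert>       // assert is used in the usage example
#include <type_traits>

static_assert(std::is_array<int *[10]>::value, "int *[10] is an array (of 10 pointers)");
static_assert(std::is_pointer<int (*)[10]>::value, "int (*)[10] is a pointer (to an array)");
static_assert(sizeof(int *[10]) == 10 * sizeof(int *), "ten pointers' worth of storage");

// Prototype with abstract declarators, then the definition with names.
int first_plus_row(int *, int (*)[10]);
int first_plus_row(int *p, int (*row)[10]) { return *p + (*row)[0]; }
```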
And now the opinionated portion of this answer. I am not fond of the C++ convention of declaring simple pointers as
T* p;
and consider it bad practice for the following reasons:
It's not consistent with the syntax;
It introduces confusion (as evidenced by this question, all the duplicates to this question, questions about the meaning of T* p, q;, all the duplicates to those questions, etc.);
It's not internally consistent - declaring an array of pointers as T* a[N] is asymmetrical with use (unless you're in the habit of writing * a[i]);
It cannot be applied to pointer-to-array or pointer-to-function types (unless you create a typedef just so you can apply the T* p convention cleanly, which...no);
The reason for doing so - "it emphasizes the pointer-ness of the object" - is spurious. It cannot be applied to array or function types, and I would think those qualities are just as important to emphasize.
In the end, it just indicates confused thinking about how the two languages' type systems work.
There are good reasons to declare items separately; working around a bad practice (T* p, q;) isn't one of them. If you write your declarators correctly (T *p, q;) you are less likely to cause confusion.
I consider it akin to deliberately writing all your simple for loops as
i = 0;
for( ; i < N; )
{
...
i++;
}
Syntactically valid, but confusing, and the intent is likely to be misinterpreted. However, the T* p; convention is entrenched in the C++ community, and I use it in my own C++ code because consistency across the code base is a good thing, but it makes me itch every time I do it.
[1] I will be using C terminology - the C++ terminology is a little different, but the concepts are largely the same.
As others mentioned, 4, 5, and 6 are the same. Often, people use these examples to make the argument that the * belongs with the variable instead of the type. While it's an issue of style, there is some debate as to whether you should think of and write it this way:
int* x; // "x is a pointer to int"
or this way:
int *x; // "*x is an int"
FWIW I'm in the first camp, but the reason others make the argument for the second form is that it (mostly) solves this particular problem:
int* x,y; // "x is a pointer to int, y is an int"
which is potentially misleading; instead you would write either
int *x,y; // it's a little clearer what is going on here
or if you really want two pointers,
int *x, *y; // two pointers
Personally, I say keep it to one variable per line, then it doesn't matter which style you prefer.
#include <type_traits>
std::add_pointer<int>::type test, test2;
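The reason this declares two pointers is that the pointer-ness is now part of the type specifier instead of the declarator. A sketch comparing the three spellings (IntPtr is a hypothetical alias name):

```cpp
#include <cassert>       // assert is used in the usage example
#include <type_traits>

bool pointer_in_specifier_applies_to_all() {
    using IntPtr = int *;
    IntPtr a = nullptr, b = nullptr;                        // both are int*
    std::add_pointer<int>::type c = nullptr, d = nullptr;   // both are int*
    int *e = nullptr, g = 0;                                // only e is a pointer

    static_assert(std::is_same<decltype(b), int *>::value, "alias: b is a pointer");
    static_assert(std::is_same<decltype(d), int *>::value, "add_pointer: d is a pointer");
    static_assert(std::is_same<decltype(g), int>::value, "declarator '*': g is an int");

    return a == b && c == d && e == nullptr && g == 0;
}
```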
In 4, 5 and 6, test is always a pointer and test2 is not a pointer. White space is (almost) never significant in C++.
The rationale in C is that you declare the variables the way you use them. For example
char *a[100];
says that *a[42] will be a char. And a[42] a char pointer. And thus a is an array of char pointers.
This is because the original compiler writers wanted to use the same parser for expressions and declarations. (Not a very sensible reason for a language design choice.)
I would say that the initial convention was to put the star on the pointer name side (the right side of the declaration).
In The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie, the stars are on the right side of the declaration.
Looking at the Linux source code at https://github.com/torvalds/linux/blob/master/init/main.c we can see that the star is also on the right side.
You can follow the same rules, but it's not a big deal if you put stars on the type side.
Remember that consistency is important, so always put the star on the same side, regardless of which side you choose.
In my opinion, the answer is BOTH, depending on the situation.
Generally, IMO, it is better to put the asterisk next to the pointer name, rather than the type. Compare e.g.:
int *pointer1, *pointer2; // Fully consistent, two pointers
int* pointer1, pointer2; // Inconsistent -- because only the first one is a pointer, the second one is an int variable
// The second case is unexpected, and thus prone to errors
Why is the second case inconsistent? Because e.g. int x,y; declares two variables of the same type but the type is mentioned only once in the declaration. This creates a precedent and expected behavior. And int* pointer1, pointer2; is inconsistent with that because it declares pointer1 as a pointer, but pointer2 is an integer variable. Clearly prone to errors and, thus, should be avoided (by putting the asterisk next to the pointer name, rather than the type).
However, there are some exceptions where you might not be able to put the asterisk next to an object name (and where it matters where you put it) without getting undesired outcome — for example:
MyClass *volatile MyObjName
void test (const char *const p) // const value pointed to by a const pointer
Finally, in some cases, it might be arguably clearer to put the asterisk next to the type name, e.g.:
void* ClassName::getItemPtr () {return &item;} // Clear at first sight
The pointer is a modifier to the type. It's best to read them right to left in order to better understand how the asterisk modifies the type. 'int *' can be read as "pointer to int". In multiple declarations you must specify that each variable is a pointer or it will be created as a standard variable.
1,2 and 3) Test is of type (int *). Whitespace doesn't matter.
4,5 and 6) Test is of type (int *). Test2 is of type int. Again whitespace is inconsequential.
I have always preferred to declare pointers like this:
int* i;
I read this to say "i is of type int-pointer". You can get away with this interpretation if you only declare one variable per declaration.
It is an uncomfortable truth, however, that this reading is wrong. The C Programming Language, 2nd Ed. (p. 94) explains the opposite paradigm, which is the one used in the C standards:
The declaration of the pointer ip,
int *ip;
is intended as a mnemonic; it says that the expression *ip is an
int. The syntax of the declaration for a variable mimics the syntax
of expressions in which the variable might appear. This reasoning
applies to function declarations as well. For example,
double *dp, atof(char *);
says that in an expression *dp and atof(s) have values of type
double, and that the argument of atof is a pointer to char.
So, by the reasoning of the C language, when you declare
int* test, test2;
you are not declaring two variables of type int*, you are introducing two expressions that evaluate to an int type, with no attachment to the allocation of an int in memory.
A compiler is perfectly happy to accept the following:
int *ip, i;
i = *ip;
because in the C paradigm, the compiler is only expected to keep track of the type of *ip and i. The programmer is expected to keep track of the meaning of *ip and i. In this case, ip is uninitialized, so it is the programmer's responsibility to point it at something meaningful before dereferencing it.
A good rule of thumb that helps a lot of people grasp these concepts: in C++, a lot of semantic meaning is derived from keywords or symbols binding to what is on their left.
Take for example:
int const bla;
The const applies to the "int" word. The same is with pointers' asterisks, they apply to the keyword left of them. And the actual variable name? Yup, that's declared by what's left of it.

Operator Precedence in declaring more pointers in one instruction

I would like to understand why I have to add an asterisk before each identifier when declaring several pointers with the same data type on the same line.
Here is where I read this:
Another thing that may call your attention is the line:
int * p1, * p2;
This declares the two pointers used in the previous example. But notice that there is an asterisk (*) for each pointer, in order for both to have type int* (pointer to int). This is required due to the precedence rules. Note that if, instead, the code was:
int * p1, p2;
p1 would indeed be of type int*, but p2 would be of type int. Spaces do not matter at all for this purpose. But anyway, simply remembering to put one asterisk per pointer is enough for most pointer users interested in declaring multiple pointers per statement. Or even better: use a different statement for each variable.
The operator precedence
The question: What rule is used here, what precedence is this? It is about comma or asterisk? I can't figure it out.
There is no precedence rule involved. The grammar of a simple declaration looks like
decl-specifier-seq init-declarator-list(opt) ;
The sign * belongs to the declarators, not to the decl-specifier-seq as, for example, the type specifier int does.
So you may for example rewrite the declaration
int * p1, * p2;
like
int ( * p1 ), ( * p2 );
where ( *p1 ) and ( *p2 ) are declarators (though in this case the parentheses are redundant).
You may not write for example
( int * ) p1, p2;
The compiler will issue an error.
Parentheses are required when a more complicated type is declared. For example let's declare a pointer to an array
int ( *p )[N];
where N is some constant.
So you may enclose declarators in parentheses.
Let's consider a more complicated declaration: of a function that returns a pointer to function and has as a parameter another function
void ( *f( int cmp( const void *, const void * ) ) )( int *a );
As for precedence, the rules for building declarators are described by the grammar.
For example, if you write
int * a[10];
then it is an array of 10 elements of the type int *.
If you add redundant parentheses and write
int ( *a[10] );
then it is still an array of 10 pointers to objects of the type int.
However, if you write
int ( *a )[10];
then it is a pointer to an array of 10 integers.
Take into account that typedef is also a decl-specifier.
So for example this typedef
typedef int *intPtr;
you may rewrite like
int typedef *intPtr;
or even like
int typedef ( *intPtr );
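To see that the three spellings really name the same type, they can be compared directly. In this sketch (assuming C++11 or later) the typedef names are numbered intPtr1 to intPtr3 only so that they do not clash within one file:

```cpp
#include <type_traits>

typedef int *intPtr1;       // the usual spelling
int typedef *intPtr2;       // decl-specifiers may appear in any order
int typedef ( *intPtr3 );   // the declarator may be parenthesized

static_assert( std::is_same< intPtr1, int * >::value, "intPtr1 is int *" );
static_assert( std::is_same< intPtr1, intPtr2 >::value, "same type" );
static_assert( std::is_same< intPtr2, intPtr3 >::value, "same type" );
```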
One more example of a declaration. Let's consider a multidimensional array. It can be declared like
int ( ( ( a )[N1] )[N2] );
though again the parentheses are redundant. However they can help to understand how arrays are implicitly converted to pointers to their first elements in expressions.
For example if you have an array
int a[N1][N2];
then to get a declaration of a pointer to its first element you can rewrite the declaration like
int ( a[N1] )[N2];
and now replace a[N1] with *a (or, for example, *p):
int ( *p )[N2] = a;
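The substitution can be verified with the standard decay trait. The following is a sketch, assuming C++11 or later; the values chosen for N1 and N2 are arbitrary placeholders:

```cpp
#include <type_traits>

constexpr int N1 = 2;   // arbitrary sizes for the illustration
constexpr int N2 = 3;

int a[N1][N2];
int ( *p )[N2] = a;     // a is converted to a pointer to its first row

static_assert( std::is_same< decltype( p ), int ( * )[N2] >::value,
               "p points to an array of N2 ints" );
static_assert( std::is_same< std::decay< decltype( a ) >::type, decltype( p ) >::value,
               "the decayed type of a is the type of p" );
```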
There is not a precedence rule per se; rather, it is a rule that says that the int part applies to all the variables whereas the * only applies to the one right after it.
The general version of the rule is that all the specifiers in the declaration apply to every entity being declared. Specifiers include keywords like constexpr and static, as well as keywords that denote types like int and user-defined type names. The operators like * and & that modify the type specifiers to create more complex types, however, only apply to one entity at a time.
There's no operator precedence involved here. In fact there are no operators either. Operators operate on expressions, but this is a declaration.
The syntax for declarations is that:
T D1, D2, D3, D4;
means the same as:
T D1; T D2; T D3; T D4;
where:
T is a sequence of declaration specifiers: no symbols, only keywords (e.g. int, const, static) and/or typedef-names.
Dn is a declarator, that is, an identifier (the name of the variable) perhaps with *, [] or (parameter-list) or grouping parentheses attached in various ways.
In your first example T is int, and the declarators are *p1 and *p2 .
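The equivalence between T D1, D2, D3; and separate declarations can be sketched as follows, here with T being const int so that the shared specifiers are visible in every resulting type. This assumes C++11 or later, and all variable names are made up for the illustration:

```cpp
#include <type_traits>

// One declaration: T is "const int", the declarators are x, *px and arr[2].
const int x = 1, *px = &x, arr[2] = { 1, 2 };

// The same thing written as three declarations.
const int y = 1;
const int *py = &y;
const int brr[2] = { 1, 2 };

static_assert( std::is_same< decltype( x ),   decltype( y )   >::value, "const int" );
static_assert( std::is_same< decltype( px ),  decltype( py )  >::value, "const int *" );
static_assert( std::is_same< decltype( arr ), decltype( brr ) >::value, "const int[2]" );
```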

Pointers: initialisation vs. declaration

I am a C++ noob and I am quite sure this is a stupid question, but I just do not quite understand why an error arises (does not arise) from the following code:
#include <iostream>
using namespace std;
int main()
{
int a,*test;
*test = &a; // this error is clear to me, since an address cannot be
// assigned to an integer
*(test = &a); // this works, which is also clear
return 0;
}
But why does this work too?
#include <iostream>
using namespace std;
int main()
{
int a, *test= &a; // Why no error here?, is this to be read as:
// *(test=&a),too? If this is the case, why is the
// priority of * here lower than in the code above?
return 0;
}
The fundamental difference between those two lines
*test= &a; // 1
int a, *test= &a; // 2
is that the first is an expression, consisting of operator calls with the known precedence rules:
        operator=
         /    \
        /      \
  operator*  operator&
      |          |
     test        a
whereas the second is a variable declaration and initialization, and equivalent to the declaration of int a; followed by:
   int*  test      =  &a
// ^^^^  ^^^^         ^^
// type  variable     expression giving
//       name         initial value
Neither operator* nor operator= is even used in the second line.
The meaning of the tokens *, =, & and , depends on the context in which they appear. Inside an expression they stand for the corresponding operators. In a declaration, * usually appears as part of the type (meaning "pointer to"), & is likewise part of the type (meaning "reference to"), = marks the beginning of the (copy) initialization expression, and , separates multiple declarators.
int a, *test= &a;
is equivalent to:
int a;
int* test = &a;
and is perfectly valid: you initialize test, which has the type pointer to int, with the address of the variable a, which has the type int.
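The difference between initializing the pointer and assigning to it afterwards can be sketched like this. It assumes C++14 or later (for the multi-statement constexpr functions); the function names are hypothetical, and a is given a value here only so that it can be read back:

```cpp
// Initialization: the = belongs to the declarator "*test" and sets test itself.
constexpr int read_via_initialisation() {
    int a = 42, *test = &a;
    return *test;             // here * is the dereference operator
}

// Declaration first, then assignment to test (not to *test).
constexpr int read_via_assignment() {
    int a = 42, *test = nullptr;
    test = &a;
    return *test;
}

static_assert( read_via_initialisation() == read_via_assignment(),
               "both forms leave test pointing at a" );
```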
You're confusing two uses for *.
In your first example, you're using it to dereference a pointer.
In the second example, you're using it to declare a "pointer to int".
So, when you use * in a declaration, it's there to say that you're declaring a pointer.
In the second case you are actually doing an initialisation like this:
int *test = &a;
It means that you are initialising a pointer; the * tells the compiler that it's a pointer.
After the initialisation, however, writing *test (with an asterisk) means that you are accessing the value at the address stored in the pointer test.
In other words, *test gives you the value of a, because the address of a, obtained with &a, is stored in the pointer test.
& is the operator that gets the address of a variable, and * is the operator that gets the value at an address.
So the compiler interprets initialisation and assignment differently, even though the asterisk * is present in both cases.
You just hit two of the horrible language design spots: squeezing declarations into one line and reusing the * symbol for unrelated purposes. In this case * is used to declare a pointer (when it is part of the type signature in int a,*test;) and to dereference a pointer (when it is used in the statement *test = &a;). Good practice would be to declare variables one at a time, to use automatic type deduction instead of copy-pasting the type, and to use the dedicated addressof function:
#include <memory> // for std::addressof
int a{};
auto const p_a{::std::addressof(a)};
There's a subtle difference there.
When you declare int a, *test, you're saying "declare a as an integer, and declare test as a pointer to an integer, with both of them uninitialized."
In your first example, you set *test to &a right after the declarations. That translates to: "Set the integer that test points to to the address of a." In C++ this does not compile, because an int* cannot be assigned to an int. Even with a cast it would be undefined behavior, because test was never initialized and holds an indeterminate value.
In the other example, int a, *test= &a translates to: "declare a as an uninitialized integer, and declare test as a pointer initialized to the address of a." That's valid. More verbosely, it translates to:
int a, *test;
test = &a;

Confusion with dereference operator

Just to be clear of the facts, I wanted ask
int i = 1;
int *p = &i;
i = *p;
What do you call the * on p in line 2? Is it called the dereference operator, the same as in line 3?
This is an example of a pointer declaration:
int *p = &i;
This is an example of a dereference operator:
i = *p;
Operators apply to expressions, not to types, so the two uses are not the same thing.
In the declaration, * is not an operator; it is part of the declarator. When applied to a type it turns it into a pointer to that type.
You can think of int* as a type.
Writing it this way makes it more clear:
int i = 1;
int* p = &i;
i = *p;
int i = 1;
int *p = &i;
The * is part of the syntax of a declaration. int *p says that p is of type int*.
i = *p;
That * is a dereference operator.
But the fact that both use the same syntax is not a coincidence. C's (seemingly bizarre) declaration syntax is largely based on the principle that "declaration follows usage".
One way to read the declaration
int *p;
is that it declares that *p is of type int. It follows from that that p is of type int*. Similarly:
int *a, b, c[20];
says that *a, b, and c[blah] are all of type int, so a is of type int*, b is of type int, and c is of type int[20]. (Note that the correspondence between declaration and usage isn't perfect; c[20] is just past the end of the array, and doesn't actually exist.)
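"Declaration follows usage" can be checked from both directions: the expressions *a, b and c[0] all denote an int, and the declared types follow from that. This is a sketch, assuming C++11 or later; decltype of an lvalue expression such as *a yields int&, so the reference is stripped before comparing:

```cpp
#include <type_traits>

int *a, b, c[20];

// Usage side: each expression mirrors its declarator and denotes an int.
static_assert( std::is_same< std::remove_reference< decltype( *a ) >::type, int >::value,
               "*a is an int" );
static_assert( std::is_same< decltype( b ), int >::value, "b is an int" );
static_assert( std::is_same< std::remove_reference< decltype( c[0] ) >::type, int >::value,
               "c[0] is an int" );

// Declaration side: the resulting types of the names themselves.
static_assert( std::is_same< decltype( a ), int * >::value, "a is int *" );
static_assert( std::is_same< decltype( c ), int[20] >::value, "c is int[20]" );
```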
This is why (at least in one school of thought) the * in a pointer declaration goes next to the identifier, not the type.
That's a pointer declaration, not a dereference operator.
From the cplusplus.com tutorial:
I want to emphasize that the asterisk sign (*) that we use when
declaring a pointer only means that it is a pointer (it is part of its
type compound specifier), and should not be confused with the
dereference operator that we have seen a bit earlier, but which is
also written with an asterisk (*). They are simply two different
things represented with the same sign.
Defining a pointer in this way
int *p = &i;
is somewhat confusing. I suggest you write it like this (note where the space is):
int* p = &i;
But there is a problem when defining more than one pointer this way:
int* p = &i, * q = NULL;
So, to be clearer, define the pointer type first, and use that type to define the variables:
typedef int* int_ptr;
int_ptr p = &i, q = NULL;
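The effect of the typedef can be contrasted with the plain int* spelling. This is a minimal sketch, assuming C++11 or later; the variables r and s are added here only for the comparison:

```cpp
#include <cstddef>       // for NULL
#include <type_traits>

int i = 1;

typedef int* int_ptr;
int_ptr p = &i, q = NULL;   // the typedef is a specifier, so it applies to both

static_assert( std::is_same< decltype( p ), int * >::value, "p is int *" );
static_assert( std::is_same< decltype( q ), int * >::value, "q is int *" );

int* r = &i, s = 0;         // without the typedef, s is a plain int

static_assert( std::is_same< decltype( r ), int * >::value, "r is int *" );
static_assert( std::is_same< decltype( s ), int >::value,   "s is int" );
```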