C++ Reference Variable Declaration Syntax Reasoning

C++ Reference Variable Declaration Syntax Reasoning - c++

All other declaration syntaxes in C++ make a lot of sense, for examples:
int i;
i is an int
int *i;
when i is dereferenced, the result is an int
int i[];
when i is subscripted, the result is and int
int *i[];
when i is subscriped, then the result is derefrenced, the final result is an int
But when you look at the syntax for reference variables, this otherwise consistent reasoning falls apart.
int &i = x;
“when the address of i is taken, the result is an int” makes no sense.
Am I missing something, or is this truly an exception to the apparent reasoning behind the other sytaxes? If it is an exception, why was this syntax chosen?
Edit:
This question addresses why the & symbol may have been chosen for this purpose, but not whether or not there is a universally consistent way to read declarations different from the way described above.

Once bound, a reference becomes an alias for its referent, and cannot be distinguished from it (except by decltype). Since int& is used exactly as int is, a declaration-follows-usage syntax could not work for declaring references.
The syntax for declaring references is pretty straightforward, still. Just write down a declaration for the corresponding pointer type, then replace the * used for the initial dereference by & or &&.

Related

Self-assignment of variable in its definition

The following C++ program compiles just fine (g++ 5.4 at least gives a warning when invoked with -Wall):
int main(int argc, char *argv[])
{
int i = i; // !
return 0;
}
Even something like
int& p = p;
is swallowed by the compiler.
Now my question is: Why is such an initialization legal? Is there any actual use-case or is it just a consequence of the general design of the language?

This is a side effect of the rule that a name is in scope immediately after it is declared. There's no need to complicate this simple rule just to prevent writing code that's obvious nonsense.

Just because the compiler accepts it (syntactically valid code) does not mean that it has well defined behaviour.
The compiler is not required to diagnose all cases of Undefined Behaviour or other classes of problems.
The standard gives it pretty free hands to accept and translate broken code, on the assumption that if the results were to be undefined or nonsensical the programmer would not have written that code.
So; the absense of warnings or errors from your compiler does not in any way prove that your program has well defined behaviour.
It is your responsibility to follow the rules of the language.
The compiler usually tries to help you by pointing out obvious flaws, but in the end it's on you to make sure your program makes sense.
And something like int i = i; does not make sense but is syntactically correct, so the compiler may or may not warn you, but in any case is within its rights to just generate garbage (and not tell you about it) because you broke the rules and invoked Undefined Behaviour.

I guess the gist of your question is about why the second identifier is recognized as identifying the same object as the first, in int i = i; or int &p = p;
This is defined in [basic.scope.pdecl]/1 of the C++14 standard:
The point of declaration for a name is immediately after its complete declarator and before its initializer (if any), except as noted below. [Example:
unsigned char x = 12;
{ unsigned char x = x; }
Here the second x is initialized with its own (indeterminate) value. —end example ]
The semantics of these statements are covered by other threads:
Is int x = x; UB?
Why can a Class& be initialized to itself?
Note - the quoted example differs in semantics from int i = i; because it is not UB to evaluate an uninitialized unsigned char, but is UB to evaluate an uninitialized int.
As noted on the linked thread, g++ and clang can give warnings when they detect this.
Regarding rationale for the scope rule: I don't know for sure, but the scope rule existed in C so perhaps it just made its way into C++ and now it would be confusing to change it.
If we did say that the declared variable is not in scope for its initializer, then int i = i; might make the second i find an i from an outer scope, which would also be confusing.

Why is the dereference operator (*) also used to declare a pointer?

I'm not sure if this is a proper programming question, but it's something that has always bothered me, and I wonder if I'm the only one.
When initially learning C++, I understood the concept of references, but pointers had me confused. Why, you ask? Because of how you declare a pointer.
Consider the following:
void foo(int* bar)
{
}
int main()
{
int x = 5;
int* y = NULL;
y = &x;
*y = 15;
foo(y);
}
The function foo(int*) takes an int pointer as parameter. Since I've declared y as int pointer, I can pass y to foo, but when first learning C++ I associated the * symbol with dereferencing, as such I figured a dereferenced int needed to be passed. I would try to pass *y into foo, which obviously doesn't work.
Wouldn't it have been easier to have a separate operator for declaring a pointer? (or for dereferencing). For example:
void test(int# x)
{
}

In The Development of the C Language, Dennis Ritchie explains his reasoning thusly:
The second innovation that most clearly distinguishes C from its
predecessors is this fuller type structure and especially its
expression in the syntax of declarations... given an object of any
type, it should be possible to describe a new object that gathers
several into an array, yields it from a function, or is a pointer to
it.... [This] led to a
declaration syntax for names mirroring that of the expression syntax
in which the names typically appear. Thus,
int i, *pi, **ppi; declare an integer, a pointer to an integer, a
pointer to a pointer to an integer. The syntax of these declarations
reflects the observation that i, *pi, and **ppi all yield an int type
when used in an expression.
Similarly, int f(), *f(), (*f)(); declare
a function returning an integer, a function returning a pointer to an
integer, a pointer to a function returning an integer. int *api[10],
(*pai)[10]; declare an array of pointers to integers, and a pointer to
an array of integers.
In all these cases the declaration of a
variable resembles its usage in an expression whose type is the one
named at the head of the declaration.
An accident of syntax contributed to the perceived complexity of the
language. The indirection operator, spelled * in C, is syntactically a
unary prefix operator, just as in BCPL and B. This works well in
simple expressions, but in more complex cases, parentheses are
required to direct the parsing. For example, to distinguish
indirection through the value returned by a function from calling a
function designated by a pointer, one writes *fp() and (*pf)()
respectively. The style used in expressions carries through to
declarations, so the names might be declared
int *fp(); int (*pf)();
In more ornate but still realistic cases,
things become worse: int *(*pfp)(); is a pointer to a function
returning a pointer to an integer.
There are two effects occurring.
Most important, C has a relatively rich set of ways of describing
types (compared, say, with Pascal). Declarations in languages as
expressive as C—Algol 68, for example—describe objects equally hard to
understand, simply because the objects themselves are complex. A
second effect owes to details of the syntax. Declarations in C must be
read in an `inside-out' style that many find difficult to grasp.
Sethi [Sethi 81] observed that many of the nested
declarations and expressions would become simpler if the indirection
operator had been taken as a postfix operator instead of prefix, but
by then it was too late to change.

The reason is clearer if you write it like this:
int x, *y;
That is, both x and *y are ints. Thus y is an int *.

That is a language decision that predates C++, as C++ inherited it from C. I once heard that the motivation was that the declaration and the use would be equivalent, that is, given a declaration int *p; the expression *p is of type int in the same way that with int i; the expression i is of type int.

Because the committee, and those that developed C++ in the decades before its standardisation, decided that * should retain its original three meanings:
A pointer type
The dereference operator
Multiplication
You're right to suggest that the multiple meanings of * (and, similarly, &) are confusing. I've been of the opinion for some years that it they are a significant barrier to understanding for language newcomers.
Why not choose another symbol for C++?
Backwards-compatibility is the root cause... best to re-use existing symbols in a new context than to break C programs by translating previously-not-operators into new meanings.
Why not choose another symbol for C?
It's impossible to know for sure, but there are several arguments that can be — and have been — made. Foremost is the idea that:
when [an] identifier appears in an expression of the same form as the declarator, it yields an object of the specified type. {K&R, p216}
This is also why C programmers tend to[citation needed] prefer aligning their asterisks to the right rather than to the left, i.e.:
int *ptr1; // roughly C-style
int* ptr2; // roughly C++-style
though both varieties are found in programs of both languages, varyingly.

Page 65 of Expert C Programming: Deep C Secrets includes the following: And then, there is the C philosophy that the declaration of an object should look like its use.
Page 216 of The C Programming Language, 2nd edition (aka K&R) includes: A declarator is read as an assertion that when its identifier appears in an expression of the same form as the declarator, it yields an object of the specified type.
I prefer the way van der Linden puts it.

Haha, I feel your pain, I had the exact same problem.
I thought a pointer should be declared as &int because it makes sense that a pointer is an address of something.
After a while I thought for myself, every type in C has to be read backwards, like
int * const a
is for me
a constant something, when dereferenced equals an int.
Something that has to be dereferenced, has to be a pointer.

Why can't a function have a reference argument in C?

For example: void foo( int& i ); is not allowed. Is there a reason for this, or was it just not part of the specification? It is my understanding that references are generally implemented as pointers. In C++, is there any functional difference (not syntactic/semantic) between void foo( int* i ) and void foo( int& i )?

Because references are a C++ feature.

References are merely syntactic vinegar for pointers. Their implementation is identical, but they hide the fact that the called function might modify the variable. The only time they actually fill an important role is for making other C++ features possible - operator overloading comes to mind - and depending on your perspective these might also be syntactic vinegar.

For example: void foo( int& i ); is not allowed. Is there a reason for this, or was it just not part of the specification?
It was not a part of the specification. The syntax "type&" for references were introduced in C++.
It is my understanding that references are generally implemented as pointers. In C++, is there any functional difference (not syntactic/semantic) between void foo( int* i ) and void foo( int& i )?
I am not sure if it qualifies as a semantic difference, but references offer better protection against dereferencing nulls.

Because the & operator has only 2 meanings in C:
address of its operand (unary),
and, the bitwise AND operator (binary).
int &i; is not a valid declaration in C.

For a function argument, the difference between pointer and reference is not that big a deal, but in many cases (e.g. member variables) having references substantially limits what you can do, since it cannot be rebound.

References were not present in C. However, C did have what amounts to mutable arguments passed by reference. Example:
int foo(int in, int *out) { return (*out)++ + in; }
// ...
int x = 1; int y = 2;
x = foo(x, &y);
// x == y == 3.
However, it was a common error to forget to dereference "out" in every usage in more complicated foo()s. C++ references allowed a smoother syntax for representing mutable members of the closure. In both languages, this can confound compiler optimizations by having multiple symbols referring to the same storage. (Consider "foo(x,x)". Now it's undefined whether the "++" occurs after only "*out" or also after "in", since there's no sequence point between the two uses and the increment is only required to happen sometime after the value of the left expression is taken.)
But additionally, explicit references disambiguate two cases to a C++ compiler. A pointer passed into a C function could be a mutable argument or a pointer to an array (or many other things, but these two adequately illustrate the ambiguity). Contrast "char *x" and "char *y". (... or fail to do so, as expected.) A variable passed by reference into a C++ function is unambiguously a mutable member of the closure. If for instance we had
// in class baz's scope
private: int bar(int &x, int &y) {return x - y};
public : int foo(int &x, int &y) {return x + bar(x,y);}
// exit scope and wander on ...
int a = 1; int b = 2; baz c;
a = c.foo(a,b);
We know several things:
bar() is only called from foo(). This means bar() can be compiled so that its two arguments are found in foo()'s stack frame instead of it's own. It's called copy elision and it's a great thing.
Copy elision gets even more exciting when a function is of the form "T &foo(T &)", the compiler knows a temporary is going in and coming out, and the compiler can infer that the result can be constructed in place of the argument. Then no copying of the temporary in or the result out need be compiled in. foo() can be compiled to get its argument from some enclosing stack frame and write its result directly to some enclosing stack frame.
a recent article about copy elision and (surprise) it works even better if you pass by value in modern compilers (and how rvalue references in C++0x will help the compilers skip even more pointless copies), see http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/ .

using a static const int in a struct/class

struct A {
static const int a = 5;
struct B {
static const int b = a;
};
};
int main() {
return A::B::b;
}
The above code compiles. However if you go by Effective C++ book by Scott Myers(pg 14);
We need a definition for a in addition to the declaration.
Can anyone explain why this is an exception?

C++ compilers allow static const integers (and integers only) to have their value specified at the location they are declared. This is because the variable is essentially not needed, and lives only in the code (it is typically compiled out).
Other variable types (such as static const char*) cannot typically be defined where they are declared, and require a separate definition.
For a tiny bit more explanation, realize that accessing a global variable typically requires making an address reference in the lower-level code. But your global variable is an integer whose size is this typically around the size of an address, and the compiler realizes it will never change, so why bother adding the pointer abstraction?

By really pedantic rules, yes, your code needs a definition for that static integer.
But by practical rules, and what all compilers implement because that's how the rules of C++03 are intended - no, you don't need a definition.
The rules for such static constant integers are intended to allow you to omit the definition if the integer is used only in such situations where a value is immediately read, and if the static member can be used in constant expressions.
In your return statement, the value of the member is immediately read, so you can omit the definition of the static constant integer member if that's the only use of it. The following situation needs a definition, however:
struct A {
static const int a = 5;
struct B {
static const int b = a;
};
};
int main() {
int *p = &A::B::b;
}
No value is read here - but instead the address of it is taken. Therefore, the intent of the C++03 Standard is that you have to provide a definition for the member like the following in some implementation file.
const int A::B::b;
Note that the actual rules appearing in the C++03 Standard says that a definition is not required only where the variable is used where a constant expression is required. That rule, however, if strictly applied, is too strict. It would only allow you to omit a definition for situation like array-dimensions - but would require a definition in cases like a return statement. The corresponding defect report is here.
The wording of C++0x has been updated to include that defect report resolution, and to allow your code as written.

However, if you try the ternary operand without "defining" static consts, you get a linker error in GCC 4x:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13795
So, although constructs like int k = A::CONSTVAL; are illegal in the current standard, they are supported. But the ternary operand is not. Some operators are more equal than others, if you get my drift :)
So much for "lax" rules. I suggest you write code conforming to the standard if you do not want surprises.

In general, most (and recent) C++ compilers allow static const ints
You just lucky, perhaps not. Try older compiler, such as gcc 2.0 and it will vehemently punish you with-less-than-pretty error message.

Why do we use "type * var" instead of "type & var" when defining a pointer?

I'm relatively new to C++ (about one year of experience, on and off). I'm curious about what led to the decision of type * name as the syntax for defining pointers. It seems to me that the syntax should be type & name as the & symbol is used everywhere else in code to refer to the variable's memory address. So, to use the traditional example of int pointers:
int a = 1;
int * b = &a;
would become
int a = 1;
int & b = &a
I'm sure there's some reason for this that I'm just not seeing, and I'd love to hear some input from C++ veterans.
Thanks,
-S

C++ adopts the C syntax. As revealed in "The Development of the C Language" (by Dennis Ritchie) C uses * for pointers in type declarations because it was decided that type syntax should follow use.
For each object of [a compound type], there was already a way to mention the underlying object: index the array, call the function, use the indirection operator [*] on the pointer. Analogical reasoning led to a declaration syntax for names mirroring that of the expression syntax in which the names typically appear. Thus,
int i, *pi, **ppi;
declare an integer, a pointer to an integer, a pointer to a pointer to an integer. The syntax of these declarations reflects the observation that i, *pi, and **ppi all yield an int type when used in an expression.
Here's a more complex example:
int *(*foo)[4][];
This declaration means an expression *(*foo)[4][0] has type int, and from that (and that [] has higher precedence than unary *) you can decode the type: foo is a pointer to an array of size 4 of array of pointers to ints.
This syntax was adopted in C++ for compatibility with C. Also, don't forget that C++ has a use for & in declarations.
int & b = a;
The above line means a reference variable refering to another variable of type int. The difference between a reference and pointer roughly is that references are initialized only, and you can not change where they point, and finally they are always dereferenced automatically.
int x = 5, y = 10;
int& r = x;
int sum = r + y; // you do not need to say '*r' automatically dereferenced.
r = y; // WRONG, 'r' can only have one thing pointing at during its life, only at its infancy ;)

I think that Dennis Ritchie answered this in The Development of the C Language:
For each object of such a composed
type, there was already a way to
mention the underlying object: index
the array, call the function, use the
indirection operator on the pointer.
Analogical reasoning led to a
declaration syntax for names mirroring
that of the expression syntax in which
the names typically appear. Thus,
int i, *pi, **ppi;
declare an integer, a pointer to an
integer, a pointer to a pointer to an
integer. The syntax of these
declarations reflects the observation
that i, *pi, and **ppi all yield an
int type when used in an expression.
Similarly,
int f(), *f(), (*f)();
declare a function returning an
integer, a function returning a
pointer to an integer, a pointer to a
function returning an integer;
int *api[10], (*pai)[10];
declare an array of pointers to
integers, and a pointer to an array of
integers. In all these cases the
declaration of a variable resembles
its usage in an expression whose type
is the one named at the head of the
declaration.
So we use type * var to declare a pointer because this allows the declaration to mirror the usage (dereferencing) of the pointer.
In this article, Ritchie also recounts that in "NB", an extended version of the "B" programming language, he used int pointer[] to declare a pointer to an int, as opposed to int array[10] to declare an array of ints.

If you are a visual thinker, it may help to imagine the asterisk as a black hole leading to the data value. Hence, it is a pointer.
The ampersand is the opposite end of the hole, think of it as an unraveled asterisk or a spaceship wobbling about in an erratic course as the pilot gets over the transition coming out of the black hole.
I remember being very confused by C++ overloading the meaning of the ampersand, to give us references. In their desperate attempt to avoid using any more characters, which was justified by the international audience using C and known issues with keyboard limitations, they added a major source of confusion.
One thing that may help in C++ is to think of references as pre-prepared dereferenced pointers. Rather than using &someVariable when you pass in an argument, you've already used the trailing ampersand when you defined someVariable. Then again, that might just confuse you further!
One of my pet hates, which I was unhappy to see promulgated in Apple's Objective-C samples, is the layout style int *someIntPointer instead of int* someIntPointer
IMHO, keeping the asterisk with the variable is an old-fashioned C approach emphasizing the mechanics of how you define the variable, over its data type.
The data type of someIntPointer is literally a pointer to an integer and the declaration should reflect that. This does lead to the requirement that you declare one variable per line, to avoid subtle bugs such as:
int* a, b; // b is a straight int, was that our intention?
int *a, *b; // old-style C declaring two pointers
int* a;
int* b; // b is another pointer to an int
Whilst people argue that the ability to declare mixed pointers and values on the same line, intentionally, is a powerful feature, I've seen it lead to subtle bugs and confusion.

Your second example is not valid C code, only C++ code. The difference is that one is a pointer, whereas the other is a reference.
On the right-hand side the '&' always means address-of. In a definition it indicates that the variable is a reference.
On the right-hand side the '*' always means value-at-address. In a definition it indicates that the variable is a pointer.
References and pointers are similar, but not the same. This article addresses the differences.

Instead of reading int* b as "b is a pointer to int", read it as int *b: "*b is an int". Then, you have & as an anti-*: *b is an int. The address of *b is &*b, or just b.

I think the answer may well be "because that's the way K&R did it."
K&R are the ones who decided what the C syntax for declaring pointers was.
It's not int & x; instead of int * x; because that's the way the language was defined by the guys who made it up -- K&R.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js