Returning a reference and returning a value - c++

I am not quite sure that I have understood why there is a problem when we return the reference of a local random variable. So let's say that we have this example.
int *myFunc() {
int phantom = 4;
return &phantom;
}
Then the usual argument is that when the function is used, the memory of the variable phantom is no longer available after the execution of the code line int phantom = 4; so it cannot be returned (at least this is what I have understood so far). On the other hand, for the function,
int myFunc() {
int phantom = 4;
return phantom;
}
the value of the integer variable phantom will return. (I see the returning of the value as the dereferencing of an underlying pointer for the variable phantom).
What do I miss here?? Why in the first case there is a compilation error and in the second case everything works??

The first doesn't return a reference, it returns a pointer. A pointer to a local variable who will go out of scope once the function ends, leaving you with a stray pointer to a variable that doesn't exist anymore. That's why you get a compiler warning (usually not an actual error).
The second code copies the value. The local variable inside the function will never need to be referenced or used once the return statement finished.

You dont miss much. Only that in the first case there will be no compiler error.
[...] the memory of the variable phantom is no longer available after the execution of the code line int phantom = 4; so it cannot be returned
No, it can be returned and compilers may issue a warning for that but afaik no error. However, you should not!!
Btw, the memory is available, but it is undefined behaviour to access it after the function returned (not after the line int phantom = 4;).
In the second case:
I see the returning of the value as the dereferencing of an underlying pointer for the variable phantom
You are thinking too complicated here. Returning a value from a function may be implemented by using a pointer, but thats an implementation detail. The only thing you have to care about here is that what is returned is the value. So no problems in the second case.

I am not quite sure that I have understood why there is a problem when
we return the reference of a local random variable
Because the C++ standard says it is undefined behaviour if you use such a function, and you want to avoid undefined behaviour in your program. What your program does should be determined by the rules of the C++ language, not be random.
Note that you return a pointer and not a reference. But it's undefined behaviour in both cases.
Then the usual argument is that when the function is used, the memory
of the variable phantom is no longer available after the execution of
the code line int phantom = 4; so it cannot be returned (at least this
is what I have understood so far).
This is an implementation-centric point of view and may aid you in understanding the problem.
Nevertheless, it's important to distinguish between the observable behaviour of a program and the compiler's internal tricks to produce that behaviour. You don't even know whether any memory is occupied by a variable anyway. Considering the "as-if" rule and compiler optimisations, the entire function may have been removed, even if the behaviour was defined. That's just one example of what might really happen behind the scenes.
But again, it's undefined behaviour anyway, so anything may happen.
The question is, then, why does the C++ standard not define a behaviour for the case when you return a pointer like this and then try to access the pointee? The answer to that would be that it doesn't make sense. The object named by the local variable phantom ends its life when the function returns. So you would have a pointer to something which no longer exists, yet it's still an int*, and dereferencing a non-nullptr int* should yield an int. That's a contradiction, and the C++ standard just does not bother to resolve such a meaningless situation.
Note how this observation is based on C++ language rules, not on compiler-implementation issues.
Why in the first case there is a compilation error and in the second
case everything works??
It's certainly a warning and not an error, unless your compiler options are such that every warning is turned into an error. The compiler must not reject the code, because it's not ill-formed.
Still, your compiler tries to be helpful in the first case, because it wants to keep you from creating a program with undefined behaviour.
In the second case, the behaviour is not undefined. Returning by value means that a copy is made of the object which you want to return. The copy is made before the original is destroyed, and the caller then receives that copy. That's not meaningless and not a contradiction in any way, so it's safe and defined behaviour.
In the first case, returning by value does not help you, because although the pointer itself is safely copied, its contents are what eventually causes undefined behaviour.

The first case
int* myFunc()
{
int phantom = 4;
return &phantom; // you are returning the address of phantom
} // but phantom will not "exist" outside of myfunc
Doesn't work because variable phantom is a local variable and it lives only during the execution of myfunc. After that, it's gone.
You are returning an address of a variable, that will practically "not exist" any more.
The rule: never return pointers or references to local variables.
This is OK:
int myFunc()
{
int phantom = 4;
return phantom; // you are returning by value;
// it doesn't matter where phantom "lives"
}
int main()
{
int x = myFunc(); // the value returned by myFunc will be copied to x
}

Try returning the pointer and dereference it to get the value.
#include <iostream>
using namespace std;
int *myFunc()
{
int number = 4;
int *phantom = &number;
return phantom;
}
int main()
{
cout << myFunc() << endl; //0x....
cout << *myFunc() << endl; //4
return 0;
}

Related

Function returning reference to local variable [duplicate]

The C++ standard states that returning reference to a local variable (on the stack) is undefined behaviour, so why do many (if not all) of the current compilers only give a warning for doing so?
struct A{
};
A& foo()
{
A a;
return a; //gcc and VS2008 both give this a warning, but not a compiler error
}
Would it not be better if compilers give a error instead of warning for this code?
Are there any great advantages to allowing this code to compile with just a warning?
Please note that this is not about a const reference which could lengthen the lifetime of the temporary to the lifetime of the reference itself.
It is almost impossible to verify from a compiler point of view whether you are returning a reference to a temporary. If the standard dictated that to be diagnosed as an error, writing a compiler would be almost impossible. Consider:
bool not_so_random() { return true; }
int& foo( int x ) {
static int s = 10;
int *p = &s;
if ( !not_so_random() ) {
p = &x;
}
return *p;
}
The above program is correct and safe to run, in our current implementation it is guaranteed that foo will return a reference to a static variable, which is safe. But from a compiler perspective (and with separate compilation in place, where the implementation of not_so_random() is not accessible, the compiler cannot know that the program is well-formed.
This is a toy example, but you can imagine similar code, with different return paths, where p might refer to different long-lived objects in all paths that return *p.
Undefined behaviour is not a compilation error, it's just not a well-formed C++ program. Not every ill-formed program is incompilable, it's just un-predictable. I'd wager a bet that it's not even possible in principle for a computer to decide whether a given program text is a well-formed C++ program.
You can always add -Werror to gcc to make warnings terminate compilation with an error!
To add another favourite SO topic: Would you like ++i++ to cause a compile error, too?
If you return a pointer/reference to a local inside function the behavior is well defined as long as you do not dereference the pointer/reference returned from the function.
It is an Undefined Behavior only when one derefers the returned pointer.
Whether it is a Undefined Behavior or not depends on the code calling the function and not the function itself.
So just while compiling the function, the compiler cannot determine if the behavior is Undefined or Well Defined. The best it can do is to warn you of a potential problem and it does!
An Code Sample:
#include <iostream>
struct A
{
int m_i;
A():m_i(10)
{
}
};
A& foo()
{
A a;
a.m_i = 20;
return a;
}
int main()
{
foo(); //This is not an Undefined Behavior, return value was never used.
A ref = foo(); //Still not an Undefined Behavior, return value not yet used.
std::cout<<ref.m_i; //Undefined Behavior, returned value is used.
return 0;
}
Reference to the C++ Standard:
section 3.8
Before the lifetime of an object has started but after the storage which the object will occupy has been allo-cated 34) or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that refers to the storage location where the object will be or was located may be used but only in limited ways. Such a pointer refers to allocated storage (3.7.3.2), and using the
pointer as if the pointer were of type void*, is well-defined. Such a pointer may be dereferenced but the resulting lvalue may only be used in limited ways, as described below. If the object will be or was of a class type with a non-trivial destructor, and the pointer is used as the operand of a delete-expression, the program has undefined behavior. If the object will be or was of a non-POD class type, the program has undefined behavior if:
— .......
Because standard does not restrict us.
If you want to shoot to your own foot you can do it!
However lets see and example where it can be useful:
int &foo()
{
int y;
}
bool stack_grows_forward()
{
int &p=foo();
int my_p;
return &my_p < &p;
}
Compilers should not refuse to compile programs unless the standard says they are allowed to do so. Otherwise it would be much harder to port programs, since they might not compile with a different compiler, even though they comply with the standard.
Consider the following function:
int foobar() {
int a=1,b=0;
return a/b;
}
Any decent compiler will detect that I am dividing by zero, but it should not reject the code since I might actually want to trigger a SIG_FPE signal.
As David Rodríguez has pointed out, there are some cases which are undecidable but there are also some which are not. Some new version of the standard might describe some cases where the compiler must/is allowed to reject programs. That would require the standard to be very specific about the static analysis which is to be performed.
The Java standard actually specifies some rules for checking that non-void methods always return a value. Unfortunately I haven't read enough of the C++ standard to know what the compiler is allowed to do.
You could also return a reference to a static variable, which would be valid code so the code must be able to compile.
It's pretty much super-bad practice to rely on this, but I do believe that in many cases (and that's never a good wager), that memory reference would still be valid if no functions are called between the time foo() returns and the time the calling function uses its return value. In that case, that area of the stack would not have an opportunity to get overwritten.
In C and C++ you can choose to access arbitrary sections of memory anyway (within the process's memory space, of course) via pointer arithmetic, so why not allow the possibility of constructing a reference to wherever one so chooses?

dereferencing a pointer when passing by reference

what happens when you dereference a pointer when passing by reference to a function?
Here is a simple example
int& returnSame( int &example ) { return example; }
int main()
{
int inum = 3;
int *pinum = & inum;
std::cout << "inum: " << returnSame(*pinum) << std::endl;
return 0;
}
Is there a temporary object produced?
Dereferencing the pointer doesn't create a copy; it creates an lvalue that refers to the pointer's target. This can be bound to the lvalue reference argument, and so the function receives a reference to the object that the pointer points to, and returns a reference to the same. This behaviour is well-defined, and no temporary object is involved.
If it took the argument by value, then that would create a local copy, and returning a reference to that would be bad, giving undefined behaviour if it were accessed.
The Answer To Your Question As Written
No, this behavior is defined. No constructors are called when pointer types are dereferenced or reference types used, unless explicitly specified by the programmer, as with the following snippet, in which the new operator calls the default constructor for the int type.
int* variable = new int;
As for what is really happening, as written, returnSame(*pinum) is the same variable as inum. If you feel like verifying this yourself, you could use the following snippet:
returnSame(*pinum) = 10;
std::cout << "inum: " << inum << std::endl;
Further Analysis
I'll start by correcting your provided code, which it doesn't look like you tried to compile before posting it. After edits, the one remaining error is on the first line:
int& returnSame( int &example ) { return example; } // semi instead of colon
Pointers and References
Pointers and references are treated in the same way by the compiler, they differ in their use, not so much their implementation. Pointer types and reference types store, as their value, the location of something else. Pointer dereferencing (using the * or -> operators) instructs the compiler to produce code to follow the pointer and perform the operation on the location it refers to rather than the value itself. No new data is allocated when you dereference a pointer (no constructors are called).
Using references works in much the same way, except the compiler automatically assumes that you want the value at the location rather than the location itself. As a matter of fact, it is impossible to refer to the location specified by a reference in the same way pointers allow you to: once assigned, a reference cannot be reseated (changed) (that is, without relying on undefined behavior), however you can still get its value by using the & operator on it. It's even possible to have a NULL reference, though handling of these is especially tricky and I don't recommend using them.
Snippet analysis
int *pinum = & inum;
Creates a pointer pointing to an existing variable, inum. The value of the pointer is the memory address that inum is stored in. Creating and using pointers will NOT call a constructor for a pointed-to object implicitly, EVER. This task is left to the programmer.
*pinum
Dereferencing a pointer effectively produces a regular variable. This variable may conceptually occupy the same space that another named variable uses, or it may not. in this case, *pinum and inum are the same variable. When I say "produces", it's important to note than no constructors are called. This is why you MUST initialize pointers before using them: Pointer dereferencing will NEVER allocate storage.
returnSame(*pinum)
This function takes a reference and returns the same reference. It's helpful to realize that this function could be written with pointers as well, and behave exactly the same way. References do not perform any initialization either, in that they do not call constructors. However, it is illegal to have an uninitialized reference, so running into uninitialized memory through them is not as common a mistake as with pointers. Your function could be rewritten to use pointers in the following way:
int* returnSamePointer( int *example ) { return example; }
In this case, you would not need to dereference the pointer before passing it, but you would need to dereference the function's return value before printing it:
std::cout << "inum: " << *(returnSamePointer(pinum)) << std::endl;
NULL References
Declaring a NULL reference is dangerous, since attempting to use it will automatically attempt to dereference it, which will cause a segmentation fault. You can, however, safely check if a reference is a null reference. Again, I highly recommend not using these ever.
int& nullRef = *((int *) NULL); // creates a reference to nothing
bool isRefNull = (&nullRef == NULL); // true
Summary
Pointer and Reference types are two different ways to accomplish the same thing
Most of the gotchas that apply to one apply to the other
Neither pointers nor references will call constructors or destructors for referenced values implicitly under any circumstances
Declaring a reference to a dereferenced pointer is perfectly legal, as long as the pointer is initialized properly
A compiler doesn't "call" anything. It just generates code. Dereferencing a pointer would at the most basic level correspond to some sort of load instruction, but in the present code the compiler can easily optimize this away and just print the value directly, or perhaps shortcut directly to loading inum.
Concerning your "temporary object": Dereferencing a pointer always gives an lvalue.
Perhaps there's a more interesting question hidden in your question, though: How does the compiler implement passing function arguments as references?

const casting an int in a class vs outside a class

I read on the wikipedia page for Null_pointer that Bjarne Stroustrup suggested defining NULL as
const int NULL = 0;
if "you feel you must define NULL." I instantly thought, hey.. wait a minute, what about const_cast?
After some experimenting, I found that
int main() {
const int MyNull = 0;
const int* ToNull = &MyNull;
int* myptr = const_cast<int*>(ToNull);
*myptr = 5;
printf("MyNull is %d\n", MyNull);
return 0;
}
would print "MyNull is 0", but if I make the const int belong to a class:
class test {
public:
test() : p(0) { }
const int p;
};
int main() {
test t;
const int* pptr = &(t.p);
int* myptr = const_cast<int*>(pptr);
*myptr = 5;
printf("t.p is %d\n", t.p);
return 0;
}
then it prints "t.p is 5"!
Why is there a difference between the two? Why is "*myptr = 5;" silently failing in my first example, and what action is it performing, if any?
First of all, you're invoking undefined behavior in both cases by trying to modify a constant variable.
In the first case the compiler sees that MyNull is declared as a constant and replaces all references to it within main() with a 0.
In the second case, since p is within a class the compiler is unable to determine that it can just replace all classInstance.p with 0, so you see the result of the modification.
Firstly, what happens in the first case is that the compiler most likely translates your
printf("MyNull is %d\n", MyNull);
into the immediate
printf("MyNull is %d\n", 0);
because it knows that const objects never change in a valid program. Your attempts to change a const object leads to undefined behavior, which is exactly what you observe. So, ignoring the undefined behavior for a second, from the practical point of view it is quite possible that your *myptr = 5 successfully modified your Null. It is just that your program doesn't really care what you have in your Null now. It knows that Null is zero and will always be zero and acts accordingly.
Secondly, in order to define NULL per recommendation you were referring to, you have to define it specifically as an Integral Constant Expression (ICE). Your first variant is indeed an ICE. You second variant is not. Class member access is not allowed in ICE, meaning that your second variant is significantly different from the first. The second variant does not produce a viable definition for NULL, and you will not be able to initialize pointers with your test::p even though it is declared as const int and set to zero
SomeType *ptr1 = Null; // OK
test t;
SomeType *ptr2 = t.p; // ERROR: cannot use an `int` value to initialize a pointer
As for the different output in the second case... undefined behavior is undefined behavior. It is unpredictable. From the practical point of view, your second context is more complicated, so the compiler was unable to prefrom the above optimization. i.e. you are indeed succeeded in breaking through the language-level restrictions and modifying a const-qualified variable. Language specification does not make it easy (or possible) for the compilers to optimize out const members of the class, so at the physical level that p is just another member of the class that resides in memory, in each object of that class. Your hack simply modifies that memory. It doesn't make it legal though. The behavior si still undefined.
This all, of course, is a rather pointless exercise. It looks like it all began from the "what about const_cast" question. So, what about it? const_cast has never been intended to be used for that purpose. You are not allowed to modify const objects. With const_cast, or without const_cast - doesn't matter.
Your code is modifying a variable declared constant so anything can happen. Discussing why a certain thing happens instead of another one is completely pointless unless you are discussing about unportable compiler internals issues... from a C++ point of view that code simply doesn't have any sense.
About const_cast one important thing to understand is that const cast is not for messing about variables declared constant but about references and pointers declared constant.
In C++ a const int * is often understood to be a "pointer to a constant integer" while this description is completely wrong. For the compiler it's instead something quite different: a "pointer that cannot be used for writing to an integer object".
This may apparently seem a minor difference but indeed is a huge one because
The "constness" is a property of the pointer, not of the pointed-to object.
Nothing is said about the fact that the pointed to object is constant or not.
The word "constant" has nothing to do with the meaning (this is why I think that using const it was a bad naming choice). const int * is not talking about constness of anything but only about "read only" or "read/write".
const_cast allows you to convert between pointers and references that can be used for writing and pointer or references that cannot because they are "read only". The pointed to object is never part of this process and the standard simply says that it's legal to take a const pointer and using it for writing after "casting away" const-ness but only if the pointed to object has not been declared constant.
Constness of a pointer and a reference never affects the machine code that will be generated by a compiler (another common misconception is that a compiler can produce better code if const references and pointers are used, but this is total bogus... for the optimizer a const reference and a const pointer are just a reference and a pointer).
Constness of pointers and references has been introduced to help programmers, not optmizers (btw I think that this alleged help for programmers is also quite questionable, but that's another story).
const_cast is a weapon that helps programmers fighting with broken const-ness declarations of pointers and references (e.g. in libraries) and with the broken very concept of constness of references and pointers (before mutable for example casting away constness was the only reasonable solution in many real life programs).
Misunderstanding of what is a const reference is also at the base of a very common C++ antipattern (used even in the standard library) that says that passing a const reference is a smart way to pass a value. See this answer for more details.

Why do compilers give a warning about returning a reference to a local stack variable if it is undefined behaviour?

The C++ standard states that returning reference to a local variable (on the stack) is undefined behaviour, so why do many (if not all) of the current compilers only give a warning for doing so?
struct A{
};
A& foo()
{
A a;
return a; //gcc and VS2008 both give this a warning, but not a compiler error
}
Would it not be better if compilers give a error instead of warning for this code?
Are there any great advantages to allowing this code to compile with just a warning?
Please note that this is not about a const reference which could lengthen the lifetime of the temporary to the lifetime of the reference itself.
It is almost impossible to verify from a compiler point of view whether you are returning a reference to a temporary. If the standard dictated that to be diagnosed as an error, writing a compiler would be almost impossible. Consider:
bool not_so_random() { return true; }
int& foo( int x ) {
static int s = 10;
int *p = &s;
if ( !not_so_random() ) {
p = &x;
}
return *p;
}
The above program is correct and safe to run, in our current implementation it is guaranteed that foo will return a reference to a static variable, which is safe. But from a compiler perspective (and with separate compilation in place, where the implementation of not_so_random() is not accessible, the compiler cannot know that the program is well-formed.
This is a toy example, but you can imagine similar code, with different return paths, where p might refer to different long-lived objects in all paths that return *p.
Undefined behaviour is not a compilation error, it's just not a well-formed C++ program. Not every ill-formed program is incompilable, it's just un-predictable. I'd wager a bet that it's not even possible in principle for a computer to decide whether a given program text is a well-formed C++ program.
You can always add -Werror to gcc to make warnings terminate compilation with an error!
To add another favourite SO topic: Would you like ++i++ to cause a compile error, too?
If you return a pointer/reference to a local inside function the behavior is well defined as long as you do not dereference the pointer/reference returned from the function.
It is an Undefined Behavior only when one derefers the returned pointer.
Whether it is a Undefined Behavior or not depends on the code calling the function and not the function itself.
So just while compiling the function, the compiler cannot determine if the behavior is Undefined or Well Defined. The best it can do is to warn you of a potential problem and it does!
An Code Sample:
#include <iostream>
struct A
{
int m_i;
A():m_i(10)
{
}
};
A& foo()
{
A a;
a.m_i = 20;
return a;
}
int main()
{
foo(); //This is not an Undefined Behavior, return value was never used.
A ref = foo(); //Still not an Undefined Behavior, return value not yet used.
std::cout<<ref.m_i; //Undefined Behavior, returned value is used.
return 0;
}
Reference to the C++ Standard:
section 3.8
Before the lifetime of an object has started but after the storage which the object will occupy has been allo-cated 34) or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that refers to the storage location where the object will be or was located may be used but only in limited ways. Such a pointer refers to allocated storage (3.7.3.2), and using the
pointer as if the pointer were of type void*, is well-defined. Such a pointer may be dereferenced but the resulting lvalue may only be used in limited ways, as described below. If the object will be or was of a class type with a non-trivial destructor, and the pointer is used as the operand of a delete-expression, the program has undefined behavior. If the object will be or was of a non-POD class type, the program has undefined behavior if:
— .......
Because standard does not restrict us.
If you want to shoot to your own foot you can do it!
However lets see and example where it can be useful:
int &foo()
{
int y;
}
bool stack_grows_forward()
{
int &p=foo();
int my_p;
return &my_p < &p;
}
Compilers should not refuse to compile programs unless the standard says they are allowed to do so. Otherwise it would be much harder to port programs, since they might not compile with a different compiler, even though they comply with the standard.
Consider the following function:
int foobar() {
int a=1,b=0;
return a/b;
}
Any decent compiler will detect that I am dividing by zero, but it should not reject the code since I might actually want to trigger a SIG_FPE signal.
As David Rodríguez has pointed out, there are some cases which are undecidable but there are also some which are not. Some new version of the standard might describe some cases where the compiler must/is allowed to reject programs. That would require the standard to be very specific about the static analysis which is to be performed.
The Java standard actually specifies some rules for checking that non-void methods always return a value. Unfortunately I haven't read enough of the C++ standard to know what the compiler is allowed to do.
You could also return a reference to a static variable, which would be valid code so the code must be able to compile.
It's pretty much super-bad practice to rely on this, but I do believe that in many cases (and that's never a good wager), that memory reference would still be valid if no functions are called between the time foo() returns and the time the calling function uses its return value. In that case, that area of the stack would not have an opportunity to get overwritten.
In C and C++ you can choose to access arbitrary sections of memory anyway (within the process's memory space, of course) via pointer arithmetic, so why not allow the possibility of constructing a reference to wherever one so chooses?

Construct object with itself as reference?

I just realised that this program compiles and runs (gcc version 4.4.5 / Ubuntu):
#include <iostream>
using namespace std;
class Test
{
public:
// copyconstructor
Test(const Test& other);
};
Test::Test(const Test& other)
{
if (this == &other)
cout << "copying myself" << endl;
else
cout << "copying something else" << endl;
}
int main(int argv, char** argc)
{
Test a(a); // compiles, runs and prints "copying myself"
Test *b = new Test(*b); // compiles, runs and prints "copying something else"
}
I wonder why on earth this even compiles. I assume that (just as in Java) arguments are evaluated before the method / constructor is called, so I suspect that this case must be covered by some "special case" in the language specification?
Questions:
Could someone explain this (preferably by referring to the specification)?
What is the rationale for allowing this?
Is it standard C++ or is it gcc-specific?
EDIT 1: I just realised that I can even write int i = i;
EDIT 2: Even with -Wall and -pedantic the compiler doesn't complain about Test a(a);.
EDIT 3: If I add a method
Test method(Test& t)
{
cout << "in some" << endl;
return t;
}
I can even do Test a(method(a)); without any warnings.
The reason this "is allowed" is because the rules say an identifiers scope starts immediately after the identifier. In the case
int i = i;
the RHS i is "after" the LHS i so i is in scope. This is not always bad:
void *p = (void*)&p; // p contains its own address
because a variable can be addressed without its value being used. In the case of the OP's copy constructor no error can be given easily, since binding a reference to a variable does not require the variable to be initialised: it is equivalent to taking the address of a variable. A legitimate constructor could be:
struct List { List *next; List(List &n) { next = &n; } };
where you see the argument is merely addressed, its value isn't used. In this case a self-reference could actually make sense: the tail of a list is given by a self-reference. Indeed, if you change the type of "next" to a reference, there's little choice since you can't easily use NULL as you might for a pointer.
As usual, the question is backwards. The question is not why an initialisation of a variable can refer to itself, the question is why it can't refer forward. [In Felix, this is possible]. In particular, for types as opposed to variables, the lack of ability to forward reference is extremely broken, since it prevents recursive types being defined other than by using incomplete types, which is enough in C, but not in C++ due to the existence of templates.
I have no idea how this relates to the specification, but this is how I see it:
When you do Test a(a); it allocates space for a on the stack. Therefore the location of a in memory is known to the compiler at the start of main. When the constructor is called (the memory is of course allocated before that), the correct this pointer is passed to it because it's known.
When you do Test *b = new Test(*b);, you need to think of it as two steps. First the object is allocated and constructed, and then the pointer to it is assigned to b. The reason you get the message you get is that you're essentially passing in an uninitialized pointer to the constructor, and the comparing it with the actual this pointer of the object (which will eventually get assigned to b, but not before the constructor exits).
The second one where you use new is actually easier to understand; what you're invoking there is exactly the same as:
Test *b;
b = new Test(*b);
and you're actually performing an invalid dereference. Try to add a << &other << to your cout lines in the constructor, and make that
Test *b = (Test *)0xFOOD1E44BADD1E5;
to see that you're passing through whatever value a pointer on the stack has been given. If not explicitly initialized, that's undefined. But even if you don't initialize it with some sort of (in)sane default, it'll be different from the return value of new, as you found out.
For the first, think of it as an in-place new. Test a is a local variable not a pointer, it lives on the stack and therefore its memory location is always well defined - this is very much unlike a pointer, Test *b which, unless explicitly initialized to some valid location, will be dangling.
If you write your first instantiation like:
Test a(*(&a));
it becomes clearer what you're invoking there.
I don't know a way to make the compiler disallow (or even warn) about this sort of self-initialization-from-nowhere through the copy constructor.
The first case is (perhaps) covered by 3.8/6:
before the lifetime of an object has
started but after the storage which
the object will occupy has been
allocated or, after the lifetime of an
object has ended and before the
storage which the object occupied is
reused or released, any lvalue which
refers to the original object may be
used but only in limited ways. Such an
lvalue refers to allocated storage
(3.7.3.2), and using the properties of
the lvalue which do not depend on its
value is well-defined.
Since all you're using of a (and other, which is bound to a) before the start of its lifetime is the address, I think you're good: read the rest of that paragraph for the detailed rules.
Beware though that 8.3.2/4 says, "A reference shall be initialized to refer to a valid object or function." There is some question (as a defect report on the standard) what "valid" means in this context, so possibly you can't bind the parameter other to the unconstructed (and hence, "invalid"?) a.
So, I'm uncertain what the standard actually says here - I can use an lvalue, but not bind it to a reference, perhaps, in which case a isn't good, while passing a pointer to a would be OK as long as it's only used in the ways permitted by 3.8/5.
In the case of b, you're using the value before it's initialized (because you dereference it, and also because even if you got that far, &other would be the value of b). This clearly is not good.
As ever in C++, it compiles because it's not a breach of language constraints, and the standard doesn't explicitly require a diagnostic. Imagine the contortions the spec would have to go through in order to mandate a diagnostic when an object is invalidly used in its own initialization, and imagine the data flow analysis that a compiler might have to do to identify complex cases (it may not even be possible at compile time, if the pointer is smuggled through an externally-defined function). Easier to leave it as undefined behavior, unless anyone has any really good suggestions for new spec language ;-)
If you crank your warning levels up, your compiler will probably warn you about using uninitialized stuff. UB doesn't require a diagnostic, many things that are "obviously" wrong may compile.
I don't know the spec reference, but I do know that accessing an uninitialized pointer always results in undefined behaviour.
When I compile your code in Visual C++ I get:
test.cpp(20): warning C4700:
uninitialized local variable 'b' used