I've been curious about this for some time. Say we have the following cases for accessing a data member of a class stored in dynamically allocated memory:
class C {
public :
C() = default;
int a = 4;
}
int main () {
C * ptr = new C();
std::cout << "pointer->::" << ptr->a << std::endl;
std::cout << "dereference*().::" << (*ptr).a << std::endl;
}
I'm sure the pointer method is the preferred method, and my guess is that dereferencing the pointer provides a reference, at least in C++. But in C, where there are no references (and assuming appropriate modifications to convert the class to a struct etc), would dereferencing and accessing the member like this result in a temporary shallow copy? Is this something that the compiler optimizes out?
In both C and C++, *p (dereferencing) and p->m (member access through pointer) are "lvalue expressions". An lvalue "evaluates to the object identity". *p or p->m does not (by itself) create a copy, it refers to a variable that already exists.
Dereferencing a pointer does not create a copy neither for C neither for C++.
The result of a dereferencing operation is a lvalue that describe a pointed object.
But in any case the behaviour is the same for C and C++.
Related
I am struggling to understand the difference in behaviour of a raw pointer and a unique_ptr. I have class A with a variable x and class B with a pointer to an instance of A:
class A
{
public:
int x;
};
A::A(int y) : x(y)
{
}
class B
{
public:
B(A &);
A *p;
};
B::B(A &a)
{
p = &a;
}
This behaves as I would expect:
int main()
{
A a(2);
B b(a);
cout << a.x << " " << b.p->x << endl;
a.x = 4;
cout << a.x << " " << b.p->x << endl;
}
gives
2 2
4 4
Changing the raw pointer to a std::unique_ptr gives a different result:
class A
{
public:
int x;
};
A::A(int y) : x(y)
{
}
class B
{
public:
B(A &);
std::unique_ptr<A> p;
};
B::B(A &a)
{
p = std::make_unique<A>(a);
}
gives
2 2
4 2
Have I fundamentally misunderstood something about unique_ptrs?
make_unique creates a fresh object, one that that unique_pt has exclusive access to. So in the second example you have two objects, not one and when you set change the value of a.x in the first object it doesn't effect the other object held by the unique_ptr.
A unique pointer needs to own whatever it points to. Your code can be made to work - just substituting unique_ptr type and leaving everything else unchanged (no make_unique). But it will have undefined behavior, since you’ll create a unique pointer to an object that is owned elsewhere.
To compare apples to apples, the raw pointer code should read p=new A(a);. That’s what make_unique does.
Try reading the following expression from the smart pointer version:
std::make_unique<A>(a);
"make a unique A from a" (no mention of pointers!)
The result is a unique_ptr, but when reading the expression, read it as making (an object whose type is) the template parameter. The function parameters are parameters to the constructor. In this case, you are making an A object from an A object, which pulls in the copy constructor.
Once you understand that the smart pointer version is making a new A object (and your raw pointer version does not), your results should make sense.
The "unique" in "unique A" might be tricky to understand. Think of it as an object that no one else can lay claim to. It might be a copy of another object, but, taking the role of the unique_ptr, it is your copy, your responsibility to clean up after, and no one else's. Your preciousss, which you will not share (c.f. std::make_shared).
Note that a local variable (like the a in the main function) is the responsibility of the compiler, so it is ineligible to be the object to which a unique_ptr points (or any smart pointer, for that matter).
I have some difficulties with understanding what is really done behind returning values in C++.
Let's have following code:
class MyClass {
public:
int id;
MyClass(int id) {
this->id = id;
cout << "[" << id << "] MyClass::ctor\n";
}
MyClass(const MyClass& other) {
cout << "[" << id << "] MyClass::ctor&\n";
}
~MyClass() {
cout << "[" << id << "] MyClass::dtor\n";
}
MyClass& operator=(const MyClass& r) {
cout << "[" << id << "] MyClass::operator=\n";
return *this;
}
};
MyClass foo() {
MyClass c(111);
return c;
}
MyClass& bar() {
MyClass c(222);
return c;
}
MyClass* baz() {
MyClass* c = new MyClass(333);
return c;
}
I use gcc 4.7.3.
Case 1
When I call:
MyClass c1 = foo();
cout << c1.id << endl;
The output is:
[111] MyClass::ctor
111
[111] MyClass::dtor
My understanding is that in foo object is created on the stack and then destroyed upon return statement because it's end of a scope. Returning is done by object copying (copy constructor) which is later assigned to c1 in main (assignment operator). If I'm right why there is no output from copy constructor nor assignment operator? Is this because of RVO?
Case 2
When I call:
MyClass c2 = bar();
cout << c2.id << endl;
The output is:
[222] MyClass::ctor
[222] MyClass::dtor
[4197488] MyClass::ctor&
4197488
[4197488] MyClass::dtor
What is going on here? I create variable then return it and variable is destroyed because it is end of a scope. Compiler is trying copy that variable by copy constructor but It is already destroyed and that's why I have random value? So what is actually in c2 in main?
Case 3
When I call:
MyClass* c3 = baz();
cout << c3->id << endl;
The output is:
[333] MyClass::ctor
333
This is the simplest case? I return a dynamically created pointer which lies on heap, so memmory is allocated and not automatically freed. This is the case when destructor isn't called and I have memory leak. Am I right?
Are there any other cases or things that aren't obvious and I should know to fully master returning values in C++? ;) What is a recommended way to return a object from function (if any) - any rules of thumb upon that?
May I just add that case #2 is one of the cases of undefined behavior in the C++ language, since returning a reference to a local variable is illegal. This is because a local variable has a precisely defined lifetime, and - by returning it by a reference - you're returning a reference to a variable that does not exist anymore when the function returns. Therefore, you exhibit undefined behavior and the value of the given variable is practically random. As is the result of the rest of your program, since Anything at all can happen.
Most compilers will issue a warning when you try to do something like this (either return a local variable by reference, or by address) - gcc, for example, tells me something like this :
bla.cpp:37:13: warning: reference to local variable ‘c’ returned [-Wreturn-local-addr]
You should remember, however, that the compiler is not at all required to issue any kind of warning when a statement that may exhibit undefined behavior occurs. Situations such as this one, though, must be avoided at all costs, because they're practically never right.
Case 1:
MyClass foo() {
MyClass c(111);
return c;
}
...
MyClass c1 = foo();
is a typical case when RVO can be applied. This is called copy-initialization and the assignment operator is not used since the object is created in place, unlike the situation:
MyClass c1;
c1 = foo();
where c1 is constructed, temporary c in foo() is constructed, [ copy of c is constructed ], c or copy of c is assigned to c1, [ copy of c is destructed] and c is destructed. (what exactly happens depends on whether the compiler eliminates the redundant copy of c being created or not).
Case 2:
MyClass& bar() {
MyClass c(222);
return c;
}
...
MyClass c2 = bar();
invokes undefined behavior since you are returning a reference to local (temporary) variable c ~ an object with automatic storage duration.
Case 3:
MyClass* baz() {
MyClass* c = new MyClass(333);
return c;
}
...
MyClass c2 = bar();
is the most straightforward one since you control what happens yet with a very unpleasant consequence: you are responsible for memory management, which is the reason why you should avoid dynamic allocation of this kind always when it is possible (and prefer Case 1).
1) Yes.
2) You have a random value because your copy c'tor and operator= don't copy the value of id. However, you are correct in assuming there is no relying on the value of an object after it has been deleted.
3) Yes.
Recently, I found an interesting discussion on how to allow read-only access to private members without obfuscating the design with multiple getters, and one of the suggestions was to do it this way:
#include <iostream>
class A {
public:
A() : _ro_val(_val) {}
void doSomething(int some_val) {
_val = 10*some_val;
}
const int& _ro_val;
private:
int _val;
};
int main() {
A a_instance;
std::cout << a_instance._ro_val << std::endl;
a_instance.doSomething(13);
std::cout << a_instance._ro_val << std::endl;
}
Output:
$ ./a.out
0
130
GotW#66 clearly states that object's lifetime starts
when its constructor completes successfully and returns normally. That is, control reaches the end of the constructor body or an earlier return statement.
If so, we have no guarantee that the _val memeber will have been properly created by the time we execute _ro_val(_val). So how come the above code works? Is it undefined behaviour? Or are primitive types granted some exception to the object's lifetime?
Can anyone point me to some reference which would explain those things?
Before the constructor is called an appropriate amount of memory is reserved for the object on Freestore(if you use new) or on stack if you create object on local storage. This implies that the memory for _val is already allocated by the time you refer it in Member initializer list, Only that this memory is not properly initialized as of yet.
_ro_val(_val)
Makes the reference member _ro_val refer to the memory allocated for _val, which might actually contain anything at this point of time.
There is still an Undefined Behavior in your program because, You should explicitly initialize _val to 0(or some value,you choose)in the constructor body/Member Initializer List.The output 0 in this case is just because you are lucky it might give you some other values since _val is left unInitialized. See the behavior here on gcc 4.3.4 which demonstrates the UB.
But as for the Question, Yes indeed the behavior is Well-Defined.
The object's address does not change.
I.e. it's well-defined.
However, the technique shown is just premature optimization. You don't save programmers' time. And with modern compiler you don't save execution time or machine code size. But you do make the objects un-assignable.
Cheers & hth.,
In my opinion, it is legal (well-defined) to initialize a reference with an uninitialized object. That is legal but standard (well, the latest C++11 draft, paragraph 8.5.3.3) recommends using a valid (fully constructed) object as an initializer:
A reference shall be initialized to refer to a valid object or function.
The next sentence from the same paragraph throws a bit more light at the reference creation:
[Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior.]
I understand that reference creation means binding reference to an object obtained by dereferencing its pointer and that probably explains that the minimal prerequisite for initialization of reference of type T& is having an address of the portion of the memory reserved for the object of type T (reserved, but not yet initialized).
Accessing uninitialized object through its reference can be dangerous.
I wrote a simple test application that demonstrates reference initialization with uninitialized object and consequences of accessing that object through it:
class C
{
public:
int _n;
C() : _n(123)
{
std::cout << "C::C(): _n = " << _n << " ...and blowing up now!" << std::endl;
throw 1;
}
};
class B
{
public:
// pC1- address of the reference is the address of the object it refers
// pC2- address of the object
B(const C* pC1, const C* pC2)
{
std::cout << "B::B(): &_ro_c = " << pC1 << "\n\t&_c = " << pC2 << "\n\t&_ro_c->_n = " << pC1->_n << "\n\t&_c->_n = " << pC2->_n << std::endl;
}
};
class A
{
const C& _ro_c;
B _b;
C _c;
public:
// Initializer list: members are initialized in the order how they are
// declared in class
//
// Initializes reference to _c
//
// Fully constructs object _b; its c-tor accesses uninitialized object
// _c through its reference and its pointer (valid but dangerous!)
//
// construction of _c fails!
A() : _ro_c(_c), _b(&_ro_c, &_c), _c()
{
// never executed
std::cout << "A::A()" << std::endl;
}
};
int main()
{
try
{
A a;
}
catch(...)
{
std::cout << "Failed to create object of type A" << std::endl;
}
return 0;
}
Output:
B::B(): &_ro_c = 001EFD70
&_c = 001EFD70
&_ro_c->_n = -858993460
&_c->_n = -858993460
C::C(): _n = 123 ...and blowing up now!
Failed to create object of type A
My professor in C++ has shown us this as an example in overloading the operator new (which i believe is wrong):
class test {
// code
int *a;
int n;
public:
void* operator new(size_t);
};
void* test::operator new(size_t size) {
test *p;
p=(test*)malloc(size);
cout << "Input the size of array = ?";
cin >> p->n;
p->a = new int[p->n];
return p;
}
Is this right?
It's definitely "not right", in the sense that it's giving me the creeps.
Since test has no user-declared constructors, I think it could work provided that the instance of test isn't value-initialized (which would clear the pointer). And provided that you write the corresponding operator delete.
It's clearly a silly example, though - user interaction inside an overloaded operator new? And what if an instance of test is created on the stack? Or copied? Or created with test *tp = new test(); in C++03? Or placement new? Hardly user-friendly.
It's constructors which must be used to establish class invariants (such as "I have an array to use"), because that's the only way to cover all those cases. So allocating an array like that is the kind of thing that should be done in a constructor, not in operator new. Or better yet, use a vector instead.
As far as the standard is concerned - I think that since the class is non-POD the implementation is allowed to scribble all over the data in between calling operator new and returning it to the user, so this is not guaranteed to work even when used carefully. I'm not entirely sure, though. Conceivably your professor has run it (perhaps many years ago when he first wrote the course), and if so it worked on his machine. There's no obvious reason why an implementation would want to do anything to the memory in the specific case of this class.
I believe that is "wrong" because he
access the object before the
constructor.
I think you're correct on this point too - casting the pointer returned from malloc to test* and accessing members is UB, since the class test is non-POD (because it has private non-static data members) and the memory does not contain a constructed instance of the class. Again, though, there's no reason I can immediately think of why an implementation would want to do anything that stops it working, so I'm not surprised if in practice it stores the intended value in the intended location on my machine.
Did some Standard checking. Since test has private non-static members, it is not POD. So new test default-initializes the object, and new test() value-initializes it. As others have pointed out, value-initialization sets members to zero, which could come as a surprise here.
Default-initialization uses the implicitly defined default constructor, which omits initializers for members a and n.
12.6.2p4: After the call to a constructor for class X has completed, if a member of X is neither specified in the constructor's mem-initializers, nor default-initialized, nor value-initialized, nor given a value during execution of the body of the constructor, the member has indeterminate value.
Not "the value its memory had before the constructor, which is usually indeterminate." The Standard directly says the members have indeterminate value if the constructor doesn't do anything about them.
So given test* p = new test;, p->a and p->n have indeterminate value and any rvalue use of them results in Undefined Behavior.
The creation/destruction of objects in C++ is divided into two tasks: memory allocation/deallocation and object initialization/deinitialization. Memory allocation/deallocation is done very differently depending on an object's storage class (automatic, static, dynamic), object initialization/deinitialization is done using the object's type's constructor/destructor.
You can customize object initialization/deinitialization by providing your own constructors/destructor. You can customize the allocation of dynamically allocated objects by overloading operator new and operator delete for this type. You can provide different versions of these operators for single objects and arrays (plus any number of additional overloads).
When you want to fine-tune the construction/destruction of objects of a specific type you first need to decide whether you want to fiddle with allocation/deallocation (of dynamically allocated objects) or with initialization/deinitialization. Your code mixes the two, violating one of C++' most fundamental design principle, all established praxis, every known C++ coding standard on this planet, and your fellow-workers' assumptions.
Your professor is completely misunderstanding the purpose of operator new whose only task is to allocate as much memory as was asked and to return a void* to it.
After that the constructor is called to initialize the object at that memory location. This is not up to the programmer to avoid.
As the class doesn't have a user-defined constructor, the fields are supposed to be uninitialized, and in such a case the compiler has probably freedom to initialize them to some magic value in order to help finding use of uninitialized values (e.g for debug builds). That would defeat the extra work done by the overloaded operator.
Another case where the extra work will be wasted is when using value-initialization: new test();
This is very bad code because it takes initialization code that should be part of a constructor and puts it in operator new which should only allocate new memory.
The expression new test may leak memory (that allocated by p->a = new int[p->n];) and the expression new test() definitely will leak memory. There is nothing in the standard that prevents the implementation zeroing, or setting to an alternate value, the memory returned by a custom operator new before that memory is initialized with an object even if the subsequent initialization wouldn't ordinarily touch the memory again. If the test object is value-initialized the leak is guaranteed.
There is also no easy way to correctly deallocate a test allocated with new test. There is no matching operator delete so the expression delete t; will do the wrong thing global operator delete to be called on memory allocated with malloc.
This does not work.
Your professor code will fail to initialize correctly in 3/4 of cases.
It does not initialize objects correctly (new only affects pointers).
The default constructor generated for tests has two modes.
Zero Initialization (which happens after new, but POD are set to zero)
Default Initialization (POD are uninitialized)
Running Code (comments added by hand)
$ ./a.exe
Using Test::new
Using Test::new
A Count( 0) // zero initialized: pointer leaked.
A Pointer(0)
B Count( 10) // Works as expected because of default init.
B Pointer(0xd20388)
C Count( 1628884611) // Uninitialized as new not used.
C Pointer(0x611f0108)
D Count( 0) // Zero initialized because it is global (static storage duration)
D Pointer(0)
The Code
#include <new>
#include <iostream>
#include <stdlib.h>
class test
{
// code
int *a;
int n;
public:
void* operator new(size_t);
// Added dredded getter so we can print the values. (Quick Hack).
int* getA() const { return a;}
int getN() const { return n;}
};
void* test::operator new(size_t size)
{
std::cout << "Using Test::new\n";
test *p;
p=(test*)malloc(size);
p->n = 10; // Fixed size for simple test.
p->a = new int[p->n];
return p;
}
// Objects that have static storage duration are zero initialized.
// So here 'a' and 'n' will be set to 0
test d;
int main()
{
// Here a is zero initialized. Resulting in a and n being reset to 0
// Thus you have memory leaks as the reset happens after new has completed.
test* a = new test();
// Here b is default initialized.
// So the POD values are undefined (so the results are what you prof expects).
// But the standard does not gurantee this (though it will usually work because
// of the it should work as a side effect of the 'zero cost principle`)
test* b = new test;
// Here is a normal object.
// New is not called so its members are random.
test c;
// Print out values
std::cout << "A Count( " << a->getN() << ")\n";
std::cout << "A Pointer(" << a->getA() << ")\n";
std::cout << "B Count( " << b->getN() << ")\n";
std::cout << "B Pointer(" << b->getA() << ")\n";
std::cout << "C Count( " << c.getN() << ")\n";
std::cout << "C Pointer(" << c.getA() << ")\n";
std::cout << "D Count( " << d.getN() << ")\n";
std::cout << "D Pointer(" << d.getA() << ")\n";
}
A valid example of what the professor failed to do:
class test
{
// code
int n;
int a[1]; // Notice the zero sized array.
// The new will allocate enough memory for n locations.
public:
void* operator new(size_t);
// Added dredded getter so we can print the values. (Quick Hack).
int* getA() const { return a;}
int getN() const { return n;}
};
void* test::operator new(size_t size)
{
std::cout << "Using Test::new\n";
int tmp;
std::cout << How big?\n";
std::cin >> tmp;
// This is a half arsed trick from the C days.
// It should probably still work.
// Note: This may be what the professor should have wrote (if he was using C)
// This is totally horrible and whould not be used.
// std::vector is a much:much:much better solution.
// If anybody tries to convince you that an array is faster than a vector
// The please read the linked question below where that myth is nailed into
// its over sized coffin.
test *p =(test*)malloc(size + sizeof(int) * tmp);
p->n = tmp;
// p->a = You can now overflow a upto n places.
return p;
}
Is std::vector so much slower than plain arrays?
As you show this is wrong. You can also see how easy it is to get this wrong.
There usually isn't any reason for it unless you are trying to manage your own memory allocations and in a C++ environment you would be better off learning the STL and write custom allocators.
boost::shared_ptr has an unusual constructor
template<class Y> shared_ptr(shared_ptr<Y> const & r, T * p);
and I am a little puzzled as to what this would be useful for. Basically it shares ownership with r, but .get() will return p. not r.get()!
This means you can do something like this:
int main() {
boost::shared_ptr<int> x(new int);
boost::shared_ptr<int> y(x, new int);
std::cout << x.get() << std::endl;
std::cout << y.get() << std::endl;
std::cout << x.use_count() << std::endl;
std::cout << y.use_count() << std::endl;
}
And you will get this:
0x8c66008
0x8c66030
2
2
Note that the pointers are separate, but they both claim to have a use_count of 2 (since they share ownership of the same object).
So, the int owned by x will exist as long as x or y is around. And if I understand the docs correct, the second int never gets destructed. I've confirmed this with the following test program:
struct T {
T() { std::cout << "T()" << std::endl; }
~T() { std::cout << "~T()" << std::endl; }
};
int main() {
boost::shared_ptr<T> x(new T);
boost::shared_ptr<T> y(x, new T);
std::cout << x.get() << std::endl;
std::cout << y.get() << std::endl;
std::cout << x.use_count() << std::endl;
std::cout << y.use_count() << std::endl;
}
This outputs (as expected):
T()
T()
0x96c2008
0x96c2030
2
2
~T()
So... what is the usefulness of this unusual construct which shares ownership of one pointer, but acts like another pointer (which it does not own) when used.
It is useful when you want to share a class member and an instance of the class is already a shared_ptr, like the following:
struct A
{
int *B; // managed inside A
};
shared_ptr<A> a( new A );
shared_ptr<int> b( a, a->B );
they share the use count and stuff. It is optimization for memory usage.
To expand on leiz's and piotr's answers, this description of shared_ptr<> 'aliasing' is from a WG21 paper, "Improving shared_ptr for C++0x, Revision 2":
III. Aliasing Support
Advanced users often require the
ability to create a shared_ptr
instance p that shares ownership with
another (master) shared_ptr q but
points to an object that is not a base
of *q. *p may be a member or an
element of *q, for example. This
section proposes an additional
constructor that can be used for this
purpose.
An interesting side effect of this
increase of expressive power is that
now the *_pointer_cast functions can
be implemented in user code. The
make_shared factory function presented
later in this document can also be
implemented using only the public
interface of shared_ptr via the
aliasing constructor.
Impact:
This feature extends the interface of
shared_ptr in a backward-compatible
way that increases its expressive
power and is therefore strongly
recommended to be added to the C++0x
standard. It introduces no source- and
binary compatibility issues.
Proposed text:
Add to shared_ptr
[util.smartptr.shared] the following
constructor:
template<class Y> shared_ptr( shared_ptr<Y> const & r, T * p );
Add the following to
[util.smartptr.shared.const]:
template<class Y> shared_ptr( shared_ptr<Y> const & r, T * p );
Effects: Constructs a shared_ptr instance that stores p and shares ownership with r.
Postconditions: get() == p && use_count() == r.use_count().
Throws: nothing.
[Note: To avoid the possibility of a dangling pointer, the user
of this constructor must ensure that p remains valid at least
until the ownership group of r is destroyed. --end note.]
[Note: This constructor allows creation of an empty shared_ptr
instance with a non-NULL stored pointer. --end note.]
You can also use this to keep dynamic casted pointers, i.e.:
class A {};
class B: public A {};
shared_ptr<A> a(new B);
shared_ptr<B> b(a, dynamic_cast<B*>(a.get()));
You might have a pointer to some driver or a lower level api's data structure that may allocate additional data by its lower level api or other means. In this case it might be interesting to increase the use_count but return the additional data if the first pointer owns the other data pointers.
I have put shared_ptr's aliasing constructor in use in my little library:
http://code.google.com/p/infectorpp/ (just my simple IoC container)
The point is that since I needed a shared_ptr of known type to be returned from a polymorphic class (that does not know the type). I was not able to implicitly convert the shared_ptr to the type I needed.
In the file "InfectorHelpers.hpp" (line 72-99) you can see that in action for the type IAnyShared.
Aliasing constructor creates shared_ptr that does not delete the pointers they are actually pointing to, but they still increase the reference counter to the original object and that can be tremendously usefull.
Basically you can create a pointer to anything using aliasing constructor and threat it as a reference counter.
//my class
std::shared_ptr<T> ist;
int a; //dummy variable. I need its adress
virtual std::shared_ptr<int> getReferenceCounter(){
return std::shared_ptr<int>(ist,&a); //not intended for dereferencing
}
virtual void* getPtr(); //return raw pointer to T
now we have both "a reference counter" and a pointer to a istance of T, enough data to create something with the aliasing constructor
std::shared_ptr<T> aPtr( any->getReferenceCounter(), //share same ref counter
static_cast<T*>(any->getPtr()) ); //potentially unsafe cast!
I don't pretend to have invented this use for the aliasing constructor, but I never seen someone else doing the same. If you are guessing if that dirty code works the answer is yes.
For "shared_ptr<B> b(a, dynamic_cast<B*>(a.get()));"
I think it is not the recommended way using smart pointer.
The recommended way of doing this type conversion should be:
shared_ptr<B> b(a);
Since in Boost document it is mentioned that:
shared_ptr<T> can be implicitly
converted to shared_ptr<U> whenever T*
can be implicitly converted to U*. In
particular, shared_ptr<T> is
implicitly convertible to shared_ptr<T> const,
to shared_ptr<U> where U is an
accessible base of T, and to
shared_ptr<void>.
In addition to that, we also have dynamic_pointer_cast
which could directly do conversion on Smart Pointer object and both of these two methods would be much safer than the manually casting raw pointer way.