Can I set a member variable before constructor call? - c++

I started to implement an ID-based memory pool, where every element has an id, which is basically an index into a vector. In this special case I know the index before I construct the object itself, so I thought I could set the ID before calling the constructor.
Some details
Allocating an object from an ID-based pool works as follows:
1) allocate a free id from the pool
2) get a memory address based on the id value
3) construct the object at that memory address
4) set the ID member of the object
Deallocation is based on that id.
here is the code (thanks jrok):
#include <new>
#include <iostream>
struct X
{
X()
{
// id comes from "nothing": this constructor never initializes it
std::cout << "X constructed with id: " << id << std::endl;
}
int id;
};
int main()
{
void* buf = operator new(sizeof(X));
// can I set the ID before the constructor call
((X*)buf)->id = 42;
new (buf) X;
std::cout << ((X*)buf)->id;
}
EDIT
I found a stock solution for this in boost sandbox:
sandbox Boost.Tokenmap

Can I set a member variable before constructor call?
No, but you can make a base class that holds the ID and sets it within its constructor (throwing an exception if an ID can't be allocated, for example). Derive from that class, and by the time the derived class's constructor body runs, the ID will already be set. You could also manage ID generation within another class: either some kind of global singleton, or an ID manager passed as the first parameter to the constructor.
typedef int Id;
class IdObject{
public:
Id getId() const{
return id;
}
protected:
IdManager* getIdManager() ...
IdObject()
:id(0){
IdManager* manager = getIdManager();
id = manager->generateId();
if (!id)
throw IdException();
manager->registerId(id, this);
}
~IdObject(){
if (id)
getIdManager()->unregisterId(id, this);
}
private:
Id id;
// copying is disabled: declared private and intentionally left undefined
IdObject& operator=(IdObject &other);
IdObject(IdObject &other);
};
class DerivedObject: public IdObject{
public:
DerivedObject(){
//at this point, id is set.
}
};
This kind of thing.

Yes, you can do what you're doing, but it's really not a good idea. According to the standard, your code invokes Undefined Behaviour:
3.8 Object lifetime [basic.life]
The lifetime of an object is a runtime property of the object. An object is said to have non-trivial initialization
if it is of a class or aggregate type and it or one of its members is initialized by a constructor other than a trivial
default constructor. [ Note: initialization by a trivial copy/move constructor is non-trivial initialization. —
end note ] The lifetime of an object of type T begins when:
— storage with the proper alignment and size for type T is obtained, and
— if the object has non-trivial initialization, its initialization is complete.
The lifetime of an object of type T ends when:
— if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or
— the storage which the object occupies is reused or released.
Before the lifetime of an object has started but after the storage which the object will occupy has been
allocated or, after the lifetime of an object has ended and before the storage which the object occupied is
reused or released, any pointer that refers to the storage location where the object will be or was located
may be used but only in limited ways. For an object under construction or destruction, see 12.7. Otherwise,
such a pointer refers to allocated storage (3.7.4.2), and using the pointer as if the pointer were of type void*,
is well-defined. Such a pointer may be dereferenced but the resulting lvalue may only be used in limited
ways, as described below. The program has undefined behavior if:
— the pointer is used to access a non-static data member or call a non-static member function of the
object
When your code invokes Undefined Behaviour, the implementation is allowed to do anything it wants to. In most cases nothing will happen - and if you're lucky your compiler will warn you - but occasionally the result will be unexpectedly catastrophic.
You describe a pool of N objects of the same type, using a contiguous array as the underlying storage. Note that in this scenario you do not need to store an integer ID for each allocated object - if you have a pointer to the allocated object, you can derive the ID from the offset of the object within the array like so:
struct Object
{
};
const int COUNT = 5; // allow enough storage for COUNT objects
alignas(Object) char storage[sizeof(Object) * COUNT]; // aligned for Object
// interpret the storage as an array of Object
Object* pool = static_cast<Object*>(static_cast<void*>(storage));
Object* p = pool + 3; // get a pointer to the slot at index 3 (the fourth slot)
int id = static_cast<int>(p - pool); // recover the ID '3' from the pointer offset

No, you cannot set anything in an object before its constructor is called. However, you have a couple of choices:
Pass the ID to the constructor itself, so it can store the ID in the object.
Allocate extra memory in front of the object being constructed, store the ID in that extra memory, then have the object access that memory when needed.
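The first option can be sketched as follows. This is a minimal illustration, not the asker's actual pool: the `Pool` class, its `kCapacity`, and the `X(int)` constructor are hypothetical names introduced here. The id is chosen first, the address is derived from it, and the constructor receives the id as an argument, so nothing is written before the object's lifetime starts.

```cpp
#include <cassert>
#include <new>

// Hypothetical element type: the constructor receives its id as an argument.
struct X
{
    explicit X(int id_) : id(id_) {}
    int id;
};

// Hypothetical fixed-size pool; slot i holds the object whose id is i.
class Pool
{
public:
    static const int kCapacity = 4;

    Pool() : live_() {}                      // all slots start out empty (null)
    Pool(const Pool&) = delete;
    Pool& operator=(const Pool&) = delete;

    ~Pool()
    {
        for (int i = 0; i < kCapacity; ++i)
            if (live_[i]) live_[i]->~X();
    }

    // Pick a free id, compute the address from it, construct the object there.
    X* allocate()
    {
        for (int id = 0; id < kCapacity; ++id)
        {
            if (!live_[id])
            {
                live_[id] = new (storage_ + sizeof(X) * id) X(id);
                return live_[id];
            }
        }
        return nullptr; // pool exhausted
    }

    void deallocate(X* p)
    {
        assert(p != nullptr && live_[p->id] == p);
        const int id = p->id; // read the id while the object is still alive
        p->~X();
        live_[id] = nullptr;
    }

private:
    alignas(X) unsigned char storage_[sizeof(X) * kCapacity];
    X* live_[kCapacity]; // pointers returned by placement new, or null
};
```

Keeping the pointers returned by placement new avoids having to reinterpret the raw storage later; freed ids are simply handed out again by the next `allocate()`.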

If you know the object's future address, which is the case in your scenario, then yes, you can do that kind of thing. However, it is not well-defined behaviour, so it is most probably not a good idea (and in any case not good design), although it will probably "work fine".
Using a std::map as suggested in a comment above is cleaner and has no "ifs" and "whens" of UB attached.
Although writing to a known memory address will probably "work fine", an object doesn't exist before its constructor has run, so using any of its members is bad mojo.
Anything is possible. It is unlikely that any compiler does this, but the compiler might, for example, memset the object's storage to zero before running the constructor, so that even if you don't set your ID field in the constructor, it is still overwritten. You have no way of knowing, since what you're doing is undefined.
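One way to read the std::map suggestion is a side table keyed by the storage address. The comment it refers to is not shown here, so the following is only an assumed interpretation; `idTable` and `constructWithId` are hypothetical helpers. The id is recorded for the address before construction, and the constructor looks it up using `this` purely as a key, which does not touch any member before it is initialized.

```cpp
#include <cassert>
#include <map>
#include <new>

// Hypothetical side table: storage address -> id, filled in before construction.
std::map<const void*, int>& idTable()
{
    static std::map<const void*, int> table;
    return table;
}

struct X
{
    // `this` is only used as a lookup key; at() throws std::out_of_range
    // if no id was registered for this address.
    X() : id(idTable().at(this)) {}
    ~X() { idTable().erase(this); }
    int id;
};

// Record the id for the address first, then construct the object there.
X* constructWithId(void* buf, int id)
{
    assert(buf != nullptr);
    idTable()[buf] = id;
    return new (buf) X;
}
```

The lookup costs more than a plain member write, but every access happens inside the object's lifetime.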

Is there a reason you want to do this before the constructor call?
Allocating an object from an ID based pool is the following:
1) allocate a free id from the pool
2) get a memory address based on the id value
3) construct the object on the memory address
4) set the ID member of the object and the deallocation is based on that id
According to your steps, you are setting the ID after the constructor.
so I thought I set the ID before I call the constructor.
I hate to be blunt, but you need to have a better reason than that to wade into the undefined behaviour territory. Remember, as programmers, there is a lot we're learning all the time and unless there is absolutely no way around it, we need to stay away from minefields, undefined behavior being one of them.
As other people have pointed out, yes you can do it, but that's like saying you can do rm -rf / as root. Doesn't mean you should :)
C makes it easy to shoot yourself in the foot. C++ makes it harder, but when you do, you blow away your whole leg! — Bjarne Stroustrup

Related

Is accessing memory after a destructor call undefined behavior?

I'm wondering if the following is undefined?
int main()
{
struct Doggy { int a; ~Doggy() {} };
Doggy* p = new Doggy[100];
p[50].~Doggy();
p[50].a = 3; // Is this not allowed? The destructor was called on an
// object occupying that area of memory.
// Can I access it safely?
if (p[50].a == 3);
}
I guess this is generally good to know, but the reason I specifically want to know is that I have a data structure consisting of an array whose buckets can be marked null by setting a value, much like the buckets of a hash table array. When a bucket is emptied, the destructor is called; I then check and set the null state after the destructor call, and I'm wondering whether that is illegal.
To elaborate a little, say I have an array of objects and each object can be made to represent null in each bucket, such as:
struct Handle
{
int value = 0; // Zero is null value
~Handle(){}
};
int main()
{
Handle* p = new Handle[100];
// Remove object 50
p[50].~Handle();
p[50].value = 0; // Set to null
if (p[50].value == 0) ; // Then it's null, can I count on this?
// Is this defined? I'm accessing memory that was occupied by
// object that was destroyed.
}
Yes it'll be UB:
[class.dtor/19]
Once a destructor is invoked for an object, the object's lifetime ends; the behavior is undefined if the destructor is invoked for an object whose lifetime has ended ([basic.life]).
[Example 2: If the destructor for an object with automatic storage duration is explicitly invoked, and the block is subsequently left in a manner that would ordinarily invoke implicit destruction of the object, the behavior is undefined. — end example]
p[50].~Handle(); and later delete[] p; will make it call the destructor for an object whose lifetime has ended.
For p[50].value = 0; after the lifetime of the object has ended, this applies:
[basic.life/6]
Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways. For an object under construction or destruction, see [class.cdtor]. Otherwise, such a pointer refers to allocated storage ([basic.stc.dynamic.allocation]), and using the pointer as if the pointer were of type void* is well-defined. Indirection through such a pointer is permitted but the resulting lvalue may only be used in limited ways, as described below. The program has undefined behavior if:
6.2 - the pointer is used to access a non-static data member or call a non-static member function of the object
Yes, mostly. Handle::value is just an offset from a pointer of type Handle, so in practice the access works wherever you point it, even if the containing object isn't currently constructed. If you were to use anything involving the virtual keyword, though, this would break.
p[50].~Handle(); this however is a different beast. You should never invoke destructors manually unless you have also explicitly invoked the constructor with placement new. Still not illegal, but dangerous.
delete[] p; (omitted in your example!) is where you end up with double-destruction, at which point you are well beyond UB, straight up in the "it's broken" domain.
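A way to stay within defined behaviour is to start a new object's lifetime in the bucket right after ending the old one. The sketch below assumes the `Handle` type from the question; `resetBucket` and `isNull` are hypothetical helpers. After the placement new, the bucket holds a live `Handle` again, so reading `value` and the later `delete[]` both operate on live objects.

```cpp
#include <cassert>
#include <new>

struct Handle
{
    int value = 0; // zero is the null value
    ~Handle() {}
};

// Empties bucket i: ends the old object's lifetime, then immediately
// starts a new one in the same storage.
Handle* resetBucket(Handle* buckets, int i)
{
    assert(buckets != nullptr && i >= 0);
    buckets[i].~Handle();
    return new (&buckets[i]) Handle; // default member initializer sets value to 0
}

bool isNull(const Handle& h) { return h.value == 0; }
```

For a type this simple, plain assignment (`p[50] = Handle();`) without any explicit destructor call would reach the same state.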

C++ move semantics and built-in types

I am trying to understand move semantics using the following code.
#include <iostream>
#include <utility>
using namespace std;
class A
{
public:
int a;
A():a{1}{}
A(A&& rref):a{rref.a}{cout<<"move constructor called"<<endl;}
A(const A& ref):a{ref.a}{cout<<"copy constructor"<<endl;}
};
int main(){
A original; // original object
cout<<"original.a = "<<original.a<< "| address original.a="<< &(original.a)<<endl;
A movedto (std::move(original)); // calls A(A&&)
cout<<"original.a = "<<original.a<< "| address original.a"<< &(original.a)<<endl;
cout<<"movedto.a = "<<movedto.a<<"| address movedto.a"<< &(movedto.a)<<endl;
return 0;
}
Which give the following output
original.a = 1| address original.a=0x7fff1611b6d0
move constructor called
original.a = 1| address original.a0x7fff1611b6d0
movedto.a = 1| address movedto.a0x7fff1611b6e0
As it can be seen, the address of original.a and movedto.a are different, hence the member a underwent a copy operation in A(A&& rref):a{rref.a}.
I am aware that move degrades to copy for built-in types. My question is: what do I do if I want to hijack (not copy) an instance of a class such as this one? Assume that I have 100 members (instead of just one) of built-in type, making the copy expensive.
One obvious way would be to store the object on the heap and use reference semantics to pass it around. But I would like to stay with value semantics and still be able to avoid the copying.
It is not possible and/or doesn't make sense.
We start with:
A original; // original object
And the main part of the question is:
What do I do if I want to hijack (not copy) an instance of the class
such as this one?
So that would mean that we end up with:
A movedto; // new object that has all of original's members
But the caveat here is that we want "to circumvent the copying" and we don't want to use references or pointers, i.e "reference semantics", only the "stack" or "value semantics".
If we want movedto to have the same members at the same memory locations that were already allocated then we can just create a reference to original:
A& movedto{original}; // references members at the same memory locations.
But part of this question states that we are not using references because presumably we want this object to have a different lifetime. So if we want to keep original's members "alive" and allocated beyond the end of the current block then we immediately find that we are not in control of that underlying memory.
In this question original is an object with automatic storage duration. Objects with automatic storage duration have their lifetimes managed automatically according to their scope. The compiler may have used a stack to store it, and it may use a stack pointer that gets moved downwards each time an object is added, but the C++ standard doesn't specify how it should be done. We do know that the standard specifies that objects with automatic storage duration are destroyed in the reverse order of their creation when the scope ends.
So trying to control where an object with automatic storage duration is created does not make sense, and assigning the members of such an object to another doesn't make sense either. The memory is allocated automatically.
If we want to reuse the variables that were already allocated as part of an object with automatic storage duration (stack/value semantics) then we're using memory that will be deallocated when that object's lifetime ends. We must use dynamic storage for that (i.e. the "heap", or "reference semantics").
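To illustrate that conclusion, here is a minimal sketch in which the members live in dynamic storage while `A` itself is still passed around by value. The `Payload` struct and the accessor names are hypothetical stand-ins for the "100 members of built-in type". Moving `A` then transfers one pointer instead of copying the members.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical payload standing in for "100 members of built-in type".
struct Payload
{
    int values[100];
};

class A
{
public:
    A() : data(new Payload()) {}   // members live in dynamic storage

    // The defaulted move operations transfer the pointer; no int is copied.
    A(A&&) = default;
    A& operator=(A&&) = default;

    const Payload* payload() const { return data.get(); }
    bool empty() const { return data == nullptr; }

private:
    std::unique_ptr<Payload> data;
};

// Returns true if moving kept the payload at the same address.
bool moveKeepsAddress()
{
    A original;
    const Payload* before = original.payload();
    A movedto(std::move(original));
    assert(original.empty()); // the moved-from object no longer owns anything
    return movedto.payload() == before;
}
```

The trade-off is one allocation per object and an extra indirection on every member access.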

Swapping storage buffers containing placement new created objects

I recently saw a piece of code which used storage buffers to create objects and then simply swapped the buffers in order to avoid the copying overhead. Here is a simple example using integers:
std::aligned_storage_t<sizeof(int), alignof(int)> storage1;
std::aligned_storage_t<sizeof(int), alignof(int)> storage2;
new (&storage1) int(1);
new (&storage2) int(2);
std::swap(storage1, storage2);
int i1 = reinterpret_cast<int&>(storage1);
int i2 = reinterpret_cast<int&>(storage2);
//this prints 2 1
std::cout << i1 << " " << i2 << std::endl;
This feels like undefined behaviour in the general case (specifically swapping the buffers and then accessing the objects as if they were still there) but I am not sure what the standard says about such usage of storage and placement new. Any feedback is much appreciated!
I suspect there are a few factors rendering this undefined, but we only need one:
[C++11: 3.8/1]: [..] The lifetime of an object of type T ends when:
if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or
the storage which the object occupies is reused or released.
All subsequent use is use after end-of-life, which is bad and wrong.
The key is that each buffer is being reused.
So, although I would expect this to work in practice at least for trivial types (and for some classes), it's undefined.
The following may have been able to save you:
[C++11: 3.8/7]: If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object [..]
…except that you are not creating a new object.
It may or may not be worth noting here that, surprisingly, the ensuing implicit destructor calls are both well-defined:
[C++11: 3.8/8]: If a program ends the lifetime of an object of type T with static (3.7.1), thread (3.7.2), or automatic (3.7.3) storage duration and if T has a non-trivial destructor, the program must ensure that an object of the original type occupies that same storage location when the implicit destructor call takes place; otherwise the behavior of the program is undefined.
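For comparison, a variant that avoids the problem swaps the objects themselves rather than the raw buffers. This is a sketch under assumptions: the `Slots` wrapper is hypothetical, and it keeps the pointers returned by placement new so that every access goes through a pointer to a live object.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Hypothetical wrapper around two raw buffers, each holding one int.
struct Slots
{
    alignas(int) unsigned char storage1[sizeof(int)];
    alignas(int) unsigned char storage2[sizeof(int)];
    int* first;   // pointer returned by placement new into storage1
    int* second;  // pointer returned by placement new into storage2

    Slots()
        : first(new (storage1) int(1)),
          second(new (storage2) int(2))
    {}

    // The pointers refer into this object's own buffers, so copying is disabled.
    Slots(const Slots&) = delete;
    Slots& operator=(const Slots&) = delete;

    // Swaps the two live objects; the buffers are never overwritten as raw bytes.
    void swapValues()
    {
        assert(first != nullptr && second != nullptr);
        std::swap(*first, *second);
    }
};
```

For class types this costs whatever the type's move operations cost, which is the overhead the buffer swap was trying to skip.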

Returning reference to data member in cpp

As far as I know, returning a reference to a local variable is the same as returning a pointer to a local variable, and this causes a memory leak in C++.
But does this apply to data members?
The code:
#include <iostream>
#include <string>
class MyClass
{
public:
std::string& getId();
private:
std::string id;
};
std::string& MyClass::getId()
{
return id;
}
int main()
{
MyClass* c = new MyClass;
std::string brokenRef = c->getId();
// or may be std::string& brokenRef = c->getId();
delete c;
std::cout << brokenRef << std::endl; // <<< this should be a ref to unknown location, correct?
}
Thanks.
Yes, it applies, even though your MyClass instance is not strictly local to main but dynamically allocated, and deallocated before the reference is used. The effect is the same.
The code as it stands is correct, because you copy the string while it is valid. The commented-out reference version is truly a broken reference.
In line
std::string brokenRef = c->getId();
You create a new instance of string and initialize it with the string referred to by the reference that getId() returns. From this point on, brokenRef lives a life completely independent of the MyClass object, which is why it happily outlives the MyClass object you destroyed.
You could have achieved the desired effect by assigning the reference to a reference variable:
std::string& brokenRef = c->getId();
In addition to this, I think you mixed up the terms memory leak and dangling pointer (dangling reference). Returning a pointer or a reference to a member does not cause memory leaks. But using them (dereferencing) after the object is destroyed (so the memory where the members used to be stored is freed and they become dangling) causes undefined behaviour and very likely a crash.
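The difference between the copy and the reference can be made visible while the object is still alive. In this sketch the `MyClass(const std::string&)` constructor and the `copyThenDelete` helper are additions for illustration only; the dangling case itself is deliberately not exercised, because reading through the reference after `delete` would be undefined.

```cpp
#include <cassert>
#include <string>

class MyClass
{
public:
    explicit MyClass(const std::string& initial) : id(initial) {}
    std::string& getId() { return id; }
private:
    std::string id;
};

// Takes a copy and a reference while the object is alive, then destroys it.
std::string copyThenDelete()
{
    MyClass* c = new MyClass("abc");
    std::string copy = c->getId();   // independent std::string object
    std::string& ref = c->getId();   // alias for c's member
    ref += "def";                    // changes the member, not the copy
    assert(c->getId() == "abcdef");
    delete c;                        // ref dangles from here on; it must not be used
    return copy;                     // the copy is still valid
}
```

The returned string still holds the value it was copied from, which is why the original code prints a valid string.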
It's okay in your example because you assign to a string; if you turn it into a string&, it will be invalid as soon as you delete c
(and it doesn't exactly cause a memory leak).
As long as your MyClass is not deleted, the reference is valid.
Or, if you had declared it as a stack variable, the members of the class instance are valid as long as it is in scope.
You will get some value, but it will be a random one, because the object that contained it has been destroyed.
Ideally I would delete the object and then set the pointer to null,
so in your example:
c = nullptr;
Object-oriented C++ has the same object lifetime constraints as every OO-based language
The life-cycle for an object begins when it is created, and ends when it is destroyed. In a C++ class definition, a member function with the same name as the class is a constructor. This is a function which is called automatically whenever an instance of the class is created. Constructors are typically used to initialize the data members of the object to their default state, but may also be used to allocate resources (memory, files, etc.). For any class, a number of constructor functions may be declared, each taking different types of arguments, providing different ways of initializing instances. A default constructor for a class is a constructor of that class that can be called without any arguments. A default constructor for a class will be automatically generated if no constructor has been explicitly declared for that class. A copy constructor for a class is a constructor that can be called to copy an object of that class (it has a single argument of the corresponding type). A copy constructor is called when, for example, an argument object is passed by value to a function, or when an object is initialized with the value of another object. A copy constructor for a class will be automatically generated if no copy constructor has been explicitly declared for that class. A member function with the same name as the class with a leading tilde (~) is a destructor. This is a function that is called automatically when the object is deleted. The destructor is typically used to deallocate any memory allocated for the object (and may also release any other resources acquired during construction). Constructors and destructors are not required in class definitions.
There are several ways to create objects in a C++ program. One is to define a variable as being of a particular class, either as a global variable or as a local variable within a block. When the declaration is encountered during program execution, space is allocated for the object and the constructor, if any, for the object is called. Similarly, when an object variable goes out of scope, its destructor is called automatically. Another way to create an object is to declare a variable that is a pointer to the object class and call the C++ new operator, which will allocate space for the object and call the constructor, if any, for the object. In this case, the pointer variable must be explicitly deallocated with the delete operator. The constructor for the object is executed when new is called, and the destructor is executed when delete is called. An object can also be constructed by the explicit use of a constructor in an expression.
When a class is derived from another class, it inherits its parent class' constructor and destructor. Parent constructors are invoked before derived constructors. Destructors are invoked in the opposite direction, proceeding from the derived class upward through its parent chain.
More here http://www.objs.com/x3h7/cplus.htm

Is it undefined to initialize a class member in overloaded operator new?

Take a small example where I am trying to find out whether a variable is allocated on the heap or not:
struct A
{
bool isOnHeap;
A () {} // not touching isOnHeap
~A () {}
void* operator new (size_t size)
{
A* p = (A*) malloc(size);
p->isOnHeap = true; // setting it to true
return p;
}
void operator delete (void *p) { free(p); }
};
It gives the expected result in g++-4.5 (with a warning for the stack object). Is it ill-defined to do such operations?
You can't initialize class members in an overloaded operator new because the object's lifetime hasn't started. You can only initialize members during the construction of the object.
You have no guarantee that the implementation won't wipe the memory between the time operator new returns and the time the object's construction starts or that during object construction members that are specified to have an indeterminate value by the standard (e.g. because they are POD and not explicitly initialized in the constructor like isOnHeap) aren't deliberately set to something by the implementation.
Note that A has a non-trivial constructor (it is user-declared), so its lifetime doesn't start when the storage for the object is allocated (ISO/IEC 14882:2003, 3.8 [basic.life] / 1) and the program has undefined behavior if it uses a pointer to the storage for the object to access a non-static data member (3.8 / 5). Even if A were a POD type, its value after the completion of the new-expression would still be indeterminate rather than necessarily being related to the values in the bytes in the storage for the object before the new-expression was evaluated.
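Since members can only be initialized during construction, one alternative is to hand the information to the constructor. The sketch below is an assumption about what the asker could do instead, not a way to detect heap allocation: `createOnHeap` and `onHeap` are hypothetical names, and the flag records how the object was created through this interface.

```cpp
#include <cassert>
#include <memory>

class A
{
public:
    // Objects created directly (for example with automatic storage) use this constructor.
    A() : isOnHeap(false) {}

    // Hypothetical factory: the only path that marks the object as heap-allocated.
    static std::unique_ptr<A> createOnHeap()
    {
        std::unique_ptr<A> p(new A(true));
        assert(p->isOnHeap);
        return p;
    }

    bool onHeap() const { return isOnHeap; }

private:
    explicit A(bool heap) : isOnHeap(heap) {}
    bool isOnHeap; // always initialized by a constructor
};
```

A plain `new A` would still report false here, and an `A` embedded as a member of another heap-allocated object is not covered either, so this only works when all heap creation goes through the factory.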
As Charles said, the object's lifetime only begins once it has been newed, so setting data within your implementation of new is rather dangerous.
Also, when your developers use tools like Lint, there's a good chance that it will complain that the member isOnHeap is not initialized in the constructor. If someone then thinks "hey, Lint is right, let's initialize isOnHeap in the constructor of A", this will undermine the mechanism you are trying to build.
There is a second case that you probably didn't think of. Suppose that someone writes this:
class MyClass
{
public:
...
private:
struct A m_a;
};
int main()
{
MyClass *myVariable = new MyClass();
}
Then your implementation of new will not be called. Nevertheless the instance of A is allocated on the heap (as part of the MyClass instance).
Can you explain why you want to know whether something has been allocated on the heap or not? Maybe there's another, more elegant solution to your problem.
Even when not considering the operator new itself (which is nonstandard and I would even say ugly, but knowing the exact details of some particular compiler it might be workable), there is another problem with this, which renders it useless anyway: you have no guarantee that the value of isOnHeap will not be true when the object is allocated on the stack. The stack is not initialized, and any garbage left by earlier function invocations can be found there.