Using the address of a member variable as an ID - c++

I'm trying to avoid declaring enums or using strings. Although the rationale to do so may seem dubious, the full explanation is irrelevant.
My question is fairly simple. Can I use the address of a member variable as a unique ID?
More specifically, the requirements are:
The ID won't have to be serialised.
IDs will be protected members - only to be used internally by the owning object (there is no comparison of IDs even between same class instances).
Subclasses need access to base class IDs and may add their new IDs.
So the first solution is this:
class SomeClass
{
public:
int mBlacks;
void AddBlack( int aAge )
{
// Can &mBlacks be treated as a unique ID?
// Will this always work?
// Is void* the right type?
void *iId = &mBlacks;
// Do something with iId and aAge
// Like push a struct of both to a vector.
}
};
While the second solution is this:
class SomeClass
{
public:
static int const *GetBlacksId()
{
static const int dummy = 0;
return &dummy;
}
void AddBlack( int aAge )
{
// Do something with GetBlacksId and aAge
// Like push a struct of both to a vector.
}
};

No other int data member of this object, and no mBlacks member of a different instance of SomeClass in the same process, has the same address as the mBlacks member of this instance of SomeClass. So you're safe to use it as a unique ID within the process.
An empty base class subobject of SomeClass could have the same address as mBlacks (if SomeClass had any empty base classes, which it doesn't), and the char object that's the first byte of mBlacks has the same address as mBlacks. Aside from that, no other object has the same address.
void* will work as the type. int* will work too, but maybe you want to use data members with different types for different ids.
However, the ID is unique to this instance. A different instance of the same type has a different ID. One of your comments suggests that this isn't actually what you want.
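The points above can be checked with a small sketch. The class name Palette and its accessor names are hypothetical, chosen only for illustration. The sketch assumes the IDs are compared only while the objects are alive.

```cpp
#include <cassert>

// Hypothetical class for illustration; the names are assumptions.
class Palette {
public:
    // Each ID is the address of one of this object's own data members.
    const void* BlacksId() const { return &mBlacks; }
    const void* WhitesId() const { return &mWhites; }

private:
    int mBlacks = 0;
    int mWhites = 0;
};

// Returns true when the IDs behave as described: two members of one object
// differ, and the same member of two live objects differs.
inline bool IdsAreDistinct() {
    Palette a;
    Palette b;
    assert(a.BlacksId() == a.BlacksId());  // stable for the object's lifetime
    return a.BlacksId() != a.WhitesId() && a.BlacksId() != b.BlacksId();
}
```

Because the ID is tied to one instance, copying a Palette produces an object with different IDs.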
If you want each value of the type to have a unique ID, and for all objects that have the same value to have the same ID, then you'd be better off composing the ID from all of the significant fields of the object. Or just compare objects for equality instead of their IDs, with a suitable operator== and operator!=.
Alternatively if you want the ID to uniquely identify when a value was first constructed other than by copy constructors and copy assignment (so that all objects that are copies of the same "original" share an ID), then the way to do that would be to assign a new unique ID in all the other constructors, store it in a data member, and copy it in the copy constructor and copy assignment operator.
The canonical way to get a new ID is to have a global[*] counter that you increment each time you take a value. This may need to be made thread-safe depending what programs use the class (and how they use it). Values then will be unique within a given run of the program, provided that the counter is of a large enough type.
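A minimal sketch of the counter approach, assuming copies should share the ID of their original. The class name Tracked is hypothetical. std::atomic is one way to make the counter thread-safe, not the only one.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical value type; the name Tracked is an assumption.
class Tracked {
public:
    // Every non-copy constructor takes a fresh ID from the shared counter.
    Tracked() : mId(NextId()) {}

    // The implicitly generated copy constructor and copy assignment copy
    // mId, so copies share the ID of their "original".
    std::uint64_t Id() const { return mId; }

private:
    static std::uint64_t NextId() {
        // Function-local static rather than a true global; the atomic
        // increment keeps concurrent construction safe.
        static std::atomic<std::uint64_t> counter{0};
        return counter.fetch_add(1, std::memory_order_relaxed);
    }

    std::uint64_t mId;
};

// Returns true if separately constructed objects differ and copies match.
inline bool CopiesShareId() {
    Tracked a;
    Tracked b;      // separately constructed: new ID
    Tracked c = a;  // copy: same ID as a
    return a.Id() != b.Id() && c.Id() == a.Id();
}
```

A 64-bit counter is large enough that wrap-around is not a practical concern within one run.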
Another way is to generate a 128 bit random number. It's not theoretically satisfying, but assuming a decent source of randomness the chance of a collision is no larger than the chance of your program failing for some unavoidable reason like cosmic ray-induced data corruption. Random IDs are easier than sequential IDs when the sources of objects are widely distributed (for example if you need IDs that are unique across different processes or different machines). You can if you choose use some combination of the MAC address of the machine, a random number, the time, a per-process global[*] counter, the PID and anything else you think of and lay your hands on (or a standard UUID). But this might be overkill for your needs.
[*] needn't strictly be global - it can be a private static data member of the class, or a static local variable of a function.
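The random-number approach could be sketched as follows. The names RandomId and MakeRandomId are hypothetical. The sketch assumes std::random_device is a decent entropy source on the target platform, which is implementation-defined and worth checking.

```cpp
#include <array>
#include <cstdint>
#include <random>

// A 128-bit ID held as two 64-bit halves (hypothetical type name).
using RandomId = std::array<std::uint64_t, 2>;

// Draws 128 random bits. Assumes std::random_device is non-deterministic
// on this platform; the standard does not guarantee that.
inline RandomId MakeRandomId() {
    thread_local std::random_device device;  // one device per thread
    std::uniform_int_distribution<std::uint64_t> dist;  // full 64-bit range
    const std::uint64_t high = dist(device);
    const std::uint64_t low = dist(device);
    return {high, low};
}
```

For IDs that must follow the standard UUID layout, an existing UUID library is probably a better fit than this sketch.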

Related

Storing a list/map of object types in a class in C++11/C++14

I am writing a template for a C++ class (a registry) that has methods like Create and Delete, which instantiates and stores shared pointers to objects, but the Create method returns a reference to the created object rather than the shared pointer itself (the particular paradigm here being that no pointers, even smart pointers, exposed in the public interface).
The object registry can deal with polymorphic types, in the sense that the registry is specialized for the base class and Create is a template function that can be specialized for any class derived from the base class. It then returns a reference to the created object, typed as the derived class. The class also has an ID system, so any object can also be referred to via its ID.
I require a Get method with return type auto that can return the object (given its ID) as the same type it was created as. Obviously the objects are stored as a list of shared pointers to the base class, so this requires a dynamic_cast.
However, I cannot think of a way of storing the original object type when it is created. I need something akin to a std::map<[object ID], [object type]> stored as a member variable for the registry.
I've considered concatenating std::tuples but adding a new object changes its type, so it can't be stored as a member of the registry. I've also considered tricks of having a typedef within a new class that inherits from a virtual base class, so it can be stored in a list of pointers to the base class, but then using dynamic_cast to access the derived class requires knowing the object type in the first place.
Making a member list of std::functions that call another function (instantiated during Create) also won't work because the return types are different and auto cannot be used within std::function. I've also tried various tricks with variadic templates.
All solutions on SO I've seen are unsuitable because these are two methods (Create and Get) being called wrt the same class, so the information needs to be contained in the particular instance of the class itself.
Is this task impossible?
It's not impossible; but you made it impossible.
The system you're asking for doesn't require a lot of technicalities apart from using templates for the Get function. Let's break it down:
You want to create a system whereby you can instantiate (e.g. Create) classes that are of an appropriate 'base' and then store them in an associative-container, in which case you chose map.
Your map is defined thus:
std::map<[object ID], [object type]> m_map;
Now, given this information, why, might I ask, would you want to return a reference to the object? What's more, your Create function can be simplified to something like this:
void System::create(int id, Base *b)
{
m_map.emplace(id, b); // Assuming object ID is of type int
}
If you have your create function implemented thus, then the following is permissible:
class Child : public Base
{
public:
Child();
Child(const std::string &name);
virtual ~Child();
};
int main()
{
System s;
s.create(1, new Child("Roger"));
}
You are probably not interested in using the manual approach of creating objects, but something more automated. Without introducing new technical measures to our infant System class:
static Child *create(const std::string &name)
{
return new Child(name);
}
Which allows the following usage:
s.create(2, Child::create("William"));
You want to be able to retrieve objects as their derived type. There's no need to create a highly specialised auto function. You know the type you want at compile time, and auto and decltype are themselves resolved at compile time, so they cannot recover a type that is only known at run time. Assuming you know what type you want, our function is much easier:
template<typename T>
T Get(int id)
{
auto i = m_map.find(id); // iterator into the map, type deduced
if (i != m_map.end())
return dynamic_cast<T>(i->second);
else return nullptr;
}
Which now allows the following usage, continuing our int main()..
class Children : public Base
{
public: // without this the members are private and main() cannot use them
Children();
virtual ~Children();
void add(Child *c);
};
int main()
{
System s;
s.create(1, Child::create("Roger"));
s.create(2, Child::create("William"));
s.create(3, new Children());
s.Get<Children*>(3)->add(s.Get<Child*>(2)); // Add William to the group
return 0;
}
The advantage is that you now have a system that is able to deal with many objects that derive from Base without having to know which classes actually derive from it! This makes our System class very versatile and extensible. It also means that any object-creation methods are the responsibility of the classes derived from Base, e.g. Child and Children in our case. For the latter we did not implement an object-factory method because it was not practical at this time.
You want to delete an object from your registry, thus:
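void System::remove(int id) // "delete" is a keyword and cannot be a function name
{
auto i = m_map.find(id);
if (i == m_map.end()) return;
delete i->second; // the map holds raw pointers and does not free the objects itself
m_map.erase(i);
}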
Now we have a pretty functional registry system that can serve any class. It's important that these registries aren't abused to serve types that are too generic. It's better to decide which families of classes warrant their own registry.
Things to take into account:
When you add objects to your map, the pointers are converted to the Base pointer type, but because of polymorphism each one still points to the complete derived object, with its own set of values and functionality. This is why it's possible to dynamically cast the pointer back to the derived type. It's in fact a lot better to refer to objects outside the system through ids (handles) rather than through references to the objects themselves.
Please note, I'm using raw pointers for this example. A map of raw pointers owns only the pointers, not the objects they point to, so the objects must be deleted explicitly when they are removed. If you store smart pointers instead, the map releases the objects for you. Which to use is partly a matter of style, and a controversial one, with valid objections on both sides.
Also, very important:
Consider using std::unordered_map if your system mostly retrieves objects through the Get function. An unordered_map is a hash table, so a lookup by key takes constant time on average, whereas std::map is a balanced tree whose lookups take logarithmic time. std::map, on the other hand, keeps its elements sorted by key. For this reason: use std::unordered_map when you mainly look up values/objects by ID, and use std::map when you need to iterate over them in key order.
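A minimal sketch of the same registry with the ownership made explicit, assuming C++14 and a Base with a virtual destructor. The map stores std::unique_ptr, so erasing an entry also destroys the object. The member names (create, get, remove) and the bodies of Child and Children are assumptions for illustration, not the original classes.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct Base {
    virtual ~Base() = default;  // makes Base polymorphic and safe to delete
};

struct Child : Base {
    explicit Child(std::string n) : name(std::move(n)) {}
    std::string name;
};

struct Children : Base {
    int count = 0;
    void add(const Child&) { ++count; }
};

class System {
public:
    // Takes ownership; any previous object stored under this ID is destroyed.
    void create(int id, std::unique_ptr<Base> object) {
        m_map[id] = std::move(object);
    }

    // Returns nullptr if the ID is unknown or the object is not a T.
    template <typename T>
    T* get(int id) const {
        auto it = m_map.find(id);
        return it == m_map.end() ? nullptr
                                 : dynamic_cast<T*>(it->second.get());
    }

    // Erasing the entry destroys the unique_ptr, which deletes the object.
    bool remove(int id) { return m_map.erase(id) > 0; }

private:
    std::unordered_map<int, std::unique_ptr<Base>> m_map;
};
```

Callers pass std::make_unique<Child>("Roger") to create and ask for get<Child>(id). A wrong type yields nullptr rather than an exception.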
The usual way to do this sort of thing is to make the Get method a template method, something like:
class Registry {
public:
template <class T> T &Get(id_t id) {
// ... fetch the smart pointer from the registry
return dynamic_cast<T &>(*ptr);
}
};
This requires the caller of Get know what type of object it is getting (and will throw a std::bad_cast if it gets it wrong):
auto &obj = registry->Get<DerivedType>(id);
However, this approach exposes references in the interface, which are really pointers, which you say you want to avoid.
If you really want to avoid exposing all pointers, you need to provide a way of manipulating objects in the registry using only their ids. One way to do this is to create a DerivedTypeManipulator singleton for every derived type you store in the registry, which exposes all the operations on the derived type, but via an id rather than a pointer or reference.
This doesn't really solve the problem of needing to know the derived type in code that needs to do anything specific to a derived type, however.
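A sketch of the template Get approach under some assumptions: C++11 or later, objects held as std::shared_ptr to the base, and Create returning the new ID rather than a reference to keep the example short. The Shape, Circle and Square types and the member names are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <typeinfo>
#include <unordered_map>
#include <utility>

struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { double radius = 1.0; };
struct Square : Shape { double side = 2.0; };

template <class BaseT>
class Registry {
public:
    using Id = std::size_t;

    // Constructs a T, stores it as shared_ptr<BaseT>, and returns its ID.
    template <class T, class... Args>
    Id Create(Args&&... args) {
        const Id id = next_id_++;
        objects_.emplace(id, std::make_shared<T>(std::forward<Args>(args)...));
        return id;
    }

    // Throws std::out_of_range for an unknown ID and std::bad_cast when the
    // stored object is not a T.
    template <class T>
    T& Get(Id id) const {
        return dynamic_cast<T&>(*objects_.at(id));
    }

private:
    Id next_id_ = 0;
    std::unordered_map<Id, std::shared_ptr<BaseT>> objects_;
};
```

The caller still has to name the derived type, for example registry.Get<Circle>(id). The registry only verifies the guess at run time.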

Automated Object Creation, How to Control/Increment Names?

I've hit a bit of a hurdle with creating objects using user input for variables. Basically the program determines what type of object the user wants, creates it, and then asks if the user wants to add another object. In this example the object is a manager, which is a subclass of employee.
void add_manager()
{
string name;
//Performs a bunch of checks to ensure the input is valid.
name = get_string_input("Please input the name of the employee.");
//Creates a manager object.
manager manager1(name);
//Goes back to previous function, restarts process of finding out employee type.
ask_employee();
}
I will be storing pointers to each object in a dynamic array elsewhere. The point of the array is just to get values out of each object to use in some printouts, so I was expecting to just loop over the array, get the value of each, and print. (Rough example)
The part I'm not sure about is how to change the object constructor call so the objects are made as manager1, manager2 etc. There will be a varied number made due to what the user wants, and I was hoping to keep them in a way to tell the difference.
Since I will be accessing the objects via pointers, do the object names even need to be different? Or can objects and pointers all have the same name?
Since the number of manager objects isn't known until run time, you can't give each one its own variable name; variable names in C++ are fixed at compile time. You need to keep your manager objects in an array-like structure:
std::vector<manager *> vManagers;
void add_manager()
{
string name;
//Performs a bunch of checks to ensure the input is valid.
name = get_string_input("Please input the name of the employee.");
//Creates a manager object.
vManagers.push_back(new manager(name));
//Goes back to previous function, restarts process of finding out employee type.
while (ask_employee())
{
name = get_string_input("Please input the name of the employee.");
vManagers.push_back(new manager(name));
}
}
So that when you need manager object you can call:
vManagers[n]->GetData();
But note that you need to delete the manager objects in the appropriate places to avoid leaks:
delete vManagers[n];
vManagers[n] = NULL;
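If C++14 is available, one way to avoid the manual delete is to let the vector own the objects through std::unique_ptr. This is a sketch, not the original program: the manager class below is a hypothetical stand-in, and add_manager takes the name as a parameter instead of prompting for it.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's manager class.
class manager {
public:
    explicit manager(std::string name) : name_(std::move(name)) {}
    const std::string& GetData() const { return name_; }

private:
    std::string name_;
};

// The vector owns the objects: each one is destroyed when its element is
// erased or when the vector itself is destroyed.
std::vector<std::unique_ptr<manager>> vManagers;

void add_manager(const std::string& name) {
    vManagers.push_back(std::make_unique<manager>(name));
}
```

Access is unchanged (vManagers[n]->GetData()), and removing an element with erase frees the object without an explicit delete.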
Do the object names even need to be different?
They don't need to be different. When you store them in the array there is no connection between the objects and their names, so they can all be instantiated using the same name, e.g. manager1. (Additionally, you can't use their names for searching.)
What differentiates the objects are the values of their data members. Thus, if you want to search the objects define an object id or name as data member and then you can use it in your search criterion to find a particular manager.
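A short sketch of searching by a data member, assuming the objects are stored by value in a std::vector. The Manager struct and the FindByName helper are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record: the id and name are ordinary data members.
struct Manager {
    int id;
    std::string name;
};

// Returns a pointer to the first manager with the given name, or nullptr.
inline const Manager* FindByName(const std::vector<Manager>& managers,
                                 const std::string& name) {
    auto it = std::find_if(managers.begin(), managers.end(),
                           [&name](const Manager& m) { return m.name == name; });
    return it == managers.end() ? nullptr : &*it;
}
```

The returned pointer stays valid only until the vector is next modified.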
Edit 1:
One way to create an object counter is by defining a data member:
static int counter;, defined and initialised to 0 once outside the class. You then increment the counter in the constructor to reflect object instantiation and decrement it in the destructor.
Edit 2:
If you want to store objects it would be better to use vector<object_type> container_name (instead of arrays). To do this you need to define a vector outside the object you want to store. In case of storing pointers to type object_type, you can do something like:
vector<object_type*> container_name;
object_type* ObjectInstance = new object_type(parameters);
// store in vector
container_name.emplace_back(ObjectInstance);
The names you give variables are available only at compile time, when the source is turned into an executable file. Afterwards, when the program runs and creates objects, that information is no longer available. The program only knows about the machine addresses where operands to machine instructions are located.

C++ object representation

I have a doubt: I can declare a pointer to a class member function
void (MyClass::*myFunc)(void);
and I can declare a pointer to a class member variable
int (MyClass::*var);
My question is: how is an object (composed by member functions and member variables) structured in memory (asm-level) ?
I'm not sure because, except for polymorphism and runtime virtual functions, I can declare a pointer to a member function even without an object and this implies that the code functions are shared among multiple classes (although they require a *this pointer to work properly)
But what about the variables? How come I can declare a pointer to a member variable even without an object instance? Of course I need one to use it, but the fact that I can declare a pointer without an object makes me think a class object in memory represents its variables with pointers to other memory regions.
I'm not sure if I explained properly my doubt, if not just let me know and I'll try to explain it better
Classes are stored in memory quite simply, almost the same way as structures. If you inspect the memory where a class instance is stored, you'll notice that its fields are laid out one after another (possibly with padding between them).
There's a difference, though, if your class has virtual methods. In that case most compilers store a pointer to a virtual method table at the start of each instance, which is what allows virtual methods to work properly. You can read more about this on the Internet; it's a slightly more advanced topic. Luckily, you don't have to worry about it, because the compiler handles the VMT for you.
Let's go to the methods. When you see:
void MyClass::myFunc(int i, int j) { }
Actually the compiler converts it into something like:
void myFunc(MyClass * this, int i, int j) { }
And when you call:
myClassInstance->myFunc(1, 2);
Compiler generates the following code:
myFunc(myClassInstance, 1, 2);
Please keep in mind, that this is a simplification - sometimes it's a little more complicated than this (especially when we discuss the virtual method calls), but it shows more or less, how classes are handled by the compiler. If you use some low-level debugger such as WinDbg, you can inspect parameters of the method call and you'll see, that the first parameter is usually a pointer to class instance you called the method on.
Now, all instances of the same class share their methods' binaries (compiled code). There is no point in making a copy of them for each instance, so only one copy is held in memory and all instances use it. It should be clear now why you can get a pointer to a method even if you have no instance of the class.
However, if you want to call the method kept in a variable, you always have to provide a class instance, which can be passed by the hidden "this" parameter.
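A small sketch of that last point. The Counter class and the Apply helper are hypothetical; the pointer to member function names a function only, and the instance is supplied at the call site.

```cpp
// Hypothetical class for illustration.
class Counter {
public:
    void Add(int amount) { total_ += amount; }
    void Subtract(int amount) { total_ -= amount; }
    int Total() const { return total_; }

private:
    int total_ = 0;
};

// A pointer to member function identifies the function, not an object.
using Operation = void (Counter::*)(int);

// The instance is supplied only when the call is made, via .* (or ->* when
// starting from a pointer to the object).
inline void Apply(Counter& target, Operation op, int amount) {
    (target.*op)(amount);
}
```

The same Operation value can be applied to any number of Counter objects, which matches the idea that the compiled code exists once and receives the object as a hidden argument.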
Edit: In response to comments
You can read more about pointers to members in another SO question. My guess is that a pointer to member stores the offset of the specified field from the beginning of the class instance. When you retrieve the value of a field through the pointer to member, the compiler locates the beginning of the class instance and moves by the number of bytes stored in the pointer to member to reach the specified field.
Each class instance has its own copy of non-static fields - otherwise they wouldn't be much of a use for us.
Notice, that similarly to pointers to methods, you cannot use pointer to member directly, you again have to provide a class instance.
A proof of what I say would be in order, so here it is:
class C
{
public:
int a;
int b;
};
// Disassembly of fragment of code:
int C::*pointerToA = &C::a;
00DB438C mov dword ptr [pointerToA],0
int C::*pointerToB = &C::b;
00DB4393 mov dword ptr [pointerToB],4
Can you see the values stored in pointerToA and pointerToB? Field a is 0 bytes from the beginning of the class instance, so the value 0 is stored in pointerToA. Field b, on the other hand, is stored after field a, which is 4 bytes long, so the value 4 is stored in pointerToB.
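The same idea can be seen without a disassembler. In this sketch the Read helper and the kOffset constants are hypothetical names. The offsets match the disassembly only on ABIs where int is 4 bytes and no padding is inserted, which is an assumption.

```cpp
#include <cstddef>

struct C {
    int a;
    int b;
};

// Reads whichever field the pointer to member selects from the given object.
inline int Read(const C& object, int C::*field) {
    return object.*field;
}

// offsetof reports the byte distance of each field from the start of C.
constexpr std::size_t kOffsetA = offsetof(C, a);
constexpr std::size_t kOffsetB = offsetof(C, b);
```

One pointer to member, such as &C::b, works with every C object, because it describes a position inside the class rather than an address in memory.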

Best Practice : How to get a unique identifier for the object

I've got several objects and need to generate a unique identifier for them which will not be changed/repeated during the lifetime of each object.
Basically I want to get/generate a unique id for my objects, something like this
int id = reinterpret_cast<int>(&obj);
or
int id = (int)&obj;
I understand the codes above are bad ideas, as int might not be large enough to store the address etc.
So what's the best practice for getting a unique identifier from an object in a portable way?
Depending on your "uniqueness"-requirements, there are several options:
If unique within one address space ("within one program execution") is OK and your objects stay where they are in memory then pointers are fine. There are pitfalls however: If your objects live in containers, every reallocation may change your objects' identity and if you allow copying of your objects, then objects returned from some function may have been created at the same address.
If you need more global uniqueness, for instance because you are dealing with communicating programs or data that is persistent, use GUIDs/UUIDs, such as boost.uuid.
You could create unique integers from some static counter, but beware of the pitfalls:
Make sure your increments are atomic
Protect against copying, or write your own copy constructors and assignment operators.
Personally, my choice has been UUIDs whenever I can afford them, because they provide me some ease of mind, not having to think about all the pitfalls.
If the objects need to be uniquely identified, you can generate the unique id in the constructor:
struct Obj
{
int _id;
Obj() { static int id = 0; _id = id++; }
};
You'll have to decide how you want to handle copies/assignments (same id - the above will work / different id's - you'll need a copy constructor and probably a static class member instead of the static local variable).
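A sketch of the "different ids" choice, assuming a copy should count as a new object and assignment should leave the target's identity alone. The member names are hypothetical, and std::atomic is used here only as one way to keep the increment thread-safe.

```cpp
#include <atomic>

class Obj {
public:
    Obj() : id_(NextId()) {}

    // A copy is a distinct object, so it draws its own ID.
    Obj(const Obj&) : id_(NextId()) {}

    // Assignment would overwrite the value but keeps the target's ID.
    // This sketch has no other data members to copy.
    Obj& operator=(const Obj&) { return *this; }

    int id() const { return id_; }

private:
    static int NextId() { return next_id_.fetch_add(1); }

    // Static class member shared by all constructors.
    static std::atomic<int> next_id_;
    const int id_;
};

// Definition of the static member, needed once in the program.
std::atomic<int> Obj::next_id_{0};
```

If copies should instead share the ID of their source, the implicitly generated copy operations on a non-const id member already do that.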
When I looked into this issue, I fairly quickly ended up at the Boost UUID library (universally unique identifier, http://www.boost.org/doc/libs/1_52_0/libs/uuid/). However, as my project grew, I switched over to Qt's GUID library (globally unique identifier, https://doc.qt.io/qt-5/quuid.html).
A lesson learned for me though was to start declaring your own UUID class and hide the implementation so that you can switch to whatever you find suitable later on.
I hope that helps.
If your object is a class then you could have a static member variable which you initialise to 0. Then in the constructor you store this value into the class instance and increment the static variable:
class Indexed
{
public:
Indexed() :
m_myIndex( m_nextIndex++ )
{ }
int getIndex() const
{ return m_myIndex; }
private:
const int m_myIndex;
static int m_nextIndex;
};
// The static member must be defined once, outside the class.
int Indexed::m_nextIndex = 0;
If you need a unique id in a distributed environment, use boost::uuid.
It does not look like a bad idea to use the object address directly as the unique (for this run) identifier. Why cast it to an integer? Just compare pointers with ==:
MyObject *obj1, *obj2;
...
if (obj1 == obj2) ...
This will not work, of course, if you need to write IDs to database or the like. Same values for pointers are possible between runs. Also, do not overload comparison operator (==).
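If an integer form of the address is really needed, for example as a key in a table, a sketch under the assumption that the platform provides std::uintptr_t (it is an optional type) could look like this. MyObject's contents and the AddressId helper are hypothetical.

```cpp
#include <cstdint>

// Hypothetical type for illustration.
struct MyObject {
    int value = 0;
};

// Where std::uintptr_t exists it is wide enough to hold an object pointer,
// unlike int. The result is only meaningful while the object is alive and
// stays at the same address.
inline std::uintptr_t AddressId(const MyObject& object) {
    return reinterpret_cast<std::uintptr_t>(&object);
}
```

As with raw pointers, the value may be reused by a later object once this one is destroyed, so it is unsuitable for anything that outlives the object.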

class member function fails when at least two instances of class declared before

I have two classes, BTLeafNode and BTNonLeafNode, each of which is derived from my class BTreeNode. BTreeNode has a protected data member buffer which is a 1024 byte character array. BTreeNode has a template function inserttemp which stores int-T pairs in the buffer, where T is the type that the function is called with. Each class has its own function insert which calls inserttemp. BTNonLeafNode stores int-PageId pairs (PageId is basically an int) and BTLeafNode stores int-RecordId pairs (RecordId consists of a PageId and an int) into the buffer. I haven't tested BTNonLeafNode yet, but for some reason when there are only two instances of BTLeafNode and I call the insert function it works fine, but for any instance of BTLeafNode which was declared after at least two instances of BTLeafNode were declared, it screws up. The part of memory that should store the int in RecordId instead stores the int of the next int-RecordId pair (these two pieces of data are stored next to each other in the buffer).
I'm really confused because I don't understand why declaring an instance would mess up the function. There aren't any global variables. You don't even need to do anything with the declared instances, as long as you declare them it messes up the function.
At this point there is no code posted, so we can't see exactly what's going on, but if you have not declared any data member as a static data member, and each derived instance of a BTreeNode owns its own private buffer, then I can pretty much guarantee that the problem has nothing to do with inheritance or how many instances of the derived object you declare, but is most likely a problem with your insertion algorithm. It may be a subtle bug which does not present itself in every case, which is why some insertions work. But again, since each instance has its own memory buffer, the only way an instance could have its buffer screwed up is if the algorithm in the member function that accesses the instance's array has a problem.
Also, you said there is a 1024-byte character array in each instance of a BTreeNode, but you're storing pairs using a template function ... are you doing some type of cast of the pair to an unsigned char* and using memcpy() to copy the pair structure into the buffer? If you are, then there's a lot that can go wrong if you're not careful about how you increment and cast your pointers.
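Since the original code was not posted, the following is only a sketch of one careful way to pack int-T pairs into a byte buffer with memcpy. The Node class, its members, and the RecordId layout are all assumptions. The point it illustrates is that every offset is computed in bytes from sizeof, never by incrementing a pointer of the wrong type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical layout: a page id and a slot number.
struct RecordId {
    int pid;
    int sid;
};

class Node {
public:
    // Appends an int key followed by a T value; returns false when full.
    template <class T>
    bool Insert(int key, const T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "memcpy requires a trivially copyable type");
        const std::size_t entry = sizeof(int) + sizeof(T);
        if (used_ + entry > sizeof(buffer_)) return false;
        std::memcpy(buffer_ + used_, &key, sizeof(int));
        std::memcpy(buffer_ + used_ + sizeof(int), &value, sizeof(T));
        used_ += entry;
        return true;
    }

    // Reads entry number index, assuming every entry was stored with this T.
    template <class T>
    void Read(std::size_t index, int& key, T& value) const {
        const std::size_t entry = sizeof(int) + sizeof(T);
        const std::size_t offset = index * entry;
        assert(offset + entry <= used_);  // the entry must already exist
        std::memcpy(&key, buffer_ + offset, sizeof(int));
        std::memcpy(&value, buffer_ + offset + sizeof(int), sizeof(T));
    }

private:
    char buffer_[1024] = {};
    std::size_t used_ = 0;  // bytes written so far
};
```

Reading a field of one pair and getting the neighbouring pair's data, as described in the question, is the kind of symptom an offset computed in the wrong units would produce, though that is only a guess without the code.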