Which containers to store objects for access via different identifiers?

I have to access my objects (multiple instances from one class) via several different identifiers and don't know which is the best way to store the mapping from identifier to object.
I act as a kind of "connector" between two worlds and each has its own identifiers I have to use / support.
If possible I'd like to prevent using pointers.
The first idea was to put the objects in a list/vector and then create a map for each type of identifier. I soon realized that the standard containers don't support storing references.
The next idea was to keep the objects inside the vector and just put the index in the map. The problem here is that I didn't find an index_of for vector and storing the index inside the object only works as long as nobody uses insert or erase.
The only identifier I have when creating the objects is a string, and for performance reasons I don't want to use this string as the key of a map.
Is this a problem solved best with pointers or does anybody have an idea how to deal with it?
Thanks

Using pointers seems reasonable. Here's a suggested API that you could implement:
class WidgetDatabase {
public:
    // Returns true if widget was inserted.
    // If there is a Widget in *this with the same name and/or id,
    // widget is not inserted.
    bool Insert(const std::string& name, int id, const Widget& widget);
    // Caller does NOT own returned pointer (do not delete it!).
    // nullptr is returned if there is no such Widget.
    const Widget* GetByName(const std::string& name) const;
    const Widget* GetById(int id) const;
private:
    // Owns the widgets. std::set elements are const and have stable
    // addresses; Widget needs an operator< for this to compile.
    std::set<Widget> widgets_;
    std::map<std::string, const Widget*> widgets_by_name_;
    std::map<int, const Widget*> widgets_by_id_;
};
I think this should be pretty straightforward to implement. You just need to make sure to maintain the following invariant:
w is in widgets_ iff a pointer to it is in widgets_by_*
I think the main pitfall that you'll encounter is making sure that name and id are not already in widgets_by_* when Insert is called.
It should be easy to make this thread safe; just throw in a mutex member variable and some local lock_guards. Optionally, use a std::shared_mutex and take a std::shared_lock in the Get* methods to avoid contention; this will be especially helpful if your use case involves more reading than writing.
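A minimal sketch of how the API above might be implemented, assuming C++17. It swaps std::set<Widget> for a std::deque<Widget> so that Widget needs no operator< (push_back on a deque does not invalidate pointers to existing elements), and it uses std::shared_mutex with std::shared_lock for the optional reader/writer locking. The Widget struct and its mass_kg field are placeholders, not part of the API above.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Placeholder payload type (assumption; the real Widget is not shown).
struct Widget {
    double mass_kg = 0.0;
};

class WidgetDatabase {
public:
    // Returns true if widget was inserted; false if name or id is taken.
    bool Insert(const std::string& name, int id, const Widget& widget) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive: writer
        if (widgets_by_name_.count(name) != 0 || widgets_by_id_.count(id) != 0) {
            return false;
        }
        widgets_.push_back(widget);  // existing element addresses stay valid
        const Widget* stored = &widgets_.back();
        widgets_by_name_.emplace(name, stored);
        widgets_by_id_.emplace(id, stored);
        // Invariant: every stored widget is reachable through both indexes.
        assert(widgets_by_name_.size() == widgets_.size());
        assert(widgets_by_id_.size() == widgets_.size());
        return true;
    }

    // Caller does NOT own the returned pointer; nullptr if not found.
    const Widget* GetByName(const std::string& name) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared: reader
        auto it = widgets_by_name_.find(name);
        return it == widgets_by_name_.end() ? nullptr : it->second;
    }

    const Widget* GetById(int id) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared: reader
        auto it = widgets_by_id_.find(id);
        return it == widgets_by_id_.end() ? nullptr : it->second;
    }

private:
    mutable std::shared_mutex mutex_;
    std::deque<Widget> widgets_;  // owns the objects
    std::map<std::string, const Widget*> widgets_by_name_;
    std::map<int, const Widget*> widgets_by_id_;
};
```

Because this sketch has no erase operation, a pointer returned from a Get* method stays valid for the lifetime of the database. If removal is added, callers would need some other way to keep the object alive (for example std::shared_ptr).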

Have you considered an in-memory SQLite database? SQL gives you many ways of accessing the same data. For example, your schema might look like this:
CREATE TABLE Widgets (
    -- Different ways of referring to the same thing.
    name TEXT,
    id INTEGER,
    -- Non-identifying characteristics.
    mass_kg FLOAT,
    length_m FLOAT,
    cost_cents INTEGER,
    hue INTEGER
);
Then you can query using different identifiers:
SELECT mass_kg from Widgets where name = $name
or
SELECT mass_kg from Widgets where id = $id
Of course, SQL allows you to do much more than this. This will allow you to easily extend your library's functionality in the future.
Another advantage is that SQL is declarative, which usually makes it more concise and readable.
In recent versions, SQLite supports concurrent access to the same database. The concurrency model has gotten stronger over time, so you'll have to make sure you understand the model that is offered by the version that you're using. The latest version of the docs can be found on sqlite's website.

Related

How to correctly use map with smart pointers and custom classes as key and value

I'm trying to make a map in which I will hold Teams as keys and vectors of polymorphic Employees as values. Some of the data will be loaded from a file in the future, and the user will be able to add new teams and employees at any time.
This is the map that I came up with:
std::map<std::unique_ptr<Team>, std::unique_ptr<std::vector<std::unique_ptr<Employee>>>> teams;
And this is some test code where I tried to add a new team and a member to it:
bool Company::addTeam(const std::string & projectName)
{
    teams.emplace(std::unique_ptr<Team>(new Team(projectName)), std::unique_ptr<std::vector<std::unique_ptr<Employee>>>(new std::vector<std::unique_ptr<Employee>>()));
    for (std::map<std::unique_ptr<Team>, std::unique_ptr<std::vector<std::unique_ptr<Employee>>>>::iterator it = teams.begin(); it != teams.end(); ++it) {
        std::cout << it->first.get()->getProjectName();
        it->second.get()->emplace_back(std::unique_ptr<Employee>(new Programmer("David", "Beckham", "9803268780", "Djoe Beckham", Employee::LevelThree, projectName)));
        std::cout << "\n" << it->second.get()->at(0)->toString();
    }
    return false;
}
The code runs fine and I'm able to print the employee data, but after closing the application it throws an exception, and Visual Studio opens delete_scalar.cpp and triggers a breakpoint at this code:
void __CRTDECL operator delete(void* const block) noexcept
{
#ifdef _DEBUG
_free_dbg(block, _UNKNOWN_BLOCK); // break point
#else
free(block);
#endif
}
I'm trying to find out what I'm doing wrong and how to fix it. I have empty destructors for all Employee classes, if that has something to do with the problem. If my idea looks very stupid and there is an easier way to accomplish my goal, please tell me. Thanks in advance.

Design-wise, one person can have multiple roles in different teams at the same time. This could be modeled with another class, Role, linking a person to a team. Neither Team nor Role conceptually owns Person objects, though, so Role could use plain pointers to link a Person with a Team.

Your idea could work if there were no problem within one of your classes. Because parts of the code are missing, it's difficult to tell.
However, this is not the way to go!
A map is meant to work as an index by value. A typical use case is to look up an item with an already existing key. Of course, you could use pointers as keys, but the pointer would then act as a kind of id without any polymorphic operations.
On the other hand, a unique_ptr is designed to ensure unique ownership, so only one copy of each pointer value exists. This makes it very difficult to use as a map key:
auto team2 = make_unique<Team>();
teams [team2] = make_unique<vector<unique_ptr<Employee>>>(); // doesn't compile
The above code doesn't compile, because team2 is a unique_ptr and cannot be copied into the indexing parameter. Using it for searching or inserting an item, would require to move it:
teams [std::move(team2)] = make_unique<vector<unique_ptr<Employee>>>(); //compiles
assert (team2); // ouch
But once moved, the unique_ptr value is no longer in team2, which is now empty, because the only copy of the pointer lives in the map key. This means that you can no longer look up the added team by its key.
Better alternatives ?
If you really want to use a polymorphic pointer as a map key, you should at least use a shared_ptr, so that several copies can exist in your code. But I'd suggest that you use only values as keys.
Now to the value part of the map. There is no benefit in making a unique_ptr of a vector: the vector itself is not polymorphic, and vectors are well designed for copying, moving and so on. Furthermore, sizeof(vector<...>) is small even for very large vectors.
Use a vector<unique_ptr<Employee>> if you want the vector in the map to own the Employees, or vector<shared_ptr<Employee>> if you intend to share the content of the vector.
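As a rough illustration of that advice, here is a sketch that keys the map by the project name (a plain value) and stores a vector<unique_ptr<Employee>> directly as the mapped value. The Employee and Programmer classes below are simplified stand-ins, and addEmployee and members are hypothetical helper names. The base class gets a virtual destructor, which is needed whenever derived objects are deleted through a base pointer, as unique_ptr<Employee> does.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the polymorphic employee hierarchy (assumption).
class Employee {
public:
    explicit Employee(std::string name) : name_(std::move(name)) {}
    virtual ~Employee() = default;  // required for deletion via base pointer
    virtual std::string role() const = 0;
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

class Programmer : public Employee {
public:
    using Employee::Employee;
    std::string role() const override { return "Programmer"; }
};

class Company {
public:
    using Members = std::vector<std::unique_ptr<Employee>>;

    // Returns true if a new, empty team was created.
    bool addTeam(const std::string& projectName) {
        return teams_.emplace(projectName, Members{}).second;
    }

    // Returns false if the team does not exist; the team owns the employee.
    bool addEmployee(const std::string& projectName, std::unique_ptr<Employee> e) {
        assert(e != nullptr);
        auto it = teams_.find(projectName);
        if (it == teams_.end()) return false;
        it->second.push_back(std::move(e));
        return true;
    }

    // Returns nullptr if there is no such team.
    const Members* members(const std::string& projectName) const {
        auto it = teams_.find(projectName);
        return it == teams_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Members> teams_;  // value key, vector held by value
};
```

A team can now be found again at any time from its name, and no manual delete or nested unique_ptr is involved.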

Access to custom objects in unordered_set

Please help to figure out the logic of using unordered_set with custom structures.
Consider I have following class
struct MyClass {
int id;
// other members
};
used with shared_ptr
using CPtr = std::shared_ptr<MyClass>;
For fast access by key, I planned to use an unordered_set with a custom hash and the MyClass::id member as the key:
template <class T> struct CHash;
template<> struct CHash<CPtr>
{
std::size_t operator() (const CPtr& c) const
{
return std::hash<decltype(c->id)> {} (c->id);
}
};
using CSet = std::unordered_set<CPtr, CHash<CPtr>>;
Right now, unordered_set still seems to be an appropriate container. However, the standard find() functions for sets give const access to elements to ensure that keys won't be changed. I intend to modify objects while guaranteeing that their keys stay unchanged. So, the questions are:
1) How can I easily access an element of the set by its int key while keeping the ability to modify the element, something like
auto element = my_set.find(5);
element->b = 3.3;
It is possible to add a converting constructor and use something like
auto element = my_set.find(MyClass (5));
But that doesn't solve the problem with constness, and what if the class is huge?
2) Am I actually going the wrong way? Should I use another container? For example unordered_map, which will store one more int key for each entry, consuming more memory.

A pointer doesn't propagate its constness to the object it points to. That means that if you have a const reference to a std::shared_ptr (as in a set), you can still modify the object via this pointer. Whether or not that is something you should do is a different question, and it doesn't solve your lookup problem.
Of course, if you want to look up a value by a key, then this is what std::unordered_map was designed for, so I'd have a closer look there. The main problem I see with this approach is not so much the memory overhead (unordered_set and unordered_map, as well as shared_ptr, have noticeable memory overhead anyway), but that you have to maintain redundant information (id in the object and id as a key).
If you don't have many insertions, you don't absolutely need the (on average) constant lookup time, and memory overhead is really important to you, you could consider a third solution (besides using a third-party or self-written data structure, of course): namely, to write a thin wrapper around a sorted std::vector<std::shared_ptr<MyClass>> or, if appropriate, even better a std::vector<std::unique_ptr<MyClass>> that uses std::lower_bound for lookups.
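A sketch of what such a thin wrapper could look like, under the assumption that ids are unique. The class name IdIndex and the member b are made up for illustration. The binary search here uses std::lower_bound, which returns the first element whose id is not less than the key, so a single equality check confirms a hit.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct MyClass {
    int id;
    double b;  // hypothetical mutable payload
};

// Hypothetical wrapper: a vector kept sorted by id, searched in O(log n).
class IdIndex {
public:
    // Inserts in sorted position; returns false if the id already exists.
    bool insert(std::unique_ptr<MyClass> obj) {
        assert(obj != nullptr);
        auto pos = lowerBound(obj->id);
        if (pos != items_.end() && (*pos)->id == obj->id) return false;
        items_.insert(pos, std::move(obj));  // O(n) shift, fine for rare inserts
        return true;
    }

    // Returns a mutable pointer, or nullptr if not found.
    // Callers may change any member except id, which defines the ordering.
    MyClass* find(int id) {
        auto pos = lowerBound(id);
        return (pos != items_.end() && (*pos)->id == id) ? pos->get() : nullptr;
    }

private:
    using Storage = std::vector<std::unique_ptr<MyClass>>;

    // First element whose id is not less than the requested id.
    Storage::iterator lowerBound(int id) {
        return std::lower_bound(
            items_.begin(), items_.end(), id,
            [](const std::unique_ptr<MyClass>& p, int key) { return p->id < key; });
    }

    Storage items_;
};
```

Lookup takes only the int key, so no temporary MyClass has to be constructed, and the per-element overhead is a single pointer.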

I think you are going the wrong way by using unordered_set, because its definition states very clearly:
"Keys are immutable, therefore, the elements in an unordered_set cannot be modified once in the container - they can be inserted and removed, though."
You can find this definition at:
http://www.cplusplus.com/reference/unordered_set/unordered_set/
I hope this helps. Thanks.

Reusing Qt's QString COW / ref counting in a string registry

I work on a project that is supposed to have a large object count (in the range of millions), and even though object names are not mandatory, they are supported for the user's convenience. It would be quite a big overhead to have an empty string member or even a string pointer for every object, considering that named objects will be few and far between. Also, it just so happens that object names are very frequently reused.
So my solution was to create a name registry, basically a QHash<Object*, QString*> which tracks every object that has a name assigned to it, and another string registry, which is a QHash<QString, uint> which tracks the usage count for every string. When objects are named, they are registered in the name registry, when the name is changed or the object is deleted they are unregistered, and the name registry itself manages the string registry, creates the strings, tracks the usage count and if necessary removes entries that are no longer used.
I feel like the second registry may be redundant, since Qt already employs reference counting for its COW scheme, so I wonder how can I employ that already existing functionality to eliminate the string registry and manage the strings usage count and lifetime more elegantly?

user3735658, for some reason I tend to believe that hash table does not carry the original key string in it, only the hash. So maybe your concern of redundant QString object is not valid (?). It is a bit of question, though... Theoretically, there should not be actual string there. So, you can probably set the key to just anything not valid in the context of your app e.g. "UnnamedObj666" in case of object not tied to the string but still be able to find it via the hash-table /this is where it gets tricky, maybe not, because of inability to resolve collisions by matching with original/.
I may not be answering exactly what you asked, but this may work as well. How about
QHash<QString, QSharedDataPointer<YourType1>> collection1;
QHash<QString, QSharedDataPointer<YourType2>> collection2;
QHash<QString, QSharedDataPointer<YourType3>> collection3;
or maybe just one
QHash<QString, QSharedDataPointer<YourBasicType>> collection;
Using QSharedDataPointer here appears to be the solution as long as you derive YourType from QSharedData to carry the reference counter immediately with the object. This way we have the proper tracking system for all the "floating" references used pretty much anywhere in the program. Of course you create the instance just once and then provide a const ref to QSharedDataPointer to consumer of your object.
There is one problem with the solution above, though: when the last named object is destroyed, we still have the entry in the hash table, but we can reuse it in that case.

Best Practice : How to get a unique identifier for the object

I've got several objects and need to generate a unique identifier for them which will not be changed/repeated during the lifetime of each object.
Basically I want to get/generate a unique id for my objects, something like this:
int id = reinterpret_cast<int>(&obj);
or
int id = (int)&obj;
I understand that the code above is a bad idea, as int might not be large enough to store the address, etc.
So what's the best practice for getting a unique identifier from the object in a portable way?

Depending on your "uniqueness"-requirements, there are several options:
If unique within one address space ("within one program execution") is OK and your objects stay where they are in memory then pointers are fine. There are pitfalls however: If your objects live in containers, every reallocation may change your objects' identity and if you allow copying of your objects, then objects returned from some function may have been created at the same address.
If you need a more global uniqueness, for instance because you are dealing with communicating programs or data that is persistent, use GUIDs/UUIDs, such as those provided by Boost.Uuid.
You could create unique integers from some static counter, but beware of the pitfalls:
Make sure your increments are atomic.
Protect against copying, or write custom copy constructors and assignment operators.
Personally, my choice has been UUIDs whenever I can afford them, because they give me some peace of mind, not having to think about all the pitfalls.
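For the static counter option, a minimal sketch that addresses both pitfalls could look like the following. The class name Identified is made up, and the copy policy shown (a copy gets a fresh id, assignment keeps the target's id) is only one possible choice.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical base/mixin that hands out process-wide unique ids.
class Identified {
public:
    Identified() : id_(nextId()) {}

    // A copy is a distinct object, so it receives a fresh id.
    Identified(const Identified&) : id_(nextId()) {}

    // Assignment copies state, not identity: the target keeps its id.
    Identified& operator=(const Identified&) { return *this; }

    std::uint64_t id() const { return id_; }

private:
    static std::uint64_t nextId() {
        // Atomic increment, so concurrent constructors never share an id.
        static std::atomic<std::uint64_t> counter{0};
        return counter.fetch_add(1, std::memory_order_relaxed);
    }

    const std::uint64_t id_;
};
```

These ids are unique only within one program run; ids that must survive restarts or be shared between processes still call for UUIDs.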

If the objects need to be uniquely identified, you can generate the unique id in the constructor:
struct Obj
{
int _id;
Obj() { static int id = 0; _id = id++; }
};
You'll have to decide how you want to handle copies/assignments (same id - the above will work / different id's - you'll need a copy constructor and probably a static class member instead of the static local variable).

When I looked into this issue, I fairly quickly ended up at the Boost UUID library (universally unique identifier, http://www.boost.org/doc/libs/1_52_0/libs/uuid/). However, as my project grew, I switched over to Qt's GUID library (globally unique identifier, https://doc.qt.io/qt-5/quuid.html).
A lesson learned for me though was to start declaring your own UUID class and hide the implementation so that you can switch to whatever you find suitable later on.
I hope that helps.

If your object is a class, then you could have a static member variable which you initialize to 0. Then in the constructor you store this value into the class instance and increment the static variable:
class Indexed
{
public:
    Indexed() :
        m_myIndex( m_nextIndex++ )
    { }
    int getIndex() const
    { return m_myIndex; }
private:
    const int m_myIndex;
    static int m_nextIndex;
};
// The static member must be defined exactly once, in a single .cpp file.
int Indexed::m_nextIndex = 0;

If you need a unique id in a distributed environment, use boost::uuid.

Using the object address directly as the unique (for this run) identifier does not look like a bad idea. Why cast it to an integer? Just compare pointers with ==:
MyObject *obj1, *obj2;
...
if (obj1 == obj2) ...
This will not work, of course, if you need to write IDs to a database or the like, because the same pointer values can occur again in different runs. Also, do not overload the comparison operator (==).

How to create artificial nodes in QAbstractItemModel for QTreeView

my question is about Qt and its QAbstractItemModel.
I have a map of strings and doubles (std::map<stringclass, double>) which I would like to present in a Qt widget. While I could use QTableView for that, I would like to exploit the fact that the keys of the map are of form "abc.def.ghi" where there can be multiple strings that can start with "abc.def" and even more that start with "abc".
So I would like to setup a tree data model to present the items in a QTreeView like
(-) abc
|--(-)def
|--ghi 3.1415
|--jkl 42.0815
|--(+)pqr
|--(+)xyz
The keys of my std::map are the leaves of the tree, where all other nodes would be temporary and auxiliary constructs to support the folding for user convenience.
Unfortunately, the methods rowCount, index, columnCount, and data have const modifiers, so I cannot simply set up an auxiliary data structure for the headers inside my QAbstractItemModel subclass and change that data structure in there.
What would be the best practice for that? Should I set up another class layer between my std::map and the QAbstractItemModel, or is there a smarter way to do this?
Edit 1: The std::map can change while the QTreeView is shown and used, so the auxiliary nodes might be thrown away and reconstructed. My assumption is that the best way to handle this is to restructure the QAbstractItemModel, or should I simply throw that model away and assign a newly constructed one to the QTreeView? In that case I could set up all nodes within the constructor without being bothered by the const-ness of the methods, I guess.

I would parse the map and create a tree data structure based on it. Make sure you sync the model when you change the map.
If this sync step gets too complicated you might want to hold your data in a tree structure from the start and convert to a map when necessary.
Parsing the map on the fly in the model functions seems like a bad idea to me, you'd want these functions to be as fast as possible.
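The parsing step could be sketched roughly as below, in plain standard C++ without any Qt types. TreeNode and buildTree are hypothetical names; a QAbstractItemModel subclass could keep the resulting root as a member and only read from it in its const methods, rebuilding it whenever the map changes.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: inner nodes are the auxiliary "folders",
// leaves carry the value of one map entry.
struct TreeNode {
    std::string label;
    TreeNode* parent = nullptr;
    std::vector<std::unique_ptr<TreeNode>> children;
    bool isLeaf = false;
    double value = 0.0;

    // Linear search among the direct children; nullptr if absent.
    TreeNode* childNamed(const std::string& name) const {
        for (const auto& child : children) {
            if (child->label == name) return child.get();
        }
        return nullptr;
    }
};

// Splits every key at '.' and creates the missing nodes along the path.
std::unique_ptr<TreeNode> buildTree(const std::map<std::string, double>& values) {
    auto root = std::make_unique<TreeNode>();  // invisible root
    for (const auto& entry : values) {
        TreeNode* node = root.get();
        std::istringstream parts(entry.first);
        std::string part;
        while (std::getline(parts, part, '.')) {
            TreeNode* child = node->childNamed(part);
            if (child == nullptr) {
                auto owned = std::make_unique<TreeNode>();
                owned->label = part;
                owned->parent = node;
                child = owned.get();
                node->children.push_back(std::move(owned));
            }
            node = child;
        }
        node->isLeaf = true;  // the last path component holds the value
        node->value = entry.second;
    }
    return root;
}
```

Since the tree is built ahead of time, rowCount, index, parent and data only need to follow parent and children pointers, which keeps them fast and const-correct.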

I don't see how const-modifiers would really be an issue.
Which members of your QAbstractItemModel subclass would you want to modify when the rowCount, index, columnCount and data methods are called? You may very well store a reference to your map and compute everything from it. There is no need to modify the map itself to extract the needed information (as far as I can tell!).
EDIT after EDIT1 and comments :
If your map is bound to be modified, use it as your base structure in your own class.
If you can't keep a reference to your map because the model's lifetime might exceed the map's, use smart pointers to make sure it does not happen.
