Store pointers to objects in multiple containers - c++

For the sake of presenting my question, let's assume I have a set of pointers of the same type:
{p1, p2, ..., pn}
I would like to store them in multiple containers because I need different strategies to access them. Suppose I want to store them in two containers, a linked list and a hash table: the linked list gives me the order and the hash table gives me fast access. The problem is that if I remove a pointer from one container, I need to remember to remove it from the other one as well, which makes the code hard to maintain. So the question is: are there other patterns or data structures for managing a situation like this? Would smart pointers help here?

If I understand correctly, you want to link your containers so that removing an element from one removes it from all of them. I don't think this is directly possible. Possible solutions:
Redesign the whole object architecture so that a pointer is not stored in many containers.
Use the Boost Multi-index Containers Library to get all the features you want in one container.
Use a map key instead of a direct pointer to track objects, and keep the pointer itself in only one map.
Use std::weak_ptr so you can check whether an item has been deleted somewhere else, and convert it to a std::shared_ptr while it is in use (you need one container to hold the "master" std::shared_ptr that keeps the object alive when it is not in use).
Create a function/method/class for deleting objects that knows about all the containers; when all deletion happens in one place, you can't accidentally forget one.
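A minimal sketch of the std::weak_ptr option, assuming a hypothetical `Item` type and a hypothetical `Registry` class (neither is from the question): the hash map is the single owner through `std::shared_ptr`, while the ordered list only observes through `std::weak_ptr`. Erasing from the owner is enough; stale list entries expire and can be pruned lazily.

```cpp
#include <list>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical element type used only for this sketch.
struct Item {
    std::string name;
};

struct Registry {
    // "Master" container: the only owner of each Item.
    std::unordered_map<std::string, std::shared_ptr<Item>> byName;
    // Secondary container: keeps insertion order but owns nothing.
    std::list<std::weak_ptr<Item>> ordered;

    void add(const std::string& name) {
        auto item = std::make_shared<Item>(Item{name});
        byName[name] = item;
        ordered.push_back(item);  // weak_ptr built from the shared_ptr
    }

    // Removing from the owner destroys the Item once no shared_ptr remains;
    // the matching weak_ptr in `ordered` then reports expired().
    void remove(const std::string& name) { byName.erase(name); }

    // Drop expired observers, e.g. before traversing the ordered view.
    void pruneExpired() {
        ordered.remove_if([](const std::weak_ptr<Item>& w) { return w.expired(); });
    }
};
```

Code that walks `ordered` should call `lock()` on each entry and skip the ones that return an empty pointer, so it never touches a destroyed object even before pruning.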

Why don't you create your own class that contains both a std::list and a std::unordered_map, and provide access and removal functions so that you can traverse the elements in order with the list and look them up directly with the unordered_map? Deletion would then remove from both containers and insertion would insert into both. (A kind of wrapper class :P)
You could also consider using std::map with a comparison function that always keeps your data ordered the way you want; you can then also look up elements in O(log N) time.
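A sketch of that wrapper, under the assumption that the pointers themselves serve as keys (the class name `OrderedPtrSet` is hypothetical). Storing the list iterator as the map's value lets removal erase from both containers in O(1) on average, without searching the list.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>

// Keeps pointers in insertion order (std::list) and indexed for fast lookup
// (std::unordered_map). All changes go through this class, so the two
// containers cannot drift apart. It does not own the pointed-to objects.
template <class T>
class OrderedPtrSet {
public:
    // Returns false if ptr is already stored.
    bool insert(T* ptr) {
        if (index_.count(ptr)) return false;
        order_.push_back(ptr);
        index_.emplace(ptr, std::prev(order_.end()));
        return true;
    }
    // Removes ptr from both containers using the stored list iterator.
    bool erase(T* ptr) {
        auto it = index_.find(ptr);
        if (it == index_.end()) return false;
        order_.erase(it->second);
        index_.erase(it);
        return true;
    }
    bool contains(T* ptr) const { return index_.count(ptr) != 0; }
    std::size_t size() const { return order_.size(); }
    // Read-only traversal in insertion order.
    typename std::list<T*>::const_iterator begin() const { return order_.begin(); }
    typename std::list<T*>::const_iterator end() const { return order_.end(); }

private:
    std::list<T*> order_;
    std::unordered_map<T*, typename std::list<T*>::iterator> index_;
};
```

std::list iterators stay valid when other elements are inserted or erased, which is what makes storing them in the map safe.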

As usual, try to isolate this logic to make things easier to maintain: a small class with a safe public interface (sorry, I didn't compile this, it is just a sketch).
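#include <list>
#include <map>
#include <utility>

template<class Id, class Ptr>
class Store
{
public:
    void add(Id id, Ptr ptr)
    {
        // only append to the list if the id was new, so both containers stay in sync
        if (m_ptrById.insert(std::make_pair(id, ptr)).second)
            m_ptrs.push_back(ptr);
    }
    void remove(Ptr ptr)
    {
        // remove in sync from both containers
        m_ptrs.remove(ptr);
        for (auto it = m_ptrById.begin(); it != m_ptrById.end(); ) {
            if (it->second == ptr) it = m_ptrById.erase(it);
            else ++it;
        }
    }
private:
    std::list<Ptr> m_ptrs;
    std::map<Id, Ptr> m_ptrById;
};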
Then use Store for keeping your pointers in sync.

If I understand your problem correctly, you are less concerned with memory management (the new/delete issue) and more concerned with the actual bookkeeping of which elements are valid.
So I was thinking of wrapping each pointer with a "reference counter":
template< class Point >
class BookKeeping {
public:
    enum { LIST_REF = 0x01,
           HASH_REF = 0x02 };
    BookKeeping( const Point& p ): m_p(p), m_refCount( LIST_REF | HASH_REF ) {} // assume object created in both containers
    bool isValid() const { return m_refCount == (LIST_REF | HASH_REF); } // not "freed" from any container
    void remove( unsigned int from ) { m_refCount &= ~from; } // clear the bit(s) of the container(s) in 'from'
private:
    Point m_p;
    unsigned int m_refCount;
};

See the answer (the only one, by now) to this similar question. In that case a deque is proposed instead of a list, since the OP only wanted to insert/remove at the ends of the sequence.
Anyway, you might prefer to use the Boost Multi-index Containers Library.

Related

Overloading push_back() in vector to allow non-duplicate elements

Can we overload the push_back() method in std::vector so that it rejects duplicate elements? I know std::set and std::unordered_set avoid duplicate elements, but std::set sorts the elements and std::unordered_set stores them in no particular order. I need to retrieve the elements in the order they were inserted, while ensuring that duplicates are not inserted.
Edit: There's a possible duplicate of this question here. The best answer to that duplicate proposes keeping an auxiliary data structure and another custom method, "add". That doesn't look good to me because the users inserting data into the std::vector rarely refer to the documentation (where I would describe it) for custom functions. If there's no efficient way, though, this can be a last resort.
Many people advise against it, but it seems there's some kind of urban legend going around that doing so will cause the universe to undergo vacuum decay and reality as we know it will dissolve.
You can publicly inherit from std::vector. But you have to think about what you can do with that.
If you inherit from vector, it is highly recommended that you don't add any data members to it, since this can cause object slicing (google "c++ object slicing"). You also need to keep in mind that vector has no virtual functions. That means you cannot override member functions; you can only shadow them, so it's not guaranteed that it will always be your push_back() function that gets called. The original will get called if you pass an object of your class to something that takes a reference to a vector, for example.
So in the end, you'd need to add a push_back_unique() function instead. But that in turn means it can be served by a simple free function, so inheriting from vector isn't needed. This of course means there's never a guarantee that the elements in the vector will be unique: other code might use push_back() somewhere instead.
Inheriting vector makes sense if you want to add completely new convenience functions that don't impose or lift any restrictions that vector has. If you want something that looks like a vector but really isn't (because it has different behavior and/or restrictions), you should implement your own type that delegates the container functionality to vector by either inheriting privately from it, or by having it as a private data member, and then replicate the vector API through public wrapper functions.
But this is very tedious to implement. Usually, you don't really need all the API from vector. So I'd say just write a smaller class around vector that only provides the functionality you need. And that functionality sounds like it's going to be pretty much read-only, since allowing write access to the elements allows for setting an element to the same value as another, breaking the container's uniqueness. So you could do something like:
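#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

template<typename T>
class UniqueVector
{
public:
    // Appends elem only if no equal element is already stored (linear scan).
    // Taking the parameter by value accepts both lvalues and rvalues.
    void push_back(T elem)
    {
        if (std::find(vec_.begin(), vec_.end(), elem) == vec_.end()) {
            vec_.push_back(std::move(elem));
        }
    }
    // Read-only element access, so callers cannot create duplicates.
    const T& operator[](std::size_t index) const
    {
        return vec_[index];
    }
    auto begin() const
    {
        return vec_.cbegin();
    }
    auto end() const
    {
        return vec_.cend();
    }
private:
    std::vector<T> vec_;
};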
If you still want to allow write access to individual elements, then you can provide non-const functions that check if the value that is passed is already in the vector. Like:
// Overwrites the element at index only if value is not already in the vector.
void assign_if_unique(std::size_t index, T value)
{
    if (std::find(vec_.begin(), vec_.end(), value) == vec_.end()) {
        vec_[index] = std::move(value);
    }
}
This is a minimal example; you should obviously add the functions you actually want, like size(), empty(), and whatever else you need.
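The std::find check above makes every insertion O(n). If that becomes a bottleneck, the auxiliary-structure idea mentioned in the question can live inside the wrapper, so users still only see push_back. A sketch assuming T is hashable with std::hash and equality-comparable (`HashedUniqueVector` is a hypothetical name):

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Variant of UniqueVector that keeps an auxiliary std::unordered_set so the
// duplicate check is O(1) on average instead of a linear std::find.
template <typename T>
class HashedUniqueVector {
public:
    // Returns true if elem was appended, false if it was a duplicate.
    bool push_back(const T& elem) {
        if (!seen_.insert(elem).second) return false;  // already present
        vec_.push_back(elem);
        return true;
    }
    const T& operator[](std::size_t index) const { return vec_[index]; }
    std::size_t size() const { return vec_.size(); }
    bool empty() const { return vec_.empty(); }
    typename std::vector<T>::const_iterator begin() const { return vec_.cbegin(); }
    typename std::vector<T>::const_iterator end() const { return vec_.cend(); }

private:
    std::vector<T> vec_;          // insertion order
    std::unordered_set<T> seen_;  // fast membership test
};
```

The price is storing every element twice, which is usually acceptable for small value types such as int or short strings.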
You should first define a free function (see note 1) to implement your feature:
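#include <algorithm>
#include <vector>
template<class T>
std::vector<T>& push_back_unique(std::vector<T>& dest, T const& src)
{   // append src only if no equal element is already present (linear scan)
    if (std::find(dest.begin(), dest.end(), src) == dest.end()) dest.push_back(src);
    return dest;
}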
If you use this a lot, and if it makes sense for your program, you might want to define an operator to do so:
template<class T>
std::vector<T>& operator<<(std::vector<T>& dest, T const& src)
{ return push_back_unique(dest, src); }
This allows:
std::vector<int> data;
data << 5 << 8 << 13 << 5 << 21;
for (auto n : data) std::cout << n << " "; // prints 5 8 13 21
1) This is because inheriting from standard containers is often bad practice and brings pitfalls.

Optimize search in std::deque

I'm writing a program that has different kinds of objects, all of which derive from a base class with virtual methods. I'm doing this to take advantage of polymorphism, which lets me call a certain method on all the objects from a manager class without checking the specific kind of each object.
The point is that the different kinds of objects sometimes need to get a list of the objects of a certain type.
At that point my manager class loops through all the objects and checks the type of each one. It builds a list and returns it like this:
std::list<std::shared_ptr<Object>> ObjectManager::GetObjectsOfType(std::string type)
{
std::list<std::shared_ptr<Object>> objectsOfType;
for (int i = 0; i < m_objects.size(); ++i)
{
if (m_objects[i]->GetType() == type)
{
objectsOfType.push_back(m_objects[i]);
}
}
return objectsOfType;
}
m_objects is a deque. I know that iterating over a data structure is normally expensive, but I'd like to know whether it's possible to polish this a bit, because right now this function takes a third of the program's total running time.
My question is: is there any design pattern or function that I'm not taking into account that would reduce the cost of this operation?
In the code as given, there is just a single optimization that can be done locally:
for (auto const& obj : m_objects)
{
if (obj->GetType() == type)
{
objectsOfType.push_back(obj);
}
}
The rationale is that operator[] is generally not the most efficient way to access a deque. Having said that, I don't expect a major improvement. Your locality of reference is very poor: You're essentially looking at two dereferences (shared_ptr and string).
A logical approach would be to make m_objects a std::multimap keyed by type.
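A sketch of the multimap idea, assuming an object's type never changes after it is added (otherwise it would have to be re-inserted under its new key). `Object` here is a minimal stand-in for the question's base class, and `Add`/`ForEach` are hypothetical helpers. `equal_range` visits only the matching objects instead of scanning everything.

```cpp
#include <list>
#include <map>
#include <memory>
#include <string>

// Minimal stand-in for the question's polymorphic base class.
class Object {
public:
    virtual ~Object() = default;
    virtual std::string GetType() const = 0;
};

class ObjectManager {
public:
    void Add(const std::shared_ptr<Object>& obj) {
        m_objects.emplace(obj->GetType(), obj);  // type computed once, on insertion
    }
    // Only objects of the requested type are visited: O(log n + k).
    std::list<std::shared_ptr<Object>> GetObjectsOfType(const std::string& type) const {
        std::list<std::shared_ptr<Object>> result;
        auto range = m_objects.equal_range(type);
        for (auto it = range.first; it != range.second; ++it)
            result.push_back(it->second);
        return result;
    }
    // Polymorphic "call this on everything" loops still work over the whole map.
    template <class F>
    void ForEach(F f) const {
        for (const auto& entry : m_objects) f(*entry.second);
    }

private:
    std::multimap<std::string, std::shared_ptr<Object>> m_objects;
};
```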
Some things you can do to speed this up:
Store the type in the base class; this removes a somewhat expensive virtual call.
If the type is a string or similar, change it to a simple type like an enum or int.
A vector is more efficient to traverse than a deque.
If you stay with a deque, use iterators or a range-based for loop to avoid the random-access lookups (which are more expensive in a deque).
A range-based loop looks like this:
for (auto const& obj : m_objects)
{
if (obj->GetType() == type)
{
objectsOfType.push_back(obj);
}
}
Update: Also I would recommend against using a std::list (unless for some reason you have to) as it is not really performing well in many cases - again std::vector springs to the rescue !
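Combining the first three suggestions could look like the sketch below. `ObjectType` and its values are hypothetical placeholders for whatever kinds the program really has, and the free function `GetObjectsOfType` stands in for the manager's member function. The type is read from a plain member of the base class, compared as an integer, and the objects are stored contiguously in a std::vector.

```cpp
#include <memory>
#include <vector>

// Hypothetical type tag; the real set of kinds comes from your program.
enum class ObjectType { Enemy, Tree, Rock };

class Object {
public:
    explicit Object(ObjectType type) : m_type(type) {}
    virtual ~Object() = default;
    // Non-virtual: just reads a member, no virtual dispatch and no string compare.
    ObjectType GetType() const { return m_type; }

private:
    ObjectType m_type;
};

// Range-based loop over contiguous storage with a cheap comparison per element.
std::vector<std::shared_ptr<Object>> GetObjectsOfType(
    const std::vector<std::shared_ptr<Object>>& objects, ObjectType type)
{
    std::vector<std::shared_ptr<Object>> result;
    for (const auto& obj : objects)
        if (obj->GetType() == type) result.push_back(obj);
    return result;
}
```

Returning a std::vector instead of a std::list also avoids one heap allocation per matching element.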

Is it a bad practice to return a std container from an interface class?

I have run into the following question.
I need to design an interface class that looks like this:
struct IIDs
{
    // ...
    virtual const std::set<int>& getAllIDs() const = 0; //!< I want the collection of int to be sorted.
};
void foo()
{
    const std::set<int>& ids = pIIDs->getAllIDs();
    for (std::set<int>::const_iterator it = ids.begin(); it != ids.end(); ++it) {
        // do something
    }
}
I think that returning a standard container is a bit inappropriate, because it forces the implementation to use a std::set to store the IDs. But if I write it as follows:
struct IIDs
{
    // ...
    virtual int count() const = 0;
    virtual int at(int index) const = 0; //!< the items should be sorted
};
void foo()
{
    for (int i = 0; i < pIIDs->count(); ++i) {
        int val = pIIDs->at(i);
        // ...
    }
}
I found that none of the standard containers can meet these requirements:
index lookup must have complexity O(log n) or better;
insertion must have complexity O(log n) or better;
the items must be sorted.
So it seems I have to use the first version. Is that acceptable?
STL containers and template code in general should never be used across a DLL boundary.
The thing you have to keep in mind when returning complex types like STL containers is that if your call ever crosses the boundary between two different DLLs (or a DLL and an application) running different memory managers your application will most likely crash spectacularly.
The templates that make up the STL code will be executed within the implementation DLL, creating all the memory used by the container there. Later when it leaves scope in your calling code, your own memory manager will attempt to deallocate memory it doesn't own, resulting in a crash.
If you know your code won't cross DLL boundaries, and will only ever be called in the context of a single memory manager, then you're fine as far as memory management is concerned.
However, even in cases where you're only returning references, such as your example above, where the lifetime of the container would be entirely managed by the interface implementation code, unless you know that the exact same version of the STL and the exact same compiler and linker settings were used for compiling the implementation as the caller, you're asking for trouble.
The problem I see is that you are returning the collection by const reference. That is fine if the collection is a data member and you return a reference to it, but if you return a reference to a local variable of the function, you get invalid memory access.
If it's a member variable, it's better to provide access to begin and end iterators. If it's a local variable, you can return it by value (in C++11 the copy is elided or replaced by a move, so nothing is copied). If it crosses a DLL boundary, try by all means not to use any C++ types, only C types.
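For the DLL case, one common C-only shape is a two-call pattern: the caller owns the buffer, so no allocator is shared across modules. A sketch; `IdStore` and `idstore_get_all` are hypothetical names, and a real library would also export create/add/destroy functions for the opaque handle.

```cpp
#include <set>

// Public, C-compatible part of the API: only plain C types cross the boundary.
extern "C" {
    typedef struct IdStore IdStore;  // opaque handle
    // Copies up to `capacity` sorted IDs into `out` and returns the total count.
    // Call once with capacity 0 to learn the size, then again with a buffer.
    int idstore_get_all(const IdStore* store, int* out, int capacity);
}

// Implementation side (inside the DLL): the std::set never leaves this module.
struct IdStore {
    std::set<int> ids;
};

extern "C" int idstore_get_all(const IdStore* store, int* out, int capacity) {
    int i = 0;
    for (int id : store->ids) {
        if (i < capacity) out[i] = id;  // std::set iterates in ascending order
        ++i;
    }
    return i;
}
```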
In terms of design, and for good generic code, prefer the STL way: return iterators, leaving the container type an implementation detail of IIDs, and hide your types behind typedefs:
struct IIDs
{
    typedef std::set<int> Container;
    typedef Container::const_iterator IDIterator;
    // We only expose iterators to the data
    virtual IDIterator begin() const = 0; //!< I want the collection of int to be sorted.
    virtual IDIterator end() const = 0;
    // ...
};
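A self-contained sketch of how an implementation and a client might use such an iterator-only interface; `SetIDs` and `sumIDs` are hypothetical names. Because IIDs has begin()/end() members, client code can use a range-based for loop directly on the interface.

```cpp
#include <set>

struct IIDs {
    typedef std::set<int> Container;
    typedef Container::const_iterator IDIterator;
    virtual ~IIDs() = default;
    virtual IDIterator begin() const = 0;  // IDs in ascending order
    virtual IDIterator end() const = 0;
};

// One possible implementation.
class SetIDs : public IIDs {
public:
    void add(int id) { ids_.insert(id); }  // O(log n), duplicates ignored
    IDIterator begin() const override { return ids_.begin(); }
    IDIterator end() const override { return ids_.end(); }

private:
    Container ids_;
};

// Client code depends only on the interface.
int sumIDs(const IIDs& ids) {
    int total = 0;
    for (int id : ids) total += id;
    return total;
}
```

Note that the typedef hides the container only syntactically: the iterator type is still std::set's, so switching containers means changing the typedef and recompiling the clients.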
There are various approaches:
if you want to minimise the coupling of client code on the IIDs implementation and ensure iteration is completed while the IIDs object still exists, then use a visitor pattern: the calling code just has to supply some function to be called for each of the member elements in turn and is not responsible for the iteration itself
Visitor example:
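struct IIDs
{
    // Calls t(id) for every ID in ascending order; t can be any callable,
    // including a temporary lambda.
    template <typename T>
    void visit(T&& t) const
    {
        for (int i : ids_) t(i);
    }
    // ...
private:
    std::set<int> ids_;
};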
if you want to give the caller more freedom to mix other code in with the container traversal, and have multiple concurrent independent traversals, then provide iterators, but be aware that the client code could keep an iterator hanging around longer than the IIDs object itself - you may or may not want to handle that scenario gracefully
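One caveat about the visitor example: a member template cannot be virtual, so it cannot be part of an abstract interface as written. A sketch of an alternative that keeps IIDs abstract by taking the visitor as std::function, at the cost of one type-erased call per element (`SetIDs` is a hypothetical implementation name):

```cpp
#include <functional>
#include <set>

struct IIDs {
    virtual ~IIDs() = default;
    // Calls f once for each ID, in ascending order.
    virtual void visit(const std::function<void(int)>& f) const = 0;
};

class SetIDs : public IIDs {
public:
    void add(int id) { ids_.insert(id); }
    void visit(const std::function<void(int)>& f) const override {
        for (int i : ids_) f(i);
    }

private:
    std::set<int> ids_;
};
```

Callers pass a lambda, e.g. `ids.visit([&](int id) { /* use id */ });`, and never see which container the implementation uses.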

C++ - Proper way of using std::vector & related memory management

Hi, I'd like to ask about something that puzzles me.
I've a class like this:
class A {
private:
    std::vector<Object*>* my_array_;
    // ...
public:
    std::vector<Object*>& my_array(); // getter
    void my_array(const std::vector<Object*>& other_array); // setter
};
Based on your experience, what is the correct way to implement the setter and getter as safely as possible?
The first solution came to my mind is the following.
First, when I do implement the setter, I should:
A) check that the input does not refer to the data structure I already hold;
B) release the memory of ALL objects pointed by my_array_
C) copy each object pointed by other_array and add its copy to my_array_
D) finally end the function.
The getter may produce a copy of the inner array, just in case.
The questions are many:
- is this strategy overkill?
- does it really avoid problems?
- somebody really uses it or are there better approaches?
I've tried to look for the answer to this question, but found nothing so particularly focused on this problem.
Using smart pointers is a very good answer; thank you both. It seems I can't mark more than one answer as useful, so I apologize in advance. :-)
However, your answers have raised a new doubt.
When I use a vector of unique_ptr to objects, I will have to define a deep-copy constructor. Is there a better way than using an iterator to copy each element of the vector, given that we are now using smart pointers?
I'd normally recommend not using a pointer to a vector as a member, but from your question it seems like it's shared between multiple instances.
That said, I'd go with:
class A {
private:
    std::shared_ptr<std::vector<std::unique_ptr<Object> > > my_array_;
public:
    std::shared_ptr<std::vector<std::unique_ptr<Object> > > my_array(); // getter
    void my_array(std::shared_ptr<std::vector<std::unique_ptr<Object> > > other_array); // setter
};
No checks necessary, no memory management issues.
If the inner Objects are also shared, use a std::shared_ptr instead of the std::unique_ptr.
I think you are overcomplicating things having a pointer to std::vector as data member; remember that C++ is not Java (C++ is more "value" based than "reference" based).
Unless there is a strong reason to use a pointer to a std::vector as data member, I'd just use a simple std::vector stored "by value".
Now, regarding the Object* pointers in the vector, you should ask yourself: are those observing pointers or are those owning pointers?
If the vector just observes the Objects (and they are owned by someone else, like an object pool allocator or something), you can use raw pointers (i.e. simple Object*).
But if the vector has some ownership semantics on the Objects, you should use shared_ptr or unique_ptr smart pointers. If the vector is the only owner of Object instances, use unique_ptr; else, use shared_ptr (which uses a reference counting mechanism to manage object lifetimes).
#include <memory>
#include <utility>
#include <vector>

class A
{
public:
    // A vector which shares ownership of the pointed-to Objects
    typedef std::vector<std::shared_ptr<Object>> ObjectArray;
    // Getter
    const ObjectArray& MyArray() const
    {
        return m_myArray;
    }
    // Setter
    // (new C++11 move semantics pattern: pass by value and move from the value)
    void MyArray(ObjectArray otherArray)
    {
        m_myArray = std::move(otherArray);
    }
private:
    ObjectArray m_myArray;
};
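Regarding the follow-up about deep-copying a vector of unique_ptr: there is no standard shortcut, since unique_ptr is deliberately not copyable, so the copy constructor has to clone each pointee. A sketch assuming `Object` is copy-constructible and not used polymorphically (a polymorphic hierarchy would need a virtual clone() instead of make_unique<Object>); the `items()` accessor is hypothetical. Copy-and-swap then gives the assignment operator for free.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Stand-in element type; assumed copy-constructible.
struct Object {
    int value = 0;
};

class A {
public:
    A() = default;
    // Deep copy: clone every pointee into a fresh unique_ptr (null stays null).
    A(const A& other) {
        m_array.reserve(other.m_array.size());
        for (const auto& p : other.m_array)
            m_array.push_back(p ? std::make_unique<Object>(*p) : std::unique_ptr<Object>());
    }
    A(A&&) = default;
    // Copy-and-swap: `other` is already a copy (or a moved-from temporary).
    A& operator=(A other) {
        std::swap(m_array, other.m_array);
        return *this;
    }
    std::vector<std::unique_ptr<Object>>& items() { return m_array; }

private:
    std::vector<std::unique_ptr<Object>> m_array;
};
```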

Item in multiple lists

So I have some legacy code that I would love to modernize, but I fear that, given the way things are designed, that is not an option. The core issue is that a node is often in more than one list at a time. Something like this:
struct T {
T *next_1;
T *prev_1;
T *next_2;
T *prev_2;
int value;
};
This allows the code to allocate a single object of type T and insert it into two doubly linked lists, which is nice and efficient.
Obviously I could just have two std::list<T*>s and insert the object into both, but one thing would be far less efficient: removal.
Often the code needs to "destroy" an object of type T and this includes removing the element from all lists. This is nice because given a T* the code can remove that object from all lists it exists in. With something like a std::list I would need to search for the object to get an iterator, then remove that (I can't just pass around an iterator because it is in several lists).
Is there a nice c++-ish solution to this, or is the manually rolled way the best way? I have a feeling the manually rolled way is the answer, but I figured I'd ask.
As another possible solution, look at Boost Intrusive, which has an alternative list class with a lot of properties that may make it useful for your problem.
In this case, I think it'd look something like this:
#include <boost/intrusive/list.hpp>
using namespace boost::intrusive;
struct tag1; struct tag2;
typedef list_base_hook< tag<tag1> > base1;
typedef list_base_hook< tag<tag2> > base2;
class T: public base1, public base2
{
public:
    int value;
};
list<T, base_hook<base1> > list1;
list<T, base_hook<base2> > list2;
// constant time to get an iterator to a T item that is linked into both lists:
auto where_in_list1 = list1.iterator_to(item);
auto where_in_list2 = list2.iterator_to(item);
// once you have iterators, you can remove in constant time, etc.
Instead of managing your own next/previous pointers, you could indeed use an std::list. To solve the performance of remove problem, you could store an iterator to the object itself (one member for each std::list the element can be stored in).
You can extend this to store a vector or array of iterators in the class (in case you don't know the number of lists the element is stored in).
I think the proper answer depends on how performance-critical this application is. Is it in an inner loop that could potentially cost the program a user-perceivable runtime difference?
There is a way to create this sort of functionality by creating your own classes derived from some of the STL containers, but it might not even be worth it to you. At the risk of sounding tiresome, I think this might be an example of premature optimization.
The question to answer is why this C struct exists in the first place. You can't re-implement the functionality in C++ until you know what that functionality is. Some questions to help you answer that:
Why lists? Does the data need to be in sequence, i.e., in order? Does the order mean something? Does the application require ordered traversal?
Why two containers? Does membership in a container indicate some property of the element?
Why a doubly linked list specifically? Is O(1) insertion and deletion important? Is reverse iteration important?
The answer to some or all of these may be, "no real reason, that's just how they implemented it". If so, you can replace that intrusive C-pointer mess with a non-intrusive C++ container solution, possibly containing shared_ptrs rather than ptrs.
What I'm getting at is, you may not need to re-implement anything. You may be able to discard the entire business, and store the values in proper C++ containers.
How's this?
#include <list>

struct T {
    std::list<T*>::iterator entry1, entry2;
    int value;
};
std::list<T*> list1, list2;
// init a T* item:
T* item = new T;
item->entry1 = list1.end();
item->entry2 = list2.end();
// add a T* item to list 1 (here at the end; any position in list1 works):
item->entry1 = list1.insert(list1.end(), item);
// remove a T* item from list1
if (item->entry1 != list1.end()) {
    list1.erase(item->entry1); // erase by iterator is O(1)
    item->entry1 = list1.end();
}
// code for list2 management is similar
You could make T a class and use constructors and member functions to do most of this for you. If you have a variable number of lists, you can use a vector of iterators, std::vector<std::list<T*>::iterator>, to track the item's position in each list.
Note that if you use push_back or push_front to add to the list, you need to do item->entry1 = list1.end(); item->entry1--; or item->entry1 = list1.begin(); respectively to get the iterator pointed in the right place.
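A sketch of what "make T a class" could look like for two lists, assuming both lists outlive every node (`Node`, `addTo1`, `removeFrom1` and friends are hypothetical names). Each node remembers its own position in each list, and the destructor unlinks it from both in O(1), so callers cannot forget one of them.

```cpp
#include <list>

class Node {
public:
    Node(int v, std::list<Node*>& l1, std::list<Node*>& l2)
        : value(v), list1_(l1), list2_(l2), entry1_(l1.end()), entry2_(l2.end()) {}
    ~Node() { removeFrom1(); removeFrom2(); }
    Node(const Node&) = delete;             // the stored iterators point at this exact object
    Node& operator=(const Node&) = delete;

    void addTo1() { if (entry1_ == list1_.end()) entry1_ = list1_.insert(list1_.end(), this); }
    void addTo2() { if (entry2_ == list2_.end()) entry2_ = list2_.insert(list2_.end(), this); }
    void removeFrom1() { if (entry1_ != list1_.end()) { list1_.erase(entry1_); entry1_ = list1_.end(); } }
    void removeFrom2() { if (entry2_ != list2_.end()) { list2_.erase(entry2_); entry2_ = list2_.end(); } }

    int value;

private:
    std::list<Node*>& list1_;
    std::list<Node*>& list2_;
    std::list<Node*>::iterator entry1_, entry2_;  // end() means "not in that list"
};
```

Using `end()` as the "not linked" marker works because a std::list's end iterator is not invalidated by insertions or erasures.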
It sounds like you're talking about something that could be addressed by applying graph theory. As such the Boost Graph Library might offer some solutions.
list::remove is what you're after. It'll remove any and all objects in the list with the same value as what you passed into it.
So:
list<T> listOne, listTwo;
// Things get added to the lists.
T thingToRemove;
listOne.remove(thingToRemove);
listTwo.remove(thingToRemove);
I'd also suggest converting your list node into a class; that way C++ will take care of memory for you.
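class MyThing {
public:
    int value;
    // Any other values associated with T
    // list::remove compares elements with operator==
    bool operator==(const MyThing& other) const { return value == other.value; }
};
list<MyThing> listOne, listTwo; // can add and remove MyThing objects w/o worrying about destroying anything.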
You might even encapsulate the two lists into their own class, with add/remove methods for them. Then you only have to call one method when you want to remove an object.
class TwoLists {
private:
    list<MyThing> listOne, listTwo;
    // ...
public:
    void remove(const MyThing& thing) {
        listOne.remove(thing);
        listTwo.remove(thing);
    }
};