Optimize search in std::deque

I'm writing a program that has several kinds of objects, all of them children of an abstract base class. I'm doing this for the advantages of polymorphism, which let a manager class call a given method on all the objects without checking which specific kind each one is.
The point is that the different kinds of objects sometimes need to get a list of the objects of a certain type.
At that point my manager class loops through all the objects and checks the type of each one. It builds a list and returns it like this:
std::list<std::shared_ptr<Object>> ObjectManager::GetObjectsOfType(std::string type)
{
std::list<std::shared_ptr<Object>> objectsOfType;
for (int i = 0; i < m_objects.size(); ++i)
{
if (m_objects[i]->GetType() == type)
{
objectsOfType.push_back(m_objects[i]);
}
}
return objectsOfType;
}
m_objects is a deque. I know that iterating over a data structure is normally expensive, but I want to know whether it is possible to polish it a little, because right now this function takes a third of the total running time of the program.
My question is: is there any design pattern or function that I'm not taking into account that would reduce the cost of this operation in my program?

In the code as given, there is just a single optimization that can be done locally:
for (auto const& obj : m_objects)
{
if (obj->GetType() == type)
{
objectsOfType.push_back(obj);
}
}
The rationale is that operator[] is generally not the most efficient way to access a deque. Having said that, I don't expect a major improvement. Your locality of reference is very poor: You're essentially looking at two dereferences (shared_ptr and string).
A logical approach would be to make m_objects a std::multimap keyed by type.
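A minimal sketch of the multimap idea, assuming a simplified Object class and an Add method that are not part of the question's code. The objects are indexed by type when they are inserted, so a query only touches the matching entries:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's Object base class.
struct Object {
    explicit Object(std::string type) : m_type(std::move(type)) {}
    virtual ~Object() = default;
    const std::string& GetType() const { return m_type; }
private:
    std::string m_type;
};

// Sketch of a manager that indexes objects by type at insertion time.
class ObjectManager {
public:
    void Add(std::shared_ptr<Object> obj) {
        std::string key = obj->GetType();
        m_objects.emplace(std::move(key), std::move(obj));
    }

    // Visits only the objects stored under `type`; no full scan is needed.
    std::vector<std::shared_ptr<Object>> GetObjectsOfType(const std::string& type) const {
        std::vector<std::shared_ptr<Object>> result;
        auto range = m_objects.equal_range(type);
        for (auto it = range.first; it != range.second; ++it)
            result.push_back(it->second);
        return result;
    }
private:
    std::multimap<std::string, std::shared_ptr<Object>> m_objects;
};
```

The trade-off is that iterating over all objects now happens in key order rather than insertion order, which may or may not matter for the rest of the manager.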

Some things you can do to speed this up:
Store the type in the base class; this removes a somewhat expensive virtual call.
If the type is a string or similar, change it to a simple type like an enum or int.
A vector is more efficient to traverse than a deque.
If staying with a deque, use iterators or a range-based for loop to avoid the random lookups (which are more expensive in a deque).
The range-based loop looks like this:
for (auto const& obj : m_objects)
{
if (obj->GetType() == type)
{
objectsOfType.push_back(obj);
}
}
Update: I would also recommend against using a std::list (unless for some reason you have to), as it performs poorly in many cases. Again, std::vector comes to the rescue!
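Put together, the suggestions above might look like the following sketch. The ObjectType enumerators and the free-function form of GetObjectsOfType are assumptions made for illustration; the real program would use its own object kinds and keep the function in the manager:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical type tags; the real program would list its own object kinds.
enum class ObjectType { Player, Enemy, Wall };

class Object {
public:
    explicit Object(ObjectType type) : m_type(type) {}
    virtual ~Object() = default;
    // Non-virtual: the tag lives in the base class, so no virtual dispatch.
    ObjectType GetType() const { return m_type; }
private:
    ObjectType m_type;
};

std::vector<std::shared_ptr<Object>> GetObjectsOfType(
    const std::vector<std::shared_ptr<Object>>& objects, ObjectType type)
{
    std::vector<std::shared_ptr<Object>> result;
    for (const auto& obj : objects)
        if (obj->GetType() == type)  // enum comparison instead of string comparison
            result.push_back(obj);
    return result;
}
```

Whether this is enough depends on how often the query runs; it still scans every object, only more cheaply per element.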

Related

C++ class with container of pointers to internal data members: copying/assignment

Suppose I have a class Widget with a container data member d_members, and another container data member d_special_members containing pointers to distinguished elements of d_members. The special members are determined in the constructor:
#include <vector>
struct Widget
{
std::vector<int> d_members;
std::vector<int*> d_special_members;
Widget(std::vector<int> members) : d_members(members)
{
for (auto& member : d_members)
if (member % 2 == 0)
d_special_members.push_back(&member);
}
};
What is the best way to implement the copy constructor and operator=() for such a class?
The d_special_members in the copy should point to the copy of d_members.
Is it necessary to repeat the work that was done in the constructor? I hope this can be avoided.
I would probably like to use the copy-and-swap idiom.
I guess one could use indices instead of pointers, but in my actual use case d_members has a type like std::vector< std::pair<int, int> > (and d_special_members is still just std::vector<int*>, so it refers to elements of pairs), so this would not be very convenient.
Only the existing contents of d_members (as given at construction time) are modified by the class; there is never any reallocation (which would invalidate the pointers).
It should be possible to construct Widget objects with d_members of arbitrary size at runtime.
Note that the default assignment/copy just copies the pointers:
#include <iostream>
using namespace std;
int main()
{
Widget w1({ 1, 2, 3, 4, 5 });
cout << "First special member of w1: " << *w1.d_special_members[0] << "\n";
Widget w2 = w1;
*w2.d_special_members[0] = 3;
cout << "First special member of w1: " << *w1.d_special_members[0] << "\n";
}
yields
First special member of w1: 2
First special member of w1: 3

What you are asking for is an easy way to maintain associations as data is moved to new memory locations. Pointers are far from ideal for this, as you have discovered. What you should be looking for is something relative, like a pointer-to-member. That doesn't quite apply in this case, so I would go with the closest alternative I see: store indices into your sub-structures. So store an index into the vector and a flag indicating the first or second element of the pair (and so on, if your structure gets even more complex).
The other alternative I see is to traverse the data in the old object to figure out which element a given special pointer points to -- essentially computing the indices on the fly -- then find the corresponding element in the new object and take its address. (Maybe you could use a calculation to speed this up, but I'm not sure that would be portable.) If there is a lot of lookup and not much copying, this might be better for overall performance. However, I would rather maintain the code that stores indices.
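As a rough sketch of the index-plus-flag approach for the vector-of-pairs case: the Part enum, the SpecialRef struct and the special() accessor below are names invented for this illustration, and the "even values are special" rule simply mirrors the question's example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Widget {
    // Which half of the pair a special member refers to.
    enum class Part { First, Second };
    struct SpecialRef { std::size_t index; Part part; };

    std::vector<std::pair<int, int>> d_members;
    std::vector<SpecialRef> d_special_members;

    explicit Widget(std::vector<std::pair<int, int>> members)
        : d_members(std::move(members))
    {
        // Assumed rule, mirroring the question: even values are "special".
        for (std::size_t i = 0; i != d_members.size(); ++i) {
            if (d_members[i].first % 2 == 0)  d_special_members.push_back({i, Part::First});
            if (d_members[i].second % 2 == 0) d_special_members.push_back({i, Part::Second});
        }
    }

    // Resolves a stored reference against this object's own data.
    int& special(std::size_t n) {
        const SpecialRef& ref = d_special_members[n];
        auto& p = d_members[ref.index];
        return ref.part == Part::First ? p.first : p.second;
    }
};
```

Because nothing in the class stores an address, the compiler-generated copy constructor and assignment operator already do the right thing.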

The best way is to use indices. Honestly. It makes moves and copies just work; this is a very useful property because it's so easy to get silently wrong behavior with hand written copies when you add members. A private member function that converts an index into a reference/pointer does not seem very onerous.
That said, there may still be similar situations where indices aren't such a good option. If you have, for example, an unordered_map instead of a vector, you could of course still store the keys rather than pointers to the values, but then you are going through an expensive hash.
If you really insist on using pointers rather than indices, I'd probably do this:
struct Widget
{
std::vector<int> d_members;
std::vector<int*> d_special_members;
Widget(std::vector<int> members) : d_members(members)
{
for (auto& member : d_members)
if (member % 2 == 0)
d_special_members.push_back(&member);
}
// d_members is declared first, so it is already copied when new_special runs
Widget(const Widget& other)
: d_members(other.d_members)
, d_special_members(new_special(other))
{}
Widget& operator=(const Widget& other) {
d_members = other.d_members;
d_special_members = new_special(other);
return *this;
}
private:
std::vector<int*> new_special(const Widget& other) {
std::vector<int*> v;
v.reserve(other.d_special_members.size());
std::size_t special_index = 0;
// stop once every special pointer has been matched, to stay in bounds
for (std::size_t i = 0; i != d_members.size()
&& special_index != other.d_special_members.size(); ++i) {
if (&other.d_members[i] == other.d_special_members[special_index]) {
v.push_back(&d_members[i]);
++special_index;
}
}
return v;
}
};
My implementation runs in linear time and uses no extra space, but exploits the fact (based on your sample code) that there are no repeats in the pointers, and that the pointers are ordered the same as the original data.
I avoid copy and swap because it isn't needed here to avoid code duplication, and there just isn't any other reason for it. It's a possible performance hit in exchange for strong exception safety, that's all. However, writing a generic copy-and-swap helper that gives you strong exception safety with any correctly implemented class is trivial. Class writers should usually not use copy and swap for the assignment operator (there are, no doubt, exceptions).

This works for me for a vector of pairs, though it's terribly ugly and I would never use it in real code:
std::vector<std::pair<int, int>> d_members;
std::vector<int*> d_special_members;
Widget(const Widget& other) : d_members(other.d_members) {
d_special_members.reserve(other.d_special_members.size());
for (const auto p : other.d_special_members) {
ptrdiff_t diff = (char*)p - (char*)(&other.d_members[0]);
d_special_members.push_back((int*)((char*)(&d_members[0]) + diff));
}
}
For the sake of brevity I used only C-style casts; reinterpret_cast would be better. I am not sure whether this solution avoids undefined behavior (in fact I guess it does not), but I dare say that most compilers will generate a working program.

I think using indexes instead of pointers is just perfect. You don't need any custom copy code then.
For convenience you may want to define a member function that converts the index to the actual pointer you want. Then your members can be of arbitrary complexity.
private:
int* getSpecialMemberPointerFromIndex(int specialIndex)
{
return &d_members[specialIndex];
}

Is it a bad practice to return a std container from a interface class?

I have run into the following question.
I need to design an interface class, which looks like this:
struct IIDs
{
....
virtual const std::set<int>& getAllIDs() = 0; //!< I want the collection of int to be sorted.
};
void foo()
{
const std::set<int>& ids = pIIDs->getAllIDs();
for(std::set<int>::const_iterator it = ids.begin();....;..) {
// do something
}
}
I think that returning a std container is a bit inappropriate, because it forces the implementation to use a std::set to store the IDs. But if I write it as follows:
struct IIDs
{
....
virtual int count() const = 0;
virtual int at(int index) = 0; //!< the items should be sorted
};
void foo()
{
for (int i = 0; i < pIIDs->count(); ++i) {
int val = pIIDs->at(i);
...
}
}
I found that none of the std containers meets all of these requirements:
the complexity of index lookup must be at most O(log n).
the complexity of insertion must be at most O(log n).
the items must be sorted.
So I am stuck with the first example. Is that acceptable?

STL containers and template code in general should never be used across a DLL boundary.
The thing you have to keep in mind when returning complex types like STL containers is that if your call ever crosses the boundary between two different DLLs (or a DLL and an application) running different memory managers your application will most likely crash spectacularly.
The templates that make up the STL code will be executed within the implementation DLL, creating all the memory used by the container there. Later when it leaves scope in your calling code, your own memory manager will attempt to deallocate memory it doesn't own, resulting in a crash.
If you know your code won't cross DLL boundaries, and will only ever be called in the context of a single memory manager, then you're fine as far as memory management is concerned.
However, even in cases where you're only returning references, such as your example above, where the lifetime of the container would be entirely managed by the interface implementation code, unless you know that the exact same version of the STL and the exact same compiler and linker settings were used for compiling the implementation as the caller, you're asking for trouble.

The problem I see is that you are returning the collection by const reference. That means either you have a member of that collection type and are returning a reference to it, or you are returning a reference to a variable local to the function, which causes invalid memory access.
If it's a member variable, it is better to provide access to begin and end iterators. If it is a local variable, you could return it by value (C++11 move semantics and copy elision should avoid copying anything). If the call crosses a DLL boundary, try by all means not to use any C++ types, only C types.

In terms of design, and for good generic code, prefer the STL way: return iterators, leaving the container type an implementation detail of IIDs, and hide your types with typedefs:
struct IIDs
{
typedef std::set<int> Container;
typedef Container::iterator IDIterator;
// We only expose iterators to the data
IDIterator begin(); //!< I want the collection of int to be sorted.
IDIterator end();
// ...
};

There are various approaches:
if you want to minimise the coupling of client code on the IIDs implementation and ensure iteration is completed while the IIDs object still exists, then use a visitor pattern: the calling code just has to supply some function to be called for each of the member elements in turn and is not responsible for the iteration itself
Visitor example:
struct IIDs
{
template <typename T>
void visit(T& t)
{
for (int i : ids_) t(i);
}
...
private:
std::set<int> ids_;
};
if you want to give the caller more freedom to mix other code in with the container traversal, and have multiple concurrent independent traversals, then provide iterators, but be aware that the client code could keep an iterator hanging around longer than the IIDs object itself - you may or may not want to handle that scenario gracefully
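A compilable sketch of the visitor variant follows. The class name IDs, the add method and the collect helper are assumptions added for the example; only the visit idea comes from the answer.

```cpp
#include <cassert>
#include <set>
#include <vector>

class IDs {
public:
    void add(int id) { ids_.insert(id); }

    // Calls visitor(id) for every stored id in ascending order.
    template <typename Visitor>
    void visit(Visitor&& visitor) const {
        for (int id : ids_) visitor(id);
    }
private:
    std::set<int> ids_;  // implementation detail, never exposed to callers
};

// Usage sketch: collect the ids into a caller-owned vector.
inline std::vector<int> collect(const IDs& ids) {
    std::vector<int> out;
    ids.visit([&out](int id) { out.push_back(id); });
    return out;
}
```

Note that a member function template cannot be virtual, so if IIDs has to stay an abstract interface, visit would need a non-template signature, for example one taking a std::function<void(int)>.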

C++ Iterating over a list of a certain subclass

I have two classes. The superclass is a "Component" class, and the subclass is a "Transform" class.
The framework I'm using has a function that returns a list of components of a certain type. However, the list will return them as Component, since the type isn't restricted to a specific subclass (however it's the way I'm using it).
So, in the following scenario, I know that all the returned components will be of the Transform subclass. What I'm doing is I'm iterating over the list and then casting each component to Transform. Here is my code:
std::list<Cistron::Component*,std::allocator<Cistron::Component*>> playerTransforms = objectManager->getComponents(player,"Transform");
std::list<Cistron::Component*>::iterator playerComponentIterator = playerTransforms.begin();
for (playerComponentIterator; playerComponentIterator != playerTransforms.end(); playerComponentIterator++)
{
Transform *tmpTransform = static_cast<Transform*> (*playerComponentIterator);
std::cout << tmpTransform->x ;
std::cout << tmpTransform->y ;
}
How efficient is this? I'm quite new to C++, so I have no idea if there's a better way of doing this.

This isn't a good design; your compiler should generate a warning in this case. Normally, you should downcast your pointer using dynamic_cast. This cast has some runtime cost, approximately the same as a virtual method call, but it returns a null pointer if you try to cast a pointer to an incompatible object (with references it throws std::bad_cast instead).
Try to redesign your app to eliminate this code. You should only call virtual methods of the Component class; you shouldn't cast a pointer to Component to a pointer to Transform. Such a cast indicates bad design.
One possible design is to make getComponents a template method to eliminate the cast:
template<class T>
list<T*> getComponents(Player* player, std::string name) {
...
}
or maybe just this:
list<Transform*> getTransformComponents(Player* player) {...}
In case you can't refactor this code, you can always transform your list:
list<Component*> rlist = ...
list<Transform*> llist;
// Downcast all
transform(rlist.begin(),
rlist.end(),
back_inserter(llist),
[](Component* r) {
return dynamic_cast<Transform*>(r);
});
// Remove all null values
llist.remove(nullptr);
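The template idea and the dynamic_cast filtering can be combined in one helper. This is only a sketch: the Component, Transform and Sprite types below are simplified stand-ins, not the Cistron framework's classes, and componentsOfType is a hypothetical name.

```cpp
#include <cassert>
#include <list>

struct Component { virtual ~Component() = default; };
struct Transform : Component { float x = 0, y = 0; };
struct Sprite : Component {};

// Returns only the components whose dynamic type is T (or derived from T).
template <class T>
std::list<T*> componentsOfType(const std::list<Component*>& components) {
    std::list<T*> result;
    for (Component* c : components)
        if (T* typed = dynamic_cast<T*>(c))  // null when c is not a T
            result.push_back(typed);
    return result;
}
```

Callers then write componentsOfType<Transform>(list) and never see a cast.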

The std::list is usually implemented as a doubly linked list, which means that its elements are scattered through memory, which in turn means that iterating through it is slow. See: Why is it so slow iterating over a big std::list?
But what I would worry more about is the use of reflection:
objectManager->getComponents(player,"Transform");
that might actually be the real bottleneck of this piece of code.

Store pointers to objects in multiple containers

For the sake of presenting my question, let's assume I have a set of pointers (same type)
{p1, p2, ..., pn}
I would like to store them in multiple containers, as I need different strategies to access them. Suppose I want to store them in two containers, a linked list and a hash table. The linked list gives me the order and the hash table gives me fast access. Now, the problem is that if I remove a pointer from one container, I need to remember to remove it from the other container too. This makes the code hard to maintain. So the question is: are there other patterns or data structures to manage a situation like this? Would smart pointers help here?

If I understand correctly, you want to link your containers so that removing from one removes from all. I don't think this is directly possible. Possible solutions:
re-design whole object architecture, so pointer is not in many containers.
Use Boost Multi-index Containers Library to achieve all features you want in one container.
Use a map key instead of direct pointer to track objects, and keep the pointer itself in one map.
use std::weak_ptr so you can check if item has been deleted somewhere else, and turn it to std::shared_ptr while it is used (you need one container to have "master" std::shared_ptr to keep object around when not used)
create function/method/class to delete objects, which knows all containers, so you don't forget accidentally, when all deletion is in one place.
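The std::weak_ptr option from the list above could look roughly like this. Item, Registry and their member names are hypothetical; the point is only that one container owns the objects and the other holds non-owning references that can detect removal.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct Item { std::string name; };

struct Registry {
    // Owning container: the only place that keeps items alive.
    std::unordered_map<std::string, std::shared_ptr<Item>> byName;
    // Non-owning view that preserves insertion order.
    std::list<std::weak_ptr<Item>> inOrder;

    void add(const std::string& name) {
        auto item = std::make_shared<Item>(Item{name});
        inOrder.push_back(item);
        byName[name] = std::move(item);
    }

    // Erasing from the owning map is enough; the list notices lazily.
    void remove(const std::string& name) { byName.erase(name); }

    // Walks the ordered view, pruning entries whose owner is gone.
    std::list<std::string> namesInOrder() {
        std::list<std::string> names;
        for (auto it = inOrder.begin(); it != inOrder.end();) {
            if (auto item = it->lock()) { names.push_back(item->name); ++it; }
            else it = inOrder.erase(it);
        }
        return names;
    }
};
```

This assumes nobody else holds a std::shared_ptr to an item for long; otherwise the item stays alive after remove and still shows up in the ordered view.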

Why don't you create your own class that contains both a std::list and a std::unordered_map and provides access and removal functions? That way you can access the elements linearly through the list and randomly through the unordered_map, deletion removes from both containers, and insertion inserts into both (a kind of wrapper class).
You can also consider using std::map with a comparison function that always keeps your data structure ordered in the desired way; you can then also access the elements randomly in O(log N) time.

As usual, try to isolate this logic to make things easier to maintain. Use a small class with a safe public interface (sorry, I didn't compile this; it is just pseudocode).
template<class Id, class Ptr>
class Store
{
public:
void add(Id id, Ptr ptr)
{
m_ptrs.push_back(ptr);
m_ptrById.insert(std::make_pair(id, ptr));
}
void remove(Ptr ptr)
{
// remove in sync as well
m_ptrs.remove(ptr);
for (auto it = m_ptrById.begin(); it != m_ptrById.end(); )
{
if (it->second == ptr) it = m_ptrById.erase(it);
else ++it;
}
}
private:
std::list<Ptr> m_ptrs;
std::map<Id, Ptr> m_ptrById;
};
Then use Store for keeping your pointers in sync.
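One possible way to flesh this out, as a sketch rather than a finished design: the variant below removes by id instead of by pointer and stores list iterators in the map, so removal does not have to search the list. The find and ordered members are additions for the example.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <map>

template <class Id, class Ptr>
class Store {
public:
    // Returns false if the id is already present; nothing is modified then.
    bool add(const Id& id, Ptr ptr) {
        if (m_posById.count(id) != 0) return false;
        m_ptrs.push_back(ptr);
        m_posById.emplace(id, std::prev(m_ptrs.end()));
        return true;
    }
    // Removes the entry from both containers; returns false if id is unknown.
    bool remove(const Id& id) {
        auto it = m_posById.find(id);
        if (it == m_posById.end()) return false;
        m_ptrs.erase(it->second);  // list iterators stay valid until erased
        m_posById.erase(it);
        return true;
    }
    // Returns a value-initialized Ptr (null for raw pointers) if id is unknown.
    Ptr find(const Id& id) const {
        auto it = m_posById.find(id);
        return it == m_posById.end() ? Ptr() : *it->second;
    }
    const std::list<Ptr>& ordered() const { return m_ptrs; }
private:
    std::list<Ptr> m_ptrs;
    std::map<Id, typename std::list<Ptr>::iterator> m_posById;
};
```

Since both containers are private and every mutation goes through add or remove, they cannot drift out of sync.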

If I understand your problem correctly, you are less concerned with memory management (the new/delete issue) and more concerned with the actual bookkeeping of which elements are still valid.
So I was thinking of wrapping each point with a "reference counter":
template< class Point >
class BookKeeping {
public:
enum { LIST_REF = 0x01,
HASH_REF = 0x02 };
BookKeeping( const Point& p ): m_p(p), m_refCount( 0x3 ) {} // assume object created in both containers
bool isValid() const { return m_refCount == 0x3; } // not "freed" from any container
void remove( unsigned int from ) { m_refCount = m_refCount & ~from; } // clear that container's bit
private:
Point m_p;
unsigned int m_refCount;
};

See the answer (the only one, by now) to this similar question. In that case a deque is proposed instead of a list, since the OP only wanted to insert/remove at the ends of the sequence.
Anyway, you might prefer to use the Boost Multi-index Containers Library.

Best way to return list of objects in C++?

It's been a while since I programmed in C++, and after coming from python, I feel soooo in a straight jacket, ok I'm not gonna rant.
I have a couple of functions that act as "pipes", accepting a list as input, returning another list as output (based on the input),
this is in concept, but in practice, I'm using std::vector to represent the list, is that acceptable?
further more, I'm not using any pointers, so I'm using std::vector<SomeType> the_list(some_size); as the variable, and returning it directly, i.e. return the_list;
P.S. So far it's all ok, the project size is small and this doesn't seem to affect performance, but I still want to get some input/advice on this, because I feel like I'm writing python in C++.

The only thing I can see is that you're forcing a copy of the list you return. It would be more efficient to do something like:
void DoSomething(const std::vector<SomeType>& in, std::vector<SomeType>& out)
{
...
// no need to return anything, just modify out
}
Because you pass in the list you want to return, you avoid the extra copy.
Edit: This is an old reply. If you can use a modern C++ compiler with move semantics, you don't need to worry about this. Of course, this answer still applies if the object you are returning DOES NOT have move semantics.
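For comparison, a return-by-value version under a C++11 or later compiler might look like this sketch. The function doubleEvens and its behaviour are made up for the example; the only point is the return statement.

```cpp
#include <cassert>
#include <vector>

// Hypothetical pipe stage: keeps the even numbers and doubles them.
std::vector<int> doubleEvens(const std::vector<int>& in) {
    std::vector<int> out;
    out.reserve(in.size());
    for (int v : in)
        if (v % 2 == 0) out.push_back(v * 2);
    return out;  // elided or moved in C++11 and later, not deep-copied
}
```

The caller simply writes auto result = doubleEvens(input); and the vector's buffer is handed over rather than duplicated.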

If you really need a new list, I would simply return it. Return value optimization will avoid needless copies in most cases, and your code stays very clear.
That being said, taking lists and returning other lists is indeed Python programming in C++.
A paradigm more suited to C++ would be to create functions that take a range of iterators and alter the underlying collection.
e.g.
void DoSomething(iterator const & from, iterator const & to);
(with iterator possibly being a template, depending on your needs)
Chaining operations is then a matter of calling consecutive methods on begin(), end().
If you don't want to alter the input, you'd make a copy yourself first.
std::vector theOutput(inputVector);
This all comes from the C++ "don't pay for what you don't need" philosophy, you'd only create copies where you actually want to keep the originals.
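A small sketch of that style, with two made-up stages (squareAll and sortAscending) chained on the same range, and one explicit copy made only because the caller wants to keep the input:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Each stage modifies the range [first, last) in place.
template <class Iterator>
void squareAll(Iterator first, Iterator last) {
    for (; first != last; ++first) *first = *first * *first;
}

template <class Iterator>
void sortAscending(Iterator first, Iterator last) {
    std::sort(first, last);
}

inline std::vector<int> squaredAndSorted(const std::vector<int>& input) {
    std::vector<int> work(input);  // the single, explicit copy
    squareAll(work.begin(), work.end());
    sortAscending(work.begin(), work.end());
    return work;
}
```

If the original values are not needed afterwards, the stages can be applied to the input vector directly and no copy is made at all.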

I'd use the generic approach:
template <typename InIt, typename OutIt>
void DoMagic(InIt first, InIt last, OutIt out)
{
for(; first != last; ++first) {
if(IsCorrectIngredient(*first)) {
*out = DoMoreMagic(*first);
++out;
}
}
}
Now you can call it:
std::vector<MagicIngredients> ingredients;
std::vector<MagicResults> results;
DoMagic(ingredients.begin(), ingredients.end(), std::back_inserter(results));
You can easily change the containers used without changing the algorithm, and it is also efficient: there's no overhead in returning containers.

If you want to be really hardcore, you could use boost::tuple.
tuple<int, int, double> add_multiply_divide(int a, int b) {
return make_tuple(a+b, a*b, double(a)/double(b));
}
But since it seems all your objects are of a single, non-polymorphic type, then the std::vector is all well and fine.
If your types were polymorphic (inherited classes of a base class) then you'd need a vector of pointers, and you'd need to remember to delete all the allocated objects before throwing away your vector.

Using a std::vector is the preferable way in many situations. It is guaranteed to use contiguous memory and is therefore friendly to the L1 cache.
You should be aware of what happens when your return type is std::vector. Unless the compiler elides the copy or moves the vector, what happens under the hood is that the std::vector is copied element by element, so if SomeType's copy constructor is expensive, the return statement may be a lengthy and time-consuming operation.
If you are searching and inserting a lot in your list, you could look at std::set to get logarithmic time complexity instead of linear. (Appending to a std::vector is amortized constant time; inserting elsewhere is linear.)
You say that you have many "pipe functions"... that sounds like an excellent scenario for std::transform.
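As an illustration of a pipe stage written with std::transform, here is a short sketch; toLabels and the label format are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical pipe stage: maps each number to a text label.
inline std::vector<std::string> toLabels(const std::vector<int>& values) {
    std::vector<std::string> labels;
    labels.reserve(values.size());
    std::transform(values.begin(), values.end(), std::back_inserter(labels),
                   [](int v) { return "#" + std::to_string(v); });
    return labels;
}
```

The input and output element types may differ, which is what makes std::transform a natural fit for this kind of stage.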

Another problem with returning a list of objects (as opposed to working on one or two lists in place, as BigSandwich pointed out) is that if your objects have complex copy constructors, those will be called for each element in the container.
If you have 1000 objects each referencing a hunk of memory, and they copy that memory on Object a, b; a=b; that's 1000 memory copies just for returning them in a container. If you still want to return a container directly, think about pointers in this case.

It works very simply.
list<int> foo(void)
{
list<int> l;
// do something
return l;
}
Now receiving data:
list<int> lst=foo();
This is fully optimal because the compiler knows how to optimize the construction of lst well, and it would not cause copies.
Another method, more portable:
list<int> lst;
// do anything you want with list
foo().swap(lst); // swap must be called on the temporary; lst.swap(foo()) does not compile
What happens: foo is already optimized, so there is no problem returning the value. When you call swap, you exchange the contents of lst with the new value, and thus do not copy it. The old value of lst now lives in the temporary and is destroyed.
This is the efficient way to do the job.
