I have class:
class A {
    // fields, methods
};
I need an efficient data structure that lets me pick both the minimum and the maximum out of a set of pointers to objects of class A (it has to work online, i.e. queries will alternate with requests to add new pointers). This can be done with two priority queues:
priority_queue<A*, vector<A*>, ComparatorForFindingLightestObjects>* qL;
priority_queue<A*, vector<A*>, ComparatorForFindingHardestObjects>* qH;
The problem is that when a pointer is extracted from the first queue and the object is later destroyed, a pointer to it is still present in the other queue, which leads to reads from freed memory.
How can I solve this problem using standard STL containers, without writing my own data structure?
I believe you're looking for boost::multi_index, which is a single container accessible through multiple different "views": http://www.boost.org/doc/libs/1_59_0/libs/multi_index/doc/index.html
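To make that concrete, here is a minimal sketch (the weight() accessor and the comparator/type names are my own assumptions, not from the question). The point is that there is only one underlying container, so erasing through either view removes the pointer from both:
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/identity.hpp>
#include <boost/multi_index/ordered_index.hpp>

struct A { double weight() const { return w; } double w = 0; }; // stand-in for the real class A

struct LighterFirst {
    bool operator()(const A* l, const A* r) const { return l->weight() < r->weight(); }
};
struct HeavierFirst {
    bool operator()(const A* l, const A* r) const { return l->weight() > r->weight(); }
};

namespace bmi = boost::multi_index;

using Pool = bmi::multi_index_container<
    A*,
    bmi::indexed_by<
        bmi::ordered_non_unique<bmi::identity<A*>, LighterFirst>, // view 0: lightest first
        bmi::ordered_non_unique<bmi::identity<A*>, HeavierFirst>  // view 1: heaviest first
    >
>;

// Pool pool;
// pool.insert(ptr);                              // add a new pointer
// A* lightest = *pool.get<0>().begin();          // current minimum
// A* heaviest = *pool.get<1>().begin();          // current maximum
// pool.get<0>().erase(pool.get<0>().begin());    // removed from both views at once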
I think you can use std::set and erase the entry from the second set as soon as you extract the data from the first. Performance-wise, both give O(log(n)) lookup and insertion. I'm not sure if this is exactly what you want, but I'll try:
//Use std::set as your priority queue instead
set<A*, ComparatorForFindingLightestObjects> qL;
set<A*, ComparatorForFindingHardestObjects> qH;
auto it=qL.begin(); //The first element
if(it!=qL.end())
{
A* curr=*it;
qL.erase(curr); //Delete it from this
qH.erase(curr); //Delete this from the other queue as well
}
Also, I think you can merge your two queues and maintain just one container. You can access the minimum and maximum elements with *containerName.begin() and *containerName.rbegin() respectively.
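Something along these lines, assuming A has some weight() accessor to compare on (the names here are illustrative); if distinct objects can share the same weight you'd want std::multiset instead:
#include <set>

struct A { double weight() const { return w; } double w = 0; }; // stand-in for the real class A

struct LighterFirst {
    bool operator()(const A* lhs, const A* rhs) const { return lhs->weight() < rhs->weight(); }
};

std::set<A*, LighterFirst> objects; // one container ordered by weight

// objects.insert(ptr);               // add a new pointer
// A* lightest = *objects.begin();    // minimum
// A* heaviest = *objects.rbegin();   // maximum
// objects.erase(objects.begin());    // only one place to erase from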
I am looking for a data structure that preserves the order in which the elements were inserted and offers a fast "contains" predicate. I also need iterator and random access. Performance during insertion or deletion is not relevant. I am also willing to accept overhead in terms of memory consumption.
Background: I need to store a list of objects. The objects are instances of a class called Neuron and stored in a Layer. The Layer object has the following public interface:
class Layer {
public:
Neuron *neuronAt(const size_t &index) const;
NeuronIterator begin();
NeuronIterator end();
bool contains(const Neuron *const &neuron) const;
void addNeuron(Neuron *const &neuron);
};
The contains() method is called quite often when the software runs; I've verified that using callgrind. I tried to circumvent some of the calls to contains(), but it is still a hot spot. Now I hope to optimize exactly this method.
I thought of using std::set, using the template argument to provide my own comparator struct. But the Neuron class itself does not give its position in the Layer away. Additionally, I'd like to have *someNeuronIterator = anotherNeuron to work without screwing up the order.
Another idea was to use a plain old C array. Since I do not care about the performance of adding a new Neuron object, I thought I could make sure that the Neuron objects are always stored linearly in memory. But that would invalidate the pointer I pass to addNeuron(); at least I'd have to change it to point to the new copy I created to keep things linearly laid out. Right?
Another idea was to use two data structures in the Layer object: A vector/list for the order, and a map/hash for lookup. But that would contradict my wish for an iterator that allowed operator* without a const reference, wouldn't it?
I hope somebody can suggest a data structure or a concept that would satisfy my needs, or at least give me an idea for an alternative. Thanks!
If this contains check is really where you need the fastest execution, and assuming you can be a little intrusive with the source code, the fastest way to check if a Neuron belongs in a layer is to simply flag it when you insert it into a layer (ex: bit flag).
You have guaranteed O(1) checks at that point to see if a Neuron belongs in a layer and it's also fast at the micro-level.
If there can be numerous layer objects, this gets a little trickier, as you'll need a separate bit for each potential layer a neuron can belong to, unless a Neuron can only belong to a single layer at once. This is reasonably manageable, however, if the number of layers is relatively fixed.
In the latter case, where a Neuron can only belong to one layer at once, all you need is a back pointer to the Layer. To see if a Neuron belongs to a layer, simply check whether that back pointer points to the layer object.
If a Neuron can belong to multiple layers at once, but not too many at one time, then you could store a small array of back pointers, like so:
struct Neuron
{
    ...
    Layer* layers[4]; // use whatever small size that usually fits the common case
    Layer* ptr;       // points to `layers`, or to a heap-allocated array if there are more
    int num_layers;   // how many layers this Neuron currently belongs to
};
Initialize ptr to point to layers if there are 4 or fewer layers to which the Neuron belongs. If there are more, allocate it on the free store. In the destructor, free the memory if ptr != layers. You can also optimize away num_layers if the common case is like 1 layer, in which case a null-terminated solution might work better. To see if a Neuron belongs to a layer, simply do a linear search through ptr. That's practically constant-time complexity with respect to the number of Neurons provided that they don't belong in a mass number of layers at once.
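For illustration, the membership test described above could look roughly like this (reusing the small-buffer layout of the Neuron struct shown earlier):
bool belongs_to(const Neuron& n, const Layer* layer)
{
    for (int i = 0; i < n.num_layers; ++i) // usually only 1-4 entries: effectively constant time
        if (n.ptr[i] == layer)
            return true;
    return false;
}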
You can also use a vector here but you might reduce cache hits on those common case scenarios since it'll always put its contents in a separate block, even if the Neuron only belongs to like 1 or 2 layers.
This might be a bit different from what you were looking for with a general-purpose, non-intrusive data structure, but if your performance needs are really skewed towards these kinds of set operations, an intrusive solution is going to be the fastest in general. It's not quite as pretty and couples your element to the container, but hey, if you need max performance...
Another idea was to use a plain old C array. Since I do not care about the performance of adding a new Neuron object, I thought I could make sure that the Neuron objects are always stored linear in memory. But that would invalidate the pointer I pass to addNeuron(); [...]
Yes, but it won't invalidate indices. While not as convenient to use as pointers, if you're working with mass data like vertices of a mesh or particles of an emitter, it's common to use indices here to avoid the invalidation and possibly to save an extra 32-bits per entry on 64-bit systems.
Update
Given that Neurons only exist in one Layer at a time, I'd go with the back pointer approach. Seeing if a neuron belongs to a layer becomes a simple matter of checking if the back pointer points to the same layer.
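A minimal sketch of that check, assuming a hypothetical owner member that Layer::addNeuron() sets (not part of the question's actual classes):
class Layer; // forward declaration

struct Neuron {
    Layer* owner = nullptr; // the single layer this neuron currently belongs to
    // ... other members ...
};

class Layer {
public:
    void addNeuron(Neuron* const& neuron) {
        neuron->owner = this; // record ownership on insertion
        // ... store the neuron as before ...
    }
    bool contains(const Neuron* const& neuron) const {
        return neuron != nullptr && neuron->owner == this; // O(1) check
    }
};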
Since there's an API involved, I'd suggest, just because it sounds like you're pushing around a lot of data and have already profiled it, that you focus on an interface which revolves around aggregates (layers, e.g.) rather than individual elements (neurons). It'll just leave you a lot of room to swap out underlying representations when your clients aren't performing operations at the individual scalar element-type interface.
With the O(1) contains implementation and the unordered requirement, I'd go with a simple contiguous structure like std::vector. However, you do expose yourself to potential invalidation on insertion.
Because of that, if you can, I'd suggest working with indices here. However, that becomes a little unwieldy, since it requires your clients to store both a pointer to the layer in which a neuron belongs and its index (though if you do this, the back pointer becomes unnecessary, as the client is tracking where things belong).
One way to mitigate this is to simply use something like std::vector<Neuron*> or ptr_vector if available. However, that can expose you to cache misses and heap overhead, and if you want to optimize that, this is where the fixed allocator comes in handy. However, that's a bit of a pain with alignment issues and a bit of a research topic, and so far it seems like your main goal is not to optimize insertion or sequential access quite as much as this contains check, so I'd start with the std::vector<Neuron*>.
You can get an O(1) contains check, O(1) insert, and preserved insertion order. If you are using Java, look at LinkedHashMap. If you are not, look at how LinkedHashMap works and build a parallel data structure, or implement it yourself.
It's just a hash map combined with a doubly linked list. The linked list preserves order and the hash map allows O(1) access. When you insert an element, the map entry for the key points to the node in the linked list where your data resides. To look something up, you go to the hash table, follow the pointer directly to your linked-list node (not the head), and get the value in O(1). To access elements sequentially, you just traverse the linked list.
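Since the rest of this thread is C++, here is a rough sketch of the same idea with standard containers: a std::list keeps insertion order and an unordered_map of list iterators gives O(1) average lookup (class and member names are just placeholders):
#include <iterator>
#include <list>
#include <unordered_map>

struct Neuron; // the real class lives elsewhere

class OrderedNeuronSet {
public:
    void add(Neuron* n) {
        if (index_.count(n)) return;         // ignore duplicates
        order_.push_back(n);                 // keep insertion order
        index_[n] = std::prev(order_.end()); // remember where it lives
    }
    bool contains(Neuron* n) const { return index_.count(n) != 0; } // O(1) average
    // iterate in insertion order via order_.begin() / order_.end()
private:
    std::list<Neuron*> order_;
    std::unordered_map<Neuron*, std::list<Neuron*>::iterator> index_;
};
Note that this gives ordered iteration and a fast contains, but not random access by index.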
A heap sounds like it could be useful to you. It's like a tree where a newly inserted element sifts into position based on its value, so the extreme (minimum or maximum) element can be retrieved quickly.
Otherwise, you could store a hash table (a quick way to check whether a neuron is contained) keyed by the neuron, with the value holding the neuron itself and the timestep at which it was inserted (to recover its chronological insertion order).
I have a tree-like structure using Node objects with references to other Node objects. Node is a class. Now, one of the routines I'm writing needs a minimum priority queue, which I'm implementing using std.container.BinaryHeap and std.container.Array. I'm instantiating it as follows:
Node[] r;
auto heap = BinaryHeap!(Array!(Node), "a > b")(Array!Node(r));
As part of the routine, I insert elements into heap using insert and remove elements from it using removeAny. Now, the routine works correctly, but afterwards, the tree-like structure breaks (my invariants for it fail), due to nodes being missing. What's going on here and why is this happening?
Could be http://d.puremagic.com/issues/show_bug.cgi?id=6998 - std.container.Array destroys class instances.
I'm developing an A* implementation for the first time, and I was using a priority_queue for the open set, until I realized you need to check whether nodes are in the open set too, not just the closed one.
The thing is, you can't iterate over a priority queue. So why does everyone recommend a priority queue for the open set? Is it still the best option? I think the only way to iterate over it is to make a copy and pop everything from it (enormous cost).
What is the best data structure to use for A*?
A priority queue (PQ) is an abstract data structure (ADS). There are many ways to implement one. Unfortunately, the priority_queue supplied with the C++ standard library is rather limited, and other implementations are much better suited to implementing A*. Spoiler: you can use std::set/multiset instead of std::priority_queue. But read on:
What you need from the priority queue to implement A* is:
1. Get the node with the lowest cost
2. Decrease the cost of arbitrary elements
Any priority queue can do 1., but for 2. you need a "mutable" priority queue, which the standard-library one is not. You also need an easy way to find entries in the priority queue, to know where to decrease keys (for when A* finds a better path to an already-opened node). There are two basic ways to do this: you store a handle to the priority-queue element within your graph node (or use a map to store those handles for each graph node), or you insert the graph nodes themselves.
For the first case, where you store handles for each node, you can use std::multiset for your priority queue. The element at std::multiset::begin() will always be your lowest-cost entry, and you can decrease a key by removing the element from the set, changing its value, re-inserting it, and updating the handle. Alternatively, you can use the mutable priority queues from Boost.Heap, which directly support decrease-key.
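For instance, the erase/update/re-insert "decrease-key" with a handle map could look like this (Node, its cost field f, and all names below are illustrative assumptions):
#include <set>
#include <unordered_map>

struct Node { double f = 0.0; /* g, h, parent, ... */ };

struct ByCost {
    bool operator()(const Node* a, const Node* b) const { return a->f < b->f; }
};

using OpenSet = std::multiset<Node*, ByCost>;
using Handles = std::unordered_map<Node*, OpenSet::iterator>;

void push(OpenSet& open, Handles& handles, Node* n) {
    handles[n] = open.insert(n); // remember where the node sits in the set
}

Node* pop_lowest(OpenSet& open, Handles& handles) {
    Node* n = *open.begin();     // lowest-cost node is always at begin()
    open.erase(open.begin());
    handles.erase(n);
    return n;
}

void decrease_key(OpenSet& open, Handles& handles, Node* n, double new_f) {
    open.erase(handles[n]);      // remove the stale entry
    n->f = new_f;                // update the cost
    handles[n] = open.insert(n); // re-insert and refresh the handle
}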
For the second case, you would need some kind of "intrusive" binary tree - since your pathfinding nodes themselves need to be in the priority queue. If you don't want to roll your own, see the ordered associative containers in Boost.Intrusive.
The subject is very large. I suggest you read this page if you want to know the different possibilities and get a good understanding of which data structure suits your situation:
http://theory.stanford.edu/~amitp/GameProgramming/ImplementationNotes.html#set-representation
In my case, the binary heap was a good balance between implementation difficulty and performance, which was exactly what I was looking for. But maybe you are looking for something different?
The rest of the document is a very good reference for A* for game development
http://theory.stanford.edu/~amitp/GameProgramming/index.html
They mean a priority queue, not necessarily the std::priority_queue class that comes with the standard library. If the built-in one doesn't do what you need, write your own or find another.
I have about 18 million elements in an array that are initialized and ready to be used by a simple manager called ElementManager (this number will later climb to a little more than a billion in later iterations of the program). A class, A, which must use the elements, communicates with ElementManager, which returns the next available element for consumption. That element is then in use and cannot be reused until it is recycled, which may happen often. Class A is concurrent, that is, it can ask ElementManager for an available element from several threads. Each element in this case is an object that stores three vertices making up a triangle.
Currently, the ElementManager is using an Intel TBB concurrent_bounded_queue called mAllAvailableElements. There is also another container (a TBB concurrent_vector), called mAllElements, that contains all elements regardless of whether they are available for use. When class A asks for the next available element, the manager tries to pop it from the queue; the popped element is then in use.
Now when class A has done what it has to do, control is handed to class B which now has to iterate through all elements that are in use and create meshes (to take advantage of concurrency, the array is split into several smaller arrays to create submeshes which scales with the number of available threads - the reason for this is that creating a mesh must be done serially). For this I am currently iterating over the container mAllElements (this is also concurrent) and grabbing any element that is in use. The elements, as mentioned above, contain polygonal information to create meshes. Iteration in this case takes a long time as it has to check each element and query whether it is in use or not, because if it is not in use then it should not be part of a mesh.
Now imagine if only 1 million out of the possible 18 million elements were in use (but more than 5-6 million were recycled). Worse yet, constant updates to only part of the mesh (which happen concurrently) mean the in-use elements are fragmented throughout the mAllElements container.
I thought about this for quite some time, and one flawed solution I came up with was to create another queue of elements named mElementsInUse, also a concurrent_queue, onto which I can push any element that is now in use. The problem with this approach is that, since it is a queue, any element in it can be recycled at any time (by an update to part of the mesh) and declared not in use, and since I can only pop the front element, this approach fails. The only other approach I can think of is to defragment the concurrent_vector mAllElements every once in a while when no operations are taking place.
I think my approach to this problem is wrong, hence my post here. I hope I explained the problem in enough detail. It seems like a common memory-management problem, but I cannot come up with any terms to search for it.
How about using a bit vector to indicate which of your elements are in use? It's easy to partition it for parallel processing when building your full mesh, and you can use atomic operations on words in the vector and thus avoid locks.
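A rough sketch of that idea, assuming elements are identified by their index in mAllElements (the class and member names below are made up):
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>

class InUseBits {
public:
    explicit InUseBits(std::size_t count)
        : nwords_((count + 63) / 64),
          words_(new std::atomic<std::uint64_t>[nwords_]()) {} // zero-initialized words

    void set_in_use(std::size_t i) {  // lock-free: mark element i as in use
        words_[i / 64].fetch_or(std::uint64_t{1} << (i % 64), std::memory_order_relaxed);
    }
    void set_free(std::size_t i) {    // lock-free: mark element i as recycled
        words_[i / 64].fetch_and(~(std::uint64_t{1} << (i % 64)), std::memory_order_relaxed);
    }
    bool in_use(std::size_t i) const {
        return (words_[i / 64].load(std::memory_order_relaxed) >> (i % 64)) & 1u;
    }
    std::size_t word_count() const { return nwords_; } // handy for splitting work across threads

private:
    std::size_t nwords_;
    std::unique_ptr<std::atomic<std::uint64_t>[]> words_;
};
Class B can then walk the words in parallel, skip all-zero words entirely, and only touch the mAllElements entries whose bit is set.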
I have a college programming project in C++ divided into two parts. I'm beginning the second part, where we're supposed to use priority_queues, hash tables and BSTs.
I'm having trouble (at least) with priority queues, since they force me to redo a lot of code already implemented in the first part.
The project is about implementing a simple airport management system, and therefore I have classes like Airport (the main class), Airplane, Terminal and Flight. My airport had a list of terminals, but now the project specification says I must keep the terminals in a priority_queue where the top contains the least occupied terminal, i.e. the one with the fewest flights.
For each class, I have CRUD functions, but now how am I supposed to, for example, edit a terminal and add a flight to it? With a list, I just had to iterate to a specific position, but now I only have access to the object at the top of the queue. The solution I thought of was to copy the priority queue's terminals to a temporary list but, honestly, I don't like this approach.
What should I do?
Thanks in advance.
It sounds like you need a priority queue with efficient increase- and decrease-key operations. You might be better off creating your own priority queue implementation.
The priority_queue container is great for dynamic sets. But since the number of terminals in an airport is pretty much fixed, you can use a fixed-size container with the heap family of algorithms.
As the internal storage, you could use any container that provides random-access iterators (vector, array, deque). Then use the make_heap()/sort_heap() family of functions to heapify the array. Now you can cheaply access the top element (the front of the range), modify the priority of an arbitrary member and re-heapify, and iterate through all elements easily.
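For example, something like this (Terminal and flightCount() are placeholders for the question's classes; re-heapifying after every edit is cheap when there are only a handful of terminals):
#include <algorithm>
#include <vector>

struct Terminal { int flights = 0; int flightCount() const { return flights; } };

// Heap comparator inverted so the *least* occupied terminal ends up on top.
bool lessOccupiedOnTop(const Terminal* a, const Terminal* b) {
    return a->flightCount() > b->flightCount();
}

std::vector<Terminal*> terminals; // filled once; its size stays fixed

void rebuild() { std::make_heap(terminals.begin(), terminals.end(), lessOccupiedOnTop); }

Terminal* leastOccupied() { return terminals.front(); } // top of the heap

void addFlightTo(Terminal* t) {
    t->flights += 1; // edit any terminal directly...
    rebuild();       // ...then re-heapify the whole range
}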
For an example see:
http://www.cplusplus.com/reference/algorithm/make_heap/