Best sorted c++ container [closed] - c++

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I am writing a chess engine in c++, and am trying to make as clean and correct code as possible, as this is a learning exercise.
Currently, I have a move class, which defines a possible move. The AI then scores every possible move. What is the best way to pair the score of the move with the move itself in a data structure?
It has to be able to hold more than one move per score (two moves can both have a score of 735). I think that rules out std::map?
It should also be quickly sortable, so I can look ahead and recursively do this for the best moves.
Any help would be appreciated, including links. Thanks!

Your question isn't entirely clear. On one hand, you say you want a sorted container, but on the other, the way you talk about things is that you're going to generate moves, put them into a container, then sort them according to criteria defined by your AI.
Let's consider those separately. For the first, we'll assume you want to use the scores as a key, and look up the moves that go with particular scores. In this scenario, you'll generate a move, the AI will score that move, then you'll store the move, with its score as the key. Since you can have multiple moves with the same score (i.e., equivalent keys), the data structure you want for this case is an std::multimap.
The other possibility would be that you generate all the moves, put them all into a data structure, score them all, then sort them all by score. For this scenario, you probably want to use an std::vector<std::pair<score_type, move>>. In this case, when you generate each move you'd probably assign it a score of something like 0. Then you'd walk through the vector and have the AI generate a score for each move. Then you'd sort them, using a comparison function that only considers the score.
Either of these could work; which is preferable depends on circumstances. Using a vector will likely minimize overhead--that is, it will use the least memory and the least CPU time to get from raw moves to a vector with all the moves stored in sorted order.
The strength of the std::multimap is that it stays sorted at all times. For example, if you want to generate moves until you reach some time limit, it lets you accomplish that quite cleanly: generate a move, score it, insert it into the multimap. No matter when you stop, all the moves you generated up to that point are already sorted, so (for example) if the person playing against your program can force the AI to move immediately, the AI always has a record of the best move it's found so far, and can immediately play the move it "thinks" is best.
Another possibility would be to use a priority queue. In a typical chess case, you'll generate (say) a couple dozen or so possible next moves. Then you'll pick the best of those and score the possible counter-moves to them. Then you'll pick the best of those and score the counters to those moves, and so on, until you've scored (say) 4 or 5 full moves deep.
For this, you don't really care about having all the moves in order--you just want to be able to retrieve the N best moves quickly. For that case, a priority queue works quite well: you can retrieve the N best moves, then ignore the rest. This means you fully sort only the N best moves (the ones you care about) and minimize the overhead for the rest, doing only enough work to verify that they have lower scores.
I should also mention that if this is what you really want, you can accomplish the same in the vector case. Instead of using sort to put all the moves in order by score, you can use nth_element to find only the N best. nth_element arranges an array/vector into two groups: the elements that would sort before some selected element, then the selected element itself, then the ones that would sort after it. For example, given 100 moves of which you want to keep the top 5, you'd use nth_element to arrange them into the 95 lesser moves, then the 96th element (the boundary), then the other 4. No attempt is made at ordering the items within each group, though.
The advantage of this is that it completes in O(N) time on average, instead of the O(N log N) needed for a complete sort.
Between these two possibilities (priority_queue vs. nth_element) we get much the same tradeoff as between std::multimap and std::vector with std::sort: the priority_queue maintains its order at all times. It remains quite efficient even if you intermingle insertions and removals more or less arbitrarily. With a std::vector and std::nth_element, you normally want to insert all the elements, then call nth_element, then consider the top items. If you were going to mix the two (insert some elements, then remove a few of the best, insert some more, remove a few more, etc.) you'd have to call nth_element every time you transitioned from inserting to removing, which could kill the efficiency fairly quickly.

It sounds like what you are looking for is a priority queue.
They are usually implemented using a heap (a Fibonacci heap if you want efficiency). The heap itself is not completely sorted, but you are guaranteed to get the best move at the top at any given moment.
Boost has a Fibonacci heap implementation.
You can look at this question. MyType in that question can be a std::pair of Data and Priority

std::set does what you need: std::set<std::pair<X, Y>>, where X is the score and Y is the move object, or you can define your own custom comparator. See this: Using custom std::set comparator

Related

LRU sorted by score in C++, is there such container?

I need to implement a very efficient cache LRU with the following properties: entries are indices in a vector of cache entries, each cache hit updates an empirical score, computed from some values that can be kept in the container value, like number of hits, size of matched object etc.
I need to be able to quickly pick a victim for cache eviction from the bottom of such LRU, and be able to quickly iterate over some number of the best-performing entries from the top, so such container needs to be sorted.
So far, I have only been able to come up with a vector of structures that hold the values for score calculation, plus bi-directional links, which I use to put an updated element back in place after score recalculation, by linear search from its current position with score comparisons. This search may happen upwards (when a score is updated, always getting bigger) or downwards (when an element is evicted and its score resets to 0). Linear search may not be so bad, because this runs for a long time: the scores of elements that survive grow large and each increment is small, so an element does not have to move far to reach its place, and in the case of a reset I can start the search from the bottom.
I am aware of STL sorted containers, folly's cache LRU implementation, and Boost.Bimap (this last one seems to be an overkill for what I need).
Can I do better than a linear search here? Does anyone know of an implementation?
Thanks in advance!
Update: I implemented a solution that involves a vector of iterators into a set whose entries hold an index into the vector (for uniqueness) plus the data needed to compute the score, with a comparator sorting by score.
It seems to work well; maybe there is a more elegant solution out there?

3D-Grid of bins: nested std::vector vs std::unordered_map

Hi pros, I need some performance opinions on the following:
1st Question:
I want to store objects in a 3D-Grid-Structure, overall it will be ~33% filled, i.e. 2 out of 3 gridpoints will be empty.
Maybe Option A)
vector<vector<vector<deque<Obj>>>> grid; // (SizeX, SizeY, SizeZ)
grid[x][y][z].push_back(someObj);
This way I'd have a lot of empty deques, but accessing one of them would be fast, wouldn't it?
The Other Option B) would be
std::unordered_map<Pos3D, deque<Obj>, Pos3DHash, Pos3DEqual> Pos3DMap;
where I add & delete deques as data is added/deleted. Probably less memory used, but maybe slower? What do you think?
2nd Question (follow up)
What if I had multiple containers at each position? Say 3 buckets for 3 different entities, say object types ObjA, ObjB, ObjC per grid point, then my data essentially becomes 4D?
Using Option 1B I could just extend Pos3D to include the bucket number to account for even more sparse data.
Possible queries I want to optimize for:
Give me all Objects out of ObjA-buckets from the entire structure
Give me all Objects out of ObjB-buckets for a set of
grid-positions
Which is the nearest non-empty ObjC-bucket to
position x,y,z?
PS:
I had also thought about a tree-based data structure before, while reading about nearest-neighbour approaches. Since my data is so regular, I thought I'd skip all the tree-building subdivision of the cells into smaller pieces and just make a static 3D grid of the final leaves. That's how I came to ask about the best way to store this grid here.
An associated question: if I have a map<int, Obj>, is there a fast way to ask for "all objects with keys between 780 and 790"? Or is the fastest way the building of the above-mentioned tree?
EDIT
I ended up going with a 3D boost::multi_array that has fortran-ordering. It's a little bit like the chunks games like minecraft use. Which is a little like using a kd-tree with fixed leaf-size and fixed amount of leaves? Works pretty fast now so I'm happy with this approach.
Answer to 1st question
As @Joachim pointed out, this depends on whether you prefer fast access or small data. Roughly, this corresponds to your options A and B.
A) If you want fast access, go with a multidimensional std::vector or an array if you will. std::vector brings easier maintenance at a minimal overhead, so I'd prefer that. In terms of space it consumes O(N^3) space, where N is the number of grid points along one dimension. In order to get the best performance when iterating over the data, remember to resolve the indices in the reverse order as you defined it: innermost first, outermost last.
B) If you instead wish to keep things as small as possible, use a hash map, and use one which is optimized for space. That would result in space O(N), with N being the number of elements. Here is a benchmark comparing several hash maps. I made good experiences with google::sparse_hash_map, which has the smallest constant overhead I have seen so far. Plus, it is easy to add it to your build system.
If you need a mixture of speed and small data or don't know the size of each dimension in advance, use a hash map as well.
Answer to 2nd question
I'd say your data is 4D if you have a variable number of elements along the 4th dimension, or a fixed large number of elements. With option 1B) you'd indeed add the bucket index; for 1A) you'd add another vector.
Which is the nearest non-empty ObjC-bucket to position x,y,z?
This operation is commonly called nearest neighbor search. You want a KDTree for that. There is libkdtree++, if you prefer small libraries. Otherwise, FLANN might be an option. It is a part of the Point Cloud Library which accomplishes a lot of tasks on multidimensional data and could be worth a look as well.

C++: Best Way To Hold Entity List vector vs set [duplicate]

This question already has answers here:
What is the difference between std::set and std::vector?
(5 answers)
Closed 9 years ago.
I have implemented an entity-component system to manage my entities in a game engine, managing components using std::map. Now the main part is to hold all entities in the CWorld class. Those entities can be accessed directly by index through CWorld, to add components, etc. Also, all entities will be iterated over each tick for update and render operations. And the list will not be static, as you can guess; entities can die, so they need to be removed from the list at some point. At this point, I need to ask about the differences between std::vector and std::set, or any suggestion for holding entities.
If you want to access the elements by index, and if old elements may die such that the range of active indices may contain gaps, then a map is probably the easiest solution. Map also allows you to iterate and remove single entities easily.
List is easy to remove a single entity but hard to look-up a specific entity by index.
Vector is difficult to remove a single entity without leaving gaps. By plugging the gap, the index of entities after the removal will change.
While you would be better off asking this spin on the question on the Game Development Stack Exchange, I'll add this: the STL only makes promises about scaling; it does not make promises about actual costs. O(1) is the right-hand side of an expression, "x = O(1)", where "O" is the cost of "an operation". "fopen" is an operation that costs O(1). Is O(1) cheap?
You'll need to run perf analysis on your code and look at the specs for each of the classes - you may even need to write your own, since the STL containers are generic and not optimized for any specific usage case.
The thing to focus on is frequency of operations - read, write, manage. Assuming a tick is every 10ms, it seems unlikely that you are going to remove entities anywhere near as often as you walk the list. Are you implementing your own scenegraph? If so, the granularity is going to determine how often entities move between spaces, and that should be a factor in the patterns you employ: if the frequency of management overhead is sufficiently low, you can afford to have management structures that have a high cost, but will still want to avoid something that's O(N) unless O() is very, very cheap.

How to find largest values in C++ Map

My teacher in my Data Structures class gave us an assignment to read in a book and count how many words there are. That's not all; we also need to display the 100 most common words. My gut says to sort the map, but I only need 100 words from the map. After googling around, is there a "textbook answer" to sorting maps by the value and not the key?
I doubt there's a "Textbook Answer", and the answer is no: you can't sort maps by value.
You could always create another map using the values. However, this is not the most efficient solution. What I think would be better is for you to chuck the values into a priority_queue, and then pop the first 100 off.
Note that you don't need to store the words in the second data structure. You can store pointers or references to the word, or even a map::iterator.
Now, there's another approach you could consider. That is to maintain a running order of the top 100 candidates as you build your first map. That way there would be no need to do the second pass and build an extra structure which, as you pointed out, is wasteful.
To do this efficiently, you would probably use a heap-like approach and bubble up whenever you update a value. Since the word counts only ever increase, this suits a heap very nicely. However, you would have a maintenance issue on your hands: how to track each value's position in the heap, and how to keep track of values that fall off the bottom.

How to repeatedly insert elements into a sorted list fast

I do not have formal CS training, so bear with me.
I need to do a simulation, which can be abstracted away to the following (omitting the details):
We have a list of real numbers representing the times of events. In
each step, we
remove the first event, and
as a result of "processing" it, a few other events may get inserted into the list at a strictly later time
and repeat this many times.
Questions
What data structure / algorithm can I use to implement this as efficiently as possible? I need to increase the number of events/numbers in the list significantly. The priority is to make this as fast as possible for a long list.
Since I'm doing this in C++, what data structures are already available in the STL or boost that will make it simple to implement this?
More details:
The number of events in the list is variable, but it's guaranteed to be between n and 2*n where n is some simulation parameter. While the event times are increasing, the time-difference of the latest and earliest events is also guaranteed to be less than a constant T. Finally, I suspect that the density of events in time, while not constant, also has an upper and lower bound (i.e. all the events will never be strongly clustered around a single point in time)
Efforts so far:
As the title of the question says, I was thinking of using a sorted list of numbers. If I use a linked list for constant time insertion, then I have trouble finding the position where to insert new events in a fast (sublinear) way.
Right now I am using an approximation where I divide time into buckets and keep track of how many events are in each bucket. Then I process the buckets one by one as time "passes", always adding a new bucket at the end when removing one from the front, thus keeping the number of buckets constant. This is fast, but only an approximation.
A min-heap might suit your needs. There's an explanation here, and I think the STL provides priority_queue for you.
Insertion is O(log N); removal is O(log N).
It sounds like you need/want a priority queue. If memory serves, the priority queue adapter in the standard library is written to retrieve the largest items instead of the smallest, so you'll have to specify that it use std::greater for comparison.
Other than that, it provides just about exactly what you've asked for: the ability to quickly access/remove the smallest/largest item, and the ability to insert new items quickly. While it doesn't maintain all the items in order, it does maintain enough order that it can still find/remove the one smallest (or largest) item quickly.
I would start with a basic priority queue, and see if that's fast enough.
If not, then you can look at writing something custom.
http://en.wikipedia.org/wiki/Priority_queue
A binary search tree is always sorted and has faster access times than a linear list. Search, insert, and delete are O(log n) for a balanced tree.
But it depends on whether the items have to be sorted all the time, or only after the process is finished. In the latter case a hash table is probably faster: at the end of the process you would copy the items to an array or a list and sort that.