What is an efficient way of threading an octree such that the pointers contained in each octcell of an oct make traversal through the tree at the same level easy?
We have to use fully threaded trees here so that I can use OpenMP to parallelize the code at the same level.
I have some experience with octrees and have coded several myself. The basic problem is that the tree has (at least) two directions of traversal, horizontal (between daughter cells) and vertical (between mother and daughter cells), which cannot both be mapped to linear memory. Thus, traversing the tree (for example for neighbour search) will inevitably result in cache misses.
For the most efficient implementation, all (up to 8) daughter cells of a non-final cell should be in one contiguous block of memory, which avoids both cache misses when traversing over them and the need to link them up with pointers. Each cell then only needs one pointer/index to its first daughter cell and, possibly (depending on the needs of your application), a pointer to its mother cell.
Similarly, any particles/positions sorted by the tree should be ordered such that all those contained within a cell are contiguous in memory, at all tree levels. Then each cell only has to store the index of its first particle and the number of particles, which allows access to all of them at every level of the tree (not just final cells).
In practice, such an ordering can be achieved by first building a fully linked tree and then mapping it to the form described above. The overhead of this mapping is minor, but the gain in applications is substantial.
Finally, when re-building the tree with only slightly changed particle positions, it makes for a significant speed up (depending on your algorithm) to feed the particles in the previous tree order to the tree building algorithm.
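A minimal C++ sketch of the layout described above, under assumptions of my own: the names Particle, Cell, LinearOctree, build and forEachCellAtLevel are hypothetical, and the tree is built breadth-first directly rather than via a fully linked tree. Daughters of a cell are stored contiguously and reached through a single firstDaughter index, each cell refers to its particles by first index and count, and because cells are appended level by level, every level is one contiguous index range that an OpenMP parallel for can split across threads. It stores no neighbour pointers, so it is not a complete fully threaded tree.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

struct Particle { double x, y, z; };

// One cell of a linear octree. Daughters of a cell are contiguous in `cells`,
// and the particles of a cell are contiguous in the particle array.
struct Cell {
    double cx = 0, cy = 0, cz = 0, half = 0; // centre and half-width
    int32_t firstDaughter = -1;              // index of first daughter, -1 for a leaf
    int32_t numDaughters = 0;                // up to 8, stored consecutively
    int32_t firstParticle = 0;               // particles [firstParticle,
    int32_t numParticles = 0;                //            firstParticle + numParticles)
};

struct LinearOctree {
    std::vector<Cell> cells;
    std::vector<int32_t> levelStart; // level l = cells [levelStart[l], levelStart[l + 1])
};

inline int octant(const Cell& c, const Particle& p) {
    return (p.x >= c.cx ? 1 : 0) | (p.y >= c.cy ? 2 : 0) | (p.z >= c.cz ? 4 : 0);
}

// Breadth-first build: cells are appended level by level, so every level is a
// contiguous index range. `pts` is reordered in place so each cell's particles are contiguous.
LinearOctree build(std::vector<Particle>& pts, double cx, double cy, double cz,
                   double half, int leafSize, int maxDepth) {
    assert(leafSize >= 1);
    LinearOctree t;
    Cell root;
    root.cx = cx; root.cy = cy; root.cz = cz; root.half = half;
    root.numParticles = static_cast<int32_t>(pts.size());
    t.cells.push_back(root);
    t.levelStart = {0, 1};
    std::vector<Particle> tmp(pts.size());
    for (int level = 0; level < maxDepth; ++level) {
        const int32_t begin = t.levelStart[level], end = t.levelStart[level + 1];
        for (int32_t i = begin; i < end; ++i) {
            const Cell c = t.cells[i]; // copy: push_back below may reallocate
            if (c.numParticles <= leafSize) continue;
            std::array<int32_t, 8> count{}, offset{};
            for (int32_t k = 0; k < c.numParticles; ++k) ++count[octant(c, pts[c.firstParticle + k])];
            for (int o = 1; o < 8; ++o) offset[o] = offset[o - 1] + count[o - 1];
            std::array<int32_t, 8> next = offset; // counting sort of the cell's particles by octant
            for (int32_t k = 0; k < c.numParticles; ++k) {
                const Particle& p = pts[c.firstParticle + k];
                tmp[c.firstParticle + next[octant(c, p)]++] = p;
            }
            std::copy(tmp.begin() + c.firstParticle,
                      tmp.begin() + c.firstParticle + c.numParticles, pts.begin() + c.firstParticle);
            t.cells[i].firstDaughter = static_cast<int32_t>(t.cells.size());
            const double q = c.half / 2;
            for (int o = 0; o < 8; ++o) {
                if (count[o] == 0) continue; // only non-empty daughters are stored
                Cell d;
                d.cx = c.cx + ((o & 1) ? q : -q);
                d.cy = c.cy + ((o & 2) ? q : -q);
                d.cz = c.cz + ((o & 4) ? q : -q);
                d.half = q;
                d.firstParticle = c.firstParticle + offset[o];
                d.numParticles = count[o];
                t.cells.push_back(d);
                ++t.cells[i].numDaughters;
            }
        }
        if (static_cast<int32_t>(t.cells.size()) == end) break; // nothing was split
        t.levelStart.push_back(static_cast<int32_t>(t.cells.size()));
    }
    return t;
}

// Visits every cell of one level. The cells of a level are contiguous and independent,
// so an OpenMP parallel for (compile with -fopenmp) can split them across threads.
template <class F>
void forEachCellAtLevel(const LinearOctree& t, int level, F f) {
    const int32_t begin = t.levelStart[level], end = t.levelStart[level + 1];
#pragma omp parallel for
    for (int32_t i = begin; i < end; ++i) f(i, t.cells[i]);
}
```

Since build reorders pts in place, passing the same vector back in after the particles have moved slightly feeds them to the next build in the previous tree order, as suggested above.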
I need to represent a tree with multiple branches per node. What structure should I use? It's for computing chess game states. It explodes exponentially so memory will be a concern. I'm using C++11 but am open to other standards. Also, pruning should be O(1).
EDIT1
To expand, I am going to be holding a Chess AI competition. The main PvP game is complete already, and I am programming the AI API next. Contestants will write their own AI, and then we will have them compete in a tournament. The winner's AI will be used in Player vs Computer games. I am just thinking about the best structure to store my game states and AI thoughts.
I was reading up on Deep Blue, and it thinks from 5 to ~25 moves ahead. I can imagine most computers being capable of handling 5 moves deep with BFS, but for anything deeper I believe I will have to use DFS.
AIs will be timed, and competing AIs will only be run locally, so as not to introduce advantages in CPU power.
I am reading up on Monte Carlo and Alpha Beta searches now.
My initial thoughts on a data structure are as follows:
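#include <set>

typedef unsigned char ChessPosT; // assumed: a one-byte square index (0-63)

union CHESS_MOVE {
    unsigned short m;  // both squares packed into 2 bytes
    ChessPosT pos[2];  // the same 2 bytes seen as two square indices
    ///...
};

class ChessMoveNode {
    CHESS_MOVE move;
    // std::set needs an operator< on ChessMoveNode before anything can be inserted.
    std::set<ChessMoveNode> nextmoves;
};

class ChessMoveTree {
    std::set<ChessMoveNode> next;
};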
The board can be calculated at any time by replaying the path from the root to the leaf, although recalculating the board could get very expensive over time. Ideas? Should I store the board? The board is stored as an array of 64 char indices holding a piece number, so it's 64 bytes, compared to 2 for a move, but the extra memory would save a lot of recalculation of the board state.
For my own personal AI, I will be implementing a board scoring function that will rank the game states; all non-maximal game states will then be discarded, and game states invalidated by choosing a move will be pruned as well.
One simple approach that works well for Monte-Carlo Tree Search (MCTS) is to use a vector of some custom class. Inside the class you keep whatever state information you need in addition to child information: the number of children and the index in the vector where they start. This avoids storing a separate pointer for each child, which can introduce significant overhead.
So, the root is at index 0. Inside that index there would be two integers indicating that the children start at index 1 and that there are k children. (From index 1 to k.) At index 1 the children would start at index k+1 with l total children, and so on throughout the tree.
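A minimal C++ sketch of this layout, assuming a hypothetical Node payload (a 2-byte move plus some MCTS statistics) and hypothetical names VectorTree and expand. Each node stores only the start index and count of its children, and all children of a node are appended in one block, so the root's k children occupy indices 1 to k, exactly as in the description.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical MCTS node: whatever state you need plus the child range.
struct Node {
    uint16_t move = 0;        // move leading to this node (assumed 2-byte encoding)
    uint32_t visits = 0;      // MCTS statistics
    double totalValue = 0.0;
    uint32_t firstChild = 0;  // children occupy [firstChild, firstChild + numChildren)
    uint32_t numChildren = 0; // 0 means not expanded yet
};

class VectorTree {
public:
    VectorTree() : nodes_(1) {} // root at index 0

    // Adds all children of `parent` at once as one contiguous block at the end.
    void expand(uint32_t parent, const std::vector<uint16_t>& moves) {
        const uint32_t first = static_cast<uint32_t>(nodes_.size());
        for (uint16_t m : moves) {
            Node n;
            n.move = m;
            nodes_.push_back(n);
        }
        // Index, not a reference taken earlier: push_back may have reallocated.
        nodes_[parent].firstChild = first;
        nodes_[parent].numChildren = static_cast<uint32_t>(moves.size());
    }

    const Node& operator[](uint32_t i) const { return nodes_[i]; }
    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<Node> nodes_;
};
```

Because a node's children must be added in a single expand call and are never removed, the sketch depends on the same assumptions that the answer goes on to list.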
This works really well based on the assumptions that (1) the number of children is fixed, (2) that they are all added at once, and (3) that states are not removed from the tree.
If you are trying to prune states from the tree, this doesn't work as well, because removing them leaves gaps in the tree. Using explicit pointers for storing each child is expensive, so something else is done in practice.
First, with alpha-beta search you typically search the tree with a DFS and don't store branches. But, you use a hash table to store states and check for duplicates. The branches of the tree can be implicitly computed from the state, so you can reconstruct the tree without storing everything explicitly.
Note, however, that hash tables (called transposition tables in the context of game tree search) are not typically used deep in the tree because there are many states and the cost of storing grows while the benefit of removing duplicates shrinks.
To summarize: assuming that you are doing something alpha-beta-like and you have a good reason to store the tree explicitly, I suggest storing the states in a hash table and leaving the edges to be computed implicitly by a move-generation function. (The function would apply each move and take the hash of the resulting state to find it in the hash table; if it isn't there, it has been pruned.)
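A minimal C++ sketch of that idea. To keep it self-contained it uses a toy game instead of chess (an assumption: the state is a pile of stones and a move removes one to three of them), and the names State, hashState, generateMoves, applyMove, Entry and storedChildren are hypothetical. Only states are stored, in a std::unordered_map keyed by a hash; the edges are recomputed by the move generator, and a child that is missing from the table is treated as pruned.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy stand-in for a game state (assumption): a pile of stones and the side to move.
struct State {
    int stones;
    int sideToMove; // 0 or 1
};

// Hypothetical hash; a chess program would typically use Zobrist hashing instead.
uint64_t hashState(const State& s) {
    return (static_cast<uint64_t>(s.stones) << 1) | static_cast<uint64_t>(s.sideToMove);
}

// Move generation: the edges of the tree are computed from the state, never stored.
std::vector<int> generateMoves(const State& s) {
    std::vector<int> moves;
    for (int take = 1; take <= 3 && take <= s.stones; ++take) moves.push_back(take);
    return moves;
}

State applyMove(const State& s, int take) { return State{s.stones - take, 1 - s.sideToMove}; }

struct Entry {
    int score;
    int depth;
};

using Table = std::unordered_map<uint64_t, Entry>;

// Children of `s` that are still in the table; absent ones count as pruned.
std::vector<State> storedChildren(const Table& table, const State& s) {
    std::vector<State> out;
    for (int m : generateMoves(s)) {
        State child = applyMove(s, m);
        if (table.count(hashState(child)) != 0) out.push_back(child);
    }
    return out;
}
```

With a real 64-bit board hash, two different positions can collide on the same key, so practical transposition tables usually also keep some verification data in each entry.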
I have a collection of points [(x1,y1),(x2,y2), ..., (xn,yn)] which are Morton-sorted. I wish to construct a quadtree from these points in parallel. My intuition is to construct a subtree on each core and merge all subtrees to form a complete quadtree. Can anyone provide some high-level insights or pseudocode on how I may do this efficiently?
First some thought on your plan:
Are you sure that parallelizing construction will help? I think there is a risk that you won't get much speedup. Quadtree construction is rather cheap on the CPU, so it will be partly bound by your memory bandwidth. Parallelization may not help much unless you have separate memory buses, for example separate machines.
If you want to parallelize construction on parallel machines, it may be cheapest to simply create separate quadtrees by splitting your point collection into evenly sized chunks. This has one big advantage over other solutions: when you want to insert more points, or look up points, the Morton order lets you determine quite efficiently which tree contains the point (or should contain it, for insertion). For window queries you can do a similar optimization: if the Morton codes of the 'min/min' and the 'max/max' corners of the query window lie in the same 'chunk' (sub-tree), then you only need to query this one tree. More optimizations are possible.
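A C++ sketch of the chunked approach, assuming the Morton codes have already been computed as 64-bit integers and sorted; chunkStarts, chunkFor and windowInSingleChunk are hypothetical names. Each chunk (one quadtree per machine or core) is identified by its first code, so a binary search over those start codes tells you which tree holds, or should receive, a point. The window check works because a Morton code is monotone in each coordinate, so every point inside the window has a code between the codes of its two corners.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits Morton-sorted codes into `numChunks` evenly sized chunks (one quadtree each)
// and records the first code of every chunk.
std::vector<uint64_t> chunkStarts(const std::vector<uint64_t>& sortedCodes, std::size_t numChunks) {
    std::vector<uint64_t> starts;
    const std::size_t n = sortedCodes.size();
    for (std::size_t c = 0; c < numChunks; ++c) {
        const std::size_t begin = c * n / numChunks;
        if (begin < n) starts.push_back(sortedCodes[begin]);
    }
    return starts;
}

// Index of the chunk (sub-tree) that contains, or should receive, a point with this code.
std::size_t chunkFor(const std::vector<uint64_t>& starts, uint64_t code) {
    auto it = std::upper_bound(starts.begin(), starts.end(), code);
    return it == starts.begin() ? 0 : static_cast<std::size_t>(it - starts.begin()) - 1;
}

// A window query touches only one tree if both corner codes map to the same chunk.
bool windowInSingleChunk(const std::vector<uint64_t>& starts, uint64_t minCode, uint64_t maxCode) {
    return chunkFor(starts, minCode) == chunkFor(starts, maxCode);
}
```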
If you really want to create a single quadtree on a single machine, there are several ways to split your dataset efficiently:
Walk through all points and identify the global min/max. Then walk through all points again and assign them (assuming 4 cores) to the cores, where each core represents a quadrant. These steps parallelize well by splitting the dataset into 4 evenly sized chunks, and the result is a quadtree that exactly fits your dataset. You will have to synchronize insertion into the trees, but since the dataset is Morton-ordered, there should be relatively few lock collisions.
You can completely avoid lock collisions during insertion by aligning the quadrants with Morton coordinates, such that the Morton curve (a z-curve) crosses the quadrant borders only once. Disadvantage: the tree will be imbalanced, i.e. it is unlikely that all quadrants contain the same amount of data. This means your CPUs may have considerably different workloads, unless you split the sub-trees into sub-sub-trees, and so on, to distribute the load better. The split planes that keep the z-curve from crossing quadrant borders can be identified from the Morton code (z-code) of your coordinates. Split the z-code into chunks of two bits; each two bits tell you which (sub-)quadrant to choose, i.e. 00 is lower/left, 01 is lower/right, 10 is upper/left and 11 is upper/right. Since your points are Morton-ordered, you can simply use binary search to find the chunks for each quadrant. I realize this may sound rather cryptic without more explanation, so maybe you can have a look at the PH-Tree; it is essentially a Z-ordered (Morton-ordered) quadtree (more a 'trie' than a 'tree'). There are also some in-depth explanations here and here (shameless self-advertisement). The PH-Tree has some nice properties, such as inherently limiting depth to 64 levels (for 64-bit numbers) while guaranteeing small nodes (4 entries max for 2 dimensions); it also guarantees, like the quadtree, that any insert/removal will never affect more than one node, plus possibly adding or removing a second node. There is also a C++ implementation here.
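To make the two-bit rule concrete, here is a C++ sketch assuming 16-bit integer coordinates with x in the even bits and y in the odd bits of a 32-bit code (morton2D, topQuadrant and quadrantBounds are hypothetical names). The top two bits then read 00 lower/left, 01 lower/right, 10 upper/left and 11 upper/right, and since the codes are sorted, each quadrant is one contiguous range whose bounds std::lower_bound finds, so each range can be handed to its own core. Applying the same step to the next two bits splits a quadrant further when the load is uneven.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleaves two 16-bit coordinates into a 32-bit Morton (z-order) code:
// x goes to the even bits, y to the odd bits.
uint32_t morton2D(uint16_t x, uint16_t y) {
    uint32_t code = 0;
    for (int bit = 0; bit < 16; ++bit) {
        code |= static_cast<uint32_t>((x >> bit) & 1u) << (2 * bit);
        code |= static_cast<uint32_t>((y >> bit) & 1u) << (2 * bit + 1);
    }
    return code;
}

// The top two bits select the quadrant: y bit high, x bit low.
inline uint32_t topQuadrant(uint32_t code) { return code >> 30; }

// With sorted codes each quadrant is a contiguous range [bounds[q], bounds[q + 1]),
// and the z-curve crosses each boundary only once.
std::array<std::size_t, 5> quadrantBounds(const std::vector<uint32_t>& sortedCodes) {
    std::array<std::size_t, 5> bounds{};
    for (uint32_t q = 0; q < 4; ++q) {
        auto it = std::lower_bound(sortedCodes.begin(), sortedCodes.end(), q << 30);
        bounds[q] = static_cast<std::size_t>(it - sortedCodes.begin());
    }
    bounds[4] = sortedCodes.size();
    return bounds;
}
```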
I am trying to implement a k-d tree to perform nearest neighbor and approximate nearest neighbor search in C++. So far I have come across two versions of the most basic k-d tree:
The one, where data is stored in nodes and in leaves, such as here
The one, where data is stored only in leaves, such as here
They seem to be fundamentally the same, having the same asymptotic properties.
My question is: are there reasons to choose one over the other?
I figured two reasons so far:
The tree which stores data in nodes too is shallower by 1 level.
The tree which stores data only in leaves makes the delete function easier to implement.
Are there some other reasons I should consider before deciding which one to make?
You can just mark nodes as deleted, and postpone any structural changes to the next tree rebuild. k-d-trees degrade over time, so you'll need to do frequent tree rebuilds. k-d-trees are great for low-dimensional data sets that do not change, or where you can easily afford to rebuild an (approximately) optimal tree.
As for implementing the tree, I recommend using a minimalistic structure. I usually do not use nodes; I use an array of data object references. The axis is defined by the current search depth, so there is no need to store it anywhere. Left and right children are given implicitly by the array layout, just like the steps of a binary search over the array. (If you do not derive the axis from the depth, just add a byte array, half the size of your dataset, to store the axes you used.) Loading the tree is done by a specialized QuickSort. In theory it's O(n^2) worst-case, but with a good pivot heuristic such as median-of-5 you get O(n log n) quite reliably and with minimal constant overhead.
While it doesn't hold as much for C/C++, in many other languages you will pay quite a price for managing a lot of objects. A type*[] is the cheapest data structure you'll find, and in particular it does not require a lot of management effort. To mark an element as deleted, you can null it, and search both sides when you encounter a null. For insertions, I'd first collect them in a buffer. And when the modification counter reaches a threshold, rebuild.
And that's the whole point of it: if your tree is really cheap to rebuild (as cheap as resorting an almost pre-sorted array!) then it does not harm to frequently rebuild the tree.
Linear scanning over a short "insertion list" is very CPU cache friendly. Skipping nulls is very cheap, too.
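A compact C++ sketch of this approach, restricted to 2-d points for brevity (Point, ImplicitKdTree and its methods are hypothetical names). The tree is just an array of Point*: the node for a range is its middle element and the axis is depth % 2. Loading uses std::nth_element in place of the specialised QuickSort. Deleting nulls a slot, a null node is searched on both sides, inserts go to a linearly scanned buffer, and a rebuild happens once the modification counter reaches a threshold.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

struct Point { double x, y; };

inline double coord(const Point* p, int axis) { return axis == 0 ? p->x : p->y; }
inline double dist2(const Point& a, const Point& b) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Implicit 2-d tree over an array of Point*: the node for [lo, hi) is the middle slot,
// its subtrees are [lo, mid) and [mid + 1, hi), and the split axis is depth % 2.
class ImplicitKdTree {
public:
    explicit ImplicitKdTree(std::size_t rebuildThreshold) : threshold_(rebuildThreshold) {}

    // Insertions go to a small buffer that is scanned linearly until the next rebuild.
    void insert(Point* p) {
        buffer_.push_back(p);
        if (++modifications_ >= threshold_) rebuild();
    }

    // Deletion nulls the slot (linear scan for brevity); restructuring waits for a rebuild.
    void remove(const Point* p) {
        for (Point*& s : tree_) if (s == p) s = nullptr;
        buffer_.erase(std::remove(buffer_.begin(), buffer_.end(), p), buffer_.end());
        if (++modifications_ >= threshold_) rebuild();
    }

    const Point* nearest(const Point& q) const {
        const Point* best = nullptr;
        double bestD = std::numeric_limits<double>::infinity();
        search(0, tree_.size(), 0, q, best, bestD);
        for (const Point* p : buffer_) {
            const double d = dist2(*p, q);
            if (d < bestD) { bestD = d; best = p; }
        }
        return best;
    }

    // Drops nulled slots, merges the buffer and re-partitions the array.
    void rebuild() {
        std::vector<Point*> all;
        for (Point* p : tree_) if (p != nullptr) all.push_back(p);
        all.insert(all.end(), buffer_.begin(), buffer_.end());
        buffer_.clear();
        tree_.swap(all);
        build(0, tree_.size(), 0);
        modifications_ = 0;
    }

private:
    // std::nth_element (introselect) stands in for the specialised QuickSort.
    void build(std::size_t lo, std::size_t hi, int depth) {
        if (hi - lo <= 1) return;
        const std::size_t mid = lo + (hi - lo) / 2;
        const int axis = depth % 2;
        std::nth_element(tree_.begin() + lo, tree_.begin() + mid, tree_.begin() + hi,
                         [axis](const Point* a, const Point* b) { return coord(a, axis) < coord(b, axis); });
        build(lo, mid, depth + 1);
        build(mid + 1, hi, depth + 1);
    }

    void search(std::size_t lo, std::size_t hi, int depth, const Point& q,
                const Point*& best, double& bestD) const {
        if (lo >= hi) return;
        const std::size_t mid = lo + (hi - lo) / 2;
        const Point* node = tree_[mid];
        if (node == nullptr) { // deleted: its split value is gone, so search both sides
            search(lo, mid, depth + 1, q, best, bestD);
            search(mid + 1, hi, depth + 1, q, best, bestD);
            return;
        }
        const double d = dist2(*node, q);
        if (d < bestD) { bestD = d; best = node; }
        const int axis = depth % 2;
        const double diff = coord(&q, axis) - coord(node, axis);
        const bool leftFirst = diff < 0;
        search(leftFirst ? lo : mid + 1, leftFirst ? mid : hi, depth + 1, q, best, bestD);
        if (diff * diff < bestD) // the far side can only help if the split line is closer
            search(leftFirst ? mid + 1 : lo, leftFirst ? hi : mid, depth + 1, q, best, bestD);
    }

    std::vector<Point*> tree_, buffer_;
    std::size_t threshold_, modifications_ = 0;
};
```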
If you want a more dynamic structure, I recommend looking at R*-trees. They are actually designed to balance on inserts and deletions, and they organize the data in a disk-oriented block structure. But even for R-trees, there have been reports that keeping an insertion buffer etc. to postpone structural changes improves performance. And bulk loading helps a lot in many situations, too!
The two ways commonly used to represent a graph in memory are an adjacency list and an adjacency matrix. An adjacency list is implemented using an array of pointers to linked lists. Is there any reason that this is faster than using a vector of vectors? I feel like a vector of vectors should make searching and traversals faster because backtracking would be a lot simpler.
The vector of linked adjacencies is a favorite textbook meme with many variations in practice. Certainly you can use vectors of vectors. What are the differences?
One is that links (double ones anyway) allow edges to be easily added and deleted in constant time. This obviously is important only when the edge set shrinks as well as grows. With vectors for edges, any individual operation may require O(k) where k is the incident edge count.
NB: If the order of edges in adjacency lists is unimportant for your application, you can easily get O(1) deletions with vectors. Just copy the last element to the position of the one to be deleted, then delete the last! Alas, there are many cases (e.g. where you're worried about embedding in the plane) when order of adjacencies is important.
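A minimal C++ sketch of the copy-the-last-element deletion, assuming adjacency lists stored as std::vector<std::vector<int>> and a hypothetical helper removeNeighbourAt that already knows the position i of the entry; finding i by scanning the list is still O(k).

```cpp
#include <cstddef>
#include <vector>

// Adjacency lists as a vector of vectors.
using Graph = std::vector<std::vector<int>>;

// O(1) removal of the i-th neighbour of u when the order of adjacencies does not matter:
// overwrite it with the last entry, then drop the last entry.
void removeNeighbourAt(Graph& g, int u, std::size_t i) {
    std::vector<int>& adj = g[u];
    adj[i] = adj.back();
    adj.pop_back();
}
```

In an undirected graph the matching entry in the other endpoint's list has to be removed as well.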
Even if order must be maintained, you can arrange for copying costs to amortize to an average that is O(1) per operation over many operations. Still in some applications this is not good enough, and it requires "deleted" marks (a reserved vertex number suffices) with compaction performed only when the number of marked deletions is a fixed fraction of the vector. The code is tedious and checking for deleted nodes in all operations adds overhead.
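One way to realise the deleted-mark scheme, sketched in C++ with hypothetical names (AdjacencyList, kDeleted) and an assumed compaction threshold of 25%: removal overwrites the entry with a reserved vertex number, every traversal skips the marks, and the vector is compacted, preserving order, once the marks reach that fraction of its size. Only the compaction cost is amortised here; the std::find in remove is still O(k).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr int kDeleted = -1; // reserved vertex number (assumes real ids are >= 0)

// Order-preserving adjacency list with lazy deletion: a removed entry is overwritten
// by kDeleted and squeezed out only once the marks reach a fixed fraction (25% here).
class AdjacencyList {
public:
    void add(int v) { items_.push_back(v); }

    void remove(int v) {
        auto it = std::find(items_.begin(), items_.end(), v);
        if (it == items_.end()) return;
        *it = kDeleted;
        if (++deleted_ * 4 >= items_.size()) compact();
    }

    // Every traversal must skip the marks; this is the per-operation overhead.
    template <class F>
    void forEach(F f) const {
        for (int v : items_)
            if (v != kDeleted) f(v);
    }

    std::size_t storageSize() const { return items_.size(); }

private:
    void compact() {
        items_.erase(std::remove(items_.begin(), items_.end(), kDeleted), items_.end());
        deleted_ = 0;
    }

    std::vector<int> items_;
    std::size_t deleted_ = 0;
};
```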
Another difference is overhead space. Adjacency list nodes are quite small: Just a node number. Double links may require 4 times the space of the number itself (if the number is 32 bits and both pointers are 64). For a very large graph, a space overhead of 400% is not so good.
Finally, linked data structures that are frequently edited over a long period may easily lead to highly non-contiguous memory accesses. This decreases cache performance compared to linear access through vectors. So here the vector wins.
In most applications, the difference is not worth worrying about. Then again, huge graphs are the way of the modern world.
As others have said, it's a good idea to use a generalized List container for the adjacencies, one that may be quickly implemented either with linked nodes or vectors of nodes. E.g. in Java, you'd use List and implement/profile with both LinkedList and ArrayList to see which works best for your application. NB ArrayList compacts the array on every remove. There is no amortization as described above, although adds are amortized.
There are other variations: Suppose you have a very dense graph, where there's a frequent need to search all edges incident to a given node for one with a certain label. Then you want maps for the adjacencies, where the keys are edge labels. Of course the maps can be hashes or trees or skiplists or whatever you like.
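A sketch of label-keyed adjacency in C++, with hypothetical names LabeledGraph, addEdge and follow, and the assumption that labels are unique among the edges leaving a vertex. std::unordered_map is used here; std::map would give the tree-based variant.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Each vertex keeps a map from edge label to target vertex, so finding the edge
// with a given label does not require scanning all incident edges.
class LabeledGraph {
public:
    explicit LabeledGraph(std::size_t n) : adj_(n) {}

    void addEdge(int from, const std::string& label, int to) { adj_[from][label] = to; }

    // Returns the target of the edge with this label, or -1 if there is none.
    int follow(int from, const std::string& label) const {
        auto it = adj_[from].find(label);
        return it == adj_[from].end() ? -1 : it->second;
    }

private:
    std::vector<std::unordered_map<std::string, int>> adj_;
};
```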
The list goes on. How to implement for efficient vertex deletion? As you might expect, there are alternatives here, too, each with advantages and disadvantages.
I have been looking for a quadtree/quadtree node implementation on the net for ages. There is some basic stuff, but nothing that I could really use in a game.
My purpose is to store objects in a game for processing things such as collision detection.
I am not 100% certain that a quadtree is the best data structure to use, but from what I have read it is. I have already coded a Red-Black tree, but I don't really know if the performance would be good enough for my game (which will be an adventure 3rd person game like Ankh).
How would I write a basic but complete quadtree class (or octree) in C++?
How would you use the quad tree for collisions?
Quadtrees are used when you only need to store things that are effectively on a plane, like units in a classic RTS, where they are all on the ground or just a little above it. Essentially, each node has links to 4 children that divide the node's space into four equal quarters.
Octrees do the same in all three dimensions rather than just two, so they have 8 child nodes and partition the space into eighths. They should be used when the game entities are distributed more evenly among all three dimensions.
If you are looking for a binary tree, like a red-black tree, then you want a data structure called a binary space partitioning tree (BSP tree), or a version of it called the k-d tree. These partition space into halves using a plane. In the k-d tree the planes are axis-aligned (XY, XZ or YZ planes), so it sometimes works better in a 3D scene. BSP trees divide the scene using planes in any orientation; they can be quite useful, and they were used as far back as Doom.
Because you have partitioned the game space, you no longer have to test every game entity against every other game entity to see if they collide, which is an O(n^2) algorithm. Instead, you query the data structure for the game entities within a sub-region of the game space, and only perform collision detection for those entities against each other.
This means that collision detection for all game entities should typically be an O(n log n) operation.
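A basic C++ sketch of such a quadtree for 2-d bounding boxes; AABB, QuadTree, the capacity of 4 items per node and the depth limit of 8 are assumptions, not taken from any particular library. An object that straddles a split line is kept in the parent node (one common way of handling the boundary problem), and query skips whole quadrants that cannot overlap the search region, so a broad phase only has to test each object against the ids returned for its own box.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

struct AABB {
    float minX, minY, maxX, maxY;
    bool intersects(const AABB& o) const {
        return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
    }
    bool contains(const AABB& o) const {
        return minX <= o.minX && o.maxX <= maxX && minY <= o.minY && o.maxY <= maxY;
    }
};

// Region quadtree of object ids with bounding boxes. Objects that straddle a split
// line stay in the parent node instead of being pushed down.
class QuadTree {
public:
    explicit QuadTree(const AABB& bounds, int depth = 0) : bounds_(bounds), depth_(depth) {}

    void insert(int id, const AABB& box) {
        if (children_[0]) {
            for (auto& c : children_)
                if (c->bounds_.contains(box)) { c->insert(id, box); return; }
            items_.push_back({id, box}); // straddles a split line
            return;
        }
        items_.push_back({id, box});
        if (items_.size() > kMaxItems && depth_ < kMaxDepth) split();
    }

    // Appends the ids of all objects whose boxes overlap `region`; quadrants that
    // do not overlap the region are skipped entirely.
    void query(const AABB& region, std::vector<int>& out) const {
        if (!bounds_.intersects(region)) return;
        for (const Item& it : items_)
            if (it.box.intersects(region)) out.push_back(it.id);
        if (children_[0])
            for (const auto& c : children_) c->query(region, out);
    }

private:
    struct Item { int id; AABB box; };
    static const std::size_t kMaxItems = 4; // assumed tuning values
    static const int kMaxDepth = 8;

    void split() {
        const float midX = (bounds_.minX + bounds_.maxX) / 2, midY = (bounds_.minY + bounds_.maxY) / 2;
        children_[0].reset(new QuadTree({bounds_.minX, bounds_.minY, midX, midY}, depth_ + 1)); // lower-left
        children_[1].reset(new QuadTree({midX, bounds_.minY, bounds_.maxX, midY}, depth_ + 1)); // lower-right
        children_[2].reset(new QuadTree({bounds_.minX, midY, midX, bounds_.maxY}, depth_ + 1)); // upper-left
        children_[3].reset(new QuadTree({midX, midY, bounds_.maxX, bounds_.maxY}, depth_ + 1)); // upper-right
        std::vector<Item> old;
        old.swap(items_);
        for (const Item& it : old) insert(it.id, it.box); // redistribute; straddlers stay here
    }

    AABB bounds_;
    int depth_;
    std::vector<Item> items_;
    std::array<std::unique_ptr<QuadTree>, 4> children_;
};
```

Querying with an object's own box also returns that object, which the caller skips. Objects stored in neighbouring nodes are returned too whenever their boxes overlap the query box.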
A couple of extra things to watch out for:
Make sure you test game entities from adjacent nodes, not just the ones in the current node, since they could still collide.
Rebalance the data structure after the entities have moved since you may have empty nodes in the data structure now, or ones that contain too many entities for good performance (also the degenerate case of all entities being in the same node).
A red-black tree is not a spatial index; it can only sort on a single ordinal key. A quadtree is (for two dimensions) a spatial index that allows fast lookup and elimination of points. An Octree does the same thing for three dimensions.
The reason to use a quadtree is because you can then split on x- and y-coordinates, an octree on x, y and z, making collision detection trivial.
Quadtree: if an element lies entirely in the top-left quadrant, it won't collide with one that lies entirely in the top-right, bottom-left or bottom-right quadrant.
It is a very basic class, so I don't understand what you are missing in implementations you found.
I would not write such a class, I'd just borrow it from a project with a suitable license.
I strongly suggest using a rendering engine, Ogre3D for instance. As far as I know, it supports octrees for scene management, and you can extend the octree-based class as you wish. I used to code the things I needed myself, but for complex projects it's just not the right way.
Trees in general are problematic for this in that any item inserted can lie on a boundary, and all the methods of dealing with that situation are fairly unsatisfactory.
You'll most likely want to sort your objects into moveable and static, and check anything that moved on a given frame against the static objects.
BSP trees are the accepted solution for static geometry (boundary cases are handled by splitting the object into two pieces). For dynamic objects, try something like Sort and Sweep (also known as Sweep and Prune).
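A minimal C++ sketch of sort and sweep on the x axis, with a hypothetical Box type and sortAndSweep function: boxes are sorted by their left edge, an active list keeps the boxes whose x-extent is still open, and only those pairs get the y-axis overlap test. This one-shot version re-sorts on every call; the usual game-loop variant keeps the sorted order from the previous frame and repairs it with insertion sort, which is cheap when objects move only a little.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Box { int id; float minX, minY, maxX, maxY; };

// Returns the id pairs of all boxes that overlap on both axes.
std::vector<std::pair<int, int>> sortAndSweep(std::vector<Box> boxes) {
    std::sort(boxes.begin(), boxes.end(), [](const Box& a, const Box& b) { return a.minX < b.minX; });
    std::vector<std::pair<int, int>> pairs;
    std::vector<const Box*> active;
    for (const Box& b : boxes) {
        // Boxes that end before this one starts cannot overlap it or any later box.
        active.erase(std::remove_if(active.begin(), active.end(),
                                    [&](const Box* a) { return a->maxX < b.minX; }),
                     active.end());
        for (const Box* a : active)
            if (a->minY <= b.maxY && b.minY <= a->maxY) pairs.emplace_back(a->id, b.id);
        active.push_back(&b);
    }
    return pairs;
}
```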
Right now STANN is the best open source implementation.
http://sites.google.com/a/compgeom.com/stann/