surrounding objects algorithm - c++

I'm working on a game where exactly one object may exist at location (x, y) where x and y are ints. For example, an object may exist at (0, 0) or it may not, but it is not possible for multiple objects to exist there at once.
I am trying to decide which STL container to use for the problem at hand and the best way to solve this problem.
Basically, I start with an object and its (x, y) location. The goal is to determine the tallest, largest possible rectangle based on that object's surrounding objects. The rectangle must be created by using all objects above and below the current object. That is, it must be the tallest that it can possibly be based on the starting object position.
For example, say the following represents my object grid and I am starting with the green object at location (3, 4):
Then, the rectangle I am looking for would be represented by the pink squares below:
So, assuming I start with the object at (3, 4) like the example shows, I will need to check if objects also exist at (2, 4), (4, 4), (3, 3), and (3, 5). If an object exists at any of those locations, I need to repeat the process for that object to find the largest possible rectangle.
These objects are rather rare and the game world is massive. It doesn't seem practical to just new a 2D array for the entire game world, since most of the elements would be empty. However, I need to be able to index into any position to check if an object is there at any time.
Instead, I thought about using a std::map like so:
std::map< std::pair<int, int>, ObjectData> m_objects;
Then, as I am checking the surrounding objects, I could use map::find() in my loop, checking if the surrounding objects exist:
if (m_objects.find(std::make_pair(3, 4)) != m_objects.end())
{
    // An object exists at (3, 4).
    // Add it to the list of surrounding objects.
}
I could potentially be making a lot of calls to map::find() if I decide to do this, but the map would take up much less memory than newing a 2D array of the entire world.
Does anyone have any advice on a simple algorithm I could use to find what I am looking for? Should I continue using a std::map or is there a better container for a problem like this?

How much data do you need to store at each grid location? If you are simply looking for a flag that indicates neighbors, you have at least two "low tech" solutions:
a) If your grid is sparse, how about each square keeps a neighbor list? So each square knows which of its neighboring squares are occupied. You'll have some work to do to maintain the lists when a square is occupied or vacated, but neighbor lists mean you don't need a grid map at all.
b) If the grid map locations are truly just points, use 1 bit per grid location. The resulting map will be 8 times smaller than one that uses a whole byte per grid point, and bit operations are lightning fast. A 10,000x10,000 map will take 100,000,000 bits, or 12.5 MB (approx.).
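For illustration, a minimal sketch of option (b), packing one occupancy bit per cell into a std::vector<bool> (the dimensions here are hypothetical):
#include <cstddef>
#include <vector>

const int W = 10000, H = 10000;  // hypothetical world dimensions
std::vector<bool> occupied((std::size_t)W * H, false); // packed: ~1 bit per cell, ~12.5 MB

inline bool is_occupied(int x, int y) { return occupied[(std::size_t)y * W + x]; }
inline void set_occupied(int x, int y, bool v) { occupied[(std::size_t)y * W + x] = v; }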

An improvement would be to use a hashmap, if possible. This would allow you to at least do your potentially extensive searches with an expected time complexity of O(1).
There's a thread here (Mapping two integers to one, in a unique and deterministic way) that goes into some detail about how to hash two integers together.
If your compiler supports C++11, you could use std::unordered_map. If not, boost has basically the same thing: http://www.boost.org/doc/libs/1_38_0/doc/html/boost/unordered_map.html
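For instance, one way to apply the "map two integers to one" idea from that thread is to pack both coordinates into a single 64-bit key, which sidesteps the need for a custom pair hash entirely (a sketch; ObjectData is the type from the question):
#include <cstdint>
#include <unordered_map>

struct ObjectData; // the per-object payload from the question

inline uint64_t coord_key(int x, int y)
{
    // unique and deterministic: x in the high 32 bits, y in the low 32
    return ((uint64_t)(uint32_t)x << 32) | (uint32_t)y;
}

std::unordered_map<uint64_t, ObjectData*> m_objects;

// expected O(1) per lookup:
// if (m_objects.find(coord_key(3, 4)) != m_objects.end()) { ... }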

You may want to consider a spatial data structure. If the data is 'sparse', as you say, then doing a quadtree neighbourhood search might save you a lot of processing power. I would personally use an R-tree, but that's most likely because I have an R-tree library that I've written and can easily import.
For example, suppose you have a 1000x1000 grid with 10,000 elements. Assuming, for the moment, a uniformly random distribution, we would (based on the density) expect no more than, say, a chain of three to five objects touching in either dimension (at this density, a chain of three vertically-oriented objects occurs with probability around 0.01%). Suppose the object under consideration is located at (x,y). A window search, starting at (x-5,y-5) and going to (x+5,y+5), would give you a list of at most 121 elements to perform a linear search through. If your rect-picking algorithm notices that it would be possible to form a taller rectangle (i.e. if a rect under consideration touches the edges of this 11x11 bounding box), just repeat the window search for another 5x5 region in one direction of the original. Repeat as necessary.
This, of course, only works well when you have extremely sparse data. It might be worth adapting an R-tree such that the leaves are an assoc. data structure (i.e. Int -> Int -> Object), but at that point it's probably best to just find a solution that works on denser data.
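As a rough sketch of that window search, here is the brute-force version against a hash set of occupied cells (standing in for the R-tree query; all names are my own). With r = 5 this checks the 11x11 box from the example, at most 121 membership tests:
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

typedef std::pair<int, int> Cell;

struct CellHash {
    std::size_t operator()(const Cell &c) const {
        return std::hash<long long>()(((long long)c.first << 32) ^ (unsigned int)c.second);
    }
};

// Collect every occupied cell in the (2r+1)x(2r+1) box centred on (x, y).
std::vector<Cell> window_search(const std::unordered_set<Cell, CellHash> &occupied,
                                int x, int y, int r)
{
    std::vector<Cell> hits;
    for (int cx = x - r; cx <= x + r; ++cx)
        for (int cy = y - r; cy <= y + r; ++cy)
            if (occupied.count(Cell(cx, cy)))
                hits.push_back(Cell(cx, cy));
    return hits;
}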
I'm likely over-thinking this; there is likely a much simpler solution around somewhere.
Some references on R-trees:
The original paper, for the original algorithms.
The Wikipedia page, which has some decent overview on the topic.
The R-tree portal, for datasets and algorithms relating to R-trees.
I'll edit this with a link to my own R-tree implementation (public domain) if I ever get around to cleaning it up a little.

This sounds suspiciously like a homework problem (because it's got that weird condition "The rectangle must be created by using all objects above and below the current object" that makes the solution trivial). But I'll give it a shot anyway. I'm going to use the word "pixel" instead of "object", for convenience.
If your application really deserves heavyweight solutions, you might try storing the pixels in a quadtree (whose leaves contain plain old 2D arrays of just a few thousand pixels each). Or you might group contiguous pixels together into "shapes" (e.g. your example would consist of only one "shape", even though it contains 24 individual pixels). Given an initial unstructured list of pixel coordinates, it's easy to find these shapes; google "union-find". The specific benefit of storing contiguous shapes is that when you're looking for largest rectangles, you only need to consider those pixels that are in the same shape as the initial pixel.
A specific disadvantage of storing contiguous shapes is that if your pixel-objects are moving around (e.g. if they represent monsters in a roguelike game), I'm not sure that the union-find data structure supports incremental updates. You might have to run union-find on every "frame", which would be pretty bad.
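For reference, a bare-bones union-find over pixel coordinates might look like this (a sketch with my own names, not a full implementation):
#include <map>
#include <utility>

typedef std::pair<int, int> Pixel;

std::map<Pixel, Pixel> parent; // each pixel points toward its shape's representative

Pixel find_root(Pixel p)
{
    while (parent[p] != p) {
        parent[p] = parent[parent[p]]; // path halving keeps the trees shallow
        p = parent[p];
    }
    return p;
}

void unite(Pixel a, Pixel b) // call once per pair of adjacent pixels
{
    parent[find_root(a)] = find_root(b);
}

// Seed parent[p] = p for every pixel first; afterwards, two pixels belong to
// the same contiguous shape exactly when find_root returns the same result.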
Anyway... let's just say you're using a std::unordered_map<std::pair<int,int>, ObjectData*>, because that sounds pretty reasonable to me. (You should almost certainly store pointers in your map, not actual objects, because copying around all those objects is going to be a lot slower than copying pointers.)
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

struct ObjectData; // the game's per-object payload, from the question

typedef std::pair<int, int> Pt;
typedef std::pair<Pt, Pt> Rectangle;

/* std::unordered_map has no default hash for std::pair, so supply one. */
struct PtHash {
    std::size_t operator()(const Pt &p) const {
        return std::hash<long long>()(((long long)p.first << 32) ^ (unsigned int)p.second);
    }
};

std::unordered_map<Pt, ObjectData *, PtHash> myObjects;

/* This helper function checks a whole vertical stripe of pixels. */
static bool all_pixels_exist(int x, int min_y, int max_y)
{
    assert(min_y <= max_y);
    for (int y = min_y; y <= max_y; ++y) {
        if (myObjects.find(Pt(x, y)) == myObjects.end())
            return false;
    }
    return true;
}

Rectangle find_tallest_rectangle(int x, int y)
{
    assert(myObjects.find(Pt(x, y)) != myObjects.end());
    int top = y;
    int bottom = y;
    while (myObjects.find(Pt(x, top - 1)) != myObjects.end()) --top;
    while (myObjects.find(Pt(x, bottom + 1)) != myObjects.end()) ++bottom;
    // We've now identified the first vertical stripe of pixels.
    // The next step is to "paint-roller" that stripe to the left as far as possible...
    int left = x;
    while (all_pixels_exist(left - 1, top, bottom)) --left;
    // ...and to the right.
    int right = x;
    while (all_pixels_exist(right + 1, top, bottom)) ++right;
    // Corners in consistent (x, y) order: top-left, then bottom-right.
    return Rectangle(Pt(left, top), Pt(right, bottom));
}
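As a hypothetical call site: given the example's starting object, Rectangle r = find_tallest_rectangle(3, 4); would leave r.first holding the top-left (x, y) corner and r.second the bottom-right.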

Related

How to apply Data-Oriented Design when a structure contains a varying number of elements in an inner vector?

I would like to apply Data-Oriented Design (based on e.g. this article) to my simple physics engine, and I'm focused on optimizing the collision testing, as it is the most expensive part.
I've organized the bounding spheres that may collide with the player into a single vector:
struct Sphere { // I don't split the sphere into parts,
                // as I usually access both position and radius in my calculations
    Point3D position;
    float radius;
};

std::vector<Sphere> spheres;
and I test collisions with them inside a single function/method. Everything looks clear to me up to that point.
The problem is, I also have some more general structures like:
struct Polygon { // it may e.g. represent an area, or be used for more precise tests
    std::vector<Point2D> points;
};
I guess it won't be good practice to just create a std::vector<Polygon> the same way, as the nested vector (points) will take up a lot of memory (reserving it).
On the other hand, I cannot assume that there are always 2, 3, 4 or 10 points (it varies a lot, with a maximum of about 20, but usually far fewer).
And I do not want to switch from the general Polygon structure to e.g. a series of triangles (as the polygon is faster than separate triangles in many calculations).
What should I do then? I want to go with the spirit of Data-Oriented Design and use the memory/cache efficiently with my Polygon.
Do I have to get rid of the inner vector (points)? If so, how?
There can't be a definitive answer to this question, as it would require knowledge about the way you access the data: when it is initialized, whether it can change after the initialization stage, and many other things. However, if your data is relatively stable and you access your polygons in a consistent manner, just iterating over all polygons or the polygons belonging to one particular object, one approach is to put the points of all polygons into one separate vector and have each polygon store only the begin and end indices into that array.
This way, there are a couple of things that need to be accessed during traversal. First, the indices stored in the polygons. Second, the points themselves. Both of these accesses are likely to be cache-friendly if the polygons are also laid out in a vector. A similar approach can be applied to polygon sets: just store the polygons in a vector and keep a (begin, end) pair in your game objects.
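A minimal sketch of that layout, assuming the Point2D type from the question (PolygonRef is my name for the index pair):
#include <cstddef>
#include <vector>

struct Point2D { float x, y; }; // stand-in for the question's type

struct PolygonRef {
    std::size_t begin; // index of this polygon's first point in the shared array
    std::size_t end;   // one past its last point
};

std::vector<Point2D> allPoints;   // every polygon's points, stored back to back
std::vector<PolygonRef> polygons; // each polygon is just a (begin, end) pair

// Traversing polygon i touches one contiguous, cache-friendly range:
// for (std::size_t p = polygons[i].begin; p != polygons[i].end; ++p)
//     process(allPoints[p]);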

What type of Container should I use to hold a 2D Tile Based World's Tile Objects?

I have a 2D tile based world, which is created using a simple array: World[100][100].
I plan on making the game multiplayer, so I figure it would be good to have the SERVER send the CLIENT all the tiles surrounding the player, and only the information that the CLIENT needs: What texture/sprite to render, and what position to render it.
To do this, I am assuming it would be good to make my own TILE class, which has a tileType (Grass, Sand, Water, etc.) and holds an array of 10 TileObjects.
TileObject is a class that holds three variables: an objectType (is it a Character? An item? A tree? A rock?), an int objectID (which is linked to whatever the object actually is, so the game knows the Character is "Leeroy Jenkins" and can send Leeroy's animation and direction to the CLIENT for rendering), and an objectPosition (X and Y within the Tile; this can extend beyond the Tile if needed).
Although with this I am not sure how I would handle objects or characters that are larger than a single tile (such as a Dragon whose collision covers many tiles), but it sounds like the best design.
What type of container should I use to store the TileObjects in the TILE class? Right now I have an array, but I doubt that is good for performance, right? Some tiles may have 0 TileObjects, while others may have 5+. I used 10 because I severely doubt anything will ever exceed 10.
class Tile
{
private:
    TileObject TileObjects[10];      // up to 10 objects on a single tile
    TileTerrainType tileTerrainType; // GFX to display Grass, Sand, Water, Swamp, etc.
};
I have read many different tutorials and books, which argue for completely different container types: vectors, maps, linked lists, arrays. I just do not know what is best for storing TileObjects (some of which may move constantly, can be destroyed or added, like dropping/picking up items, and some of which remain stationary, like a tree or rock).
I think you should have a map from co-ordinates on the world to a vector of what things are contained at those co-ordinates in the world.
As in, if I have a class Thing that represents any thing that can be on a space in the game, and a class Position with x and y parameters, I would have this:
map<Position, vector<Thing>>
initialized with one vector<Thing>, initially empty, for every position in the game (all 10000 of them).
This gives you a useful property:
Spatial partitioning. If I want to figure out what things are around me, I just have to check in the map at nine different positions. If I were to go the other way, and have one big list of Things unpartitioned by where they are, I would have to look at every single Thing in the game to make sure I'd see every single Thing that might be near me.
This provides huge speedups: 1) the larger the map gets, 2) the more Things exist, and 3) the more complex the local interactions you demand.
What if you have a ThingID and need to get the Thing associated with it, not knowing where it is? To solve this, you can have a big map<int, Thing> of ThingIDs to Things and look in there. Of course, we see some redundancy here - we could either store the Thing in the flat map, or in the by-Position map, and have the other one just be a reference to the other (in the former case, containing just current position - in the latter case, containing just ThingID) and they must be kept in sync.
This could even lead to the 'dragon' problem being solved - if Things are in the flat map and merely references are stored in the by-position map, your Dragon could have one entry in the flat map and four references (one for each of the positions it occupies) in the by-position map, all pointing to the same thing. Then you just have to write code that is careful to not interact with an object twice, just because it happened to find it in two positions it was considering!
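Here is a rough sketch of that flat-map-plus-references arrangement (all names are illustrative):
#include <map>
#include <vector>

struct Position {
    int x, y;
    bool operator<(const Position &o) const // std::map keys need an ordering
    {
        return x < o.x || (x == o.x && y < o.y);
    }
};

struct Thing { int id; /* sprite, stats, ... */ };

std::map<int, Thing> thingsById;               // the flat map: one entry per Thing
std::map<Position, std::vector<int>> thingsAt; // by-position map holds only ThingIDs

// A dragon covering four tiles gets one entry in thingsById and its ID pushed
// into thingsAt at each of the four Positions; all of them resolve to one Thing.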

How to find right data structure for a searching application?

My question can be asked in two different aspects: one is from data structure perspective, and the other is from image processing perspective. Let's begin with the data structure perspective: suppose now I have a component composed of several small items as the following class shows:
class Component
{
public:
    struct Point
    {
        float x_;
        float y_;
    };
    Point center;
    Point bottom;
    Point top;
};
In the above example, the Component class is composed of member variables such as center, bottom and top (small items).
Now I have a stack of components (the number of components is between 1000 and 10000), and each component in the stack has been assigned different values, which means there are no duplicate components in the stack. Then, if one small item in the component, for example 'center' in the illustrated class, is known, we can find the unique component in the stack. After that, we can retrieve the other properties of the component. My question is: how do I build the right container data structure to make the searching easier? Right now I am considering a vector with the find algorithm from the STL (pseudocode):
vector<Component> comArray;
comArray.push_back(component1);
.....
comArray.push_back(componentn);
find(comArray.begin(), comArray.end(), center);
I was wondering whether there are more efficient containers to solve this problem.
I can also explain my question from an image processing perspective. In image processing, connected component analysis is a very important step for object recognition. For my application I can obtain all the connected components in the image, and I have found that interesting objects should fulfill the following requirement: their connected component centers should be in a specific range. Given this constraint, I can eliminate many connected components and then work on the candidate ones. The key step in the above procedure is how to search for candidate connected components when the central coordinate constraint is given. Any idea will be appreciated.
If you need to be able to get them rather fast, here's a slightly strange solution for you.
Note that it is a bad solution generally speaking, but it may suit you.
You could make an ordinary vector<Component>. Or it can even be a simple array. Then make three maps:
map<Point, Component*> center;
map<Point, Component*> bottom;
map<Point, Component*> top;
Fill them with all the available values of center, bottom and top as keys, and provide pointers to the corresponding Components as values (you could also use just indexes into the vector, so it would be map<Point, int>).
After that, you just use center[Key], bottom[Key] or top[Key], and get either your value (if you store pointers), or the index of your value in the array (if you store indexes).
I wouldn't use such an approach often, but it could work if the values will not change (so you can fill the index maps once), and the data amount is rather big (so searching through an unsorted vector is slow), and you will need to search often.
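Spelled out, the scheme needs an ordering on Point before it can be a map key; a sketch using indexes as values (as suggested above), assuming the Component class from the question:
#include <map>
#include <vector>

bool operator<(const Component::Point &a, const Component::Point &b)
{
    // strict weak ordering, required by std::map
    return a.x_ < b.x_ || (a.x_ == b.x_ && a.y_ < b.y_);
}

std::vector<Component> components;        // the actual storage
std::map<Component::Point, int> byCenter; // center -> index into components

// Fill once, after the stack of components is final:
// for (int i = 0; i < (int)components.size(); ++i)
//     byCenter[components[i].center] = i;
// Lookup: Component &c = components[byCenter[someCenter]];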
You probably want spatial indexing data structures.
I think you want to use a map or a hash_map to efficiently lookup your component based on a "center" value.
std::map<Component::Point, Component> lookuptable; // requires operator< for Component::Point
lookuptable[component1.center] = component1;
....
auto iterator = lookuptable.find(someCenterValue);
if (iterator != lookuptable.end())
{
    componentN = iterator->second;
}
As for finding elements in your set that are within a given coordinate range, there are several ways to do this. One easy way is to keep two sorted arrays of the component list, one sorted on the X axis and the other on the Y axis. To find the matching elements, you do a binary search on either axis for the element closest to your target, then scan up and down the array until you go out of range. You could also look at using a kd-tree to find all the nearest neighbors.
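A sketch of the sorted-array variant with std::lower_bound/std::upper_bound (assumes the question's Component class and C++11 lambdas; names are my own):
#include <algorithm>
#include <vector>

// 'sorted' must already be sorted by center x, e.g. with:
//   std::sort(sorted.begin(), sorted.end(),
//             [](const Component &a, const Component &b) { return a.center.x_ < b.center.x_; });
void centers_in_x_range(const std::vector<Component> &sorted, float lo, float hi)
{
    auto first = std::lower_bound(sorted.begin(), sorted.end(), lo,
        [](const Component &c, float v) { return c.center.x_ < v; });
    auto last = std::upper_bound(sorted.begin(), sorted.end(), hi,
        [](float v, const Component &c) { return v < c.center.x_; });
    for (auto it = first; it != last; ++it) {
        // candidate in x-range; filter on y (or consult the Y-sorted array) next
    }
}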
If you build the collection once and mostly look things up afterwards, I think std::set is a good choice for your code (lookups are O(log n)):
set<Component> comArray;
comArray.insert(component1);
.....
comArray.insert(componentn);
set<Component>::iterator iter = comArray.find(probe); // probe: a Component whose center is the search key
Of course, you should write operator< for class Component (comparing the nested struct Point), since std::set needs an ordering to store and find elements.

Bidirectional data structure for this situation

I'm studying a small part of my game engine and wondering how to optimize some parts.
The situation is quite simple and it is the following:
I have a map of Tiles (stored in a two-dimensional array) (~260k tiles, but assume many more)
I have a list of Items, each of which is always in exactly one Tile
A Tile can logically contain an unlimited number of Items
During game execution many Items are continuously created, and each starts from its own Tile
Every Item continuously changes its Tile to one of the neighbors (up, right, down, left)
Up to now every Item has a reference to its actual Tile, and I just keep a list of items.
Every time an Item moves to an adjacent tile I just update item->tile = .. and I'm fine. This works fine but it's unidirectional.
While extending the engine I realized that I have to find all items contained in a tile many times and this is effectively degrading the performance (especially for some situations, in which I have to find all items for a range of tiles, one by one).
This means I would like to find a data structure suitable to find all the items of a specific Tile better than in O(n), but I would like to avoid much overhead in the "moving from one tile to another" phase (now it's just assigning a pointer, I would like to avoid doing many operations there, since it's quite frequent).
I'm thinking about a custom data structure to exploit the fact that items always move to a neighboring cell, but I'm currently groping in the dark! Any advice would be appreciated, even tricky or cryptic approaches. Unfortunately I can't just waste memory, so a good trade-off is needed too.
I'm developing it in C++ with STL but without Boost. (Yes, I do know about multimap, it doesn't satisfy me, but I'll try if I don't find anything better)
struct Coordinate {
    int x, y;
    bool operator<(const Coordinate &o) const // needed for use as a map key
    {
        return x < o.x || (x == o.x && y < o.y);
    }
};
map<Coordinate, set<Item*>> tile_items;
This maps coordinates on the tile map to sets of Item pointers indicating which items are on that tile. You wouldn't need an entry for every coordinate, only the ones that actually have items on them. Now, I know you said this:
but I would like to avoid much overhead in the "moving from one tile to another" phase
And this method would involve adding more overhead in that phase. But have you actually tried something like this yet and determined that it is a problem?
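To make the added overhead concrete, the move would become something like this (a sketch; Item is the question's type, and 'coord' is an assumed member mirroring the old item->tile back-reference):
// Move an item from one tile's set to another and keep both directions in sync.
void move_item(Item *item, const Coordinate &from, const Coordinate &to)
{
    tile_items[from].erase(item);  // O(log k) for k items on the source tile
    if (tile_items[from].empty())
        tile_items.erase(from);    // optional: drop empty per-tile entries
    tile_items[to].insert(item);   // O(log k) on the destination tile
    item->coord = to;              // keep the item's own back-reference in sync
}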
To me, I would wrap a std::vector in a matrix type (i.e. impose 2D access on a 1D array); this gives you fast random access to any of your tiles (implementing the matrix is trivial).
Use
vector_index = y_pos * x_size + x_pos;
to index a vector of size
vector_size = y_size * x_size;
Then each tile can have a std::vector of items (if the number of items a tile holds is very dynamic, maybe a deque); again, these are random-access containers with very minimal overhead.
I would stay away from indirect containers for your use case.
PS: if you want you can have my matrix template.
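In that spirit, a minimal matrix wrapper might look like this (my sketch, not the poster's actual template):
#include <cstddef>
#include <vector>

template <typename T>
class Matrix {
public:
    Matrix(int x_size, int y_size)
        : x_size_(x_size), data_((std::size_t)x_size * y_size) {}
    T &at(int x, int y)             { return data_[(std::size_t)y * x_size_ + x]; }
    const T &at(int x, int y) const { return data_[(std::size_t)y * x_size_ + x]; }
private:
    int x_size_;
    std::vector<T> data_;
};

// Usage: Matrix<Tile> world(512, 512); world.at(3, 4) = someTile;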
If you really think having each tile store its items will cost you too much space, consider using a quadtree to store items instead. This allows you to efficiently get all the items on a tile, but leaves your Tile grid in place for item movement.

Is there a data structure with these characteristics?

I'm looking for a data structure that would allow me to store an M-by-N 2D matrix of values contiguously in memory, such that the distance in memory between any two points approximates the Euclidean distance between those points in the matrix. That is, in a typical row-major representation as a one-dimensional array of M * N elements, the memory distance differs between adjacent cells in the same row (1) and adjacent cells in neighbouring rows (N).
I'd like a data structure that reduces or removes this difference. Really, the name of such a structure is sufficient—I can implement it myself. If answers happen to refer to libraries for this sort of thing, that's also acceptable, but they should be usable with C++.
I have an application that needs to perform fast image convolutions without hardware acceleration, and though I'm aware of the usual optimisation techniques for this sort of thing, I feel a specialised data structure or data ordering could improve performance.
Given the requirement that you want to store the values contiguously in memory, I'd strongly suggest you research space-filling curves, especially Hilbert curves.
To give a bit of context, such curves are sometimes used in database indexes to improve the locality of multidimensional range queries (e.g., "find all items with x/y coordinates in this rectangle"), thereby aiming to reduce the number of distinct pages accessed. A bit similar to the R-trees that have been suggested here already.
Either way, it looks like you're bound to an M*N array of values in memory, so the whole question is about how to arrange the values in that array, I figure. (Unless I misunderstood the question.)
So, in fact, such orderings would probably still only change the characteristics of the distance distribution: the average distance between any two randomly chosen points in the matrix should not change, so I have to agree with Oli there. The potential benefit depends largely on your specific use case, I suppose.
I would guess "no"! And if the answer happens to be "yes", then it's almost certainly so irregular that it'll be way slower for a convolution-type operation.
EDIT
To qualify my guess, take an example. Let's say we store a[0][0] first. We want a[k][0] and a[0][k] to be similar distances, and proportional to k, so we might choose to interleave the storage of first row and first column (i.e. a[0][0], a[1][0], a[0][1], a[2][0], a[0][2], etc.) But how do we now do the same for e.g. a[1][0]? All the locations near it in memory are now taken up by stuff that's near a[0][0].
Whilst there are other possibilities than my example, I'd wager that you always end up with this kind of problem.
EDIT
If your data is sparse, then there may be scope to do something clever (re Cubbi's suggestion of R-trees). However, it'll still require irregular access and pointer chasing, so will be significantly slower than straightforward convolution for any given number of points.
You might look at space-filling curves, in particular the Z-order curve, which (mostly) preserves spatial locality. It might be computationally expensive to look up indices, however.
If you are using this to try to improve cache performance, you might try a technique called "bricking", which is a little bit like one or two levels of a space-filling curve. Essentially, you subdivide your matrix into nxn tiles (where an nxn tile fits neatly in your L1 cache). You can also store another level of tiles to fit into a higher-level cache. The advantage this has over a space-filling curve is that indices can be fairly quick to compute. One reference is included in the paper here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.8959
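A sketch of the "bricking" index computation (tile size and names are my own; pick B so a BxB tile of your element type fits in L1, and note it assumes non-negative coordinates):
#include <cstddef>

static const int B = 8; // tile ("brick") side length

// Map (x, y) to an index in an array laid out brick by brick.
inline std::size_t brick_index(int x, int y, int width_in_bricks)
{
    int bx = x / B, by = y / B; // which brick
    int ix = x % B, iy = y % B; // position inside that brick
    std::size_t brick = (std::size_t)by * width_in_bricks + bx;
    return brick * (B * B) + (std::size_t)iy * B + ix;
}

// Neighbors within a brick are at most B*B elements apart in memory,
// in both dimensions, instead of a whole row apart.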
This sounds like something that could be helped by an R-tree, or one of its variants. There is nothing like that in the C++ Standard Library, but it looks like there is an R-tree in the boost candidate library Boost.Geometry (not a part of boost yet). I'd take a look at that before writing my own.
It is not possible to "linearize" a 2D structure into an 1D structure and keep the relation of proximity unchanged in both directions. This is one of the fundamental topological properties of the world.
Having said that, it is true that the standard row-wise or column-wise storage order normally used for 2D array representation is not the best one when you need to preserve proximity (as much as possible). You can get better results by using various discrete approximations of fractal curves (space-filling curves).
Z-order curve is a popular one for this application: http://en.wikipedia.org/wiki/Z-order_(curve)
Keep in mind though that regardless of which approach you use, there will always be elements that violate your distance requirement.
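For concreteness, a simple (unoptimized) Z-order index for 16-bit coordinates (a sketch; table- or intrinsic-based versions are much faster):
#include <cstdint>

inline uint32_t morton2d(uint16_t x, uint16_t y)
{
    uint32_t v = 0;
    for (int i = 0; i < 16; ++i) {
        v |= (uint32_t)((x >> i) & 1u) << (2 * i);     // x bits -> even positions
        v |= (uint32_t)((y >> i) & 1u) << (2 * i + 1); // y bits -> odd positions
    }
    return v;
}

// Cells close in (x, y) usually get close Morton indices, which is the
// locality being bought; the cost is exactly this bit interleaving per lookup.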
You could think of your 2D matrix as a big spiral, starting at the center and progressing to the outside. Unwind the spiral, and store the data in that order, and distance between addresses at least vaguely approximates Euclidean distance between the points they represent. While it won't be very exact, I'm pretty sure you can't do a whole lot better either. At the same time, I think even at very best, it's going to be of minimal help to your convolution code.
The answer is no. Think about it - memory is 1D. Your matrix is 2D. You want to squash that extra dimension in - with no loss? It's not going to happen.
What's more important is that once you get a certain distance away, it takes the same time to load into cache. If you have a cache miss, it doesn't matter if it's 100 away or 100000. Fundamentally, you cannot get more contiguous/better performance than a simple array, unless you want to get an LRU for your array.
I think you're forgetting that computer memory is not accessed by a CPU operating on foot :) so the distance is pretty much irrelevant.
It's random access memory, so really you have to figure out what operations you need to do, and optimize the accesses for that.
You need to reconvert the addresses from memory space back to the original array space to accomplish this. Also, you've stressed distance only, which may still cause you some problems (there's no direction).
If I have an array of R x C, and two cells at locations [r,c] and [c,r], the distance from some arbitrary point, say [0,0] is identical. And there's no way you're going to make one memory address hold two things, unless you've got one of those fancy new qubit machines.
However, you can take into account that in a row major array of R x C that each row is C * sizeof(yourdata) bytes long. Conversely, you can say that the original coordinates of any memory address within the bounds of the array are
r = (address / C)
c = (address % C)
so
r1 = (address1 / C)
r2 = (address2 / C)
c1 = (address1 % C)
c2 = (address2 % C)
dr = r1 - r2
dc = c1 - c2
dist = sqrt(dr^2 + dc^2)
(this is assuming you're using zero based arrays)
(crush all this together to make it run more optimally)
For a lot more ideas here, go look for any 2D image manipulation code that uses a calculated value called 'stride', which is basically an indicator that they're jumping back and forth between memory addresses and array addresses
This is not exactly related to closeness but might help; it certainly helps with minimizing disk accesses.
One way to get better "closeness" is to tile the image. If your convolution kernel is smaller than a tile, you typically touch at most 4 tiles in the worst case. You can recursively tile into bigger sections so that locality improves. A Stokes-like argument (at least I think it's Stokes), or some calculus of variations, can show that for rectangles the best shape (meaning for examination of arbitrary sub-rectangles) is a smaller rectangle of the same aspect ratio.
Quick intuition: think about a square. If you tile the larger square with smaller squares, the fact that a square encloses maximal area for a given perimeter means that square tiles have minimal border length. When you transform the large square, I think you can show you should transform the tiles the same way. (You might also be able to do a simple multivariate differentiation.)
The classic example is zooming in on spy satellite images and convolving them for enhancement. The extra computation to tile is really worth it if you keep the data around and go back to it.
It's also really worth it for different compression schemes such as cosine transforms. (That's why when you download an image it frequently comes up in smaller and smaller squares until the final resolution is reached.)
There are a lot of books on this area and they are helpful.