Data structure for a random world

Data structure for a random world - c++

So, I was thinking about making a simple random world generator. This generator would create a starting "cell" that would have between one and four random exits (in the cardinal directions, something like a maze). After deciding those exits, I would generate a new random "cell" at each of those exits, and repeat whenever a player would get near a part of the world that had not yet been generated. This concept would allow a "infinite" world of sorts, all randomly generated; however, I am unsure of how to best represent this internally.
I am using C++ (which doesn't really matter, I could implement any sort of data structure necessary). At first I thought of using a sort of directed graph in which each node would have directed edges to each cell surrounding it, but this probably won't work well if a user finds a spot in the world, backtracks, and comes back to that spot from another direction. The world might do some weird things, such as generate two cells at one location.
Any ideas on what kind of data structure might be the most effective for such a situation? Or am I doing something really dumb with my random world generation?
Any help would be greatly appreciated.
Thanks,
Chris

I recommend you read about graphs. This is exactly an application of random graph generation. Instead of 'cell' and 'exit' you are describing 'node' and 'edge'.
Plus, then you can do things like shortest path analysis, cycle detection and all sorts of other useful graph theory application.
This will help you understand about the nodes and edges:
and here is a finished application of these concepts. I implemented this in a OOP way - each node knew about it's edges to other nodes. A popular alternative is to implement this using an adjacency list. I think the adjacency list concept is basically what user470379 described with his answer. However, his map solution allows for infinite graphs, while a traditional adjacency list does not. I love graph theory, and this is a perfect application of it.
Good luck!
-Brian J. Stianr-

A map< pair<int,int>, cell> would probably work well; the pair would represent the x,y coordinates. If there's not a cell in the map at those coordinates, create a new cell. If you wanted to make it truly infinite, you could replace the ints with an arbitrary length integer class that you would have to provide (such as a bigint)

If the world's cells are arranged in a grid, you can easily give them cartesian coordinates. If you keep a big list of existing cells, then before determining exits from a given cell, you can check that list to see if any of its neighbors already exist. If they do, and you don't want to have 1-way doors (directed graph?) then you'll have to take their exits into account. If you don't mind having chutes in your game, you can still choose exits randomly, just make sure that you link to existing cells if they're there.
Optimization note: checking a hash table to see if it contains a particular key is O(1).

Couldn't you have a hash (or STL set) that stored a collection of all grid coordinates that contain occupied cells?
Then when you are looking at creating a new cell, you can quickly check to see if the candidate cell location is already occupied.
(if you had finite space, you could use a 2d array - I think I saw this in a Byte magazine article back in ~1980-ish, but if I understand correctly, you want a world that could extend indefinitely)

Related

What is the best way to represent a map or train station line as a graph in code?

so I was trying to represent a certain transport system and apply some search algorithms. The system consists of a number of stations, so I think they can act as vertices. while the lines between them are good for the edges. I do have a high level idea of what I wanna do and how the search might work but I am not able to translate that into code.
Do I use a class to represent the stations? and each station an object? a tuple to store the co-ordinates? I would love to get any guidance to how to actually translate implementing the algorithms and writing the program itself.
I am thinking about using C++ for this

you need at least a class Node/Station, then you need either 1. a class Graph that contains a list of weighted connections between 2 Nodes, or 2. a list of nodes and each nodes has a list of weighted connections with node pointers.
Usually you would want your graph API to be able to return the neighbors of a node* sorted by weights, so probably sort that by calling graph.build() after adding all the weights.
There is no real gain in adding "coordinates" for the stations, unless its a real application, as its much easier to just set costs between stations than trying to come up with good positions for stations, otherwise just draw yourself a station map and label the edges yourself.
I'm guessing you want something like graph.path(a, b), then with Dijkstra's algorithm you will easily be able to do that, I would recommend setting up the code to call the algo before coding too much, that way if you're about to represent the data in a bad way, you will know earlier.

3D-Grid of bins: nested std::vector vs std::unordered_map

pros, I need some performance-opinions with the following:
1st Question:
I want to store objects in a 3D-Grid-Structure, overall it will be ~33% filled, i.e. 2 out of 3 gridpoints will be empty.
Short image to illustrate:
Maybe Option A)
vector<vector<vector<deque<Obj>> grid;// (SizeX, SizeY, SizeZ);
grid[x][y][z].push_back(someObj);
This way I'd have a lot of empty deques, but accessing one of them would be fast, wouldn't it?
The Other Option B) would be
std::unordered_map<Pos3D, deque<Obj>, Pos3DHash, Pos3DEqual> Pos3DMap;
where I add&delete deques when data is added/deleted. Probably less memory used, but maybe less fast? What do you think?
2nd Question (follow up)
What if I had multiple containers at each position? Say 3 buckets for 3 different entities, say object types ObjA, ObjB, ObjC per grid point, then my data essentially becomes 4D?
Another illustration:
Using Option 1B I could just extend Pos3D to include the bucket number to account for even more sparse data.
Possible queries I want to optimize for:
Give me all Objects out of ObjA-buckets from the entire structure
Give me all Objects out of ObjB-buckets for a set of
grid-positions
Which is the nearest non-empty ObjC-bucket to
position x,y,z?
PS:
I had also thought about a tree based data-structure before, reading about nearest neighbour approaches. Since my data is so regular I had thought I'd save all the tree-building dividing of the cells into smaller pieces and just make a static 3D-grid of the final leafs. Thats how I came to ask about the best way to store this grid here.
Question associated with this, if I have a map<int, Obj> is there a fast way to ask for "all objects with keys between 780 and 790"? Or is the fastest way the building of the above mentioned tree?
EDIT
I ended up going with a 3D boost::multi_array that has fortran-ordering. It's a little bit like the chunks games like minecraft use. Which is a little like using a kd-tree with fixed leaf-size and fixed amount of leaves? Works pretty fast now so I'm happy with this approach.

Answer to 1st question
As #Joachim pointed out, this depends on whether you prefer fast access or small data. Roughly, this corresponds to your options A and B.
A) If you want fast access, go with a multidimensional std::vector or an array if you will. std::vector brings easier maintenance at a minimal overhead, so I'd prefer that. In terms of space it consumes O(N^3) space, where N is the number of grid points along one dimension. In order to get the best performance when iterating over the data, remember to resolve the indices in the reverse order as you defined it: innermost first, outermost last.
B) If you instead wish to keep things as small as possible, use a hash map, and use one which is optimized for space. That would result in space O(N), with N being the number of elements. Here is a benchmark comparing several hash maps. I made good experiences with google::sparse_hash_map, which has the smallest constant overhead I have seen so far. Plus, it is easy to add it to your build system.
If you need a mixture of speed and small data or don't know the size of each dimension in advance, use a hash map as well.
Answer to 2nd question
I'd say you data is 4D if you have a variable number of elements a long the 4th dimension, or a fixed large number of elements. With option 1B) you'd indeed add the bucket index, for 1A) you'd add another vector.
Which is the nearest non-empty ObjC-bucket to position x,y,z?
This operation is commonly called nearest neighbor search. You want a KDTree for that. There is libkdtree++, if you prefer small libraries. Otherwise, FLANN might be an option. It is a part of the Point Cloud Library which accomplishes a lot of tasks on multidimensional data and could be worth a look as well.

Efficiently search among pairs of adjacent elements in a `set`

I'm currently working on a problem where I want to maintain the convex hull of a set of linear functions. It might look something like this:
I'm using a set<Line> to maintain the lines so that I can dynamically insert lines, which works fine. The lines are ordered by increasing slope, which is defined by the operator< of the lines. By throwing out "superseded" lines, the data structure guarantees that every line will have some segment that is a part of the convex hull.
Now the problem is that I want to search in this data structure for the crossing point whose X coordinate precedes a given x. Since those crossing points are only implicitely defined by adjacency in the set (in the image above, those are the points N, Q etc.), it seems to be entirely impossible to solve with the set alone, since I don't have
The option to find an element by anything but the primary compare function
The option to "binary search" in the underlying search tree myself, that is, compute the pre-order predecessor or successor of an iterator
The option to access elements by index efficiently
I am thus inclined to use a second set<pair<set<Line>::iterator, set<Line>::iterator> > >, but this seems incredibly hacky. Seeing as we mainly need this for programming contests, I want to minimize code size, so I want to avoid a second set or a custom BBST data structure.
Is there a good way to model this scenario which still let's me maintain the lines dynamically and binary search by the value of a function on adjacent elements, with a reasonable amount of code?

Bidirectional data structure for this situation

I'm studying a little part of a my game engine and wondering how to optimize some parts.
The situation is quite simple and it is the following:
I have a map of Tiles (stored in a bi-dimensional array) (~260k tiles, but assume many more)
I have a list of Items which always are in at least and at most a tile
A Tile can logically contain infinite amount of Items
During game execution many Items are continuously created and they start from their own Tile
Every Item continuously changes its Tile to one of the neighbors (up, right, down, left)
Up to now every Item has a reference to its actual Tile, and I just keep a list of items.
Every time an Item moves to an adjacent tile I just update item->tile = .. and I'm fine. This works fine but it's unidirectional.
While extending the engine I realized that I have to find all items contained in a tile many times and this is effectively degrading the performance (especially for some situations, in which I have to find all items for a range of tiles, one by one).
This means I would like to find a data structure suitable to find all the items of a specific Tile better than in O(n), but I would like to avoid much overhead in the "moving from one tile to another" phase (now it's just assigning a pointer, I would like to avoid doing many operations there, since it's quite frequent).
I'm thinking about a custom data structure to exploit the fact that items always move to neighbor cell but I'm currently groping in the dark! Any advice would be appreciated, even tricky or cryptic approaches. Unfortunately I can't just waste memory so a good trade-off is needed to.
I'm developing it in C++ with STL but without Boost. (Yes, I do know about multimap, it doesn't satisfy me, but I'll try if I don't find anything better)

struct Coordinate { int x, y; };
map<Coordinate, set<Item*>> tile_items;
This maps coordinates on the tile map to sets of Item pointers indicating which items are on that tile. You wouldn't need an entry for every coordinate, only the ones that actually have items on them. Now, I know you said this:
but I would like to avoid much overhead in the "moving from one tile
to another" phase
And this method would involve adding more overhead in that phase. But have you actually tried something like this yet and determined that it is a problem?

To me I would wrap a std::vector into a matrix type (IE impose 2d access on a 1d array) this give you fast random access to any of your tiles (implementing the matrix is trivial).
use
vector_index=y_pos*y_size+x_pos;
to index a vector of size
vector_size=y_size*x_size;
Then each item can have a std::vector of items (if the amount of items a tile has is very dynamic maybe a deque) again these are random access contains with very minimal overhead.
I would stay away from indirect containers for your use case.
PS: if you want you can have my matrix template.

If you really think having each tile store it's items will cost you too much space, consider using a quadtree to store items then. This allows you to efficiently get all the items on a tile, but leaves your Tile grid in place for item movement.

Boost Graph Library dynamic edges weights

I'm wondering if it's possible to make dynamic edges weights in BGL? I'm writing public transport navigator so except time as weight it would be nice if I can promote actualy using line instead of change at every stop event if it would be 3 minutes faster - this is just inconvenient.
Thanks for your help
edit:
Or maybe there is better library than can do that which I should use?

I'm not entirely clear on what you mean by dynamic... the weights are presumably stored in edge properties; there's nothing to stop you updating the properties with new values as required.
If you mean that you want the edge weights to be a function-object (or "functor", if you must) rather than "just a value", then see this thread on the BGL users list; haven't tried it myself. Makes me wonder how well various graph algorithms using edge weights deal with the weights changing while they're in progress (if the functor is called more than once and returns a different value each time)...

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js