I will have to implement an endless 3D raster map in program memory. The map may or may NOT start at [0;0;0]. The map has Y coordinate limited by 255, others may be infinite. (yes, now, you may have guessed it is a Minecraft map)
I need to create a class with simple McMap::getBlock(int x, short y, int z) and McMap::setBlock(int x, short y, int z) methods, so I need to be able to both read and write the data. I also want to be able to delete blocks and thereby free the memory.
What should I use for this purpose? I think the best solution would be a table with a structure like this:
int x | short y | int z | int block id | other values...
------+---------+-------+--------------+----------------
   55 |      21 |   666 |            1 |
But how do I implement this in C++, without using a real MySQL database (that would be real overkill)? Also, I don't want to keep the map when the program exits, so I want the data to live inside the program's memory.
Once more, consider that the map is infinite, so the coordinates may be anything. Also do not forget that very distant points may be mapped.
Also, a very important thing to note: I need an efficient way to get a block by its X, Y and Z coordinates. I don't want to walk through all the blocks to find one of them.
I have already included boost library.
I assume you probably won't need to have the entire possible area of a Minecraft world in memory at one time, because that would be incredibly huge (1024000000 km^2). If you are just trying to keep in memory the area that anyone would usually end up visiting during a game, I think it would be completely feasible to access it using the STL (Standard Template Library).
Minecraft worlds are always loaded in game in chunks, which are 16 x 16 blocks wide and 256 blocks tall (Y from 0 to 255). You could store the chunks in your program in a std::map. There are a few advantages to this method. The first is that it allows representation of locations well beyond the playable area of the map, based on the wiki entry for the Far Lands. It also allows for a sparse representation of the Minecraft map that closely resembles how actual Minecraft maps are loaded. Only the chunks that your program is using are stored in the std::map, which should keep memory usage reasonable. You would be able to represent any area, no matter where it lies in the total possible Minecraft map area.
To implement this you will just have to first create the world datatype:
#include <map>
#include <vector>
using namespace std;
struct Block
{
    // Whatever information you care to store here...
};
typedef vector<Block> Chunk;
typedef map<int, map<int, Chunk> > World;
Then to access a single block:
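Block McMap::getBlock(int x, short y, int z)
{
    const int WIDTH = 16; // You might want to store these constants elsewhere
    const int HEIGHT = 256; // y runs from 0 to 255
    // Position inside the chunk, kept in 0..15 even for negative world coordinates
    int localx = ((x % WIDTH) + WIDTH) % WIDTH;
    int localz = ((z % WIDTH) + WIDTH) % WIDTH;
    // Chunk coordinates, rounded toward negative infinity
    int chunkx = (x - localx) / WIDTH;
    int chunkz = (z - localz) / WIDTH;
    Chunk& chunk = yourWorld[chunkx][chunkz];
    if (chunk.empty()) // a plain vector typedef starts out empty
        chunk.resize(WIDTH * WIDTH * HEIGHT);
    return chunk[localx + localz * WIDTH + y * WIDTH * WIDTH];
}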
To erase a chunk:
void McMap::eraseChunk(int x, int z)
{
    World::iterator row = yourWorld.find(x); // Tests to make sure that row exists in the map.
    if (row != yourWorld.end())
        row->second.erase(z);
}
Another benefit of this method is that by writing a clever constructor for a chunk, rather than just using a typedef like I did, you could automatically generate a chunk when you first access it in the world std::map, similar to how chunks are generated in Minecraft only when you visit them. Whenever you use operator[] with a key that does not exist yet in a map, the map inserts a default-constructed object for that key.
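Putting the pieces above together, here is a minimal sketch of such a class. It assumes 16 x 16 x 256 chunks and a Block that stores only an id; the helpers floorDiv and floorMod and the member names are made up for illustration. The helpers matter because C++ integer division truncates toward zero, so -1 / 16 is 0 rather than the chunk -1 that a block at x = -1 belongs to.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct Block { int id = 0; };  // assumption: id 0 means "empty"

const int WIDTH = 16;    // chunk size along x and z
const int HEIGHT = 256;  // y runs from 0 to 255

// Division and remainder that round toward negative infinity,
// so block x = -1 lands in chunk -1 at local position 15.
inline int floorDiv(int a, int b) {
    return a / b - ((a % b != 0 && (a < 0) != (b < 0)) ? 1 : 0);
}
inline int floorMod(int a, int b) { return a - floorDiv(a, b) * b; }

struct Chunk {
    std::vector<Block> blocks;
    // The default constructor allocates the whole chunk, so map::operator[]
    // creates a usable chunk on first access.
    Chunk() : blocks(static_cast<std::size_t>(WIDTH) * WIDTH * HEIGHT) {}
    Block& at(int lx, int y, int lz) {
        assert(lx >= 0 && lx < WIDTH && lz >= 0 && lz < WIDTH);
        assert(y >= 0 && y < HEIGHT);
        return blocks[lx + lz * WIDTH + y * WIDTH * WIDTH];
    }
};

class McMap {
public:
    // Reading never allocates: a missing chunk reads as default blocks.
    Block getBlock(int x, int y, int z) {
        auto row = world_.find(floorDiv(x, WIDTH));
        if (row == world_.end()) return Block();
        auto chunk = row->second.find(floorDiv(z, WIDTH));
        if (chunk == row->second.end()) return Block();
        return chunk->second.at(floorMod(x, WIDTH), y, floorMod(z, WIDTH));
    }
    // Writing creates the chunk on demand.
    void setBlock(int x, int y, int z, Block b) {
        Chunk& chunk = world_[floorDiv(x, WIDTH)][floorDiv(z, WIDTH)];
        chunk.at(floorMod(x, WIDTH), y, floorMod(z, WIDTH)) = b;
    }
    // Frees a whole chunk, addressed by chunk coordinates.
    void eraseChunk(int cx, int cz) {
        auto row = world_.find(cx);
        if (row == world_.end()) return;
        row->second.erase(cz);
        if (row->second.empty()) world_.erase(row);
    }
    std::size_t chunkCount() const {
        std::size_t n = 0;
        for (const auto& row : world_) n += row.second.size();
        return n;
    }
private:
    std::map<int, std::map<int, Chunk>> world_;
};
```

With this layout, memory is freed per chunk rather than per block; deleting a single block just means writing the "empty" value back.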
You can decompose your map into chunks, like in Minecraft. Each chunk is W * H * L (x, y, z) blocks, so a chunk is just a 3D array. The best approach is to flatten it into a 1D array:
BlockType* chunk = new BlockType[W * H * L];
BlockType block = chunk[x + z * W + y * W * L]; // x, y, z are positions inside the chunk
This gives good memory management (and is a lot better than storing the whole possible map in one array). Note that accessing a block in a chunk is O(1) and should be very fast.
Then, you can store your chunks. Each chunk is given 2d coordinates that are its id.
A simple container for this is a std::map (a std::unordered_map would give average O(1) lookups, provided you supply a hash for ChunkCoord):
std::map<ChunkCoord, ChunkType*> map;
Access to a block is fast. You first get the chunk (a division gives you the chunk coordinates from the point coordinates; for negative coordinates it has to round toward negative infinity), then you get the block inside it. Accessing a chunk in a std::map is O(log(numChunks)).
Creating a chunk is allocating memory and creating a new item in the map. You will still be limited by the amount of memory in the computer (infinity is not part of this world...), that's why minecraft-like games often save unused chunks to disk. Saving to disk is the only way to have a near-endless map.
The tricky point is to find good values for W, H, and L. For this, I'm afraid you will have to test and measure a lot...
Note: extending this idea leads to quadtrees. You can use them, but they may have too much memory overhead.
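As a rough sketch of the chunk container described above: std::map needs an ordering on its key, so ChunkCoord gets an operator<. The names ChunkStore, getOrCreate and the choice of BlockType are assumptions made for this example, and std::unique_ptr replaces the raw ChunkType* so that erasing an entry also frees the chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <tuple>
#include <vector>

struct ChunkCoord {
    int x;
    int z;
};

// std::map requires a strict weak ordering on its key type.
inline bool operator<(const ChunkCoord& a, const ChunkCoord& b) {
    return std::tie(a.x, a.z) < std::tie(b.x, b.z);
}

using BlockType = unsigned char;           // assumption: one byte per block
using ChunkType = std::vector<BlockType>;  // W * H * L entries

class ChunkStore {
public:
    explicit ChunkStore(std::size_t blocksPerChunk)
        : blocksPerChunk_(blocksPerChunk) {
        assert(blocksPerChunk > 0);
    }
    // Returns the chunk, or nullptr if it has not been created.
    ChunkType* find(ChunkCoord c) {
        auto it = chunks_.find(c);
        return it == chunks_.end() ? nullptr : it->second.get();
    }
    // Allocates the chunk on first use; O(log(numChunks)) either way.
    ChunkType& getOrCreate(ChunkCoord c) {
        std::unique_ptr<ChunkType>& slot = chunks_[c];
        if (!slot) slot.reset(new ChunkType(blocksPerChunk_));
        return *slot;
    }
    // Removing the map entry releases the chunk's memory.
    bool erase(ChunkCoord c) { return chunks_.erase(c) > 0; }
    std::size_t size() const { return chunks_.size(); }
private:
    std::size_t blocksPerChunk_;
    std::map<ChunkCoord, std::unique_ptr<ChunkType>> chunks_;
};
```

Saving unused chunks to disk, as mentioned above, would hook in just before erase; that part is not shown here.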
I'm working on making a tile-map editor in C++. Right now, while a map is being edited, its attributes are stored across three vectors:
vector<vector<vector<bool>>> CanvasCollisionObstruction; //[collision,obstruction][map x][map y]
vector<vector<vector<bool>>> CanvasZoneOverlays; //zone overlays for programmable map zones [zone type][map x][map y]
vector<vector<vector<canvasClip>>> CanvasClips; //identifies which sprite occupies this tile [layer number][map x][map y]
In the above vectors, the 2nd and 3rd dimensions ([map x] and [map y]) refer to the actual tile coordinates on the map. These are just plain-old-square-2d maps. The type of that last vector is the following struct:
struct canvasClip
{
int tileset;
int clip;
//initialization to check if it's in-use
canvasClip() : tileset(-1), clip(-1) {}
bool isInitialized()
{//a clip is only rendered if it is initialized
return ((tileset >= 0) && (clip >= 0));
}
bool operator==(const canvasClip& a) const
{//used in flood-fill to compare with target tile
return ((tileset == a.tileset) && (clip == a.clip));
}
bool operator!=(const canvasClip& a) const
{//used in flood-fill to compare with target tile
return ((tileset != a.tileset) || (clip != a.clip));
}
};
For this application, I expect to eventually want to generate maps with size upwards of 50000x50000 tiles, across indefinite (but probably never more than 10) layers. There are about 12 zones total, and that number is constant.
The map editor has some controls to change the size of the map (numeric inputs and a button). When I set the map size to a very large number, I call vector.resize() on each of those vectors, and I can watch my memory usage quickly travel upward in the task manager until my computer finally crashes.
Can anyone offer me some advice or tips for handling very large vectors? Am I going to have to do something like compress the vector so that a single index describes a span of similar tiles? Should I store the map in a file instead of memory, and then read-back only a few chunks of it at a time, as-needed?
How do good programmers handle this situation?
As already mentioned in the comments, you are trying to allocate a huge amount of memory just for the data. At 50000x50000 tiles, a single layer of canvasClip values (two 4-byte ints each) already needs about 20 GB.
In this case, you have to choose a different data structure to store the data and operate on it.
Here are a couple of the simplest tricks you can apply, at the cost of more complex code for operating on the data:
Your default values seem to carry no meaning. Why not store in memory only the entries that are actually set?
You can keep in memory only the visible part of the data (see "Here's what's happening in Horizon: Zero Dawn every time you move a camera").
You might have to pack your data structure and work on alignment (see "Data structure alignment").
Of course, there are cases where you have to limit the requirements, but that's a part of life.
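The first trick (store only what is set) could look roughly like the sketch below. It assumes that a default canvasClip means "empty", packs the two tile coordinates into one 64-bit key, and uses a hypothetical SparseLayer class in place of one [map x][map y] slice of CanvasClips.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct canvasClip {
    int tileset;
    int clip;
    canvasClip() : tileset(-1), clip(-1) {}
    canvasClip(int t, int c) : tileset(t), clip(c) {}
    bool isInitialized() const { return tileset >= 0 && clip >= 0; }
};

// One layer of the map; memory use grows with the number of tiles that
// are actually set, not with width * height.
class SparseLayer {
public:
    SparseLayer(std::uint32_t width, std::uint32_t height)
        : width_(width), height_(height) {}

    void set(std::uint32_t x, std::uint32_t y, const canvasClip& c) {
        assert(x < width_ && y < height_);
        if (c.isInitialized()) cells_[key(x, y)] = c;
        else cells_.erase(key(x, y));  // writing the default frees the entry
    }
    // Tiles that were never set read back as the default (uninitialized) clip.
    canvasClip get(std::uint32_t x, std::uint32_t y) const {
        assert(x < width_ && y < height_);
        auto it = cells_.find(key(x, y));
        return it == cells_.end() ? canvasClip() : it->second;
    }
    std::size_t storedTiles() const { return cells_.size(); }
private:
    // Packs both coordinates into a single key for the hash map.
    static std::uint64_t key(std::uint32_t x, std::uint32_t y) {
        return (static_cast<std::uint64_t>(x) << 32) | y;
    }
    std::uint32_t width_, height_;
    std::unordered_map<std::uint64_t, canvasClip> cells_;
};
```

Resizing such a layer to 50000x50000 costs nothing until tiles are painted. The trade-off is a per-entry overhead in the hash map, so a map that is almost completely filled would be better served by dense chunks.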
I am working on quite a big parallel application that uses OpenMPI to distribute data among MPI processes. Using MPI with a serialization library such as "cereal" makes it very comfortable to pass huge, multiply nested objects around. To give a hint of what I mean by a multiply nested structure, I am currently working with a simplified version such as:
// structures for CUDA - this is inside std::vector<struct_multi_data> multi_data_vector
struct struct_multi_data{
int intended_kernel_block;
int intended_kernel_thread;
std::vector<float> data_float;
std::vector<float> data_int;
float result;
};
struct struct_unique_data{
// this structure is shared among all blocks/threads
float x;
float y;
float z;
};
class Data_object{
// functions
public:
Data_object();
~Data_object();
int resize(int multi_data_vector_len, int data_float_len, int data_int_len);
void set_id(int id);
int clean(void);
int get_multi_data_len();
int get_multi_data(struct_multi_data * data, int vector_element);
int set_multi_data(struct_multi_data * data, int vector_element);
// variables
private:
std::vector<struct_multi_data> multi_data_vector;
struct_unique_data unique_data;
int data_id;
};
* the above code is simplified, I have removed serialization functions and some other basic stuff, but the overall structure holds
To put it simply, I am moving around the Data_object, containing vector{struct_multi_data}, which is a vector of structures, where every structure struct_multi_data contains some vector{float}.
I have a good reason to embed all the data into 1 Data_object, as it simplifies the MPI sending and receiving.
QUESTION
Is there some comfortable way to move the Data_object to GPU memory using cudaMalloc/cudaMemcpy functions ?
There seems to be a problem with a regular std::vector. I don't want to rely on the Thrust library, because I am not sure whether it would work with my MPI serialization solution.
EDIT QUESTION
Can I use __managed__ for my Data_object, or cudaMallocManaged(), to make the data accessible to the GPU?
PLEASE READ
The size of the Data_object is well defined at the beginning of program execution. None of the vectors changes size anywhere except at the beginning of execution. So why am I using vectors? This way I can set the vector sizes by passing parameters, instead of recompiling the program to change the data size (as I would have to if the data were defined as arrays).
RESPONSE TO COMMENTS
1) I think I can replace all the vectors with pointers to arrays.
No, and the extra sections in this question don't help. std::vector is just not intended to work that way: It "owns" the memory it points to, and if you mem-copy it someplace else (even in host memory) and use it from there, you'll just corrupt your memory. Also, the std::vector code can't even run on the GPU since it's not __device__-code.
What you could do is use an std::span, which doesn't own the memory, instead of the std::vector. If you do that, and the memory is managed, then mem-copying a class might work.
Note I'm completely disregarding the members other than the vector as that seems to be the main issue here.
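To illustrate the ownership point on the host side only: std::span needs C++20, so the sketch below uses a hand-rolled pointer-plus-length view as a stand-in. FloatView, MultiDataView and viewOf are hypothetical names, and nothing here touches CUDA; for device use, the pointer would still have to refer to memory the GPU can reach (for example managed memory), which this example does not set up.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical non-owning view: a pointer and a length, nothing else.
struct FloatView {
    float* data;
    std::size_t size;
    float& operator[](std::size_t i) const {
        assert(i < size);
        return data[i];
    }
};

// Counterpart of struct_multi_data with a view in place of the vector.
struct MultiDataView {
    int intended_kernel_block;
    int intended_kernel_thread;
    FloatView data_float;
    float result;
};

// A byte-wise copy of this struct is well defined, because it owns nothing.
// The same check fails for a struct that contains a std::vector.
static_assert(std::is_trivially_copyable<MultiDataView>::value,
              "views can be copied byte by byte");
static_assert(!std::is_trivially_copyable<std::vector<float> >::value,
              "std::vector owns its buffer and cannot be mem-copied");

// The storage stays owned by the vector; the view only refers to it.
inline FloatView viewOf(std::vector<float>& v) {
    return FloatView{v.data(), v.size()};
}
```

The owning std::vector can then remain inside Data_object for serialization, while only the views are handed to code that needs plain pointers.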
A long time ago, inspired by "Numerical recipes in C", I started to use the following construct for storing matrices (2D arrays).
double **allocate_matrix(int NumRows, int NumCol)
{
double **x;
int i;
x = (double **)malloc(NumRows * sizeof(double *));
for (i = 0; i < NumRows; ++i) x[i] = (double *)calloc(NumCol, sizeof(double));
return x;
}
double **x = allocate_matrix(1000,2000);
x[m][n] = ...;
But recently I noticed that many people implement matrices as follows:
double *x = (double *)malloc(NumRows * NumCol * sizeof(double));
x[NumCol * m + n] = ...;
From the locality point of view the second method seems perfect, but it has awful readability... So I started to wonder: is my first method, with its auxiliary array of double * pointers, really bad, or will the compiler optimize it so that it is more or less equivalent in performance to the second method? I am suspicious because I think that in the first method two jumps are made when accessing a value, x[m] and then x[m][n], and there is a chance that each time the CPU will first load the x array and then the x[m] array.
P.S. Do not worry about the extra memory for storing the double * pointers; for large matrices it is just a small percentage.
P.P.S. Since many people did not understand my question very well, I will try to rephrase it: do I understand correctly that the first method is a kind of locality hell, where each time x[m][n] is accessed, first the x array is loaded into the CPU cache and then the x[m] array, making each access run at the speed of talking to RAM? Or am I wrong, and the first method is also OK from a data-locality point of view?
For C-style allocations you can actually have the best of both worlds:
double **allocate_matrix(int NumRows, int NumCol)
{
double **x;
int i;
x = (double **)malloc(NumRows * sizeof(double *));
x[0] = (double *)calloc(NumRows * NumCol, sizeof(double)); // <<< single contiguous memory allocation for entire array
for (i = 1; i < NumRows; ++i) x[i] = x[i - 1] + NumCol;
return x;
}
This way you get data locality and its associated cache/memory access benefits, and you can treat the array as a double ** or as a flattened 2D array (x[0][i * NumCol + j]) interchangeably. You also have fewer calloc/free calls (2 versus NumRows + 1).
No need to guess whether the compiler will optimize the first method. Just use the second method which you know is fast, and use a wrapper class that implements for example these methods:
double& operator()(int x, int y);
double const& operator()(int x, int y) const;
... and access your objects like this:
arr(2, 3) = 5;
Alternatively, if you can bear a little more code complexity in the wrapper class(es), you can implement a class that can be accessed with the more traditional arr[2][3] = 5; syntax. This is implemented in a dimension-agnostic way in the Boost.MultiArray library, but you can do your own simple implementation too, using a proxy class.
Note: Considering your usage of C style (a hardcoded non-generic "double" type, plain pointers, function-beginning variable declarations, and malloc), you will probably need to get more into C++ constructs before you can implement either of the options I mentioned.
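A minimal sketch of such a wrapper, assuming row-major storage in a std::vector<double>. The class name Matrix and the bounds-checking asserts are choices made for this example, not something the original posts prescribe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Matrix {
public:
    // All elements live in one contiguous, zero-initialized buffer.
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Row-major indexing: element (row, col) sits at row * cols + col.
    double& operator()(std::size_t row, std::size_t col) {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }
    double const& operator()(std::size_t row, std::size_t col) const {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};
```

Usage then reads as arr(2, 3) = 5; while the index arithmetic stays in one place, and the vector releases the memory automatically.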
The two methods are quite different.
While the first method allows for easier direct access to the values by adding another indirection (the double** array, hence you need 1+N mallocs), ...
the second method guarantees that ALL values are stored contiguously and only requires one malloc.
I would argue that the second method is always superior. Malloc is an expensive operation and contiguous memory is a huge plus, depending on the application.
In C++, you'd just implement it like this:
std::vector<double> matrix(NumRows * NumCols);
matrix[y * NumCols + x] = value; // Access
and if you're concerned about the inconvenience of having to compute the index yourself, add a wrapper that implements operator()(int x, int y) for it.
You are also right that the first method is more expensive when accessing the values, because you need two memory lookups, as you described: x[m] and then x[m][n]. There is no way the compiler will "optimize this away". The first array, depending on its size, will be cached, so the performance hit may not be that bad. In the second case, you need an extra multiplication for direct access.
In the first method you use, each double* in the master array points to a logical row (an array of size NumCol).
So, if you write something like below, you get the benefits of data locality in some sense (pseudocode):
foreach(row in rows):
foreach(elem in row):
//Do something
If you tried the same thing with the second method, and if element access was done the way you specified (i.e. x[NumCol*m + n]), you still get the same benefit. This is because you treat the array to be in row-major order. If you tried the same pseudocode while accessing the elements in column-major order, I assume you'd get cache misses given that the array size is large enough.
In addition to this, the second method has the additional desirable property of being a single contiguous block of memory which further improves the performance even when you loop through multiple rows (unlike the first method).
So, in conclusion, the second method should be much better in terms of performance.
If NumCol is a compile-time constant, or if you are using GCC with language extensions enabled, then you can do:
double (*x)[NumCol] = (double (*)[NumCol]) malloc(NumRows * sizeof (double[NumCol]));
and then use x as a 2D array and the compiler will do the indexing arithmetic for you. The caveat is that unless NumCol is a compile-time constant, ISO C++ won't let you do this, and if you use GCC language extensions you won't be able to port your code to another compiler.
So what I've got is a Grid class and a Tile class. At the moment Grid contains a two-dimensional vector of Tiles (vector<vector<Tile>>). These Tiles hold info about their x, y and z (it's a top-down map) and, for example, erosion rate etc.
My problem is that I need to efficiently access these tiles by their x/y coordinates, find a tile with the median (or any other 0 to 1 position, the median being 0.5) value among all z coordinates (to set the sea level), and also loop through all of them from the highest z to the lowest (for creating the erosion map).
What would you suggest as the best data structure to hold these, so I can efficiently do everything I listed above, and maybe something else as well if I find out later that I need it? Right now I just create a temporary sorted structure or map to do the job, copying all the tiles into it and working with it, which is really slow.
The options I've considered are a map, which doesn't have direct access and is also always sorted, which would make picking tiles by their x/y hard.
Then a single vector, which would allow direct access, but if I were to sort the tiles, the direct access would be pointless because the position of a Tile in the vector would no longer match its x + y * width.
Here is a small sample code:
class Grid {
public:
    class Tile {
    public:
        unsigned x;
        unsigned y;
        float z; // used for drawing height map
        static float seaLevel; // static value for all the tiles
        unsigned erosionLevel; // used for drawing erosion map
    };
    void setSeaLevel(float pos) {
        // set seaLevel to z of tile on pos from 0 to 1 in tile grid
    }
    void generateErosionMap() {
        // loop through all tiles from highest z to lowest z and set their erosion
    }
    void draw() {
        // loop through all tiles by their x/y and draw them
    }
    vector<vector<Tile>> tileGrid;
};
The C++ library provides a basic set of containers. Each container is optimized for access in a specific way.
When you have a requirement to be able to optimally access the same set of data in different ways, the way to do this is to combine several containers together, all referencing the same underlying data, with each container being used to locate a single chunk of data in one particular way.
Let's take two of your requirements, as an example:
Locate a Grid object based on its X and Y coordinates, and
Iterate over all Grids in monotonically increasing or decreasing order, by their z coordinates.
We can implement the first requirement by using a simple two-dimensional vector:
typedef std::vector<std::vector<std::shared_ptr<Grid>>> lookup_by_xy_t;
lookup_by_xy_t lookup_by_xy;
This is rather obvious, on its face value. But note that the vector does not store the actual Grids, but a std::shared_ptr to these objects. If you are not familiar with std::shared_ptrs, read up on them, and understand what they are.
This is fairly basic: you construct a new Grid:
auto g = std::make_shared<Grid>( /* arguments to Grid's constructor */);
// Any additional initialization...
//
// g->foo(); g->bar=4;
//
// etc...
and simply insert it into the lookup vector:
lookup_by_xy[g->x][g->y]=g;
Now, we handle your second requirement: being able to iterate over all these objects by their z coordinates:
typedef std::multimap<double, std::shared_ptr<Grid>> lookup_by_z_t;
lookup_by_z_t lookup_by_z;
This is assuming that your z coordinate is a double. The multimap will, by default, iterate over its contents in strict weak ordering according to the key, from lowest to the highest key. You can either iterate over the map backwards, or use the appropriate comparison class with the multimap, to order its keys from highest to lowest values.
Now, simply insert the same std::shared_ptr into this lookup container:
lookup_by_z.insert(std::make_pair(g->z, g));
Now, you can find each Grid object by either its x/y coordinate, or iterate over all objects by their z coordinates. Both of the two-dimensional vector, and the multimap, contain shared_ptrs to the same Grid objects. Either one can be used to access them.
Simply create other containers, as needed, to access the same underlying objects, in different ways.
Now, of course, all of this additional framework does impose some additional overhead, in terms of dynamic memory allocations, and the overhead for each container itself. There is no free lunch. A custom allocator might become necessary if the amount of raw data becomes an issue.
So after asking this question on my university and getting bit deeper explanation, I've come to this solution.
If you need a data structure that supports various access methods (in my case direct access by x/y, linear access in sorted z order, etc.), the best solution is to write your own class for handling it. Also, shared_ptr carries reference-counting overhead that unique_ptr does not, so it shouldn't be used unless shared ownership is actually needed. So in my case the implementation would look something like this:
#ifndef TILE_GRID_H
#define TILE_GRID_H
#include "Tile.h"
#include <algorithm> // std::sort
#include <cmath> // fmax, fmin
#include <memory>
#include <vector>
using Matrix = std::vector<std::vector<std::unique_ptr<Tile>>>;
using Sorted = std::vector<Tile*>;
class TileGrid {
public:
TileGrid(unsigned w, unsigned h) : width(w), height(h) {
// Resize _dA to desired size
_directAccess.resize(height);
for (unsigned j = 0; j < height; ++j)
for (unsigned i = 0; i < width; ++i)
_directAccess[j].push_back(std::make_unique<Tile>(i, j));
// Link _sZ to _dA
for (auto& i : _directAccess)
for (auto& j : i)
_sortedZ.push_back(j.get());
}
// Sorts the data by its z value, highest first
void sortZ() {
std::sort(_sortedZ.begin(), _sortedZ.end(), [](Tile* a, Tile* b) { return b->z < a->z; });
}
// Operator to read directly from this container
Tile& operator()(unsigned x, unsigned y) {
return *_directAccess[y][x];
}
// Operator returning i-th position from sorted tiles (in my case used for setting sea level)
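Tile& operator()(float level) {
    level = fmax(fmin(level, 1), 0);
    // Scale to the last valid index so that level == 1 does not run past the end
    return *_sortedZ[static_cast<std::size_t>((width * height - 1) * level)];
}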
// Iterators
auto begin() { return _sortedZ.begin(); }
auto end() { return _sortedZ.end(); }
auto rbegin() { return _sortedZ.rbegin(); }
auto rend() { return _sortedZ.rend(); }
const unsigned width; // x dimension
const unsigned height; // y dimension
private:
Matrix _directAccess;
Sorted _sortedZ;
};
#endif // TILE_GRID_H
You could also use a template, but in my case I only needed this for the Tile class. As you can see, the main _directAccess matrix holds all the unique_ptrs, while _sortedZ holds only raw pointers to the data stored in _directAccess. This is faster than using shared_ptr and also safe, because these pointers are tied to one class and all of them are destroyed at the same time. I've also added overloaded () operators for accessing the data and reused the iterators of the _sortedZ vector. And again, width and height are const only because of the intended usage of this data structure (not resizable, immovable tiles, etc.).
If you have any questions or suggestions on what to improve, feel free to comment.
How can you map negative map coordinates in a 2D tile based game?
ex. (-180,100)
or (10, -8)
I need to access them in O(1). I don't want to create a huge 2D vector and treat (500,500) as (0,0) just to support negative coordinates.
Kind of a dumb question, but I really have no clue.
Thank you.
I'm assuming you've got an infinite, procedurally-generated world, because if the world isn't infinite, it's a simple matter of setting the lower bound of the X and Y coordinate at zero, then wrapping a function around your tile map array that automatically returns zero if someone asks for a tile that's out of bounds.
In the case of an infinite world, you can't index a plain array directly, but you can get close: a hash map lookup is O(1) on average (a tree-based std::map would be O(log n)). Here's how I'd go about it:
Divide your tile map into square chunks of whatever size you find reasonable (we'll call this size S).
Create a hash map of vectors, each vector being one chunk of your map.
When the player moves close to a new chunk, generate it in a background thread and toss it into the hash. The hash should be based on the x, y coordinates of the chunk (x/S, y/S). For negative coordinates the division has to round toward negative infinity, because C++ integer division truncates toward zero.
If you want to access a map tile at a particular position, grab the vector for the appropriate chunk and then access tile (x%S, y%S) in that vector, with the remainder adjusted to be non-negative.
If you wrap this inside a class, you can add features to it, such as loading and saving chunks to disk so you don't have to hold the entire map in memory. You can also write a getTile function that takes arbitrary coordinates and takes care of picking the correct chunk and the position inside that chunk for you.
Your vector is always accessible in O(1) time, and the hash lookup is O(1) on average. Furthermore, as long as you choose a sane chunk size, the number of chunks in the hash won't get out of hand, so the performance impact will be essentially nil.
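The steps above can be sketched as follows, using std::unordered_map as the hash map. The chunk size of 32, the Tile contents, the packed 64-bit key and the helper names are all assumptions for the example; background generation and saving to disk are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

const int S = 32;  // chunk edge length, picked arbitrarily here

struct Tile { int type = 0; };

// Chunk index for a coordinate, rounding toward negative infinity.
inline int chunkIndex(int v, int size) {
    assert(size > 0);
    int q = v / size;
    return (v % size < 0) ? q - 1 : q;
}

// Position inside the chunk, always in 0..size-1.
inline int localIndex(int v, int size) {
    assert(size > 0);
    int r = v % size;
    return r < 0 ? r + size : r;
}

class TileMap {
public:
    // Works for any coordinates, negative ones included.
    Tile& getTile(int x, int y) {
        std::vector<Tile>& chunk =
            chunks_[chunkKey(chunkIndex(x, S), chunkIndex(y, S))];
        if (chunk.empty()) chunk.resize(S * S);  // create on first access
        return chunk[localIndex(x, S) + localIndex(y, S) * S];
    }
    std::size_t loadedChunks() const { return chunks_.size(); }
private:
    // Packs both chunk coordinates into one key so std::hash can be used.
    static std::uint64_t chunkKey(int cx, int cy) {
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32)
               | static_cast<std::uint32_t>(cy);
    }
    std::unordered_map<std::uint64_t, std::vector<Tile>> chunks_;
};
```

The coordinates from the question, (-180, 100) and (10, -8), need no shifting with this scheme: they simply land in chunks (-6, 3) and (0, -1).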
If your data is dense (all or most of the points in a known 2D range are used), nothing will beat a 2D array (O(1)). Accept the coordinate shift.
If your data is sparse, how about a hash table keyed on the coordinate pairs? That is close to O(1).
std::map or hash_map doesn't seem to suit my needs. They are overly complex and not flexible enough.
I decided to go for std::vector and accept the high memory usage.
I'll leave how I did it here for future reference:
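const unsigned int Xlimit = 500; // valid x range is -Xlimit .. Xlimit - 1
const unsigned int Ylimit = 500; // valid y range is -Ylimit .. Ylimit - 1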
class Tile
{
public:
Tile(){someGameData=NULL;}
void *someGameData;
};
class Game
{
public:
Game()
{
tiles.resize(2 * Xlimit, std::vector<Tile>(2 * Ylimit, Tile())); // twice the limit, so that shifted non-negative coordinates also fit
}
inline Tile* GetTileAtCoord(int x, int y)
{
return &tiles[x+Xlimit][y+Ylimit];
}
protected:
std::vector<std::vector<Tile>> tiles;
};