Large vectors and memory conservation c++

I'm working on making a tile-map editor in C++. Right now, while a map is being edited, its attributes are stored across three vectors:
vector<vector<vector<bool>>> CanvasCollisionObstruction; //[collision,obstruction][map x][map y]
vector<vector<vector<bool>>> CanvasZoneOverlays; //zone overlays for programmable map zones [zone type][map x][map y]
vector<vector<vector<canvasClip>>> CanvasClips; //identifies which sprite occupies this tile [layer number][map x][map y]
In the above vectors, the 2nd and 3rd dimensions ([map x] and [map y]) refer to the actual tile coordinates on the map. These are just plain-old-square-2d maps. The type of that last vector is the following struct:
struct canvasClip
{
int tileset;
int clip;
//initialization to check if it's in-use
canvasClip() : tileset(-1), clip(-1) {}
bool isInitialized()
{//a clip is only rendered if it is initialized
return ((tileset >= 0) && (clip >= 0));
}
bool operator==(const canvasClip& a) const
{//used in flood-fill to compare with target tile
return ((tileset == a.tileset) && (clip == a.clip));
}
bool operator!=(const canvasClip& a) const
{//used in flood-fill to compare with target tile
return ((tileset != a.tileset) || (clip != a.clip));
}
};
For this application, I expect to eventually want to generate maps with size upwards of 50000x50000 tiles, across indefinite (but probably never more than 10) layers. There are about 12 zones total, and that number is constant.
The map editor has some controls to change the size of the map (numeric inputs and a button). When I set the map size to a very large number, I call vector.resize() on each of those vectors, and I can watch my memory usage quickly travel upward in the task manager until my computer finally crashes.
Can anyone offer me some advice or tips for handling very large vectors? Am I going to have to do something like compress the vector so that a single index describes a span of similar tiles? Should I store the map in a file instead of memory, and then read-back only a few chunks of it at a time, as-needed?
How do good programmers handle this situation?

As already mentioned in the comments, you are trying to allocate a huge amount of memory just for the data.
In this case, you have to choose a different data structure to store it and operate on it.
Here are a couple of the simplest tricks you can apply, at the cost of more complex code for operating on the data:
Your default values seem to carry no information. Why not store in memory only the data that is set to true (or otherwise non-default) values?
You may keep in memory only the visible part of the data (see "Here's what's happening in Horizon: Zero Dawn every time you move a camera").
You might have to pack your data structure and work on alignment (see "Data structure alignment").
Of course, there are cases where you have to relax the requirements, but that's a part of life.
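
To put a number on the problem: a 50000x50000 layer holds 2.5 billion tiles, so with an 8-byte canvasClip (assuming a 4-byte int) one dense layer alone needs about 20 GB. The first trick above could be sketched roughly as follows. This is only an illustration, not the asker's code: SparseLayer and its key() helper are hypothetical names, canvasClip is a trimmed copy of the struct from the question, and the sketch assumes most tiles stay at their default value.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Trimmed copy of the struct from the question (assumption: only these two fields matter here).
struct canvasClip {
    int tileset;
    int clip;
    canvasClip(int t = -1, int c = -1) : tileset(t), clip(c) {}
    bool isInitialized() const { return tileset >= 0 && clip >= 0; }
};

// Hypothetical helper: one map layer that stores only the tiles that were actually set.
class SparseLayer {
public:
    void set(std::uint32_t x, std::uint32_t y, canvasClip c) { tiles_[key(x, y)] = c; }

    // Tiles that were never set come back as the default (uninitialized) clip.
    canvasClip get(std::uint32_t x, std::uint32_t y) const {
        auto it = tiles_.find(key(x, y));
        return it == tiles_.end() ? canvasClip() : it->second;
    }

    void clear(std::uint32_t x, std::uint32_t y) { tiles_.erase(key(x, y)); }
    std::size_t storedTiles() const { return tiles_.size(); }

private:
    // Pack both coordinates into a single 64-bit key.
    static std::uint64_t key(std::uint32_t x, std::uint32_t y) {
        return (static_cast<std::uint64_t>(x) << 32) | y;
    }
    std::unordered_map<std::uint64_t, canvasClip> tiles_;
};
```

With this layout, resizing the map to 50000x50000 costs nothing by itself; memory grows only with the number of painted tiles. A densely painted map would still need chunking or a file-backed store.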

Related

Data structure to store a grid (that will have negative indices)

I'm studying robotics at university and I have to implement my own SLAM algorithm. To do it I will use ROS, Gazebo and C++.
I'm unsure which data structure to use to store the map (and what I'm going to store in it, but that is another story).
I have thought of representing the map as a 2D grid, with the robot's start location at (0,0). But I don't know where exactly the robot is in the world that I have to map. It could be at the top left corner, in the middle of the world, or in any other unknown location inside the world.
Each cell of the grid will be 1x1 meters. I will use a laser to find out where the obstacles are. Using the robot's current location, I will set to 1 all the cells that represent an obstacle. For example, if the laser detects an obstacle 2 meters in front of the robot, I will set the cell at (0,2) to 1.
Using a vector or a 2D matrix is a problem here, because vector and matrix indices start at 0, and there could be more room behind the robot to map. That room might have an obstacle at (-1,-3).
In this data structure, I will need to store the cells that have an obstacle and the cells that I know are free.
Which kind of data structure should I use?
UPDATE:
The process to store the map will be the following:
Robot starts at (0,0) cell. It will detect the obstacles and store them in the map.
Robot moves to (1,0) cell. And again, detect and store the obstacles in the map.
Continue moving to free cells and storing the obstacles it finds.
The robot will detect the obstacles that are in front of it and to the sides, but never behind it.
My problem comes when the robot detects an obstacle on a negative cell (like (0,-1)). I don't know how to store that obstacle if I have previously stored only obstacles on "positive" cells. So maybe the "offset" is not a solution here (or maybe I'm wrong).

This is where you can write a class to help you:
class RoboArray
{
    static constexpr int width_ = ...
    static constexpr int height_ = ...
    Cell grid_[width_ * 2][height_ * 2];
    ...
public:
    ...
    Cell get(int x, int y) // can make this use [x][y] notation with a helper class
    {
        // shift by the half-extents so that negative coordinates land inside the array
        return grid_[x + width_][y + height_];
    }
    ...
};

The options you have:
Have an offset. Simple and dirty. Your grid is 100x100 but stores (-50,-50) to (49,49).
Have multiple offset grids. When you go out of the grid, allocate a new one beside it, with a different offset. Keep them in a list or map of grids.
Have a sparse structure. A set or map of coordinates.
Have a hierarchical structure. Your whole, say 50x50, grid is one cell in a grid at a higher level. Implement it with a linked list or something similar, so that when you move you build a tree of nested grids. Very efficient for memory and compute time, but much more complex to implement.

You can use a std::set to represent a grid layout by using a position class you create. It contains an x and a y variable and can therefore be used intuitively to find points inside the grid. You can also use a std::map if you want to store information about a certain location inside the grid.
Please don't forget to fulfill the C++ named requirements for set/map, such as Compare, if you don't want to provide a comparison operator externally.
example:
position.h
/* this class is used to store the position of things
* it is made up of a horizontal and a vertical position.
*/
#include <cstdint>
#include <iostream>
#include <map>
#include <set>
class position{
private:
int32_t horizontalPosition;
int32_t verticalPosition;
public:
position(const int hPos = 0,const int vPos = 0) : horizontalPosition{hPos}, verticalPosition{vPos}{}
position(const position& inputPos) : position(inputPos.getHorPos(),inputPos.getVerPos()){}
//insertion operator, it enables the use of cout on this object: cout << position(0,0) << endl;
friend std::ostream& operator<<(std::ostream& os, const position& dt){
os << dt.getHorPos() << "," << dt.getVerPos();
return os;
}
//greater than operator, defined in terms of operator<
bool operator>(const position& rh) const noexcept{
return(rh < *this);
}
//lesser than operator: orders by horizontal position first, then by vertical position.
//Comparing the fields directly stays correct for negative coordinates, where packing
//sign-extended values into one 64-bit integer would make distinct positions collide.
bool operator<(const position& rh) const noexcept{
if(getHorPos() != rh.getHorPos()){
return(getHorPos() < rh.getHorPos());
}
return(getVerPos() < rh.getVerPos());
}
//equal comparison operator
bool operator==(const position& inputPos)const noexcept {
return((getHorPos() == inputPos.getHorPos()) && (getVerPos() == inputPos.getVerPos()));
}
//not equal comparison operator
bool operator!=(const position& inputPos)const noexcept {
return((getHorPos() != inputPos.getHorPos()) || (getVerPos() != inputPos.getVerPos()));
}
void movNorth(void) noexcept{
++verticalPosition;
}
void movEast(void) noexcept{
++horizontalPosition;
}
void movSouth(void) noexcept{
--verticalPosition;
}
void movWest(void) noexcept{
--horizontalPosition;
}
position getNorthPosition(void)const noexcept{
position aPosition(*this);
aPosition.movNorth();
return(aPosition);
}
position getEastPosition(void)const noexcept{
position aPosition(*this);
aPosition.movEast();
return(aPosition);
}
position getSouthPosition(void)const noexcept{
position aPosition(*this);
aPosition.movSouth();
return(aPosition);
}
position getWestPosition(void)const noexcept{
position aPosition(*this);
aPosition.movWest();
return(aPosition);
}
int32_t getVerPos(void) const noexcept {
return(verticalPosition);
}
int32_t getHorPos(void) const noexcept {
return(horizontalPosition);
}
};
std::set<position> gridNoData;
std::map<position, bool> gridWithData;
gridNoData.insert(position(1,1));
gridWithData.insert({position(1,1),true});
gridNoData.insert(position(0,0));
gridWithData.insert({position(0,0),true});
auto searchNoData = gridNoData.find(position(0,0));
if (searchNoData != gridNoData.end()) {
std::cout << "0,0 exists" << '\n';
} else {
std::cout << "0,0 doesn't exist\n";
}
auto searchWithData = gridWithData.find(position(0,0));
if (searchWithData != gridWithData.end()) {
std::cout << "0,0 exists with value " << searchWithData->second << '\n';
} else {
std::cout << "0,0 doesn't exist\n";
}
I used the above class in a similar setting, where we used a std::map defined as:
std::map<position,directionalState> exploredMap;
It stored whether we had found any walls at a certain position.
By using this std::map based method you avoid having to do math to work out the offset inside a 2D array (or some structure like that). It also allows you to move freely, as there is no chance of travelling outside predefined bounds set at construction. This structure is also more space efficient than a 2D array, as it only stores the areas where the robot has been. It is also a C++ way of doing things: relying on the STL instead of creating your own 2D map using C constructs.

With the offset solution (translating values by a fixed formula, which we called a "mapping function" in math class, such as adding 50 to all coordinates, so that [-30,-29] becomes [+20,+21] and [0,0] becomes [+50,+50]), you still need to have an idea of your maximum size.
If you want to be dynamic like std::vector<>, going from 0 to some N (as much as free memory allows), you can create a more complex mapping function, for example map(x) = x*2 when (0 <= x) and x*(-2)-1 when (x < 0). This way you can use a standard std::vector and let it grow as needed when new maximum coordinates are reached.
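
A minimal sketch of that mapping function for a single axis, under the assumption that each cell only needs an occupied/free flag. The names toIndex and GrowingLine are made up for this illustration and do not come from the answer.

```cpp
#include <cstddef>
#include <vector>

// map(x) = x*2 for x >= 0 and x*(-2)-1 for x < 0,
// so the coordinates 0, -1, 1, -2, 2, ... land on the indices 0, 1, 2, 3, 4, ...
inline std::size_t toIndex(int x) {
    if (x >= 0) {
        return static_cast<std::size_t>(x) * 2;
    }
    // -(x + 1) avoids overflow for the most negative int
    return static_cast<std::size_t>(-(x + 1)) * 2 + 1;
}

// Hypothetical one-dimensional strip of cells that grows in both directions.
class GrowingLine {
public:
    void set(int x, bool occupied) {
        const std::size_t i = toIndex(x);
        if (i >= cells_.size()) {
            cells_.resize(i + 1, false); // new cells start out free
        }
        cells_[i] = occupied;
    }

    // Cells that were never touched read as free.
    bool get(int x) const {
        const std::size_t i = toIndex(x);
        return i < cells_.size() && cells_[i];
    }

private:
    std::vector<bool> cells_;
};
```

Because positive and negative coordinates interleave, the vector grows in proportion to the largest absolute coordinate seen so far, regardless of its sign.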
With a 2D grid this is a bit more complicated, as a vector of vectors is sometimes not the best idea from a performance point of view. But as long as your code can prefer shortness and simplicity over performance, you can use the same mapping for both coordinates and use a vector of vectors. Call reserve(..) on all of them with some reasonable default to avoid resizing in common use cases: if you know a 100m x 100m area will be the usual maximum, you can reserve everything to capacity 201 initially. The vectors can still grow (until heap memory is exhausted) in less common situations.
You can also add another mapping function converting 2D coordinates to 1D and use a single vector only. And if you really want to complicate things, you can, for example, map the 2D coordinates onto a 0,1,2,... sequence growing outward from the area around the center, to save memory for small areas. You will probably easily spend 2-4 weeks debugging it if you are fairly fresh to C++ development and don't use unit testing and a TDD approach. (I.e. just go with a simple vector of vectors for a start; this paragraph is JFYI, to show how things can get complicated if you try to be too smart :) ).

class RobotArray
{
    int* left;
    int* right;
public:
    RobotArray();
};
RobotArray::RobotArray()
{
    int* a = new int[50 * 50];
    int* b = new int[50 * 50];
    // left is for the negative space and right for the positive space,
    // with the (0,0) element of each array skipped
    left = a + 1;
    right = b + 1;
}

I think I see what you are after here: you don't know how big the space is, or even what the coordinates may be.
This is very general, but I would create a class that holds all of the data using vectors (another option -- vector of pairs, or vector of Eigen (the library) vectors). As you discover new regions, you'll add the coordinates and occupancy information to the Map (via AddObservation(), or something similar).
Later, you can determine the minimum and maximum x and y coordinates, and create the appropriate grid, if you like.
class RoboMap{
public:
    // parallel vectors: entry i of each vector describes one observation
    std::vector<int> map_x_coord;
    std::vector<int> map_y_coord;
    std::vector<bool> occupancy;
    RoboMap() = default;
    void AddObservation(int x, int y, bool in_out){
        map_x_coord.push_back(x);
        map_y_coord.push_back(y);
        occupancy.push_back(in_out);
    }
};

Memory Efficiency - Eigen::VectorXd in a loop

I have a Measurement object that has two Eigen::VectorXd members -- one for position and the other velocity.
Measurements are arranged in a dataset by scans -- i.e., at each timestep, a new scan of measurements is added to the dataset. These types are defined as:
typedef std::shared_ptr<Measurement> MeasurementPtr;
typedef std::vector<MeasurementPtr> scan_t;
typedef std::vector<scan_t> dataset_t;
At the beginning of each iteration of my algorithm, I need to apply a new transformation to each measurement. Currently, I have:
for (auto scan = dataset_.begin(); scan != dataset_.end(); ++scan)
for (auto meas = scan->begin(); meas != scan->end(); ++meas) {
// Transform this measurement to bring it into the same
// coordinate frame as the current scan
if (scan != std::prev(dataset_.end())) {
core::utils::perspective_transform(T_, (*meas)->pos);
core::utils::perspective_transform(T_, (*meas)->vel);
}
}
Where perspective_transform is defined as
void perspective_transform(const Eigen::Projective2d& T, Eigen::VectorXd& pos) {
pos = (T*pos.homogeneous()).hnormalized();
}
Adding this code increases computation time by 40x when I run the algorithm with scans in the dataset with 50 measurements in each scan -- making it rather slow. I believe this is because I have 550 small objects, each with 2 Eigen memory writes. I removed the writing of the result to memory and my benchmark shows only a slight decrease -- suggesting that this is a memory-efficiency problem and not a computation bottleneck.
How can I speed up this computation? Is there a way to first loop through and create an Eigen::Matrix from Eigen::Map that I could then do the computation once and have it automatically update the two members of the all the Measurement objects?

You might want to rework your data-structures.
Currently you have an array-of-struct (AOS), with a number of indirections.
A structure-of-arrays (SOA) is generally more efficient in memory access.
What about:
struct Scan_t
{
Eigen::MatrixXd position;
Eigen::MatrixXd velocity;
};
The .rowwise() and .colwise() operators might be powerful enough to do the homogeneous transform, which would save you from writing the inner loop.

How can I hash a std::unordered_map::const_iterator?

Do you remember my prior question: What is causing data race in std::async here?
Even though I successfully parallelized this program, it still ran too slowly to be practical.
So I tried to improve the data structure representing a Conway's Game of Life pattern.
Brief explanation of the new structure:
class pattern {
// NDos::Lifecell represents a cell by x and y coordinates.
// NDos::Lifecell is equality comparable, and has std::hash specialization.
private:
std::unordered_map<NDos::Lifecell, std::pair<int, bool>> cells_coor;
std::unordered_set<decltype(cells_coor)::const_iterator> cells_neigh[9];
std::unordered_set<decltype(cells_coor)::const_iterator> cells_onoff[2];
public:
void insert(int x, int y) {
// if coordinate (x,y) isn't already ON,
// turns it ON and increases the neighbor's neighbor count by 1.
}
void erase(int x, int y) {
// if coordinate (x,y) isn't already OFF,
// turns it OFF and decreases the neighbor's neighbor count by 1.
}
pattern generate(NDos::Liferule rule) {
// this advances the generation by 1, according to the rule.
// (For example here, B3/S23)
pattern result;
// inserts every ON cell with 3 neighbors to result.
// inserts every OFF cell with 2 or 3 neighbors to result.
return result;
}
// etc...
};
In brief, pattern contains the cells. It contains every ON cell, and every OFF cell that has 1 or more ON neighbor cells. It can also contain spare OFF cells.
cells_coor directly stores the cells, by using their coordinates as keys, and maps them to their number of ON neighbor cells (stored as int) and whether they are ON (stored as bool).
cells_neigh and cells_onoff store the cells indirectly, using iterators to them as keys.
The number of ON neighbor of a cell is always 0 or greater and 8 or less, so cells_neigh is a size 9 array.
cells_neigh[0] stores the cells with 0 ON neighbor cells, cells_neigh[1] stores the cells with 1 ON neighbor cell, and so on.
Likewise, a cell is always either OFF or ON, so cells_onoff is a size 2 array.
cells_onoff[false] stores the OFF cells, and cells_onoff[true] stores the ON cells.
Cells must be inserted into or erased from all of cells_coor, cells_neigh and cells_onoff. In other words, if a cell is inserted into or erased from one of them, the same must happen in the others. Because of this, the elements of cells_neigh and cells_onoff are std::unordered_sets storing iterators to the actual cells, enabling fast access to the cells by neighbor count or OFF/ON state.
If this structure works, insertion will have an average time complexity of O(1), erasure also O(1), and generation O(cells_coor.size()), which is a great improvement in time complexity over the prior structure.
But as you see, there is a problem: How can I hash a std::unordered_map::const_iterator?
std::hash prohibits a specialization for them, so I have to make a custom one.
Taking their address won't work, as they are usually acquired as rvalues or temporaries.
Dereferencing them also won't work, as there are multiple cells that have 0 ON neighbor cells, or multiple cells that are OFF, etc.
So what can I do? If I can't do anything, cells_neigh and cells_onoff will be std::vector or something, sharply degrading the time complexity.

Short story: this won't work (really well)(*1). Most of the operations that you're likely going to perform on the map cells_coor will invalidate any iterators (but not pointers, as I learned) to its elements.
If you want to keep what I'd call different "views" on some collection, then the underlying container storing the actual data needs to be either not modified or must not invalidate its iterators (a linked list for example).
Perhaps I'm missing something, but why not keep 9 sets of cells for the neighbor counts and 2 sets of cells for on/off? (*2) Put differently: for what do you really need that map? (*3)
(*1): The map only invalidates iterators when rehashing occurs (pointers and references to elements stay valid). You can check for that:
// Before inserting
(map.max_load_factor() * map.bucket_count()) > (map.size() + 1)
(*2): 9 sets can be reduced to 8: if a cell (x, y) is in none of the 8 sets, then it would be in the 9th set. Thus storing that information is unnecessary. Same for on/off: it's enough to store the cells that are on. All others are off.
(*3): Accessing the number of neighbours without using the map but only with sets of cells, kind of pseudo code:
unsigned number_of_neighbours(Cell const & cell) {
for (unsigned neighbours = 8; neighbours > 0; --neighbours) { // a cell has at most 8 neighbours
if (set_of_cells_with_neighbours(neighbours).count(cell) == 1) {
return neighbours;
}
}
return 0;
}
The repeated lookups in the sets could of course destroy actual performance, you'd need to profile that. (Asymptotic runtime is unaffected)
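
For the literal question of hashing the iterator, one possible sketch relies on the fact noted above that element addresses in a std::unordered_map are stable: it hashes the address of the element the iterator refers to. IteratorHash, CellMap and CellIterSet are names invented for this illustration, and CellMap is only a stand-in for cells_coor. Every stored iterator must be dereferenceable (never end()), and iterators kept across a rehash are still invalid even though the addresses are not.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <unordered_set>

// Hashes an iterator by the address of the element it points to.
// Assumption: the iterator is valid and dereferenceable.
template <class Iterator>
struct IteratorHash {
    std::size_t operator()(Iterator it) const {
        using Pointer = decltype(&*it);
        return std::hash<Pointer>()(&*it);
    }
};

// Stand-in for cells_coor from the question (the key and value types are placeholders).
using CellMap = std::unordered_map<int, int>;
using CellIter = CellMap::const_iterator;

// Equality falls back to the iterators' own operator==.
using CellIterSet = std::unordered_set<CellIter, IteratorHash<CellIter>>;
```

Storing plain pointers to the map's elements (const CellMap::value_type*) in the sets would give the same lookup behaviour while sidestepping iterator invalidation on rehash, which may be the more robust choice.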

Implementing endless map in memory

I will have to implement an endless 3D raster map in program memory. The map may or may NOT start at [0;0;0]. The map has Y coordinate limited by 255, others may be infinite. (yes, now, you may have guessed it is a Minecraft map)
I need to create some class which will have simple McMap::getBlock(int x, short y, int z) and McMap::setBlock(int x, short y, int z) methods. This means I need to be able to both read and write the data. I also want to be able to delete blocks and so free the memory.
What should I use for this purpose? I think the best solution would be some table with this structure:
int x|short y|int z|int block id|other values...
-----+-------+-----+------------+---------------
55| 21| 666| 1|
But how do I implement this with C++, without using real MySql (that would be real overkill)? Also, I don't want to keep the map when the program exits, so I want the data to be inside the programs memory.
Once more, consider that the map is infinite and so the coordinates may be whatever. Also do not forget that a very distant points may be mapped.
Also, a very important thing to note: I need to have an effective way to get block by X, Y and Z coordinates - I don't want to walk through all block to find one of them.
I have already included boost library.

I assume you probably wouldn't need to have the entire possible area of a Minecraft world in memory at one time, because that would be incredibly huge (1024000000 KM^2). If you are just trying to keep in memory the area that anyone would usually end up visiting during a game, I think it would be completely feasible using the STL (Standard Template Library).
Minecraft worlds are always loaded in game in chunks which are 16X16X255 blocks. You could store chunks in your program in a std::map. There are a few advantages to this method. The first is it allows representation of locations well beyond the playable area of the map based on the wiki entry for the Far Lands. It also allows for a sparse representation of the minecraft map that will very closely resemble how actual Minecraft maps are rendered. Only the chunks that you are using for your program are loaded in the std::map and hopefully keep memory usage reasonable. You would be able to represent any area no matter its location in the playable area of the total possible Minecraft map area.
To implement this you will just have to first create the world datatype:
using namespace std;
struct Block
{
// Whatever information you care to store here...
};
typedef vector<Block> Chunk;
typedef map<int, map<int, Chunk> > World;
Then to access a single block:
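Block McMap::getBlock(int x, short y, int z)
{
    const int WIDTH = 16; // You might want to store these constants elsewhere
    const int HEIGHT = 256; // y runs from 0 to 255
    // Round towards negative infinity so negative coordinates land in the right chunk
    int chunkx = (x >= 0) ? x / WIDTH : (x - WIDTH + 1) / WIDTH;
    int chunkz = (z >= 0) ? z / WIDTH : (z - WIDTH + 1) / WIDTH;
    // Position of the block inside its chunk, each in [0, WIDTH)
    int localx = x - chunkx * WIDTH;
    int localz = z - chunkz * WIDTH;
    Chunk& chunk = yourWorld[chunkx][chunkz];
    if (chunk.empty())
        chunk.resize(WIDTH * WIDTH * HEIGHT); // allocate the chunk on first access
    return chunk[localx + localz * WIDTH + y * WIDTH * WIDTH];
}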
To erase a chunk:
void McMap::eraseChunk(int x, int z)
{
    auto row = yourWorld.find(x);
    if (row != yourWorld.end()) // Tests to make sure that row exists in the map.
        row->second.erase(z);
}
Another benefit of this method is that, by creating a clever constructor for a chunk rather than just using a typedef like I did, you could automatically generate a chunk when you first access it in the world std::map, similar to how chunks are generated in Minecraft only when you visit them. Whenever you use operator[] with a key that does not exist yet in a map, the map default-constructs the value for that key.

You can decompose your map in chunks, like in minecraft. Each chunk is W * H * L (x y z) blocks. So, a chunk is just a 3d array. The best to do is to wrap it into a 1d array:
BlockType* chunk = new BlockType[W * H * L];
BlockType block = chunk[x + z * W + y * W * L];
This gives good memory management (and is a lot better than storing the whole possible map in an array). Note that accessing a block in a chunk is O(1) and should be very fast here.
Then, you can store your chunks. Each chunk is given 2d coordinates that are its id.
The fastest (for access) should be a std::map:
std::map<ChunkCoord, ChunkType*> map;
Access to block is fast. You need to get the chunk (a division should give you the chunk coords from point coords), then you get the block. Accessing a chunk is in O(log(numChunks)).
Creating a chunk is allocating memory and creating a new item in the map. You will still be limited by the amount of memory in the computer (infinity is not part of this world...), that's why minecraft-like games often save unused chunks to disk. Saving to disk is the only way to have a near-endless map.
The tricky point is to find good values for W, H, and L. For this, I'm afraid you will have to test and measure a lot...
Note: extending this idea leads to quadtrees. You can use them, but they may have too much memory overhead.
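
Putting the pieces of this answer together, a rough sketch could look like the following. It assumes 16x256x16 chunks, a 16-bit block id where 0 means "no block", and C++14 for std::make_unique. ChunkedWorld, floorDiv and the other names are illustrative only and are not part of the question's McMap interface.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

constexpr int W = 16, H = 256, L = 16;  // assumed chunk dimensions (x, y, z)

using BlockId = std::uint16_t;          // 0 means "no block" in this sketch
using Chunk = std::array<BlockId, W * H * L>;
using ChunkCoord = std::pair<int, int>; // chunk coordinates along x and z

// Division rounding towards negative infinity, so x = -1 belongs to chunk -1, not 0.
inline int floorDiv(int a, int b) {
    return a / b - ((a % b != 0) && ((a < 0) != (b < 0)) ? 1 : 0);
}

class ChunkedWorld {
public:
    void setBlock(int x, int y, int z, BlockId id) {
        std::unique_ptr<Chunk>& chunk = chunks_[coordOf(x, z)];
        if (!chunk) {
            chunk = std::make_unique<Chunk>(); // zero-filled on creation
        }
        (*chunk)[indexOf(x, y, z)] = id;
    }

    // Blocks in chunks that were never created read as 0.
    BlockId getBlock(int x, int y, int z) const {
        auto it = chunks_.find(coordOf(x, z));
        return it == chunks_.end() ? BlockId(0) : (*it->second)[indexOf(x, y, z)];
    }

    // Frees the memory of one whole chunk.
    void eraseChunk(int cx, int cz) { chunks_.erase(ChunkCoord(cx, cz)); }
    std::size_t loadedChunks() const { return chunks_.size(); }

private:
    static ChunkCoord coordOf(int x, int z) {
        return ChunkCoord(floorDiv(x, W), floorDiv(z, L));
    }
    static std::size_t indexOf(int x, int y, int z) {
        const int lx = x - floorDiv(x, W) * W; // position inside the chunk, 0..W-1
        const int lz = z - floorDiv(z, L) * L; // position inside the chunk, 0..L-1
        return static_cast<std::size_t>(lx + lz * W + y * W * L);
    }
    std::map<ChunkCoord, std::unique_ptr<Chunk>> chunks_;
};
```

Only chunks that have been written to occupy memory, so two very distant points cost two chunks and nothing in between.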

Negative coordinates with Tile maps

How can you map negative map coordinates in a 2D tile based game?
ex. (-180,100)
or (10, -8)
I need to access them in O(1). I don't want to create a huge 2D vector and treat (500,500) as (0,0) just to address negative coordinates.
Kind of a dumb question, but I really have no clue.
Thank you.

I'm assuming you've got an infinite, procedurally-generated world, because if the world isn't infinite, it's a simple matter of setting the lower bound of the X and Y coordinate at zero, then wrapping a function around your tile map array that automatically returns zero if someone asks for a tile that's out of bounds.
In the case of an infinite world, you're not going to be able to access your tiles in O(1) time -- the best you're going to do is O(log n). Here's how I'd go about it:
Divide your tile map into square chunks of whatever size you find reasonable (we'll call this size S)
Create a hash map of vectors, each vector being one chunk of your map.
When the player moves close to a new chunk, generate it in a background thread and toss it into the hash. The hash should be based on the x, y coordinates of the chunk (x/S, y/S).
If you want to access a map tile at a particular position, grab the vector for the appropriate chunk and then access tile (x%S, y%S) in that vector.
If you wrap this inside a class, you can add features to it, such as loading and saving chunks to disk so you don't have to hold the entire map in memory. You can also write a getTile function that takes arbitrary coordinates and takes care of picking the correct chunk and the position inside that chunk for you.
Your vector is always accessible in O(1) time, and the hash should be accessible in O(log n) time. Furthermore, as long as you choose a sane chunk size, the size of your hash won't get out of hand, so the performance impact will be essentially nil.
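
The steps above might be sketched like this, with one caveat: in C++, integer division truncates towards zero, so for negative coordinates x/S and x%S need a floor-style adjustment. The chunk size S = 32, the int tile type and all names (ChunkKey, ChunkKeyHash, TileMap, chunkOf, localOf) are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

constexpr int S = 32; // assumed chunk edge length, in tiles

struct ChunkKey {
    int cx, cy;
    bool operator==(const ChunkKey& o) const { return cx == o.cx && cy == o.cy; }
};

struct ChunkKeyHash {
    std::size_t operator()(const ChunkKey& k) const {
        // Pack both 32-bit chunk coordinates into one 64-bit value and hash that.
        const std::uint64_t packed =
            (static_cast<std::uint64_t>(static_cast<std::uint32_t>(k.cx)) << 32) |
            static_cast<std::uint32_t>(k.cy);
        return std::hash<std::uint64_t>()(packed);
    }
};

// Plain -1 / S is 0 and -1 % S is -1, so round towards negative infinity instead.
inline int chunkOf(int v) { return v >= 0 ? v / S : (v - S + 1) / S; }
inline int localOf(int v) { return v - chunkOf(v) * S; } // always in 0..S-1

class TileMap {
public:
    // Returns a reference to the tile, creating its chunk on first touch.
    int& at(int x, int y) {
        std::vector<int>& chunk = chunks_[ChunkKey{chunkOf(x), chunkOf(y)}];
        if (chunk.empty()) {
            chunk.assign(static_cast<std::size_t>(S) * S, 0);
        }
        return chunk[static_cast<std::size_t>(localOf(y)) * S + localOf(x)];
    }
    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::unordered_map<ChunkKey, std::vector<int>, ChunkKeyHash> chunks_;
};
```

A call such as map.at(-180, 100) = 5 then works without any global coordinate shift, and only the chunks that were touched are ever allocated.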

If your data is dense (all or most of the points in a known 2D range are used), nothing will beat a 2D array (O(1)). Accept the coordinate shift.
If your data is sparse, how about a hash table on the coordinate pairs. Close to O(1).

std::map or hash_map doesn't seem to suit my needs. They are overly complex and not flexible enough.
I decided to go for std::vector and accept the high memory usage.
I'll leave how I did it here for future reference:
const int Xlimit = 500;
const int Ylimit = 500;
class Tile
{
public:
    Tile(){someGameData=NULL;}
    void *someGameData;
};
class Game
{
public:
    Game()
    {
        // twice the limit in each direction, so coordinates from -limit to limit-1 fit
        tiles.resize(2 * Xlimit, std::vector<Tile>(2 * Ylimit, Tile()));
    }
    inline Tile* GetTileAtCoord(int x, int y)
    {
        // shift so that (-Xlimit, -Ylimit) maps to index (0, 0)
        return &tiles[x+Xlimit][y+Ylimit];
    }
protected:
    std::vector<std::vector<Tile>> tiles;
};
