Memory reduction with arrays - C++

I'm creating a block-based game and I would like to improve its memory usage.
I'm currently creating blocks with a sizeof() of 8, and I can't reduce their size.
Block:
...
bool front, back, left, right, top, bottom;//If the face is active, true
BYTE blockType;
BYTE data;
I came up with a solution but I have no idea how to properly implement it, because I'm quite new to C++. The solution:
All air blocks are exactly equal, but each takes up 8 bytes of memory. If I could make all air blocks point to the same piece of memory, this should (I guess) use less memory (unless the pointer address itself takes up 8 bytes?).
Currently my array looks like this:
Chunk:
Block*** m_pBlocks;
Chunk::Chunk()
{
    m_pBlocks = new Block**[CHUNK_SIZE];
    for(int x = 0; x < CHUNK_SIZE; x++){
        m_pBlocks[x] = new Block*[CHUNK_HEIGHT];
        for(int y = 0; y < CHUNK_HEIGHT; y++){
            m_pBlocks[x][y] = new Block[CHUNK_SIZE];
        }
    }
}
I know you can't make these point to null or to something else, so how should I do this?

Using bitfields to reduce the size of Block.
class Block {
    // bit fields: reduce 6 bytes to 1
    unsigned char front:1, back:1, left:1, right:1, top:1, bottom:1; // true if the face is active
    BYTE blockType;
    BYTE data;
    // optional padding to align size to 4
    // BYTE pad;
};
Block m_pBlocks[32][64][32]; // 32*64*32 = 64K blocks; at sizeof(Block) = 4 that is 256K. That is a lot.
And yes, using a pointer that is itself 8 bytes is not really a saving.
But there are several methods that help save more. If you have a heightmap, everything above the heightmap is air, so you only need a 2D array to check that, where the elements are the heights. All air voxels below the heightmap must be defined along with the other non-air elements.
Other data structures are often more space effective like Octree or Sparse voxel octree.

If you don't need to modify those blocks, you could create a lookup map.
Also, please do not use new, and avoid raw pointers as much as you can.
bool lookup[CHUNK_SIZE][CHUNK_HEIGHT][CHUNK_SIZE];
Chunk::Chunk()
{
    for(int x = 0; x < CHUNK_SIZE; x++)
        for(int y = 0; y < CHUNK_HEIGHT; y++)
            for(int z = 0; z < CHUNK_SIZE; z++)
                lookup[x][y][z] = true;
}
Now you can just query the lookup table to see whether you have that particular block set.
Moreover, all those values are now close together in memory, which is beneficial for performance.

Related

Data analysis - memory bug c++

I am a data scientist, currently working on some C++ code to extract triplet particles from a rather large text file containing 2D coordinate data of particles in ~10⁵ consecutive frames. I am struggling with a strange memory error that I don't seem to understand.
I have a vector of structs, which can be divided into snippets defined by their frame. For each frame, I build an array with unique IDs for each individual coordinate pair; if at any point a coordinate pair is repeated, it is given the old pair's ID. I use this later to define whether the particle triplet is indeed a trimer.
I loop over all particles and search forward for any corresponding coordinate pair. After I'm done, and no particles were found, I define this triplet to be unique and push the coordinates into a vector that corresponds to particle IDs.
The problem is: after the 18th iteration, at the line trimerIDs[i][0] = particleCounter;, the variable trimerCands (my big vector array) suddenly becomes unreadable. Can it be that the vector pointer object is being overwritten? I put this vector fully on the heap, but even if I put it on the stack, the error persists.
Do any of you have an idea of what I might be overlooking? Please note that I am rather new at C++, coming from other, less close to the metal, languages. While I think I understand how stack/heap allocations work, especially with respect to vectors/vector structs, I might be very wrong!
The error that Eclipse gives me in the variables tab is:
Failed to execute MI command:
-data-evaluate-expression trimerCands
Error message from debugger back end:
Cannot access memory at address 0x7fff0000000a
The function is as follows.
struct trimerCoords{
    float x1,y1,x2,y2,x3,y3;
    int frame;
    int tLength1, tLength2, tLength3;
};

void removeNonTrimers(std::vector<trimerCoords> trimerCands, int *trCandLUT){
    // trimerCands is a vector containing possible trimers, tLengthx is an attribute of the particle;
    // trCandLUT is a look up table array with indices;
    for (int currentFrame = 1; currentFrame <= framesTBA; currentFrame++){ // for each individual frame
        int nTrimers = trCandLUT[currentFrame] - trCandLUT[currentFrame-1]; // get the number of trimers for this specific frame
        int trimerIDs[nTrimers][3] = {0}; // preallocate an array for each of the individual particles in each triplet
        int firstTrim = trCandLUT[currentFrame-1]; // first index for this particular frame
        int lastTrim = trCandLUT[currentFrame] - 1; // last index for this particular frame
        bool found;
        std::vector<int> traceLengths;
        traceLengths.reserve(nTrimers*3);
        // Block of code to create a unique ID array for this particular frame
        std::vector<Particle> currentFound;
        Particle tempEntry;
        int particleCounter = 0;
        for (int i = firstTrim; i <= lastTrim; i++){
            // first triplet particle. In the real code, this is repeated three times, for x2/y2 and x3/y3, corresponding to the
            tempEntry.x = trimerCands[i].x1;
            tempEntry.y = trimerCands[i].y1;
            found = false;
            for (long unsigned int j = 0; j < currentFound.size(); j++){
                if (fabs(tempEntry.x - currentFound[j].x) + fabs(tempEntry.y - currentFound[j].y) < 0.001){
                    trimerIDs[i][0] = j; found = true; break;
                }
            }
            if (found == false) {
                currentFound.push_back(tempEntry);
                traceLengths.push_back(trimerCands[i].tLength1);
                trimerIDs[i][0] = particleCounter;
                particleCounter++;
            }
        }
        // end block of create unique ID code block
        compareTrips(nTrimers, trimerIDs, traceLengths, trimerfile_out);
    }
}
If anything's unclear, let me know!

Advantages on flattening to 1D

I have questions about the flattening operation I see on forums. People often recommend flattening a multi-dimensional vector or array into a one-dimensional one.
For example:
int height = 10;
int width = 10;
std::vector<int> grid;
for(int i = 0; i < height; i++){
    for(int j = 0; j < width; j++){
        grid.push_back(rand() % (i + 1) + j); // (i + 1) avoids modulo by zero when i == 0
    }
}
std::vector<std::vector<int>> another_grid;
for(int i = 0; i < height; i++){
    std::vector<int> row;
    for(int j = 0; j < width; j++){
        row.push_back(rand() % (i + 1) + j);
    }
    another_grid.push_back(row);
}
I can guess that it's less memory-consuming to have a single vector instead of many, but what about a multi-dimensional array of int? Are there real advantages to flattening multi-dimensional data structures?
I can think of multiple reasons to do this, in no particular order and there might be more that I missed:
Slightly less memory use: each vector takes 24 bytes*; if you have 1000 rows, that's 24K more memory. Not that important, but it's there.
Fewer allocations: Again, not very important, but allocations can be slow, and if this is happening for instance in real time and you're allocating buffers for images coming from a camera, having 1 allocation is better than potentially thousands.
Locality: This is the most important one, with a single allocation, all the data is going to be very close to each other, so accessing nearby data will be much faster either because it's already in the cache, or the prefetching hardware can accurately pull the next cache line.
Easier serialization/deserialization: For instance, if this is a texture data, it can be passed to a GPU with a single copy. Same applies for writing to a disk or network, though you may want some compression with those.
The downside is it's less comfortable to write and use, but with a proper class abstracting this away, it's pretty much a must-have if performance matters. It may also be less efficient for certain operations. For instance, with the vector<vector<>> version, you can swap entire rows with a single pointer swap, and the single vector version needs to copy a bunch of data around.
*: This depends on your implementation, but on 64-bit platforms, this is common.

Flip picture horizontally in C++ without a 2D array

I would like to know if there is any way to horizontally flip an image without the use of a 2D array. Something similar to this:
for (int x = 0; x < width/2; x++) {
    for (int y = 0; y < height; y++) {
        int temp = pixelData[x][y];
        pixelData[x][y] = pixelData[width-1-x][y];
        pixelData[width-1-x][y] = temp;
    }
}
You can use a single pointer into the image.
You will need to know the distance (the amount to increment the pointer) from the end of one raster line (scanline) to the start of the next. So after you increment past the end of a raster line, add that distance to the pointer, and you end up at the leftmost column of the next raster line.
With modern systems that have smart GPUs, you may get better efficiency by copying the image to a 2D array, transforming it, then putting it back. Smarter GPUs may have a mirroring function along with bit-blitting.

How to copy elements of a 2D matrix to a 1D array vertically using C++

I have a 2D matrix and I want to copy its values to a 1D array vertically, in an efficient way, as follows.
Matrix (3x3):
[1 2 3;
4 5 6;
7 8 9]
myarray:
{1,4,7,2,5,8,3,6,9}
Brute force takes 0.25 s for a 1000x750x3 image. I don't want to use a vector because I pass myarray to another function (which I didn't write) as input. So, is there a C++ or OpenCV function that I can use? Note that I'm using the OpenCV library.
Copying the matrix to an array is also fine; I can first take the transpose of the Mat, then copy it to the array.
cv::Mat transposed = myMat.t();
uchar* X = transposed.reshape(1,1).ptr<uchar>(0);
or
int* X = transposed.reshape(1,1).ptr<int>(0);
depending on your matrix type. It might copy data though.
You can optimize to make it more cache friendly, i.e. you can copy blockwise, keeping track of the positions in myArray where the data should go. The point is that your brute-force approach will most likely make each access to the matrix miss the cache, which has a tremendous performance impact. Hence it is better to copy vertically/horizontally while taking the cache line size into account.
See the idea below (I didn't test it, so it most likely has bugs, but it should make the idea clear).
struct pixel
{
    char r;
    char g;
    char b;
};
size_t cachelinesize = 128/sizeof(pixel); // assumed cache line size of 128 bytes

array<array<pixel, 1000>, 750> matrice;
vector<pixel> vec(1000*750);
for (size_t row = 0; row < matrice.size(); ++row)
{
    for (size_t col = 0; col < matrice[0].size(); col += cachelinesize)
    {
        for (size_t i = 0; i < cachelinesize && col+i < matrice[0].size(); ++i)
        {
            // vertical (column-major) order: column (col+i), row `row`
            vec[(col+i)*matrice.size() + row] = matrice[row][col+i];
        }
    }
}
If you are using the matrix before the vertical assignment/querying, then you can cache the necessary columns as you touch their elements.
//Multiplies and caches
doCalcButCacheVerticalsByTheWay(myMatrix,calcType,myMatrix2,cachedColumns);
instead of
doCalc(myMatrix,calcType,myMatrix2); //Multiplies
then use it like this:
...
tmpVariable=cachedColumns[i];
...
For example, the function above multiplies the matrix with another one, and when the necessary columns are reached, they are cached into a temporary array so you can access their elements later in contiguous order.
I think Mat::reshape is what you want. It does not copy data.

Shift vector in thrust

I'm looking at a project involving online (streaming) data. I want to work with a sliding window of that data. For example, say that I want to hold 10 values in my vector. When value 11 comes in, I want to drop value 1, shift everything over, and then place value 11 where value 10 was.
The long way would be something like the following:
int n = 9;
thrust::device_vector<float> val;
val.resize(n+1,0);
// Shift left: val has n+1 elements, so shift indices 1..n down by one
for(int i = 0; i != n; i++){
    val[i] = val[i+1];
}
// add the new value to the last position
val[n] = newValue;
Is there a "fast" way to do this with thrust? The project I'm looking at will have around 500 vectors that will need this operation done simultaneously.
Thanks!
As I have said, a ring buffer is what you need. There is no need to shift anything: just one counter and a fixed-size array.
Let's think about how we may deal with 500 ring buffers.
If you want to have 500 (let it be 512) sliding windows and process them all on the GPU, then you might pack them into one big 2D texture, where each column is an array of samples for the same moment.
If you're getting new samples for all of the vectors at once (I mean one new sample for each of the 512 buffers per processing step), then this "ring texture" (like a cylinder) only needs to be updated once per step (upload the array of new samples) and you need just one counter.
I highly recommend using a different, yet still free, library for this problem. In 4 lines of ArrayFire code, you can do all 500 vectors, as follows:
array val = array(window_width, num_vectors);
val = shift(val, 0, 1);
array newValue = array(1,num_vectors);
val(span,end) = newValue;
I benchmarked against Thrust code for the same and ArrayFire is getting about a 10X speedup over Thrust.
Downside is that ArrayFire is not open source, but it is still free for this sort of problem.
What you want is simply thrust::copy. You can't do a shift in place in parallel, because you can't guarantee a value is read before it is written.
int n = 9;
thrust::device_vector<float> val_in(n+1);  // current window of n+1 values
thrust::device_vector<float> val_out(n+1);
// copy elements 1..n of val_in into positions 0..n-1 of val_out
thrust::copy(val_in.begin() + 1, val_in.end(), val_out.begin());
// add the new value to the last position
val_out[n] = newValue;