I'm creating a block-based engine and I was working on infinite loading. I was testing some code, but soon I noticed some trouble. At first I was using a std::unordered_map, storing xz as the key and ChunkContainer* as the value. I store these as pointers (with new) because it's such a big object that it can't all be stored on the stack. Its size is: CHUNK_SIZE(32)^3 * WORLD_HEIGHT(8 chunks in height) * 4(block bytes) = 1048576 bytes. (And that * 225 makes up my world.)
Then I swapped to using a big array instead of the std::unordered_map so I could achieve faster read speed. So I needed to swap the loading code. I came up with this code,
but I'm having some issues with a memory leak created when using this code:
for(int z = 0; z < size; z++){
    temp = loadedChunkContainers[(size-1)*size + z]; //Store the last container in a temp var
    for(int x = size-1; x > 0; x--){
        loadedChunkContainers[x * size + z] = loadedChunkContainers[(x-1) * size + z]; //Move all containers 1 to the right
    }
    int cx = temp.getX() - size;
    int cz = temp.getZ();
    temp.move(cx, cz); //Move the container internally
    loadedChunkContainers[z] = temp; //Put the container back into the array, but this time on the first row
    buildQueue.push_back(&loadedChunkContainers[z]);
}
temp is a global variable because I can't store it locally because it will overflow the stack. I also can't use swap as it will also overflow the stack.
Should I even use this code? It works, but it's quite a slow way of doing it in the first place. Is there another way to get the fastest read access while still being able to swap values (without memory leaks)?
An std::unordered_map will never store its elements "on the stack", so this is not a good reason to store pointers. You can basically forget all of your stack-related worries, here. With that goes away all your pointer troubles. Just store the elements directly and be done with it!
If you have measured the std::unordered_map version of your code and found that the (very fast) hash lookup is prohibitively slow in your application, by all means stick with your flat array, but make it a std::vector so that elements are dynamically allocated [for you]. Then everything I said in the first paragraph still applies. :)
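A minimal sketch of that advice, under some assumptions: the ChunkContainer below is a simplified stand-in for the asker's class (its block data is assumed to live in a std::vector, so the object itself is small and cheap to move), and makeGrid and shiftTowardsPositiveX are hypothetical names. Because the grid is indexed as x * size + z, moving every container one step in x and wrapping the last row to the front is a single std::rotate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the asker's ChunkContainer.
// The block data sits on the heap inside the std::vector member.
struct ChunkContainer {
    static constexpr std::size_t kBlocks = 32u * 32u * 32u * 8u;
    int x = 0, z = 0;
    std::vector<std::uint32_t> blocks;

    ChunkContainer() : blocks(kBlocks) {}
    void move(int nx, int nz) { x = nx; z = nz; /* regenerate blocks here */ }
};

// Build a size x size grid stored directly in one std::vector.
std::vector<ChunkContainer> makeGrid(int size) {
    assert(size > 0);
    std::vector<ChunkContainer> grid(static_cast<std::size_t>(size) * size);
    for (int x = 0; x < size; ++x)
        for (int z = 0; z < size; ++z)
            grid[x * size + z].move(x, z);
    return grid;
}

// Shift every container one step towards +x. The last row wraps around to
// x index 0 and is re-targeted, mirroring the loop in the question.
void shiftTowardsPositiveX(std::vector<ChunkContainer>& grid, int size) {
    // Elements are moved, not copied, so only the small vector handles
    // change hands; no temporary megabyte-sized object is needed.
    std::rotate(grid.begin(), grid.end() - size, grid.end());
    for (int z = 0; z < size; ++z)
        grid[z].move(grid[z].x - size, grid[z].z);
}
```

Pointers such as &grid[z] stay valid as long as the vector is not resized, so they can still be pushed onto a build queue.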
Related
What I Know
I know that arrays int ary[] can be expressed in the equivalent "pointer-to" format: int* ary. However, what I would like to know is that if these two are the same, how physically are arrays stored?
I used to think that the elements are stored next to each other in the ram like so for the array ary:
int size = 5;
int* ary = new int[size];
for (int i = 0; i < size; i++) { ary[i] = i; }
This (I believe) is stored in RAM like: ...[0][1][2][3][4]...
This means we can subsequently replace ary[i] with *(ary + i), simply offsetting the pointer's location by the index.
The Issue
The issue comes in when I am to define a 2D array in the same way:
int width = 2, height = 2;
Vector** array2D = new Vector*[width];
for (int i = 0; i < width; i++) {
    array2D[i] = new Vector[height];
    for (int j = 0; j < height; j++) { array2D[i][j] = Vector(i, j); }
}
Given the class Vector is for me to store both x, and y in a single fundamental unit: (x, y).
So how exactly would the above be stored?
It cannot logically be stored like ...[(0, 0)][(1, 0)][(0, 1)][(1, 1)]... as this would mean that the (1, 0)th element is the same as the (0, 1)th.
It cannot also be stored in a 2d array like below, as the physical RAM is a single 1d array of 8 bit numbers:
...[(0, 0)][(1, 0)]...
...[(0, 1)][(1, 1)]...
Neither can it be stored like ...[&(0, 0)][&(1, 0)][&(0, 1)][&(1, 1)]..., given &(x, y) is a pointer to the location of (x, y). This would just mean each memory location would just point to another one, and the value could not be stored anywhere.
Thank you in advance.
What OP is struggling with is a dynamically allocated array of pointers to dynamically allocated arrays. Each of these allocations is its own block of memory sitting somewhere in storage. There is no connection between them other than the logical connection established by the pointers in the outer array.
To try to visualize this say we make
int ** twodee;
twodee = new int*[4];
for (int i = 0; i < 4; i++)
{
twodee[i] = new int[4];
}
and then
int count = 1;
for (int i = 0; i < 4; i++)
{
for (int j = 0; j < 4; j++)
{
twodee[i][j] = count++;
}
}
so we should wind up with twodee looking something like
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
right?
Logically, yes. But laid out in memory twodee might look something like this batsmurph crazy mess:
You can't really predict where your memory will be. You're at the mercy of whatever memory manager handles the allocations, and of what is already in storage in the spots where it might have been efficient for your memory to go. This makes laying dynamically-allocated multi-dimensional arrays out in your head almost a waste of time.
And there are a whole lot of things wrong with this when you get down into the guts of what a modern CPU can do for you. The CPU has to hop around a lot, and when it's hopping, its ability to predict and preload the cache with memory you're likely to need in the near future is compromised. This means your gigahertz computer has to sit around and wait on your megahertz RAM a lot more than it should have to.
Try to avoid this whenever possible by allocating single, contiguous blocks of memory. You may pick up a bit of extra code mapping one dimensional memory over to other dimensions, but you don't lose any CPU time. C++ will have generated all of that mapping math for you as soon as you compiled [i][j] anyway.
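As a rough illustration of the single-block approach, here is a sketch of a small wrapper that maps two indices onto one contiguous allocation. Flat2D and fillSequential are hypothetical names introduced for this example, and row-major order is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols grid of ints in one contiguous block.
class Flat2D {
public:
    Flat2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // Row-major mapping: element (r, c) lives at offset r * cols + c.
    int& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<int> data_;
};

// Fill the grid with 1, 2, 3, ... in row order, like the twodee example.
inline void fillSequential(Flat2D& grid) {
    int count = 1;
    for (std::size_t r = 0; r < grid.rows(); ++r)
        for (std::size_t c = 0; c < grid.cols(); ++c)
            grid.at(r, c) = count++;
}
```

With this layout the last element of one row is immediately followed in memory by the first element of the next row, which is exactly what the separately allocated rows cannot guarantee.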
The short answer to your question is: It is compiler dependent.
A more helpful answer (I hope) is that you can create 2D arrays that are laid out directly in memory, or you can create "2D arrays" that are actually 1D arrays, some with data, some with pointers to arrays.
There is a convention that the compiler is happy to generate the right kind of code to dereference and/or calculate the address of an element within an array when you use brackets to access an element in the array.
Generally, arrays that are known to be 2D at compile time (e.g. int array2D[a][b]) will be laid out in memory without extra pointers, and the compiler knows to multiply AND add to get an address each time there is an access. If your compiler isn't good at optimizing out the multiply, it makes repeated accesses much slower than they can be, so in the old days we often did pointer math ourselves to avoid the multiply if possible.
There is the issue that a compiler might optimize by rounding the lower dimension size up to a power of two, so a shift can be used instead of multiply, which would then require padding the locations (then even though they are all in one memory block, there are meaningless holes).
(Also, I'm pretty sure I've run into the problem that within a procedure, it needs to know which kind of 2D array it really is, so you may need to declare parameters in a way that lets the compiler know how to code the procedure, e.g. a[][] is different from *a[].) And obviously you can actually get the pointer from the array of pointers, if that is what you want, which isn't the same thing as the array it points to, of course.
In your code, you have clearly declared a full set of the lower dimension 1D arrays (inside the loop), and you have ALSO declared another 1D array of pointers you use to get to each one without a multiply, using a dereference instead. So all those things will be in memory. Each 1D array will surely be laid out sequentially in a contiguous block of memory. It is just that it is entirely up to the memory manager as to where those 1D arrays are, relative to each other. (I doubt a compiler is smart enough to actually do the "new" ops at compile time, but it is theoretically possible, and would obviously affect/control the behavior if it did.)
Using the extra array of pointers clearly avoids the multiply ever and always. But it takes more space, and for sequential access actually makes the accesses slower and bigger (the extra dereference) versus maintaining a single pointer and one dereference.
Even if the 1D arrays DO end up contiguous sometimes, you might break it with another thread using the same memory manager, running a "new" while your "new" inside the loop is repeating.
I was poking around with multidimensional arrays today, and I came across a blog which distinguishes rectangular arrays from jagged arrays. Usually I would do this for both jagged and rectangular arrays:
Obj** obj = new Obj*[5];
for (int i = 0; i < 5; ++i)
{
obj[i] = new Obj[10];
}
But that blog said that if I knew the 2D array was rectangular, then I'm better off allocating the entire thing as a 1D array and using an improvised way of accessing the elements, something like this:
Obj* obj = new Obj[rows * cols];
obj[x * cols + y];
//which would have been obj[x][y] in the previous implementation
I have a vague idea that allocating a contiguous memory chunk would be good, but I don't really understand how big a difference it would make. Can somebody explain?
First and less important, when you allocate and free your object you only need to do a single allocation/deallocation.
More important: when you use the array you basically get to trade a multiplication against a memory access. On modern computers, memory access is much much much slower than arithmetic.
That's a bit of a lie, because much of the slowness of memory accesses gets hidden by caches -- regions of memory that are being accessed frequently get stored in fast memory inside, or very near to, the CPU and can be accessed faster. But these caches are of limited size, so (1) if your array isn't being used all the time then the row pointers may not be in the cache and (2) if it is being used all the time then they may be taking up space that could otherwise be used by something else.
Exactly how it works out will depend on the details of your code, though. In many cases it will make no discernible difference one way or the other to the speed of your program. You could try it both ways and benchmark.
[EDITED to add, after being reminded of it by Peter Schneider's comment:] Also, if you allocate each row separately they may end up all being in different parts of memory, which may make your caches a bit less effective -- data gets pulled into cache in chunks, and if you often go from the end of one row to the start of the next then you'll benefit from that. But this is a subtle one; in some cases having your rows equally spaced in memory may actually make the cache perform worse, and if you allocate several rows in succession they may well end up (almost) next to one another in memory anyway, and in any case it probably doesn't matter much unless your rows are quite short.
Allocating a 2D array as one big chunk permits the compiler to generate more efficient code than allocating it in multiple chunks. At the very least, the one-chunk approach needs only one pointer dereference per access rather than two. BTW, declaring the 2D array like this:
Obj obj[rows][cols];
obj[x][y];
is equivalent to:
Obj* obj = new Obj[rows * cols];
obj[x * cols + y];
in terms of speed. But the first one is not dynamic (you need to specify the values of "rows" and "cols" at compile time).
By having one large contiguous chunk of memory, you may get improved performance because there is more chance that memory accesses are already in the cache. This idea is called cache locality. We say the large array has better cache locality. Modern processors have several levels of cache. The lowest level is generally the smallest and the fastest.
It still pays to access the array in meaningful ways. For example, if data is stored in row-major order and you access it in column-major order, you are scattering your memory accesses. At certain sizes, this access pattern will negate the advantages of caching.
Having good cache performance is far preferable to any concerns you may have about multiplying values for indexing.
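To make the access-order point concrete, here is a small sketch with two hypothetical functions, sumRowMajor and sumColumnMajor, that add up the same flat, row-major matrix. They return the same value; only the order in which memory is touched differs, and how much that matters in practice depends on the matrix size and the hardware, so it is worth benchmarking rather than assuming.

```cpp
#include <cstddef>
#include <vector>

// The matrix is assumed to be stored row-major: (r, c) -> r * cols + c.

// Inner loop walks consecutive addresses, which suits the cache.
long long sumRowMajor(const std::vector<int>& m,
                      std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Inner loop jumps ahead by cols elements on every step, so for large
// matrices each access may land on a different cache line.
long long sumColumnMajor(const std::vector<int>& m,
                         std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```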
If one of the dimensions of your array is a compile time constant you can allocate a "truly 2-dimensional array" in one chunk dynamically as well and then index it the usual way. Like all dynamic allocations of arrays, new returns a pointer to the element type. In this case of a 2-dimensional array the elements are in turn arrays -- 1-dimensional arrays. The syntax of the resulting element pointer is a bit cumbersome, mostly because the dereferencing operator*() has a lower precedence than the indexing operator[](). One possible allocation statement could be int (*arr7x11)[11] = new int[7][11];.
Below is a complete example. As you can see, the first (leftmost) dimension in the allocation can be a run-time value; it determines the number of elements in the allocated array. The other dimensions determine the element type (and hence element size as well as overall size) of the dynamically allocated array, which of course must be known at compile time to perform the allocation. As discussed above, the elements are themselves arrays, here 1-dimensional arrays of 11 ints.
#include <cstdio>
using namespace std;
int main(int argc, char **argv)
{
    constexpr int cols = 11;
    int rows = 7;
    // overwrite with cmd line arg if present.
    // if sscanf fails, default is retained.
    if(argc >= 2) { sscanf(argv[1], "%d", &rows); }
    // The actual allocation of "rows" elements of
    // type "array of 'cols' ints". Note the brackets
    // around *arr7x11 in order to force operator
    // evaluation order. arr7x11 is a pointer to array,
    // not an array of pointers.
    int (*arr7x11)[cols] = new int[rows][cols];
    for(int row = 0; row<rows; row++)
    {
        for(int col = 0; col<cols; col++)
        {
            arr7x11[row][col] = (row+1)*1000 + col+1;
        }
    }
    for(int row = 0; row<rows; row++)
    {
        for(int col = 0; col<cols; col++)
        {
            printf("%6d", arr7x11[row][col]);
        }
        putchar('\n');
    }
    // release the single block allocated above
    delete[] arr7x11;
    return 0;
}
A sample session:
g++ -std=c++14 -Wall -o 2darrdecl 2darrdecl.cpp && ./2darrdecl 3
1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011
I have got a class that represents a 2D map with size 40x40.
I read some data from sensors and build this map by marking cells when my sensors find something, setting a value for the probability of finding an obstacle. For example, when I find an obstacle in cell [52,22], I add, say, 10 to its value and 5 to the values of the surrounding cells.
So each cell of this map should hold some small value (probably not bigger). When a cell is marked three times by the sensor, its value will be 30 and the surrounding cells will have 15.
And my question is: is it worth using a plain array, or is it better to use a vector, even though I don't sort these cells, don't remove them, etc.? I just set a value and read it later.
Update:
Actually I have in my header file:
using cell = uint8_t;
class Grid {
private:
int xSize, ySize;
cell *cells;
public:
//some methods
};
In cpp :
using cell = uint8_t;
Grid::Grid(int xSize, int ySize) : xSize(xSize), ySize(ySize) {
    cells = new cell[xSize * ySize];
    for (int i = 0; i < xSize; i++) {
        for (int j = 0; j < ySize; j++)
            cells[i + j * xSize] = 0; // index with the loop variables
    }
}
Grid::~Grid(void) {
    delete[] cells; // memory from new[] must be released with delete[]
}
inline cell* Grid::getCell(int x, int y) const{
return &cells[x + y * xSize];
}
Does it look fine?
I'd use std::array rather than std::vector.
For fixed size arrays you get the benefits of STL containers with the performance of 'naked' arrays.
http://en.cppreference.com/w/cpp/container/array
A static (C-style) array is possible in your case since the size is known at compile-time.
BUT. It may be interesting to have the data on the heap instead of the stack.
If the array is a global variable, it's ugly and bug-prone (avoid that when you can).
If the array is a local variable (let say, in your main() function), then a stack overflow may occur. Well, it's very unlikely for a 40*40 array of tiny things, but I'd prefer have my data on the heap, to keep things safe, clean, and future-proof.
So, IMHO you should definitely go for the vector, it's fast, clean and readable, and you don't have to worry about stack overflow, memory allocation, etc.
About your data. If you know your values can be stored in a single byte, go for it!
A uint8_t (typically the same type as unsigned char) can store values from 0 to 255. If it's enough, use it.
using cell = uint8_t; // define a nice name for your data type
size_t size = 40;
std::vector<cell> myMap(size*size); // size*size cells, all initialized to 0
side note: don't use new[]. Well, you can, but it has no advantages over a vector. You will probably only gain headaches handling memory manually.
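For illustration, here is a sketch of how the asker's Grid might look with a std::vector owning the storage. The member names at and mark are hypothetical, and the saturation at 255 in mark is an added assumption (not something the question asks for) to show one way of keeping repeated sensor hits from wrapping a uint8_t around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using cell = std::uint8_t;

// Sketch of the Grid class with std::vector managing the memory, so no
// hand-written destructor or delete[] is needed.
class Grid {
public:
    Grid(int xSize, int ySize)
        : xSize_(xSize), ySize_(ySize),
          cells_(static_cast<std::size_t>(xSize) * ySize, 0) {}

    // Same x + y * xSize mapping as in the question.
    cell& at(int x, int y) {
        assert(x >= 0 && x < xSize_ && y >= 0 && y < ySize_);
        return cells_[static_cast<std::size_t>(y) * xSize_ + x];
    }

    // Add a non-negative amount to a cell, clamping at 255 (assumption).
    void mark(int x, int y, int amount) {
        assert(amount >= 0);
        const int value = at(x, y) + amount;
        at(x, y) = static_cast<cell>(value > 255 ? 255 : value);
    }

private:
    int xSize_, ySize_;
    std::vector<cell> cells_;
};
```

Copying, moving and destroying a Grid then work correctly with the compiler-generated defaults.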
Some advantages of using a std::vector are that it is dynamically allocated (flexible size, can be resized during execution, etc.) and can be passed to or returned from a function. Since you have a fixed size of 40x40 and you know you have one int element in every cell, I don't think it matters that much in your case, and I would NOT suggest using a std::vector class object for this simple task.
And here is a possible duplicate.
Can anyone help with the general format for flattening a 3D array using MPI? I think I can get the array 1 dimensional just by using (i+xlength*j+xlength*ylength*k), but then I have trouble using equations that reference particular cells of the array.
I tried chunking the code into chunks based on how many processors I had, but then when I needed a value that another processor had, I had a hard time. Is there a way to make this easier (and more efficient) using ghost cells or pointer juggling?
You have two options at least. The simpler one is to declare a preprocessor macro that hides the complexity of the index calculation, e.g.:
#define ARR(A,i,j,k) A[(i)*ylength*zlength+(j)*zlength+(k)]
ARR(myarray,i,j,k) = ARR(myarray,i+1,j,k) + ARR(myarray,i,j+1,k) + ...
This is clumsy since the macro will only work with arrays of fixed leading dimensions, e.g. whatever x ylength x zlength.
A much better way to do it is to use so-called dope vectors. Dope vectors are basically indices into the big array. You allocate one big flat chunk of size xlength * ylength * zlength to hold the actual data and then create an index vector (actually a tree in the 3D case). In your case the index has two levels:
top level, consisting of xlength pointers to the
second level, consisting of xlength arrays of pointers, each containing ylength pointers to the beginning of a block of zlength elements in memory.
Let's call the top level pointer array A. Then A[i] is a pointer to a pointer array that describes the i-th slab of data. A[i][j] is the j-th element of the i-th pointer array, which points to data[i][j][0] (if data was a 3D array). Construction of the dope vector works similar to this:
double *data = new double[xlength*ylength*zlength];
double ***A;
A = new double**[xlength];
for (int i = 0; i < xlength; i++)
{
A[i] = new double*[ylength];
for (int j = 0; j < ylength; j++)
A[i][j] = data + i*ylength*zlength + j*zlength;
}
Dope vectors are as easy to use as normal arrays with some special considerations. For example, A[i][j][k] will give you access to the desired element of data. One caveat of dope vectors is that the top level consist of pointers to other pointer tables and not of pointers to the data itself, hence A cannot be used as shortcut for &A[0][0][0], nor A[i] used as shortcut for &A[i][0][0]. Still A[i][j] is equivalent to &A[i][j][0]. Another caveat is that this form of array indexing is slower than normal 3D array indexing since it involves pointer chasing.
Some people tend to allocate a single storage block for both data and dope vectors. They simply place the index at the beginning of the allocated block and the actual data goes after that. The advantage of this method is that disposing of the array is as simple as deleting the whole memory block, while disposing of dope vectors created with the code from the previous section requires multiple invocations of the delete[] operator.
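The multi-step cleanup mentioned above can be sketched as follows. Dope3D, makeDope3D and destroyDope3D are hypothetical names introduced for this example; the construction follows the same pattern as the code earlier in this answer, with the data block zero-initialized as an extra assumption.

```cpp
#include <cstddef>

// Hypothetical bundle of the flat data block and its two-level index.
struct Dope3D {
    double*     data;   // xlength * ylength * zlength elements
    double***   A;      // top-level table of xlength pointer tables
    std::size_t xlength, ylength, zlength;
};

Dope3D makeDope3D(std::size_t xl, std::size_t yl, std::size_t zl) {
    Dope3D d{new double[xl * yl * zl](), new double**[xl], xl, yl, zl};
    for (std::size_t i = 0; i < xl; ++i) {
        d.A[i] = new double*[yl];
        for (std::size_t j = 0; j < yl; ++j)
            d.A[i][j] = d.data + i * yl * zl + j * zl;
    }
    return d;
}

// Every new[] above needs a matching delete[]: one per second-level
// table, one for the top-level table, one for the data block.
void destroyDope3D(Dope3D& d) {
    for (std::size_t i = 0; i < d.xlength; ++i)
        delete[] d.A[i];
    delete[] d.A;
    delete[] d.data;
    d.A = nullptr;
    d.data = nullptr;
}
```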
I need to allocate memory for a very large array which represents triangular matrix.
I wrote the following code:
const int max_number_of_particles = 20000;
float ***dis_vec;
dis_vec = new float **[max_number_of_particles];
for (int i = 0; i < max_number_of_particles; i++)
    dis_vec[i] = new float *[i];
for (int i = 0; i < max_number_of_particles; i++)
    for (int j = 0; j < i; j++)
        dis_vec[i][j] = new float[2];
The problem is that the time needed to do it (to allocate the memory) quickly increases with the increasing size of matrix. Does anyone know better solution for this problem?
Thanks.
Allocate a one-dimensional array and convert indices to subscripts and vice versa. One allocation compared to O(N^2) allocations should be much faster.
EDIT
Specifically, just allocate N(N+1)/2 elements, and when you want to access [r][c] in the original, just access [r*(r+1)/2 + c] instead.
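A minimal sketch of that packed layout, assuming a lower-triangular matrix that includes the diagonal (so row r holds r + 1 elements). TriangularMatrix is a hypothetical name, and the element type is a single float here; storing two floats per entry, as in the question, would only change the element type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical packed lower-triangular matrix, diagonal included.
class TriangularMatrix {
public:
    explicit TriangularMatrix(std::size_t n)
        : n_(n), data_(n * (n + 1) / 2) {}  // one allocation for everything

    // Rows 0..r-1 hold 1 + 2 + ... + r = r * (r + 1) / 2 elements,
    // so row r starts at that offset and column c is added on top.
    float& at(std::size_t r, std::size_t c) {
        assert(r < n_ && c <= r);
        return data_[r * (r + 1) / 2 + c];
    }

    std::size_t storedElements() const { return data_.size(); }

private:
    std::size_t n_;
    std::vector<float> data_;
};
```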
Yes.
First... start with your inner loop.
"new float[2]"
That allocates an array, which I imagine is slower to allocate than a fixed size object that happens to have 2 floats.
struct Float2D {
float a;
float b;
};
x = new Float2D;
that seems better.
But really, forget all that. If you want it fast... just malloc a bunch of floats.
I'd say... let some floats go to waste. Just alloc a plain old 2D array.
float* f = (float*)malloc( max_number_of_particles*max_number_of_particles*2*sizeof(float) );
The only size saving you could get over this, is a 2x size saving by using a triangle instead of a square.
However, I'm pretty damn sure you KILLED that entire "size saving" already, by using "new float[2]", and "new float *[i];". I'm not sure how much the overhead of "new" is, but I imagine it's like malloc except worse. And I think most mallocs have about 8 bytes overhead per allocation.
So what you have already is WORSE than a 2X size lost by allocating a square.
Also, it makes the math simpler. With a triangle, you'd need to do some weird "triangular number" math to get the pointer. Something like (n+1)*n/2 or whatever it is :)