Flattening a 3D array in c++ for use with MPI - c++

Can anyone help with the general format for flattening a 3D array using MPI? I think I can get the array 1 dimensional just by using (i+xlength*j+xlength*ylength*k), but then I have trouble using equations that reference particular cells of the array.
I tried chunking the code into chunks based on how many processors I had, but then when I needed a value that another processor had, I had a hard time. Is there a way to make this easier (and more efficient) using ghost cells or pointer juggling?

You have two options at least. The simpler one is to declare a preprocessor macro that hides the complexity of the index calculation, e.g.:
#define ARR(A,i,j,k) A[(i)*ylength*zlength+(j)*zlength+(k)]
ARR(myarray,i,j,k) = ARR(myarray,i+1,j,k) + ARR(myarray,i,j+1,k) + ...
This is clumsy since the macro will only work with arrays of fixed leading dimensions, e.g. whatever x ylength x zlength.
Much better way to do it is to use so-called dope vectors. Dope vectors are basically indices into the big array. You allocate one big flat chunk of size xlength * ylength * zlength to hold the actual data and then create an index vector (actually a tree in the 3D case). In your case the index has two levels:
top level, consisting of xlength pointers to the
second level, consisting of xlength arrays of pointers, each containing ylength pointers to the beginning of a block of zlength elements in memory.
Let's call the top level pointer array A. Then A[i] is a pointer to a pointer array that describes the i-th slab of data. A[i][j] is the j-th element of the i-th pointer array, which points to data[i][j][0] (if data was a 3D array). Construction of the dope vector works similar to this:
double *data = new double[xlength*ylength*zlength];
double ***A;
A = new double**[xlength];
for (int i = 0; i < xlength; i++)
{
A[i] = new double*[ylength];
for (int j = 0; j < ylength; j++)
A[i][j] = data + i*ylength*zlength + j*zlength;
}
Dope vectors are as easy to use as normal arrays with some special considerations. For example, A[i][j][k] will give you access to the desired element of data. One caveat of dope vectors is that the top level consist of pointers to other pointer tables and not of pointers to the data itself, hence A cannot be used as shortcut for &A[0][0][0], nor A[i] used as shortcut for &A[i][0][0]. Still A[i][j] is equivalent to &A[i][j][0]. Another caveat is that this form of array indexing is slower than normal 3D array indexing since it involves pointer chasing.
Some people tend to allocate a single storage block for both data and dope vectors. They simply place the index at the beginning of the allocated block and the actual data goes after that. The advantage of this method is that disposing the array is as simple as deleting the whole memory block, while disposing dope vectors, created with the code from the previous section, requires multiple invocations of the free operator.

Related

How Physically are Arrays Stored (Specifically with dimensions greater than 2)?

What I Know
I know that arrays int ary[] can be expressed in the equivalent "pointer-to" format: int* ary. However, what I would like to know is that if these two are the same, how physically are arrays stored?
I used to think that the elements are stored next to each other in the ram like so for the array ary:
int size = 5;
int* ary = new int[size];
for (int i = 0; i < size; i++) { ary[i] = i; }
This (I believe) is stored in RAM like: ...[0][1][2][3][4]...
This means we can subsequently replace ary[i] with *(ary + i) by just increment the pointers' location by the index.
The Issue
The issue comes in when I am to define a 2D array in the same way:
int width = 2, height = 2;
Vector** array2D = new Vector*[height]
for (int i = 0; i < width; i++) {
array2D[i] = new Vector[height];
for (int j = 0; j < height; j++) { array2D[i][j] = (i, j); }
}
Given the class Vector is for me to store both x, and y in a single fundamental unit: (x, y).
So how exactly would the above be stored?
It cannot logically be stored like ...[(0, 0)][(1, 0)][(0, 1)][(1, 1)]... as this would mean that the (1, 0)th element is the same as the (0, 1)th.
It cannot also be stored in a 2d array like below, as the physical RAM is a single 1d array of 8 bit numbers:
...[(0, 0)][(1, 0)]...
...[(0, 1)][(1, 1)]...
Neither can it be stored like ...[&(0, 0)][&(1, 0)][&(0, 1)][&(1, 1)]..., given &(x, y) is a pointer to the location of (x, y). This would just mean each memory location would just point to another one, and the value could not be stored anywhere.
Thank you in advanced.
What OP is struggling with a dynamically allocated array of pointers to dynamically allocated arrays. Each of these allocations is its own block of memory sitting somewhere in storage. There is no connection between them other than the logical connection established by the pointers in the outer array.
To try to visualize this say we make
int ** twodee;
twodee = new int*[4];
for (int i = 0; i < 4; i++)
{
twodee[i] = new int[4];
}
and then
int count = 1;
for (int i = 0; i < 4; i++)
{
for (int j = 0; j < 4; j++)
{
twodee[i][j] = count++;
}
}
so we should wind up with twodee looking something like
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
right?
Logically, yes. But laid out in memory twodee might look something like this batsmurph crazy mess:
You can't really predict where your memory will be, you're at the mercy of the whatever memory manager handles the allocations and what already in storage where it might have been efficient for your memory to go. This makes laying dynamically-allocated multi-dimensional arrays out in your head almost a waste of time.
And there are a whole lot of things wrong with this when you get down into the guts of what a modern CPU can do for you. The CPU has to hop around a lot, and when it's hopping, it's ability to predict and preload the cache with memory you're likely to need in the near future is compromised. This means your gigahertz computer has to sit around and wait on your megahertz RAM a lot more than it should have to.
Try to avoid this whenever possible by allocating single, contiguous blocks of memory. You may pick up a bit of extra code mapping one dimensional memory over to other dimensions, but you don't lose any CPU time. C++ will have generated all of that mapping math for you as soon as you compiled [i][j] anyway.
The short answer to your question is: It is compiler dependent.
A more helpful answer (I hope) is that you can create 2D arrays that are layed out directly in memory, or you can create "2D arrays" that are actually 1D arrays, some with data, some with pointers to arrays.
There is a convention that the compiler is happy to generate the right kind of code to dereference and/or calculate the address of an element within an array when you use brackets to access an element in the array.
Generally arrays that are known to be 2D at compile time (eg int array2D[a][b]) will be layed out in memory without extra pointers and the compiler knows to multiply AND add to get an address each time there is an access. If your compiler isn't good at optimizing out the multiply, it makes repeated accesses much slower than they can be, so in the old days we often did pointer math ourselves to avoid the multiply if possible.
There is the issue that a compiler might optimize by rounding the lower dimension size up to a power of two, so a shift can be used instead of multiply, which would then require padding the locations (then even though they are all in one memory block, there are meaningless holes).
(Also, I'm pretty sure I've run into the problem that within a procedure, it needs to know which way the 2D array really is, so you may need to declare parameters in a way that lets the compiler know how to code the procedure, eg a[][] is different from *a[]). And obviously you can actually get the pointer from the array of pointers, if that is what you want--which isn't the same thing as the array it points too, of course.
In your code, you have clearly declared a full set of the lower dimension 1D arrays (inside the loop), and you have ALSO declared another 1D array of pointers you use to get to each one without a mulitply--instead by a dereference. So all those things will be in memory. Each 1D array will surely be sequentially layed out in a contiguous block of memory. It is just that it is entirely up to the memory manager as to where those 1D arrays are, relative to each other. (I doubt a compiler is smart enough to actually do the "new" ops at compile time, but it is theoretically possible, and would obviously affect/control the behavior if it did.)
Using the extra array of pointers clearly avoids the multiply ever and always. But it takes more space, and for sequential access actually makes the accesses slower and bigger (the extra dereference) versus maintaining a single pointer and one dereference.
Even if the 1D arrays DO end up contiguous sometimes, you might break it with another thread using the same memory manager, running a "new" while your "new" inside the loop is repeating.

trim array to elements between i and j

A classic, I'm looking for optimisation here : I have an array of things, and after some processing I know I'm only interested in elements i to j. How to trim my array in the fatset, lightest way, with complete deletions/freeing of memory of elements before i and after j ?
I'm doing mebedded C++, so I may not be able to compile all sorts of library let's say. But std or vector things welcome in a first phase !
I've tried, for array A to be trimmed between i and j, with variable numElms telling me the number of elements in A :
A = &A[i];
numElms = i-j+1;
As it is this yields an incompatibility error. Can that be fixed, and even when fixed, does that free the memory at all for now-unused elements?
A little context : This array is the central data set of my module, and it can be heavy. It will live as long as the module lives. And there's no need to carry dead weight all this time. This is the very first thing that is done - figuring which segment of the data set has to be at all analyzed, and trimming and dumping the rest forever, never to use it again (until the next cycle where we get a fresh array with possibily a compeltely different size).
When asking questions about speed your millage may very based on the size of the array you're working with, but:
Your fastest way will be to not trim the array, just use A[index + i] to find the elements you want.
The lightest way to do this would be to:
Allocate a dynamic array with malloc
Once i and j are found copy that range to the head of the dynamic array
Use realloc to resize the dynamic array to the size j - i + 1
However you have this tagged as C++ not C, so I believe that you're also interested in readability and the required programming investment, not raw speed or weight. If this is true then I would suggest use of a vector or deque.
Given vector<thing> A or a deque<thing> A you could do:
A.erase(cbegin(A), next(cbegin(A), i));
A.resize(j - i + 1);
There is no way to change aloocated memory block size in standard C++ (unless you have POD data — in this case C facilities like realloc could be used). The only way to trim an array is to allocate new array. copy/move needed elements and destroy old array.
You can do it manually, or using vectors:
int* array = new int[10]{0,1,2,3,4,5,6,7,8,9};
std::vector<int> vec {0,1,2,3,4,5,6,7,8,9};
//We want only elements 3-5
{
int* new_array = new int[3];
std::copy(array + 3, array + 6, new_array);
delete[] array;
array = new_array;
}
vec = std::vector<int>(vec.begin()+3, vec.begin()+6);
If you are using C++11, both approaches should have same perfomance.
If you only want to remove extra elements and do not really want to release memory (for example you might want to add more elements later) you can follow NathanOliver link
However, you should consider: do you really need that memory freed immideately? Do you need to move elements right now? Will you array live for such long time that this memory would be lost for your program completely? Maybe you need a range or perharps a view to the array content? In many cases you can store two pointers (or pointer and size) to denote your "new" array, while keeping old one to be released all at once.

2D arrays with contiguous rows on the heap memory for cudaMemCpy2D()

CUDA documentation recommends the use of cudaMemCpy2D() for 2D arrays (and similarly cudaMemCpy3D() for 3D arrays) instead of cudaMemCpy() for better performance as the former allocates device memory more appropriately. On the other hand, all cudaMemCpy functions, just like memcpy(), require contiguous allocation of memory.
This is all fine if I create my (host) array as, for example, float myArray[h][w];. However, it most likely will not work if I use something like:
float** myArray2 = new float*[h];
for( int i = 0 ; i < h ; i++ ){
myArray2[i] = new float[w];
}
This is not a big problem except when one is trying to implement CUDA into an existing project, which is the problem I am facing. Right now, I create a temporary 1D array, copy the contents of my 2D array into it and use cudaMemCpy() and repeat the whole process to get the results after the kernel launch, but this does not seem an elegant/efficient way.
Is there a better way to handle this situation? Specifically, is there a way to create a genuine 2D array on the heap with contiguously allocated rows so that I can use cudaMemCpy2D()?
P.S: I couldn't find the answer to this question the following previous similar posts:
Allocate 2D array with cudaMallocPitch and copying with
cudaMemcpy2D
Assigning memory for contiguous 2D array
Dynamic 2d Array non contiguous memory c++ (The second answer in
this one is rather puzzling.)
Allocate the big array, then use pointer arithmetic to find the actual beginnings of the rows.
float* bigArray = new float[h * w]
float** myArray2 = new float*[h]
for( int i = 0 ; i < h ; i++ ){
myArray2[i] = &bigArray[i * w];
}
Your myArray2 array of pointers gives you C/C++ style two dimensional arrays behavior, bigArray gives you the contiguous block of memory needed by CUDA.

C++ array 'matrix' swap (memory leak)

I'm creating a block based engine and I was working on infinite loading. I was testing some code, but soon I noticed some troubles. At first I was using an std::unordered_map, storing xz as key and ChunkContainer* as value. I store this as pointers (with new) because it's such a big object it can't be all stored on the stack. It's size is: CHUNK_SIZE(32)^3 * WORLD_HEIGHT(8 amount of chunks in height) * 4(block bytes) = 1048576 bytes. (And that * 225 makes up my world.)
Then I swapped to using a big array instead of the std::unordered_map so I could achieve faster read speed. So I needed to swap the loading code. I came up with this code,
but I'm having some issues with a memory leak created when using this code:
for(int z = 0; z < size; z++){
temp = loadedChunkContainers[(size-1)*size + z]; //Store the last container in a temp var
for(int x = size-1; x > 0; x--){
loadedChunkContainers[x * size + z] = loadedChunkContainers[(x-1) * size + z]; //Move all containers 1 to the right
}
int cx = temp.getX() - size;
int cz = temp.getZ();
temp.move(cx, cz);//Move the container internally
loadedChunkContainers[z] = temp; //put the container back into the array, but this time on the first row
buildQueue.push_back(&loadedChunkContainers[z]);
}
temp is a global variable because I can't store it locally because it will overflow the stack. I also can't use swap as it will also overflow the stack.
Should I even use this code? It works, but It's quite a slow way in the first place. Would there be another way to have the fastest read access while still being able to swap values (without memory leaks)?
An std::unordered_map will never store its elements "on the stack", so this is not a good reason to store pointers. You can basically forget all of your stack-related worries, here. With that goes away all your pointer troubles. Just store the elements directly and be done with it!
If you have measured the std::unordered_map version of your code and found that the (very fast) hash lookup is prohibitively slow in your application, by all means stick with your sparse array, but make it a std::vector so that elements are dynamically allocated [for you]. Then everything I said in the first paragraph still applies. :)

Dynamic Memory allocation in 2-D Array

I have to create dynamic 2d array.I was trying with the code mentioned below:
Suppose a=512 and b =102.So Now the 2-D Array created is ary[512][102].
Now I am Creating a pointer to base location i.e int *ptr=&(ary[0][0]);
Now, if I am giving pointer the offset 102 i.e ptr+=102, it should point to &(ary[1][0]) but it doesn't point to.
If given offset of 104 then only it points to &(ary[1][0]). Why it requires extra 2 Offset????
Code Snippet:
int** ary;
ary= new int*[a];
for(int i = 0; i < a; ++i)
ary[i] = new int[b];
your 2d array is dynamically allocated as your code suggested, so the address of this 2d arrays is not in sequence, thus &(ary[0][0])+102, you can't grantee the pointer returned is the next row. So in this case, you should just use &(ary[1][0])
Unlike in the static allocation of 2D arrays, the addresses are not consecutive for the whole array when you use multiple new. Instead when you call for new() to create rows of a 2D array each new returns completely unrelated addresses which are allocated from heap. So you have to use ary[n] to access elements of (n + 1)th row and ary[n+1] to access elements of (n+2)th row and so on.
In this case, your 2-D array is not necessarily linear in the actual memory. So you can not write a program assumming that the memory is linear.
The reason for 104 pointing to ary[1][0] could be the boundary alignment. The run time system chooses to use 104 bytes for your ary[i], rather than 102.