How to pass dynamic and static 2d arrays as void pointer? - c++

for a project using Tensorflow's C API I have to pass a void pointer (void*) to a method of Tensorflow. In the examples the void* points to a 2d array, which also worked for me. However now I have array dimensions which do not allow me to use the stack, which is why I have to use a dynamic array or a vector.
I managed to create a dynamic array with the same entries like this:
float** normalizedInputs;//
normalizedInputs = new float* [noCellsPatches];
for(int i = 0; i < noCellsPatches; ++i)
{
normalizedInputs[i] = new float[no_input_sizes];
}
for(int i=0;i<noCellsPatches;i++)
{
for(int j=0;j<no_input_sizes;j++)
{
normalizedInputs[i][j]=inVals.at(no_input_sizes*i+j);
////
////
//normalizedInputs[i][j]=(inVals.at(no_input_sizes*i+j)-inputMeanValues.at(j))/inputVarValues.at(j);
}
}
The function call needing the void* looks like this:
TF_Tensor* input_value = TF_NewTensor(TF_FLOAT,in_dims_arr,2,normalizedInputs,num_bytes_in,&Deallocator, 0);
In argument 4 you see the "normalizedInputs" array. When I run my program now, the calculated results are totally wrong. When I go back to the static array they are right again. What do I have to change?
Greets and thanks in advance!
Edit: I also noted that the TF_Tensor* input_value holds totally different values for both cases (for dynamic it has many 0 and nan entries). Is there a way to solve this by using a std::vector<std::vector<float>>?
Respectively: is there any valid way pass a consecutive dynamic 2d data structure to a function as void*?

In argument 4 you see the "normalizedInputs" array. When I run my program now, the calculated results are totally wrong.
The reason this doesn't work is because you are passing the pointers array as data. In this case you would have to use normalizedInputs[0] or the equivalent more explicit expression &normalizedInputs[0][0]. However there is another bigger problem with this code.
Since you are using new inside a loop you won't have contiguous data which TF_NewTensor expects. There are several solutions to this.
If you really need a 2d-array you can get away with two allocations. One for the pointers and one for the data. Then set the pointers into the data array appropriately.
float **normalizedInputs = new float* [noCellsPatches]; // allocate pointers
normalizedInputs[0] = new float [noCellsPatches*no_input_sizes]; // allocate data
// set pointers
for (int i = 1; i < noCellsPatches; ++i) {
normalizedInputs[i] = &normalizedInputs[i-1][no_input_sizes];
}
Then you can use normalizedInputs[i][j] as normal in C++ and the normalizedInputs[0] or &normalizedInputs[0][0] expression for your TF_NewTensor call.
Here is a mechanically simpler solution, just use a flat 1d array.
float * normalizedInputs = new float [noCellsPatches*no_input_sizes];
You access the i,j-th element by normalizedInputs[i*no_input_sizes+j] and you can use it directly in the TF_NewTensor call without worrying about any addresses.

C++ standard does its best to prevent programmers to use raw arrays, specifically multi-dimensional ones.
From your comment, your statically declared array is declared as:
float normalizedInputs[noCellsPatches][no_input_sizes];
If noCellsPatches and no_input_sizes are both compile time constants you have a correct program declaring a true 2D array. If they are not constants, you are declaring a 2D Variable Length Array... which does not exist in C++ standard. Fortunately, gcc allow it as an extension, but not MSVC nor clang.
If you want to declare a dynamic 2D array with non constant rows and columns, and use gcc, you can do that:
int (*arr0)[cols] = (int (*) [cols]) new int [rows*cols];
(the naive int (*arr0)[cols] = new int [rows][cols]; was rejected by my gcc 5.4.0)
It is definitely not correct C++ but is accepted by gcc and does what is expected.
The trick is that we all know that the size of an array of size n in n times the size of one element. A 2D array of rows rows of columnscolumns if then rows times the size of one row, which is columns when measured in underlying elements (here int). So we ask gcc to allocate a 1D array of the size of the 2D array and take enough liberalities with the strict aliasing rule to process it as the 2D array we wanted. As previously said, it violates the strict aliasing rule and use VLA in C++, but gcc accepts it.

Related

How to convert between flat and multidimensional arrays without copying data?

I've got some data structured as a multi-dimensional array, i.e. double[][], and I need to pass it to a function that expects a single linear array of double[] along with dimensional metadata for the multi-dimensional representation.
For example, I might have a 3 x 5 multidimensional array, which I need to pass as a 15-element flat array along with height and width parameters so that the function knows it is a 3x5 array rather than a 5x3 array.
The function will then return a flat array and size metadata, which I need to use to convert the data back into a multidimensional type.
I believe the data layout in memory is exactly the same for both the flat and multi-dimensional representations; the only difference is how the indexing operations are performed. So I'd like to do the "conversion" with typecasting rather than copying the array values.
What's the most correct and readable way to typecast between multidimensional and flat arrays of the same total size?
I actually know what the dimensions of the multi-dimensional array will be at compile time. The array sizes aren't dynamic.
The most correct way has been given by #Maxim Egorushkin and #ypnos: double *flat = &multi[0][0];. And it will work fine with any decent compiler. But unfortunately is not valid C++ code and invokes Undefined Bahaviour.
The problem is that for an array double multi[N][M]; (N and M being compile time contant expressions), &multi[0][0] is the address of the first element of an array of size M. So it is legal to do pointer arithmetics only up to M. See this other question of mine for more details.
What's the most correct and readable way to typecast between multidimensional and flat arrays of the same total size?
The address of the first array element coincides with the address of the array. You can pass around the address of the first element, no casting is necessary.
I would assume the most popular way to do it is:
double *flat = &multi[0][0];
This is how it is done in C, and you do operate with simple C arrays.
You could also have a look at std::array in your use case (dimensions known at compile time), but that one is not multi-dimensional, so if you would cascade it, you would lose the contiguous layout.
You can use cast to a reference to an array. This require to use some fancy C++ type syntax but in return it allows to use all features that work on arrays, like for each loop.
#include <iostream>
using namespace std;
int main()
{
static constexpr size_t x = 5, y = 3;
unsigned multiArray[x][y];
for (size_t i = 0; i != x; ++i)
for (size_t j = 0; j != y; ++j)
multiArray[i][j] = i * j;
static constexpr size_t z = x * y;
unsigned (&singleArray)[z] = (unsigned (&)[z])multiArray[0][0];
for (const unsigned value : singleArray)
cout << value << ' ';
cout << endl;
return 0;
}
Take into account that this and other methods basing on casts work only with real multi-dimensional arrays. If it is an array of arrays (like unsigned **multiArray;), it isn't allocated in a continuous block of memory and a cast cannot bypass that.

Passing a 3-dimensional variable size array by reference in C++

I've been working off of Passing a 2D array to a C++ function , as well as a few other similar articles. However, I'm running into a problem wherein the array I'm creating has two dimensions of variable size.
The initialization looks like:
int** mulePosition;
mulePosition = new int *[boardSize][boardSize][2];
The function looks like:
int moveMule (int boardSize, int ***mulePosition)
And the references look like
moveMule (boardSize, mulePosition)
Boardsize is defined at the beginning of the function, but may change per execution.
The array, properly sized, would be int [boardSize][boardSize][2].
Either use a plain '3-dimensional' array via
int* mulePosition = new int[boardsize*boardsize*2];
and address its elements calculating the offset from the beginning: mulePosition[a][b][c] is mulePosition[boardSize*2*a + 2*b + c],
or use array of arrays of arrays (which would correspond to your int*** declaration) or better (and simpler) vector of vectors of vectors, although the initialization would be a little more complex (you would need to initialize every array/vector).
Either use a std::vector<std::vector<int>> if boardSize is not a const or std::array<std::array<boardSize>, boardSize> (see Multidimensional std::array for how to initialize the std::array).
That being said, it looks like a good idea to hide this in a class Board which provides a nice interface.

CUDA C++: declare a vector with length

I recently found this won't work in my global CUDA C++ code that I plan to compile and later to be called in Matlab:
int M = 10; float V[M];
or if I were to import M value from the matlab host code.
But this works:
float V[10];
I was told there exists a function called new that I can use to avoid this problem, but I read online and am still quite confused how to use this new function, and it seems only to apply to host code, is that right? If so, it won't apply to my case then, since my host code is in matlab. Is this a way to get around this, so that I don't have to change vector lengths one by one? Thank you!
I don't know anything about MATLAB or CUDA, but your problem is in C++. Arrays declared like that must have sizes fixed at compile-time.
Solution 1: Fix the size
Declare your variable M const. These are equivalent:
int const M = 10;
const int M = 10;
The compiler would then know that it can assume these variables will always have the same value no matter how you run the program.
Solution 2: C-style dynamic allocation
Dynamic allocation with new and delete. Arrays allocated on the abstract section of memory called the "free-store" (rather than on the "stack", like those arrays you have) can determine their sizes on the fly. You use it like this:
float * V = new V[M]; //V is a pointer to some freestore memory
//You use it and pass it like you would a normal array:
V[2] = 5.5;
int x = some_func(V);
//But after you're done, you should manually free the memory
delete [] V; //don't forget the [], since you used [] in the allocation
I don't recommend this, because of the possiblity of forgetting to delete the memory.
Solution 3: Automatic memory management with C++'s vector
In C++, the work of memory management can be hidden behind structures called classes.
#include<vector>
using std::vector;
vector<float> V(M); //V is a vector of floats, with initial size M
//You use it like a normal array
V[2] = 5.5;
//But to pass it as an array, you need to pass a pointer to the first element
int x = some_func(&V[0]); //read as &(V[0]): pass in the address of V[0]
Solution 3b: CUDA-compatible vector
Thrust is a C++ template library for CUDA based on the Standard Template Library (STL). Thrust allows you to implement high performance parallel applications with minimal programming effort through a high-level interface that is fully interoperable with CUDA C.
http://docs.nvidia.com/cuda/thrust/#vectors
Conclusion
If you're using fixed sizes, I recommend solution 1. If you're using sizes determined during runtime, I recommend vector.
(By the way, when you pass an ordinary array to a function, you are actually passing a pointer to the first element, NOT the array. The name of the array is automatically converted to a pointer type.)

Incrementally dynamic allocation of memory in C/C++

I have a for-loop that needs to incrementally add columns to a matrix. The size of the rows is known before entering the for-loop, but the size of the columns varies depending on some condition. Following code illustrates the situation:
N = getFeatureVectorSize();
float **fmat; // N rows, dynamic number of cols
for(size_t i = 0; i < getNoObjects(); i++)
{
if(Object[i] == TARGET_OBJECT)
{
float *fv = new float[N];
getObjectFeatureVector(fv);
// How to add fv to fmat?
}
}
Edit 1 This is how I temporary solved my problem:
N = getFeatureVectorSize();
float *fv = new float[N];
float *fmat = NULL;
int col_counter = 0;
for(size_t i = 0; i < getNoObjects(); i++)
{
if(Object[i] == TARGET_OBJECT)
{
getObjectFeatureVector(fv);
fmat = (float *) realloc(fmat, (col_counter+1)*N*sizeof(float));
for(int r=0; r<N; r++) fmat[col_counter*N+r] = fv[r];
col_counter++;
}
}
delete [] fv;
free(fmat);
However, I'm still looking for a way to incrementally allocate memory of a two-dimensional array in C/C++.
To answer your original question
// How to add fv to fmat?
When you use float **fmat you are declaring a pointer to [an array of] pointers. Therefore you have to allocate (and free!) that array before you can use it. Think of it as the row pointer holder:
float **fmat = new float*[N];
Then in your loop you simply do
fmat[i] = fv;
However I suggest you look at the std::vector approach since it won't be significantly slower and will spare you from all those new and delete.
better - use boost::MultiArray as in the top answer here :
How do I best handle dynamic multi-dimensional arrays in C/C++?
trying to dynamically allocate your own matrix type is pain you do not need.
Alternatively - as a low-tech, quick and dirty solution, use a vector of vectors, like this :
C++ vector of vectors
If you want to do this without fancy data structures, you should declare fmat as an array of size N of pointers. For each column, you'll probably have to just guess at a reasonable size to start with. Dynamically allocate an array of that size of floats, and set the appropriate element of fmat to point at that array. If you run out of space (as in, there are more floats to be added to that column), try allocating a new array of twice the previous size. Change the appropriate element of fmat to point to the new array and deallocate the old one.
This technique is a bit ugly and can cause many allocations/deallocations if your predictions aren't good, but I've used it before. If you need dynamic array expansion without using someone else's data structures, this is about as good as you can get.
To elaborate the std::vector approach, this is how it would look like:
// initialize
N = getFeatureVectorSize();
vector<vector<float>> fmat(N);
Now the loop looks the same, you access the rows by saying fmat[i], however there is no pointer to a float. You simply call fmat[i].resize(row_len) to set the size and then assign to it using fmat[i][z] = 1.23.
In your solution I suggest you make getObjectFeatureVector return a vector<float>, so you can just say fmat[i] = getObjectFeatureVector();. Thanks to the C++11 move constructors this will be just as fast as assigning the pointers. Also this solution will solve the problem of getObjectFeatureVector not knowing the size of the array.
Edit: As I understand you don't know the number of columns. No problem:
deque<vector<float>> fmat();
Given this function:
std::vector<float> getObjectFeatureVector();
This is how you add another column:
fmat.push_back(getObjectFeatureVector());
The number of columns is fmat.size() and the number of rows in a column is fmat[i].size().

2D array vs array of arrays

What is the difference between a 2D array and an array of arrays?
I have read comments, such as #Dave's, that seem to differentiate between the two.
This breaks if he's using 2d arrays, or pointer-to-array types, rather than an array of arrays. – Dave
I always thought that both referred to:
int arr_arr[][];
EDIT: #FutureReader, you may wish to see How do I use arrays in C++?
There are four different concepts here.
The two-dimensional array: int arr[][]. It cannot be resized in any direction, and is contiguous. Indexing it is the same as ((int*)arr)[y*w + x]. Must be allocated statically.
The pointer-to array: int (*arr)[]. It can be resized only to add more rows, and is contiguous. Indexing it is the same as ((int*)arr)[y*w + x]. Must be allocated dynamically, but can be freed free(x);
The pointer-to-pointer: int **arr. It can be resized in any direction, and isn't necessarily square. Usually allocated dynamically, not necessarily contiguous, and freeing is dependent on its construction. Indexing is the same as *(*(arr+y)+x).
The array-of-pointers: int *arr[]. It can be resized only to add more columns, and isn't necessarily square. Resizing and freeing also depends on construction. Indexing is the same as *(*(arr+y)+x).
Every one of these can be used arr[y][x], leading to the confusion.
A 2 dimensional array is by definition an array of arrays.
What Dave was saying is that in that context, there are different semantics between the definition of a 2D array like this:
int x[][];
this:
int *x[];
or this:
int **x;
The answer here is a little more subtle.
An array of arrays is defined as such:
int array2[][];
The pointer-to-array types are defined as:
int (*array2)[];
The array-of-pointer types are defined as:
int* array2[];
The compiler treats both of these a little differently, and indeed there is one more option:
int** array2;
A lot of people are taught that these three are identical, but if you know more about compilers you will surely know that difference is small, but it is there. A lot of programs will run if you substitute one for another, but at the compiler and ASM level things are NOT the same. A textbook on C compilers should provide a much more in depth answer.
Also, if one is interested in the implementation of a 2D array there are multiple methods that vary in efficiency, depending on the situation. You can map a 2D array to a 1D array, which ensures spacial locality when dealing with linearized data. You can use the array of arrays if you want the ease of programming, and if you need to manipulate the rows/columns separately. There are certain blocked types and other fancy designs that are cache-smart, but rarely do you need to know the implementation if you the user.
Hope I helped!
The following is a 2D array that can be called an array of arrays:
int AoA[10][10];
The following is a pointer to a pointer that has been set up to function as a 2D array:
int **P2P = malloc(10 * sizeof *P2P);
if(!P2P) exit(1);
for(size_t i = 0; i < 10; i++)
{
P2P[i] = malloc(10 * sizeof **P2P);
if(!P2P[i])
{
for(; i > 0; i--)
free(P2P[i - 1]);
free(P2P);
}
}
Both can be accessed via AoA[x][y] or P2P[x][y], but the two are incompatible. In particular, P2P = AoA is something that newbies sometimes expect to work, but will not - P2P expects to point to pointers, but when AoA decays into a pointer, it is a pointer to an array, specifically int (*)[10], which is not the int ** that P2P is supposed to be.
2d array can include this:
int x[width * height]; // access: x[x + y * width];
From Wikipedia:
For a two-dimensional array, the element with indices i,j would have
address B + c · i + d · j, where the coefficients c and d are the row
and column address increments, respectively.