2D array vs array of arrays - c++

What is the difference between a 2D array and an array of arrays?
I have read comments, such as @Dave's, that seem to differentiate between the two.
This breaks if he's using 2d arrays, or pointer-to-array types, rather than an array of arrays. – Dave
I always thought that both referred to:
int arr_arr[][];
EDIT: @FutureReader, you may wish to see How do I use arrays in C++?

There are four different concepts here.
The two-dimensional array: int arr[][]. It cannot be resized in any direction, and is contiguous. Indexing it is the same as ((int*)arr)[y*w + x]. Both dimensions must be compile-time constants.
The pointer-to-array: int (*arr)[]. It can be resized only to add more rows, and is contiguous. Indexing it is the same as ((int*)arr)[y*w + x]. It must be allocated dynamically, but can be freed with a single free(arr);
The pointer-to-pointer: int **arr. It can be resized in any direction, and isn't necessarily square. Usually allocated dynamically, not necessarily contiguous, and freeing is dependent on its construction. Indexing is the same as *(*(arr+y)+x).
The array-of-pointers: int *arr[]. It can be resized only to add more columns, and isn't necessarily square. Resizing and freeing also depends on construction. Indexing is the same as *(*(arr+y)+x).
Every one of these can be indexed as arr[y][x], which is what causes the confusion; a short sketch of all four follows.
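For concreteness, here is a minimal sketch showing all four side by side (the 3-by-4 sizes are illustrative assumptions, not anything from the question):
int main() {
    int twoD[3][4];                       // 1. two-dimensional array: one contiguous block of 12 ints
    int (*ptrToArr)[4] = new int[3][4];   // 2. pointer to array: contiguous, allocated dynamically
    int **ptrToPtr = new int*[3];         // 3. pointer to pointer: each row is a separate allocation,
    for (int y = 0; y < 3; ++y)           //    so the whole thing is not necessarily contiguous
        ptrToPtr[y] = new int[4];
    int *arrOfPtr[3];                     // 4. array of pointers: each element points to a row
    for (int y = 0; y < 3; ++y)
        arrOfPtr[y] = new int[4];

    twoD[1][2] = ptrToArr[1][2] = ptrToPtr[1][2] = arrOfPtr[1][2] = 42;   // same arr[y][x] syntax for all four

    for (int y = 0; y < 3; ++y) { delete[] ptrToPtr[y]; delete[] arrOfPtr[y]; }
    delete[] ptrToPtr;
    delete[] ptrToArr;
    return 0;
}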

A 2 dimensional array is by definition an array of arrays.
What Dave was saying is that in that context, there are different semantics between the definition of a 2D array like this:
int x[][];
this:
int *x[];
or this:
int **x;

The answer here is a little more subtle.
An array of arrays is defined as such:
int array2[][];
The pointer-to-array types are defined as:
int (*array2)[];
The array-of-pointer types are defined as:
int* array2[];
The compiler treats each of these a little differently, and indeed there is one more option:
int** array2;
A lot of people are taught that these are identical, but if you know more about compilers you will know that the difference, while small, is real. A lot of programs will run if you substitute one for another, but at the compiler and ASM level things are NOT the same. A textbook on C compilers should provide a much more in-depth answer.
Also, if you are interested in the implementation of a 2D array, there are multiple methods that vary in efficiency depending on the situation. You can map a 2D array onto a 1D array, which ensures spatial locality when dealing with linearized data (see the sketch below). You can use an array of arrays if you want ease of programming, or if you need to manipulate the rows/columns separately. There are blocked layouts and other fancy cache-aware designs, but you rarely need to know the implementation if you are just the user.
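As a small sketch of the 1D-mapping idea (Grid and its members are made-up illustration names, not anything from the answer):
#include <cstddef>
#include <vector>
// A 2D view over a flat 1D buffer: element (x, y) lives at data[y * width + x].
struct Grid {
    std::size_t width, height;
    std::vector<int> data;
    Grid(std::size_t w, std::size_t h) : width(w), height(h), data(w * h) {}
    int& at(std::size_t x, std::size_t y) { return data[y * width + x]; }
};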
Hope I helped!

The following is a 2D array that can be called an array of arrays:
int AoA[10][10];
The following is a pointer to a pointer that has been set up to function as a 2D array:
int **P2P = (int **)malloc(10 * sizeof *P2P);
if(!P2P) exit(1);
for(size_t i = 0; i < 10; i++)
{
    P2P[i] = (int *)malloc(10 * sizeof **P2P);
    if(!P2P[i])
    {
        /* clean up the rows allocated so far, then give up */
        for(; i > 0; i--)
            free(P2P[i - 1]);
        free(P2P);
        exit(1);
    }
}
Both can be accessed via AoA[x][y] or P2P[x][y], but the two are incompatible. In particular, P2P = AoA is something that newbies sometimes expect to work, but will not - P2P expects to point to pointers, but when AoA decays into a pointer, it is a pointer to an array, specifically int (*)[10], which is not the int ** that P2P is supposed to be.
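A tiny sketch of the assignment that does work (demo is just an illustrative wrapper function):
void demo() {
    int AoA[10][10] = {};
    int (*rowPtr)[10] = AoA;   // fine: AoA decays to int (*)[10], a pointer to its first row
    // int **bad = AoA;        // does not compile: int (*)[10] is not convertible to int**
    rowPtr[2][3] = 7;          // same [x][y] syntax; this writes AoA[2][3]
}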

A "2D array" can also be emulated with a flat 1D array:
int arr[width * height]; // access: arr[x + y * width];
From Wikipedia:
For a two-dimensional array, the element with indices i,j would have
address B + c · i + d · j, where the coefficients c and d are the row
and column address increments, respectively.
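As a quick sanity check of that formula (the 3x5 shape is just an assumption), the row and column increments of a row-major int array can be verified like this:
#include <cassert>
#include <cstddef>
int main() {
    int a[3][5];
    char *B = reinterpret_cast<char*>(a);       // base address
    std::size_t c = 5 * sizeof(int);            // row address increment
    std::size_t d = sizeof(int);                // column address increment
    assert(reinterpret_cast<char*>(&a[2][3]) == B + c * 2 + d * 3);
    return 0;
}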

Related

How to convert between flat and multidimensional arrays without copying data?

I've got some data structured as a multi-dimensional array, i.e. double[][], and I need to pass it to a function that expects a single linear array of double[] along with dimensional metadata for the multi-dimensional representation.
For example, I might have a 3 x 5 multidimensional array, which I need to pass as a 15-element flat array along with height and width parameters so that the function knows it is a 3x5 array rather than a 5x3 array.
The function will then return a flat array and size metadata, which I need to use to convert the data back into a multidimensional type.
I believe the data layout in memory is exactly the same for both the flat and multi-dimensional representations; the only difference is how the indexing operations are performed. So I'd like to do the "conversion" with typecasting rather than copying the array values.
What's the most correct and readable way to typecast between multidimensional and flat arrays of the same total size?
I actually know what the dimensions of the multi-dimensional array will be at compile time. The array sizes aren't dynamic.
The most correct way has been given by @Maxim Egorushkin and @ypnos: double *flat = &multi[0][0];. And it will work fine with any decent compiler. But unfortunately it is not valid C++ code and invokes Undefined Behaviour.
The problem is that for an array double multi[N][M]; (N and M being compile-time constant expressions), &multi[0][0] is the address of the first element of an array of size M. So it is legal to do pointer arithmetic only up to M. See this other question of mine for more details.
What's the most correct and readable way to typecast between multidimensional and flat arrays of the same total size?
The address of the first array element coincides with the address of the array. You can pass around the address of the first element, no casting is necessary.
I would assume the most popular way to do it is:
double *flat = &multi[0][0];
This is how it is done in C, and you do operate with simple C arrays.
You could also have a look at std::array in your use case (dimensions known at compile time), but that one is not multi-dimensional, so if you would cascade it, you would lose the contiguous layout.
You can use a cast to a reference to an array. This requires some fancy C++ type syntax, but in return it allows you to use all the features that work on arrays, such as a range-based for loop.
#include <iostream>
using namespace std;
int main()
{
    static constexpr size_t x = 5, y = 3;
    unsigned multiArray[x][y];
    for (size_t i = 0; i != x; ++i)
        for (size_t j = 0; j != y; ++j)
            multiArray[i][j] = i * j;
    static constexpr size_t z = x * y;
    unsigned (&singleArray)[z] = (unsigned (&)[z])multiArray[0][0];
    for (const unsigned value : singleArray)
        cout << value << ' ';
    cout << endl;
    return 0;
}
Take into account that this and other methods based on casts work only with real multi-dimensional arrays. If it is an array of arrays (like unsigned **multiArray;), it isn't allocated in a contiguous block of memory and a cast cannot bypass that.
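To make the distinction concrete, here is a small illustrative sketch that prints the row addresses of both layouts (the exact output depends on the allocator):
#include <cstdio>
int main() {
    unsigned real[5][3];                    // true 2D array: rows are guaranteed to be adjacent
    unsigned **jagged = new unsigned*[5];   // array of arrays: each row is a separate allocation,
    for (int i = 0; i < 5; ++i)             // so there is no adjacency guarantee
        jagged[i] = new unsigned[3];
    std::printf("%p %p\n", static_cast<void*>(real[0]), static_cast<void*>(real[1]));
    std::printf("%p %p\n", static_cast<void*>(jagged[0]), static_cast<void*>(jagged[1]));
    for (int i = 0; i < 5; ++i) delete[] jagged[i];
    delete[] jagged;
    return 0;
}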

How to pass dynamic and static 2d arrays as void pointer?

For a project using TensorFlow's C API I have to pass a void pointer (void*) to a method of TensorFlow. In the examples the void* points to a 2D array, which also worked for me. However, now I have array dimensions which do not allow me to use the stack, which is why I have to use a dynamic array or a vector.
I managed to create a dynamic array with the same entries like this:
float** normalizedInputs;
normalizedInputs = new float* [noCellsPatches];
for(int i = 0; i < noCellsPatches; ++i)
{
    normalizedInputs[i] = new float[no_input_sizes];
}
for(int i = 0; i < noCellsPatches; i++)
{
    for(int j = 0; j < no_input_sizes; j++)
    {
        normalizedInputs[i][j] = inVals.at(no_input_sizes*i+j);
        //normalizedInputs[i][j]=(inVals.at(no_input_sizes*i+j)-inputMeanValues.at(j))/inputVarValues.at(j);
    }
}
The function call needing the void* looks like this:
TF_Tensor* input_value = TF_NewTensor(TF_FLOAT,in_dims_arr,2,normalizedInputs,num_bytes_in,&Deallocator, 0);
In argument 4 you see the "normalizedInputs" array. When I run my program now, the calculated results are totally wrong. When I go back to the static array they are right again. What do I have to change?
Greets and thanks in advance!
Edit: I also noted that the TF_Tensor* input_value holds totally different values for both cases (for dynamic it has many 0 and nan entries). Is there a way to solve this by using a std::vector<std::vector<float>>?
Or rather: is there any valid way to pass a contiguous dynamic 2D data structure to a function as void*?
In argument 4 you see the "normalizedInputs" array. When I run my program now, the calculated results are totally wrong.
The reason this doesn't work is that you are passing the array of pointers as data. In this case you would have to use normalizedInputs[0] or the equivalent more explicit expression &normalizedInputs[0][0]. However, there is another, bigger problem with this code.
Since you are using new inside a loop you won't have contiguous data which TF_NewTensor expects. There are several solutions to this.
If you really need a 2d-array you can get away with two allocations. One for the pointers and one for the data. Then set the pointers into the data array appropriately.
float **normalizedInputs = new float* [noCellsPatches]; // allocate pointers
normalizedInputs[0] = new float [noCellsPatches*no_input_sizes]; // allocate data
// set pointers
for (int i = 1; i < noCellsPatches; ++i) {
    normalizedInputs[i] = &normalizedInputs[i-1][no_input_sizes];
}
Then you can use normalizedInputs[i][j] as normal in C++ and the normalizedInputs[0] or &normalizedInputs[0][0] expression for your TF_NewTensor call.
Here is a mechanically simpler solution, just use a flat 1d array.
float * normalizedInputs = new float [noCellsPatches*no_input_sizes];
You access the i,j-th element by normalizedInputs[i*no_input_sizes+j] and you can use it directly in the TF_NewTensor call without worrying about any addresses.
The C++ standard does its best to discourage programmers from using raw arrays, specifically multi-dimensional ones.
From your comment, your statically declared array is declared as:
float normalizedInputs[noCellsPatches][no_input_sizes];
If noCellsPatches and no_input_sizes are both compile-time constants, you have a correct program declaring a true 2D array. If they are not constants, you are declaring a 2D Variable Length Array... which does not exist in the C++ standard. Fortunately, gcc allows it as an extension, but MSVC and clang do not.
If you want to declare a dynamic 2D array with non constant rows and columns, and use gcc, you can do that:
int (*arr0)[cols] = (int (*) [cols]) new int [rows*cols];
(the naive int (*arr0)[cols] = new int [rows][cols]; was rejected by my gcc 5.4.0)
It is definitely not correct C++ but is accepted by gcc and does what is expected.
The trick is that we all know that the size of an array of n elements is n times the size of one element. A 2D array of rows rows of cols columns is then rows times the size of one row, and a row is cols underlying elements (here int). So we ask gcc to allocate a 1D array of the size of the 2D array and take enough liberties with the strict aliasing rule to process it as the 2D array we wanted. As previously said, it violates the strict aliasing rule and uses a VLA in C++, but gcc accepts it.

Passing a 3-dimensional variable size array by reference in C++

I've been working off of Passing a 2D array to a C++ function , as well as a few other similar articles. However, I'm running into a problem wherein the array I'm creating has two dimensions of variable size.
The initialization looks like:
int** mulePosition;
mulePosition = new int *[boardSize][boardSize][2];
The function looks like:
int moveMule (int boardSize, int ***mulePosition)
And the references look like
moveMule (boardSize, mulePosition)
boardSize is defined at the beginning of the function, but may change per execution.
The array, properly sized, would be int [boardSize][boardSize][2].
Either use a plain '3-dimensional' array via
int* mulePosition = new int[boardSize*boardSize*2];
and address its elements calculating the offset from the beginning: mulePosition[a][b][c] is mulePosition[boardSize*2*a + 2*b + c],
or use array of arrays of arrays (which would correspond to your int*** declaration) or better (and simpler) vector of vectors of vectors, although the initialization would be a little more complex (you would need to initialize every array/vector).
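For the vector-of-vectors-of-vectors option, the initialization might look roughly like this (makeMulePosition is an assumed helper name; the innermost dimension 2 matches the question):
#include <vector>
std::vector<std::vector<std::vector<int>>> makeMulePosition(int boardSize) {
    return std::vector<std::vector<std::vector<int>>>(
        boardSize, std::vector<std::vector<int>>(boardSize, std::vector<int>(2, 0)));
}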
Either use a std::vector<std::vector<int>> if boardSize is not a constant, or std::array<std::array<int, boardSize>, boardSize> if it is (see Multidimensional std::array for how to initialize the std::array).
That being said, it looks like a good idea to hide this in a class Board which provides a nice interface.
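Such a Board wrapper might look roughly like this (a sketch, not the poster's actual interface), storing the data flat and hiding the index arithmetic:
#include <cstddef>
#include <vector>
class Board {
    int size_;
    std::vector<int> cells_;   // boardSize * boardSize * 2 ints, stored contiguously
public:
    explicit Board(int boardSize)
        : size_(boardSize),
          cells_(static_cast<std::size_t>(boardSize) * boardSize * 2) {}
    int& at(int a, int b, int c) { return cells_[(a * size_ + b) * 2 + c]; }
};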

Passing a 2-D array the column is mandatory

While passing a 2-dimensional array we have to specify the column.
eg:
void function1(int a[]) // works
{
}
void function2(int a[][4]) // works
{
}
void function3(int a[][]) // doesn't work
{
}
What could be the possible reasons that the function3 is considered an incorrect definition.
Is there a different way to define function3 so that we can leave both row and column blank.
Reading some replies:
Can you explain how x[n] and x[] are different? I guess the former represents a specific array position and the latter an unspecified array. More explanation would be deeply appreciated.
You cannot pass a 2D array without specifying the second dimension. The parameter "a" decays to a pointer, and the compiler needs to know how long each row is to calculate the offsets (a 2D array is stored as a 1D block in memory). Therefore the compiler must know the size of *a, which requires that the second dimension be given. You can use a vector of vectors to replace the 2D array.
With void function2(int a[][4]) it knows that there are 4 elements in each row. With void function3(int a[][]) it doesn't know, so it can't calculate where a[i] should be.
Use a vector, since it's C++.
C style arrays don't work the way you think. Think of them as a block of memory, and the dimensions tell the compiler how far to offset from the original address.
int a[] is basically a pointer and every element is an int, which means a[1] is the equivalent of *(a + 1), where each step of 1 is sizeof(int) bytes. There's no limit or end (simplistically speaking) to the a array. You could use a[999999] and the compiler won't care.
int a[][4] is similar, but now the compiler knows that each row is 4*sizeof(int). So a[2][1] is *(a + 2*4 + 1)
int a[][] on the other hand, is an incomplete type, so to the compiler, a[2][1] is *(a + 2*?? + 1), and who know what ?? means.
Don't use int **a; that means an array of pointers, which is most likely not what you want.
As some have said, with STL, use vectors instead. It's pretty safe to use std::vector<std::vector<int> > a. You'll still be able to get a[2][1].
And while you're at it, use references instead, const std::vector<std::vector<int> > &a. That way, you're not copying the whole array with each function call.
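A short sketch of what that vector-based signature might look like (sumAll is a made-up example function):
#include <vector>
// Neither dimension appears in the signature; each row carries its own size.
int sumAll(const std::vector<std::vector<int>> &a) {
    int total = 0;
    for (const auto &row : a)
        for (int v : row)
            total += v;
    return total;
}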
How does the compiler calculate the address of a[x][y]?
Quite simply:
address_of_a + (x*SECOND_SIZE + y)
Now imagine that you want
a[2][3]
The compiler has to compute:
address_of_a + (2*SECOND_SIZE + 3)
If the compiler doesn't know SECOND_SIZE, how can it compute this?
You have to give it explicitly. You are using a[2][1], a[100][13] in your code, so the compiler has to know how to compute the addresses of these objects.
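As for the second part of the question (leaving both row and column blank), one standard-C++ sketch, not proposed in the answers above, is to pass the array by reference and let the compiler deduce both dimensions as template parameters:
#include <cstddef>
template <std::size_t Rows, std::size_t Cols>
int sum(const int (&a)[Rows][Cols]) {   // both dimensions deduced from the argument
    int total = 0;
    for (std::size_t i = 0; i < Rows; ++i)
        for (std::size_t j = 0; j < Cols; ++j)
            total += a[i][j];
    return total;
}
// usage: int grid[3][4] = {}; int s = sum(grid);   // Rows=3, Cols=4 deduced automatically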

C++ Array Size Initialization

I am trying to define a class. This is what I have:
enum Tile {
    GRASS, DIRT, TREE
};
class Board {
public:
    int toShow;
    int toStore;
    Tile* shown;
    Board (int tsh, int tst);
    ~Board();
};
Board::Board (int tsh, int tst) {
    toShow = tsh;
    toStore = tst;
    shown = new Tile[toStore][toStore]; // ERROR!
}
Board::~Board () {
    delete [] shown;
}
However, I get the following error on the indicated line -- Only the first dimension of an allocated array can have dynamic size.
What I want to be able to do is, rather than hard-code it, pass the parameter toShow to the constructor and create a two-dimensional array which only contains the elements that I want to be shown.
However, my understanding is that when the constructor is called, and shown is initialized, its size will be initialized to the current value of toStore. Then even if toStore changes, the memory has already been allocated to the array shown and therefore the size should not change. However, the compiler doesn't like this.
Is there a genuine misconception in how I'm understanding this? Does anyone have a fix which will do what I want it to without having to hard code in the size of the array?
Use C++'s containers, that's what they're there for.
class Board {
public:
    int toShow;
    int toStore;
    std::vector<std::vector<Tile> > shown;
    Board (int tsh, int tst) :
        toShow(tsh), toStore(tst),
        shown(tst, std::vector<Tile>(tst))
    {
    }
};
...
Board board(4, 5);
board.shown[1][3] = DIRT;
You can use a one-dimensional array. You should know that two-dimensional arrays are laid out in memory like one-dimensional arrays, and when you want a variable size you can use this pattern. For example:
int arr1[ 3 ][ 4 ] ;
int arr2[ 3 * 4 ] ;
They are the same and their members can be accessed via different notations :
int x = arr1[ 1 ][ 2 ] ;
int x = arr2[ 1 * 4 + 2 ] ;
Of course the same flat block can be viewed as a 3-rows-by-4-columns matrix or as a 4-rows-by-3-columns matrix.
With this type of multi-dimensional array you can access the elements via a single pointer, but you have to know about the internal structure: they are one-dimensional arrays that are simply treated as 2- or 3-dimensional.
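Applied to the question's Board, that pattern might look like this (FlatBoard is a sketch name; it assumes the Tile enum from the question):
// Copying is disabled only to keep the sketch short and avoid a double delete.
class FlatBoard {
public:
    int toStore;
    Tile *shown;
    explicit FlatBoard(int tst) : toStore(tst), shown(new Tile[tst * tst]) {}
    ~FlatBoard() { delete[] shown; }
    FlatBoard(const FlatBoard&) = delete;
    FlatBoard& operator=(const FlatBoard&) = delete;
    Tile& at(int row, int col) { return shown[row * toStore + col]; }
};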
Let me tell you about what I did when I needed a 3D array. It might be overkill, but it's rather cool and might help, although it's a whole different way of doing what you want.
I needed to represent a 3D box of cells. Only a part of the cells were marked and were of any interest. There were two options. The first was to declare a static 3D array with the largest possible size, and use a portion of it if one or more of the dimensions of the box were smaller than the corresponding dimensions of the static array.
The second way was to allocate and deallocate the array dynamically. It's quite an effort with a 2D array, not to mention 3D.
The array solution defined a 3D array with the cells of interest having a special value. Most of the allocated memory was unnecessary.
I dumped both ways. Instead I turned to STL map.
I defined a struct called Cell with 3 member variables, x, y, z, which represent coordinates. The constructor Cell(x, y, z) was used to create such a Cell easily.
I defined the operator < upon it to make it orderable. Then I defined a map<Cell, Data>. Adding a marked cell with coordinates x, y, z to the map was done simply by
my_map[Cell(x, y, z)] = my_data;
This way I didn't need to maintain 3D array memory management, and also only the required cells were actually created.
Checking if a cell at coordinates x0, y0, z0 exists (or is marked) was done by:
map<Cell, Data>::iterator it = my_map.find(Cell(x0, y0, z0));
if (it != my_map.end()) { ...
And referencing the cell's data at coordinates x0, y0, z0 was done by:
my_map[Cell(x0, y0, z0)]...
This method might seem odd, but it is robust, self-managed with regard to memory, and safe - no boundary overruns.
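A rough sketch of that Cell/map setup, with Data standing in for whatever payload each marked cell carries:
#include <map>
struct Data { int value; };   // stand-in payload type
struct Cell {
    int x, y, z;
    Cell(int x_, int y_, int z_) : x(x_), y(y_), z(z_) {}
    bool operator<(const Cell &o) const {   // ordering so Cell can be used as a map key
        if (x != o.x) return x < o.x;
        if (y != o.y) return y < o.y;
        return z < o.z;
    }
};
std::map<Cell, Data> my_map;
// my_map[Cell(x, y, z)] = my_data;              // mark a cell
// auto it = my_map.find(Cell(x0, y0, z0));      // check whether a cell is marked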
First, if you want to refer to a 2D array, you have to declare a pointer to a pointer:
Tile **shown;
Then, have a look at the error message. It's proper, comprehensible English. It says what the error is: Only the first dimension of an allocated array can have dynamic size means -- guess what -- that only the first dimension of an allocated array can have dynamic size. That's it. If you want your matrix to have multiple dynamic dimensions, use the C-style malloc() to manage the pointers to pointers, or, which is even better for C++, use vector, made exactly for this purpose.
It's good to understand a little of how memory allocation works in C and C++.
char x[10];
The compiler will allocate ten bytes and remember the starting address, perhaps it's at 0x12 (in real life probably a much larger number.)
x[3] = 'a';
Now the compiler looks up x[3] by taking the starting address of x, which is 0x12, and adding 3*sizeof(char), which brings us to 0x15. So x[3] lives at 0x15.
This simple addition-arithmetic is how memory inside an array is accessed. For two dimensional arrays the math is only slightly trickier.
char xy[20][30];
Allocates 600 bytes starting at some place, maybe it's 0x2000. Now accessing
xy[4][3];
Requires some math... xy[0][0], xy[0][1], xy[0][2]... are going to occupy the first 30 bytes. Then xy[1][0], xy[1][1], ... are going to occupy bytes 31 to 60. It's multiplication: xy[a][b] will be located at the address of xy, plus a*30, plus b.
This is only possible if the compiler knows how long each row is - you'll notice the compiler needed to know the number "30" to do this math.
Now, function calls. The compiler cares little whether you declare
foo(int *x);
or
foo(int x[]);
because in either case it's a block of bytes; you pass the starting address, and the compiler can do the addition to find the place at which x[3] or whatever lives. But in the case of a two-dimensional array, the compiler needs to know that magic number 30 from the above example. So
foo(int xy[][]) {
    xy[3][4] = 5; // compiler has NO idea where this lives
                  // because it doesn't know the row length of xy!
}
But if you specify
foo(int xy[][30])
the compiler knows what to do. For reasons I can't remember, it's often considered better practice to pass it as a double pointer, but this is what's going on at the technical level.