What's the proper way to declare and initialize a (large) two dimensional object array in c++? - c++

I need to create a large two dimensional array of objects. I've read some related questions on this site and others regarding multi_array, matrix, vector, etc, but haven't been able to put it together. If you recommend using one of those, please go ahead and translate the code below.
Some considerations:
The array is somewhat large (1300 x 1372).
I might be working with more than one of these at a time.
I'll have to pass it to a function at some point.
Speed is a large factor.
The two approaches that I thought of were:
Pixel pixelArray[1300][1372];
for(int i=0; i<1300; i++) {
for(int j=0; j<1372; j++) {
pixelArray[i][j].setOn(true);
...
}
}
and
Pixel* pixelArray[1300][1372];
for(int i=0; i<1300; i++) {
for(int j=0; j<1372; j++) {
pixelArray[i][j] = new Pixel();
pixelArray[i][j]->setOn(true);
...
}
}
What's the right approach/syntax here?
Edit:
Several answers have assumed Pixel is small - I left out details about Pixel for convenience, but it's not small/trivial. It has ~20 data members and ~16 member functions.

Your first approach allocates everything on the stack, which is otherwise fine, but leads to a stack overflow when you try to allocate too much. The default limit is typically a few megabytes (commonly 8 MB on Linux and 1 MB on Windows), so allocating an array of 1300 * 1372 elements on the stack is not an option.
Your second approach performs 1300 * 1372 separate heap allocations, which is a tremendous load for the allocator, which maintains linked lists of allocated and free chunks of memory. Also a bad idea, especially since Pixel seems to be rather small.
What I would do is this:
Pixel* pixelArray = new Pixel[1300 * 1372];
for(int i=0; i<1300; i++) {
for(int j=0; j<1372; j++) {
pixelArray[i * 1372 + j].setOn(true);
...
}
}
This way you allocate one large chunk of memory on heap. Stack is happy and so is the heap allocator.
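A rough sketch of the arithmetic behind this advice, and of the row-major index used in pixelArray[i * 1372 + j]. The Pixel struct here is a hypothetical stand-in (the real class is not shown in the question), the helper names are made up for illustration, and std::vector is swapped in for new[] so that the single block is released automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the asker's Pixel; the real class is not shown.
struct Pixel {
    bool on = false;
    void setOn(bool value) { on = value; }
};

constexpr std::size_t kRows = 1300;
constexpr std::size_t kCols = 1372;

// Bytes needed for kRows * kCols elements of the given size.
constexpr std::size_t footprintBytes(std::size_t elementSize) {
    return kRows * kCols * elementSize;
}

// Row-major offset of element (row, col) inside the single flat block.
inline std::size_t flatIndex(std::size_t row, std::size_t col) {
    assert(row < kRows && col < kCols);  // precondition: indices are in range
    return row * kCols + col;
}

// One heap allocation for the whole image; every pixel is switched on.
inline std::vector<Pixel> makePixels() {
    std::vector<Pixel> pixels(kRows * kCols);
    for (std::size_t row = 0; row < kRows; ++row)
        for (std::size_t col = 0; col < kCols; ++col)
            pixels[flatIndex(row, col)].setOn(true);
    return pixels;
}
```

Even at only 8 bytes per Pixel, footprintBytes(8) comes to about 14.3 MB, which is already past an 8 MB stack; a Pixel with ~20 data members would need considerably more.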

If you want to pass it to a function, I'd vote against using simple arrays. Consider:
void doWork(Pixel array[][]);
This does not contain any size information (and, as written, it does not even compile: only the first dimension of an array parameter may be left unspecified). You could pass the size info via separate arguments, but I'd rather use something like std::vector<Pixel>. Of course, this requires that you define an addressing convention (row-major or column-major).
An alternative is std::vector<std::vector<Pixel> >, where each level of vectors is one array dimension. Advantage: the double subscript, as in pixelArray[x][y], works. Disadvantages: creating such a structure is tedious, copying is more expensive because it happens per contained vector instance instead of with a simple memcpy, and the vectors contained in the top-level vector need not all have the same size.
These are basically your options using the Standard Library. The right solution would be something like std::vector with two dimensions. Numerical libraries and image manipulation libraries come to mind, but matrix and image classes are most likely limited to primitive data types in their elements.
EDIT: Forgot to make it clear that everything above is only arguments. In the end, your personal taste and the context will have to be taken into account. If you're on your own in the project, vector plus defined and documented addressing convention should be good enough. But if you're in a team, and it's likely that someone will disregard the documented convention, the cascaded vector-in-vector structure is probably better because the tedious parts can be implemented by helper functions.
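As an illustration of the "std::vector<Pixel> plus documented addressing convention" option, here is a minimal sketch of passing such a vector to a function. The Pixel struct and the countOn function are hypothetical; the convention assumed is row-major, i.e. element (row, col) lives at index row * cols + col.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the asker's Pixel.
struct Pixel {
    bool on = false;
};

// Counts the pixels that are switched on in a row-major image.
// Convention: element (row, col) is stored at index row * cols + col.
inline std::size_t countOn(const std::vector<Pixel>& pixels,
                           std::size_t rows, std::size_t cols) {
    assert(pixels.size() == rows * cols);  // dimensions must match the storage
    std::size_t count = 0;
    for (std::size_t row = 0; row < rows; ++row)
        for (std::size_t col = 0; col < cols; ++col)
            if (pixels[row * cols + col].on) ++count;
    return count;
}
```

Passing by const reference avoids copying the whole image; the dimensions travel as separate arguments because the flat vector only knows its total size.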

I'm not sure how complicated your Pixel data type is, but maybe something like this will work for you?:
std::fill(array, array+100, 42); // sets every value in the array to 42
Reference:
Initialization of a normal array with one default value

Check out Boost's Generic Image Library.
gray8_image_t pixelArray;
pixelArray.recreate(1300,1372);
for(gray8_image_t::iterator pIt = pixelArray.begin(); pIt != pixelArray.end(); pIt++) {
*pIt = 1;
}

My personal preference would be to use std::vector
typedef std::vector<Pixel> PixelRow;
typedef std::vector<PixelRow> PixelMatrix;
PixelMatrix pixelArray(1300, PixelRow(1372, Pixel(true)));
//                     ^^^^           ^^^^  ^^^^^^^^^^^
//                     Size 1         Size 2 default Value

While I wouldn't necessarily make this a struct, this demonstrates how I would approach storing and accessing the data. If Pixel is rather large, you may want to use a std::deque instead.
struct Pixel2D {
Pixel2D (size_t rsz_, size_t csz_) : data(rsz_*csz_), rsz(rsz_), csz(csz_) {
for (size_t r = 0; r < rsz; r++)
for (size_t c = 0; c < csz; c++)
at(r, c).setOn(true);
}
Pixel &at(size_t row, size_t col) {return data.at(row*csz+col);}
std::vector<Pixel> data;
size_t rsz;
size_t csz;
};

Related

Initializing multi-dimensional std::vector without knowing dimensions in advance

Context: I have a class, E (think of it as an organism) and a struct, H (a single cell within the organism). The goal is to estimate some characterizing parameters of E. H has some properties that are stored in multi-dimensional matrices. But, the dimensions depend on the parameters of E.
E reads a set of parameters from an input file, declares some objects of type H, solves each of their problems and fills the matrices, computes a likelihood function, exports it, and moves on to next set of parameters.
What I used to do: I used to declare pointers to pointers to pointers in H's header, and postpone memory allocation to H's constructor. This way, E could pass parameters to constructor, and memory allocation could be done afterwards. I de-allocated memory in the destructor.
Problem: Yesterday, I realized this is bad practice! So, I decided to try vectors. I have read several tutorials. At the moment, the only thing that I can think of is using push_back() as used in the question here. But, I have a feeling that this might not be the best practice (as mentioned by many, e.g., here, under method 3).
There are tens of questions that are tangent to this, but none answers this question directly: What is the best practice if dimensions are not known in advance?
Any suggestion helps: Do I have any other solution? Should I stick to arrays?
Using push_back() should be fine, as long as the vector has reserved the appropriate capacity.
If your only hesitancy to using push_back() is the copy overhead when a reallocation is performed, there is a straightforward way to resolve that issue. You use the reserve() method to inform the vector how many elements the vector will eventually have. So long as reserve() is called before the vector is used, there will just be a single allocation for the needed amount. Then, push_back() will not incur any reallocations as the vector is being filled.
From the example in your cited source:
std::vector<std::vector<int>> matrix;
matrix.reserve(M);
for (int i = 0; i < M; i++)
{
// construct a vector of ints with the given default value
std::vector<int> v;
v.reserve(N);
for (int j = 0; j < N; j++) {
v.push_back(default_value);
}
// push back above one-dimensional vector
matrix.push_back(v);
}
This particular example is contrived. As @kei2e noted in a comment, the inner v variable could be initialized once on the outside of the loop, and then reused for each row.
However, as noted by @Jarod42 in a comment, the whole thing can actually be accomplished with the appropriate construction of matrix:
std::vector<std::vector<int>> matrix(M, std::vector<int>(N, default_value));
If this initialization task was populating matrix with values from some external source, then the other suggestion by @Jarod42 could be used, to move the element into place to avoid a copy.
std::vector<std::vector<int>> matrix;
matrix.reserve(M);
for (int i = 0; i < M; i++)
{
std::vector<int> v;
v.reserve(N);
for (int j = 0; j < N; j++) {
v.push_back(source_of_value());
}
matrix.push_back(std::move(v));
}
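The effect of reserve() described above can be checked with a small sketch. The function name is made up for illustration; it fills a vector after reserving and reports whether the underlying buffer ever moved, which would indicate a reallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills a vector with `count` values after reserving space for them, and
// returns true if the underlying buffer never moved (no reallocation).
inline bool fillWithoutReallocation(std::size_t count) {
    std::vector<int> v;
    v.reserve(count);                 // single allocation up front
    const int* before = v.data();
    for (std::size_t i = 0; i < count; ++i)
        v.push_back(static_cast<int>(i));
    assert(v.size() == count);
    return v.data() == before;        // same buffer => push_back never reallocated
}
```

As long as the number of push_back() calls does not exceed the reserved capacity, the standard guarantees that no reallocation takes place, so the check holds.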

How Physically are Arrays Stored (Specifically with dimensions greater than 2)?

What I Know
I know that arrays int ary[] can be expressed in the equivalent "pointer-to" format: int* ary. However, what I would like to know is that if these two are the same, how physically are arrays stored?
I used to think that the elements are stored next to each other in the ram like so for the array ary:
int size = 5;
int* ary = new int[size];
for (int i = 0; i < size; i++) { ary[i] = i; }
This (I believe) is stored in RAM like: ...[0][1][2][3][4]...
This means we can subsequently replace ary[i] with *(ary + i), just by incrementing the pointer's location by the index.
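The equivalence can be checked directly with a short sketch (the function name is hypothetical): for a built-in pointer, ary[i] is defined as *(ary + i), so both the addresses and the values agree.

```cpp
#include <cassert>
#include <cstddef>

// Returns true if ary[i] and *(ary + i) refer to the same element for every index.
inline bool subscriptMatchesPointerArithmetic(std::size_t size) {
    int* ary = new int[size];
    for (std::size_t i = 0; i < size; ++i) ary[i] = static_cast<int>(i);

    bool same = true;
    for (std::size_t i = 0; i < size; ++i) {
        // Same address and same value for both spellings.
        if (&ary[i] != ary + i || ary[i] != *(ary + i)) same = false;
    }
    delete[] ary;
    return same;
}
```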
The Issue
The issue comes in when I am to define a 2D array in the same way:
int width = 2, height = 2;
Vector** array2D = new Vector*[width];
for (int i = 0; i < width; i++) {
array2D[i] = new Vector[height];
for (int j = 0; j < height; j++) { array2D[i][j] = Vector(i, j); }
}
Given the class Vector is for me to store both x, and y in a single fundamental unit: (x, y).
So how exactly would the above be stored?
It cannot logically be stored like ...[(0, 0)][(1, 0)][(0, 1)][(1, 1)]... as this would mean that the (1, 0)th element is the same as the (0, 1)th.
It cannot also be stored in a 2d array like below, as the physical RAM is a single 1d array of 8 bit numbers:
...[(0, 0)][(1, 0)]...
...[(0, 1)][(1, 1)]...
Neither can it be stored like ...[&(0, 0)][&(1, 0)][&(0, 1)][&(1, 1)]..., given &(x, y) is a pointer to the location of (x, y). This would just mean each memory location would just point to another one, and the value could not be stored anywhere.
Thank you in advance.
What OP is struggling with is a dynamically allocated array of pointers to dynamically allocated arrays. Each of these allocations is its own block of memory sitting somewhere in storage. There is no connection between them other than the logical connection established by the pointers in the outer array.
To try to visualize this say we make
int ** twodee;
twodee = new int*[4];
for (int i = 0; i < 4; i++)
{
twodee[i] = new int[4];
}
and then
int count = 1;
for (int i = 0; i < 4; i++)
{
for (int j = 0; j < 4; j++)
{
twodee[i][j] = count++;
}
}
so we should wind up with twodee looking something like
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
right?
Logically, yes. But laid out in memory, twodee might look like a batsmurph crazy mess, with the array of row pointers and each of the four rows sitting in unrelated places.
You can't really predict where your memory will be. You're at the mercy of whatever memory manager handles the allocations and of what is already in storage where it might otherwise have been efficient for your memory to go. This makes laying dynamically-allocated multi-dimensional arrays out in your head almost a waste of time.
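To see this on a particular machine, one can record where each row actually lands. The sketch below (function name hypothetical) builds the same pointer-to-pointer structure, checks that elements inside one row are adjacent, and returns the start address of every row; the distances between those addresses are up to the memory manager.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a rows x cols array of pointers to separately allocated rows and
// returns the start address of each row as an integer.
inline std::vector<std::uintptr_t> rowAddresses(std::size_t rows, std::size_t cols) {
    int** twodee = new int*[rows];
    for (std::size_t i = 0; i < rows; ++i) twodee[i] = new int[cols];

    std::vector<std::uintptr_t> addresses;
    for (std::size_t i = 0; i < rows; ++i) {
        // Inside a single row, neighbouring elements are always adjacent.
        for (std::size_t j = 0; j + 1 < cols; ++j)
            assert(&twodee[i][j + 1] == &twodee[i][j] + 1);
        addresses.push_back(reinterpret_cast<std::uintptr_t>(twodee[i]));
    }

    for (std::size_t i = 0; i < rows; ++i) delete[] twodee[i];
    delete[] twodee;
    return addresses;
}
```

Printing the returned addresses typically shows gaps that are not a multiple of the row size, and nothing guarantees that the rows appear in increasing order.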
And there are a whole lot of things wrong with this when you get down into the guts of what a modern CPU can do for you. The CPU has to hop around a lot, and when it's hopping, its ability to predict and preload the cache with memory you're likely to need in the near future is compromised. This means your gigahertz computer has to sit around and wait on your megahertz RAM a lot more than it should have to.
Try to avoid this whenever possible by allocating single, contiguous blocks of memory. You may pick up a bit of extra code mapping one dimensional memory over to other dimensions, but you don't lose any CPU time. C++ will have generated all of that mapping math for you as soon as you compiled [i][j] anyway.
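A minimal sketch of that advice, assuming a row-major mapping: one contiguous block holds all elements, and a small accessor does the index math that [i][j] would otherwise imply. The class name Grid2D and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A rows x cols grid of ints stored in one contiguous block (row-major).
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // Maps the two-dimensional index onto the one-dimensional storage.
    int& at(std::size_t row, std::size_t col) {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    // Pointer to the first element of the single underlying block.
    const int* raw() const { return data_.data(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_;
};
```

Filling a 4 x 4 grid with 1 to 16 in row order puts the value 6 (logical position row 1, column 1) at raw()[5], right after the end of row 0.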
The short answer to your question is: It is compiler dependent.
A more helpful answer (I hope) is that you can create 2D arrays that are laid out directly in memory, or you can create "2D arrays" that are actually 1D arrays, some with data, some with pointers to arrays.
There is a convention that the compiler is happy to generate the right kind of code to dereference and/or calculate the address of an element within an array when you use brackets to access an element in the array.
Generally arrays that are known to be 2D at compile time (eg int array2D[a][b]) will be laid out in memory without extra pointers and the compiler knows to multiply AND add to get an address each time there is an access. If your compiler isn't good at optimizing out the multiply, it makes repeated accesses much slower than they can be, so in the old days we often did pointer math ourselves to avoid the multiply if possible.
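The "multiply AND add" layout can be observed with a small sketch. The dimensions and the function name are arbitrary choices for illustration; the function measures how far, in bytes, element [i][j] sits from element [0][0].

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kRows = 3;
constexpr std::size_t kCols = 4;

// The whole 2D array is exactly kRows * kCols ints, with nothing in between.
static_assert(sizeof(int[kRows][kCols]) == kRows * kCols * sizeof(int),
              "built-in 2D arrays have no padding between rows");

// Byte distance from grid[0][0] to grid[i][j].
inline std::size_t byteOffset(const int (&grid)[kRows][kCols],
                              std::size_t i, std::size_t j) {
    assert(i < kRows && j < kCols);
    const char* base = reinterpret_cast<const char*>(&grid[0][0]);
    const char* elem = reinterpret_cast<const char*>(&grid[i][j]);
    return static_cast<std::size_t>(elem - base);
}
```

For any valid i and j the offset equals (i * kCols + j) * sizeof(int), which is the multiply-and-add the compiler generates for grid[i][j].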
In principle, an implementation could round the lower dimension size up to a power of two so that a shift can be used instead of a multiply, which would leave meaningless holes inside the one memory block. For built-in C++ arrays this does not happen, though: the language requires array elements to be stored contiguously, so int array2D[a][b] occupies exactly a * b * sizeof(int) bytes.
(Also, I'm pretty sure I've run into the problem that within a procedure, it needs to know which way the 2D array really is, so you may need to declare parameters in a way that lets the compiler know how to code the procedure, eg a[][] is different from *a[]). And obviously you can actually get the pointer from the array of pointers, if that is what you want--which isn't the same thing as the array it points to, of course.
In your code, you have clearly declared a full set of the lower dimension 1D arrays (inside the loop), and you have ALSO declared another 1D array of pointers you use to get to each one without a multiply--instead by a dereference. So all those things will be in memory. Each 1D array will surely be sequentially laid out in a contiguous block of memory. It is just that it is entirely up to the memory manager as to where those 1D arrays are, relative to each other. (I doubt a compiler is smart enough to actually do the "new" ops at compile time, but it is theoretically possible, and would obviously affect/control the behavior if it did.)
Using the extra array of pointers clearly avoids the multiply ever and always. But it takes more space, and for sequential access actually makes the accesses slower and bigger (the extra dereference) versus maintaining a single pointer and one dereference.
Even if the 1D arrays DO end up contiguous sometimes, you might break it with another thread using the same memory manager, running a "new" while your "new" inside the loop is repeating.

Is it worth to use vector in case of making a map

I have got a class that represents a 2D map with size 40x40.
I read some data from sensors and build this map by marking cells where the sensors found something, setting a value for the probability of finding an obstacle there. For example, when I find an obstacle in cell [52,22], I add 10 to its value and add 5 to the surrounding cells.
So each cell of this map should hold a small value (probably not a big one). When a cell is marked three times by the sensor, its value will be 30 and the surrounding cells will have 15.
My question is: is it worth using a plain array, or is it better to use a vector, even though I do not sort these cells, don't remove them, etc.? I just set a value and read it later.
Update:
Actually I have in my header file:
using cell = uint8_t;
class Grid {
private:
int xSize, ySize;
cell *cells;
public:
//some methods
};
In cpp :
using cell = uint8_t;
Grid::Grid(int xSize, int ySize) : xSize(xSize), ySize(ySize) {
cells = new cell[xSize * ySize];
for (int i = 0; i < xSize; i++) {
for (int j = 0; j < ySize; j++)
cells[i + j * xSize] = 0;
}
}
Grid::~Grid(void) {
delete[] cells;
}
inline cell* Grid::getCell(int x, int y) const{
return &cells[x + y * xSize];
}
Does it look fine?
I'd use std::array rather than std::vector.
For fixed size arrays you get the benefits of STL containers with the performance of 'naked' arrays.
http://en.cppreference.com/w/cpp/container/array
A static (C-style) array is possible in your case since the size is known at compile-time.
BUT. It may be interesting to have the data on the heap instead of the stack.
If the array is a global variable, it's ugly and bug-prone (avoid that when you can).
If the array is a local variable (let's say, in your main() function), then a stack overflow may occur. Well, it's very unlikely for a 40*40 array of tiny things, but I'd prefer to have my data on the heap, to keep things safe, clean, and future-proof.
So, IMHO you should definitely go for the vector, it's fast, clean and readable, and you don't have to worry about stack overflow, memory allocation, etc.
About your data. If you know your values can be stored in a single byte, go for it!
A uint8_t (same as unsigned char) can store values from 0 to 255. If it's enough, use it.
using cell = uint8_t; // define a nice name for your data type
size_t size = 40;
std::vector<cell> myMap(size * size); // creates size*size cells, each initialized to 0
side note: don't use new[]. Well, you can, but it has no advantages over a vector. You will probably only gain headaches handling memory manually.
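For comparison with the new[]-based Grid from the question, here is a minimal vector-backed sketch. The class name ObstacleGrid and the mark() helper are hypothetical, and saturating at 255 instead of letting the uint8_t wrap around is an assumption about the desired behavior, not something stated in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using cell = std::uint8_t;

// A 2D occupancy map backed by a single std::vector; no manual new/delete.
class ObstacleGrid {
public:
    ObstacleGrid(std::size_t xSize, std::size_t ySize)
        : xSize_(xSize), ySize_(ySize), cells_(xSize * ySize) {}  // cells start at 0

    cell& at(std::size_t x, std::size_t y) {
        assert(x < xSize_ && y < ySize_);
        return cells_[x + y * xSize_];  // same indexing as the question's getCell()
    }

    // Adds `amount` to a cell, clamping at 255 (assumed behavior) so that
    // repeated markings cannot wrap around to a small value.
    void mark(std::size_t x, std::size_t y, cell amount) {
        cell& c = at(x, y);
        c = static_cast<cell>(std::min(255, static_cast<int>(c) + amount));
    }

private:
    std::size_t xSize_;
    std::size_t ySize_;
    std::vector<cell> cells_;
};
```

The vector's destructor releases the memory, so the class needs no destructor of its own, and copying a grid copies the cells instead of sharing a raw pointer.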
Some advantages of using a std::vector is that it can be dynamically allocated (flexible size, can be resized during execution, etc) and can be passed/returned from a function. Since you have a fixed size 40x40 and you know you have one element int in every cell, I don't think it matters that much in your case and I would NOT suggest using a class object std::vector to process this simple task.
And here is a possible duplicate.

Incrementally dynamic allocation of memory in C/C++

I have a for-loop that needs to incrementally add columns to a matrix. The size of the rows is known before entering the for-loop, but the size of the columns varies depending on some condition. Following code illustrates the situation:
N = getFeatureVectorSize();
float **fmat; // N rows, dynamic number of cols
for(size_t i = 0; i < getNoObjects(); i++)
{
if(Object[i] == TARGET_OBJECT)
{
float *fv = new float[N];
getObjectFeatureVector(fv);
// How to add fv to fmat?
}
}
Edit 1 This is how I temporarily solved my problem:
N = getFeatureVectorSize();
float *fv = new float[N];
float *fmat = NULL;
int col_counter = 0;
for(size_t i = 0; i < getNoObjects(); i++)
{
if(Object[i] == TARGET_OBJECT)
{
getObjectFeatureVector(fv);
fmat = (float *) realloc(fmat, (col_counter+1)*N*sizeof(float));
for(int r=0; r<N; r++) fmat[col_counter*N+r] = fv[r];
col_counter++;
}
}
delete [] fv;
free(fmat);
However, I'm still looking for a way to incrementally allocate memory of a two-dimensional array in C/C++.
To answer your original question
// How to add fv to fmat?
When you use float **fmat you are declaring a pointer to [an array of] pointers. Therefore you have to allocate (and free!) that array before you can use it. Think of it as the row pointer holder:
float **fmat = new float*[N];
Then in your loop you simply do
fmat[i] = fv;
However I suggest you look at the std::vector approach since it won't be significantly slower and will spare you from all those new and delete.
better - use boost::multi_array (Boost.MultiArray) as in the top answer here :
How do I best handle dynamic multi-dimensional arrays in C/C++?
trying to dynamically allocate your own matrix type is pain you do not need.
Alternatively - as a low-tech, quick and dirty solution, use a vector of vectors, like this :
C++ vector of vectors
If you want to do this without fancy data structures, you should declare fmat as an array of size N of pointers. For each column, you'll probably have to just guess at a reasonable size to start with. Dynamically allocate an array of that size of floats, and set the appropriate element of fmat to point at that array. If you run out of space (as in, there are more floats to be added to that column), try allocating a new array of twice the previous size. Change the appropriate element of fmat to point to the new array and deallocate the old one.
This technique is a bit ugly and can cause many allocations/deallocations if your predictions aren't good, but I've used it before. If you need dynamic array expansion without using someone else's data structures, this is about as good as you can get.
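A minimal sketch of the doubling technique described above, for a single column. The struct and function names are hypothetical, and the starting capacity of 4 is an arbitrary guess of the kind the answer mentions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// One growable column of floats managed by hand.
struct FloatColumn {
    float* data = nullptr;
    std::size_t size = 0;      // number of floats stored
    std::size_t capacity = 0;  // number of floats the current array can hold
};

// Appends a value, doubling the array when it is full.
inline void append(FloatColumn& col, float value) {
    if (col.size == col.capacity) {
        const std::size_t newCapacity = col.capacity == 0 ? 4 : col.capacity * 2;
        float* bigger = new float[newCapacity];
        std::copy(col.data, col.data + col.size, bigger);  // carry over old contents
        delete[] col.data;                                 // deallocate the old array
        col.data = bigger;
        col.capacity = newCapacity;
    }
    col.data[col.size++] = value;
}

// Frees the column's storage and resets it to the empty state.
inline void release(FloatColumn& col) {
    delete[] col.data;
    col = FloatColumn{};
}
```

For the matrix in the question, fmat would then be an array of N such columns. This is essentially what std::vector does internally, which is why the vector-based answers are usually the less error-prone choice.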
To elaborate the std::vector approach, this is how it would look like:
// initialize
N = getFeatureVectorSize();
vector<vector<float>> fmat(N);
Now the loop looks the same, you access the rows by saying fmat[i], however there is no pointer to a float. You simply call fmat[i].resize(row_len) to set the size and then assign to it using fmat[i][z] = 1.23.
In your solution I suggest you make getObjectFeatureVector return a vector<float>, so you can just say fmat[i] = getObjectFeatureVector();. Thanks to C++11 move semantics this will be just as fast as assigning the pointers. Also this solution will solve the problem of getObjectFeatureVector not knowing the size of the array.
Edit: As I understand you don't know the number of columns. No problem:
deque<vector<float>> fmat;
Given this function:
std::vector<float> getObjectFeatureVector();
This is how you add another column:
fmat.push_back(getObjectFeatureVector());
The number of columns is fmat.size() and the number of rows in a column is fmat[i].size().
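Put together, the approach could look like the sketch below. makeFeatureVector is a hypothetical stand-in for the asker's getObjectFeatureVector(), and the `selected` flags stand in for the Object[i] == TARGET_OBJECT test; neither is part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for getObjectFeatureVector(): returns n values
// starting at `seed`.
inline std::vector<float> makeFeatureVector(std::size_t n, float seed) {
    std::vector<float> fv(n);
    for (std::size_t r = 0; r < n; ++r) fv[r] = seed + static_cast<float>(r);
    return fv;
}

// Adds one column per selected object; the number of columns is not known
// in advance and simply grows with each push_back().
inline std::deque<std::vector<float>> collectColumns(const std::vector<bool>& selected,
                                                     std::size_t n) {
    std::deque<std::vector<float>> fmat;
    for (std::size_t i = 0; i < selected.size(); ++i) {
        if (selected[i])
            fmat.push_back(makeFeatureVector(n, static_cast<float>(i)));  // moved, not copied
    }
    return fmat;
}
```

Because the argument to push_back() is a temporary, the column's buffer is moved into the deque, so no element-wise copy of the feature vector takes place.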

pointer arithmetic on vectors in c++

i have a std::vector, namely
vector<vector<vector<double> > > mdata;
i want pass data from my mdata vector to the GSL function
gsl_spline_init(gsl_spline * spline, const double xa[], const double ya[], size_t size);
as ya. i already figured out that i can do things like
gsl_spline_init(spline, &(mgrid.front()), &(mdata[i][j].front()), mgrid.size());
this is fine if i want to pass the data from mdata for fixed i,j to gsl_spline_init().
however, now i would need to pass along the first dimension of mdata, so for fixed j,k.
i know that for any two fixed indices, all vectors along the remaining dimensions have the same length, so my vector is a 'regular cube'. so the offset between all the values i need should be the same.
of course i could create a temporary vector
int j = 123;
int k = 321;
vector<double> tmp;
for (int i = 0; i < mdata.size(); i++)
tmp.push_back(mdata[i][j][k]);
gsl_spline_init(spline, &(mgrid.front()), &(tmp.front()), mgrid.size());
but this seems too complicated. perhaps there is a way to achieve my goal with pointer arithmetic?
any help is greatly appreciated :)
You really can't do that without redesigning the array consumer function gsl_spline_init(): it relies on the data passed being a contiguous block. This is not the case with your three-level vector. Not only is it a cube, but each level also has a separate buffer allocated on the heap.
This can't be done. Not only with vectors: even with plain arrays, only a run along the last dimension is a contiguous block of data. If gsl_spline_init took an iterator instead of an array, you could try to craft some functor to pick out the appropriate data, but I'm not sure it's worth trying. No pointer arithmetic can help you.
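To make the stride problem concrete, here is a sketch that stores the cube in one flat vector and gathers the values along the first dimension into a contiguous buffer, much like the temporary vector in the question. The Cube struct and gatherAlongFirst are hypothetical names, and no GSL call is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A regular ni x nj x nk cube in one flat block.
// Element (i, j, k) is stored at index (i * nj + j) * nk + k.
struct Cube {
    std::size_t ni, nj, nk;
    std::vector<double> values;

    Cube(std::size_t ni_, std::size_t nj_, std::size_t nk_)
        : ni(ni_), nj(nj_), nk(nk_), values(ni_ * nj_ * nk_) {}

    double& at(std::size_t i, std::size_t j, std::size_t k) {
        assert(i < ni && j < nj && k < nk);
        return values[(i * nj + j) * nk + k];
    }
};

// Copies the values for fixed (j, k) and all i into a contiguous vector.
// Consecutive i values are nj * nk elements apart in the cube, so they
// cannot be handed over as a plain array without this copy.
inline std::vector<double> gatherAlongFirst(const Cube& cube,
                                            std::size_t j, std::size_t k) {
    assert(j < cube.nj && k < cube.nk);
    std::vector<double> out;
    out.reserve(cube.ni);
    for (std::size_t i = 0; i < cube.ni; ++i)
        out.push_back(cube.values[(i * cube.nj + j) * cube.nk + k]);
    return out;
}
```

The returned vector's data() pointer refers to a contiguous run of doubles, which is the shape a function expecting const double[] needs. Only slices along the last index (fixed i and j) can be passed without copying, as &cube.at(i, j, 0).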