Create an Array of vectors in C++

I want to create a distance matrix for a big dataset, and only want to store the elements that are 'close' enough. The code reads like this:
vector<double> * D;
D = (vector<double> *) malloc(dim *sizeof(vector<double>) ) ;
for(i=0;i<dim;i++){
    for(j=i+1;j<dim;j++){
        dx = s[j][0] - s[i][0];
        dy = s[j][1] - s[i][1];
        d = sqrt( dx*dx + dy*dy );
        if(d < MAX_DISTANCE){
            D[i].push_back(d);
            D[j].push_back(d);
        }
    }
}
which gives me a segmentation fault. I guess I have not defined the array of vectors correctly. How do I get around this?

In C++ you should never allocate objects (or arrays of objects) using malloc. While malloc is good at allocating memory, that's all it does: it doesn't call constructors, which means all your vector objects are uninitialized. Using them (for example, calling push_back on them) leads to undefined behavior, which here shows up as the segmentation fault.
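If you want to allocate an array of objects dynamically, you must use new[] (and release it with delete[]). Or, better yet, use std::vector: yes, you can have a vector of vectors, and std::vector<std::vector<double>> is fine (before C++11 you need a space between the closing brackets, as in > >).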
Using the right vector constructor you can initialize a vector of a specific size:
// Create the outer vector containing `dim` elements
std::vector<std::vector<double>> D(dim);
After that, the existing loops work as before.
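For reference, here is a minimal self-contained sketch of the question's loop rewritten around std::vector. It assumes the points are stored as std::vector<std::array<double, 2>> (so the original s[i][0] / s[i][1] indexing still works) and wraps the loop in a hypothetical function closeDistances, with MAX_DISTANCE turned into the maxDistance parameter.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// For each point i, collects the distances to every other point closer than maxDistance.
std::vector<std::vector<double>> closeDistances(
        const std::vector<std::array<double, 2>>& s, double maxDistance) {
    const std::size_t dim = s.size();
    // dim properly constructed (empty) inner vectors, unlike the malloc version.
    std::vector<std::vector<double>> D(dim);
    for (std::size_t i = 0; i < dim; ++i) {
        for (std::size_t j = i + 1; j < dim; ++j) {
            const double dx = s[j][0] - s[i][0];
            const double dy = s[j][1] - s[i][1];
            const double d = std::sqrt(dx * dx + dy * dy);
            if (d < maxDistance) {
                // Distance is symmetric, so record it for both points.
                D[i].push_back(d);
                D[j].push_back(d);
            }
        }
    }
    return D;
}
```

Like the original, this stores only the distances, not which neighbour each distance belongs to.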

Related

How to init a double**?

I need to init/use a double ** (declared in my header):
double **pSamples;
allocating at run time an NxM matrix, where N and M are obtained from two functions:
const unsigned int N = myObect.GetN();
const unsigned int M = myObect.GetM();
From what I learned about the heap and dynamic allocation, I need the keyword new, or I can use an STL vector, which automatically manages allocation and deallocation on the heap.
So I tried with this code:
vector<double> samplesContainer(M);
*pSamples[N] = { samplesContainer.data() };
but the compiler still says it needs a constant value. How would you allocate and manage this matrix at run time?
The old-fashioned way of initializing a pointer to a pointer is, correctly enough, with the new operator: you first allocate the outer array of pointers to double (double*), then iterate through it, allocating an array of doubles for each of those pointers.
double** pSamples = new double*[N];
for (unsigned int i = 0; i < N; ++i) {
    pSamples[i] = new double[M];
}
The first new allocates an array of N pointers to double; each of those pointers is then set to the array of M doubles allocated by the second new.
That is the old way of doing it; remember to release the memory at some point using the delete [] operator, first on every row and then on the outer array. However, C++ provides much better management of sequential memory, such as vector, which you can use either as a vector of vectors or as a single vector holding the entire buffer.
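A minimal sketch of the old-fashioned approach together with its matching cleanup; allocMatrix and freeMatrix are hypothetical names. Note that if one of the inner new calls throws, the rows already allocated leak, which is one of the reasons the vector-based approaches are preferable.

```cpp
#include <cassert>
#include <cstddef>

// Allocates an N x M matrix as an array of N row pointers, each to M doubles.
double** allocMatrix(std::size_t N, std::size_t M) {
    double** p = new double*[N];      // outer array of row pointers
    for (std::size_t i = 0; i < N; ++i) {
        p[i] = new double[M]();       // one row; () value-initializes to 0.0
    }
    return p;
}

// Releases the memory in reverse order: every row first, then the outer array.
void freeMatrix(double** p, std::size_t N) {
    for (std::size_t i = 0; i < N; ++i) {
        delete[] p[i];
    }
    delete[] p;
}
```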
If you go the vector-of-vectors way, you have a declaration like this (N rows, each holding M doubles):
vector<vector<double>> samples(N, vector<double>(M));
You can then access the elements with the .at member function, as in samples.at(2).at(0), which checks the indices and throws std::out_of_range if they are invalid, or with the subscript operator, as in samples[2][0], which does no checking.
Alternatively, you could create a single vector with enough storage to hold the whole multidimensional array, simply by sizing it to N * M elements and addressing element (i, j) as i * M + j. However, this layout is harder to resize, and honestly you could have done the same with new: new double[N * M], although that gives you a double* and not a double**.
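The single-vector layout can be wrapped in a small class so that the i * M + j arithmetic lives in one place. This FlatMatrix class is a hypothetical sketch, not a standard type:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A minimal row-major rows x cols matrix stored in one contiguous std::vector.
class FlatMatrix {
public:
    FlatMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}  // elements start at 0.0

    // Element (i, j) lives at offset i * cols + j.
    double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    double* data() { return data_.data(); }  // contiguous buffer, e.g. for C APIs

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};
```

The buffer is freed automatically when the FlatMatrix goes out of scope.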
Use RAII for resource management:
std::vector<std::vector<double>> samplesContainer(N, std::vector<double>(M));
Then, for compatibility with code that expects a double**:
std::vector<double*> ptrs(N);
for (std::size_t i = 0; i != N; ++i) {
    ptrs[i] = samplesContainer[i].data();
}
and pass ptrs.data() wherever a double** is expected.
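Putting the RAII approach together, here is a sketch in which sumAll is a hypothetical stand-in for whatever legacy function needs a double**, and the rows are filled with 1.0 only so the result is easy to check. The row pointers in ptrs stay valid only as long as samplesContainer is alive and its rows are not resized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical legacy function that only accepts a double**.
double sumAll(double** rows, std::size_t n, std::size_t m) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < m; ++j)
            total += rows[i][j];
    return total;
}

double sumSamples(std::size_t N, std::size_t M) {
    // The vectors own the memory (RAII); nothing has to be deleted by hand.
    std::vector<std::vector<double>> samplesContainer(N, std::vector<double>(M, 1.0));

    // Non-owning row pointers into samplesContainer.
    std::vector<double*> ptrs(N);
    for (std::size_t i = 0; i != N; ++i) {
        ptrs[i] = samplesContainer[i].data();
    }
    return sumAll(ptrs.data(), N, M);
}
```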
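samplesContainer.data() returns double*, but the expression *pSamples[N] is of type double, not double*. I think you wanted pSamples[N]:
pSamples[N] = samplesContainer.data();
Keep in mind that pSamples must first point to allocated storage for the row pointers, and that in an array of N pointers the valid indices are 0 to N - 1, so you would assign pSamples[i] for each row i.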

2D arrays with contiguous rows on the heap memory for cudaMemcpy2D()

CUDA documentation recommends using cudaMemcpy2D() for 2D arrays (and similarly cudaMemcpy3D() for 3D arrays) instead of cudaMemcpy() for better performance, since it works with pitched device memory (allocated with cudaMallocPitch()), whose rows are padded for aligned access. On the other hand, all the cudaMemcpy functions, just like memcpy(), require contiguously allocated memory.
This is all fine if I create my (host) array as, for example, float myArray[h][w]; with compile-time constants h and w. However, it most likely will not work if I use something like:
float** myArray2 = new float*[h];
for( int i = 0 ; i < h ; i++ ){
myArray2[i] = new float[w];
}
This is not a big problem except when one is trying to add CUDA to an existing project, which is the problem I am facing. Right now, I create a temporary 1D array, copy the contents of my 2D array into it, use cudaMemcpy(), and repeat the whole process in reverse to get the results after the kernel launch, but this does not seem like an elegant or efficient way.
Is there a better way to handle this situation? Specifically, is there a way to create a genuine 2D array on the heap with contiguously allocated rows, so that I can use cudaMemcpy2D()?
P.S.: I couldn't find the answer to this question in the following similar posts:
Allocate 2D array with cudaMallocPitch and copying with cudaMemcpy2D
Assigning memory for contiguous 2D array
Dynamic 2d Array non contiguous memory c++ (The second answer in this one is rather puzzling.)
Allocate the big array, then use pointer arithmetic to find the actual beginnings of the rows.
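float* bigArray = new float[h * w];   // one contiguous block for all rows
float** myArray2 = new float*[h];     // row pointers into bigArray
for( int i = 0 ; i < h ; i++ ){
    myArray2[i] = &bigArray[i * w];
}
// later: delete[] myArray2; delete[] bigArray;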
Your myArray2 array of pointers gives you C/C++-style two-dimensional array behavior (myArray2[i][j]), while bigArray gives you the contiguous block of memory needed by CUDA.
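A sketch of the same idea packaged as a pair of hypothetical helper functions, including the cleanup. It assumes h and w are both greater than zero. Because all rows live in one block with a stride of w floats, rows[0] is the pointer you would pass as the source to cudaMemcpy2D(), with a source pitch of w * sizeof(float) bytes; the CUDA call itself is not shown here.

```cpp
#include <cassert>
#include <cstddef>

// Returns h row pointers into one contiguous, zero-initialized h * w block.
float** allocContiguous2D(std::size_t h, std::size_t w) {
    float* bigArray = new float[h * w]();  // one contiguous block
    float** rows = new float*[h];          // row pointers into that block
    for (std::size_t i = 0; i < h; ++i) {
        rows[i] = &bigArray[i * w];
    }
    return rows;
}

// rows[0] is the start of the block (assumes h > 0), so two delete[] calls suffice.
void freeContiguous2D(float** rows) {
    delete[] rows[0];
    delete[] rows;
}
```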

Error: Deallocating a 2D array

I am developing a program in which one of the tasks is to read points (x, y and z) from a text file and store them in an array. The text file may contain 10^2 or even 10^6 points, depending on which file the user selects, so I am defining a dynamic array.
To allocate a dynamic 2D array, I wrote the following, and it works fine:
const int array_size = 100000;
float** array = new float* [array_size];
for(int i = 0; i < array_size; ++i){
array[i] = new float[2]; // 0,1,2 being the columns for x,y,z co-ordinates
}
After the points are saved in the array, I wrote the following to deallocate the allocated memory:
for (int i = 0; i < array_size; i++){
delete [] array[i];
}
delete [] array;
and then my program stops working and shows "Project.exe stopped working".
If I don't deallocate, the program works just fine.
In your comment you say 0,1,2 being the columns for x,y,z co-ordinates; if that's the case, you need to allocate float[3]. When you allocate an array of float[N], you are allocating a chunk of memory of size N * sizeof(float), and you index its elements from 0 to N - 1. Therefore, if you need indices 0, 1 and 2, you need to allocate 3 * sizeof(float) bytes, which means float[3]. Writing to index 2 of a float[2] writes past the end of the allocation, which is undefined behavior; it typically corrupts the heap's bookkeeping, which is why the crash only shows up later, when delete [] runs.
Other than that, I can compile and run the code without errors. If you fix it and still get an error, it might be a problem with your compiler; in that case, try decreasing 100000 to a small number and try again.
You say that you are trying to implement a dynamic array; this is what std::vector does, and I would highly recommend that you use it. This way you are using something from the standard library that's extremely well tested, and you won't run into issues by essentially trying to roll your own version of std::vector. Additionally, this approach manages memory better because it uses RAII, which leverages the language to solve a lot of memory management issues. This has other benefits too, like making your code more exception safe.
Also, if you are storing x,y,z coordinates, consider using a struct or a tuple; I think that enhances readability a lot. You can typedef the coordinate type too: something like std::vector<coord_t> is more readable to me.
(Thanks a lot for the suggestions!)
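A sketch of that suggestion: a plain struct for one point, a coord_t alias for it, and a std::vector that grows as points are read. The names Point and addPoint are hypothetical, and reading the file itself is omitted.

```cpp
#include <cassert>
#include <vector>

// One point read from the file.
struct Point {
    float x;
    float y;
    float z;
};

// Alias for the coordinate type.
typedef Point coord_t;

// Appends one point; the vector grows as needed, so the number of points
// (10^2 or 10^6) does not have to be known in advance.
void addPoint(std::vector<coord_t>& points, float x, float y, float z) {
    points.push_back(coord_t{x, y, z});
}
```

Accessing a coordinate then reads as points[i].z rather than array[i][2], and there is no delete [] to forget.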
In the end I am using vectors for this problem, for the following reasons:
1. Unlike raw arrays (not array objects, of course), I don't need to manually deallocate the allocated memory.
2. There are numerous built-in member functions in the vector class.
3. The vector's size can be extended at later stages.
Below is how I used a 2D vector to store points (x,y,z co-ordinates).
Initialized (allocated memory for) a 2D vector:
vector<vector<float>> array(1000, vector<float>(3));
where 1000 is the number of rows and 3 is the number of columns.
Once declared, values can be assigned simply as:
array[i][j] = some value;
Also, at a later stage I declared functions taking vector arguments and returning vectors:
vector <vector <float>> function_name ( vector <vector <float>>);
vector <vector <float>> function_name ( vector <vector <float>> input_vector_name)
{
    vector <vector <float>> output_vector_name_created_inside_function = input_vector_name;
    // ... compute the output from input_vector_name ...
    return output_vector_name_created_inside_function;
}
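Note: passing input_vector_name by value copies the whole input; pass it as const vector<vector<float>>& to avoid that copy. Returning the local result by value is fine: since C++11 it is moved rather than copied, and the compiler may elide the move entirely. Returning a reference or pointer to a local vector does not work, because the local is destroyed when the function returns, which explains why my version that returned by reference failed.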
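A sketch of the signature described in the note: the input is taken by const reference and the result is returned by value. The function name scaled and the Matrix alias are hypothetical.

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Takes the input by const reference (no copy) and returns the result by value.
// Since C++11 the local result is moved out (or the move is elided), not deep-copied.
Matrix scaled(const Matrix& input, float factor) {
    Matrix output = input;  // the one intended copy
    for (std::vector<float>& row : output)
        for (float& v : row)
            v *= factor;
    return output;          // fine: returned by value
}

// Do NOT do this: the local is destroyed on return, leaving a dangling reference.
// const Matrix& broken() { Matrix local(3, std::vector<float>(3)); return local; }
```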
For multidimensional arrays I recommend using boost::multi_array.
Example:
#include <boost/multi_array.hpp>
typedef boost::multi_array<double, 3> array_type;
array_type A(boost::extents[3][4][2]);
A[0][0][0] = 3.14;

Incremental dynamic allocation of memory in C/C++

I have a for-loop that needs to incrementally add columns to a matrix. The number of rows is known before entering the for-loop, but the number of columns varies depending on some condition. The following code illustrates the situation:
N = getFeatureVectorSize();
float **fmat; // N rows, dynamic number of cols
for(size_t i = 0; i < getNoObjects(); i++)
{
if(Object[i] == TARGET_OBJECT)
{
float *fv = new float[N];
getObjectFeatureVector(fv);
// How to add fv to fmat?
}
}
Edit 1: This is how I temporarily solved my problem:
N = getFeatureVectorSize();
float *fv = new float[N];
float *fmat = NULL;
int col_counter = 0;
for(size_t i = 0; i < getNoObjects(); i++)
{
if(Object[i] == TARGET_OBJECT)
{
getObjectFeatureVector(fv);
fmat = (float *) realloc(fmat, (col_counter+1)*N*sizeof(float));
for(int r=0; r<N; r++) fmat[col_counter*N+r] = fv[r];
col_counter++;
}
}
delete [] fv;
free(fmat);
However, I'm still looking for a way to incrementally allocate memory for a two-dimensional array in C/C++.
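To answer your original question:
// How to add fv to fmat?
When you use float **fmat you are declaring a pointer to [an array of] pointers. Therefore you have to allocate (and free!) that array before you can use it. Think of it as the holder for one pointer per stored feature vector; at most getNoObjects() of them can be added, so that is a safe size:
float **fmat = new float*[getNoObjects()];
size_t count = 0; // number of feature vectors stored so far
Then in your loop you simply do
fmat[count++] = fv;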
However, I suggest you look at the std::vector approach, since it won't be significantly slower and will spare you all those new and delete calls.
Better: use Boost.MultiArray (boost::multi_array), as in the top answer here:
How do I best handle dynamic multi-dimensional arrays in C/C++?
Trying to dynamically allocate your own matrix type is pain you do not need.
Alternatively, as a low-tech, quick-and-dirty solution, use a vector of vectors, like this:
C++ vector of vectors
If you want to do this without fancy data structures, you should declare fmat as an array of N pointers. For each column, you'll probably have to guess a reasonable size to start with. Dynamically allocate an array of floats of that size, and set the appropriate element of fmat to point at that array. If you run out of space (that is, there are more floats to be added to that column), allocate a new array of twice the previous size, copy the existing values into it, change the appropriate element of fmat to point to the new array, and deallocate the old one.
This technique is a bit ugly and can cause many allocations/deallocations if your predictions aren't good, but I've used it before. If you need dynamic array expansion without using someone else's data structures, this is about as good as you can get.
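A sketch of that doubling strategy for a single growable column. The Column struct and the initColumn, appendToColumn and freeColumn helpers are hypothetical names; fmat would then be an array of Column values rather than bare float pointers, so that each column remembers its own size and capacity. std::vector uses a similar geometric growth strategy internally.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical growable column of floats.
struct Column {
    float* data = nullptr;
    std::size_t size = 0;      // floats currently stored
    std::size_t capacity = 0;  // floats allocated
};

void initColumn(Column& c, std::size_t initialGuess) {
    c.capacity = initialGuess > 0 ? initialGuess : 1;
    c.data = new float[c.capacity];
    c.size = 0;
}

void appendToColumn(Column& c, float value) {
    if (c.size == c.capacity) {
        // Out of space: allocate twice the previous size, copy, free the old array.
        const std::size_t newCapacity = c.capacity * 2;
        float* bigger = new float[newCapacity];
        for (std::size_t k = 0; k < c.size; ++k) bigger[k] = c.data[k];
        delete[] c.data;
        c.data = bigger;
        c.capacity = newCapacity;
    }
    c.data[c.size++] = value;
}

void freeColumn(Column& c) {
    delete[] c.data;
    c.data = nullptr;
    c.size = c.capacity = 0;
}
```

Starting from a guess of 2, appending five values triggers two reallocations (capacity 2, then 4, then 8).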
To elaborate on the std::vector approach, this is how it would look:
// initialize
N = getFeatureVectorSize();
vector<vector<float>> fmat(N);
Now the loop looks the same: you access the rows with fmat[i], but there is no pointer to a float. You simply call fmat[i].resize(row_len) to set the size and then assign to it using fmat[i][z] = 1.23.
In your solution, I suggest you make getObjectFeatureVector return a vector<float>, so you can just write fmat[i] = getObjectFeatureVector();. Thanks to C++11 move semantics, this assignment transfers the vector's internal buffer instead of copying the elements, so it is about as cheap as assigning a pointer. This also solves the problem of getObjectFeatureVector not knowing the size of the array.
Edit: As I understand it, you don't know the number of columns. No problem:
deque<vector<float>> fmat; // no (): "fmat()" would declare a function, not a deque
Given this function:
std::vector<float> getObjectFeatureVector();
This is how you add another column:
fmat.push_back(getObjectFeatureVector());
The number of columns is fmat.size() and the number of rows in a column is fmat[i].size().
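A sketch of the deque-based version. Here getObjectFeatureVector is a hypothetical stand-in that takes the vector length and a fill value (the real one presumably reads the object's features), and the Object[i] == TARGET_OBJECT test is replaced by a std::vector<bool> of flags.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in: a feature vector of length n filled with 'value'.
std::vector<float> getObjectFeatureVector(std::size_t n, float value) {
    return std::vector<float>(n, value);
}

// Collects one column per target object; the number of columns need not be known.
std::deque<std::vector<float>> collectColumns(const std::vector<bool>& isTarget,
                                              std::size_t N) {
    std::deque<std::vector<float>> fmat;  // no "()", which would declare a function
    for (std::size_t i = 0; i < isTarget.size(); ++i) {
        if (isTarget[i]) {
            fmat.push_back(getObjectFeatureVector(N, static_cast<float>(i)));
        }
    }
    return fmat;
}
```

No explicit deallocation is needed: the deque and its vectors free everything when fmat goes out of scope.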

2D Vectors/Dynamic Arrays

I'm trying to work with 2D arrays in order to keep track of some objects that are laid out in a grid. I would like each element of the 2D array to contain an Object*, Object being a class I have defined. However, working with these things isn't exactly easy.
This is my method for filling the 2D array with Object pointers:
int xDim; // how far to go in the x direction (x's dimension)
int zDim; // how far to go in the z direction (z's dimension)
// both are assumed to be set before the allocation below
Object *** test; // the highest-level pointer used
test = new Object ** [xDim];
// add the horizontal array of Object **
for(int fillPos=0; fillPos < xDim; fillPos++){
    // point each Object ** to a new Object * array
    // (add the column arrays)
    test[fillPos] = new Object*[zDim];
}
I then intend to make each of these Object pointers point to an instance of a child class of Object, say childObj, like this:
for (int xPos=0; xPos < xDim; xPos++){
for(int zPos=0; zPos < zDim; zPos++){
//pointing each Object * in the 2D array to
//a new childObj
test[xPos] [zPos] = new childObj;
}
}
I realize this could potentially be a real hassle in terms of memory, so I'm asking whether this is a good way to handle such a situation. Would something like vector<vector<Object*> > work better? Would vectors manage the deletion nicely, so as to avoid memory leaks? Or would I simply have to loop through the vector and call delete on each Object* before getting rid of the vectors themselves?
So, should I use arrays as I have or vectors? What could be some problems associated with each method?
Using Object *** requires that you go through and delete each Object pointer, then each array of Object pointers, and finally the outermost array of Object**, in that order. In my opinion this leaves a lot of room for carelessness and mistakes.
for (int xPos=0; xPos < xDim; xPos++) {
    for (int zPos=0; zPos < zDim; zPos++) {
        delete test[xPos][zPos]; // delete the object ptr
    }
    delete[] test[xPos]; // delete each array of object ptrs
}
delete[] test; // delete the array of arrays of object ptrs
I would much prefer the vector approach, because the vectors are locally scoped: they release their own storage automatically when they go out of scope. Dynamic allocation can be rather expensive and should be avoided where possible.
So with the vector approach, you only need to delete the Object pointers. (A good rule of thumb is that every call to new requires a corresponding call to delete.)
vector<vector<Object*>> matrix;
... // some code here
for (vector<Object*>& vec : matrix)
    for (Object* oPtr : vec)
        delete oPtr;
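A sketch of the vector-of-pointers approach with the question's class names. The Object and childObj definitions are minimal stand-ins, and the alive counter exists only so the cleanup can be checked. Note that Object needs a virtual destructor: deleting a childObj through an Object* is undefined behavior otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-ins for the question's classes.
struct Object {
    static int alive;               // number of live objects (for checking only)
    Object() { ++alive; }
    virtual ~Object() { --alive; }  // virtual: delete through Object* reaches ~childObj
};
int Object::alive = 0;

struct childObj : Object {};

std::vector<std::vector<Object*>> makeGrid(std::size_t xDim, std::size_t zDim) {
    std::vector<std::vector<Object*>> matrix(xDim, std::vector<Object*>(zDim));
    for (std::size_t xPos = 0; xPos < xDim; ++xPos)
        for (std::size_t zPos = 0; zPos < zDim; ++zPos)
            matrix[xPos][zPos] = new childObj;
    return matrix;
}

// The vectors free their own storage, but not the objects the pointers point to.
void destroyGrid(std::vector<std::vector<Object*>>& matrix) {
    for (std::vector<Object*>& vec : matrix)
        for (Object* oPtr : vec)
            delete oPtr;
    matrix.clear();
}
```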
If you knew the size of your 2-D array at compile-time, you could achieve the same effect of avoiding memory management for the 2-D array, and simply manage the Object pointers.
Object * matrix[xDim][zDim]; // xDim and zDim must be compile-time constants
But I still prefer vectors, because unlike arrays they can resize themselves dynamically, so you don't have to know the size up front.