My code uses a 4D matrix for some math problem solving:
int**** Sads = new int***[inputImage->HeightLines];
for (size_t i = 0; i < inputImage->HeightLines; i++)
{
Sads[i] = new int**[inputImage->WidthColumns];
for (size_t j = 0; j < inputImage->WidthColumns; j++)
{
Sads[i][j] = new int*[W_SIZE];
for (size_t k = 0; k < W_SIZE; k++)
{
Sads[i][j][k] = new int[W_SIZE];
}
}
}
//do something with Sads...
for (int i = 0; i < inputImage->HeightLines; i++)
{
int*** tempI = Sads[i];
for (int j = 0; j < inputImage->WidthColumns; j++)
{
int** tempJ = tempI[j];
for (int k = 0; k < W_SIZE; k++)
{
delete[] tempJ[k];
}
delete[] Sads[i][j];
}
delete[] Sads[i];
}
delete[] Sads;
The sizes are very large (WidthColumns = 2018, HeightLines = 1332, W_SIZE = 7). The allocation is very fast, but the deallocation (delete) is very slow.
Is there a way to optimize it?
I tried OpenMP, but it throws unrelated errors about missing DLLs (which are there). If I remove the #pragma omp parallel for, everything works fine, but slowly.
Using a pointer to a pointer to... is a bad idea: with the sizes in the question it means roughly 21.5 million separate allocations (2018 × 1332 × 8 + 1332 + 1), each needing its own delete[], and it fragments your data a lot.
I would create a class to manage the index transformation and use a 1D array; it's a bit more complicated, but it will be faster.
Anyway, a trick: nothing prevents you from building your int**** out of pointers into a single contiguous block of memory (a 1D array you preallocated) and then using it as a 4D array.
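A minimal sketch of that trick, assuming the question's layout Sads[lines][columns][W_SIZE][W_SIZE]; make4D and free4D are hypothetical names. The data lives in one block, three pointer tables are built once over it, and freeing takes four delete[] calls regardless of the dimensions:

```cpp
#include <cassert>
#include <cstddef>

// Builds an int**** whose index tables all point into one contiguous data block.
int**** make4D(std::size_t lines, std::size_t columns, std::size_t w)
{
    assert(lines > 0 && columns > 0 && w > 0);
    int* data = new int[lines * columns * w * w]();  // every value, one block, zeroed
    int** rows = new int*[lines * columns * w];      // one pointer per innermost row
    int*** cols = new int**[lines * columns];        // pointers into rows
    int**** top = new int***[lines];                 // pointers into cols

    for (std::size_t r = 0; r < lines * columns * w; ++r)
        rows[r] = data + r * w;
    for (std::size_t c = 0; c < lines * columns; ++c)
        cols[c] = rows + c * w;
    for (std::size_t l = 0; l < lines; ++l)
        top[l] = cols + l * columns;
    return top;
}

// Four delete[] calls in total: top[0] is cols, cols[0] is rows, rows[0] is data.
void free4D(int**** p)
{
    delete[] p[0][0][0]; // data block
    delete[] p[0][0];    // rows table
    delete[] p[0];       // cols table
    delete[] p;          // top-level table
}
```

Code that indexes Sads[i][j][k][l] keeps working unchanged; only the allocation and deallocation loops are replaced.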
I'd probably be inclined to use a std::vector. Now memory allocation is taken care of for me (in one allocation/deallocation) and I get free copy/move semantics.
All I have to do is provide the offset calculations:
#include <vector>
#include <cstddef>
struct vector4
{
vector4(std::size_t lines, std::size_t columns)
: lines_(lines), columns_(columns)
, storage_(totalSize())
{}
auto totalSize() const -> std::size_t
{
return lines_ * columns_ * w_size * w_size;
}
int* at(std::size_t a)
{
return storage_.data() + (a * columns_ * w_size * w_size);
}
int* at(std::size_t a, std::size_t b)
{
return at(a) + (b * w_size * w_size);
}
int* at(std::size_t a, std::size_t b, std::size_t c)
{
return at(a, b) + (c * w_size);
}
int& at(std::size_t a, std::size_t b, std::size_t c, std::size_t d)
{
return *(at(a, b, c) + d);
}
private:
std::size_t lines_, columns_;
static constexpr std::size_t w_size = 7; // W_SIZE from the question
std::vector<int> storage_;
};
int main()
{
auto v = vector4(20, 20);
v.at(3, 2, 5, 1) = 6;
// other things
// now let it go out of scope
}
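For small sizes known at compile time, the simplest way to create, use, and delete a 4D array is an automatic array, which is released automatically when the enclosing statement group ends. Note that it lives on the stack, so it is not suitable for sizes as large as those in the question.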
{
const int H = 10;
const int I = 10;
const int J = 10;
const int K = 10;
int h = 0;
int i = 0;
int j = 0;
int k = 0;
int fourDimArray [H][I][J][K];
fourDimArray[h][i][j][k] = 0;
}
If you need to allocate dynamically, use std::vector, or use something like this, perhaps with an inline function that calculates the 1D index from the 4D indices, if you need blazing speed.
int * fourDimArrayAsOneDim = new int[H*I*J*K];
fourDimArrayAsOneDim[indexFromIndices(h, i, j, k)] = 0;
delete [] fourDimArrayAsOneDim;
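The snippet above does not define indexFromIndices. One plausible row-major definition, assuming H, I, J and K are compile-time constants as in the fixed-size example (the body is an assumption, not part of the original answer):

```cpp
#include <cassert>
#include <cstddef>

// Assumed compile-time dimensions, as in the fixed-size example.
constexpr std::size_t H = 10, I = 10, J = 10, K = 10;

// Hypothetical helper: maps (h, i, j, k) to a row-major 1D offset,
// so k varies fastest, then j, then i, then h.
inline std::size_t indexFromIndices(std::size_t h, std::size_t i,
                                    std::size_t j, std::size_t k)
{
    assert(h < H && i < I && j < J && k < K); // debug-only bounds check
    return ((h * I + i) * J + j) * K + k;
}
```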
I am trying to do a large matrix multiplication, e.g. 1000x1000. Unfortunately, it only works for very small matrices. For big ones, the program just starts and nothing happens: no results. Here's the code:
#include <iostream>
using namespace std;
int main() {
int matrix_1_row;
int matrix_1_column;
matrix_1_row = 10;
matrix_1_column = 10;
int** array_1 = new int* [matrix_1_row];
// dynamically allocate memory of size matrix_1_column for each row
for (int i = 0; i < matrix_1_row; i++)
{
array_1[i] = new int[matrix_1_column];
}
// assign values to allocated memory
for (int i = 0; i < matrix_1_row; i++)
{
for (int j = 0; j < matrix_1_column; j++)
{
array_1[i][j] = 3;
}
}
int matrix_2_row;
int matrix_2_column;
matrix_2_row = 10;
matrix_2_column = 10;
// dynamically create array of pointers of size matrix_2_row
int** array_2 = new int* [matrix_2_row];
// dynamically allocate memory of size matrix_2_column for each row
for (int i = 0; i < matrix_2_row; i++)
{
array_2[i] = new int[matrix_2_column];
}
// assign values to allocated memory
for (int i = 0; i < matrix_2_row; i++)
{
for (int j = 0; j < matrix_2_column; j++)
{
array_2[i][j] = 2;
}
}
// Result
int result_row = matrix_1_row;
int result_column = matrix_2_column;
// dynamically create array of pointers of size result_row
int** array_3 = new int* [result_row];
// dynamically allocate memory of size result_column for each row
for (int i = 0; i < result_row; i++)
{
array_3[i] = new int[result_column];
}
// Matrix multiplication
for (int i = 0; i < matrix_1_row; i++)
{
for (int j = 0; j < matrix_2_column; j++)
{
array_3[i][j] = 0;
for (int k = 0; k < matrix_1_column; k++)
{
array_3[i][j] += array_1[i][k] * array_2[k][j];
}
}
}
//RESULTS
for (int i = 0; i < result_row; i++)
{
for (int j = 0; j < result_column; j++)
{
std::cout << array_3[i][j] << "\t";
}
}
// deallocate memory using delete[] operator 1st matrix
for (int i = 0; i < matrix_1_row; i++)
{
delete[] array_1[i];
}
delete[] array_1;
// deallocate memory using delete[] operator 2nd matrix
for (int i = 0; i < matrix_2_row; i++)
{
delete[] array_2[i];
}
delete[] array_2;
// deallocate memory using delete[] operator result
for (int i = 0; i < result_row; i++)
{
delete[] array_3[i];
}
delete[] array_3;
return 0;
}
Does anyone have an idea how to fix it? Where did I go wrong? I used pointers and dynamic memory allocation.
Instead of working directly with raw arrays as the matrix, try something simple and scalable first, then optimize. Something like this:
class matrix
{
private:
// sub-matrices
std::shared_ptr<matrix> c11;
std::shared_ptr<matrix> c12;
std::shared_ptr<matrix> c21;
std::shared_ptr<matrix> c22;
// properties
const int n;
const int depth;
const int maxDepth;
// this should be shared-ptr too. Too lazy.
int data[16]; // lowest level matrix = 4x4 without sub matrix
// multiplication memory
std::shared_ptr<std::vector<matrix>> m;
public:
matrix(const int nP=4,const int depthP=0,const int maxDepthP=1):
n(nP),depth(depthP),maxDepth(maxDepthP)
{
if(depth<maxDepth)
{
// allocate c11,c12,c21,c22
// allocate m1,m2,m3,...m7
}
}
// matrix-matrix multiplication
matrix operator * (const matrix & mat)
{
// allocate result
// multiply
if(depth!=maxDepth)
{
// Strassen's multiplication algorithm
(*m)[0] = (*c11 + *c22) * (*mat.c11 + *mat.c22);
...
(*m)[6] = (*c12 - *c22) * (*mat.c21 + *mat.c22);
*result.c11 = (*m)[0] + (*m)[3] - (*m)[4] + (*m)[6];
..
*result.c22 = ..
}
else
{
// innermost submatrices (4x4) multiplied normally
result.data[0] = data[0]*mat.data[0] + ....
...
result.data[15]= ...
}
return result;
}
// matrix-matrix adder
matrix operator + (const matrix & mat)
{
// allocate result
// add
if(depth!=maxDepth)
{
*result.c11 = *c11 + *mat.c11;
*result.c12 = *c12 + *mat.c12;
*result.c21 = *c21 + *mat.c21;
*result.c22 = *c22 + *mat.c22;
}
else
{
// innermost matrix
result.data[0] = ...
}
return result;
}
};
This way the multiplication has lower time complexity (Strassen's algorithm is about O(n^2.81) instead of O(n^3)) and still looks simple to read. After it works, you can optimize for more speed by storing the matrix data as a single block inside the class, preferably allocated only once at the root matrix, and using std::span (C++20) for access from the submatrices. It is also easy to parallelize, as each matrix can distribute its work to at least 4 threads, those can distribute to 16 threads, then 64, and so on. But of course, too many threads are just as bad as too many allocations, so the thread count has to be tuned.
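A minimal sketch of the single-block storage idea, applied to the question's plain multiplication rather than to Strassen's algorithm. Matrix and multiply are hypothetical names; each matrix is one std::vector, so there is one allocation per matrix and nothing to free by hand:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix stored in one contiguous std::vector.
struct Matrix
{
    std::size_t rows, cols;
    std::vector<long long> data; // single allocation, released automatically

    Matrix(std::size_t r, std::size_t c, long long fill = 0)
        : rows(r), cols(c), data(r * c, fill) {}

    long long& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    long long operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Plain O(n^3) multiply; the i-k-j loop order walks b and result row by row.
Matrix multiply(const Matrix& a, const Matrix& b)
{
    assert(a.cols == b.rows);
    Matrix result(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
        {
            const long long aik = a(i, k);
            for (std::size_t j = 0; j < b.cols; ++j)
                result(i, j) += aik * b(k, j);
        }
    return result;
}
```

The i-k-j order is a common choice because the innermost loop reads b and writes result sequentially, which suits the contiguous layout.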
Assume I have a class A that has, say, 3 methods. The first method assigns some values to the first array, and each of the remaining methods, in order, modifies what the previous method computed. Since I wanted to avoid designing methods that return an array (a pointer to a local variable), I picked 3 data members and stored each intermediate result in one of them. Please note that this simple code is only for illustration.
class A
{
public: // for now how the class members should be accessed isn't important
int * a, *b, *c;
int size; // needed by the loops in func_a, func_b and func_c
A(int sz) : size(sz)
{
a = new int [size];
b = new int [size];
c = new int [size];
}
void func_a()
{
int j = 1;
for (int i = 0; i < size; i++)
a[i] = j++; // assign different values
}
void func_b()
{
int k = 6;
for (int i = 0; i < size; i++)
b[i] = a[i] * (k++);
}
void func_c()
{
int p = 6;
for (int i = 0; i < size; i++)
c[i] = b[i] * (p++);
}
};
Clearly, if I have more methods I have to have more data members.
I'd like to know how I can redesign the class (with methods that return some values) so that it has neither of the two issues (returning pointers, and having many data members to store the intermediate values).
There are two possibilities. If you want each function to return a new array of values, you can write the following:
std::vector<int> func_a(std::vector<int> vec){
int j = 1;
for (auto& e : vec) {
e = j++;
}
return vec;
}
std::vector<int> func_b(std::vector<int> vec){
int j = 6;
for (auto& e : vec) {
e *= j++;
}
return vec;
}
std::vector<int> func_c(std::vector<int> vec){
int j = 6; // same as func_b
for (auto& e : vec) {
e *= j++;
}
return vec;
}
int main() {
std::vector<int> vec(10);
auto a=func_a(vec);
auto b=func_b(a);
auto c=func_c(b);
//or in one line
auto r = func_c(func_b(func_a(std::vector<int>(10))));
}
Or you can apply each function to the same vector:
void apply_func_a(std::vector<int>& vec){
int j = 1;
for (auto& e : vec) {
e = j++;
}
}
void apply_func_b(std::vector<int>& vec){
int j = 6;
for (auto& e : vec) {
e *= j++;
}
}
void apply_func_c(std::vector<int>& vec){
// same as apply_func_b
}
int main() {
std::vector<int> vec(10);
apply_func_a(vec);
apply_func_b(vec);
apply_func_c(vec);
}
I'm not a big fan of the third version (passing the input parameter as the output):
std::vector<int>& func_a(std::vector<int>& vec)
Most importantly, try to avoid C-style arrays: use std::vector or std::array, and instead of raw new, use std::make_unique or std::make_shared.
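As an illustration of that advice, a sketch of the question's class A using std::make_unique<int[]> (C++14) instead of raw new[]; the member names and the c() accessor are assumptions. Because std::unique_ptr cannot be copied, A becomes move-only automatically and needs no hand-written destructor:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

class A
{
public:
    explicit A(std::size_t size)
        : size_(size),
          a_(std::make_unique<int[]>(size)), // value-initialized to 0
          b_(std::make_unique<int[]>(size)),
          c_(std::make_unique<int[]>(size)) {}

    void func_a() { for (std::size_t i = 0; i < size_; ++i) a_[i] = static_cast<int>(i) + 1; }
    void func_b() { for (std::size_t i = 0; i < size_; ++i) b_[i] = a_[i] * (6 + static_cast<int>(i)); }
    void func_c() { for (std::size_t i = 0; i < size_; ++i) c_[i] = b_[i] * (6 + static_cast<int>(i)); }

    // Read-only access to the final result.
    int c(std::size_t i) const
    {
        assert(i < size_);
        return c_[i];
    }

private:
    std::size_t size_;
    std::unique_ptr<int[]> a_, b_, c_; // freed automatically when A is destroyed
};

// The unique_ptr members disable copying without any hidden constructors.
static_assert(!std::is_copy_constructible<A>::value, "A is move-only");
```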
I'm assuming you want to be able to modify a single array with no class-level attributes and without returning any pointers. Your above code can be modified to be a single function, but I've kept it as 3 to more closely match your code.
void func_a(int arr[], int size){
for(int i = 0; i < size; i++)
arr[i] = i+1;
}
void func_b(int arr[], int size){
int k = 6;
for(int i = 0; i < size; i++)
arr[i] *= (k+i);
}
//this function is exactly like func_b so it is really unnecessary
void func_c(int arr[], int size){
int p = 6;
for(int i = 0; i < size; i++)
arr[i] *= (p+i);
}
But if you just want a single function:
void func(int arr[], int size){
int j = 6;
for(int i = 0; i < size; i++)
arr[i] = (i+1) * (j+i) * (j+i);
}
The solutions in the other answers are better. If you are going to allocate memory yourself, then do it like this (and test it!). Also, if you are not using the default constructor and the copy constructor, hide them (declare them private) to prevent calling them by accident.
class A{
private:
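A(const A&);            // declared but never defined: copying is disabled
A& operator=(const A&); // copy assignment disabled too (rule of three)
A();                    // either define these or hide them as private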
public:
int * a, *b, *c;
int size;
A(int sz) {
size = sz;
a = new int[size];
b = new int[size];
c = new int[size];
}
~A()
{
delete[]a;
delete[]b;
delete[]c;
}
//...
};
I'm in the process of trying to learn how to do things in C++, and one of the aspects with which I'm grappling is how to efficiently implement dynamically allocated multidimensional arrays.
For example, say I have an existing function:
void myfunc(int *lambda, int *D, int *tau, int r[*tau][*D])
{
int i, j, k, newj, leftovers;
r[0][0] = *lambda;
j = 0; // j indexes the columns; start with zero
for(i = 1; i < *tau; i++){ // i indexes the rows
leftovers = *lambda;
for(k = 0; k < j; k++){
r[i][k] = r[i - 1][k]; // copy prior to j
leftovers = leftovers - r[i][k];
}
r[i][j] = r[i - 1][j] - 1; // decrement
r[i][j+1] = leftovers - r[i][j]; // initialize to the right of j
if(j == *D - 2){ // second to last column
for(k = 0; k <= j; k++){ if(r[i][k] != 0){ newj = k; } }
j = newj; // can't think of a better way to do this
}else{
j++; // increment j
}
} // next row please
}
From what I've read, it seems a common recommendation is to use std::vector for this purpose. Would anyone care to offer some advice or code snippet on how to implement the r matrix above using the std::vector equivalent?
I would have thought this is a fairly common situation, but interestingly, Google turned up fewer than 50 hits for "C99 into C++".
Thank you!
Ben
I think this would be about the most straightforward conversion:
void myfunc(int *lambda, std::vector<std::vector<int> > &r)
{
int i, j, k, newj, leftovers;
int tau = r.size();
r[0][0] = *lambda;
j = 0; // j indexes the columns; start with zero
for(i = 1; i < tau; i++){ // i indexes the rows
int D = r[i].size();
leftovers = *lambda;
for(k = 0; k < j; k++){
r[i][k] = r[i - 1][k]; // copy prior to j
leftovers = leftovers - r[i][k];
}
r[i][j] = r[i - 1][j] - 1; // decrement
r[i][j+1] = leftovers - r[i][j]; // initialize to the right of j
if(j == D - 2){ // second to last column
for(k = 0; k <= j; k++){ if(r[i][k] != 0){ newj = k; } }
j = newj; // can't think of a better way to do this
}else{
j++; // increment j
}
} // next row please
}
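A usage sketch for the function above; makeR is a hypothetical helper. The caller has to size r before the call, since myfunc reads tau and D from the vector. Here newj is initialized to avoid reading an uninitialized variable, and the function needs at least two columns because it writes r[i][j+1]:

```cpp
#include <cassert>
#include <vector>

// The vector-based myfunc from the answer above; the only changes are that
// newj starts at 0 and the sizes are cast to int explicitly.
void myfunc(int *lambda, std::vector<std::vector<int> > &r)
{
    int i, j, k, newj = 0, leftovers;
    int tau = static_cast<int>(r.size());
    r[0][0] = *lambda;
    j = 0; // j indexes the columns; start with zero
    for (i = 1; i < tau; i++) { // i indexes the rows
        int D = static_cast<int>(r[i].size());
        leftovers = *lambda;
        for (k = 0; k < j; k++) {
            r[i][k] = r[i - 1][k]; // copy prior to j
            leftovers = leftovers - r[i][k];
        }
        r[i][j] = r[i - 1][j] - 1;         // decrement
        r[i][j + 1] = leftovers - r[i][j]; // initialize to the right of j
        if (j == D - 2) { // second to last column
            for (k = 0; k <= j; k++) { if (r[i][k] != 0) { newj = k; } }
            j = newj;
        } else {
            j++;
        }
    }
}

// Hypothetical helper: allocates a tau x D matrix of zeros and fills it with myfunc.
std::vector<std::vector<int> > makeR(int lambda, int tau, int D)
{
    assert(tau >= 1 && D >= 2); // myfunc writes r[i][j + 1], so it needs two columns
    std::vector<std::vector<int> > r(tau, std::vector<int>(D, 0));
    myfunc(&lambda, r);
    return r;
}
```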
You have numerous options.
The quick change:
void myfunc(const int& lambda, const size_t& D, const size_t& tau, int* const* const r) {
...
Using a vector (which will not enforce matching sizes at compilation):
void myfunc(const int& lambda, std::vector<std::vector<int>>& r) {
const size_t tau(r.size()); // no need to pass
const size_t D(r.front().size()); // no need to pass
...
Or using std::array for static sizes:
enum { tau = 5, D = 5 };
void myfunc(const int& lambda, std::array<std::array<int,D>,tau>& r) {
...
Or using template parameters for fixed sizes:
template < size_t tau, size_t D >
void myfunc(const int& lambda, std::array<std::array<int,D>,tau>& r) {
...
or just:
template < size_t tau, size_t D >
void myfunc(const int& lambda, int (&r)[tau][D]) {
...
Note that you can also combine static and dynamic sized arrays as needed in C++.
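For example, a sketch that fixes the row length at compile time while choosing the number of rows at run time; D = 5, Row, makeMatrix and rowSum are assumptions for illustration:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed compile-time row length, matching the enum example above.
constexpr std::size_t D = 5;
using Row = std::array<int, D>;

// The number of rows (tau) is chosen at run time; every row is zero-initialized
// and, by its type, has exactly D columns.
std::vector<Row> makeMatrix(std::size_t tau)
{
    return std::vector<Row>(tau, Row{});
}

// Example consumer: the row length needs no extra parameter because it is part of the type.
int rowSum(const Row& row)
{
    int sum = 0;
    for (int value : row) sum += value;
    return sum;
}
```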
Finally, Boost.MultiArray is designed for exactly this: http://www.boost.org/doc/libs/1_53_0/libs/multi_array/doc/user.html
I would change all r[x][y] to R(x,y) and use
int * r;
#define R(x,y) r[ (x) * (*D) + (y) ]
Here *D is the right stride: r is declared as r[*tau][*D], so each of the *tau rows holds *D elements.
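The same indexing can be done without a macro. A sketch, assuming r is owned by a std::vector in row-major order (Flat2D is a hypothetical name); the assert catches out-of-range indices in debug builds:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// r stored row-major in one vector: tau rows of D elements, so the row stride is D.
class Flat2D
{
public:
    Flat2D(std::size_t tau, std::size_t D) : tau_(tau), D_(D), r_(tau * D, 0) {}

    int& operator()(std::size_t x, std::size_t y)
    {
        assert(x < tau_ && y < D_); // debug-only bounds check
        return r_[x * D_ + y];
    }

    int* data() { return r_.data(); } // raw pointer, e.g. for C-style callers

private:
    std::size_t tau_, D_;
    std::vector<int> r_;
};
```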
I have a two dimensional array that I've allocated dynamically using new.
The problem is I want to allocate the memory as one connected block instead of in separated pieces to increase processing speed.
Does anyone know if it's possible to do this with new, or do I have to use malloc?
Here's my code:
A = new double*[m];
for (int i=0;i<m;i++)
{
A[i]= new double[n];
}
This code causes a segmentation fault
phi = new double**[xlength];
phi[0] = new double*[xlength*ylength];
phi[0][0] = new double[xlength*ylength*tlength];
for (int i=0;i<xlength;i++)
{
for (int j=0;j<ylength;j++)
{
phi[i][j] = phi[0][0] + (ylength*i+j)*tlength;
}
phi[i] = phi[0] + ylength*i;
}
You can allocate one big block and use it appropriately, something like this:
double* A = new double[m*n];
for (int i=0; i<m; i++) {
for (int j=0; j<n; j++) {
A[i*n+j] = <my_value>;
}
}
Instead of new, you can use malloc; there is not much difference, except that memory from new[] must be released with delete[], and memory from malloc() with free().
UPDATE1:
You can create "true" 2d array as follows:
double** A = new double*[m];
double* B = new double[m*n];
for (int i=0; i<m; i++) {
A[i] = B + n*i;
}
for (int i=0; i<m; i++) {
for (int j=0; j<n; j++) {
A[i][j] = <my_value>;
}
}
Just be sure to release both A and B (with delete[]) at the end.
UPDATE2:
By popular request, this is how you can create "true" 3-dimensional array (with dimensions m x n x o):
double*** A = new double**[m];
double** B = new double*[m*n];
double* C = new double[m*n*o];
for (int i=0; i<m; i++) {
for (int j=0; j<n; j++) {
B[n*i+j] = C + (n*i+j)*o;
}
A[i] = B + n*i;
}
for (int i=0; i<m; i++) {
for (int j=0; j<n; j++) {
for (int k=0; k<o; k++) {
A[i][j][k] = <my_value>;
}
}
}
This uses 2 relatively small "index" arrays A and B, and data array C. As usual, all three should be released after use.
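Applied to the phi code from the question, the segmentation fault there comes from the order of the assignments: phi[i] is set only after the inner loop has already written through phi[i][j], so for every i >= 1 that loop dereferences an uninitialized pointer. A sketch with the two steps swapped (alloc3D and free3D are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>

// The question's phi allocation with the pointer set-up reordered.
double*** alloc3D(std::size_t xlength, std::size_t ylength, std::size_t tlength)
{
    double*** phi = new double**[xlength];
    phi[0] = new double*[xlength * ylength];
    phi[0][0] = new double[xlength * ylength * tlength](); // zero-initialized data
    for (std::size_t i = 0; i < xlength; i++)
    {
        phi[i] = phi[0] + ylength * i; // set the row pointer first...
        for (std::size_t j = 0; j < ylength; j++)
        {
            phi[i][j] = phi[0][0] + (ylength * i + j) * tlength; // ...then use it
        }
    }
    return phi;
}

// Three delete[] calls, in reverse order of allocation.
void free3D(double*** phi)
{
    delete[] phi[0][0];
    delete[] phi[0];
    delete[] phi;
}
```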
Extending this for more dimensions is left as an exercise for the reader.
There is nothing you can do with malloc that you can't do with new (though the converse doesn't hold). However if you've already allocated the memory in separate blocks, you will have to allocate new (contiguous) memory in order to get a connected block (with either malloc or new). The code you show allocates m non-contiguous n-sized blocks. To get an array with contiguous memory from this, you would need
int MN = m*n;
B = new double[MN];
for (int i=0; i<MN; ++i)
B[i] = A[ i/n ] [ i%n ]; // n is the row length used when A was allocated
OK, if the task is to keep a single block of memory but keep the [][] way of addressing it, I'd try a few tricks with classes. The first one is an inner proxy:
#include <algorithm>
#include <cstdio>
class CoordProxy
{
private:
int coordX;
int arrayWidth;
int * dataArray;
public:
CoordProxy(int * newArray, int newArrayWidth, int newCoordX)
{
coordX = newCoordX;
arrayWidth = newArrayWidth;
dataArray = newArray;
}
int & operator [](int newCoordY)
{
return (dataArray[newCoordY * arrayWidth + coordX]);
}
};
class CoordsWrapper
{
private:
int * dataArray;
int width;
int height;
public:
CoordsWrapper(int * newArray, int newWidth, int newHeight)
{
dataArray = newArray;
width = newWidth;
height = newHeight;
}
CoordProxy operator[] (int coordX)
{
return CoordProxy(dataArray, width, coordX);
}
};
int main(int argc, char * argv[])
{
int * a = new int[4 * 4];
std::fill(a, a + 4 * 4, 0); // portable alternative to the Windows-only ZeroMemory
CoordsWrapper w(a, 4, 4);
w[0][0] = 10;
w[0][1] = 20;
w[3][3] = 30;
std::for_each(&a[0], &a[4 * 4], [](int x) { printf("%d ", x); });
delete[] a;
}
Note that this is not very time-efficient, but it is extremely memory-efficient: the wrapper and the proxy together add only 4 ints and 2 pointers on top of the raw array.
There's an even nicer and much faster solution, but you would have to give up the [][] notation in favor of a function-call notation such as Data(x, y):
#include <algorithm>
#include <cstdio>
class CoordsWrapper2
{
private:
int * data;
int width;
int height;
public:
CoordsWrapper2(int * newData, int newWidth, int newHeight)
{
data = newData;
width = newWidth;
height = newHeight;
}
inline int & Data(int x, int y)
{
return data[y * width + x];
}
};
int main(int argc, char * argv[])
{
int * a = new int[4 * 4];
std::fill(a, a + 4 * 4, 0); // portable alternative to the Windows-only ZeroMemory
CoordsWrapper2 w(a, 4, 4);
w.Data(0, 0) = 10;
w.Data(0, 1) = 20;
w.Data(3, 3) = 30;
std::for_each(&a[0], &a[4 * 4], [](int x) { printf("%d ", x); });
delete[] a;
}
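Note the inline keyword. It only suggests that the compiler replace the method call with the method body, which can make it a little faster; member functions defined inside the class body are implicitly inline anyway, and compilers make their own inlining decisions. This solution is even more memory-efficient, and either a tiny bit less or equally time-efficient compared with classic indexing.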
How to pass by reference multidimensional array with unknown size in C or C++?
EDIT:
For example, in main function I have:
int main(){
int x, y;
int arr[x][y];
// pass_by_ref(/* passing just arr[][] by reference */);
}
and the function:
void pass_by_ref(/* proper parameter for arr[][] */){
// int size_x_Arr = ???
// int size_y_arr = ???
}
How to implement the commented line?
Simply put, you can't. In C, you can't pass by reference, since C has no references. In C++, you can't pass arrays with unknown size, since C++ doesn't support variable-length arrays.
Alternative solutions: in C99, pass a pointer to the variable-length array; in C++, pass a reference to std::vector<std::vector<T>>.
Demonstration for C99:
#include <stdio.h>
#include <stdlib.h> /* strtol */
void foo(int n, int k, int (*arr)[n][k])
{
int i, j;
for (i = 0; i < n; i++) {
for (j = 0; j < k; j++) {
printf("%3d ", (*arr)[i][j]);
}
printf("\n");
}
}
int main(int argc, char *argv[])
{
int a = strtol(argv[1], NULL, 10);
int b = strtol(argv[2], NULL, 10);
int arr[a][b];
int i, j;
for (i = 0; i < a; i++) {
for (j = 0; j < b; j++) {
arr[i][j] = i * j;
}
}
foo(a, b, &arr);
return 0;
}
Demonstration for C++03:
#include <iostream>
#include <vector>
#include <cstdlib>
#include <ctime>
void foo(std::vector < std::vector < int > > &vec)
{
for (std::vector < std::vector < int > >::iterator i = vec.begin(); i != vec.end(); i++) {
for (std::vector<int>::iterator j = i->begin(); j != i->end(); j++) {
std::cout << *j << " ";
}
std::cout << std::endl;
}
}
int main(int argc, char *argv[])
{
int i = strtol(argv[1], NULL, 10);
int j = strtol(argv[2], NULL, 10);
srand(time(NULL));
std::vector < std::vector < int > > vec;
vec.resize(i);
for (std::vector < std::vector < int > >::iterator it = vec.begin(); it != vec.end(); it++) {
it->resize(j);
for (std::vector<int>::iterator jt = it->begin(); jt != it->end(); jt++) {
*jt = rand() % 10; // rand() is the generator seeded by srand(); random() is POSIX-only
}
}
foo(vec);
return 0;
}
H2CO3's solution will work for C99 or a C2011 compiler that supports VLAs. For C89 or a C2011 compiler that doesn't support VLAs, or (God forbid) a K&R C compiler, you'd have to do something else.
Assuming you're passing a contiguously allocated array, you can pass a pointer to the first element (&a[0][0]) along with the dimension sizes, and then treat it as a 1-D array, mapping indices like so:
void foo( int *a, size_t rows, size_t cols )
{
size_t i, j;
for (i = 0; i < rows; i++)
{
for (j = 0; j < cols; j++)
{
a[i * rows + j] = some_value();
}
}
}
int main( void )
{
int arr[10][20];
foo( &arr[0][0], 10, 20 );
...
return 0;
}
This will work for arrays allocated on the stack:
T a[M][N];
and for dynamically allocated arrays of the form:
T (*ap)[N] = malloc( M * sizeof *ap );
since both will have contiguously allocated rows. This will not work (or at least, not be guaranteed to work) for dynamically allocated arrays of the form:
T **ap = malloc( M * sizeof *ap );
if (ap)
{
size_t i;
for (i = 0; i < M; i++)
{
ap[i] = malloc( N * sizeof *ap[i] );
}
}
since it's not guaranteed that all the rows will be allocated contiguously to each other.
This is a sort of comment on the good answer by John Bode, which says: "This will not work (or at least, not be guaranteed to work) for dynamically allocated arrays of the form T **ap ..."
But this variant will:
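T **ap = malloc( M * sizeof *ap );
if (!ap) return NULL; /* error handling: the row pointers could not be allocated */
ap[0] = malloc( M * N * sizeof *ap[0] ); /* one contiguous block for all rows */
if (!ap[0]) { free(ap); return NULL; } /* error handling: release what was allocated */
size_t i;
for (i = 1; i < M; i++)
{
ap[i] = ap[0] + i * N; /* row i starts i * N elements into the block */
}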
After use :
free(ap[0]);
free(ap);
For T being int, you can call foo exactly as for the array int ap[M][N];
foo( &ap[0][0], M, N);
since you guaranteed that all the rows are allocated contiguously to each other.
This allocation is also a little more efficient.
John Bode's explanation is very good, but there is a small mistake in foo: the index should be i * cols + j instead of i * rows + j.
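A sketch of foo with the corrected index. With i * rows + j, consecutive rows start rows elements apart instead of cols, so they overlap when rows < cols and can run past the end when rows > cols. Here a flat std::vector stands in for the contiguous array, and the hypothetical value 100 * i + j replaces some_value() so the result can be checked:

```cpp
#include <cassert>
#include <cstddef>
#include <vector> // a std::vector<int> can supply the contiguous storage

// Row-major fill of a rows x cols block: the row stride is cols, not rows.
void foo(int* a, std::size_t rows, std::size_t cols)
{
    assert(a != nullptr);
    for (std::size_t i = 0; i < rows; i++)
        for (std::size_t j = 0; j < cols; j++)
            a[i * cols + j] = static_cast<int>(100 * i + j);
}
```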
If you really want references, they exist only in C++.
An example of a pointer-to-pointer "two-dimensional" int array passed by reference:
void function_taking_an_array(int**& multi_dim_array);
But the reference gives no advantage here (unless the function needs to reassign the pointer itself), so simply use:
void function_taking_an_array(int** multi_dim_array);
I would advise you to use a container (such as std::vector) to hold your array.
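When the sizes are compile-time constants, C++ can pass a built-in 2D array by reference and deduce both dimensions with a function template, which covers the commented lines in the question for that case (it does not work for variable-length arrays). A sketch reusing the question's pass_by_ref name:

```cpp
#include <cassert>
#include <cstddef>

// Both dimensions are deduced from the argument's type.
template <std::size_t SizeX, std::size_t SizeY>
void pass_by_ref(int (&arr)[SizeX][SizeY])
{
    const std::size_t size_x_arr = SizeX; // the sizes the question asks for
    const std::size_t size_y_arr = SizeY;
    for (std::size_t i = 0; i < size_x_arr; ++i)
        for (std::size_t j = 0; j < size_y_arr; ++j)
            arr[i][j] = static_cast<int>(i * size_y_arr + j); // modifies the caller's array
}
```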