matrix showing up empty when passed from cpp to CUDA

I've passed a 2D array from a C++ class to a CUDA function; however, once in the CUDA function the data in the matrix is gone. I'm still on the host, not the device, so I don't understand what I've done wrong, as this should be very straightforward.
Here is the C++:
int main()
{
const int row=8;
const int column=8;
int rnum;
srand(time(0));
rnum = (rand() % 100) + 1;
float table[row][column];
for(int r=0; r<row; r++){
for(int c=0; c<column;c++){
table[row][column] = (rand()%100) + 1.f;
cout << table[row][column] << " ";
}
cout << "\n";
}
//CUDA
handleMatrix(&table[0][0], 8);
}
Here is the CUDA code that is just printing out the matrix.
void handleMatrix(float * A, int size)
{
printf("&A[0]=%i\n",&A);
printf("A[0] is %f \n",A[0]);
for(int j=0; j<size; j++){
for(int k=0; k<size;k++){
printf("%f ",A[j +size*k]); // << " ";
}
printf("\n");
}
}
In the C++ file - the print out of the matrix has real numbers, but the CUDA function just prints out 0's for both the matrix and for the address of A[0]. I don't know if this means I'm not passing in the matrix correctly between the 2 or if there is something I should do with the matrix once I get it to the CUDA function.

Ha, it took me a while to find it. Check the indexing in your matrix randomization code. :) You're indexing with the bounds (row, column) instead of the loop counters (r, c), so the float values in the table are never initialized.
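To make the indexing point concrete, here is a minimal sketch of the fill loop written with the loop counters. The names fillTable, ROWS and COLS are made up for this illustration and are not from the original post.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical constants standing in for the question's row and column.
const int ROWS = 8;
const int COLS = 8;

// Fill every element using the loop counters r and c.
// Writing table[ROWS][COLS] instead would touch one out-of-bounds slot
// on every iteration and leave the real elements uninitialized.
void fillTable(float table[ROWS][COLS]) {
    assert(table != nullptr);
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            table[r][c] = (std::rand() % 100) + 1.f; // value in [1, 100]
        }
    }
}
```

After calling fillTable(table), passing &table[0][0] to a function that takes float* and a size should see the same values the host loop wrote.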

float * A is a pointer on the host, not in device space. Use cudaMalloc + cudaMemcpy.
float * A doesn't pass the contents, only the address.

Related

how could I change memory layout of multi dimensional arrays with c++ [duplicate]

I have a matrix (relatively big) that I need to transpose. For example assume that my matrix is
a b c d e f
g h i j k l
m n o p q r
I want the result be as follows:
a g m
b h n
c i o
d j p
e k q
f l r
What is the fastest way to do this?

This is a good question. There are many reasons you would want to actually transpose the matrix in memory rather than just swap coordinates, e.g. in matrix multiplication and Gaussian smearing.
First let me list one of the functions I use for the transpose (EDIT: please see the end of my answer where I found a much faster solution)
void transpose(float *src, float *dst, const int N, const int M) {
#pragma omp parallel for
for(int n = 0; n<N*M; n++) {
int i = n/N;
int j = n%N;
dst[n] = src[M*j + i];
}
}
Now let's see why the transpose is useful. Consider matrix multiplication C = A*B. We could do it this way.
for(int i=0; i<N; i++) {
for(int j=0; j<K; j++) {
float tmp = 0;
for(int l=0; l<M; l++) {
tmp += A[M*i+l]*B[K*l+j];
}
C[K*i + j] = tmp;
}
}
That way, however, is going to have a lot of cache misses. A much faster solution is to take the transpose of B first
transpose(B);
for(int i=0; i<N; i++) {
for(int j=0; j<K; j++) {
float tmp = 0;
for(int l=0; l<M; l++) {
tmp += A[M*i+l]*B[M*j+l]; // B now holds its transpose (K rows of M elements)
}
C[K*i + j] = tmp;
}
}
transpose(B);
Matrix multiplication is O(n^3) and the transpose is O(n^2), so taking the transpose should have a negligible effect on the computation time (for large n). In matrix multiplication loop tiling is even more effective than taking the transpose but that's much more complicated.
I wish I knew a faster way to do the transpose (Edit: I found a faster solution, see the end of my answer). When Haswell/AVX2 comes out in a few weeks it will have a gather function. I don't know if that will be helpful in this case but I could imagine gathering a column and writing out a row. Maybe it will make the transpose unnecessary.
For Gaussian smearing what you do is smear horizontally and then smear vertically. But smearing vertically has the cache problem so what you do is
Smear image horizontally
transpose output
Smear output horizontally
transpose output
Here is a paper by Intel explaining that
http://software.intel.com/en-us/articles/iir-gaussian-blur-filter-implementation-using-intel-advanced-vector-extensions
Lastly, what I actually do in matrix multiplication (and in Gaussian smearing) is not take exactly the transpose but take the transpose in widths of a certain vector size (e.g. 4 or 8 for SSE/AVX). Here is the function I use
void reorder_matrix(const float* A, float* B, const int N, const int M, const int vec_size) {
#pragma omp parallel for
for(int n=0; n<M*N; n++) {
int k = vec_size*(n/N/vec_size);
int i = (n/vec_size)%N;
int j = n%vec_size;
B[n] = A[M*i + k + j];
}
}
EDIT:
I tried several functions to find the fastest transpose for large matrices. In the end the fastest result is to use loop blocking with block_size=16 (Edit: I found a faster solution using SSE and loop blocking - see below). This code works for any NxM matrix (i.e. the matrix does not have to be square).
inline void transpose_scalar_block(float *A, float *B, const int lda, const int ldb, const int block_size) {
#pragma omp parallel for
for(int i=0; i<block_size; i++) {
for(int j=0; j<block_size; j++) {
B[j*ldb + i] = A[i*lda +j];
}
}
}
inline void transpose_block(float *A, float *B, const int n, const int m, const int lda, const int ldb, const int block_size) {
#pragma omp parallel for
for(int i=0; i<n; i+=block_size) {
for(int j=0; j<m; j+=block_size) {
transpose_scalar_block(&A[i*lda +j], &B[j*ldb + i], lda, ldb, block_size);
}
}
}
The values lda and ldb are the width of the matrix. These need to be multiples of the block size. To find the values and allocate the memory for e.g. a 3000x1001 matrix I do something like this
#define ROUND_UP(x, s) (((x)+((s)-1)) & -(s))
const int n = 3000;
const int m = 1001;
int lda = ROUND_UP(m, 16);
int ldb = ROUND_UP(n, 16);
float *A = (float*)_mm_malloc(sizeof(float)*lda*ldb, 64);
float *B = (float*)_mm_malloc(sizeof(float)*lda*ldb, 64);
For 3000x1001 this returns ldb = 3008 and lda = 1008
Edit:
I found an even faster solution using SSE intrinsics:
inline void transpose4x4_SSE(float *A, float *B, const int lda, const int ldb) {
__m128 row1 = _mm_load_ps(&A[0*lda]);
__m128 row2 = _mm_load_ps(&A[1*lda]);
__m128 row3 = _mm_load_ps(&A[2*lda]);
__m128 row4 = _mm_load_ps(&A[3*lda]);
_MM_TRANSPOSE4_PS(row1, row2, row3, row4);
_mm_store_ps(&B[0*ldb], row1);
_mm_store_ps(&B[1*ldb], row2);
_mm_store_ps(&B[2*ldb], row3);
_mm_store_ps(&B[3*ldb], row4);
}
inline void transpose_block_SSE4x4(float *A, float *B, const int n, const int m, const int lda, const int ldb ,const int block_size) {
#pragma omp parallel for
for(int i=0; i<n; i+=block_size) {
for(int j=0; j<m; j+=block_size) {
int max_i2 = i+block_size < n ? i + block_size : n;
int max_j2 = j+block_size < m ? j + block_size : m;
for(int i2=i; i2<max_i2; i2+=4) {
for(int j2=j; j2<max_j2; j2+=4) {
transpose4x4_SSE(&A[i2*lda +j2], &B[j2*ldb + i2], lda, ldb);
}
}
}
}
}

This is going to depend on your application, but in general the fastest way to transpose a matrix is to invert your coordinates when you do a lookup; then you do not have to actually move any data.

Some details about transposing 4x4 square float (I will discuss 32-bit integer later) matrices with x86 hardware. It's helpful to start here in order to transpose larger square matrices such as 8x8 or 16x16.
_MM_TRANSPOSE4_PS(r0, r1, r2, r3) is implemented differently by different compilers. GCC and ICC (I have not checked Clang) use unpcklps, unpckhps, unpcklpd, unpckhpd whereas MSVC uses only shufps. We can actually combine these two approaches together like this.
t0 = _mm_unpacklo_ps(r0, r1);
t1 = _mm_unpackhi_ps(r0, r1);
t2 = _mm_unpacklo_ps(r2, r3);
t3 = _mm_unpackhi_ps(r2, r3);
r0 = _mm_shuffle_ps(t0,t2, 0x44);
r1 = _mm_shuffle_ps(t0,t2, 0xEE);
r2 = _mm_shuffle_ps(t1,t3, 0x44);
r3 = _mm_shuffle_ps(t1,t3, 0xEE);
One interesting observation is that two shuffles can be converted to one shuffle and two blends (SSE4.1) like this.
t0 = _mm_unpacklo_ps(r0, r1);
t1 = _mm_unpackhi_ps(r0, r1);
t2 = _mm_unpacklo_ps(r2, r3);
t3 = _mm_unpackhi_ps(r2, r3);
v = _mm_shuffle_ps(t0,t2, 0x4E);
r0 = _mm_blend_ps(t0,v, 0xC);
r1 = _mm_blend_ps(t2,v, 0x3);
v = _mm_shuffle_ps(t1,t3, 0x4E);
r2 = _mm_blend_ps(t1,v, 0xC);
r3 = _mm_blend_ps(t3,v, 0x3);
This effectively converted 4 shuffles into 2 shuffles and 4 blends. This uses 2 more instructions than the implementation of GCC, ICC, and MSVC. The advantage is that it reduces port pressure which may have a benefit in some circumstances.
Currently all the shuffles and unpacks can go only to one particular port whereas the blends can go to either of two different ports.
I tried using 8 shuffles like MSVC and converting that into 4 shuffles + 8 blends but it did not work. I still had to use 4 unpacks.
I used this same technique for an 8x8 float transpose (see towards the end of this answer): https://stackoverflow.com/a/25627536/2542702. In that answer I still had to use 8 unpacks but I managed to convert the 8 shuffles into 4 shuffles and 8 blends.
For 32-bit integers there is nothing like shufps (except for 128-bit shuffles with AVX512) so it can only be implemented with unpacks, which I don't think can be converted to blends (efficiently). With AVX512 vshufi32x4 acts effectively like shufps except for 128-bit lanes of 4 integers instead of 32-bit floats, so this same technique might be possible with vshufi32x4 in some cases. With Knights Landing shuffles are four times slower (throughput) than blends.

If the sizes of the arrays are known in advance, then we could use a union to help. Like this:
#include <bits/stdc++.h>
using namespace std;
union ua{
int arr[2][3];
int brr[3][2];
};
int main() {
union ua uav;
int karr[2][3] = {{1,2,3},{4,5,6}};
memcpy(uav.arr,karr,sizeof(karr));
for (int i=0;i<3;i++)
{
for (int j=0;j<2;j++)
cout<<uav.brr[i][j]<<" ";
cout<<'\n';
}
return 0;
}

Consider each row as a column, and each column as a row .. use j,i instead of i,j
demo: http://ideone.com/lvsxKZ
#include <iostream>
using namespace std;
int main ()
{
char A [3][3] =
{
{ 'a', 'b', 'c' },
{ 'd', 'e', 'f' },
{ 'g', 'h', 'i' }
};
cout << "A = " << endl << endl;
// print matrix A
for (int i=0; i<3; i++)
{
for (int j=0; j<3; j++) cout << A[i][j];
cout << endl;
}
cout << endl << "A transpose = " << endl << endl;
// print A transpose
for (int i=0; i<3; i++)
{
for (int j=0; j<3; j++) cout << A[j][i];
cout << endl;
}
return 0;
}

transposing without any overhead (class not complete):
class Matrix{
double *data; //suppose this will point to data
double _get1(int i, int j){return data[i*M+j];} //used to access normally
double _get2(int i, int j){return data[j*N+i];} //used when transposed
public:
int M, N; //dimensions
double (Matrix::*get_p)(int, int); //pointer to the member function used to access elements
Matrix(int _M,int _N):M(_M), N(_N){
//allocate data
get_p=&Matrix::_get1; // initialised with normal access
}
double get(int i, int j){
//there should be a way to directly use get_p to call. but i think even this
//doesnt incur overhead because it is inline and the compiler should be intelligent
//enough to remove the extra call
return (this->*get_p)(i,j);
}
void transpose(){ //twice transpose gives the original
if(get_p==&Matrix::_get1) get_p=&Matrix::_get2;
else get_p=&Matrix::_get1; //assignment, not comparison
swap(M,N);
}
};
can be used like this:
Matrix M(100,200);
double x=M.get(17,45);
M.transpose();
x=M.get(17,45); // = original M(45,17)
Of course I didn't bother with the memory management here, which is crucial but a different topic.

template <class T>
void transpose( const std::vector< std::vector<T> > & a,
std::vector< std::vector<T> > & b,
int width, int height)
{
for (int i = 0; i < width; i++)
{
for (int j = 0; j < height; j++)
{
b[j][i] = a[i][j];
}
}
}

Modern linear algebra libraries include optimized versions of the most common operations. Many of them include dynamic CPU dispatch, which chooses the best implementation for the hardware at program execution time (without compromising on portability).
This is commonly a better alternative to manually optimizing your functions with vector-extension intrinsics. The latter will tie your implementation to a particular hardware vendor and model: if you decide to switch to a different vendor (e.g. Power, ARM) or to newer vector extensions (e.g. AVX512), you will need to re-implement it to get the most out of them.
MKL transposition, for example, includes the BLAS extensions function imatcopy. You can find it in other implementations such as OpenBLAS as well:
#include <mkl.h>
void transpose( float* a, int n, int m ) {
const char row_major = 'R';
const char transpose = 'T';
const float alpha = 1.0f;
mkl_simatcopy (row_major, transpose, n, m, alpha, a, m, n); // lda = m (columns of the source), ldb = n (columns of the result)
}
For a C++ project, you can make use of the Armadillo C++ library:
#include <armadillo>
void transpose( arma::mat &matrix ) {
arma::inplace_trans(matrix);
}

Intel MKL provides in-place and out-of-place matrix transposition/copy routines. Here is the link to the documentation. I would recommend trying the out-of-place implementation, as it is faster than the in-place one; also note that the documentation of the latest version of MKL contains some mistakes.

I think the fastest way should not take more than O(n^2) time, and done this way it needs only O(1) extra space:
for a square matrix, swap the elements in pairs, because transposing means M[i][j] and M[j][i] trade places. So store M[i][j] in temp, then set M[i][j]=M[j][i], and as the last step set M[j][i]=temp. This can be done in one pass over the elements above the diagonal, so it takes O(n^2).
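A minimal sketch of the pairwise swap described above, assuming a square n x n matrix stored row-major in a std::vector. The function name and the flat storage are choices made for this illustration; a non-square matrix cannot be transposed in place this simply.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Transpose an n x n row-major matrix in place.
// Only the elements above the diagonal are visited, and each one is
// swapped with its mirror image, so every pair is swapped exactly once.
void transposeSquareInPlace(std::vector<int>& m, std::size_t n) {
    assert(m.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            std::swap(m[i * n + j], m[j * n + i]);
        }
    }
}
```

Starting the inner loop at i + 1 matters: looping over all (i, j) would swap each pair twice and leave the matrix unchanged.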

My answer transposes a 3x3 matrix:
#include <iostream>
using namespace std;
int main()
{
int a[3][3];
int b[3];
cout<<"You must give us an array 3x3 and then we will give you Transposed it "<<endl;
for(int i=0;i<3;i++)
{
for(int j=0;j<3;j++)
{
cout<<"Enter a["<<i<<"]["<<j<<"]: ";
cin>>a[i][j];
}
}
cout<<"Matrix you entered is :"<<endl;
for (int e = 0 ; e < 3 ; e++ )
{
for ( int f = 0 ; f < 3 ; f++ )
cout << a[e][f] << "\t";
cout << endl;
}
cout<<"\nTransposed of matrix you entered is :"<<endl;
for (int c = 0 ; c < 3 ; c++ )
{
for ( int d = 0 ; d < 3 ; d++ )
cout << a[d][c] << "\t";
cout << endl;
}
return 0;
}

Problem creating and returning jagged array (error std::bad_array_new_length)

For this homework problem, we need to create a new jagged array with the code provided by our professor, print the array, and calculate the max, min, and sum of the array's contents. We are only allowed to edit the createAndReturnJaggedArray() and printAndThenFindMaxMinSum(int**,int*,int*,int*) functions, as the rest of the code was provided for us so we could check that we get the correct output.
I'm able to get the program to run, however after printing an initial string it terminates the program giving me the error terminate called after throwing an instance of 'std::bad_array_new_length' what(): std::bad_array_new_length. I believe the problem is in my creation of the jagged array and my allocation of memory for the columns part of the array, however I used the notes we were given as reference and have no idea where the problem is coming from. The entire program is provided below. Thanks for any help!
EDIT/NOTE: We haven't learned vectors yet so we're not allowed to use them.
#include <iostream>
#include <climits>
using namespace std;
class JaggedArray {
public:
int numRows;
int *numColumnsInEachRow;
JaggedArray() {
numRows = 11;
numColumnsInEachRow = new int[numRows];
for (int i = 0; i < numRows; i++) {
if (i <= numRows / 2) {
numColumnsInEachRow[i] = i + 1;
} else {
numColumnsInEachRow[i] = numRows - i;
}
}
readComputeWrite();
}
int **createAndReturnJaggedArray() { // COMPLETE THIS FUNCTION
int **A = new int*[numRows];
for(int i=0;i<numRows;i++){ //allocate columns in each row
A[i] = new int[numColumnsInEachRow[i]];
for(int j=0;j<numColumnsInEachRow[i];j++){
if(i <= numRows/2)
A[i][j] = (i + j);
else
A[i][j] = -1 * (i+j);
}
}
return A;
}
void printAndThenFindMinMaxSum(int **A, int *maxPtr, int *minPtr, int *sumPtr) { // COMPLETE THIS FUNCTION
maxPtr = new int[INT_MIN];
minPtr = new int[INT_MAX];
sumPtr = 0;
for(int i=0;i<numRows;i++){
for(int j=0;j<numColumnsInEachRow[i];j++){
//1. print array
if (j == (numColumnsInEachRow[i]-1))
cout << A[i][j] << endl;
else
cout << A[i][j] << " ";
//2. compute max, min, and sum
sumPtr += A[i][j];
if (A[i][j] > *maxPtr)
maxPtr = new int[A[i][j]];
if (A[i][j] < *minPtr)
minPtr = new int[A[i][j]];
}
}
}
void print(int max, int min, int sum) {
cout << endl;
cout << "Max is " << max << "\n";
cout << "Min is " << min << "\n";
cout << "Sum is " << sum << "\n";
}
void readComputeWrite() {
int max, min, sum;
int **A = createAndReturnJaggedArray();
cout << "*** Jagged Array ***" << endl;
printAndThenFindMinMaxSum(A, &max, &min, &sum);
print(max, min, sum);
}
};
int main() {
JaggedArray jaf;
return 0;
}

As @user4581301 hints at, your problem is in printAndThenFindMinMaxSum. Simply changing it to the below solves your problem:
void printAndThenFindMinMaxSum(int **A, int &maxPtr, int &minPtr, int &sumPtr) { // COMPLETE THIS FUNCTION
maxPtr = INT_MIN;
minPtr = INT_MAX;
sumPtr = 0;
.
.
.
sumPtr += A[i][j];
if (A[i][j] > maxPtr)
maxPtr = A[i][j];
if (A[i][j] < minPtr)
minPtr = A[i][j];
}
}
}
We also need to change readComputeWrite to:
void readComputeWrite() {
int max, min, sum;
int **A = createAndReturnJaggedArray();
cout << "*** Jagged Array ***" << endl;
printAndThenFindMinMaxSum(A, max, min, sum);
print(max, min, sum);
}
I would also recommend changing the names minPtr, maxPtr, and sumPtr to something more appropriate, as they aren't pointers at this point and represent primitive values.
You will note that I changed pointers to references, as this is a more natural fit for this type of operation. Essentially, passing by reference allows the user to operate on the passed value in a straightforward manner without the tedious task of making sure you dereference things at the appropriate time. It also allows one to operate in a less error-prone manner.
Again, as @user4581301 shrewdly points out, the intent of this assignment was probably to deal with pointers. As such, there are a few things that need to be changed if the OP cannot use references. Observe:
void printAndThenFindMinMaxSum(int **A, int *maxPtr, int *minPtr, int *sumPtr) { // COMPLETE THIS FUNCTION
*maxPtr = INT_MIN; // Make sure to dereference before assigning
*minPtr = INT_MAX; // Make sure to dereference before assigning
*sumPtr = 0; // Make sure to dereference before assigning
for(int i=0;i<numRows;i++){
for(int j=0;j<numColumnsInEachRow[i];j++){
//1. print array
if (j == (numColumnsInEachRow[i]-1))
cout << A[i][j] << endl;
else
cout << A[i][j] << " ";
//2. compute max, min, and sum
*sumPtr += A[i][j]; // Make sure to dereference before assigning
if (A[i][j] > *maxPtr) // Make sure to dereference before comparing
*maxPtr = A[i][j]; // Make sure to dereference before assigning
if (A[i][j] < *minPtr) // Make sure to dereference before comparing
*minPtr = A[i][j]; // Make sure to dereference before assigning
}
}
}
And the readComputeWrite can stay unaltered from the OP's original attempt.
In the OP's code, they are mainly forgetting to dereference before assigning/comparing.

Error:No matching function for call to

I am very very new to C++ and I am trying to call the function "jacobi" which performs a user specified number of iterations for the jacobi method (or at least I hope so). On the line where I call 'jacobi' I get the error "No matching function to call to "jacobi". I have read other posts similar to this one and have tried to apply it to my own code but I have been unsuccessful. Maybe there are other issues in my code causing this problem. As mentioned I am very new to C++ so any help would be appreciated and please break it down for me.
#include <iostream>
using namespace std;
void jacobi (int size, int max, int B[size], int A[size][size], int init[size], int x[size]){
////
//// JACOBI
////
int i,j,k,sum[size];
k = 1;
while (k <= max) // Only continue to max number of iterations
{
for (i = 0; i < size; i++)
{
sum[i] = B[i];
for (j = 0; j < size; j++)
{
if (i != j)
{
sum[i] = sum[i] - A[i][j] * init[j]; // summation
}
}
}
for (i = 0; i < size; i++) ////HERE LIES THE DIFFERENCE BETWEEN Gauss-Seidel and Jacobi
{
x[i] = sum[i]/A[i][i]; // divide summation by a[i][i]
init[i] = x[i]; //use new_x(k+1) as init_x(k) for next iteration
}
k++;
}
cout << "Jacobi Approximation to "<<k-1<<" iterations is: \n";
for(i=0;i<size;i++)
{
cout <<x[i]<< "\n"; // print found approximation.
}
cout << "\n";
return;
}
int main (){
// User INPUT
// n: number of equations and unknowns
int n;
cout << "Enter the number of equations: \n";
cin >> n;
// Nmax: max number of iterations
int Nmax;
cout << "Enter max number of iterations: \n";
cin >> Nmax;
// int tol;
// cout << "Enter the tolerance level: " ;
// cin >> tol;
// b[n] and a[n][n]: array of coefficients of 'A' and array of int 'b'
int b[n];
int i,j;
cout << "Enter 'b' of Ax = b, separated by a space: \n";
for (i = 0; i < n; i++)
{
cin >> b[i];
}
// user enters coefficients and builds matrix
int a[n][n];
int init_x[n],new_x[n];
cout << "Enter matrix coefficients or 'A' of Ax = b, by row and separate by a space: \n";
for (i = 0; i < n; i++)
{
init_x[i] = 0;
new_x[i] = 0;
for (j = 0; j < n; j++)
{
cin >> a[i][j];
}
}
jacobi (n, Nmax, b, a, init_x, new_x);
}

The problem:
There are several problems, related to the use of arrays:
You can't pass arrays as parameter by value.
You can't pass multidimensional arrays as parameter if the dimensions are variable
You can't define arrays of variable length in C++
Of course there are ways to do all of these things, but they rely on different principles (dynamic allocation, pointers) and require additional work (especially for accessing multidimensional array elements).
Fortunately, there is also a much easier solution!
The solution:
For this kind of code you should go for std::vector: vectors manage variable length and can be passed by value.
For the jacobi() function, all you have to do is to change its definition:
void jacobi(int size, int max, vector<int> B, vector<vector<int>> A, vector<int> init, vector<int> x) {
int i, j, k;
vector<int> sum(size); // vector of 'size' zero-initialized elements
// The rest of the function will work unchanged
...
}
Attention however: the vectors can be of variable size, and this jacobi() implementation assumes that all the vectors are of the expected size. In professional level code you should check that this is the case.
For the implementation of main(), the code is almost unchanged. All you have to do is to replace array definitions by vector definitions:
...
vector<int> b(n); // creates a vector that is initialized with n elements.
...
vector<vector<int>> a(n,vector<int>(n)); // same idea for 2 dimensional vector (i.e. a vector of vectors)
vector<int> init_x(n), new_x(n); // same principle as for b
...
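For reference, here is a compact sketch of how the vector-based version could look as a whole. It departs from the code above in a few ways that are my own assumptions: it uses double instead of int (integer division would truncate the iterates), it takes the inputs by const reference, it returns the result instead of printing it, and the aliases Vec and Mat are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;

// Runs maxIter Jacobi sweeps on A x = b, starting from the guess x0.
// Every component of the new iterate is computed from the old iterate only.
Vec jacobi(const Mat& A, const Vec& b, Vec x0, int maxIter) {
    const std::size_t n = b.size();
    assert(A.size() == n && x0.size() == n); // the size check mentioned above
    Vec next(n);
    for (int k = 0; k < maxIter; ++k) {
        for (std::size_t i = 0; i < n; ++i) {
            assert(A[i].size() == n && A[i][i] != 0.0);
            double sum = b[i];
            for (std::size_t j = 0; j < n; ++j) {
                if (j != i) sum -= A[i][j] * x0[j];
            }
            next[i] = sum / A[i][i];
        }
        x0 = next; // the new iterate becomes the starting point of the next sweep
    }
    return x0;
}
```

For a diagonally dominant system such as 4x + y = 6, 2x + 5y = 12, calling jacobi(A, b, Vec{0, 0}, 50) should approach the solution x = 1, y = 2.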

c++ error: invalid types 'int[int]' for array subscript

Trying to learn C++ and working through a simple exercise on arrays.
Basically, I've created a multidimensional array and I want to create a function that prints out the values.
The commented for-loop within Main() works fine, but when I try to turn that for-loop into a function, it doesn't work and for the life of me, I cannot see why.
#include <iostream>
using namespace std;
void printArray(int theArray[], int numberOfRows, int numberOfColumns);
int main()
{
int sally[2][3] = {{2,3,4},{8,9,10}};
printArray(sally,2,3);
// for(int rows = 0; rows < 2; rows++){
// for(int columns = 0; columns < 3; columns++){
// cout << sally[rows][columns] << " ";
// }
// cout << endl;
// }
}
void printArray(int theArray[], int numberOfRows, int numberOfColumns){
for(int x = 0; x < numberOfRows; x++){
for(int y = 0; y < numberOfColumns; y++){
cout << theArray[x][y] << " ";
}
cout << endl;
}
}

C++ inherits its syntax from C, and tries hard to maintain backward compatibility where the syntax matches. So passing arrays works just like C: the length information is lost.
However, C++ does provide a way to automatically pass the length information, using a reference (no backward compatibility concerns, C has no references):
template<int numberOfRows, int numberOfColumns>
void printArray(int (&theArray)[numberOfRows][numberOfColumns])
{
for(int x = 0; x < numberOfRows; x++){
for(int y = 0; y < numberOfColumns; y++){
cout << theArray[x][y] << " ";
}
cout << endl;
}
}
Demonstration: http://ideone.com/MrYKz
Here's a variation that avoids the complicated array reference syntax: http://ideone.com/GVkxk
If the size is dynamic, you can't use either template version. You just need to know that C and C++ store array content in row-major order.
Code which works with variable size: http://ideone.com/kjHiR

Since theArray is multidimensional, you should specify the bounds of all its dimensions in the function prototype (except the first one):
void printArray(int theArray[][3], int numberOfRows, int numberOfColumns);

I'm aware of the date of this post, but for completeness and future reference, here is another solution. C++ offers standard-library facilities (see std::vector or std::array) that make life easier in cases like this than the low-level built-in arrays. If you still need to pass the built-in array together with its dimensions, call printArray with a pointer to the first element:
printArray(&sally[0][0], 2, 3);
and redefine the function this way:
void printArray(int* theArray, int numberOfRows, int numberOfColumns){
for(int x = 0; x < numberOfRows; x++){
for(int y = 0; y < numberOfColumns; y++){
cout << theArray[x * numberOfColumns + y] << " ";
}
cout << endl;
}
}
In particular, note the first argument and the subscript operation:
the function takes a pointer to int, so you pass the address of the first element. The bare name sally decays to int (*)[3], which does not convert implicitly to int*.
within the subscript operation (theArray[x * numberOfColumns + y]) we access the elements sequentially, treating the multidimensional array as a single flat row.
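Here is a small variation on the flat-indexing function that writes into a string instead of cout, so the result can be checked. The name formatArray is made up for this sketch. It is meant to be called with &sally[0][0], since the bare name sally has type int (*)[3] after decay and will not convert to a pointer to int.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Same row-major arithmetic as above: element (x, y) of a matrix with
// numberOfColumns columns sits at offset x * numberOfColumns + y.
std::string formatArray(const int* theArray, int numberOfRows, int numberOfColumns) {
    assert(theArray != nullptr);
    std::ostringstream out;
    for (int x = 0; x < numberOfRows; x++) {
        for (int y = 0; y < numberOfColumns; y++) {
            out << theArray[x * numberOfColumns + y] << " ";
        }
        out << "\n";
    }
    return out.str();
}
```

With int sally[2][3] = {{2,3,4},{8,9,10}}, formatArray(&sally[0][0], 2, 3) yields two lines, "2 3 4 " and "8 9 10 ".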

If you pass an array as an argument, you must specify the size of every dimension except the first. The compiler needs those sizes to calculate the offset of each element in the array. For example, printArray could be written like this:
void printArray(int theArray[][3], int numberOfRows, int numberOfColumns){
for(int x = 0; x < numberOfRows; x++){
for(int y = 0; y < numberOfColumns; y++){
cout << theArray[x][y] << " ";
}
cout << endl;
}
}

C++ passing Dynamically-sized 2D Array to function

I'm trying to figure out how to pass 2D array, which is constructed dynamically to a function.
I know that the number of columns must be specified, but in my case it depends on user input.
Are there any workarounds?
Example:
// Some function
void function(matrix[i][j]) {
// do stuff
}
// Main function
int N;
cout << "Size: ";
cin >> N;
int matrix[N][N];
for (int i=0;i<N;i++) { //
for (int j=0;j<N;j++) {
cin >> matrix[N][N];
}
}
sort(matrix);
You get the idea :)

If you're on C++, the reasonable options are to:
use boost::multi_array (recommended), or
make your own 2D array class. Well, you don't have to, but encapsulating 2D array logic in a class is useful and makes the code clean.
Manual 2D array indexing would look like this:
void func(int* arrayData, int arrayWidth) {
// element (x,y) is under arrayData[x + y*arrayWidth]
}
But seriously, either wrap this with a class or enjoy that Boost already has that class ready for you. Indexing this manually is tiresome and makes the code messier and more error-prone.
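As a rough idea of what the "make your own class" option can look like, here is a minimal sketch. The class name Array2D, its members and the helper sumAll are invented for this example; it only wraps the same x + y*width arithmetic shown above and leaves out everything a production class would need (iterators, resizing, and so on).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal wrapper: one contiguous buffer plus index arithmetic.
class Array2D {
public:
    Array2D(std::size_t width, std::size_t height)
        : width_(width), height_(height), data_(width * height, 0) {}

    // Element (x, y) lives at data_[x + y * width_].
    int& at(std::size_t x, std::size_t y) {
        assert(x < width_ && y < height_);
        return data_[x + y * width_];
    }
    int at(std::size_t x, std::size_t y) const {
        assert(x < width_ && y < height_);
        return data_[x + y * width_];
    }
    std::size_t width() const { return width_; }
    std::size_t height() const { return height_; }

private:
    std::size_t width_, height_;
    std::vector<int> data_;
};

// The dimensions travel with the object, so no extra parameters are needed.
long long sumAll(const Array2D& a) {
    long long total = 0;
    for (std::size_t y = 0; y < a.height(); ++y)
        for (std::size_t x = 0; x < a.width(); ++x)
            total += a.at(x, y);
    return total;
}
```

The dimensions can come straight from user input, e.g. Array2D matrix(N, N), and the object is passed to functions by reference like any other class.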
edit
http://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html says that C99 has one more solution for you:
void func(int len, int array[len][len]) {
// notice how the first parameter is used in the definition of second parameter
}
This is not standard C++, though some C++ compilers may accept it as an extension; I haven't ever used this approach.

In C++, the compiler can figure out the size, since it's part of the type. Won't work with dynamically sized matrices though.
template<size_t N, size_t M>
void function(int (&matrix)[N][M])
{
// do stuff
}
EDIT: In GCC only, which is required for your code defining the array, you can pass variable-length arrays directly:
void func(int N, int matrix[N][N])
{
//do stuff
}
See the gcc documentation

/*******************************************************\
* *
* I am not claiming to be an expert, but I think I know *
* a solution to this one. Try using a Vector Container *
* instead of an array. Here is an example below: *
* *
* Load the target file with a Multiplication Table *
* *
* *
\*******************************************************/
// reading a text file
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
std::string user_file;
int user_size = 2;
void array_maker(int user_size, std::string user_file);
int main () {
std::cout << "Enter the name of the file for your data: ";
std::cin >> user_file;
std::cout << std::endl;
std::cout << "Enter the size for your Multiplication Table: ";
std::cin >> user_size;
// Create the users Multiplication data
array_maker(user_size, user_file);
return (0);
}
void array_maker(int user_size, std::string user_file)
{
// Open file to write data & add it to end of file
std::ofstream target_file(user_file,std::ios::out | std::ios::app);
// Declare the vector to use as a runtime sized array
std::vector<std::vector<int>> main_array;
// Initialize the size of the vector array
main_array.resize(user_size+1); // Outer Dimension
for (int i=0; i <= user_size; ++i) // Inner Dimension
{
main_array[i].resize(user_size+1);
}
for (int i=0; i<=user_size; ++i)
{
for (int j=0; j<=user_size; ++j)
{
main_array[i][j] = i * j;
// output line to current record in file
target_file << i << "*"
<< j << "="
<< main_array[i][j] << " "
<< "EOR" // End of Record
<< std::endl;
} // Close Inner For
} // Close Outer For
// close file
target_file.close();
} // Close array_maker function

You can do
void function (int** matrix, int32_t rows, int32_t columns)
rows - number of rows
columns - number of columns.
You will need those parameters to know the limits of the array.

Just add two more parameters to your function: row_number and column_number. Arrays are not objects in C++, so they don't store any additional information about themselves.

If you pass in a pointer to the first element of a contiguous array, you will need to use pointer arithmetic:
void function(int* matrix, int num_rows, int num_cols) {
assert(matrix!=NULL && num_rows>0 && num_cols>0); // needs <cassert>
for(int i=0; i<num_rows; i++) {
for(int j=0; j<num_cols; j++) {
// cannot index a flat int* using [] like matrix[i][j]
// use pointer arithmetic instead like:
// *(matrix + i*num_cols + j)
}
}
}

To pass multidimensional arrays into a function, the compiler needs to know the size of every dimension except the first. One solution is to use a template and call the function in the normal way; the compiler will then deduce the size.
template <size_t m>
void method(int M[][m])
{
for(int i=0; i<m; ++i)
for(int j=0; j<m; ++j)
{
// do funny stuff with M[i][j]
}
}
int main()
{
int M[5][5] = { {1,0,1,1,0}, {0,1,1,1,0}, {1,1,1,1,1}, {1,0,1,1,1}, {1,1,1,1,1} };
method(M);
// also you can call with method<5>(M)
// if you have different sizes for each dimension try passing them in args
return 0;
}

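int r, c;
cin >> r >> c; // read the dimensions before allocating
int *matrix = new int[r * c]; // one contiguous block of r*c ints
for (int i = 0; i < r; i++)
{
/*cout << "Enter data" << endl;*/
for (int j = 0; j < c; j++)
{
cin >> matrix[i * c + j]; // row-major index of element (i, j)
}
}
delete[] matrix; // release the block once it is no longer needed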

void function(int &matrix[][] )
