2D MemoryView to C Pointer Error (1D works, but 2D doesn't) - c++

I was able to get pointers for 1D memoryviews using this StackOverflow question, but applying the same method to 2D memoryviews gives me a "Cannot assign type 'double *' to 'double **'" error.
cdef extern from "dgesvd.h" nogil:
void dgesvd(double **A, int m, int n, double *S, double **U, double **VT)
cdef:
double[:] S
double[:,:] A, U, VT
A = np.ascontiguousarray(np.zeros((N,N)))
S = np.zeros(N)
U = np.zeros(N)
VT = np.zeros(N)
dgesvd(&A[0,0], N, N, &S[0], &U[0], &VT[0])
EDIT: I got it to compile successfully by doing:
cdef:
double[:] S
double[:,:] A, U, VT
U = np.zeros((N,N))
VT = np.zeros((N,N))
A = np.zeros((N,N))
S = np.zeros(N)
A_p = <double *> malloc(sizeof(double) * N)
U_p = <double *> malloc(sizeof(double) * N)
VT_p = <double *> malloc(sizeof(double) * N)
for i in range(N):
A_p = &A[i, 0]
U_p = &U[i, 0]
VT_p = &VT[i, 0]
dgesvd(&A_p, N, N, &S[0], &U_p, &VT_p)
free(A_p)
free(U_p)
free(VT_p)
BUT I get a segfault when I try to run it, so I probably did this wrong.
Here are the contents of "dgesvd.h" (I did not write it, but I know it works):
/*
This file has my implementation of the LAPACK routine dgesdd for
C++. This program solves for the singular value decomposition of a
rectangular matrix A. The function call is of the form
void dgesdd(double **A, int m, int n, double *S, double *U, double *VT)
A: the m by n matrix that we are decomposing
m: the number of rows in A
n: the number of columns in A (generally, n<m)
S: a min(m,n) element array to hold the singular values of A
U: a [m, min(m,n)] element rectangular array to hold the right
singular vectors of A. These vectors will be the columns of U,
so that U[i][j] is the ith element of vector j.
VT: a [min(m,n), n] element rectangular array to hold the left
singular vectors of A. These vectors will be the rows of VT
(it is a transpose of the vector matrix), so that VT[i][j] is
the jth element of vector i.
Note that S, U, and VT must be initialized before calling this
routine, or there will be an error. Here is a quick sample piece of
code to perform this initialization; in many cases, it can be lifted
right from here into your program.
S = new double[minmn];
U = new double*[m]; for (int i=0; i<m; i++) U[i] = new double[minmn];
VT = new double*[minmn]; for (int i=0; i<minmn; i++) VT[i] = new double[n];
Scot Shaw
24 January 2000 */
void dgesvd(double **A, int m, int n, double *S, double **U, double **VT);
double *dgesvd_ctof(double **in, int rows, int cols);
void dgesvd_ftoc(double *in, double **out, int rows, int cols);
extern "C" void dgesvd_(char *jobu, char *jobvt, int *m, int *n,
double *a, int *lda, double *s, double *u,
int *ldu, double *vt, int *ldvt, double *work,
int *lwork, int *info);
void dgesvd(double **A, int m, int n, double *S, double **U, double **VT)
{
char jobu, jobvt;
int lda, ldu, ldvt, lwork, info;
double *a, *u, *vt, *work;
int minmn, maxmn;
jobu = 'S'; /* Specifies options for computing U.
A: all M columns of U are returned in array U;
S: the first min(m,n) columns of U (the left
singular vectors) are returned in the array U;
O: the first min(m,n) columns of U (the left
singular vectors) are overwritten on the array A;
N: no columns of U (no left singular vectors) are
computed. */
jobvt = 'S'; /* Specifies options for computing VT.
A: all N rows of V**T are returned in the array
VT;
S: the first min(m,n) rows of V**T (the right
singular vectors) are returned in the array VT;
O: the first min(m,n) rows of V**T (the right
singular vectors) are overwritten on the array A;
N: no rows of V**T (no right singular vectors) are
computed. */
lda = m; // The leading dimension of the matrix a.
a = dgesvd_ctof(A, lda, n); /* Convert the matrix A from double pointer
C form to single pointer Fortran form. */
ldu = m;
/* Since A is not a square matrix, we have to make some decisions
based on which dimension is shorter. */
if (m>=n) { minmn = n; maxmn = m; } else { minmn = m; maxmn = n; }
ldu = m; // Left singular vector matrix
u = new double[ldu*minmn];
ldvt = minmn; // Right singular vector matrix
vt = new double[ldvt*n];
lwork = 5*maxmn; // Set up the work array, larger than needed.
work = new double[lwork];
dgesvd_(&jobu, &jobvt, &m, &n, a, &lda, S, u,
&ldu, vt, &ldvt, work, &lwork, &info);
dgesvd_ftoc(u, U, ldu, minmn);
dgesvd_ftoc(vt, VT, ldvt, n);
delete a;
delete u;
delete vt;
delete work;
}
double* dgesvd_ctof(double **in, int rows, int cols)
{
double *out;
int i, j;
out = new double[rows*cols];
for (i=0; i<rows; i++) for (j=0; j<cols; j++) out[i+j*rows] = in[i][j];
return(out);
}
void dgesvd_ftoc(double *in, double **out, int rows, int cols)
{
int i, j;
for (i=0; i<rows; i++) for (j=0; j<cols; j++) out[i][j] = in[i+j*rows];
}

You don't want to be using the "pointer-to-pointer" form. All Cython/NumPy arrays are stored as a single contiguous block of memory, together with a few length parameters that make 2D access possible. You're probably best off replicating the dgesvd wrapper in Cython (to allocate the working arrays, but without the ftoc or ctof conversions).
I've had a go, below, but it's untested so there may be bugs. It's more for the gist of what to do than to be copied outright.
def dgesvd(double [:,:] A):
    """All sizes implicit in A, returns a tuple of U S V"""
    # start by ensuring we have Fortran style ordering
    cdef double[::1, :] A_f = A.copy_fortran()
    # work out the sizes - it's possible I've got this the wrong way round!
    cdef int m = A.shape[0]
    cdef int n = A.shape[1]
    cdef char jobu = 'S'
    cdef char jobvt = 'S'
    cdef double[::1,:] U
    cdef double[::1,:] Vt
    cdef double[::1] S
    cdef double[::1] work
    cdef int minmn, maxmn
    cdef int info, lwork, lda, ldu, ldvt
    if m>=n:
        minmn = n
        maxmn = m
    else:
        minmn = m
        maxmn = n
    lda = m  # leading dimension of A_f
    ldu = m
    # np.empty allocates an array of the given shape
    # (np.array would build an array out of the shape tuple itself)
    U = np.empty((ldu,minmn), order='F')
    ldvt = minmn
    Vt = np.empty((ldvt,n), order='F')
    S = np.empty((minmn,)) # not absolutely sure - check this!
    lwork = 5*maxmn
    work = np.empty((lwork,))
    dgesvd_(&jobu, &jobvt, &m, &n, &A_f[0,0], &lda, &S[0], &U[0,0],
            &ldu, &Vt[0,0], &ldvt, &work[0], &lwork, &info)
    return U, S, Vt.T # transpose Vt on the way out
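For comparison, if the double** signature of the existing C++ wrapper had to be kept, the usual trick is a table of row pointers into the contiguous buffer, rather than a single pointer that gets overwritten in a loop. Below is a minimal C++ sketch of that idea; make_row_pointers is a hypothetical helper and is not part of dgesvd.h or Cython.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build a table of row pointers into a contiguous row-major buffer.
// table[i] points at the first element of row i, so table.data() can be
// passed wherever a double** "array of rows" is expected.
std::vector<double*> make_row_pointers(double* data, std::size_t rows, std::size_t cols) {
    assert(data != nullptr || rows == 0);
    std::vector<double*> table(rows);
    for (std::size_t i = 0; i < rows; ++i) {
        table[i] = data + i * cols;  // no copy: the row aliases the buffer
    }
    return table;
}
```

With this, rows.data()[i][j] aliases data[i*cols + j]. The table has to stay alive for as long as the callee uses the double**.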

The way you call dgesdd is not consistent with its prototype. Apart from that, this should work. See, for instance, this example, that performs the dgemm call from Cython in a similar way.
Also note that SciPy 0.16 will include a Cython API for BLAS/LAPACK, which will probably be the best approach in the future.

Related

how could I change memory layout of multi dimensional arrays with c++ [duplicate]

I have a matrix (relatively big) that I need to transpose. For example assume that my matrix is
a b c d e f
g h i j k l
m n o p q r
I want the result be as follows:
a g m
b h n
c i o
d j p
e k q
f l r
What is the fastest way to do this?
This is a good question. There are many reasons you would want to actually transpose the matrix in memory rather than just swap coordinates, e.g. in matrix multiplication and Gaussian smearing.
First let me list one of the functions I use for the transpose (EDIT: please see the end of my answer where I found a much faster solution)
void transpose(float *src, float *dst, const int N, const int M) {
#pragma omp parallel for
for(int n = 0; n<N*M; n++) {
int i = n/N;
int j = n%N;
dst[n] = src[M*j + i];
}
}
Now let's see why the transpose is useful. Consider matrix multiplication C = A*B. We could do it this way.
for(int i=0; i<N; i++) {
for(int j=0; j<K; j++) {
float tmp = 0;
for(int l=0; l<M; l++) {
tmp += A[M*i+l]*B[K*l+j];
}
C[K*i + j] = tmp;
}
}
That way, however, is going to have a lot of cache misses. A much faster solution is to take the transpose of B first
transpose(B); // B (M x K) is now stored as its K x M transpose
for(int i=0; i<N; i++) {
for(int j=0; j<K; j++) {
float tmp = 0;
for(int l=0; l<M; l++) {
tmp += A[M*i+l]*B[M*j+l]; // row j of the transposed B has M elements
}
C[K*i + j] = tmp;
}
}
transpose(B);
Matrix multiplication is O(n^3) and the transpose is O(n^2), so taking the transpose should have a negligible effect on the computation time (for large n). In matrix multiplication loop tiling is even more effective than taking the transpose but that's much more complicated.
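To make the indexing concrete, here is a small self-contained sketch of the transposed-B multiplication. It assumes row-major storage with A being N x M and B being M x K; the function name matmul_transposed and the use of std::vector are my own choices for illustration. Note that the transposed copy of B has M elements per row, so row j starts at M*j.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// dst (cols x rows) = transpose of src (rows x cols), both row-major.
void transpose(const float* src, float* dst, std::size_t rows, std::size_t cols) {
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            dst[j * rows + i] = src[i * cols + j];
}

// C (N x K) = A (N x M) * B (M x K), reading B through a transposed copy
// so that the inner loop walks both operands with unit stride.
std::vector<float> matmul_transposed(const std::vector<float>& A,
                                     const std::vector<float>& B,
                                     std::size_t N, std::size_t M, std::size_t K) {
    assert(A.size() == N * M && B.size() == M * K);
    std::vector<float> Bt(K * M), C(N * K);
    transpose(B.data(), Bt.data(), M, K);  // Bt is K x M
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < K; ++j) {
            float tmp = 0.0f;
            for (std::size_t l = 0; l < M; ++l)
                tmp += A[M * i + l] * Bt[M * j + l];
            C[K * i + j] = tmp;
        }
    }
    return C;
}
```

Working on a copy also means the original B never has to be transposed back afterwards, at the cost of M*K extra floats.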
I wish I knew a faster way to do the transpose (Edit: I found a faster solution, see the end of my answer). When Haswell/AVX2 comes out in a few weeks it will have a gather function. I don't know if that will be helpful in this case but I could imagine gathering a column and writing out a row. Maybe it will make the transpose unnecessary.
For Gaussian smearing what you do is smear horizontally and then smear vertically. But smearing vertically has the cache problem so what you do is
Smear image horizontally
transpose output
Smear output horizontally
transpose output
Here is a paper by Intel explaining that
http://software.intel.com/en-us/articles/iir-gaussian-blur-filter-implementation-using-intel-advanced-vector-extensions
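The four steps above can be sketched as follows. This is only an illustration under assumptions: the 3-tap [1 2 1]/4 kernel with clamped edges stands in for a real Gaussian, and the names smear_rows, transposed and smear_2d are hypothetical. The point is that the vertical pass is done as a second horizontal pass on the transposed image.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Out-of-place transpose of a rows x cols row-major image.
std::vector<float> transposed(const std::vector<float>& src, std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            dst[j * rows + i] = src[i * cols + j];
    return dst;
}

// Stand-in smoothing kernel [1 2 1]/4 applied along each row, edges clamped.
std::vector<float> smear_rows(const std::vector<float>& src, std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < rows; ++i) {
        const float* r = src.data() + i * cols;
        for (std::size_t j = 0; j < cols; ++j) {
            const float left = r[j == 0 ? 0 : j - 1];
            const float right = r[std::min(j + 1, cols - 1)];
            dst[i * cols + j] = 0.25f * left + 0.5f * r[j] + 0.25f * right;
        }
    }
    return dst;
}

std::vector<float> smear_2d(const std::vector<float>& img, std::size_t rows, std::size_t cols) {
    std::vector<float> h = smear_rows(img, rows, cols);  // 1. smear horizontally
    std::vector<float> t = transposed(h, rows, cols);    // 2. transpose (now cols x rows)
    std::vector<float> v = smear_rows(t, cols, rows);    // 3. smear horizontally again
    return transposed(v, cols, rows);                    // 4. transpose back
}
```

Every pass reads memory with unit stride, which is the cache-friendly property the transpose buys.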
Lastly, what I actually do in matrix multiplication (and in Gaussian smearing) is not take exactly the transpose but take the transpose in widths of a certain vector size (e.g. 4 or 8 for SSE/AVX). Here is the function I use
void reorder_matrix(const float* A, float* B, const int N, const int M, const int vec_size) {
#pragma omp parallel for
for(int n=0; n<M*N; n++) {
int k = vec_size*(n/N/vec_size);
int i = (n/vec_size)%N;
int j = n%vec_size;
B[n] = A[M*i + k + j];
}
}
EDIT:
I tried several functions to find the fastest transpose for large matrices. In the end the fastest result is to use loop blocking with block_size=16 (Edit: I found a faster solution using SSE and loop blocking - see below). This code works for any NxM matrix (i.e. the matrix does not have to be square).
inline void transpose_scalar_block(float *A, float *B, const int lda, const int ldb, const int block_size) {
#pragma omp parallel for
for(int i=0; i<block_size; i++) {
for(int j=0; j<block_size; j++) {
B[j*ldb + i] = A[i*lda +j];
}
}
}
inline void transpose_block(float *A, float *B, const int n, const int m, const int lda, const int ldb, const int block_size) {
#pragma omp parallel for
for(int i=0; i<n; i+=block_size) {
for(int j=0; j<m; j+=block_size) {
transpose_scalar_block(&A[i*lda +j], &B[j*ldb + i], lda, ldb, block_size);
}
}
}
The values lda and ldb are the width of the matrix. These need to be multiples of the block size. To find the values and allocate the memory for e.g. a 3000x1001 matrix I do something like this
#define ROUND_UP(x, s) (((x)+((s)-1)) & -(s))
const int n = 3000;
const int m = 1001;
int lda = ROUND_UP(m, 16);
int ldb = ROUND_UP(n, 16);
float *A = (float*)_mm_malloc(sizeof(float)*lda*ldb, 64);
float *B = (float*)_mm_malloc(sizeof(float)*lda*ldb, 64);
For 3000x1001 this returns ldb = 3008 and lda = 1008
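As a quick check of the rounding arithmetic, here is a function version of the macro. It is a sketch that assumes the block size is a power of two (the bit trick relies on that); round_up and padded_elements are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Round x up to the next multiple of s. Only valid when s is a power of two,
// because -s is then a mask that clears the low bits.
constexpr int round_up(int x, int s) { return (x + (s - 1)) & -s; }

// Number of floats to allocate for a padded n x m matrix (and its transpose).
std::size_t padded_elements(int n, int m, int block) {
    assert(block > 0 && (block & (block - 1)) == 0);  // power of two
    return static_cast<std::size_t>(round_up(n, block)) *
           static_cast<std::size_t>(round_up(m, block));
}

static_assert(round_up(1001, 16) == 1008, "1001 rounds up to 1008");
static_assert(round_up(3000, 16) == 3008, "3000 rounds up to 3008");
```

Values that are already a multiple of the block size are left unchanged, e.g. round_up(1008, 16) stays 1008.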
Edit:
I found an even faster solution using SSE intrinsics:
inline void transpose4x4_SSE(float *A, float *B, const int lda, const int ldb) {
__m128 row1 = _mm_load_ps(&A[0*lda]);
__m128 row2 = _mm_load_ps(&A[1*lda]);
__m128 row3 = _mm_load_ps(&A[2*lda]);
__m128 row4 = _mm_load_ps(&A[3*lda]);
_MM_TRANSPOSE4_PS(row1, row2, row3, row4);
_mm_store_ps(&B[0*ldb], row1);
_mm_store_ps(&B[1*ldb], row2);
_mm_store_ps(&B[2*ldb], row3);
_mm_store_ps(&B[3*ldb], row4);
}
inline void transpose_block_SSE4x4(float *A, float *B, const int n, const int m, const int lda, const int ldb ,const int block_size) {
#pragma omp parallel for
for(int i=0; i<n; i+=block_size) {
for(int j=0; j<m; j+=block_size) {
int max_i2 = i+block_size < n ? i + block_size : n;
int max_j2 = j+block_size < m ? j + block_size : m;
for(int i2=i; i2<max_i2; i2+=4) {
for(int j2=j; j2<max_j2; j2+=4) {
transpose4x4_SSE(&A[i2*lda +j2], &B[j2*ldb + i2], lda, ldb);
}
}
}
}
}
This is going to depend on your application but in general the fastest way to transpose a matrix would be to invert your coordinates when you do a look up, then you do not have to actually move any data.
Some details about transposing 4x4 square float (I will discuss 32-bit integer later) matrices with x86 hardware. It's helpful to start here in order to transpose larger square matrices such as 8x8 or 16x16.
_MM_TRANSPOSE4_PS(r0, r1, r2, r3) is implemented differently by different compilers. GCC and ICC (I have not checked Clang) use unpcklps, unpckhps, unpcklpd, unpckhpd whereas MSVC uses only shufps. We can actually combine these two approaches together like this.
t0 = _mm_unpacklo_ps(r0, r1);
t1 = _mm_unpackhi_ps(r0, r1);
t2 = _mm_unpacklo_ps(r2, r3);
t3 = _mm_unpackhi_ps(r2, r3);
r0 = _mm_shuffle_ps(t0,t2, 0x44);
r1 = _mm_shuffle_ps(t0,t2, 0xEE);
r2 = _mm_shuffle_ps(t1,t3, 0x44);
r3 = _mm_shuffle_ps(t1,t3, 0xEE);
One interesting observation is that two shuffles can be converted to one shuffle and two blends (SSE4.1) like this.
t0 = _mm_unpacklo_ps(r0, r1);
t1 = _mm_unpackhi_ps(r0, r1);
t2 = _mm_unpacklo_ps(r2, r3);
t3 = _mm_unpackhi_ps(r2, r3);
v = _mm_shuffle_ps(t0,t2, 0x4E);
r0 = _mm_blend_ps(t0,v, 0xC);
r1 = _mm_blend_ps(t2,v, 0x3);
v = _mm_shuffle_ps(t1,t3, 0x4E);
r2 = _mm_blend_ps(t1,v, 0xC);
r3 = _mm_blend_ps(t3,v, 0x3);
This effectively converted 4 shuffles into 2 shuffles and 4 blends. This uses 2 more instructions than the implementation of GCC, ICC, and MSVC. The advantage is that it reduces port pressure which may have a benefit in some circumstances.
Currently all the shuffles and unpacks can go only to one particular port whereas the blends can go to either of two different ports.
I tried using 8 shuffles like MSVC and converting that into 4 shuffles + 8 blends but it did not work. I still had to use 4 unpacks.
I used this same technique for an 8x8 float transpose (see towards the end of this answer): https://stackoverflow.com/a/25627536/2542702. In that answer I still had to use 8 unpacks, but I managed to convert the 8 shuffles into 4 shuffles and 8 blends.
For 32-bit integers there is nothing like shufps (except for 128-bit shuffles with AVX512), so it can only be implemented with unpacks, which I don't think can be converted to blends (efficiently). With AVX512, vshufi32x4 acts effectively like shufps except for 128-bit lanes of 4 integers instead of 32-bit floats, so this same technique might be possible with vshufi32x4 in some cases. With Knights Landing, shuffles are four times slower (throughput) than blends.
If the sizes of the arrays are known in advance, then we could use a union to help us. Like this:
#include <bits/stdc++.h>
using namespace std;
union ua{
int arr[2][3];
int brr[3][2];
};
int main() {
union ua uav;
int karr[2][3] = {{1,2,3},{4,5,6}};
memcpy(uav.arr,karr,sizeof(karr));
for (int i=0;i<3;i++)
{
for (int j=0;j<2;j++)
cout<<uav.brr[i][j]<<" ";
cout<<'\n';
}
return 0;
}
Consider each row as a column, and each column as a row .. use j,i instead of i,j
demo: http://ideone.com/lvsxKZ
#include <iostream>
using namespace std;
int main ()
{
char A [3][3] =
{
{ 'a', 'b', 'c' },
{ 'd', 'e', 'f' },
{ 'g', 'h', 'i' }
};
cout << "A = " << endl << endl;
// print matrix A
for (int i=0; i<3; i++)
{
for (int j=0; j<3; j++) cout << A[i][j];
cout << endl;
}
cout << endl << "A transpose = " << endl << endl;
// print A transpose
for (int i=0; i<3; i++)
{
for (int j=0; j<3; j++) cout << A[j][i];
cout << endl;
}
return 0;
}
transposing without any overhead (class not complete):
class Matrix{
double *data; //suppose this will point to data
double _get1(int i, int j){return data[i*M+j];} //used to access normally
double _get2(int i, int j){return data[j*N+i];} //used when transposed
public:
int M, N; //dimensions
double (Matrix::*get_p)(int, int); //pointer to the member function used to access elements
Matrix(int _M,int _N):M(_M), N(_N){
//allocate data
get_p=&Matrix::_get1; // initialised with normal access
}
double get(int i, int j){
//there should be a way to directly use get_p to call. but i think even this
//doesnt incur overhead because it is inline and the compiler should be intelligent
//enough to remove the extra call
return (this->*get_p)(i,j);
}
void transpose(){ //twice transpose gives the original
if(get_p==&Matrix::_get1) get_p=&Matrix::_get2;
else get_p=&Matrix::_get1; // assignment, not comparison
std::swap(M,N); // needs #include <utility>
}
};
can be used like this:
Matrix M(100,200);
double x=M.get(17,45);
M.transpose();
x=M.get(17,45); // = original M(45,17)
of course I didn't bother with the memory management here, which is crucial but different topic.
template <class T>
void transpose( const std::vector< std::vector<T> > & a,
std::vector< std::vector<T> > & b,
int width, int height)
{
for (int i = 0; i < width; i++)
{
for (int j = 0; j < height; j++)
{
b[j][i] = a[i][j];
}
}
}
Modern linear algebra libraries include optimized versions of the most common operations. Many of them include dynamic CPU dispatch, which chooses the best implementation for the hardware at program execution time (without compromising on portability).
This is commonly a better alternative to manually optimizing your functions via vector-extension intrinsics. The latter will tie your implementation to a particular hardware vendor and model: if you decide to swap to a different vendor (e.g. Power, ARM) or to a newer vector extension (e.g. AVX512), you will need to re-implement it to get the most out of them.
MKL transposition, for example, includes the BLAS extensions function imatcopy. You can find it in other implementations such as OpenBLAS as well:
#include <mkl.h>
void transpose( float* a, int n, int m ) {
const char row_major = 'R';
const char transpose = 'T';
const float alpha = 1.0f;
mkl_simatcopy (row_major, transpose, n, m, alpha, a, m, n); // lda = m (row length before), ldb = n (row length after)
}
For a C++ project, you can make use of the Armadillo C++ library:
#include <armadillo>
void transpose( arma::mat &matrix ) {
arma::inplace_trans(matrix);
}
Intel MKL offers both in-place and out-of-place routines for transposing/copying matrices. Here is the link to the documentation. I would recommend trying the out-of-place implementation, as it is faster than the in-place one; also note that the documentation of the latest version of MKL contains some mistakes.
I think the fastest way should take no more than O(n^2) time, and it can be done using just O(1) extra space:
The way to do that is to swap in pairs, because when you transpose a matrix, M[i][j] and M[j][i] trade places. So store M[i][j] in temp, then set M[i][j]=M[j][i], and as the last step set M[j][i]=temp. For a square matrix this can be done in one pass over the elements above the diagonal, so it takes O(n^2).
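A minimal sketch of this pairwise swap, assuming a square n x n matrix stored row-major in a std::vector (the function name is made up for illustration). Only pairs with j > i are visited; looping over all (i, j) would swap every pair twice and undo the transpose.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// In-place transpose of a square n x n row-major matrix using O(1) extra space.
void transpose_square_in_place(std::vector<int>& m, std::size_t n) {
    assert(m.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {   // strictly above the diagonal
            std::swap(m[i * n + j], m[j * n + i]);  // the temp lives inside std::swap
        }
    }
}
```

This does not carry over directly to non-square matrices, where the element positions move in longer cycles rather than simple pairs.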
My answer transposes a 3x3 matrix:
#include <iostream>
using namespace std;
int main()
{
int a[3][3];
cout<<"Give us a 3x3 array and we will print its transpose"<<endl;
for(int i=0;i<3;i++)
{
for(int j=0;j<3;j++)
{
cout<<"Enter a["<<i<<"]["<<j<<"]: ";
cin>>a[i][j];
}
}
cout<<"Matrix you entered is :"<<endl;
for (int e = 0 ; e < 3 ; e++ )
{
for ( int f = 0 ; f < 3 ; f++ )
cout << a[e][f] << "\t";
cout << endl;
}
cout<<"\nTransposed of matrix you entered is :"<<endl;
for (int c = 0 ; c < 3 ; c++ )
{
for ( int d = 0 ; d < 3 ; d++ )
cout << a[d][c] << "\t";
cout << endl;
}
return 0;
}

CUDA: Fill matrix with results of summation

I need to fill a matrix with values returned from function below
__device__ float calc(float *ar, int m, float sum, int i, int j)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < m)
{
ar[idx] = __powf(ar[idx], i + j);
atomicAdd(&sum, ar[idx]);
}
return sum;
}
The matrix is set up as a one-dimensional array and is filled by this function
__global__ void createMatrix(float *A, float *arr, int size)
{
A[threadIdx.y*size + threadIdx.x] = /*some number*/;
}
In theory it should be something like this
__global__ void createMatrix(float *A, float *arr, int size)
{
float sum = 0;
A[threadIdx.y*size + threadIdx.x] = calc(arr, size, sum, threadIdx.x, threadIdx.y);
}
but it doesn't work that way, calc always returns 0. Is there any way I can fill matrix using global function? Thanks in advance.
You're passing sum by value rather than by reference. So all of your atomicAdd()'s have no effect on the zero-initialized value in the kernel.
However, even if you were to pass it by reference, this would still be a poorly-designed kernel. You see, you don't need the atomics if you have a per-thread sum variable (which you do). Also, your calc() function only adds a value once to each sum value, while it seems you expect it to add more than once.
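Both points can be illustrated with a host-side sketch in plain C++ (not CUDA, so it runs anywhere). It assumes the intent is that entry (x, y) of the matrix holds the sum of arr[k]^(x+y) over all k; that reading of the question, and the names add_by_value, add_by_reference and fill_matrix_reference, are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// The parameter is a copy: the caller's variable never changes.
void add_by_value(float sum, float x) { sum += x; (void)sum; }

// The parameter aliases the caller's variable, so the update is visible.
void add_by_reference(float& sum, float x) { sum += x; }

// Reference result: every entry keeps its own local sum (so no atomics are
// needed) and loops over the whole input, which is only read, never modified.
std::vector<float> fill_matrix_reference(const std::vector<float>& arr) {
    const std::size_t size = arr.size();
    std::vector<float> A(size * size);
    for (std::size_t y = 0; y < size; ++y) {
        for (std::size_t x = 0; x < size; ++x) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < size; ++k)
                sum += std::pow(arr[k], static_cast<float>(x + y));
            A[y * size + x] = sum;
        }
    }
    return A;
}
```

In a kernel, the body of the two outer loops would become the per-thread work, with x and y taken from threadIdx.x and threadIdx.y.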

gsl lu decomposition and inversion for float matrix

Due to memory limits, I need to use gsl_matrix_float instead of gsl_matrix, which stores data of type double. However, I want to use gsl_linalg_LU_decomp and gsl_linalg_LU_invert, which only support gsl_matrix, and I did not find any other method in GSL that supports decomposition and inversion for the float version.
Is there any way to solve this dilemma? Or can I only convert from float to double and then back? Thanks in advance!
The best you can probably do is, as you suggest, convert from float to double and back. Here is example code to perform the inversion (only the essential components are given - you have to fill in the blanks):
#include <gsl/gsl_blas.h>
#include <gsl/gsl_linalg.h>
void matrix_invert(gsl_matrix_float *, gsl_matrix_float *, int);
int main()
{
gsl_matrix_float *X = gsl_matrix_float_alloc(N, N);
gsl_matrix_float *invX = gsl_matrix_float_alloc(N, N);
matrix_invert(X, invX, N); // invX = inv(X)
return 0;
}
void matrix_invert(gsl_matrix_float *matrix, gsl_matrix_float *inverse, int N)
{
int i=0,j=0,signum=0;
gsl_matrix *DM = gsl_matrix_alloc(N, N);
gsl_matrix *DM_I = gsl_matrix_alloc(N, N);
for (i=0;i<N;i++)
for (j=0;j<N;j++)
gsl_matrix_set(DM, i, j, gsl_matrix_float_get(matrix,i,j));
gsl_permutation *p = gsl_permutation_alloc(N);
gsl_linalg_LU_decomp(DM, p, &signum);
gsl_linalg_LU_invert(DM, p, DM_I);
gsl_permutation_free(p);
gsl_matrix_free(DM);
for (i=0;i<N;i++)
for (j=0;j<N;j++)
gsl_matrix_float_set(inverse, i, j, gsl_matrix_get(DM_I,i,j));
gsl_matrix_free(DM_I); // release the temporary double-precision inverse
}

c++ error :: EXC_BAD_ACCESS for pointer arrays

I keep getting the error message, exc_bad_access code=1 for my line
asize = *(***(y) + **(y + 1));
in the summation function. I don't quite understand what to do with this error, but I know that it is not a memory leak.
I am trying to get the values stored in the y pointer array, add them, and store it in the variable asize.
void allocArr (int **&x, int ***&y, int **&q, int ****&z)
{
x = new int *[2];
y = new int **(&*x);
q = &*x;
z = new int ***(&q);
}
void putArr(int **&x, int &size1, int &size2)
{
*(x) = *new int* [size1];
*(x + 1) = *new int* [size2];
}
void Input (int **&x, int *&arr, int &size1,int &size2, int a, int b)
{
cout << "Please enter 2 non-negative integer values: "<< endl;
checkVal(size1, a);
checkVal(size2, b);
putArr(x, size1, size2);
arr[0] = size1;
arr[1] = size2;
cout << x[0];
}
void summation(int ***&y, int *&arr)
{
int asize = 0;
asize = *(***(y) + **(y + 1));
**y[2] = *new int [asize];
*(arr + 2) = asize;
}
int main()
{
int size1, size2;
int a = 1, b = 2;
int** x;
int*** y;
int** q;
int**** z;
int *arr = new int [2];
allocArr(x, y, q, z);
Input(x, arr, size1, size2, a, b);
summation(y, arr);
display(z);
}
Thank you for the help. I'm really struggling here...
Not sure how you got started with the code. The code can be simplified quite a bit to help you, and readers of your code, understand what's going on.
Function allocArr
The lines
y = new int **(&*x);
q = &*x;
can be
y = new int **(x); // &*x == x
q = x;
Function putArr
You have the function declaration as:
void putArr(int **&x, int &size1, int &size2)
It can be changed to:
void putArr(int **x, int size1, int size2)
without changing how you are using the variables.
Your code in the function seems strange. Did you mean for x[0] and x[1] to point to an array of size1 and size2 ints, respectively? If you did, the code would be:
x[0] = new int[size1];
x[1] = new int[size2];
If you don't mean the above, it's hard to figure out what you are trying to do with your code.
Function Input
You have the function declaration as:
void Input (int **&x, int *&arr, int &size1,int &size2, int a, int b)
It can be changed to:
void Input (int **x, int *arr, int &size1,int &size2, int a, int b)
without changing how you are using the variables.
You are calling a function checkVal, but your posted code doesn't have that function. It's not clear what that function is doing. You have the line
cout << "Please enter 2 non-negative integer values: "<< endl;
just before the calls to checkVal. Presumably, checkVal reads the input and stores them in size1 in the first call and size2 in the second call. It's not clear how the second argument to checkVal is used.
And then, you have the line:
cout << x[0];
It's not clear what you wish to accomplish from printing an int* to cout. Perhaps it was part of your debugging code. The line doesn't change anything else in the program. It's just strange to see it there.
Function summation
You have the function declaration as:
void summation(int ***&y, int *&arr)
It can be changed to:
void summation(int ***y, int *arr)
without changing how you are using the variables.
In this function, you have the expression:
asize = *(***(y) + **(y + 1));
What do you get when you evaluate ***(y)?
***(y) = **(*y) = **(x) = *(*x) = *(x[0]) = uninitialized value from the line:
x[0] = new int[size1];
You will get unpredictable behavior when you use an uninitialized value.
The second term of the line, **(y + 1) is the worse culprit.
You allocated memory for y as:
y = new int **(&*x);
It's a pointer to a single object of type int**, not an array. y+1 is not a valid pointer. Dereferencing (y+1) leads to undefined behavior. In your case, you are seeing exc_bad_access, which makes sense now since you are accessing memory that is out of bounds.
Since I don't know what you are trying to compute in that expression, it's hard for me to suggest something useful. I hope you have enough to take it from here.
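If the goal is simply two arrays of size1 and size2 elements plus a third whose length is the sum of the two (a guess at the intent, since the question does not spell it out), the same thing can be sketched without any multi-level pointers. ArraySet and make_arrays are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Three owned arrays; std::vector tracks each length, so no separate
// size bookkeeping or pointer-to-pointer indirection is needed.
struct ArraySet {
    std::vector<int> first;
    std::vector<int> second;
    std::vector<int> combined;
};

ArraySet make_arrays(int size1, int size2) {
    assert(size1 >= 0 && size2 >= 0);  // the prompt asks for non-negative values
    ArraySet a;
    a.first.resize(static_cast<std::size_t>(size1));   // zero-initialised
    a.second.resize(static_cast<std::size_t>(size2));
    a.combined.resize(a.first.size() + a.second.size());
    return a;
}
```

Every element here is initialised and every index is inside an allocated array, which removes both sources of undefined behavior described above.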

Weird Behavior when using Eigen

I am writing a wrapper to Eigen for my personal use and I encountered the following weird behavior:
void get_QR(MatrixXd A, MatrixXd& Q, MatrixXd& R) {
HouseholderQR<MatrixXd> qr(A);
Q = qr.householderQ()*(MatrixXd::Identity(A.rows(),A.cols()));
R = qr.matrixQR().block(0,0,A.cols(),A.cols()).triangularView<Upper>();
}
void get_QR(double* A, int m, int n, double*& Q, double*& R) {
// Maps the double to MatrixXd.
Map<MatrixXd> A_E(A, m, n);
// Obtains the QR of A_E.
MatrixXd Q_E, R_E;
get_QR(A_E, Q_E, R_E);
// Maps the MatrixXd to double.
Q = Q_E.data();
R = R_E.data();
}
Below is the test:
int main(int argc, char* argv[]) {
srand(time(NULL));
int m = atoi(argv[1]);
int n = atoi(argv[2]);
// Check the double version.
double* A = new double[m*n];
double* Q;
double* R;
double RANDMAX = double(RAND_MAX);
// Initialize A as a random matrix.
for (int index=0; index<m*n; ++index) {
A[index] = rand()/RANDMAX;
}
get_QR(A, m, n, Q, R);
std::cout << Q[0] << std::endl;
// Check the MatrixXd version.
Map<MatrixXd> A_E(A, m, n);
MatrixXd Q_E(m,n), R_E(n,n);
get_QR(A_E, Q_E, R_E);
std::cout << Q[0] << std::endl;
}
I get different values of Q[0]. For instance, I get "-0.421857" and "-1.49563".
Thanks
The answer of George is correct but suffers from unnecessary copies. A better solution consists in mapping Q and R:
void get_QR(const double* A, int m, int n, double*& Q, double*& R) {
Map<const MatrixXd> A_E(A, m, n);
Map<MatrixXd> Q_E(Q, m, n);
Map<MatrixXd> R_E(R, n, n);
HouseholderQR<MatrixXd> qr(A_E);
Q_E = qr.householderQ()*(MatrixXd::Identity(m,n));
R_E = qr.matrixQR().block(0,0,n,n).triangularView<Upper>();
}
To be able to reuse the get_QR function that takes Eigen objects, use Ref<MatrixXd> instead of MatrixXd:
void get_QR(Ref<const MatrixXd> A, Ref<MatrixXd> Q, Ref<MatrixXd> R) {
HouseholderQR<MatrixXd> qr(A);
Q = qr.householderQ()*(MatrixXd::Identity(A.rows(),A.cols()));
R = qr.matrixQR().block(0,0,A.cols(),A.cols()).triangularView<Upper>();
}
void get_QR(const double* A, int m, int n, double* Q, double* R) {
Map<const MatrixXd> A_E(A, m, n);
Map<MatrixXd> Q_E(Q, m, n);
Map<MatrixXd> R_E(R, n, n);
get_QR(A_E, Q_E, R_E);
}
A Ref<MatrixXd> can wrap any Eigen object that is similar to a MatrixXd without any copy. This includes MatrixXd itself, as well as Map and Block expressions. This requires Eigen 3.2.
I don't think it has anything to do with Eigen.
It looks like you are assigning pointer Q to a memory location belonging to a local variable Q_E
Q = Q_E.data();
which leaks the previous memory allocation
double* Q = new double[m*n];
and is meaningless or undefined outside of the get_QR() function.
You should use memcpy instead:
memcpy(Q, Q_E.data(), m*n*sizeof(double));
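The same pattern without Eigen, as a minimal sketch: a std::vector stands in for the local Q_E, and compute_into is a hypothetical name. The caller owns the output buffer, and the function copies into it before its local storage is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Fills a caller-owned buffer of `count` doubles. The local vector is
// destroyed on return, so its contents are copied out rather than handing
// back local.data(), which would dangle.
void compute_into(double* out, std::size_t count) {
    assert(out != nullptr || count == 0);
    std::vector<double> local(count);
    for (std::size_t i = 0; i < count; ++i) {
        local[i] = 2.0 * static_cast<double>(i);  // placeholder computation
    }
    if (count > 0) {
        std::memcpy(out, local.data(), count * sizeof(double));
    }
}
```

The caller allocates first (for example double* Q = new double[m*n];) and passes the pointer by value, so the allocation is neither leaked nor replaced.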