I have two 1D arrays in ArrayFire, x and y. I would like to divide every element of x by every element of y and store the results in a 2D array, as shown in the following code:
#include <arrayfire.h>
int main(void){
const size_t x_len = 1024, y_len = 2048;
af::array x(x_len, f64), y(y_len, f64);
//Fill x, y with y \neq 0
//Now either
af::array xy(x_len, y_len, f64); //Gives a 2d-array
for(size_t i = 0; i < x.dims(0); ++i)
for(size_t j = 0; j < y.dims(0); ++j)
xy(i, j) = x(i) / y(j);
//or
af::array xy = x / y; //Gives a 1d-array
return 0;
}
The former approach gives me the targeted 2D array; the latter does not (and will result in a crash if x_len != y_len). I could use the loop approach written above, but I assume that it will be significantly slower than specialized commands.
Therefore, are such commands available in ArrayFire, or do I have to use loops?
You can treat this as an outer product of x with the element-wise reciprocal of y:
af::array xy = af::matmulNT(x, 1 / y);
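To see why matmulNT(x, 1/y) produces the wanted result: it multiplies the column vector x by the transpose of the reciprocal of y, which is an outer product whose entry (i, j) is x(i) * (1 / y(j)). The sketch below reproduces that arithmetic in plain C++ without ArrayFire. The function name outer_divide and the row-major result layout are assumptions made for illustration only (ArrayFire itself stores arrays column-major).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ reference for the outer product x * (1/y)^T.
// Returns a row-major x.size() by y.size() matrix with
// out[i * ny + j] == x[i] * (1 / y[j]).
// outer_divide is a hypothetical name, not an ArrayFire function.
std::vector<double> outer_divide(const std::vector<double>& x,
                                 const std::vector<double>& y) {
    const std::size_t nx = x.size();
    const std::size_t ny = y.size();
    std::vector<double> out(nx * ny);
    for (std::size_t i = 0; i < nx; ++i) {
        for (std::size_t j = 0; j < ny; ++j) {
            assert(y[j] != 0.0);  // the question assumes y has no zeros
            out[i * ny + j] = x[i] * (1.0 / y[j]);
        }
    }
    return out;
}
```

Calling outer_divide({2.0, 4.0}, {1.0, 2.0, 4.0}) gives a 2 by 3 result, so the two input lengths do not have to match.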
(I didn't write this code. This is an existing piece of code from UC Berkeley's parallel algorithms homework which I should parallelize using Pthreads)
There is this piece of code in my Parallel Algorithms homework's int main() function:
particle_t *particles = (particle_t*) malloc( n * sizeof(particle_t) );
init_particles( n, particles );
vector<particle_t*> *bins = new vector<particle_t*>[numbins];
with particle_t being defined outside of the int main() as:
typedef struct
{
double x;
double y;
double vx;
double vy;
double ax;
double ay;
} particle_t;
and init_particles() as:
void init_particles( int n, particle_t *p )
{
srand48( time( NULL ) );
int sx = (int)ceil(sqrt((double)n));
int sy = (n+sx-1)/sx;
int *shuffle = (int*)malloc( n * sizeof(int) );
for( int i = 0; i < n; i++ )
shuffle[i] = i;
for( int i = 0; i < n; i++ )
{
//
// make sure particles are not spatially sorted
//
int j = lrand48()%(n-i);
int k = shuffle[j];
shuffle[j] = shuffle[n-i-1];
//
// distribute particles evenly to ensure proper spacing
//
p[i].x = size*(1.+(k%sx))/(1+sx);
p[i].y = size*(1.+(k/sx))/(1+sy);
//
// assign random velocities within a bound
//
p[i].vx = drand48()*2-1;
p[i].vy = drand48()*2-1;
}
free( shuffle );
}
The piece of code I can't understand is here
// clear bins at each time step
for (int m = 0; m < numbins; m++)
bins[m].clear();
// place particles in bins
for (int i = 0; i < n; i++)
bins[binNum(particles[i],bpr)].push_back(particles + i);
As I understand it, the elements of bins are pointers to particle_t objects, not vectors. Yet in the first loop the elements of bins are treated as vectors, because .clear() is called on them. The second loop does the same by calling .push_back() on them. Where is my misunderstanding? The code compiles successfully every time.
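One way to read the declaration, offered as a sketch rather than a definitive answer: vector<particle_t*> *bins is a pointer to a vector, and new vector<particle_t*>[numbins] allocates an array of numbins vectors. So bins[m] is itself a vector (whose elements are particle_t pointers), which is why .clear() and .push_back() compile. In the example below, count_binned and the i % numbins rule are hypothetical stand-ins for the homework's binNum logic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct particle_t { double x, y, vx, vy, ax, ay; };

// Hypothetical helper: distributes n particles over numbins bins and
// returns how many pointers were stored in total.
std::size_t count_binned(particle_t* particles, int n, int numbins) {
    assert(numbins > 0);
    // bins points to the first of numbins vectors;
    // each vector holds pointers to particles.
    std::vector<particle_t*>* bins = new std::vector<particle_t*>[numbins];
    for (int i = 0; i < n; ++i) {
        // bins[k] is a std::vector<particle_t*>, so push_back is valid.
        // i % numbins is a stand-in for the real binNum(...) call.
        bins[i % numbins].push_back(particles + i);
    }
    std::size_t total = 0;
    for (int m = 0; m < numbins; ++m) {
        total += bins[m].size();
        bins[m].clear();  // empties one vector, not the whole array
    }
    delete[] bins;
    return total;
}
```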
I am attempting to load in a .mat file containing a tensor of known dimensions in C++; 144x192x256.
I have adjusted the linear index for the read operation to be column major as in MATLAB. However I am still getting memory access issues.
void FeatureLoader::readMat(const std::string &fname, Image< std::vector<float> > *out) {
//Read MAT file.
const char mode = 'r';
MATFile *matFile = matOpen(fname.c_str(), &mode);
if (matFile == NULL) {
throw std::runtime_error("Cannot read MAT file.");
}
//Copy the data from column major to row major storage.
float *newData = newImage->GetData();
const mxArray *arr = matGetVariable(matFile, "map");
if (arr == NULL) {
throw std::runtime_error("Cannot read variable.");
}
double *arrData = (double*)mxGetPr(arr);
#pragma omp parallel for
for (int i = 0; i < 144; i++) {
#pragma omp parallel for
for (int j = 0; j < 192; j++) {
for (int k = 0; k < 256; k++) {
int rowMajIdx = (i * 192 + j) * 256 + k;
int colMajIdx = (j * 144 + i) * 256 + k;
newData[rowMajIdx] = static_cast<float>(arrData[colMajIdx]);
}
}
}
}
In the above snippet, am I right to access the data linearly, as with a flattened 3D array in C++? For example:
idx_row_major = (x*WIDTH + y)*DEPTH + z
idx_col_major = (y*HEIGHT + x)*DEPTH + z
Is this the underlying representation that MATLAB uses?
You have some errors in the row-major and column-major indices. Additionally, naively accessing the data can be very slow because of non-sequential memory access (memory latency is key!).
The best way to pass from MATLAB to C++ types (From 3D to 1D) is following the example below.
In this example we illustrate how to take a double real-type 3D matrix from MATLAB, and pass it to a C double* array.
The main objectives of this example are showing how to obtain data from MATLAB MEX arrays and to highlight some small details in matrix storage and handling.
matrixIn.cpp
#include "mex.h"
void mexFunction(int nlhs , mxArray *plhs[],
int nrhs, mxArray const *prhs[]){
// check number of inputs
if (nrhs!=1) {
mexErrMsgIdAndTxt("matrixIn:InvalidInput", "Invalid number of inputs to MEX file.");
}
// check type and dimensionality of input (the code below reads three dimensions)
if( !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]) || mxGetNumberOfDimensions(prhs[0]) != 3){
mexErrMsgIdAndTxt("matrixIn:InvalidType", "Input matrix must be a 3D double, non-complex array.");
}
// extract the data
double const * const matrixAux= static_cast<double const *>(mxGetData(prhs[0]));
// Get matrix size
const mwSize *sizeInputMatrix= mxGetDimensions(prhs[0]);
// allocate array in C. Note: it's a 1D array, not 3D, even if our input is 3D
double* matrixInC= (double*)malloc(sizeInputMatrix[0] *sizeInputMatrix[1] *sizeInputMatrix[2]* sizeof(double));
// MATLAB is column major, not row major (as C). We need to reorder the numbers
// Basically permutes dimensions
// NOTE: the ordering of the loops is optimized for fastest memory access!
// This improves the speed by about 300%
const int size0 = sizeInputMatrix[0]; // Const makes compiler optimization kick in
const int size1 = sizeInputMatrix[1];
const int size2 = sizeInputMatrix[2];
for (int j = 0; j < size2; j++)
{
int jOffset = j*size0*size1; // this saves re-computation time
for (int k = 0; k < size0; k++)
{
int kOffset = k*size1; // this saves re-computation time
for (int i = 0; i < size1; i++)
{
int iOffset = i*size0;
matrixInC[i + jOffset + kOffset] = matrixAux[iOffset + jOffset + k];
}
}
}
// we are done!
// Use your C matrix here
// free memory
free(matrixInC);
return;
}
The relevant concepts to be aware of:
MATLAB matrices are all 1D in memory, no matter how many dimensions they have when used in MATLAB. This is also true for most (if not all) major matrix representations in C/C++ libraries, as it allows optimization and faster execution.
You need to explicitly copy matrices from MATLAB to C in a loop.
MATLAB matrices are stored in column-major order, as in Fortran, but C/C++ and many other languages are row-major. It is important to permute the input matrix, or else the data will look completely different.
The relevant functions in this example are:
mxIsDouble checks whether the input is of type double.
mxIsComplex checks whether the input is complex.
mxGetData returns a pointer to the real data in the input array, or NULL if there is no real data.
mxGetDimensions returns a pointer to a mwSize array holding the size of each dimension.
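The index arithmetic behind the permutation can be sketched independently of the MEX API. In column-major storage the first index varies fastest, so element (i, j, k) of a rows by cols by depth array sits at i + rows * (j + cols * k); in C-style row-major storage the last index varies fastest, giving (i * cols + j) * depth + k. The helper names below are hypothetical, and the float output type simply mirrors the question's conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major (MATLAB/Fortran) linear index: first index varies fastest.
std::size_t col_major_index(std::size_t i, std::size_t j, std::size_t k,
                            std::size_t rows, std::size_t cols) {
    return i + rows * (j + cols * k);
}

// Row-major (C-style) linear index: last index varies fastest.
std::size_t row_major_index(std::size_t i, std::size_t j, std::size_t k,
                            std::size_t cols, std::size_t depth) {
    return (i * cols + j) * depth + k;
}

// Copies a column-major double buffer into a row-major float buffer.
std::vector<float> to_row_major(const std::vector<double>& src,
                                std::size_t rows, std::size_t cols,
                                std::size_t depth) {
    assert(src.size() == rows * cols * depth);
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            for (std::size_t k = 0; k < depth; ++k)
                dst[row_major_index(i, j, k, cols, depth)] =
                    static_cast<float>(src[col_major_index(i, j, k, rows, cols)]);
    return dst;
}
```

This loop order favors clarity over memory locality; for large arrays the ordering of the loops matters for speed, as noted in the MEX example.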
I have a square matrix double **A
I know how to iterate through this matrix:
for (int i = 0; i < MATRIX_SIZE; i++) {
for (int j = 0; j < MATRIX_SIZE; j ++) {
double val = A[i][j];
printf("val: %f\n", val);
}
}
However, I'm wondering how I can assign an entire row or column to a variable, given that I have this ** matrix. (The pointer-to-pointer notation is still a little confusing to me; I believe it means a list of lists of doubles.)
To add a little more background, I'm trying to extract rows and columns so I can perform a CUDA matrix multiplication. I see a lot of documentation online that uses one-dimensional vectors to represent matrices (i.e. double* A). However, I am getting confused by the **.
A two-dimensional array of doubles (double **) can be looked at as a one-dimensional array of one-dimensional arrays of doubles.
double **arr; // properly initialized
for(int rowNumber = 0; rowNumber < MATRIX_SIZE; ++rowNumber)
{
double *row = arr[rowNumber];
// do something with this row
for(int colNumber = 0; colNumber < MATRIX_SIZE; ++colNumber)
{
double value = row[colNumber];
// do something with value
}
}
In the above example, row is a pointer to a contiguous row of values from the initial array. This works because a two dimensional array is usually allocated like this:
double **arr = new double*[ROW_COUNT];
for(int rowNumber = 0; rowNumber < ROW_COUNT; ++rowNumber)
{
arr[rowNumber] = new double[COL_COUNT];
}
Getting a pointer to a column in the matrix (like we did with row above) is not possible because the values in a column are not contiguous, only the values in each row are contiguous.
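Since a column cannot be addressed through a single pointer, one option is to copy it into its own buffer, or to flatten the whole matrix into one contiguous row-major array of the kind the one-dimensional double* examples use. The following is a minimal sketch; copy_column and flatten are hypothetical helper names, and the matrix is assumed to be fully allocated with the given dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies column `col` of a rows-by-cols pointer-to-pointer matrix
// into a new contiguous vector.
std::vector<double> copy_column(double* const* arr, std::size_t rows,
                                std::size_t cols, std::size_t col) {
    assert(col < cols);
    std::vector<double> out(rows);
    for (std::size_t r = 0; r < rows; ++r)
        out[r] = arr[r][col];  // one element from each separately stored row
    return out;
}

// Flattens the matrix into a single row-major buffer:
// element (r, c) ends up at flat[r * cols + c].
std::vector<double> flatten(double* const* arr, std::size_t rows,
                            std::size_t cols) {
    std::vector<double> flat(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            flat[r * cols + c] = arr[r][c];
    return flat;
}
```

With the flat buffer, flat.data() is a plain double* to contiguous storage, which is the shape a one-dimensional matrix representation expects.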
In C++, you can use std::array
std::array< std::array<int, MATRIX_SIZE>, MATRIX_SIZE> A;
std::array<int, MATRIX_SIZE> ith_row = A[i];
std::array<int, MATRIX_SIZE> &ith_row_ref = A[i];
A[i][j] is a double, but A[i] is a pointer to double, so if you want to assign a row to a variable, you can do this:
for (int i = 0; i < MATRIX_SIZE; i ++) {
double* val = A[i];
for (int j = 0; j < MATRIX_SIZE; j ++) {
printf("%f\n", val[j]);
}
}
but you can't assign a column to a variable this way.
You can assign rows to variables easily but you can't assign columns because of the way that memory is laid out.
You can think of double pointers like this.
The first pointer points to the item to give you the row.
I'm going to make a 4-row, 3-column matrix to show you an example
Theoretical (How you should think about it in your head)
Your first double pointer
p
|
V 0 1 2 <-indexes
0 [p1]->[1,2,3]
1 [p2]->[0,2,1]
2 [p3]->[1,0,3]
3 [p4]->[1,2,0]
which corresponds to the matrix
1,2,3
0,2,1
1,0,3
1,2,0
So you can think about getting the 0 at index (1,0) as
int **p = /* some place that holds the matrix */;
int *row2 = p[1];
int value = row2[0];
The reason why it's not as straightforward as declaring a two-dimensional
array is that when you get the double pointer, you're not sure of the layout of the memory. The numbers could be stored like this
p1 p3 p2 p4
| | | |
[123103021120...] <- //this is basically RAM or "memory"
and you would have no idea as the programmer.
I hope this cleared some things up!
What is the fastest way to set a 2D array of double, such as double x[N][N], all to -1?
I tried to use memset, but it failed. Any good ideas?
Use std::fill_n from <algorithm>:
std::fill_n(*array, sizeof(array) / sizeof (**array), -1 );
Example:
double array[10][10];
std::fill_n( *array, sizeof(array) / sizeof (**array), -1.0 );
//Display Matrix
for(auto i=0;i<10;i++)
{
for(auto j=0;j<10;j++)
cout<<array[i][j]<< " ";
cout<<endl;
}
A simple loop:
#include <stdio.h>
int main(void)
{
#define N 5
double x[N][N];
size_t i, n = sizeof(x) / sizeof(double);
for (i = 0; i < n; i++)
x[0][i] = -1.0;
for (i = 0; i < n; i++)
printf("%zu) %f\n", i, x[0][i]);
}
// create constants
const int rows = 10;
const int columns = 10;
// declare a 2D array
double myArray [rows][columns];
// run a double loop to fill up the array
for (int i = 0; i < rows; i++)
for (int k = 0; k < columns; k++)
myArray[i][k] = -1.0;
// print out the results
for (int i = 0; i < rows; i++) {
for (int k = 0; k < columns; k++)
cout << myArray[i][k];
cout << endl;
}
You can also initialize it directly
double x[4][4] = {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1};
if the array is small.
Using std::array and its fill method:
#include <array>
#include <iostream>
int main()
{
const std::size_t N=4;
std::array<double, N*N> arr; // better to keep the memory 1D and access 2D!
arr.fill(-1.);
for(auto element : arr)
std::cout << element << '\n';
}
Using C++ containers, you can use the fill method on each row (the outer array's fill expects a whole row, not a scalar):
array<array<double, 1024>, 1024> matrix;
for (auto &row : matrix)
row.fill(-1.0);
If, for some reason, you have to stick with C-style arrays, you can initialize the first row manually and then memcpy it to the other rows. This works whether you have defined it as a static array or allocated it row by row.
const int rows = 1024;
const int cols = 1024;
double matrix[rows][cols];
for ( int i=0; i<cols; ++i)
{
matrix[0][i] = -1.0;
}
for ( int r=1; r<rows; ++r)
{
// use the previous row as source to have it cache friendly for large matrices
memcpy(matrix[r], matrix[r-1], cols*sizeof(double));
}
But I would rather move from C-style arrays to C++ containers than do that kind of stunt.
memset shouldn't be used here because it works on raw bytes (it takes a void * and a single byte value), so every byte ends up the same. (float) -1 is 0xbf800000 and (double) -1 is 0xbff0000000000000, so their bytes are not all the same.
I would use manual filling:
const int m = 1024;
const int n = 1024;
double arr[m][n];
double *flat = &arr[0][0];
for (size_t i = 0; i < m*n; i++)
flat[i] = -1;
A matrix is laid out like a 1D array in memory, so a single loop over a pointer to the first element is enough, and it may be slightly faster.
Or you can use this:
std::fill_n(&arr[0][0], m*n, -1.0);
I'm not sure which one is faster; both look similar, so you would need a small test to find out. As far as I know, people use either one. Also, the first version is more C-like, while the second is C++ only and will not compile as C, so choose according to the language you are using.
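The byte-pattern argument against memset can be checked with a short sketch. It assumes IEEE 754 doubles, where filling every byte with 0xFF produces a NaN bit pattern rather than -1.0. The function names and the array size N are chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t N = 4;

// memset writes the same byte value into every byte of the array.
// With 0xFF each double becomes a NaN bit pattern (assuming IEEE 754),
// so the elements do not compare equal to -1.0.
bool memset_gives_minus_one() {
    double x[N][N];
    std::memset(x, 0xFF, sizeof(x));
    return x[0][0] == -1.0;
}

// std::fill_n assigns the double value itself to each element.
// Returns the sum of all elements, which is -(N*N) if the fill worked.
double fill_sum() {
    double x[N][N];
    std::fill_n(&x[0][0], N * N, -1.0);
    assert(x[N - 1][N - 1] == -1.0);  // the last element was reached
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            sum += x[i][j];
    return sum;
}
```

memset remains fine for the value 0.0, whose IEEE 754 representation is all zero bytes; it is only values such as -1.0 that need an element-wise fill.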
I have a rather unexpected issue with one of my functions. Let me explain.
I'm writing a calibration algorithm and since I want to do some grid search (non-continuous optimization), I'm creating my own mesh - different combinations of probabilities.
The size of the grid and the grid itself are computed recursively (I know...).
So in order:
1. Get variables
2. Compute corresponding size recursively
3. Allocate memory for the grid
4. Pass the empty grid by reference and fill it recursively
The problem appears after step 4, once I try to retrieve this grid. During step 4, I print the results to the console to check them and everything is fine. I computed several grids with several variables and they all match the results I'm expecting. However, as soon as the grid is taken out of the recursive function, the last column is filled with 0 (all the earlier values are replaced, in this column only).
I tried allocating one extra column for the grid in step 3 but this only made the problem worse (-3e303 etc. values). Also I have the error no matter what size I compute it with (very small to very large), so I assume it isn't a memory error (or at least a 'lack of memory' error). Finally the two functions used and their call have been listed below, this has been quickly programmed, so some variables might seem kind of useless - I know. However I'm always open to your comments (plus I'm no expert in C++ - hence this thread).
void size_Grid_Computation(int nVars, int endPoint, int consideredVariable, int * indexes, int &sum, int nChoices)
{
/** Remember to initialize r at 1 !! - we exclude var_0 and var_(m-1) (first and last variables) in this algorithm **/
int endPoint2 = 0;
if (consideredVariable < nVars - 2)
{
for (indexes[consideredVariable] = 0; indexes[consideredVariable] < endPoint; indexes[consideredVariable] ++)
{
endPoint2 = endPoint - indexes[consideredVariable];
size_Grid_Computation(nVars, endPoint2, consideredVariable + 1, indexes, sum, nChoices);
}
}
else
{
for (int i = 0; i < nVars - 2; i++)
{
sum -= indexes[i];
}
sum += nChoices;
return;
}
}
The above function is for the grid size. Below for the grid itself -
void grid_Creation(double* choicesVector, double** varVector, int consideredVariable, int * indexes, int endPoint, int nVars, int &r)
{
if (consideredVariable > nVars-1)
return;
for (indexes[consideredVariable] = 0; indexes[consideredVariable] < endPoint; indexes[consideredVariable]++)
{
if (consideredVariable == nVars - 1)
{
double sum = 0.0;
for (int j = 0; j <= consideredVariable; j++)
{
varVector[r][j] = choicesVector[indexes[j]];
sum += varVector[r][j];
printf("%lf\t", varVector[r][j]);
}
varVector[r][nVars - 1] = 1 - sum;
printf("%lf row %d\n", varVector[r][nVars - 1],r+1);
r += 1;
}
grid_Creation(choicesVector, varVector, consideredVariable + 1, indexes, endPoint - indexes[consideredVariable], nVars, r);
}
}
Finally the call
#include <stdio.h>
#include <stdlib.h>
int main()
{
int nVars = 5;
int gridPrecision = 3;
int sum1 = 0;
int r = 0;
int size = 0;
int * index, * indexes;
index = (int *) calloc(nVars - 1, sizeof(int));
indexes = (int *) calloc(nVars, sizeof(int));
for (index[0] = 0; index[0] < gridPrecision + 1; index[0] ++)
{
size_Grid_Computation(nVars, gridPrecision + 1 - index[0], 1, index, size, gridPrecision + 1);
}
double * Y;
Y = (double *) calloc(gridPrecision + 1, sizeof(double));
for (int i = 0; i <= gridPrecision; i++)
{
Y[i] = (double) i/ (double) gridPrecision;
}
double ** varVector;
varVector = (double **) calloc(size, sizeof(double *));
for (int i = 0; i < size; i++)
{
varVector[i] = (double *) calloc(nVars, sizeof(double *));
}
grid_Creation(Y, varVector, 0, indexes, gridPrecision + 1, nVars - 1, r);
for (int i = 0; i < size; i++)
{
printf("%lf\n", varVector[i][nVars - 1]);
}
}
I left my barbarian 'printf', they help narrow down the problem. Most likely, I have forgotten or butchered one memory allocation. But I can't see which one. Anyway, thanks for the help!
It seems to me that you have a fundamental design problem, namely your 2D array. What you are programming here is not a 2D array but an emulation of one. That only makes sense if you want a sort of sparse data structure where you may leave out parts. In your case it looks as if you just need a plain old matrix.
Nowadays it is not appropriate to program like this in either C or C++.
In C, since that seems to be what you are after, you declare matrices inside functions, even with dynamic bounds, as
double A[n][m];
If you fear that this could smash your "stack", you may allocate it dynamically
double (*B)[m] = malloc(sizeof(double[n][m]));
You pass such beasts to functions by putting the bounds first in the parameter list
void toto(size_t n, size_t m, double X[n][m]) {
...
}
Once you have clean and readable code, you will find your bug much easier.
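The declarations above rely on C variable-length arrays, which standard C++ does not provide. A comparable contiguous layout in C++ could be sketched as a small wrapper around one std::vector<double>. The Matrix class and the close_rows helper below are hypothetical illustrations, not part of the question's code; close_rows mimics the grid's rule that the last column is 1 minus the sum of the other columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal contiguous matrix: one allocation, row-major indexing.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);  // catches out-of-range columns early
        return data_[r * cols_ + c];
    }
    double at(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// Sets the last column of each row to 1 minus the sum of the other columns.
void close_rows(Matrix& m) {
    assert(m.cols() > 0);
    for (std::size_t r = 0; r < m.rows(); ++r) {
        double sum = 0.0;
        for (std::size_t c = 0; c + 1 < m.cols(); ++c)
            sum += m.at(r, c);
        m.at(r, m.cols() - 1) = 1.0 - sum;
    }
}
```

Because every access goes through at(), a mismatch between the column count used when filling the grid and the one used when reading it back would trip the assertion instead of silently touching the wrong element.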