(I didn't write this code. This is an existing piece of code from UC Berkeley's parallel algorithms homework which I should parallelize using Pthreads)
There is this piece of code in my Parallel Algorithms homework's int main() function:
particle_t *particles = (particle_t*) malloc( n * sizeof(particle_t) );
init_particles( n, particles );
vector<particle_t*> *bins = new vector<particle_t*>[numbins];
with particle_t being defined outside of the int main() as:
typedef struct
{
double x;
double y;
double vx;
double vy;
double ax;
double ay;
} particle_t;
and init_particles() as:
void init_particles( int n, particle_t *p )
{
    srand48( time( NULL ) );
    int sx = (int)ceil(sqrt((double)n));
    int sy = (n+sx-1)/sx;
    int *shuffle = (int*)malloc( n * sizeof(int) );
    for( int i = 0; i < n; i++ )
        shuffle[i] = i;
    for( int i = 0; i < n; i++ )
    {
        //
        // make sure particles are not spatially sorted
        //
        int j = lrand48()%(n-i);
        int k = shuffle[j];
        shuffle[j] = shuffle[n-i-1];
        //
        // distribute particles evenly to ensure proper spacing
        //
        p[i].x = size*(1.+(k%sx))/(1+sx);
        p[i].y = size*(1.+(k/sx))/(1+sy);
        //
        // assign random velocities within a bound
        //
        p[i].vx = drand48()*2-1;
        p[i].vy = drand48()*2-1;
    }
    free( shuffle );
}
The piece of code I can't understand is here:
// clear bins at each time step
for (int m = 0; m < numbins; m++)
    bins[m].clear();
// place particles in bins
for (int i = 0; i < n; i++)
    bins[binNum(particles[i],bpr)].push_back(particles + i);
As I understand it, the cells of bins are pointers to particle_t objects, not vectors themselves. Unfortunately, in the first loop the cells of bins are treated as vectors, because the .clear() function is called on them. In the second loop as well, the cells of bins are treated as vectors by calling .push_back() on them. Where is my misunderstanding? The code compiles successfully every time.
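For reference, here is a minimal standalone reconstruction of the declaration in question (not the actual homework code), which may make the two readings easier to compare:

#include <vector>

struct particle_t { double x, y, vx, vy, ax, ay; };

int main() {
    const int numbins = 4;

    // new[] allocates an array of numbins vector<particle_t*> objects and
    // returns a pointer to its first element, so bins[m] is itself a vector;
    // the particle_t pointers are the elements of each vector.
    std::vector<particle_t*>* bins = new std::vector<particle_t*>[numbins];

    particle_t p = {};
    bins[0].push_back(&p);
    bins[0].clear();

    delete[] bins;
    return 0;
}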
Related
I have two 1D arrays in ArrayFire, x and y. I would like to divide every element of x by every element of y and collect the results in a 2D array, i.e. as shown in the following code:
#include <arrayfire.h>
int main(void){
    const size_t x_len = 1024, y_len = 2048;
    af::array x(x_len, f64), y(y_len, f64);
    // Fill x, y with y != 0
    // Now either
    af::array xy(x_len, y_len, f64); // Gives a 2d-array
    for(size_t i = 0; i < x.dims(0); ++i)
        for(size_t j = 0; j < y.dims(0); ++j)
            xy(i, j) = x(i) / y(j);
    // or
    af::array xy = x / y; // Gives a 1d-array
    return 0;
}
The former approach gives me the targeted 2D array; the latter does not (and will crash if x_len != y_len). I could use the loop approach written above, but I assume that it will be significantly slower than specialized commands.
Therefore, are there such commands available in arrayfire, or do I have to use loops?
matmulNT(x, 1/y) multiplies x by the transpose of 1/y, i.e. it forms the outer product, so element (i, j) of the result is x(i) / y(j):
af::array xy = matmulNT(x, 1/y);
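If it helps, here is a self-contained sketch of that approach as I read it (random data stands in for the real vectors, and the 3x3 print at the end is just a sanity check):

#include <arrayfire.h>

int main() {
    const size_t x_len = 1024, y_len = 2048;

    // Random column vectors standing in for the real data; y is kept away from 0.
    af::array x = af::randu(x_len, f64);
    af::array y = af::randu(y_len, f64) + 1.0;

    // matmulNT(a, b) = matmul(a, transpose(b)), so the result is x_len x y_len
    // with element (i, j) equal to x(i) / y(j).
    af::array xy = af::matmulNT(x, 1 / y);

    af_print(xy(af::seq(3), af::seq(3))); // peek at a 3x3 corner
    return 0;
}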
I have a pointer to a 3-dimensional array, like this:
char ***_cube3d
And I am initialising it like this:
_cube3d = (char ***)malloc(size * (sizeof(char**)));
for (int i = 0; i < size; i++) {
    _cube3d[i] = (char **)malloc(size * sizeof(char*));
    for (int j = 0; j < size; j++) {
        _cube3d[i][j] = (char *)malloc(size * sizeof(char));
    }
}
Note that the array is of dynamic size, and can contain thousands of elements, so we cannot declare it as an array in advance.
Now, I want to copy all of its contents into another array, as efficiently as possible. I know the nested-loop solution where we copy each element one by one, but it seems extremely inefficient to me. Is there a way to speed this process up? C++ code is welcome, although I would prefer plain C, since I am planning to integrate this solution into Objective-C, and I would like to avoid injecting C++ code into a clean Objective-C project.
Can anyone point me in the right direction?
Using what you already have (note that the first malloc is fine as written: each element of _cube3d is a char**, so size * sizeof(char**) is the right amount)
You could copy the array by running a bunch of for loops like this:
char new_cube[side][side][side];
for(unsigned int x = 0; x < side; x++)
    for(unsigned int y = 0; y < side; y++)
        for(unsigned int z = 0; z < side; z++)
            new_cube[x][y][z] = old_cube[x][y][z];
OR:
char new_cube[side][side][side];
for(unsigned int x = 0; x < side; x++)
    for(unsigned int y = 0; y < side; y++)
        memcpy(new_cube[x][y], old_cube[x][y], sizeof(char)*side);
which might be a bit faster.
Using this method you avoid any C++ (as you said you would like) and your code complexity is kept minimal.
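If the copy should keep the original char*** layout rather than switch to a contiguous cube, a sketch of a deep copy under that assumption could look like this (the helper name is made up, and error cleanup on partial allocation failure is omitted for brevity):

#include <stdlib.h>
#include <string.h>

/* Deep-copies a size x size x size char cube that was allocated level by
   level, exactly as in the question. */
char ***copy_cube(char ***src, int size)
{
    char ***dst = (char ***)malloc(size * sizeof(char **));
    for (int i = 0; i < size; i++) {
        dst[i] = (char **)malloc(size * sizeof(char *));
        for (int j = 0; j < size; j++) {
            dst[i][j] = (char *)malloc(size * sizeof(char));
            memcpy(dst[i][j], src[i][j], size); /* copy one innermost row */
        }
    }
    return dst;
}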
If you are using C99, you can use a variable length array (VLA) to dynamically allocate your 3-dimensional array. Once side is determined, you can declare your pointer to be:
char (*cube3d_)[side][side];
And then initialize it like this:
cube3d_ = malloc(side * sizeof(*cube3d_));
Note that in C, you are not required to cast the return value of malloc(), and the cast can hide a missing #include <stdlib.h>: without the prototype, older C compilers assume malloc() returns int, and the cast silences the warning that would otherwise catch that. Since the "cube" has been allocated as a contiguous block, it can be copied with a single memcpy().
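As a rough illustration of that last point (sketched here with a flat index and explicit casts so it compiles as both C and C++; with the VLA pointer above, only the pointer type and indexing syntax differ, the single memcpy() is the same):

#include <stdlib.h>
#include <string.h>

int main(void)
{
    const size_t side = 16;

    /* one contiguous block of side*side*side chars, plus a destination */
    char *cube = (char *)malloc(side * side * side);
    char *copy = (char *)malloc(side * side * side);
    if (!cube || !copy) return 1;

    /* element (x, y, z) lives at offset (x*side + y)*side + z */
    cube[(1 * side + 2) * side + 3] = 'a';

    /* because the storage is contiguous, a single memcpy copies the whole cube */
    memcpy(copy, cube, side * side * side);

    free(cube);
    free(copy);
    return 0;
}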
C++ does not have VLA. You can use a vector to get the C++ equivalent of your multi-dynamic allocation structure:
std::vector<std::vector<std::vector<char> > >
cube3d_(side, std::vector<std::vector<char> >(side, std::vector<char>(side)));
You can then copy it using a copy constructor or an assignment.
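For instance, a copy is a one-liner (a small sketch, with side picked arbitrarily):

#include <cstddef>
#include <vector>

int main() {
    const std::size_t side = 3;
    std::vector<std::vector<std::vector<char> > >
        cube3d_(side, std::vector<std::vector<char> >(side, std::vector<char>(side)));
    cube3d_[1][2][2] = 'a';

    // Copy construction (or assignment) deep-copies every level of the structure.
    std::vector<std::vector<std::vector<char> > > cube_copy = cube3d_;
    return 0;
}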
If cube3d_ is a member variable of an object/structure, so long as your object knows the value of side, you can still use a VLA pointer to access the memory. For example:
struct Obj {
    size_t side_;
    void *cube3d_;
};
//...
size_t side = 3;
//...
Obj o;
o.side_ = side;
char (*p)[o.side_][o.side_] = malloc(o.side_ * sizeof(*p));
o.cube3d_ = p;
//...
char (*q)[o.side_][o.side_] = o.cube3d_;
q[1][2][2] = 'a';
Here is an approach using C and structs to provide some degree of object orientation, along with a set of helper functions.
The idea here was to use Kerrick's suggestion of a contiguous array.
I am not sure I got the offset calculation correct, and it has not been tested, so it is worth what you are paying for it. However, it may be helpful as a starting place.
The idea is to have a single contiguous area of memory to make memory management easier, and to use a function to access a particular element with a zero-based offset in the x, y, and z directions. Since I was not sure of the element size/type, I made that a variable as well.
#include <stdlib.h>
#include <string.h>
typedef struct _Array3d {
    int elSize; // size of each element of the array in bytes
    int side;   // length of each side of the 3d cube in elements
    char * (*Access) (struct _Array3d *pObj, int x, int y, int z);
    char buffer[1];
} Array3d;
static char * Array3d_Access (Array3d *pObj, int x, int y, int z)
{
    char *pBuf = NULL;
    if (pObj && x < pObj->side && y < pObj->side && z < pObj->side) {
        // row-major offset: ((x * side + y) * side + z) elements, each elSize bytes
        pBuf = &(pObj->buffer[((x * pObj->side + y) * pObj->side + z) * pObj->elSize]);
    }
    return pBuf;
}
// Create an Array3d cube by specifying the length of each side along with the size of each element.
Array3d *Array3d_Factory (int side, int elSize)
{
    // header plus side*side*side elements of elSize bytes each
    Array3d *pBuffer = malloc (sizeof(Array3d) + side * side * side * elSize);
    if (pBuffer) {
        pBuffer->elSize = elSize;
        pBuffer->side = side;
        pBuffer->Access = Array3d_Access;
    }
    return pBuffer;
}
// Create an Array3d cube that is the same size as an existing Array3d cube.
Array3d *Array3d_FactoryObj (Array3d *pObj)
{
    Array3d *pBuffer = NULL;
    if (pObj) {
        int iBufferSize = pObj->side * pObj->side * pObj->side * pObj->elSize;
        pBuffer = malloc (sizeof(Array3d) + iBufferSize);
        if (pBuffer) {
            pBuffer->elSize = pObj->elSize;
            pBuffer->side = pObj->side;
            pBuffer->Access = pObj->Access;
        }
    }
    return pBuffer;
}
// Duplicate or clone an existing Array3d cube into new one.
// Returns NULL if cloning did not happen.
Array3d *Array3d_Dup (Array3d *pObjDest, Array3d *pObjSrc)
{
    if (pObjSrc && pObjDest && pObjSrc->elSize == pObjDest->elSize && pObjSrc->side == pObjDest->side) {
        int iBufferSize = pObjSrc->side * pObjSrc->side * pObjSrc->side * pObjSrc->elSize;
        memcpy (pObjDest->buffer, pObjSrc->buffer, iBufferSize);
    } else {
        pObjDest = NULL;
    }
    return pObjDest;
}
int main(int argc, char* argv[])
{
    Array3d *pObj = Array3d_Factory (10, 20 * sizeof(char));
    char *pChar = pObj->Access (pObj, 1, 2, 3);
    return 0;
}
What is the fastest way to set a 2-dimensional array of double, such as double x[N][N], to all -1? I tried to use memset, but failed. Any good ideas?
Use std::fill_n from <algorithm>:
std::fill_n(*array, sizeof(array) / sizeof (**array), -1 );
Example:
#include <algorithm>
#include <iostream>

int main()
{
    double array[10][10];
    std::fill_n( *array, sizeof(array) / sizeof(**array), -1.0 );

    // Display the matrix
    for (auto i = 0; i < 10; i++)
    {
        for (auto j = 0; j < 10; j++)
            std::cout << array[i][j] << " ";
        std::cout << std::endl;
    }
}
A simple loop:
#include <stdio.h>
int main(void)
{
#define N 5
    double x[N][N];
    size_t i, n = sizeof(x) / sizeof(double);
    for (i = 0; i < n; i++)
        x[0][i] = -1.0;
    for (i = 0; i < n; i++)
        printf("%zu) %f\n", i, x[0][i]);
}
// create constants
const int rows = 10;
const int columns = 10;
// declare a 2D array
double myArray[rows][columns];
// run a double loop to fill up the array
for (int i = 0; i < rows; i++)
    for (int k = 0; k < columns; k++)
        myArray[i][k] = -1.0;
// print out the results
for (int i = 0; i < rows; i++) {
    for (int k = 0; k < columns; k++)
        cout << myArray[i][k] << " ";
    cout << endl;
}
You can also set it directly:
double x[4][4] = {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1};
if the array is small.
Using std::array and its fill method:
#include <array>
#include <iostream>
int main()
{
    const std::size_t N = 4;
    std::array<double, N*N> arr; // better to keep the memory 1D and access it 2D!
    arr.fill(-1.);
    for (auto element : arr)
        std::cout << element << '\n';
}
Using C++ containers you can use the fill method on each row (the outer array's own fill expects a whole row, not a double):
array<array<double, 1024>, 1024> matrix;
for (auto& row : matrix)
    row.fill(-1.0);
If, for some reason, you have to stick with C-style arrays, you can initialize the first row manually and then memcpy it to the other rows. This works regardless of whether you have defined it as a static array or allocated it row by row.
const int rows = 1024;
const int cols = 1024;
double matrix[rows][cols];
for ( int i = 0; i < cols; ++i)
{
    matrix[0][i] = -1.0;
}
for ( int r = 1; r < rows; ++r)
{
    // use the previous row as source to keep it cache friendly for large matrices
    memcpy(matrix[r], matrix[r-1], cols * sizeof(double));
}
But I would rather try to move from C-style arrays to the C++ containers than do that kind of stunt.
memset shouldn't be used here because it writes the same byte value into every byte, and -1 is not such a repeated pattern: (float)-1 is 0xbf800000 (and (double)-1 is 0xbff0000000000000), so not all of its bytes are the same.
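A small sketch to see this for yourself (the printed byte order assumes a little-endian IEEE 754 machine):

#include <cstdio>
#include <cstring>

int main() {
    double d = -1.0;
    unsigned char bytes[sizeof d];
    std::memcpy(bytes, &d, sizeof d);

    // Prints 00 00 00 00 00 00 f0 bf on little-endian hardware: the bytes
    // differ, so no single memset byte value can produce -1.0.
    for (std::size_t i = 0; i < sizeof d; i++)
        std::printf("%02x ", bytes[i]);
    std::printf("\n");

    // memset does work for an all-zero pattern, which is 0.0 in IEEE 754.
    double x[4][4];
    std::memset(x, 0, sizeof x);
    return 0;
}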
I would use manual filling:
const int m = 1024;
const int n = 1024;
double arr[m][n];
double* p = &arr[0][0]; // view the matrix as one contiguous block of m*n doubles
for (size_t i = 0; i < (size_t)m * n; i++)
    p[i] = -1;
A matrix is laid out like a flat array in memory, so it is better to have one loop; it is slightly faster.
Or you can use this:
std::fill_n(&arr[0][0], m * n, -1.0);
Not sure which one is faster, but both look similar, so you would probably need to run a small test to find out. One more thing: the first version is plain C-style (on some compilers it won't even compile as written), while the second is real C++ and never works in C. So you should choose by the programming language, I think :)
I have a rather unexpected issue with one of my functions. Let me explain.
I'm writing a calibration algorithm and since I want to do some grid search (non-continuous optimization), I'm creating my own mesh - different combinations of probabilities.
The size of the grid and the grid itself are computed recursively (I know...).
So in order:
Get variables
Compute corresponding size recursively
Allocate memory for the grid
Pass the empty grid by reference and fill it recursively
The problem I have is after step 4, once I try to retrieve this grid. During step 4, I print the results to the console to check them and everything is fine. I computed several grids with several variables and they all match the results I'm expecting. However, as soon as the grid is taken out of the recursive function, the last column is filled with 0 (all the previous values in this column, and only this column, are replaced).
I tried allocating one extra column for the grid in step 3, but this only made the problem worse (-3e303 etc. values). I also get the error no matter what size I compute with (very small to very large), so I assume it isn't a memory error (or at least not a 'lack of memory' error). Finally, the two functions used and their call are listed below; this was programmed quickly, so some variables might seem kind of useless, I know. However, I'm always open to your comments (plus I'm no expert in C++, hence this thread).
void size_Grid_Computation(int nVars, int endPoint, int consideredVariable, int * indexes, int &sum, int nChoices)
{
    /** Remember to initialize r at 1 !! - we exclude var_0 and var_(m-1) (first and last variables) in this algorithm **/
    int endPoint2 = 0;
    if (consideredVariable < nVars - 2)
    {
        for (indexes[consideredVariable] = 0; indexes[consideredVariable] < endPoint; indexes[consideredVariable] ++)
        {
            endPoint2 = endPoint - indexes[consideredVariable];
            size_Grid_Computation(nVars, endPoint2, consideredVariable + 1, indexes, sum, nChoices);
        }
    }
    else
    {
        for (int i = 0; i < nVars - 2; i++)
        {
            sum -= indexes[i];
        }
        sum += nChoices;
        return;
    }
}
The above function is for the grid size. Below for the grid itself -
void grid_Creation(double* choicesVector, double** varVector, int consideredVariable, int * indexes, int endPoint, int nVars, int &r)
{
    if (consideredVariable > nVars-1)
        return;
    for (indexes[consideredVariable] = 0; indexes[consideredVariable] < endPoint; indexes[consideredVariable]++)
    {
        if (consideredVariable == nVars - 1)
        {
            double sum = 0.0;
            for (int j = 0; j <= consideredVariable; j++)
            {
                varVector[r][j] = choicesVector[indexes[j]];
                sum += varVector[r][j];
                printf("%lf\t", varVector[r][j]);
            }
            varVector[r][nVars - 1] = 1 - sum;
            printf("%lf row %d\n", varVector[r][nVars - 1], r+1);
            r += 1;
        }
        grid_Creation(choicesVector, varVector, consideredVariable + 1, indexes, endPoint - indexes[consideredVariable], nVars, r);
    }
}
Finally the call
#include <stdio.h>
#include <stdlib.h>
int main()
{
    int nVars = 5;
    int gridPrecision = 3;
    int sum1 = 0;
    int r = 0;
    int size = 0;
    int * index, * indexes;
    index = (int *) calloc(nVars - 1, sizeof(int));
    indexes = (int *) calloc(nVars, sizeof(int));
    for (index[0] = 0; index[0] < gridPrecision + 1; index[0] ++)
    {
        size_Grid_Computation(nVars, gridPrecision + 1 - index[0], 1, index, size, gridPrecision + 1);
    }
    double * Y;
    Y = (double *) calloc(gridPrecision + 1, sizeof(double));
    for (int i = 0; i <= gridPrecision; i++)
    {
        Y[i] = (double) i / (double) gridPrecision;
    }
    double ** varVector;
    varVector = (double **) calloc(size, sizeof(double *));
    for (int i = 0; i < size; i++)
    {
        varVector[i] = (double *) calloc(nVars, sizeof(double *));
    }
    grid_Creation(Y, varVector, 0, indexes, gridPrecision + 1, nVars - 1, r);
    for (int i = 0; i < size; i++)
    {
        printf("%lf\n", varVector[i][nVars - 1]);
    }
}
I left my barbarian printf's in; they help narrow down the problem. Most likely I have forgotten or butchered one memory allocation, but I can't see which one. Anyway, thanks for the help!
It seems to me that you have a fundamental mis-design, namely your 2D array. What you are programming here is not a 2D array but an emulation of one. It only makes sense if you want a sort of sparse data structure where you may leave out parts. In your case it looks as if what you need is just a plain old matrix.
Nowadays it is neither appropriate in C nor in C++ to program like this.
In C, since that seems what you are after, inside functions you declare matrices even with dynamic bounds as
double A[n][m];
If you fear that this could smash your "stack", you may allocate it dynamically
double (*B)[m] = malloc(sizeof(double[n][m]));
You pass such beasts to functions by putting the bounds first in the parameter list
void toto(size_t n, size_t m, double X[n][m]) {
...
}
Once you have clean and readable code, you will find your bug much easier.
I'm currently working on finding the sum of squared distances between two matrices; the data is held in double* arrays. The first matrix stays the same while the other is cycled through using a function that returns a 32x32 tile between two indices.
However, when I try to call getTile(d, e) after the first increment of e, it throws a heap corruption exception:
double* Matrix::ssd(int i, int j, Matrix& rhs){
    double sum = 0, val = 0;
    int g = 0, h = 0;
    double* bestMatch = new double[32*32];
    double* sameTile = new double[32*32];
    double* changeTile = new double[32*32];

    for(int x = i-32; x < i; x++){
        for(int y = j-32; y < j; y++){
            sameTile[g*32+h] = data[x*N+y];
            h++;
        }
        g++; h = 0;
    }
    system("pause");

    for(int d = 32; d <= 512; d += 32){
        for(int e = 32; e <= 512; e += 32){
            changeTile = rhs.getTile(d,e);
            for(int out = 0; out < 32; out++){
                for(int in = 0; in < 32; in++){
                    val = sameTile[out*32+in] - changeTile[out*32+in];
                    val = val*val;
                    sum = sum + val;
                }
            }
            cout << sum << endl;
            sum = 0; val = 0;
            system("pause");
        }
    }
The getTile(int i, int j) function:
double* Matrix::getTile(int i, int j){
    double* tile = new double[32*32];
    int g = 0; int h = 0;
    for(int x = i-32; x < i; x++){
        for(int y = j-32; y < j; y++){
            tile[g*32+h] = data[x*N+y];
            h++;
        }
        cout << endl;
        g++;
    }
    return tile;
}
I believe the error occurs with the allocation of memory for the changeTile pointer?
Any help would be very much appreciated.
There are a bunch of issues in your code all related to improperly accessing array elements.
In the first loop the line:
sameTile[g*32+h] = data[x*N+y];
at the very least underflows the data array. Consider if i=0, j=0, and N=512 then you are trying to access data[-16416] in the first pass of the loop.
The second issue is in the getTile() method, where you forget to reset h to 0 at the end of the inner loop (as you do in the ssd() method). This results in an overflow of tile[].
I would also double-check the line:
changeTile = rhs.getTile(d, e);
and the method getTile() to ensure an array overflow doesn't occur on data[].
Overall I would suggest using proper std:: containers if at all possible. Using them correctly should completely eliminate this type of error. If you really do need to use raw pointers/arrays then you need to make sure all your indexing into them is as clear as possible in addition to bounds checking where needed.
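To illustrate that last suggestion, here is a rough sketch of what a container-based getTile could look like (the names, the row-major layout with N columns, and the assumption of a square N x N matrix all come from my reading of the code in the question):

#include <vector>
#include <stdexcept>

// Copies the 32x32 tile whose bottom-right corner is (i, j) out of a
// row-major N x N matrix stored in a std::vector.
std::vector<double> getTile(const std::vector<double>& data, int N, int i, int j)
{
    if (i < 32 || j < 32 || i > N || j > N)
        throw std::out_of_range("tile indices out of range");

    std::vector<double> tile(32 * 32);
    int g = 0;
    for (int x = i - 32; x < i; x++) {
        int h = 0; // reset h for every row; forgetting this caused the overflow
        for (int y = j - 32; y < j; y++)
            tile[g * 32 + h++] = data[x * N + y];
        g++;
    }
    return tile;
}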