I perform a BLAS matrix/vector product with this simple code:
#include "mkl.h"
#include <stdio.h>
int main(){
    const int M = 2;
    const int N = 3;
    double *x = new double[N];
    double *A = new double[M*N];
    double *b = new double[M];
    for (int i = 0; i < M; i++){
        b[i] = 0.0; // Not necessary, but anyway...
        for (int j = 0; j < N; j++){
            A[j * M + i] = i + j * 2;
        }
    }
    for (int j = 0; j < N; j++)
        x[j] = j*j;
    const int incr = 1;
    const double alpha = 1.0;
    const double beta = 0.0;
    const char no = 'N';
    dgemv(&no, &M, &N, &alpha, A, &M, x, &incr, &beta, b, &incr);
    printf("b = [%e %e]'\n", b[0], b[1]);
    delete[] x;
    delete[] A;
    delete[] b;
}
While the displayed result is as expected ([18, 23]), Intel Inspector finds one invalid memory access and two invalid partial memory accesses when calling dgemv. The invalid memory access and one of the partial accesses relate to the memory allocated for the vector b; the second partial access relates to the memory allocated for A. I do not get any errors if I use static arrays.
The same thing happens with other MKL functions, such as dgesv, and when I use cblas_dgemv. I am using Intel Inspector XE 2016 and the Intel C++ Compiler 16.0 with sequential MKL.
Is my dgemv call wrong, or is this a false positive? Has anyone else experienced this?
Thanks
EDIT:
As Josh Milthorpe suggested: the error appears only for small arrays, probably because MKL accesses memory in large chunks for efficiency.
I ran several tests: M needs to be at least 20 to avoid the error, while N can be any positive number. I suppose this is not a bug; MKL is simply reading memory outside the space allocated for the matrix without modifying or actually using it.
I have a function that returns a 2D array in C++, as follows:
float** Input_data(float** train_data, int Nv, int N){
    float** x_train = new float*[Nv];
    int a = 0, b = 0;
    for(a = 1; a <= Nv; a++){
        x_train[a] = new float[N+1];
        for(b = 1; b <= N+1; b++){
            if(b == 1){
                x_train[a][b] = 1;
            }else{
                x_train[a][b] = train_data[a][b-1];
            }
        }
        return x_train;
    }
}
The purpose of the above code is to put ones in the first column and copy the remaining data from the train_data pointer into x_train. After processing and using x_train, I try to deallocate it as follows:
void destroyx_array(float** x_train, int Nv){
    for (int free_x = 1; free_x <= Nv; free_x++){
        delete[] x_train[free_x];
    }
    delete[] x_train;
}
and I call the destroy function as follows:
destroyx_array(x_train, Nv);
The Input_data function works fine, but when I try to destroy x_train it gives me "double free or corruption (out): aborted (core dumped)". Can anybody explain what I am doing wrong? Thank you.
Simply put, your code corrupts memory. The best thing is to not use raw pointers and instead use container classes such as std::vector.
Having said that, to fix your current code, the issue is that you're writing beyond the bounds of the memory here:
for(a = 1;a<= Nv;a++)
when a == Nv, you are writing one "row" beyond what was allocated. This looks like an attempt to fake 1-based arrays. Arrays in C++ start at 0, not 1; faking 1-based arrays invariably leads to bugs and memory corruption.
The fix is to rewrite your function to start from 0, not 1, and ensure your loop iterates to n-1, where n is the total number of rows:
for (a = 0; a < Nv; ++a)
the purpose of the above code is to add ones in the first column and
add remaining data from train_data pointer into x_train
Instead of the loop you wrote to test for the first column, you can simplify this by using memcpy:
for (int i = 0; i < Nv; ++i)
{
x_train[i][0] = 1;
memcpy(&x_train[i][1], &train_data[i][0], N * sizeof(float));
}
Thus the entire function would look like this:
float** Input_data(float** train_data, int Nv, int N)
{
    // requires <cstring> for memcpy
    float** x_train = new float*[Nv];
    for (int i = 0; i < Nv; i++)
        x_train[i] = new float[N+1];
    for (int i = 0; i < Nv; i++)
    {
        x_train[i][0] = 1;
        memcpy(&x_train[i][1], &train_data[i][0], N * sizeof(float));
    }
    return x_train;
}
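Since the recommendation at the top is to use container classes, here is a hypothetical std::vector version of the same function (the lowercase name is mine, not from the question); the destroy function disappears entirely because the vectors clean up after themselves:

```cpp
#include <vector>

// Sketch: same behavior as Input_data, but with std::vector so no manual
// delete is needed. Each row is [1, train_data[i][0], ..., train_data[i][N-1]].
std::vector<std::vector<float>>
input_data(const std::vector<std::vector<float>>& train_data, int Nv, int N) {
    std::vector<std::vector<float>> x_train(Nv, std::vector<float>(N + 1));
    for (int i = 0; i < Nv; ++i) {
        x_train[i][0] = 1.0f;                       // column of ones
        for (int j = 0; j < N; ++j)
            x_train[i][j + 1] = train_data[i][j];   // copy remaining data
    }
    return x_train;  // freed automatically when the caller is done with it
}
```

Indexing is 0-based throughout, which removes the out-of-bounds write and the double-free in one stroke.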
I have written a class that is supposed to build a symmetric Toeplitz matrix (see here). The implementation is:
class toeplitz{
private:
    int size;
    double* matrix;
public:
    toeplitz(const double* array, const int dim){
        size = dim;
        matrix = new double(size*size);
        for(int i = 0; i < size; i++){
            for (int j = 0; j < size; j++){
                int index = std::abs(i - j);
                matrix[i*size + j] = array[index];
            }
        }
    }
    ~toeplitz(){
        delete[] matrix;
    }
    void print() const{
        // loop over rows
        for (int i = 0; i < size; i++){
            // loop over columns
            for (int j = 0; j < size; j++){
                double out = matrix[i*size + j];
                std::cout << std::setw(4) << out;
            }
            // start a new line for each row
            std::cout << "\n";
        }
    }
};
I can't see what's wrong with this, but when I try and use this in a simple test function, I get malloc errors. The main function I have is
int main(){
    double array[] = {0,1,1,2};
    int len = sizeof(array)/sizeof(array[0]);
    std::cout << "length of array " << len << std::endl;
    toeplitz tp = toeplitz(array, len);
    tp.print();
}
It compiles and runs when I leave out the tp.print() line, but when I add that line I get the error:
test_toeplitz(8747,0x7fffdbee63c0) malloc: *** error for object 0x7fb119402788:
incorrect checksum for
freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
I cannot figure out why. I've looked at the other questions about this error, but I can't tell how they relate to my code. As I understand it, it has to do with either freeing memory twice or modifying memory after it has been freed, but I can't see where my code does either. Any insight into what's going on would be appreciated.
You stumbled on a classic:
matrix = new double(size*size);
which allocates a single double whose initial value is size*size, when you wanted:
matrix = new double[size*size];
to allocate an array of the proper size. As written, you get undefined behaviour: sometimes it works, sometimes not, depending on the memory configuration.
Since you're using C++, I suggest you use std::vector<double> or an Eigen matrix, and drop C-style arrays forever (no more memory leaks, no more failed allocations, optional bounds checking; only advantages).
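For illustration, here is a minimal sketch of the same class rewritten with std::vector (the `at` accessor is my addition, not part of the original class); the destructor disappears, and the `new double(n)` vs `new double[n]` confusion becomes impossible:

```cpp
#include <cstdlib>
#include <vector>

// Sketch: symmetric Toeplitz matrix backed by std::vector.
// matrix[i*size + j] == array[|i - j|], as in the question.
class toeplitz {
    int size;
    std::vector<double> matrix;  // owns its memory; freed automatically
public:
    toeplitz(const double* array, int dim)
        : size(dim), matrix(static_cast<size_t>(dim) * dim) {
        for (int i = 0; i < size; ++i)
            for (int j = 0; j < size; ++j)
                matrix[i * size + j] = array[std::abs(i - j)];
    }
    double at(int i, int j) const { return matrix[i * size + j]; }
};
```

With `double array[] = {0,1,1,2};`, `toeplitz(array, 4)` gives `at(0,3) == 2` and `at(2,0) == 1`, matching the symmetric Toeplitz structure.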
int newHeight = _height/2;
int newWidth = _width/2;

double*** imageData = new double**[newHeight];
for (int i = 0; i < newHeight; i++)
{
    imageData[i] = new double*[newWidth];
    for (int j = 0; j < newWidth; j++)
    {
        imageData[i][j] = new double[4];
    }
}
I have dynamically allocated this 3D matrix.
What is the fastest and safest way to free the memory here?
Here is what I have done, but it takes a few seconds; my matrix is big (1500 x 2000 x 4):
for (int i = 0; i != _height/2; i++)
{
    for (int j = 0; j != _width/2; j++)
    {
        delete[] imageData[i][j];
    }
    delete[] imageData[i];
}
delete[] imageData;
Update
As suggested I have chosen this solution:
std::vector<std::vector<std::array<double,4>>>
the performance is great for my case
Allocate the entire image as one block so you can free it as one block, i.e. double* imageData = new double[width*height*4]; and later delete[] imageData;, and index into it using offsets. Right now you are making 3 million separate allocations, which thrashes your heap.
I agree with qartar's answer right up until he said "index into it using offsets". That isn't necessary: you can have your single allocation and multiple-subscript access (imageData[i][j][k]) too. I previously showed this method here; it's not difficult to adapt for the 3-D case:
Allocation code as follows:

double*** imageData;
imageData = new double**[width];
imageData[0] = new double*[width * height];
imageData[0][0] = new double[width * height * 4];
for (int i = 0; i < width; i++) {
    if (i > 0) {
        imageData[i] = imageData[i-1] + height;
        imageData[i][0] = imageData[i-1][0] + height * 4;
    }
    for (int j = 1; j < height; j++) {
        imageData[i][j] = imageData[i][j-1] + 4;
    }
}
Deallocation becomes simpler:
delete[] imageData[0][0];
delete[] imageData[0];
delete[] imageData;
Of course, you can and should use std::vector to do the deallocation automatically:
std::vector<double**> imageData(width);
std::vector<double*> imageDataRows(width * height);
std::vector<double> imageDataCells(width * height * 4);
for (int i = 0; i < width; i++) {
    imageData[i] = &imageDataRows[i * height];
    for (int j = 0; j < height; j++) {
        imageData[i][j] = &imageDataCells[(i * height + j) * 4];
    }
}
and deallocation is completely automatic.
See my other answer for more explanation.
Or use std::array<double,4> for the last subscript, and use 2-D dynamic allocation via this method.
A slight variation on the first idea of Ben Voigt's answer:
double ***imagedata = new double**[height];
double **p = new double*[height * width];
double *q = new double[height * width * length];
for (int i = 0; i < height; ++i, p += width) {
    imagedata[i] = p;
    for (int j = 0; j < width; ++j, q += length) {
        imagedata[i][j] = q;
    }
}
// ...
delete[] imagedata[0][0];
delete[] imagedata[0];
delete[] imagedata;
It is possible to do the whole thing with a single allocation, but that would introduce a bit of complexity that you might not want to pay.
Now, given that each table lookup involves a couple of back-to-back pointer reads from memory, this solution will almost always be inferior to allocating a flat array and doing index calculations to convert a triple of indices into one flat index (you should write a wrapper class that does those calculations for you).
The main reason to use arrays of pointers to arrays of pointers to arrays is when your array is ragged (that is, imagedata[a][b] and imagedata[c][d] have different lengths), or perhaps for swapping rows around, such as swap(imagedata[a][b], imagedata[c][d]). Under those circumstances, vector as you've used it is preferable until proven otherwise.
The primary performance killer in your algorithm is the granularity and sheer number of allocations you're making: 3,001,501 in total, broken down as:
1 allocation for 1500 double**
1500 allocations, each of which obtains 2000 double*
3,000,000 allocations, each of which obtains a double[4]
This can be reduced considerably. You can certainly do as others suggest and allocate one massive array of double, leaving the index calculation to accessor functions. If you do that, you need to carry the sizes along for the ride, but the result will easily deliver the fastest allocation time and access performance. A std::vector<double> arr(d1*d2*4); with the offset math done as needed will serve very well.
Another Way
If you are dead set on a pointer-array approach, you can eliminate the 3,000,000 innermost allocations by obtaining both of the inferior dimensions in single allocations. Your most-inferior dimension is fixed (4), so you could do this (but you'll see in a moment there is a much more C++-centric mechanism):
double (**allocPtrsN(size_t d1, size_t d2))[4]
{
    typedef double (*Row)[4];
    Row *res = new Row[d1];
    for (size_t i = 0; i < d1; ++i)
        res[i] = new double[d2][4];
    return res;
}
and simply invoke as:
double (**arr3D)[4] = allocPtrsN(d1,d2);
where d1 and d2 are your two superior dimensions. This produces exactly d1 + 1 allocations: the first holds d1 pointers, and each of the remaining d1 allocations holds one double[d2][4].
Using C++ Standard Containers
The prior code is obviously tedious and frankly error-prone. C++ offers a tidy solution using a vector of vectors of fixed arrays:
std::vector<std::vector<std::array<double,4>>> arr(1500, std::vector<std::array<double,4>>(2000));
Ultimately this uses nearly the same allocation technique as the rather obtuse code shown earlier, but provides all the lovely benefits of the standard library while doing it: the handy members of the std::vector and std::array templates, and RAII as an added bonus.
However, there is one significant difference: the raw pointer methods shown earlier will not value-initialize each allocated entity, while the vector-of-vector-of-array method will. If you think it doesn't make a difference...
#include <iostream>
#include <vector>
#include <array>
#include <chrono>

using Quad = std::array<double, 4>;
using Table = std::vector<Quad>;
using Cube = std::vector<Table>;

Cube allocCube(size_t d1, size_t d2)
{
    return Cube(d1, Table(d2));
}

double ***allocPtrs(size_t d1, size_t d2)
{
    double*** ptrs = new double**[d1];
    for (size_t i = 0; i < d1; i++)
    {
        ptrs[i] = new double*[d2];
        for (size_t j = 0; j < d2; j++)
        {
            ptrs[i][j] = new double[4];
        }
    }
    return ptrs;
}

void freePtrs(double***& ptrs, size_t d1, size_t d2)
{
    for (size_t i = 0; i < d1; ++i)
    {
        for (size_t j = 0; j < d2; ++j)
            delete [] ptrs[i][j];
        delete [] ptrs[i];
    }
    delete [] ptrs;
    ptrs = nullptr;
}

double (**allocPtrsN(size_t d1, size_t d2))[4]
{
    typedef double (*Row)[4];
    Row *res = new Row[d1];
    for (size_t i = 0; i < d1; ++i)
        res[i] = new double[d2][4];
    return res;
}

void freePtrsN(double (**p)[4], size_t d1, size_t d2)
{
    for (size_t i = 0; i < d1; ++i)
        delete [] p[i];
    delete [] p;
}

template<class C>
void print_duration(const std::chrono::time_point<C>& beg,
                    const std::chrono::time_point<C>& end)
{
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - beg).count() << "ms\n";
}

int main()
{
    using namespace std::chrono;
    time_point<system_clock> tp;
    volatile double vd;
    static constexpr size_t d1 = 1500, d2 = 2000;

    tp = system_clock::now();
    for (int i = 0; i < 10; ++i)
    {
        double ***cube = allocPtrs(d1, d2);
        cube[d1/2][d2/2][1] = 1.0;
        vd = cube[d1/2][d2/2][1];
        freePtrs(cube, d1, d2);
    }
    print_duration(tp, system_clock::now());

    tp = system_clock::now();
    for (int i = 0; i < 10; ++i)
    {
        Cube cube = allocCube(d1, d2);
        cube[d1/2][d2/2][1] = 1.0;
        vd = cube[d1/2][d2/2][1];
    }
    print_duration(tp, system_clock::now());

    tp = system_clock::now();
    for (int i = 0; i < 10; ++i)
    {
        auto cube = allocPtrsN(d1, d2);
        cube[d1/2][d2/2][1] = 1.0;
        vd = cube[d1/2][d2/2][1];
        freePtrsN(cube, d1, d2);
    }
    print_duration(tp, system_clock::now());
}
Output
5328ms
418ms
95ms
Thus, if you're planning to load every element with something besides zero anyway, this is something to keep in mind.
Conclusion
If performance were critical, I would use the 24 MB (on my implementation, anyway) single allocation, likely in a std::vector<double> arr(d1*d2*4);, and do the offset calculations as needed using one form of secondary indexing or another. Other answers proffer interesting ideas on this, notably Ben's, which radically reduces the allocation count to a mere three blocks (the data and two secondary pointer arrays). Sorry, I didn't have time to benchmark it, but I suspect the performance would be stellar. If you really want to keep your existing technique, consider doing it with the C++ containers shown above. If the extra cycles spent value-initializing the world aren't too heavy a price to pay, it will be much easier to manage (and obviously less code to deal with in comparison to raw pointers).
Best of luck.
I have this 3D matrix that I allocated as one block of memory, but when I try to write to the darn thing, it gives me a segmentation fault. The approach works fine for two dimensions, but for some reason I'm having trouble with the third... I have no idea where the error in the allocation is; it looks perfect to me.
Here's the code:
phi = new double**[xlength];
phi[0] = new double*[xlength*ylength];
phi[0][0] = new double[xlength*ylength*tlength];
for (int i = 0; i < xlength; i++)
{
    phi[i] = phi[0] + ylength*i;
    for (int j = 0; j < ylength; j++)
    {
        phi[i][j] = phi[i][0] + tlength*j;
    }
}
Any help would be greatly appreciated. (Yes, I want a 3D matrix)
Also, this is where I get the segmentation fault if it matters:
for (int i = 0; i < xlength; i++)
{
    for (int j = 0; j < ylength; j++)
    {
        phi[i][j][1] = 0.1*(4.0*i*h - i*i*h*h)
                          *(2.0*j*h - j*j*h*h);
    }
}
This does work for two dimensions though!
phi = new double*[xlength];
phi[0] = new double[xlength*ylength];
for (int i = 0; i < xlength; i++)
{
    phi[i] = phi[0] + ylength*i;
}
You did not allocate the other submatrices, such as phi[1] or phi[0][1].
You need at least:
phi = new double**[xlength];
for (int i = 0; i < xlength; i++) {
    phi[i] = new double*[ylength];
    for (int j = 0; j < ylength; j++) {
        phi[i][j] = new double[zlength];
        for (int k = 0; k < zlength; k++) phi[i][j][k] = 0.0;
    }
}
and you should consider using std::vector (or even, in C++11, std::array), i.e.
std::vector<std::vector<double> > phi;
and then with std::vector you'll need phi.resize(xlength) and a loop to resize each subelement with phi[i].resize(ylength), etc.
If you want to allocate all the memory at once, you could have
double* phi = new double[xlength*ylength*zlength];
but then you cannot use the phi[i][j][k] notation, so you should
#define inphi(I,J,K) phi[((I)*ylength + (J))*zlength + (K)]
and write inphi(i,j,k) instead of phi[i][j][k]
Your second code does not work either: it is undefined behavior (it doesn't crash because you are lucky; it could crash on other systems), plus a memory leak that hasn't crashed yet (but could later, perhaps even on a re-run of the program). Use a memory-leak detector like valgrind.
CASE1:
int nrows = 5;
int ncols = 10;
int **rowptr;
rowptr = new int*;
for (int rows = 0; rows < nrows; rows++) {
    for (int cols = 0; cols < ncols; cols++) {
        *rowptr = new int;
    }
}
CASE2:
int nrows = 5;
int ncols = 10;
int **rowptr;
for (int rows = 0; rows < nrows; rows++) {
    rowptr = new int*;
    for (int cols = 0; cols < ncols; cols++) {
        *rowptr = new int;
    }
}
I am able to insert and print values using both approaches. What is the difference between these initializations?
What is the difference?
#1 just allocates enough memory to hold a single integer pointer, not an array of integer pointers.
#2 causes a memory leak by simply overwriting the allocation from the previous iteration.
I am able to insert and print values using both the ways
Memory leaks and undefined behavior may not produce immediately observable erroneous results in your program, but they are sure to be good cases of Murphy's Law.
The correct way to do this is:
int nrows = 5;
int ncols = 10;

// Allocate enough memory for an array of integer pointers
int **rowptr = new int*[nrows];

// Loop through the array and create the second dimension
for (int i = 0; i < nrows; i++)
    rowptr[i] = new int[ncols];
You have a memory leak in both cases.
The proper way to initialize such a "2d" array is
int** arr = new int*[nrows];
for (int i = 0; i < nrows; i++)
arr[i] = new int[ncols];
Note, however, that this isn't a 2D array as defined by C/C++. It may not, and probably will not, be contiguous in memory. Also, the generated code for accessing elements is different.
In your case, accessing by index is equivalent to *(*(arr+i)+j), whereas for a true 2D array it's *(arr + N_COLS*i + j), where N_COLS is a compile-time constant.
If you want a true 2d array you should do something like this:
int (*arr)[N_COLS] = (int(*)[N_COLS])(new int[N_ROWS * N_COLS]);
You'd be better off using a 1D array to manage the 2D data:
int **x = new int*[nrows];
x[0] = new int[nrows*ncols];
for (int i = 1; i < nrows; i++)
    x[i] = x[i-1] + ncols;

for (int i = 0; i < nrows; i++)
    for (int j = 0; j < ncols; j++)
        x[i][j] = 0;

delete [] x[0];
delete [] x;