How to use MPI_Win_allocate_shared without getting an error? - c++

I have to replicate an algorithm in which I need two buffers (matrices (L+1)x(N+2)) that must be shared across processes (every process must be able to write in them and read what other processes wrote).
I found that one solution could be MPI_Win_allocate_shared; however, I don't think I understood very well how to use it, because I get errors.
Below is my code with the two trials that I think are closest to the solution (I omit the whole algorithm to focus on the problem):
#include "Options.h"
#include <math.h>
#include <array>
#include <algorithm>
#include <memory>
#include <cmath>
#include <mpi.h>
std::pair <double, double> Options::BinomialPriceAmericanPut(void) {
int rank,size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
// shared buffers to save data for seller and buyer
MPI_Win win_seller, win_buyer;
// size of the local window in bytes
MPI_Aint buff_size;
///////////////// TRIAL 1 /////////////////////////
// pointers that will (locally) point to the shared memory
typedef std::array<PWL, N+2> row_type;
row_type *seller_buff;
row_type *buyer_buff;
///////////////// TRIAL 2 /////////////////////////
// pointers that will (locally) point to the shared memory
typedef std::array<PWL, N+2> row_type;
row_type seller_buff[L+1];
row_type buyer_buff[L+1];
// with this TRIAL 2 I'll remove "&"in front of seller_buff and buyer_buff
// in MPI_Win_allocate_shared and MPI_Win_shared_query
// allocate shared memory
if (rank == 0) {
buff_size = (N+2) * (L+1) * sizeof(PWL);
MPI_Win_allocate_shared(buff_size, sizeof(PWL), MPI_INFO_NULL,
MPI_COMM_WORLD, &seller_buff, &win_seller);
MPI_Win_allocate_shared(buff_size, sizeof(PWL), MPI_INFO_NULL,
MPI_COMM_WORLD, &buyer_buff, &win_buyer);
}
else {
int disp_unit;
MPI_Win_allocate_shared(0, sizeof(PWL), MPI_INFO_NULL,
MPI_COMM_WORLD, &seller_buff, &win_seller);
MPI_Win_allocate_shared(0, sizeof(PWL), MPI_INFO_NULL,
MPI_COMM_WORLD, &buyer_buff, &win_buyer);
MPI_Win_shared_query(win_seller, 0, &buff_size, &disp_unit,
&seller_buff);
MPI_Win_shared_query(win_buyer, 0, &buff_size, &disp_unit,
&buyer_buff);
}
// up- and down- move factors
double u = exp( sigma * sqrt(expiry/N) );
// cash accumulation factor
double r = exp( R*expiry / N );
// initialize algorithm
int p(size);
int n = N + 2; // number of nodes in the current base level
int s = rank * ( n/p );
int e = (rank==p-1)? n: (rank+1) * ( n/p );
// each core works on e-s nodes in the current level
// compute u and z for both seller and buyer: payoff (0,0) at time N+1
for (int l=s; l<e; l++) {
const double St = S0*pow (u, N+1-2*l);
const double Sa = St * (1+k);
const double Sb = St * (1-k);
// compute functions
PWL u_s( {Line(-Sa, 0), Line(-Sb,0)} );
PWL u_b( {Line(-Sa, 0), Line(-Sb, 0)} );
// fill buffers
seller_buff[0][l] = u_s;
buyer_buff[0][l] = u_b;
}
MPI_Barrier(MPI_COMM_WORLD);
if (rank == 0 ) {
std::cout << "Row: " << 11 << std::endl
<< "\tAsk = " << seller_buff[0][7].valueInPoint(0) << std::endl
<< "\tBid = " << -buyer_buff[0][7].valueInPoint(0) << std::endl;
}
int U = 0; // variable for the mapping from tree to buffers
int B=N+1; // current base level
while ( B>0 ) {
// DO stuffs with buffers
}
// compute ask and bid prices
double ask(0), bid(0);
// clear shared windows
MPI_Win_free(&win_seller);
MPI_Win_free(&win_buyer);
return std::make_pair(bid, ask);
}
I added the "if" after MPI_Barrier to check whether the buffers work; column 7 (with N=10) is supposed to be computed by rank 1.
TRIAL 1 actually worked with another, simpler class, but not with the PWL class. The errors in the two trials are:
1) In TRIAL 1 I get a segmentation fault caused by the call to valueInPoint() in the "if": rank 0 cannot see what rank 1 wrote in its columns, but I do not understand why.
mpiexec -np 3 main
[localhost:09623] *** Process received signal ***
[localhost:09623] Signal: Segmentation fault (11)
[localhost:09623] Signal code: Address not mapped (1)
[localhost:09623] Failing at address: 0x26fd440
[localhost:09623] [ 0] /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libpthread.so.0(+0x10e20)[0x7fa78a99de20]
[localhost:09623] [ 1] main[0x4048ac]
[localhost:09623] [ 2] main[0x4048f8]
[localhost:09623] [ 3] main[0x401e55]
[localhost:09623] [ 4] main[0x40178e]
[localhost:09623] [ 5] /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libc.so.6(__libc_start_main+0xf0)[0x7fa78a60c6b0]
[localhost:09623] [ 6] main[0x401389]
[localhost:09623] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 9623 on node localhost exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
make: *** [Makefile:26: run] Error 139
2) In TRIAL 2, rank 0 is able to access and print what rank 1 wrote, but I get a different error:
mpiexec -np 3 main
Row: 11
Ask = 0
Bid = -0
[localhost:09651] *** Process received signal ***
[localhost:09651] Signal: Segmentation fault (11)
[localhost:09651] Signal code: Address not mapped (1)
[localhost:09651] Failing at address: 0x7fe9777c90bc
[localhost:09651] [ 0] /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libpthread.so.0(+0x10e20)[0x7fe976627e20]
[localhost:09651] [ 1] /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libc.so.6(cfree+0x14)[0x7fe9762eef74]
[localhost:09651] [ 2] main[0x404724]
[localhost:09651] [ 3] main[0x404096]
[localhost:09651] [ 4] main[0x40362e]
[localhost:09651] [ 5] main[0x403127]
[localhost:09651] [ 6] main[0x40274f]
[localhost:09651] [ 7] main[0x402528]
[localhost:09651] [ 8] main[0x4025e6]
[localhost:09651] [ 9] main[0x402017]
[localhost:09651] [10] main[0x40178e]
[localhost:09651] [11] /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libc.so.6(__libc_start_main+0xf0)[0x7fe9762966b0]
[localhost:09651] [12] main[0x401389]
[localhost:09651] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 9651 on node localhost exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
make: *** [Makefile:26: run] Error 139
Moreover, when I run the whole algorithm (without commenting out the while loop) with TRIAL 2, I get yet another error:
mpiexec -np 3 main
Row: 11
Ask = 0
Bid = -0
*** Error in `main': free(): invalid pointer: 0x00007f2ccac660c4 ***
======= Backtrace: =========
/u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libc.so.6(+0x6f2e4)[0x7f2cc977f2e4]
/u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libc.so.6(+0x74d16)[0x7f2cc9784d16]
/u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libc.so.6(+0x754fe)[0x7f2cc97854fe]
main[0x405dd6]
main[0x40556c]
main[0x404930]
main[0x404429]
main[0x40396f]
main[0x404eb7]
main[0x404334]
main[0x403833]
main[0x4023ee]
main[0x40178e]
/u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libc.so.6(__libc_start_main+0xf0)[0x7f2cc97306b0]
main[0x401389]
======= Memory map: ========
00400000-0040e000 r-xp 00000000 00:25 659 /vagrant/Google Drive/ACP/TENTATIVO4/main
0060d000-0060e000 r--p 0000d000 00:25 659 /vagrant/Google Drive/ACP/TENTATIVO4/main
0060e000-0060f000 rw-p 0000e000 00:25 659 /vagrant/Google Drive/ACP/TENTATIVO4/main
00fac000-01239000 rw-p 00000000 00:00 0 [heap]
7f2cb0000000-7f2cb0021000 rw-p 00000000 00:00 0
7f2cb0021000-7f2cb4000000 ---p 00000000 00:00 0
7f2cb7fff000-7f2cc0000000 rw-s 00000000 fd:00 202783851 /tmp/openmpi-sessions-vagrant#localhost_0/63096/1/shared_mem_pool.localhost (deleted)
7f2cc0000000-7f2cc0021000 rw-p 00000000 00:00 0
7f2cc0021000-7f2cc4000000 ---p 00000000 00:00 0
7f2cc48f1000-7f2cc4cf2000 rw-s 00000000 fd:00 135477240 /tmp/openmpi-sessions-vagrant#localhost_0/63096/1/2/vader_segment.localhost.2
7f2cc4cf2000-7f2cc50f3000 rw-s 00000000 fd:00 68300033 /tmp/openmpi-sessions-vagrant#localhost_0/63096/1/1/vader_segment.localhost.1
7f2cc50f3000-7f2cc54f4000 rw-s 00000000 fd:00 1474379 /tmp/openmpi-sessions-vagrant#localhost_0/63096/1/0/vader_segment.localhost.0
7f2cc54f4000-7f2cc54ff000 r-xp 00000000 fd:00 2626640 /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libnss_files-2.23.so
7f2cc54ff000-7f2cc56fe000 ---p 0000b000 fd:00 2626640 /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libnss_files-2.23.so
7f2cc56fe000-7f2cc56ff000 r--p 0000a000 fd:00 2626640 /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libnss_files-2.23.so
7f2cc56ff000-7f2cc5700000 rw-p 0000b000 fd:00 2626640 /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libnss_files-2.23.so
7f2cc5700000-7f2cc5706000 rw-p 00000000 00:00 0
7f2cc5706000-7f2cc5707000 ---p 00000000 00:00 0
7f2cc5707000-7f2cc5f07000 rw-p 00000000 00:00 0 [stack:9979]
7f2cc5f07000-7f2cc5f2b000 r-xp 00000000 fd:00 4240669 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/liblzma.so.5.2.2
7f2cc5f2b000-7f2cc612b000 ---p 00024000 fd:00 4240669 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/liblzma.so.5.2.2
7f2cc612b000-7f2cc612c000 r--p 00024000 fd:00 4240669 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/liblzma.so.5.2.2
7f2cc612c000-7f2cc612d000 rw-p 00025000 fd:00 4240669 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/liblzma.so.5.2.2
7f2cc612d000-7f2cc6142000 r-xp 00000000 fd:00 1363668 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libz.so.1.2.8
7f2cc6142000-7f2cc6341000 ---p 00015000 fd:00 1363668 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libz.so.1.2.8
7f2cc6341000-7f2cc6342000 r--p 00014000 fd:00 1363668 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libz.so.1.2.8
7f2cc6342000-7f2cc6343000 rw-p 00015000 fd:00 1363668 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libz.so.1.2.8
7f2cc6343000-7f2cc7bbf000 r--p 00000000 fd:00 1549735 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libicudata.so.57.1
7f2cc7bbf000-7f2cc7dbe000 ---p 0187c000 fd:00 1549735 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libicudata.so.57.1
7f2cc7dbe000-7f2cc7dbf000 r--p 0187b000 fd:00 1549735 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libicudata.so.57.1
7f2cc7dbf000-7f2cc7f4d000 r-xp 00000000 fd:00 1549736 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libicuuc.so.57.1
7f2cc7f4d000-7f2cc814d000 ---p 0018e000 fd:00 1549736 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libicuuc.so.57.1
7f2cc814d000-7f2cc815f000 r--p 0018e000 fd:00 1549736 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libicuuc.so.57.1
7f2cc815f000-7f2cc8160000 rw-p 001a0000 fd:00 1549736 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libicuuc.so.57.1
7f2cc8160000-7f2cc8162000 rw-p 00000000 00:00 0
7f2cc8162000-7f2cc83c3000 r-xp 00000000 fd:00 1549762 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libicui18n.so.57.1
7f2cc83c3000-7f2cc85c3000 ---p 00261000 fd:00 1549762 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libicui18n.so.57.1
7f2cc85c3000-7f2cc85d0000 r--p 00261000 fd:00 1549762 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libicui18n.so.57.1
7f2cc85d0000-7f2cc85d2000 rw-p 0026e000 fd:00 1549762 /u/sw/pkgs/toolchains/gcc-glibc/5/base/lib/libicui18n.so.57.1
7f2cc85d2000-7f2cc85d3000 rw-p 00000000 00:00 0
7f2cc85d3000-7f2cc85d5000 r-xp 00000000 fd:00 2590182 /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libdl-2.23.so
[localhost:09977] *** Process received signal ***
[localhost:09977] Signal: Aborted (6)
[localhost:09977] Signal code: (-6)
[localhost:09977] [ 0] /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libpthread.so.0(+0x10e20)[0x7f2cc9ac1e20]
[localhost:09977] [ 1] /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libc.so.6(gsignal+0x38)[0x7f2cc9743228]
[localhost:09977] [ 2] /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libc.so.6(abort+0x16a)[0x7f2cc97446aa]
[localhost:09977] [ 3] /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libc.so.6(+0x6f2e9)[0x7f2cc977f2e9]
[localhost:09977] [ 4] /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libc.so.6(+0x74d16)[0x7f2cc9784d16]
[localhost:09977] [ 5] /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libc.so.6(+0x754fe)[0x7f2cc97854fe]
[localhost:09977] [ 6] main[0x405dd6]
[localhost:09977] [ 7] main[0x40556c]
[localhost:09977] [ 8] main[0x404930]
[localhost:09977] [ 9] main[0x404429]
[localhost:09977] [10] main[0x40396f]
[localhost:09977] [11] main[0x404eb7]
[localhost:09977] [12] main[0x404334]
[localhost:09977] [13] main[0x403833]
[localhost:09977] [14] main[0x4023ee]
[localhost:09977] [15] main[0x40178e]
[localhost:09977] [16] /u/sw/pkgs/toolchains/gcc-glibc/5/prefix/lib/libc.so.6(__libc_start_main+0xf0)[0x7f2cc97306b0]
[localhost:09977] [17] main[0x401389]
[localhost:09977] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 9977 on node localhost exited on signal 6 (Aborted).
--------------------------------------------------------------------------
make: *** [Makefile:26: run] Error 134
Please could someone help me to understand what's going on and how to fix it?? Thank you all.

Related

AVX intrinsics for tiled matrix multiplication [closed]

I was trying to use AVX intrinsics to vectorize my (tiled) matrix multiplication loop. I used __m256d variables to hold intermediate results and store them into my result matrix. However, this somehow triggers memory corruption. I have no idea why, as the non-AVX version works fine. Another weird thing is that the tile size now somehow affects the result.
The matrix structs are attached in the following code section. The function takes two matrix pointers, m1 and m2, and an integer tileSize. Thanks to @harold's feedback, I've now replaced the _mm256_load_pd for matrix m1 with a broadcast. However, the memory corruption problem still persists. I've also attached the output of the memory corruption below.
__m256d rResult, rm1, rm2, rmult;
for (int bi = 0; bi < result->row; bi += tileSize) {
for (int bj = 0; bj < result->col; bj += tileSize) {
for (int bk = 0; bk < m1->col; bk += tileSize) {
for (int i = 0; i < tileSize; i++ ) {
for (int j = 0; j < tileSize; j+=4) {
rResult = _mm256_setzero_pd();
for (int k = 0; k < tileSize; k++) {
// result->val[bi+i][bj+j] += m1.val[bi+i][bk+k]*m2.val[bk+k][bj+j];
rm1 = _mm256_broadcast_pd((__m128d const *) &m1->val[bi+i][bk+k]);
rm2 = _mm256_load_pd(&m2->val[bk+k][bj+j]);
rmult = _mm256_mul_pd(rm1,rm2);
rResult = _mm256_add_pd(rResult,rmult);
_mm256_store_pd(&result->val[bi+i][bj+j],rResult);
}
}
}
}
}
}
return result;
*** Error in `./matrix': free(): invalid next size (fast): 0x0000000001880910 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81609)[0x2b04a26d0609]
./matrix[0x4016cc]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b04a2671495]
./matrix[0x400e29]
======= Memory map: ========
00400000-0040c000 r-xp 00000000 00:2c 6981358608 /home/matrix
0060b000-0060c000 r--p 0000b000 00:2c 6981358608 /home/matrix
0060c000-0060d000 rw-p 0000c000 00:2c 6981358608 /home/matrix
01880000-018a1000 rw-p 00000000 00:00 0 [heap]
2b04a1f13000-2b04a1f35000 r-xp 00000000 00:16 12900 /usr/lib64/ld-2.17.so
2b04a1f35000-2b04a1f3a000 rw-p 00000000 00:00 0
2b04a1f4e000-2b04a1f52000 rw-p 00000000 00:00 0
2b04a2134000-2b04a2135000 r--p 00021000 00:16 12900 /usr/lib64/ld-2.17.so
2b04a2135000-2b04a2136000 rw-p 00022000 00:16 12900 /usr/lib64/ld-2.17.so
2b04a2136000-2b04a2137000 rw-p 00000000 00:00 0
2b04a2137000-2b04a2238000 r-xp 00000000 00:16 13188 /usr/lib64/libm-2.17.so
2b04a2238000-2b04a2437000 ---p 00101000 00:16 13188 /usr/lib64/libm-2.17.so
2b04a2437000-2b04a2438000 r--p 00100000 00:16 13188 /usr/lib64/libm-2.17.so
2b04a2438000-2b04a2439000 rw-p 00101000 00:16 13188 /usr/lib64/libm-2.17.so
2b04a2439000-2b04a244e000 r-xp 00000000 00:16 12867 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2b04a244e000-2b04a264d000 ---p 00015000 00:16 12867 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2b04a264d000-2b04a264e000 r--p 00014000 00:16 12867 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2b04a264e000-2b04a264f000 rw-p 00015000 00:16 12867 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2b04a264f000-2b04a2811000 r-xp 00000000 00:16 13172 /usr/lib64/libc-2.17.so
2b04a2811000-2b04a2a11000 ---p 001c2000 00:16 13172 /usr/lib64/libc-2.17.so
2b04a2a11000-2b04a2a15000 r--p 001c2000 00:16 13172 /usr/lib64/libc-2.17.so
2b04a2a15000-2b04a2a17000 rw-p 001c6000 00:16 13172 /usr/lib64/libc-2.17.so
2b04a2a17000-2b04a2a1c000 rw-p 00000000 00:00 0
2b04a2a1c000-2b04a2a1e000 r-xp 00000000 00:16 13184 /usr/lib64/libdl-2.17.so
2b04a2a1e000-2b04a2c1e000 ---p 00002000 00:16 13184 /usr/lib64/libdl-2.17.so
2b04a2c1e000-2b04a2c1f000 r--p 00002000 00:16 13184 /usr/lib64/libdl-2.17.so
2b04a2c1f000-2b04a2c20000 rw-p 00003000 00:16 13184 /usr/lib64/libdl-2.17.so
2b04a4000000-2b04a4021000 rw-p 00000000 00:00 0
2b04a4021000-2b04a8000000 ---p 00000000 00:00 0
7ffc8448e000-7ffc844b1000 rw-p 00000000 00:00 0 [stack]
7ffc845ed000-7ffc845ef000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted
That code loads a small row vector from m1 and a small row vector from m2 and multiplies them, which is not how matrix multiplication works; I assume it's a direct vectorization of the identical scalar loop. You can use a broadcast-load from m1; that way the product with the row vector from m2 results in a row vector of the result, which is convenient (the other way around, broadcasting from m2, you get a column vector of the result, which is tricky to store, unless of course you use the column-major matrix layout).
Never resetting rResult is also wrong, and takes extra care when using tiling, because the tiling means that individual results are put aside and then picked up again later. It's convenient to implement C += A*B because then you don't have to distinguish between the second time that a result is worked on (loading rResult back out of the result matrix) and the first time that a result is worked on (either zeroing the accumulator, or if you implement C += A*B, then it's also just loading it out of the result).
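To make the C += A*B idea concrete, here is a minimal scalar sketch (no intrinsics). The Mat type and the tiled_multiply_add name are hypothetical and not taken from the question; the sketch also assumes a contiguous row-major layout rather than an array of row pointers. Each accumulator is loaded from C, updated over one k-tile, and stored back, so the first and later visits to an element are handled the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix stored in one contiguous buffer.
struct Mat {
    std::size_t rows, cols;
    std::vector<double> v;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), v(r * c, 0.0) {}
    double& at(std::size_t i, std::size_t j) { return v[i * cols + j]; }
    double at(std::size_t i, std::size_t j) const { return v[i * cols + j]; }
};

// Computes C += A * B using square tiles of edge length `tile`.
// Tile edges are clamped with std::min, so sizes need not be multiples of `tile`.
void tiled_multiply_add(const Mat& A, const Mat& B, Mat& C, std::size_t tile) {
    assert(tile > 0);
    assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
    for (std::size_t bi = 0; bi < C.rows; bi += tile) {
        const std::size_t iEnd = std::min(bi + tile, C.rows);
        for (std::size_t bj = 0; bj < C.cols; bj += tile) {
            const std::size_t jEnd = std::min(bj + tile, C.cols);
            for (std::size_t bk = 0; bk < A.cols; bk += tile) {
                const std::size_t kEnd = std::min(bk + tile, A.cols);
                for (std::size_t i = bi; i < iEnd; ++i) {
                    for (std::size_t j = bj; j < jEnd; ++j) {
                        // Pick the partial sum back up instead of zeroing it.
                        double acc = C.at(i, j);
                        for (std::size_t k = bk; k < kEnd; ++k)
                            acc += A.at(i, k) * B.at(k, j);
                        // Store once per k-tile, after the inner loop.
                        C.at(i, j) = acc;
                    }
                }
            }
        }
    }
}
```

Because the function accumulates, the caller is responsible for zeroing C first when a plain product is wanted.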
There are some performance bugs:
Using one accumulator. This limits the inner loop to run once every 4 cycles (Skylake) in the long term, because of the loop-carried dependency through the addition (or FMA). There should be 2 FMAs per cycle but that way there would be one FMA every 4 cycles, 1/8th speed.
Using a 2:1 load-to-FMA ratio (assuming the mul+add is contracted), it needs to be 1:1 or better to avoid getting bottlenecked by load throughput. A 2:1 ratio is limited to half speed.
The solution for both of them is multiplying a small column vector from m1 with a small row vector from m2 in the inner loop, summing into a small matrix of accumulators rather than just one of them. For example if you use a 3x16 region (3x4 vectors, with a vector length of 4 and the vectors corresponding to loads from m2, from m1 you would do broadcast-loads), then there are 12 accumulators, and therefore 12 independent dependency chains: enough to hide the high latency-throughput product of FMA (2 per cycle, but 4 cycles long on Skylake, so you need at least 8 independent chains, and at least 10 on Haswell). It also means there are 7 loads and 12 FMAs in the inner loop, even better than 1:1, it can even support Turbo frequencies without overclocking the cache.
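The accumulator-block idea can be modelled in plain scalar C++ before writing any intrinsics. The sketch below is only an illustration under assumptions: the names micro_kernel, MR and NR are made up, all three matrices are taken to be row-major with explicit leading dimensions, and the block is the 3x16 region mentioned above. In an AVX version each group of 4 adjacent columns of acc would be one __m256d, the read of A would be a broadcast-load, and the innermost loop would become 4 vector FMAs.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t MR = 3;   // rows in the accumulator block
constexpr std::size_t NR = 16;  // columns in the block (4 vectors of 4 doubles)

// Updates one MR x NR block: C_block += A_block (MR x K) * B_block (K x NR).
// lda, ldb, ldc are the row strides (in elements) of the full matrices.
void micro_kernel(const double* A, std::size_t lda,
                  const double* B, std::size_t ldb,
                  double* C, std::size_t ldc, std::size_t K) {
    assert(A != nullptr && B != nullptr && C != nullptr);
    double acc[MR][NR];

    // Load the block of C once: 12 vector accumulators in the SIMD version.
    for (std::size_t i = 0; i < MR; ++i)
        for (std::size_t j = 0; j < NR; ++j)
            acc[i][j] = C[i * ldc + j];

    for (std::size_t k = 0; k < K; ++k) {
        for (std::size_t i = 0; i < MR; ++i) {
            const double a = A[i * lda + k];       // one broadcast-load per row
            for (std::size_t j = 0; j < NR; ++j)   // 4 independent vector FMAs
                acc[i][j] += a * B[k * ldb + j];
        }
    }

    // Store the block back once, after the whole k loop.
    for (std::size_t i = 0; i < MR; ++i)
        for (std::size_t j = 0; j < NR; ++j)
            C[i * ldc + j] = acc[i][j];
}
```

Per k iteration this touches 3 elements of A and one 16-element row of B, which corresponds to the 7 loads and 12 FMAs counted above once the row of B is read as 4 vectors.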
I would also like to note that setting the tile size the same in every dimension is not necessarily the best. Maybe it is, but probably not; the dimensions do act a little differently.
A more advanced performance issue:
Not re-packing tiles. This means tiles will span more pages than necessary, which hurts the effectiveness of the TLB. You can easily get into a situation where your tiles fit in the cache, but are spread over too many pages to fit in the TLB. TLB thrashing is not good.
Using asymmetric tile sizes you can arrange for either m1 tiles or m2 tiles to be TLB-friendly, but not both at the same time.
If you care about performance, normally you want one contiguous chunk of memory, not an array of pointers to rows.
Anyway, you're probably reading off the end of a row if your tile size isn't a multiple of 4 doubles per vector. Or if your rows or cols aren't a multiple of the tile size, then you need to stop after the last full tile, and write cleanup code for the end.
e.g. bi < result->row - (tileSize-1) for the outer loops
If your tile size isn't a multiple of 4, then you'd also need i < tileSize-3. But hopefully you are doing power-of-2 loop tiling / cache blocking. But you'd want a size - 3 boundary for vector cleanup in a partial tile. Then probably scalar cleanup for the last few elements. (Or if you can use an unaligned final vector that ends at the end of a row, that can work, maybe with masked loads/stores. But trickier for matmul than for algorithms that just make a single pass.)
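The boundary handling described above can be sketched on a single row in scalar C++. This is an assumed stand-in, not the question's code: axpy_row and W are hypothetical names, and the inner lane loop plays the role of one 4-wide vector load, multiply-add and store. With unsigned indices the condition is written as j + W <= n, which is the same bound as j < n - 3 but cannot underflow when n is smaller than 3.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t W = 4;  // doubles per 256-bit vector

// out[j] += a * in[j] for j in [0, n), processed W elements at a time.
void axpy_row(double a, const double* in, double* out, std::size_t n) {
    assert(in != nullptr && out != nullptr);
    std::size_t j = 0;

    // Main loop: runs only while a full group of W elements remains,
    // so a vector access here would never reach past the end of the row.
    for (; j + W <= n; j += W)
        for (std::size_t lane = 0; lane < W; ++lane)
            out[j + lane] += a * in[j + lane];

    // Scalar cleanup for the last n % W elements.
    for (; j < n; ++j)
        out[j] += a * in[j];
}
```

The same pattern applies to the tile loops: iterate over full tiles only, then handle the leftover rows and columns separately.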

C++11 Bi-dimensional Array Destructor Segmentation Fault

This problem is blowing my mind... My experience with C++ (really, with any OO programming paradigm) is very limited, and I can't find the solution to my problem. Why does my destructor cause trouble in the overloaded copy assignment?
Any help is appreciated, have a good day
matrice.h
#ifndef MATRICE_H_
#define MATRICE_H_
typedef double tipoelem;
class matrice {
public:
matrice(int, int, tipoelem inizializzatore = 0); /* constructor */
~matrice(void);
tipoelem leggiMatrice(int, int);
void scriviMatrice(int, int, tipoelem);
void prodottoScalare(tipoelem);
matrice matriceTrasposta(void);
matrice matriceProdotto(matrice& M);
matrice& operator=(const matrice&);
void rand(void);
void stampa(void);
private:
int righe;
int colonne;
tipoelem **elementi;
};
#endif /* MATRICE_H_ */
matrice.cpp
#include "matrice.h"
#include <stdlib.h>
#include <iostream>
// constructor
matrice::matrice(int r, int c, tipoelem inizializzatore){
this->colonne = c;
this->righe = r;
// dynamic allocation of the matrix
elementi = new tipoelem*[righe];
for (auto i=0; i!=righe; i++)
this->elementi[i] = new tipoelem[colonne];
// initialization of the elements
for (auto i=0; i!=righe; i++)
for(auto j=0; j!=colonne; j++)
this->elementi[i][j] = inizializzatore;
}
matrice::~matrice(void){
for(auto j=0;j<this->colonne;++j){
delete[] this->elementi[j];
}
delete[] this->elementi;
}
tipoelem matrice::leggiMatrice(int i, int j){
return elementi[i][j];
}
void matrice::scriviMatrice(int i, int j, tipoelem scrittura){
elementi[i][j] = scrittura;
return;
}
void matrice::prodottoScalare(tipoelem scalare){
for(auto i = 0; i<righe;i++)
for(auto j = 0; j<colonne;j++)
elementi[i][j]=elementi[i][j]*scalare;
return;
}
matrice matrice::matriceTrasposta(void){
matrice trasposta(colonne, righe);
for(auto i=0; i<righe;i++)
for(auto j=0; j<colonne;j++)
trasposta.scriviMatrice(j,i,leggiMatrice(i,j));
return trasposta;
}
matrice matrice::matriceProdotto(matrice& M){
matrice prodotto(righe, colonne);
for(auto i=0; i<righe;i++)
for(auto j=0; j<righe;j++)
prodotto.scriviMatrice(i,j,(matrice::leggiMatrice(i,j)*M.leggiMatrice(i,j)));
return prodotto;
}
matrice& matrice::operator=(const matrice &m){
if(this != &m){
if(colonne != m.colonne || righe != m.righe){
this->~matrice();
this->righe = m.righe;
this->colonne = m.colonne;
matrice(righe,colonne);
}
for(auto i=0;i!=righe;i++)
for(auto j=0;j!=colonne;j++)
elementi[i][j] = m.elementi[i][j];
}
return (*this);
}
void matrice::rand(void){
for(auto i=0; i<righe;i++)
for(auto j=0;j<colonne;j++)
matrice::scriviMatrice(i,j,(random() % 100));
return;
}
void matrice::stampa(void){
for(auto i=0; i<righe;i++){
for(auto j=0; j<colonne;j++)
std::cout << elementi[i][j] << " ";
std::cout << std::endl;
}
}
TestMatrice.cpp (for testing purposes)
#include <iostream>
#include "matrice.h"
int main(void){
matrice A(3,2), T(2,3);
A.rand();
std::cout <<"Stampa A" << std::endl;
A.stampa();
std::cout << "Stampa Trasposta T" << std::endl;
T = A.matriceTrasposta();
T.stampa();
std::cout << std::endl;
std::cout << "Stampa B" << std::endl;
matrice B(4,4);
B.stampa();
std::cout << "Stampa copia t in b" << std::endl;
B = T;
B.stampa();
return (0);
}
Thanks
P.s.
Console output and debugging info:
Stampa A
83 86
77 15
93 35
Stampa Trasposta T
83 77 93
86 15 35
Stampa B
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
Stampa copia t in b
83 77 93
86 15 35
*** Error in `/home/ibanez89/UniBa/Workspace/[ASD 2015]/[ASD]Esercitazione_lab3/Debug/[ASD]Esercitazione_lab3': free(): invalid pointer: 0x0000000001f63e10 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x72055)[0x7fb1eb9c6055]
/usr/lib/libc.so.6(+0x779a6)[0x7fb1eb9cb9a6]
/usr/lib/libc.so.6(+0x7818e)[0x7fb1eb9cc18e]
/home/ibanez89/UniBa/Workspace/[ASD 2015]/[ASD]Esercitazione_lab3/Debug/[ASD]Esercitazione_lab3[0x400e82]
/home/ibanez89/UniBa/Workspace/[ASD 2015]/[ASD]Esercitazione_lab3/Debug/[ASD]Esercitazione_lab3[0x400c3e]
/usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7fb1eb974610]
/home/ibanez89/UniBa/Workspace/[ASD 2015]/[ASD]Esercitazione_lab3/Debug/[ASD]Esercitazione_lab3[0x400a09]
======= Memory map: ========
00400000-00402000 r-xp 00000000 08:03 14945604 /home/ibanez89/UniBa/Workspace/[ASD 2015]/[ASD]Esercitazione_lab3/Debug/[ASD]Esercitazione_lab3
00601000-00602000 rw-p 00001000 08:03 14945604 /home/ibanez89/UniBa/Workspace/[ASD 2015]/[ASD]Esercitazione_lab3/Debug/[ASD]Esercitazione_lab3
01f52000-01f84000 rw-p 00000000 00:00 0 [heap]
7fb1e4000000-7fb1e4021000 rw-p 00000000 00:00 0
7fb1e4021000-7fb1e8000000 ---p 00000000 00:00 0
7fb1eb954000-7fb1ebaef000 r-xp 00000000 08:02 1051822 /usr/lib/libc-2.22.so
7fb1ebaef000-7fb1ebcee000 ---p 0019b000 08:02 1051822 /usr/lib/libc-2.22.so
7fb1ebcee000-7fb1ebcf2000 r--p 0019a000 08:02 1051822 /usr/lib/libc-2.22.so
7fb1ebcf2000-7fb1ebcf4000 rw-p 0019e000 08:02 1051822 /usr/lib/libc-2.22.so
7fb1ebcf4000-7fb1ebcf8000 rw-p 00000000 00:00 0
7fb1ebcf8000-7fb1ebd0e000 r-xp 00000000 08:02 1052090 /usr/lib/libgcc_s.so.1
7fb1ebd0e000-7fb1ebf0d000 ---p 00016000 08:02 1052090 /usr/lib/libgcc_s.so.1
7fb1ebf0d000-7fb1ebf0e000 rw-p 00015000 08:02 1052090 /usr/lib/libgcc_s.so.1
7fb1ebf0e000-7fb1ec00b000 r-xp 00000000 08:02 1051873 /usr/lib/libm-2.22.so
7fb1ec00b000-7fb1ec20a000 ---p 000fd000 08:02 1051873 /usr/lib/libm-2.22.so
7fb1ec20a000-7fb1ec20b000 r--p 000fc000 08:02 1051873 /usr/lib/libm-2.22.so
7fb1ec20b000-7fb1ec20c000 rw-p 000fd000 08:02 1051873 /usr/lib/libm-2.22.so
7fb1ec20c000-7fb1ec37e000 r-xp 00000000 08:02 1061398 /usr/lib/libstdc++.so.6.0.21
7fb1ec37e000-7fb1ec57e000 ---p 00172000 08:02 1061398 /usr/lib/libstdc++.so.6.0.21
7fb1ec57e000-7fb1ec588000 r--p 00172000 08:02 1061398 /usr/lib/libstdc++.so.6.0.21
7fb1ec588000-7fb1ec58a000 rw-p 0017c000 08:02 1061398 /usr/lib/libstdc++.so.6.0.21
7fb1ec58a000-7fb1ec58e000 rw-p 00000000 00:00 0
7fb1ec58e000-7fb1ec5b0000 r-xp 00000000 08:02 1051821 /usr/lib/ld-2.22.so
7fb1ec763000-7fb1ec769000 rw-p 00000000 00:00 0
7fb1ec7ad000-7fb1ec7af000 rw-p 00000000 00:00 0
7fb1ec7af000-7fb1ec7b0000 r--p 00021000 08:02 1051821 /usr/lib/ld-2.22.so
7fb1ec7b0000-7fb1ec7b1000 rw-p 00022000 08:02 1051821 /usr/lib/ld-2.22.so
7fb1ec7b1000-7fb1ec7b2000 rw-p 00000000 00:00 0
7ffdcc751000-7ffdcc772000 rw-p 00000000 00:00 0 [stack]
7ffdcc794000-7ffdcc796000 r--p 00000000 00:00 0 [vvar]
7ffdcc796000-7ffdcc798000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
The issue has to do with your assignment operator. You are explicitly calling the destructor on this line:
this->~matrice();
You should not call the destructor explicitly unless you're using placement-new. What happens is that when the "normal" destructor is called as the object goes out of scope, you deallocate the same pointer value twice, which is undefined behavior.
If we combine the answers given to you for the destructor fixes, then your assignment operator can be written easily. However, your code is missing a user defined copy constructor. Following the Rule of 3, if you have a user-defined destructor, then a user-defined copy constructor and assignment operator should exist.
So let's define a copy constructor for your class:
First, the class is missing this:
matrice(const matrice& rhs);
We want to copy the rhs to the new object. The implementation can look like this:
matrice::matrice(const matrice& rhs) : righe(rhs.righe),
colonne(rhs.colonne),
elementi(new tipoelem*[rhs.righe])
{
for (auto i = 0; i != righe; i++)
this->elementi[i] = new tipoelem[colonne];
for (auto i = 0; i != righe; i++)
for (auto j = 0; j != colonne; j++)
this->elementi[i][j] = rhs.elementi[i][j];
}
So we copy the dimensions from rhs, allocate the matrix, and then copy the matrix values from rhs to this.
Once we have this, then the assignment operator can be written easily using the Copy / Swap Idiom:
matrice& matrice::operator=(const matrice &m)
{
matrice temp(m);
std::swap(temp.colonne, colonne);
std::swap(temp.righe, righe);
std::swap(temp.elementi, elementi);
return *this;
}
This works by creating a temporary copy of m (the copy constructor does this job) and then swapping the internals of the existing object with those of the temporary copy. Since the temporary object now holds the old data, that old data gets deleted with it when it dies (at the end of the function). That, in a nutshell, is what copy / swap does and why you must have a working copy constructor and destructor to use it.
Once we have this, and combine the fixes from the other answer for the destructor, we see that the code now does not crash:
Full Example
Please Note: I used a constant 100.0 instead of rand(), as your code does not compile due to random returning void.
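The linked example is not reproduced here. As a rough, self-contained sketch of how the pieces fit together (destructor looping over the rows, copy constructor, copy / swap assignment), something like the following could be used. The class name Matrix and its members are illustrative choices, not the original matrice class, and the sketch does not try to be exception-safe if an allocation fails halfway through construction.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols, double init = 0.0)
        : rows_(rows), cols_(cols), data_(new double*[rows]) {
        for (std::size_t i = 0; i < rows_; ++i) {
            data_[i] = new double[cols_];
            for (std::size_t j = 0; j < cols_; ++j) data_[i][j] = init;
        }
    }

    // Copy constructor: allocate with the same shape, then copy the values.
    Matrix(const Matrix& rhs) : Matrix(rhs.rows_, rhs.cols_) {
        for (std::size_t i = 0; i < rows_; ++i)
            for (std::size_t j = 0; j < cols_; ++j) data_[i][j] = rhs.data_[i][j];
    }

    // Copy / swap: the temporary takes the old storage and frees it on exit.
    Matrix& operator=(const Matrix& rhs) {
        Matrix temp(rhs);
        std::swap(rows_, temp.rows_);
        std::swap(cols_, temp.cols_);
        std::swap(data_, temp.data_);
        return *this;
    }

    // One delete[] per allocated row, so the loop runs over rows, not columns.
    ~Matrix() {
        for (std::size_t i = 0; i < rows_; ++i) delete[] data_[i];
        delete[] data_;
    }

    double& at(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i][j];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    double** data_;
};
```

Assigning a 3x2 matrix to a 4x4 one then reallocates through the temporary, and self-assignment is harmless because the copy is made before anything is swapped.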
You mixed up colonne with righe:
matrice::~matrice(void){
for(auto j=0;j<this->righe;++j){
// ^^^^^^
delete[] this->elementi[j];
}
delete[] this->elementi;
}
... and there too:
matrice matrice::matriceProdotto(matrice& M){
matrice prodotto(righe, colonne);
for(auto i=0; i<righe;i++)
for(auto j=0; j<colonne;j++)
// ^^^^^^^
prodotto.scriviMatrice(i,j,(matrice::leggiMatrice(i,j)*M.leggiMatrice(i,j)));
return prodotto;
}

Double free or corruption: C++

So I am getting a memory error (a double free) from my code:
*** glibc detected *** ./KalmanFiltering: double free or corruption (!prev): 0x00000000015af7b0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7eb96)[0x7f0897395b96]
./KalmanFiltering[0x40654d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f089733876d]
./KalmanFiltering[0x4012b9]
======= Memory map: ========
00400000-00415000 r-xp 00000000 00:15 6312794 /home/iggy/Dropbox/Documents/Research_Work/SimpleHealth/KalmanFilter/KalmanFilter_C++/cmpfit-1.2/KalmanFiltering
00614000-00615000 r--p 00014000 00:15 6312794 /home/iggy/Dropbox/Documents/Research_Work/SimpleHealth/KalmanFilter/KalmanFilter_C++/cmpfit-1.2/KalmanFiltering
00615000-00616000 rw-p 00015000 00:15 6312794 /home/iggy/Dropbox/Documents/Research_Work/SimpleHealth/KalmanFilter/KalmanFilter_C++/cmpfit-1.2/KalmanFiltering
015ae000-01641000 rw-p 00000000 00:00 0 [heap]
7f0897317000-7f08974cc000 r-xp 00000000 08:01 421630 /lib/x86_64-linux-gnu/libc-2.15.so
7f08974cc000-7f08976cc000 ---p 001b5000 08:01 421630 /lib/x86_64-linux-gnu/libc-2.15.so
7f08976cc000-7f08976d0000 r--p 001b5000 08:01 421630 /lib/x86_64-linux-gnu/libc-2.15.so
7f08976d0000-7f08976d2000 rw-p 001b9000 08:01 421630 /lib/x86_64-linux-gnu/libc-2.15.so
7f08976d2000-7f08976d7000 rw-p 00000000 00:00 0
7f08976d7000-7f08976ec000 r-xp 00000000 08:01 395568 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f08976ec000-7f08978eb000 ---p 00015000 08:01 395568 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f08978eb000-7f08978ec000 r--p 00014000 08:01 395568 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f08978ec000-7f08978ed000 rw-p 00015000 08:01 395568 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f08978ed000-7f08979e8000 r-xp 00000000 08:01 422139 /lib/x86_64-linux-gnu/libm-2.15.so
7f08979e8000-7f0897be7000 ---p 000fb000 08:01 422139 /lib/x86_64-linux-gnu/libm-2.15.so
7f0897be7000-7f0897be8000 r--p 000fa000 08:01 422139 /lib/x86_64-linux-gnu/libm-2.15.so
7f0897be8000-7f0897be9000 rw-p 000fb000 08:01 422139 /lib/x86_64-linux-gnu/libm-2.15.so
7f0897be9000-7f0897ccb000 r-xp 00000000 08:01 531352 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16
7f0897ccb000-7f0897eca000 ---p 000e2000 08:01 531352 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16
7f0897eca000-7f0897ed2000 r--p 000e1000 08:01 531352 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16
7f0897ed2000-7f0897ed4000 rw-p 000e9000 08:01 531352 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16
7f0897ed4000-7f0897ee9000 rw-p 00000000 00:00 0
7f0897ee9000-7f0897f0b000 r-xp 00000000 08:01 422388 /lib/x86_64-linux-gnu/ld-2.15.so
7f08980a3000-7f08980ea000 rw-p 00000000 00:00 0
7f0898107000-7f089810b000 rw-p 00000000 00:00 0
7f089810b000-7f089810c000 r--p 00022000 08:01 422388 /lib/x86_64-linux-gnu/ld-2.15.so
7f089810c000-7f089810e000 rw-p 00023000 08:01 422388 /lib/x86_64-linux-gnu/ld-2.15.so
7fffc58ae000-7fffc58cf000 rw-p 00000000 00:00 0 [stack]
7fffc59af000-7fffc59b0000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted (core dumped)
Running where in gdb I get:
#0 0x00007ffff723e425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff7241b8b in __GI_abort () at abort.c:91
#2 0x00007ffff727c39e in __libc_message (do_abort=2, fmt=0x7ffff7386748 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:201
#3 0x00007ffff7286b96 in malloc_printerr (action=3, str=0x7ffff7386858 "double free or corruption (!prev)", ptr=<optimized out>) at malloc.c:5039
#4 0x000000000040654d in main (argc=6, argv=0x7fffffffe0c8) at KalmanFiltering.cpp:828
where 828 is the return line in my main() function.
The code in question is:
int main(){
...
EKSmoothParams *EKParams = new EKSmoothParams;
prepareSmoother(optPar, ECGsd, peaks, phase, x, fs, EKParams);
delete EKParams;
return 0;
}
void prepareSmoother(vector<double> optPar, vector<double> ECGsd, vector<double> peaks, vector<double> phase, vector<double> x, double fs, EKSmoothParams *params){
const int N = PEAK_NUM; // number of Gaussian kernels
vector<int> JJ;
JJ.reserve(peaks.size());
for(int i = 0; i < peaks.size(); i++){
if(peaks.at(i) != 0)
JJ.push_back(i);
}
vector<double> fm; // heart-rate
fm.reserve(JJ.size()-1);
for(int i = 0; i < JJ.size()-1; i++){
fm.push_back(fs/(JJ.at(i+1)-JJ.at(i)));
}
vector<double> twoPiFm = fm;
for(int i = 0; i < fm.size(); i++)
twoPiFm[i] = 2*PI*fm.at(i);
double w = calculateMean(twoPiFm); // average heart-rate in rads.
double wsd = sd2(twoPiFm); // heart-rate standard deviation in rads.
params->X0[0][0] = 1.0;
params->X0[0][1] = -PI;
params->X0[1][0] = 1.0;
params->X0[1][1] = 0.0;
params->P0[0][0] = pow(2*PI,2);
params->P0[0][1] = 0.0;
params->P0[1][0] = 2.0;
params->P0[1][1] = 10*pow(findAbsMax(x),2.0);
vector<double> diagonal(3*N+2, 0.0);
for(int i = 0; i < N; i++)
diagonal[i] = pow(0.1*optPar.at(i),2.0);
for(int i = N; i < 3*N; i++)
diagonal[i] = pow(0.5,2.0);
diagonal[3*N] = pow(wsd,2.0);
vector<double>::const_iterator first = ECGsd.begin();
vector<double>::const_iterator last = ECGsd.begin() + round(ECGsd.size()/10.0);
vector<double> ECGsdPartial(first, last);
displayVector(ECGsd);
diagonal[3*N+1] = pow(0.05*calculateMean(ECGsdPartial), 2.0);
for(int i = 0; i < diagonal.size(); i++)
params->Q[i][i] = diagonal[i];
params->R[0][0] = pow(w/fs,2)/12.0;;
params->R[0][1] = 0.0;
params->R[1][0] = 0.0;
params->R[1][1] = pow(calculateMean(ECGsdPartial), 2.0);
for(int i = 0; i < optPar.size(); i++){
params->wMean[i] = optPar.at(i);
params->inits[i] = optPar.at(i);
}
params->wMean[N*3] = w;
params->wMean[N*3+1] = 0;
params->inits[N*3] = w;
params->inits[N*3+1] = fs;
params->vMean[0] = 0.0;
params->vMean[1] = 0.0;
params->inovWlen = round(0.5*fs+0.5);
params->tau = 0;
params->gamma = 1;
params->rAdaptWlen = round(fs/2.0 + 0.5);
params->flag = 1;
for(int i = 0; i < 2; i++){
for(int j = 0; j < 2; j++){
cout << params->R[i][j] << " ";
}
}
}
And the struct is statically allocated:
struct EKSmoothParams {
int tau;
int gamma;
int flag;
int inovWlen;
int rAdaptWlen;
double wMean[3*PEAK_NUM+2];
double vMean[2];
double inits[3*PEAK_NUM+2];
double X0[2][2];
double P0[2][2];
double Q[3*PEAK_NUM+2][3*PEAK_NUM+2];
double R[2][2];
};
where:
#define MEAN_PEAK_NUM_HIGH 3
#define MEAN_PEAK_NUM_LOW 3
#define PEAK_NUM MEAN_PEAK_NUM_HIGH+MEAN_PEAK_NUM_LOW
Any help would be greatly appreciated. I thought that if a struct was statically allocated then only calling
delete struct_name;
would delete the reference. Thanks for the help!
UPDATE:
So I ran valgrind and it told me to remove the delete EKParams. I removed the delete EKParams line and ran it again as:
valgrind --leak-check=full ./KalmanFiltering x.txt pphase.txt phase.txt opt.txt peaks.txt
and the output I got was:
==32755== Memcheck, a memory error detector
==32755== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==32755== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==32755== Command: ./KalmanFiltering x.txt pphase.txt phase.txt opt.txt peaks.txt
==32755==
0.117777 0.102793 0.0911142 0.107277 0.109 0.126729 0.115012 0.109627 0.131102 0.100467 0.0822639 0.122442 0.0908527 0.116644 0.108093 0.104796 0.12665 0.102979 0.0999146 0.119981 0.107912 0.122379 0.113098 0.0889197 0.106954 0.101472 0.125473 0.107778 0.13228 0.10528 0.11511 0.107965 0.0961817 0.125068 0.12075 0.110846 0.120621 0.132226 0.106999 0.114672 0.0997654 0.104048 0.117857 0.105461 0.127318 0.11103 0.134415 0.12594 0.126633 0.116603 0.109422 0.117 0.130797 0.112808 0.113414 0.0951991 0.112291 0.109693 0.118444 0.104215 0.124635 0.0993083 0.122034 0.122363 0.12139 0.0969221 0.108173 0.109436 0.115881 0.118631 0.0968963 0.104841 0.118923 0.10789 0.108117 0.119053 0.115187 0.119369 0.089593 0.0893818 0.127805 0.109007 0.108001 0.128517 0.105524 0.117847 0.127699 0.101618 0.113646 0.112389 0.114674 0.108706 0.117413 0.119509 0.110195 0.116943 0.132244 0.108374 0.117175 0.114302 0.113753 0.127603 0.104102 0.112583 0.110015 0.102419 0.122587 0.104333 0.122883 0.129287 0.129104 0.10733 0.11312 0.125945 0.119181 0.128817 0.129468 0.114589 0.146289 0.135648 0.118936 0.146207 0.160105 0.167322 0.16074 0.140913 0.153806 0.178517 0.215381 0.251905 0.163988 0.137749 0.107549 0.121755 0.121141 0.0867208 0.103768 0.130058 0.142986 0.115026 0.12086 0.12443 0.122726 0.110762 0.125137 0.126337 0.0953488 0.10774 0.112677 0.116888 0.115948 0.104844 0.114403 0.121069 0.110119 0.0980817 0.109335 0.104094 0.10667 0.118813 0.123157 0.11163 0.105456 0.103909 0.112385 0.126633 0.123956 0.108601 0.113358 0.0971531 0.123609 0.116769 0.130958 0.103691 0.114814 0.116871 0.12273 0.116116 0.118833 0.11895 0.100572 0.128861 0.110058 0.121104 0.122787 0.122287 0.114645 0.12352 0.122679 0.121228 0.116913 0.128488 0.111704 0.102892 0.119502 0.113897 0.144082 0.132502 0.115685 0.145348 0.137543 0.12479 0.132752 0.137675 0.144116 0.127518 0.146219 0.152045 0.123085 0.152635 0.153129 0.159488 0.139282 0.150634 0.119596 0.1195 0.127458 0.109524 0.106355 0.116666 0.114375 0.104727 0.0978894 
0.0941401 0.11789 0.11224 0.110342 0.106331 0.104715 0.0991576 0.116447 0.0908483 0.11542 0.105876 0.0955746 0.120995 0.125514 0.130953 0.12472 0.118668 0.118989 0.106662 0.117213 0.111635 0.106181 0.11708 0.101769 0.10301 0.112952 0.104064
3.31532e-06 0 0 0.0118791
==32755==
==32755== HEAP SUMMARY:
==32755== in use at exit: 0 bytes in 0 blocks
==32755== total heap usage: 61,624 allocs, 61,624 frees, 6,091,874 bytes allocated
==32755==
==32755== All heap blocks were freed -- no leaks are possible
==32755==
==32755== For counts of detected and suppressed errors, rerun with: -v
==32755== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
However, when I run the program normally I still get the double free or corruption error as before.
The problem is probably memory corruption. Your #define PEAK_NUM is buggy. It needs parentheses:
#define PEAK_NUM (MEAN_PEAK_NUM_HIGH+MEAN_PEAK_NUM_LOW)
Without those extra parentheses, you get, for example, this:
double Q[3*MEAN_PEAK_NUM_HIGH+MEAN_PEAK_NUM_LOW+2][3*MEAN_PEAK_NUM_HIGH+MEAN_PEAK_NUM_LOW+2];
Each dimension then evaluates to 3*3+3+2 = 14 rather than the 3*6+2 = 20 you expect, so your function writes past the end of the array, corrupts memory, and anything can happen after that.
Start by fixing that! Also, this demonstrates why macros are evil. You have to be very careful when using them, and still they occasionally bite even an experienced programmer.
Even though this is out of the scope of the question, here's a way to more safely define the constants:
enum {
MEAN_PEAK_NUM_HIGH = 3,
MEAN_PEAK_NUM_LOW = 3,
PEAK_NUM = MEAN_PEAK_NUM_HIGH + MEAN_PEAK_NUM_LOW
};
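To see the size mismatch concretely, here is a minimal sketch that lets the compiler evaluate both variants. PEAK_NUM_UNSAFE, PEAK_NUM_SAFE and the k-prefixed constants are hypothetical names introduced only for this comparison; they are not in the original code.

```cpp
#include <cstddef>

#define MEAN_PEAK_NUM_HIGH 3
#define MEAN_PEAK_NUM_LOW 3

// Hypothetical names, used here only to compare both variants side by side.
#define PEAK_NUM_UNSAFE MEAN_PEAK_NUM_HIGH+MEAN_PEAK_NUM_LOW
#define PEAK_NUM_SAFE (MEAN_PEAK_NUM_HIGH+MEAN_PEAK_NUM_LOW)

// Expands to 3*3+3+2: the multiplication applies only to the first operand.
constexpr std::size_t kUnsafeDim = 3*PEAK_NUM_UNSAFE+2;

// Expands to 3*(3+3)+2, the intended size.
constexpr std::size_t kSafeDim = 3*PEAK_NUM_SAFE+2;

// A plain assignment such as "const int N = PEAK_NUM;" still yields 6,
// so loops bounded by 3*N+2 run to 20 with either macro.
constexpr int kLoopN = PEAK_NUM_UNSAFE;

static_assert(kUnsafeDim == 14, "unparenthesized macro shrinks the array");
static_assert(kSafeDim == 20, "parenthesized macro gives the intended size");
static_assert(3*kLoopN+2 == 20, "loop bound computed from N is unaffected");
```

With the unparenthesized macro the arrays hold 14 elements per dimension while the loops in prepareSmoother() still index up to 19, which is where the out-of-bounds writes come from.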

c++: glibc invalid pointer error when src is 32bit compiled

I've written a program which compiles and runs well on my 64-bit machine (running linux SUSE). Now I need to call an external library but I only have access to the 32-bit binary. My source code compiles and links with no errors from ssh command line to a 32 bit machine, but I get a memory error at runtime now before the library is called, or any of the interesting stuff happens...
I have a simple class cWorld to initialize some other classes, it has a method cWorld::ReadData() which opens a text file and parses/reads lines from the file and stores values in various members of cWorld, and then closes the file. The file, input.txt, just holds some explanation text and initial condition values, separated by commas and semicolons. Nothing groundbreaking!
Debugging with gdb showed that the file opens, closes successfully, all the data is stored successfully, then the SIGABRT is thrown at the very end when the ReadData() method is exited.
Extracted the problem code from my program:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
class cWorld {
public:
cWorld ();
void CallReadData ();
private:
int N_target, N_steps;
double t0, tf, delt;
std::vector<double> data;
void ReadData ();
};
cWorld::cWorld () {
N_target = 0;
N_steps = 0;
delt = 0.0;
t0 = 0.0;
tf = 0.0;
}
void cWorld::CallReadData() {
ReadData();
}
void cWorld::ReadData() {
std::string line;
std::ifstream input("input_test.txt");
if (input.is_open()) {
// RETRIEVE INPUT OPTIMIZATION PARAMETERS
input.ignore(1000, '>'); // ignore text until first '>' appears
std::getline(input, line, ';'); // get int N_target
std::stringstream(line) >> N_target;
input.ignore(1000, '>'); // ignore text until first '>' appears
std::getline(input, line, ','); // get t0
std::stringstream(line) >> t0;
std::getline(input, line, ','); // get delt
std::stringstream(line) >> delt;
std::cout << "delt = " << delt << std::endl;
std::getline(input, line, ','); // get tf
std::stringstream(line) >> tf;
N_steps = (int)( (tf - t0) / delt ) + 1; // set an int cWorld::N_steps
// RETRIEVE INPUT STATE PARAMETERS
int index = 0; // initialize local iterator
data.resize(12*N_target, 0.0); // set data size
std::cout << "data elements = " << data.size() << std::endl;
while (!input.eof()) {
// if there's '<' end loop
if (input.peek() == '<') break;
// if there's a semicolon, store following text in data...
else if (input.peek() == ';') {
input.ignore(1000, '>');
std::getline(input, line, ',');
std::stringstream(line) >> data[index];
index++;
}
// else if there's a comma, store following text in data...
else {
std::getline(input, line, ',');
std::stringstream(line) >> data[index];
index++;
}
}
input.close();
}
else std::cout << "Can't open file 'input.txt'.\n";
}
int main() {
cWorld world_1;
world_1.CallReadData();
return 0;
}
input text file:
/****************************************************************/
/* */
/* p2pOpt.C INPUT FILE */
/* */
/****************************************************************/
System Parameters: number of paths to optimize
format: N_target; (int)
>3;
System Parameters: start time, step size, end time
format: t0,delt,tf,; (doubles)
>0.0,0.001,1,;
Target 1 Parameters: Initial Conditions
format: x,y,z,theta1,theta2,theta3,xdot,ydot,zdot,theta1dot,theta2dot,theta3dot,;(doubles)
>1.0,0.0,0.0,3.14159265359,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,;
>2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,;
>3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,;
<
Here's the debug output:
======= Memory map: ========
08048000-0804b000 r-xp 00000000 00:29 18254842 /home/ston_sa/core/motion_planning/algorithms_cpp/p2pOpt/test_3_32
0804b000-0804c000 r--p 00002000 00:29 18254842 /home/ston_sa/core/motion_planning/algorithms_cpp/p2pOpt/test_3_32
0804c000-0804d000 rw-p 00003000 00:29 18254842 /home/ston_sa/core/motion_planning/algorithms_cpp/p2pOpt/test_3_32
0804d000-0806e000 rw-p 00000000 00:00 0 [heap]
b7b00000-b7b21000 rw-p 00000000 00:00 0
b7b21000-b7c00000 ---p 00000000 00:00 0
b7cd8000-b7cdb000 rw-p 00000000 00:00 0
b7cdb000-b7e42000 r-xp 00000000 08:06 114523898 /lib/libc-2.11.3.so
b7e42000-b7e44000 r--p 00167000 08:06 114523898 /lib/libc-2.11.3.so
b7e44000-b7e45000 rw-p 00169000 08:06 114523898 /lib/libc-2.11.3.so
b7e45000-b7e48000 rw-p 00000000 00:00 0
b7e48000-b7e64000 r-xp 00000000 08:06 114544736 /lib/libgcc_s.so.1
b7e64000-b7e65000 r--p 0001b000 08:06 114544736 /lib/libgcc_s.so.1
b7e65000-b7e66000 rw-p 0001c000 08:06 114544736 /lib/libgcc_s.so.1
b7e66000-b7e8c000 r-xp 00000000 08:06 114353773 /lib/libm-2.11.3.so
b7e8c000-b7e8d000 r--p 00026000 08:06 114353773 /lib/libm-2.11.3.so
b7e8d000-b7e8e000 rw-p 00027000 08:06 114353773 /lib/libm-2.11.3.so
b7e8e000-b7f70000 r-xp 00000000 08:06 2169219 /usr/lib/libstdc++.so.6.0.16
b7f70000-b7f74000 r--p 000e2000 08:06 2169219 /usr/lib/libstdc++.so.6.0.16
b7f74000-b7f75000 rw-p 000e6000 08:06 2169219 /usr/lib/libstdc++.so.6.0.16
b7f75000-b7f7c000 rw-p 00000000 00:00 0
b7fdd000-b7fdf000 rw-p 00000000 00:00 0
b7fdf000-b7ffe000 r-xp 00000000 08:06 114544574 /lib/ld-2.11.3.so
b7ffe000-b7fff000 r--p 0001e000 08:06 114544574 /lib/ld-2.11.3.so
b7fff000-b8000000 rw-p 0001f000 08:06 114544574 /lib/ld-2.11.3.so
bffdf000-c0000000 rw-p 00000000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
Program received signal SIGABRT, Aborted.
0xffffe424 in __kernel_vsyscall ()
and backtrace:
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7d05e20 in raise () from /lib/libc.so.6
#2 0xb7d07755 in abort () from /lib/libc.so.6
#3 0xb7d44d65 in __libc_message () from /lib/libc.so.6
#4 0xb7d4ac54 in malloc_printerr () from /lib/libc.so.6
#5 0xb7d4c563 in _int_free () from /lib/libc.so.6
#6 0xb7d4f69d in free () from /lib/libc.so.6
#7 0xb7f3fa0f in operator delete(void*) () from /usr/lib/libstdc++.so.6
#8 0xb7f26f6b in std::string::_Rep::_M_destroy(std::allocator<char> const&) () from /usr/lib/libstdc++.so.6
#9 0xb7f26fac in ?? () from /usr/lib/libstdc++.so.6
#10 0xb7f2701e in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() () from /usr/lib/libstdc++.so.6
#11 0x080495bf in cWorld::ReadData (this=0xbfffefe0) at test_3.cpp:91
#12 0x0804961b in cWorld::CallReadData (this=0xbfffefe0) at test_3.cpp:30
#13 0x08049646 in main () at test_3.cpp:100
at #11 test_3.cpp:91 is the closing bracket of the ReadData() method.
First note, you didn't include a sample input.txt to test against. Second note, what are some sample values the variables are initialized to?
So, given that tf=0.0, t0=0.0, and delt=1.0 and using an input.txt of:
>
1;
1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0
<
I get a vector of data with 11 entries, holding the first 11 values in the list, and no errors. Are you sure your input.txt is formatted as the code expects? Do you really want to delete the last item in the list?
Your first problem is that your loop performs 37 reads, but data is resized to hold only 36 elements, so the last read writes past the end of the vector. You should restructure how you are parsing your input. Maybe use scanf() if nothing else.
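One possible way to restructure the parsing is to read whole lines and append values with push_back, so that an unexpected extra read can never write past the end of the vector. This is only a sketch under assumptions: read_state_values is a hypothetical helper (not part of the original code), it assumes every state line starts with '>' and the data ends at a line starting with '<', and it expects the stream to be positioned after N_target, t0, delt and tf have already been read.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect every number found on lines that start
// with '>' and stop at the first line that starts with '<'.
std::vector<double> read_state_values(std::istream& in) {
    std::vector<double> values;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        if (line[0] == '<') break;       // end-of-data marker
        if (line[0] != '>') continue;    // skip explanatory text

        // Split the rest of the line on commas.
        std::istringstream fields(line.substr(1));
        std::string field;
        while (std::getline(fields, field, ',')) {
            std::istringstream conv(field);
            double v;
            if (conv >> v)               // the trailing ";" does not parse
                values.push_back(v);     // and is silently skipped
        }
    }
    return values;
}
```

The caller can then compare values.size() against 12*N_target and report a malformed file instead of indexing blindly.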

(C++) Passing a pointer to dynamically allocated array to a function

I've had problems with variables overwriting each other in memory, so I decided I'd try to allocate one of my arrays dynamically.
In the simplified code below, I'm attempting to create an array of integers using dynamic allocation, then have a function edit the values within that array of integers. Once the function has finished executing, I'd like to have a nicely processed array for use in other functions.
From what I know, an array cannot be passed to a function, so I'm simply passing a pointer to the array to the function.
#include <iostream>
using namespace std;
void func(int *[]);
int main(){
//dynamically allocate an array
int *anArray[100];
anArray[100] = new int [100];
func(anArray);
int i;
for (i=0; i < 99; i++)
cout << "element " << i << " is: " << anArray[i] << endl;
delete [] anArray;
}
void func(int *array[]){
//fill with 0-99
int i;
for (i=0; i < 99; i++){
(*array)[i] = i;
cout << "element " << i << " is: " << array[i] << endl;
}
}
When I attempt to compile the code above, g++ gives me the following warning:
dynamicArray.cc: In function ‘int main()’:
dynamicArray.cc:21:12: warning: deleting array ‘int* anArray [100]’ [enabled by default]
When I run the compiled a.out executable anyway, it outputs nothing, leaving me with nothing but the message
Segmentation fault (core dumped)
in terminal.
What am I doing wrong? My code is not attempting to access or write to anything outside of the array I created. In fact, I'm not even attempting to read or write to the last element of the array!
Something REALLY weird happens when I comment out the part that actually modifies the array, like so
//(*array)[i] = i;
G++ compiles with the same warning, but when I execute a.out I get this instead:
element 0 is: 0x600df0
element 1 is: 0x400a3d
element 2 is: 0x7f5b00000001
element 3 is: 0x10000ffff
element 4 is: 0x7fffa591e320
element 5 is: 0x400a52
element 6 is: 0x1
element 7 is: 0x400abd
element 8 is: 0x7fffa591e448
element 0 is: 0x600df0
element 1 is: 0x400a3d
element 2 is: 0x7f5b00000001
element 3 is: 0x10000ffff
element 4 is: 0x7fffa591e320
element 5 is: 0x400a52
element 6 is: 0x1
element 7 is: 0x400abd
element 8 is: 0x7fffa591e448
*** glibc detected *** ./a.out: munmap_chunk(): invalid pointer: 0x00007fffa591e2f0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7eb96)[0x7f5b92ff4b96]
./a.out[0x400976]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f5b92f9776d]
./a.out[0x400829]
======= Memory map: ========
00400000-00401000 r-xp 00000000 00:13 4070334 /home/solderblob/Documents/2013 Spring Semester/CSC 1254s2 C++ II/Assignment 1/a.out
00600000-00601000 r--p 00000000 00:13 4070334 /home/solderblob/Documents/2013 Spring Semester/CSC 1254s2 C++ II/Assignment 1/a.out
00601000-00602000 rw-p 00001000 00:13 4070334 /home/solderblob/Documents/2013 Spring Semester/CSC 1254s2 C++ II/Assignment 1/a.out
01eb5000-01ed6000 rw-p 00000000 00:00 0 [heap]
7f5b92a64000-7f5b92a79000 r-xp 00000000 08:16 11276088 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f5b92a79000-7f5b92c78000 ---p 00015000 08:16 11276088 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f5b92c78000-7f5b92c79000 r--p 00014000 08:16 11276088 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f5b92c79000-7f5b92c7a000 rw-p 00015000 08:16 11276088 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f5b92c7a000-7f5b92d75000 r-xp 00000000 08:16 11276283 /lib/x86_64-linux-gnu/libm-2.15.so
7f5b92d75000-7f5b92f74000 ---p 000fb000 08:16 11276283 /lib/x86_64-linux-gnu/libm-2.15.so
7f5b92f74000-7f5b92f75000 r--p 000fa000 08:16 11276283 /lib/x86_64-linux-gnu/libm-2.15.so
7f5b92f75000-7f5b92f76000 rw-p 000fb000 08:16 11276283 /lib/x86_64-linux-gnu/libm-2.15.so
7f5b92f76000-7f5b9312b000 r-xp 00000000 08:16 11276275 /lib/x86_64-linux-gnu/libc-2.15.so
7f5b9312b000-7f5b9332a000 ---p 001b5000 08:16 11276275 /lib/x86_64-linux-gnu/libc-2.15.so
7f5b9332a000-7f5b9332e000 r--p 001b4000 08:16 11276275 /lib/x86_64-linux-gnu/libc-2.15.so
7f5b9332e000-7f5b93330000 rw-p 001b8000 08:16 11276275 /lib/x86_64-linux-gnu/libc-2.15.so
7f5b93330000-7f5b93335000 rw-p 00000000 00:00 0
7f5b93335000-7f5b93417000 r-xp 00000000 08:16 31987823 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16
7f5b93417000-7f5b93616000 ---p 000e2000 08:16 31987823 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16
7f5b93616000-7f5b9361e000 r--p 000e1000 08:16 31987823 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16
7f5b9361e000-7f5b93620000 rw-p 000e9000 08:16 31987823 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16
7f5b93620000-7f5b93635000 rw-p 00000000 00:00 0
7f5b93635000-7f5b93657000 r-xp 00000000 08:16 11276289 /lib/x86_64-linux-gnu/ld-2.15.so
7f5b93834000-7f5b93839000 rw-p 00000000 00:00 0
7f5b93853000-7f5b93857000 rw-p 00000000 00:00 0
7f5b93857000-7f5b93858000 r--p 00022000 08:16 11276289 /lib/x86_64-linux-gnu/ld-2.15.so
7f5b93858000-7f5b9385a000 rw-p 00023000 08:16 11276289 /lib/x86_64-linux-gnu/ld-2.15.so
7fffa5900000-7fffa5921000 rw-p 00000000 00:00 0 [stack]
7fffa59ff000-7fffa5a00000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted (core dumped)
When writing:
int *anArray[100];
anArray[100] = new int [100];
In the first line, you are allocating an array of 100 pointers to int.
In the second line, you are dynamically allocating an array of ints and assigning its address to element 100 of the array of pointers, which is one past the last valid element (index 99). Proper syntax would be:
int *anArray;
anArray = new int [100];
Array indexing starts from 0.
An array of length 100 has indexes from 0 to 99.
So anArray[100] is out of bounds, which can give you a segmentation fault.
Maybe you want to do this:
anArray[99] = new int[100];
Or, if you just want to dynamically allocate an array of pointers to int, do the following:
int **anArray = new int*[100];
//dynamically allocate an array
int *anArray[100];
anArray[100] = new int [100];
anArray is an array of 100 pointers-to-int. To the 101st element (buffer overflow!) you assign a pointer that points to the first element of a dynamically allocated array of 100 ints. You want to fix that by merging the two lines into int* anArray = new int[100];.
int *anArray[100];
anArray[100] = new int [100];
First you are allocating an array of 100 pointers.
Then you are performing an out-of-range access. The last element of anArray is anArray[99], but you are assigning to anArray[100], which does not exist. This is undefined behavior and can cause a segmentation fault.
In the end, you are calling delete[] on an array that was never allocated with new[]. anArray is a local array of 100 pointers to int with automatic storage, so it is released on its own when main() returns. Remove the delete[] statement.
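Putting the answers together, a corrected version could look roughly like the sketch below. The function names fill_sequence, sum_with_new and sum_with_vector are made up for illustration, and the std::vector variant is an alternative not mentioned in the original question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill an existing buffer of `count` ints with 0, 1, ..., count-1.
// A plain int* is enough: the function only needs the first element's address.
void fill_sequence(int* array, std::size_t count) {
    assert(array != nullptr);
    for (std::size_t i = 0; i < count; ++i)
        array[i] = static_cast<int>(i);
}

// Raw-pointer version: exactly one new[] paired with one delete[].
int sum_with_new() {
    const std::size_t count = 100;
    int* anArray = new int[count];   // one pointer to 100 ints
    fill_sequence(anArray, count);
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i)
        sum += anArray[i];
    delete[] anArray;                // matches the new[] above
    return sum;
}

// std::vector version: the vector releases its storage automatically.
int sum_with_vector() {
    std::vector<int> values(100);
    fill_sequence(values.data(), values.size());
    int sum = 0;
    for (int v : values)
        sum += v;
    return sum;
}
```

Both functions loop over all 100 elements (indexes 0 to 99), so the condition is i < count rather than i < 99.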