Parallelizing with OpenMP causes a memory leak - C++

#include <iostream>
#include <random>
int main()
{
int a;
int *arr;
a = 3;
arr = new int[a];
#pragma omp parallel for
for (int i = 0; i < a; i++)
arr[i] = i;
delete[] arr;
return 0;
}
When I test this simple code with Valgrind, it says:
==2606== HEAP SUMMARY:
==2606== in use at exit: 3,360 bytes in 7 blocks
==2606== total heap usage: 10 allocs, 3 frees, 108,892 bytes allocated
==2606==
==2606== LEAK SUMMARY:
==2606== definitely lost: 0 bytes in 0 blocks
==2606== indirectly lost: 0 bytes in 0 blocks
==2606== possibly lost: 912 bytes in 3 blocks
==2606== still reachable: 2,448 bytes in 4 blocks
==2606== suppressed: 0 bytes in 0 blocks
Am I misunderstanding how OpenMP or memory allocation works?
Without "#pragma omp parallel for", Valgrind reports no issues.
UPDATE
==2682== 912 bytes in 3 blocks are possibly lost in loss record 4 of 5
==2682== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2682== by 0x40149DA: allocate_dtv (dl-tls.c:286)
==2682== by 0x40149DA: _dl_allocate_tls (dl-tls.c:532)
==2682== by 0x4DE4322: allocate_stack (allocatestack.c:622)
==2682== by 0x4DE4322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==2682== by 0x4A4FDEA: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==2682== by 0x4A478E0: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==2682== by 0x109184: main (main.cpp:10)
This is what --leak-check=full says.

Related

Why does the MATLAB Engine API for C++ leave "still reachable 8,132 bytes in 99 blocks" even for a simple program that just opens and closes the engine?

The following is a section of the Valgrind log file produced by this command:
valgrind --leak-check=full --show-leak-kinds=all --verbose --log-file=valgrind_output.txt ./TestMatlab
==12340==
==12340== HEAP SUMMARY:
==12340== in use at exit: 8,132 bytes in 99 blocks
==12340== total heap usage: 3,810 allocs, 3,711 frees, 436,330 bytes allocated
==12340==
==12340== Searching for pointers to 99 not-freed blocks
==12340== Checked 3,940,672 bytes
...
...
...
==12340== LEAK SUMMARY:
==12340== definitely lost: 0 bytes in 0 blocks
==12340== indirectly lost: 0 bytes in 0 blocks
==12340== possibly lost: 0 bytes in 0 blocks
==12340== still reachable: 8,132 bytes in 99 blocks
==12340== suppressed: 0 bytes in 0 blocks
==12340==
==12340== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==12340== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
The C++ program is very simple: it just opens the engine interface and then closes it. The code is shown below:
#include "engine.h"
#include <iostream>
#include <stdlib.h>
int main() {
Engine *ep;
if (!(ep = engOpen(""))) {
std::cout <<"\nCan't start MATLAB engine" << std::endl;
return EXIT_FAILURE;
}
std::cout << "Done!" << std::endl;
engClose(ep);
return 0;
}
Can this issue be fixed? If so, how?

MergeSort with Dynamic arrays

I'm trying to implement a function mergeSort that returns a dynamic array of type intervalo_t. The function itself works nicely; the problem is that when I run Valgrind to check for memory leaks, it turns out I'm losing quite a bit.
intervalo_t definition:
struct intervalo_t {
nat inicio;
nat fin;
};
This is the code:
intervalo_t* mergeSort(intervalo_t *intervalos, nat n)
{
intervalo_t* ret=new intervalo_t[n];
if(n==2){
if (intervalos[0].fin>intervalos[1].fin){
ret[0]=intervalos[1];
ret[1]=intervalos[0];
}else{
ret[0]=intervalos[0];
ret[1]=intervalos[1];
}
//caso base
}else if (n==1){
ret[0]=intervalos[0];
//caso base
}else{
nat k=0;
if((n%2)!=0){
k=1;
}//Si es par o no
intervalo_t* interA =new intervalo_t[n/2 + k];
intervalo_t* interB =new intervalo_t[n/2];
for (nat i=0; i<n/2; i++){
interA[i]=intervalos[i];
interB[i]=intervalos[i+(n/2)];
}//for
if (k==1){
interA[(n/2)]=intervalos[n-1];
}
interA=mergeSort(interA, n/2 + k);
interB=mergeSort(interB, n/2);
nat i=0;
nat j=0;
nat r=0;
while((i<((n/2)+k)) && (j<(n/2))){
if (interA[i].fin>interB[j].fin){
ret[r]=interB[j];
j++;
}else{
ret[r]=interA[i];
i++;
}
r++;
}
while(i<(n/2)+k){
ret[r]=interA[i];
i++;
r++;
}
while(j<(n/2)){
ret[r]=interB[j];
j++;
r++;
}
delete[] interA;
delete[] interB;
//recursion
}
return ret;
}
And this is Valgrind's output:
==15556==
==15556== HEAP SUMMARY:
==15556== in use at exit: 24 bytes in 2 blocks
==15556== total heap usage: 12 allocs, 10 frees, 77,959 bytes allocated
==15556==
==15556== 8 bytes in 1 blocks are definitely lost in loss record 1 of 2
==15556== at 0x4C2F06F: operator new[](unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==15556== by 0x401536: mergeSort(intervalo_t*, unsigned int) (intervalos.cpp:26)
==15556== by 0x4017C3: max_cantidad(intervalo_t*, unsigned int) (intervalos.cpp:67)
==15556== by 0x401130: main (principal.cpp:170)
==15556==
==15556== 16 bytes in 1 blocks are definitely lost in loss record 2 of 2
==15556== at 0x4C2F06F: operator new[](unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==15556== by 0x401509: mergeSort(intervalo_t*, unsigned int) (intervalos.cpp:25)
==15556== by 0x4017C3: max_cantidad(intervalo_t*, unsigned int) (intervalos.cpp:67)
==15556== by 0x401130: main (principal.cpp:170)
==15556==
==15556== LEAK SUMMARY:
==15556== definitely lost: 24 bytes in 2 blocks
==15556== indirectly lost: 0 bytes in 0 blocks
==15556== possibly lost: 0 bytes in 0 blocks
==15556== still reachable: 0 bytes in 0 blocks
==15556== suppressed: 0 bytes in 0 blocks
==15556==
==15556== For lists of detected and suppressed errors, rerun with: -s
==15556== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
I ran valgrind with:
valgrind --leak-check=full ./myFile <test.in
Test.in:
mergeSort
15
30
45
Thanks in advance for your help!
The problem is that the memory you allocated here:
intervalo_t* interA =new intervalo_t[n/2 + k];
intervalo_t* interB =new intervalo_t[n/2];
leaks when you overwrite those pointers here:
interA=mergeSort(interA, n/2 + k);
interB=mergeSort(interB, n/2);
It is rarely a good idea to reuse variables for multiple purposes, so use separate variables for the recursive results:
intervalo_t* resultA=mergeSort(interA, n/2 + k);
intervalo_t* resultB=mergeSort(interB, n/2);
and then use those for merging (and remember to release them).
I would also recommend disposing of the inputs immediately after recursing so you don't forget it.
Or, you could use std::vector and save yourself some headaches.
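As a rough illustration of the std::vector route, here is a minimal sketch. It assumes nat is unsigned int (which is what the Valgrind trace shows for the second parameter) and that the goal is to sort by fin in ascending order. The name mergeSortVec is made up for this example and is not part of the original code.

```cpp
#include <cstddef>
#include <vector>

typedef unsigned int nat;  // assumption: matches "unsigned int" in the Valgrind trace

struct intervalo_t {
    nat inicio;
    nat fin;
};

// Returns a copy of `intervalos` sorted by `fin` in ascending order.
// Every temporary is a std::vector, so nothing here needs delete[].
std::vector<intervalo_t> mergeSortVec(const std::vector<intervalo_t>& intervalos)
{
    if (intervalos.size() <= 1) {
        return intervalos;  // base case: zero or one element is already sorted
    }

    // Split in two; the first half gets the extra element when the size is odd.
    const std::size_t mid = (intervalos.size() + 1) / 2;
    const std::vector<intervalo_t> left(intervalos.begin(), intervalos.begin() + mid);
    const std::vector<intervalo_t> right(intervalos.begin() + mid, intervalos.end());

    // Separate variables for the recursive results, so no input is overwritten.
    const std::vector<intervalo_t> sortedA = mergeSortVec(left);
    const std::vector<intervalo_t> sortedB = mergeSortVec(right);

    std::vector<intervalo_t> ret;
    ret.reserve(intervalos.size());
    std::size_t i = 0;
    std::size_t j = 0;
    while (i < sortedA.size() && j < sortedB.size()) {
        if (sortedA[i].fin > sortedB[j].fin) {
            ret.push_back(sortedB[j++]);
        } else {
            ret.push_back(sortedA[i++]);
        }
    }
    // Copy whatever remains in either half.
    while (i < sortedA.size()) ret.push_back(sortedA[i++]);
    while (j < sortedB.size()) ret.push_back(sortedB[j++]);
    return ret;
}
```

A caller would pass a std::vector<intervalo_t> and receive the sorted copy by value. The trade-off is some extra copying compared with a hand-tuned in-place sort, in exchange for having no manual memory management at all.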

std::cout causes memory leak

I have a very simple C++ program.
#include <iostream>
int main()
{
std::cout << "HI" << std::endl;
return 0;
}
I compile this on a Mac with the command c++ --std=c++11 leak.cpp.
When I debug this with valgrind --leak-check=full ./a.out, I get the following output:
==2187== HEAP SUMMARY:
==2187== in use at exit: 38,906 bytes in 429 blocks
==2187== total heap usage: 508 allocs, 79 frees, 45,074 bytes allocated
==2187==
==2187== LEAK SUMMARY:
==2187== definitely lost: 0 bytes in 0 blocks
==2187== indirectly lost: 0 bytes in 0 blocks
==2187== possibly lost: 0 bytes in 0 blocks
==2187== still reachable: 4,096 bytes in 1 blocks
==2187== suppressed: 34,810 bytes in 428 blocks
==2187== Reachable blocks (those to which a pointer was found) are not shown.
==2187== To see them, rerun with: --leak-check=full --show-leak-kinds=all
Turns out there are 4096 bytes that are "still reachable". If I remove the cout statement then there are no more "still reachable" bytes.
Why is it the case that outputting to std::cout causes a memory leak?
It could be a false positive in the leak report. Valgrind can only be so clever; your standard library implementation is taking certain liberties that Valgrind doesn't have a special case for.
I'd be more worried about figuring out why this tiny program is performing 508 allocations, to a total of 45,074 bytes.

Memory leaks in boost threads?

I'm trying out boost threads and I noticed from Valgrind that it is leaking 320 bytes just from looping through an empty block of code. I found some posts on Google from 2010 that suggest these are likely false positives from threads not closing before Valgrind runs through, but my case is slightly different. In those examples there were a few blocks that were still reachable (and therefore freeable if the threads were still running), whereas my run shows 8 bytes as still reachable and 20 blocks as definitely lost. Is this something I should worry about, or am I missing something? Thanks
The code
#include <boost/thread.hpp>
#include <iostream>
#define THREADS 20
void threadfunc(int workerid) {}
int main(int argc, char **argv){
boost::thread *threads[THREADS];
int i;
for (i = 0; i < THREADS; i++) {
threads[i] = new boost::thread(threadfunc, i);
}
for (i = 0; i < THREADS; i++) {
threads[i]->join();
}
}
Compile command
c++ -o example example.cpp -I /usr/include/boost -lboost_system -lboost_thread
Valgrind command
G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind -v --tool=memcheck --leak-check=full --show-reachable=yes --num-callers=40 --log-file=valgrind.log ./example
Valgrind results
==31674== HEAP SUMMARY:
==31674== in use at exit: 328 bytes in 21 blocks
==31674== total heap usage: 103 allocs, 82 frees, 14,968 bytes allocated
==31674==
==31674== Searching for pointers to 21 not-freed blocks
==31674== Checked 215,920 bytes
==31674==
==31674== 8 bytes in 1 blocks are still reachable in loss record 1 of 2
==31674== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31674== by 0x4E454A9: boost::detail::get_once_per_thread_epoch() (in /usr/lib/libboost_thread.so.1.46.1)
==31674== by 0x4E3E4FF: ??? (in /usr/lib/libboost_thread.so.1.46.1)
==31674== by 0x4E3E7C8: boost::detail::get_current_thread_data() (in /usr/lib/libboost_thread.so.1.46.1)
==31674== by 0x4E3FF3A: boost::thread::join() (in /usr/lib/libboost_thread.so.1.46.1)
==31674== by 0x402C79: main (in /home/Jason/php/base/example)
==31674==
==31674== 320 bytes in 20 blocks are definitely lost in loss record 2 of 2
==31674== at 0x4C2B1C7: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31674== by 0x402C2A: main (in /home/Jason/php/base/example)
==31674==
==31674== LEAK SUMMARY:
==31674== definitely lost: 320 bytes in 20 blocks
==31674== indirectly lost: 0 bytes in 0 blocks
==31674== possibly lost: 0 bytes in 0 blocks
==31674== still reachable: 8 bytes in 1 blocks
==31674== suppressed: 0 bytes in 0 blocks
==31674==
==31674== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
--31674--
--31674-- used_suppression: 2 dl-hack3-cond-1
==31674==
==31674== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
The error is in your code, not in boost::thread.
The memory you allocate here is never freed:
for (i = 0; i < THREADS; i++) {
threads[i] = new boost::thread(threadfunc, i);
}
Before returning from main, you must free that memory by deleting the thread objects.
Something like:
for (i = 0; i < THREADS; i++) {
delete threads[i];
}
or delete each thread right after joining it.
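Another option is to avoid new entirely and let a container own the thread objects. The sketch below swaps in std::thread from the C++11 standard library in place of boost::thread, which is an assumption made to keep the example self-contained. The run_workers helper and the counter are also made up for illustration.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical worker body: it only records that it ran.
void threadfunc(int /*workerid*/, std::atomic<int>& counter)
{
    counter.fetch_add(1);
}

// Starts `n` threads, joins them all, and returns how many workers ran.
// The std::thread objects live inside the vector, so there is nothing to delete.
int run_workers(int n)
{
    std::atomic<int> counter(0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; i++) {
        threads.emplace_back(threadfunc, i, std::ref(counter));
    }
    for (std::thread& t : threads) {
        t.join();
    }
    return counter.load();
}
```

Calling run_workers(20) mirrors the 20 threads in the question. When the vector goes out of scope, its destructor releases every thread object, so a matching delete cannot be forgotten.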

size of double and float objects in a list are equal?

I am wondering whether float and double objects have the same size from std::list's point of view.
I allocated 5 million Real (an alias for float or double) objects in a std::list and used Valgrind to monitor memory usage.
In both cases the memory used is the same, although a 'double' (8 bytes) is twice the size of a 'float' (4 bytes)!
By the way, when I allocate the same number of objects with the 'new' operator, the double array uses twice the memory of the float array, which seems about right. I was expecting the same with std::list.
I am using gcc 4.6.2 on Fedora 16 (x86_64).
Any ideas that help me solve this mystery are appreciated.
Here is the code I wrote for the test:
#include <iostream>
#include <list>
typedef double Real;
int main(int argc, char** argv)
{
std::list<Real> pts;
int k;
int npts = 5000000; // 5 mil
std::cout << "sizeof(Real): " << sizeof(Real) << std::endl;
for(k=0; k < npts;++k)
pts.push_back(1.0);
return 0;
}
If I define Real as double, the Valgrind output is:
==15335== Memcheck, a memory error detector
==15335== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==15335== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==15335== Command: /home/soheil/Workspace/tbin/test_memory_usage
==15335==
sizeof(Real): 8
==15335==
==15335== HEAP SUMMARY:
==15335== in use at exit: 616 bytes in 6 blocks
==15335== total heap usage: 5,000,053 allocs, 5,000,047 frees, 120,015,245 bytes allocated
==15335==
==15335== LEAK SUMMARY:
==15335== definitely lost: 0 bytes in 0 blocks
==15335== indirectly lost: 0 bytes in 0 blocks
==15335== possibly lost: 0 bytes in 0 blocks
==15335== still reachable: 616 bytes in 6 blocks
==15335== suppressed: 0 bytes in 0 blocks
==15335== Rerun with --leak-check=full to see details of leaked memory
==15335==
==15335== For counts of detected and suppressed errors, rerun with: -v
==15335== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
If I define Real as float, the Valgrind output is:
==15252== Memcheck, a memory error detector
==15252== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==15252== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==15252== Command: /home/soheil/Workspace/tbin/test_memory_usage
==15252==
sizeof(Real): 4
==15252==
==15252== HEAP SUMMARY:
==15252== in use at exit: 616 bytes in 6 blocks
==15252== total heap usage: 5,000,053 allocs, 5,000,047 frees, 120,015,245 bytes allocated
==15252==
==15252== LEAK SUMMARY:
==15252== definitely lost: 0 bytes in 0 blocks
==15252== indirectly lost: 0 bytes in 0 blocks
==15252== possibly lost: 0 bytes in 0 blocks
==15252== still reachable: 616 bytes in 6 blocks
==15252== suppressed: 0 bytes in 0 blocks
==15252== Rerun with --leak-check=full to see details of leaked memory
==15252==
==15252== For counts of detected and suppressed errors, rerun with: -v
==15252== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
Each element in a std::list<T> is a linked-list node, so it's a struct containing two pointers, as well as the payload data of type T. For instance, for GCC 4.1.2, it's as follows:
struct _List_node_base
{
_List_node_base* _M_next;
_List_node_base* _M_prev;
// *** Non-virtual member functions ***
};
template<typename _Tp>
struct _List_node : public _List_node_base
{
_Tp _M_data;
};
The size allocated will be the size of that struct; if T is small enough then you may be seeing the figures dominated by struct padding.
So with the GCC definition, that's two 64-bit pointers (so 16 bytes), plus 4 or 8 bytes T, padded up to 8 bytes, so 24 bytes in total, which matches what you're measuring.
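To test the theory, try making Real a type twice as large, such as a struct wrapping float[2] or double[2] (a raw array type cannot be pushed into a std::list directly); the two node sizes should then differ.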
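The padding argument can be checked without running Valgrind. The sketch below uses simplified stand-ins for the node types quoted above. The names NodeBase, Node and node_size are hypothetical, and the sizes mentioned afterwards assume a typical 64-bit target with 8-byte pointers.

```cpp
#include <cstddef>

// Simplified stand-in for the list node base: two link pointers.
struct NodeBase {
    NodeBase* next;
    NodeBase* prev;
};

// Simplified stand-in for the list node: the links plus the payload.
template <typename T>
struct Node : NodeBase {
    T data;
};

// Size in bytes of one node holding a T, including any padding.
template <typename T>
constexpr std::size_t node_size()
{
    return sizeof(Node<T>);
}
```

On such a target, node_size<float>() and node_size<double>() should come out equal, because the 4-byte float is padded up to the 8-byte pointer alignment. node_size<double[2]>() should be larger than node_size<float[2]>(), which is the difference the suggested experiment is meant to expose.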