Multi-threading and atomicity/memory leaks - C++

So, I'm implementing a program with multiple threads (pthreads), and I am looking for help on a few points. I'm doing C++ on Linux. All of my other questions have been answered by Google so far, but there are still two that I have not found answers for.
Question 1: I am going to be doing a bit of file I/O and web-page getting/processing within my threads. Is there any way to guarantee that what the threads do is atomic? I am going to be letting my program run for quite a while, more than likely, and it won't really have a predetermined ending point. I am going to be catching the Ctrl+C signal, and I want to do some cleanup afterwards and still have my program print out results, close files, etc.
I'm just wondering whether it is reasonable behavior for the program to wait for the threads to complete, or whether I should just kill all the threads, close the file and exit. I just don't want my results to be skewed. Should I, or can I, just call pthread_exit() in the signal-catching handler?
Any other comments/ideas on this would be nice.
Question 2: Valgrind is saying that I have some possible memory leaks. Are these avoidable, or does this always happen with threading in C++? Below are two of the six or so messages that I get when checking with Valgrind.
I have been looking at a number of different websites, and one said that some possible memory leaks could be caused by putting a thread to sleep. This doesn't make sense to me; nevertheless, I am currently sleeping the threads to test the setup I have right now (I'm not actually doing any real I/O at the moment, just playing with threads).
==14072== 256 bytes in 1 blocks are still reachable in loss record 4 of 6
==14072== at 0x402732C: calloc (vg_replace_malloc.c:467)
==14072== by 0x400FDAC: _dl_check_map_versions (dl-version.c:300)
==14072== by 0x4012898: dl_open_worker (dl-open.c:269)
==14072== by 0x400E63E: _dl_catch_error (dl-error.c:178)
==14072== by 0x4172C51: do_dlopen (dl-libc.c:86)
==14072== by 0x4052D30: start_thread (pthread_create.c:304)
==14072== by 0x413A0CD: clone (clone.S:130)
==14072==
==14072== 630 bytes in 1 blocks are still reachable in loss record 5 of 6
==14072== at 0x402732C: calloc (vg_replace_malloc.c:467)
==14072== by 0x400A8AF: _dl_new_object (dl-object.c:77)
==14072== by 0x4006067: _dl_map_object_from_fd (dl-load.c:957)
==14072== by 0x4007EBC: _dl_map_object (dl-load.c:2250)
==14072== by 0x40124EF: dl_open_worker (dl-open.c:226)
==14072== by 0x400E63E: _dl_catch_error (dl-error.c:178)
==14072== by 0x4172C51: do_dlopen (dl-libc.c:86)
==14072== by 0x4052D30: start_thread (pthread_create.c:304)
==14072== by 0x413A0CD: clone (clone.S:130)
I am creating my threads with:
rc = pthread_create(&threads[t], NULL, thread_stall, (void *)NULL);
(rc = return code). At the end of the entry point, I call pthread_exit().

Here's my take:
1. If you want your threads to exit gracefully (killing them with open file or socket handles is never a good idea), have them loop on a termination flag:
while (!stop)
{
    // do work
}
Then, when you catch the Ctrl+C, set the flag to true and join the threads (a minimal sketch follows below). Make sure to declare stop as std::atomic<bool> so that all the threads see the updated value. This way they will finish their current batch of work and then exit gracefully the next time they check the condition.
2. I don't have enough information about your code to answer this.
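Below is a minimal sketch of the pattern from point 1, assuming a plain pthread setup like the one in the question; worker, NUM_THREADS and the sleep() call are placeholders for the real code:

#include <pthread.h>
#include <unistd.h>
#include <atomic>
#include <csignal>

std::atomic<bool> stop(false);
const int NUM_THREADS = 4;

void handle_sigint(int)
{
    stop = true;                 // only set the flag; do the real cleanup in main
}

void* worker(void*)
{
    while (!stop)
    {
        // one batch of work (file I/O, page fetching, ...)
        sleep(1);
    }
    return NULL;
}

int main()
{
    std::signal(SIGINT, handle_sigint);

    pthread_t threads[NUM_THREADS];
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_create(&threads[t], NULL, worker, NULL);

    // Each thread finishes its current batch; afterwards main can print
    // results, close files, etc. before exiting.
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);

    return 0;
}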

Related

How do I understand Invalid read in Valgrind, where address is bigger than the alloc'd block size

I am new to Valgrind. I got these Valgrind messages:
==932767== Invalid read of size 16
==932767== at 0x3D97D2B9AA: __strcasecmp_l_sse42 (in /lib64/libc-2.12.so)
...
==932767== Address 0x8c3e170 is 9 bytes after a block of size 7 alloc'd
==932767== at 0x6A73B4A: malloc (vg_replace_malloc.c:296)
==932767== by 0x34E821195A: ???
Here I have two questions:
The allocated block is 7 bytes, so how can the address 0x8c3e170 be 9 bytes after it? Normally the accessed address stays within the allocated size, so under what circumstances does the above issue occur?
The invalid read size is 16 bytes. Does it include the 2 extra bytes from "Address 0x8c3e170 is 9 bytes after a block of size 7 alloc'd"?
If it weren't for the ellipsis, I would say the "Address 0x8c3e170..." message is directly related to the "Invalid read of size 16" because it's indented further.
It's possible to get false positives, so don't rule that out. For example, it's possible that strcasecmp is reading more than it needs to as an optimization.
I read the 2nd message as saying that the address being read from starts 9 bytes after the end of a block of size 7.
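As a purely hypothetical illustration of that kind of over-read (none of this is the original code): an SSE-optimised strcasecmp may compare in 16-byte chunks, so comparing against a short heap block can touch bytes past the end of the allocation, which Memcheck then reports as an invalid read.

#include <cstdlib>
#include <cstring>
#include <strings.h>   // strcasecmp

int main()
{
    char* s = static_cast<char*>(std::malloc(7));  // block of size 7
    std::strcpy(s, "abcdef");                      // six characters plus '\0' fill it exactly
    int r = strcasecmp(s, "ABCDEF");               // a vectorised variant may load 16 bytes at a time
    std::free(s);
    return r;
}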
I have two suggestions, either of which will probably help you track this down:
1) Run your application under valgrind such that you can attach in a separate terminal window with gdb:
~ valgrind --vgdb=yes --vgdb-error=0 your_program
in another window:
~ gdb your_program
(gdb) target remote | vgdb
This option makes it halt as though a breakpoint were set on every problem Valgrind finds.
2) Compile with the undefined and/or memory sanitizers, with either clang or gcc (4.9 or higher), e.g. -fsanitize=undefined. They catch the same sorts of issues, but I find the error messages more informative.

false positive "Conflicting load" with DRD?

Analyzing my C++ code with DRD (valgrind) finds a "Conflicting load", but I cannot see why. The code is as follows:
int* x;
int Nt = 2;
x = new int[Nt];
omp_set_num_threads(Nt);
#pragma omp parallel for
for (int i = 0; i < Nt; i++)
{
    x[i] = i;
}
for (int i = 0; i < Nt; i++)
{
    printf("%d\n", x[i]);
}
The program behaves well, but DRD sees an issue when the master thread prints out the value of x[1]. Apart from possible false sharing due to how the x array is allocated, I do not see why there should be any conflict, nor how to avoid it... Any insights, please?
EDIT: Here's the DRD output for the above code (line 47 corresponds to the printf statement):
==2369== Conflicting load by thread 1 at 0x06031034 size 4
==2369== at 0x4008AB: main (test.c:47)
==2369== Address 0x6031034 is at offset 4 from 0x6031030. Allocation context:
==2369== at 0x4C2DCC7: operator new[](unsigned long) (vg_replace_malloc.c:363)
==2369== by 0x400843: main (test.c:37)
==2369== Other segment start (thread 2)
==2369== at 0x4C31EB8: pthread_mutex_unlock (drd_pthread_intercepts.c:703)
==2369== by 0x4C2F00E: vgDrd_thread_wrapper (drd_pthread_intercepts.c:236)
==2369== by 0x5868D95: start_thread (in /lib64/libpthread-2.15.so)
==2369== by 0x5B6950C: clone (in /lib64/libc-2.15.so)
==2369== Other segment end (thread 2)
==2369== at 0x5446846: ??? (in /usr/lib64/gcc/x86_64-pc-linux-gnu/4.7.3/libgomp.so.1.0.0)
==2369== by 0x54450DD: ??? (in /usr/lib64/gcc/x86_64-pc-linux-gnu/4.7.3/libgomp.so.1.0.0)
==2369== by 0x4C2F014: vgDrd_thread_wrapper (drd_pthread_intercepts.c:355)
==2369== by 0x5868D95: start_thread (in /lib64/libpthread-2.15.so)
==2369== by 0x5B6950C: clone (in /lib64/libc-2.15.so)
The GNU OpenMP runtime (libgomp) implements OpenMP thread teams using a pool of threads. After they are created, the threads sit docked at a barrier where they wait to be woken up to perform a specific task. In GCC these tasks come in the form of outlined (the opposite of inlined) code segments, i.e. the code for the parallel region (or for the explicit OpenMP task) is extracted into a separate function, and that function is supplied to some of the waiting threads as a task for execution. The docking barrier is then lifted and the threads start executing the task. Once that is finished, the threads are docked again - they are not joined, but simply put on hold. Therefore, from DRD's perspective, the master thread, which executes the serial part of the code after the parallel region, is accessing without protection resources that might be written to by the other threads. This of course cannot happen, since the other threads are docked and waiting for a new task.
Such false positives are common with general tools like DRD that do not understand the specific semantics of OpenMP. Those tools are thus not suitable for analysing OpenMP programs. You should instead use a specialised tool, e.g. the free Thread Analyzer from Sun/Oracle Solaris Studio for Linux or the commercial Intel Inspector. The latter can be used for free with a license for non-commercial development purposes. Both tools understand the specifics of OpenMP and won't present such situations as possible data races.

c++ new operator takes lots of memory (67MB) via libstdc++

I have some issues with the new operator in libstdc++. I wrote a program in C++ and had some problems with the memory management.
After debugging with gdb to determine what is eating up my RAM, I got the following from info proc mappings:
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x400000 0x404000 0x4000 0 /home/sebastian/Developement/powerserverplus-svn/psp-job-distributor/Release/psp-job-distributor
0x604000 0x605000 0x1000 0x4000 /home/sebastian/Developement/powerserverplus-svn/psp-job-distributor/Release/psp-job-distributor
0x605000 0x626000 0x21000 0 [heap]
0x7ffff0000000 0x7ffff0021000 0x21000 0
0x7ffff0021000 0x7ffff4000000 0x3fdf000 0
0x7ffff6c7f000 0x7ffff6c80000 0x1000 0
0x7ffff6c80000 0x7ffff6c83000 0x3000 0
0x7ffff6c83000 0x7ffff6c84000 0x1000 0
0x7ffff6c84000 0x7ffff6c87000 0x3000 0
0x7ffff6c87000 0x7ffff6c88000 0x1000 0
0x7ffff6c88000 0x7ffff6c8b000 0x3000 0
0x7ffff6c8b000 0x7ffff6c8c000 0x1000 0
0x7ffff6c8c000 0x7ffff6c8f000 0x3000 0
0x7ffff6c8f000 0x7ffff6e0f000 0x180000 0 /lib/x86_64-linux-gnu/libc-2.13.so
0x7ffff6e0f000 0x7ffff700f000 0x200000 0x180000 /lib/x86_64-linux-gnu/libc-2.13.so
0x7ffff700f000 0x7ffff7013000 0x4000 0x180000 /lib/x86_64-linux-gnu/libc-2.13.so
0x7ffff7013000 0x7ffff7014000 0x1000 0x184000 /lib/x86_64-linux-gnu/libc-2.13.so
That's just snipped out of it. However, everything is normal. Some of this belongs to the code for the standard libs, some of it is heap, and some of it is stack sections for threads I created.
But there is one section whose allocation I could not figure out:
0x7ffff0000000 0x7ffff0021000 0x21000 0
0x7ffff0021000 0x7ffff4000000 0x3fdf000 0
These two sections are created at a seemingly random time. Over several hours of debugging there was no similarity in timing, nor any correlation with a particular created thread. I set a hardware watchpoint with awatch *0x7ffff0000000 and gave it several more runs.
These two sections are created at nearly the same time, within the same code section of a non-debuggable function (gdb shows it in the stack as ?? () from /lib/x86_64-linux-gnu/libc.so.6). More precisely, this is a sample stack where it occurred:
#0 0x00007ffff6d091d5 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff6d0b2bd in calloc () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff7dee28f in _dl_allocate_tls () from /lib64/ld-linux-x86-64.so.2
#3 0x00007ffff77c0484 in pthread_create@@GLIBC_2.2.5 () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007ffff79d670e in Thread::start (this=0x6077c0) at ../src/Thread.cpp:42
#5 0x000000000040193d in MultiThreadedServer<JobDistributionServer_Thread>::Main (this=0x7fffffffe170) at /home/sebastian/Developement/powerserverplus-svn/mtserversock/src/MultiThreadedServer.hpp:55
#6 0x0000000000401601 in main (argc=1, argv=0x7fffffffe298) at ../src/main.cpp:29
Another example would be here (from a different run):
#0 0x00007ffff6d091d5 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff6d0bc2d in malloc () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff751607d in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x000000000040191b in MultiThreadedServer<JobDistributionServer_Thread>::Main (this=0x7fffffffe170) at /home/sebastian/Developement/powerserverplus-svn/mtserversock/src/MultiThreadedServer.hpp:53
#4 0x0000000000401601 in main (argc=1, argv=0x7fffffffe298) at ../src/main.cpp:29
The whole thing says that it occurs at the calloc called from the pthread library, or in other situations at the new operator or the malloc called from it. It doesn't matter which new it is - in several runs it occurred at nearly every new or thread creation in my code. The only "constant" thing is that it occurs every time in libc.so.6.
No matter at which point in the code,
no matter if used with malloc or calloc,
no matter after how much time the program ran,
no matter after how many threads have been created,
it is always that section: 0x7ffff0000000 - 0x7ffff4000000.
Every time the program runs, but every time at a different point in the program. I am really confused because it allocates 67 MB of virtual space but does not use it.
When watching the variables created there, especially those created when malloc or calloc were called by libc, none of this space is used by them. They are created in a heap section which is far away from that address range (0x7ffff0000000 - 0x7ffff4000000).
Edit:
I checked the stack size of the parent process too and got a usage of 8388608 bytes, which is 0x800000 (~8 MB). To get these values I did:
/* Needs <pthread.h>, <sys/resource.h> and <stdio.h>. */
pthread_attr_t attr;
size_t stacksize;
struct rlimit rlim;

pthread_attr_init(&attr);
pthread_attr_getstacksize(&attr, &stacksize);
getrlimit(RLIMIT_STACK, &rlim);

/* The exact type of rlim_t may vary, so cast the value to
   fit into a size_t variable. */
printf("Resource limit: %zd\n", (size_t) rlim.rlim_cur);
printf("Stacksize: %zd\n", stacksize);

pthread_attr_destroy(&attr);
Please help me with this; I am really confused about it.
It looks like it is allocating stack space for a thread.
The space will be used as you make function calls in the thread.
But really, what it is doing is none of your business. It is part of the internal implementation of pthread_create(); it can do anything it likes in there.
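For what it is worth, if the per-thread reservation itself is a concern, the default stack size can be lowered when the thread is created. A minimal sketch (the 256 KiB value is only an example, not taken from the question):

#include <pthread.h>

void* worker(void*) { return NULL; }

int main()
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 256 * 1024);  // must be at least PTHREAD_STACK_MIN

    pthread_t tid;
    pthread_create(&tid, &attr, worker, NULL);
    pthread_join(tid, NULL);

    pthread_attr_destroy(&attr);
    return 0;
}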

Boost thread memory usage on 64bit linux

I have been using Boost threads on 32-bit Linux for some time and am very happy with their performance so far. Recently the project was moved to a 64-bit platform and we saw a huge increase in memory usage (from about 2.5 GB to 16-17 GB). I have done some profiling and found that the Boost threads are the source of the huge allocation. Each thread is allocating about 10x what it did on 32-bit.
I profiled using Valgrind's massif and have confirmed the issue using only Boost threads in a separate test application. I also tried using std::thread instead, and it does not exhibit the large memory allocation issue.
I am wondering whether anyone else has seen this behaviour and knows what the problem is. Thanks.
There's no problem. This is virtual memory, and each 64-bit process can allocate terabytes of virtual memory on every modern operating system. It's basically free and there's no reason to care about how much of it is used.
It's basically just reserved space for thread stacks. You can reduce it, if you want, by changing the default stack size. But there's absolutely no reason to.
1. Per-thread stack size
Use pthread_attr_getstacksize to view it; use boost::thread::attributes to change it (pthread_attr_setstacksize).
2. Per-thread pre-mmap in glibc's malloc
A gdb example with boost.thread:
#0 0x000000000040ffe0 in boost::detail::get_once_per_thread_epoch() ()
#1 0x0000000000407c12 in void boost::call_once<void (*)()>(boost::once_flag&, void (*)()) [clone .constprop.120] ()
#2 0x00000000004082cf in thread_proxy ()
#3 0x000000000041120a in start_thread (arg=0x7ffff7ffd700) at pthread_create.c:308
#4 0x00000000004c5cf9 in clone ()
#5 0x0000000000000000 in ?? ()
You will discover data = malloc(sizeof(boost::uintmax_t)); in get_once_per_thread_epoch (boost_1_50_0/libs/thread/src/pthread/once.cpp).
Continuing:
#1 0x000000000041a0d3 in new_heap ()
#2 0x000000000041b045 in arena_get2.isra.5.part.6 ()
#3 0x000000000041ed13 in malloc ()
#4 0x0000000000401b1a in test () at pthread_malloc_8byte.cc:9
#5 0x0000000000402d3a in start_thread (arg=0x7ffff7ffd700) at pthread_create.c:308
#6 0x00000000004413d9 in clone ()
#7 0x0000000000000000 in ?? ()
In the new_heap function (glibc-2.15/malloc/arena.c), glibc pre-mmaps 64 MB of memory per thread on a 64-bit OS. In other words, each thread will use 64 MB + 8 MB (default thread stack) = 72 MB. (A short sketch of how to tune both numbers follows after the ChangeLog reference below.)
glibc-2.15/ChangeLog.17
2009-03-13 Ulrich Drepper <drepper@redhat.com>
* malloc/malloc.c: Implement PER_THREAD and ATOMIC_FASTBINS features.
* malloc/arena.c: Likewise.
* malloc/hooks.c: Likewise.
http://wuerping.github.io/blog/malloc_per_thread.html
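Here is a small sketch of both knobs mentioned above, assuming glibc and Boost 1.50 or later; the concrete values (2 arenas, 1 MiB stack) are only examples:

#include <malloc.h>          // mallopt, M_ARENA_MAX (glibc-specific)
#include <boost/thread.hpp>

void work()
{
    // thread body
}

int main()
{
    // Limit the number of per-thread malloc arenas (the 64 MB pre-mmaps);
    // the MALLOC_ARENA_MAX environment variable has the same effect.
    mallopt(M_ARENA_MAX, 2);

    // Request a smaller stack for a boost::thread (instead of the 8 MB default).
    boost::thread::attributes attrs;
    attrs.set_stack_size(1024 * 1024);
    boost::thread t(attrs, work);
    t.join();
    return 0;
}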

What do the numbers in valgrind's output mean?

I have this output from valgrind:
==4836== 10,232 bytes in 1 blocks are still reachable in loss record 1 of 1
==4836== at 0x4C2779D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4836== by 0x401865: thrt() (main.cpp:221)
==4836== by 0x4048B1: main (tester.cpp:35)
What does ==4836== mean?
What does 0x4C2779D mean?
The answer to your first question: that number represents the process ID.
Look at the official source.
From this same source, we can see the answer to your second question:
The code addresses (eg. 0x804838F) are usually unimportant, but occasionally crucial for tracking down weirder bugs.
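For context, a "still reachable" record like the one quoted typically comes from memory that is still pointed to when the program exits but was never freed. A tiny hypothetical example (the names echo the quoted output; everything else is made up):

#include <cstdlib>

char* buffer;   // a live global pointer keeps the block "still reachable" at exit

void thrt()
{
    buffer = static_cast<char*>(std::malloc(10232));   // same size as in the record
}

int main()
{
    thrt();
    return 0;   // no free(buffer), so Memcheck lists the block as still reachable
}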