I am debugging a strange memory issue: when a multithreaded algorithm runs in a loop, its memory consumption increases with every iteration, although the heap checker of GooglePerformanceTools says there is no leak. I have finally written a separate minimal program that reproduces the bug. It seems that the threads are the problem:
#include <stdio.h>
#include <iostream>
#include <vector>
#include "tinythread.h"
using namespace std;
int a(0);
void doNothingAtAll(void*)
{
    ++a;
}
void startAndJoin100()
{
    vector<tthread::thread*> vThreads;
    for(int i=0;i<100;++i)
    {
        vThreads.push_back(new tthread::thread(doNothingAtAll,NULL));
    }
    while(!vThreads.empty())
    {
        tthread::thread* pThread(vThreads.back());
        pThread->join();
        delete pThread;
        vThreads.pop_back();
    }
}
int main()
{
    for(int i=0;i<10;++i)
    {
        cout<<"calling startAndJoin100()"<<endl;
        startAndJoin100();
        cout<<"all threads joined"<<endl;
        cin.get();
    }
    return 0;
}
main() calls startAndJoin100() 10 times. It waits for a keystroke after each iteration so that one can read off the memory consumption, which is (under Ubuntu 17.10, 64-bit):
VIRT
2.1 GB
4 GB
5.9 GB
7.8 GB
9.6 GB
11.5 GB
13.4 GB
15.3 GB
17.2 GB
19.0 GB
Note: C++11 can't be used and the program must compile on Linux and Windows, thus tinythread is used. Minimal test code with Makefile:
geom.at/_downloads/testTinyThread.zip
I am answering my own question; this may be useful to somebody later:
Conclusion:
1) I'd really like to keep TinyThread because C++11 is unavailable (VS2008 and old Linux Systems must be supported) and no additional library shall be linked (TinyThread consists only of an *.h and *.cpp file while Boost and other solutions I know require linking a DLL).
2) Valgrind and the heap checker of the GooglePerformanceTools do not report memory leaks, and I have looked into the code: it seems to be correct, although the virtual memory consumption increases drastically in the minimal example posted above. It seems that the system does not re-use the previously assigned memory pages, and I have not found an explanation for this behavior. Thus I do not blame TinyThread++, but the problem does not occur when pthreads are used directly instead.
3) The workaround: There is a C alternative called TinyCThread: https://tinycthread.github.io/ that works also for C++ and it does not cause the problems observed with TinyThread++.
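For reference, the direct pthreads variant mentioned in point 2 could look roughly like the sketch below. This is only an assumption about what such a test looks like, not the code that was actually measured; it is POSIX-only, and the name startAndJoin is made up for illustration.

```cpp
#include <cstddef>
#include <pthread.h>
#include <vector>

// Thread body that does nothing, mirroring doNothingAtAll() above.
static void* doNothingAtAll(void*)
{
    return NULL;
}

// Starts `count` threads with plain pthreads and joins them all.
// Returns the number of threads that were created and joined successfully.
int startAndJoin(int count)
{
    std::vector<pthread_t> threads;
    for (int i = 0; i < count; ++i)
    {
        pthread_t handle;
        if (pthread_create(&handle, NULL, doNothingAtAll, NULL) == 0)
        {
            threads.push_back(handle);
        }
    }
    int joined = 0;
    while (!threads.empty())
    {
        // pthread_join releases the resources of a joinable thread.
        if (pthread_join(threads.back(), NULL) == 0)
        {
            ++joined;
        }
        threads.pop_back();
    }
    return joined;
}
```

Calling startAndJoin(100) in the same loop as in main() above allows a side-by-side comparison of the VIRT values.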
Related
I have recently moved from Linux to Windows 10 for my OpenGL C++ projects, and the first time I do something with arrays on Windows, they suddenly don't seem to work anymore. Every time I try to create an array from a non-const variable that represents a size, the execution of the program does nothing (meaning no output to the console, but also no error message). It compiles fine. To understand what I mean, look at this code snippet:
#include <iostream>
int main()
{
    int s = 10;
    float* array = new float[s];
    std::cout << *array;
    delete[] array;
    return 0;
}
It is perfectly legit, but when compiling it with MinGW on Windows and then running it, nothing happens. Yet, compiling it with GCC on Linux and running it writes a 0 to the console, which is perfectly fine and logical, indicating that it is not the code that is flawed. Do you know what could cause this problem? It is really on Windows that I have encountered this behaviour for the first time. The compilation command is the same straightforward one on both systems:
g++ array-test.cpp -o array-test
Note that making s a const int fixes the issue, but I do want my arrays to be dynamically allocated, from user input.
I am using a relatively unspectacular text editor (Notepad++) and compile in the Windows PowerShell with MinGW. The supplied code replicates the behaviour on my machine, but I cannot guarantee that it will do so on others as well. Even example code from the internet creating an array with a non-const variable as size will not run.
I am working on a considerably large C++ project with a high emphasis on performance. It therefore relies on the Intel MKL library and on OpenMP. I recently observed a considerable memory leak that I could narrow down to the following minimal example:
#include <atomic>
#include <iostream>
#include <thread>
class Foo {
public:
    Foo() : calculate(false) {}
    // Start the thread
    void start() {
        if (calculate) return;
        calculate = true;
        thread = std::thread(&Foo::loop, this);
    }
    // Stop the thread
    void stop() {
        if (!calculate) return;
        calculate = false;
        if (thread.joinable())
            thread.join();
    }
private:
    // function containing the loop that is continually executed
    void loop() {
        while (calculate) {
            #pragma omp parallel
            {
            }
        }
    }
    std::atomic<bool> calculate;
    std::thread thread;
};
int main() {
    Foo foo;
    foo.start();
    foo.stop();
    foo.start();
    // Let the program run until the user inputs something
    int a;
    std::cin >> a;
    foo.stop();
    return 0;
}
When compiled with Visual Studio 2013 and executed, this code leaks up to 200 MB memory per second (!).
By modifying the above code only a little, the leak totally disappears. For instance:
If the program is not linked against the MKL library (which is obviously not needed here), there is no leak.
If I tell OpenMP to use only one thread, (i.e. I set the environment variable OMP_NUM_THREADS to 1), there is no leak.
If I comment out the line #pragma omp parallel, there is no leak.
If I don't stop the thread and start it again with foo.stop() and foo.start(), there is no leak.
Am I doing something wrong here, or am I missing something?
MKL's parallel (default) driver is built against Intel's OpenMP runtime. MSVC compiles OpenMP applications against its own runtime, which is built around the Win32 ThreadPool API. The two runtimes most likely do not play well together. It is only safe to use the parallel MKL driver with OpenMP code built using Intel C/C++/Fortran compilers.
It should be fine if you link your OpenMP code with the serial driver of MKL. That way, you may call MKL from multiple threads at the same time and get concurrent serial instances of MKL. Whether n concurrent serial MKL calls are slower than, comparable to or faster than a single threaded MKL call on n threads is likely dependent on the kind of computation and the hardware.
Note that Microsoft no longer supports its own OpenMP runtime. MSVC's OpenMP support is stuck at version 2.0, which is more than a decade older than the current specification. There are probably bugs in the runtime (and there are bugs in the compiler's OpenMP support itself) and those are not likely to get fixed. They don't want you to use OpenMP and would like you to favour their own Parallel Patterns Library instead. But PPL is not portable to other platforms (e.g. Linux), therefore you should really be using Intel Threading Building Blocks (TBB). If you want quality OpenMP support under Windows, use the Intel compiler or some of the GCC ports. (I don't work for Intel)
After experiencing crashes when introducing nested calls of std::async in my real program, I was able to reproduce the problem in the following minimal example. It crashes often, but not always. Do you see anything that goes wrong, or is it a compiler or standard library bug? Note that the problem remains if get() calls to the futures are added.
#include <future>
#include <vector>
int main (int, char *[])
{
    std::vector<std::future<void>> v;
    v.reserve(100);
    for (int i = 0; i != 100; ++i)
    {
        v.emplace_back(std::async(std::launch::async, [] () {
            std::async(std::launch::async, [] { });
        }));
    }
    return 0;
}
I observe two different kinds of crashes: (in about every fifth run)
Termination with "This application has requested the Runtime to terminate it in an unusual way."
Termination after throwing an instance of 'std::future_error', what(): Promise already satisfied.
Environment:
Windows 7
gcc version 4.8.2 (i686-posix-dwarf-rev3, Built by MinGW-W64 project), as provided by Qt 5.3.2
Command line call: g++ -std=c++11 -pthread futures.cpp
Compiled and run on two independent machines
Option -pthread?
Could it be that in my environment for some reason the option -pthread is silently not taken into account? I observe the same behavior with and without that option.
Since this question is still "unanswered": after talking with some people from Lounge<C++>, I think it is fairly clear from the comments that this is due to an implementation error on the part of either MinGW/MinGW-w64 or pthread at the time. Using gcc 4.9.1 with MinGW-W64, the problem no longer appears. In fact, the program above appears to compile and run correctly even on a version earlier than 4.8.2 with POSIX threading.
I am not an expert myself, but my guess is that the trip-up happens when the program tries to write to the same promise twice, which should not be allowed, as a std::async task should write its result only once (again, I'm not sure I'm right here, and other comments and edits will most likely clarify).
Also, this may be a related problem: std::future exception on gcc experimental implementation of C++0x
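To illustrate what "Promise already satisfied" means in a correct implementation, here is a small sketch that deliberately writes to the same std::promise twice. It does not reproduce the MinGW bug; it only shows the error that the standard library reports for a double write. The function name secondSetValueIsRejected is made up for this example.

```cpp
#include <future>
#include <system_error>

// Sets a value on a promise twice and reports whether the second attempt
// was rejected with std::future_errc::promise_already_satisfied.
bool secondSetValueIsRejected()
{
    std::promise<int> promise;
    std::future<int> future = promise.get_future();

    promise.set_value(1);      // first write: fine
    try
    {
        promise.set_value(2);  // second write: must throw std::future_error
    }
    catch (const std::future_error& error)
    {
        return error.code() == std::future_errc::promise_already_satisfied;
    }
    return false;              // no exception would be a library defect
}
```

In the crashing program nobody calls set_value() by hand, so if this error shows up there, the double write would have to happen inside the library's std::async machinery, which fits the guess above.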
I have a sample application which uses a dynamically linked library, library.so. I was measuring the CPU usage of the sample application with the top command, but it shows the combined CPU usage of the sample app and library.so per second. I want to see the CPU usage of library.so only. Is there any way to do this? I heard it's achievable with htop but could not find out how. I used the tree view, but it shows several processes as the sample app process, and I could not tell which one is library.so. I am using CentOS 5.11, kernel version 3.2.63-1.el5.elrepo.
Given that the library is considered part of your program, one way would be to implement the measurement within your code. The following minimal example is written in C++11 and times a single function from a hypothetical library (note that it measures elapsed wall-clock time, not CPU time):
#include <chrono>
#include <iostream>
#include <hypothetical>
int main() {
    using namespace std::chrono;
    system_clock systemClock;
    system_clock::time_point startingTime{systemClock.now()};
    hypothetical::function();
    system_clock::duration libraryTime{systemClock.now() - startingTime};
    std::cout << "Hypothetical library took " << duration_cast<seconds>(libraryTime).count() << " seconds to run.\n";
    return 0;
}
You will need to extend this to all of the functions that your program invokes from your library.
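Since the question asks about CPU usage rather than elapsed time, a variant based on std::clock() may be closer to what is wanted. The sketch below is an assumption-laden illustration: hypotheticalLibraryCall stands in for a real call into library.so, and cpuSecondsFor is a made-up helper name. On POSIX systems std::clock() reports processor time consumed by the whole process, so work done concurrently by other threads is counted as well.

```cpp
#include <ctime>

// Hypothetical stand-in for a function exported by library.so.
void hypotheticalLibraryCall()
{
    volatile double sum = 0.0;
    for (long i = 0; i < 1000000; ++i)
    {
        sum = sum + i * 0.5;
    }
}

// Returns the processor time, in seconds, that the process consumed
// while `function` was running.
double cpuSecondsFor(void (*function)())
{
    const std::clock_t start = std::clock();
    function();
    const std::clock_t end = std::clock();
    return static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```

Accumulating the values returned by cpuSecondsFor for every library call gives a rough total of the CPU time attributable to the library.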
I have a fairly involved program that uses an in-house FFT algorithm. I recently decided to try FFTW for a performance increase. As a simple test to ensure that FFTW would link and run, I added the following code to the beginning of the application. However, when I run it, I get a segmentation fault when I create the fftwf_plan:
const size_t size = 1024;
vector<complex<float> > data(size);
for(size_t i = 0; i < size; ++i) data[i] = complex<float>(i, -i);
fftwf_plan plan =
fftwf_plan_dft_1d(size,
(fftwf_complex*)&data[0],
(fftwf_complex*)&data[0],
FFTW_FORWARD,
FFTW_ESTIMATE);
// ^ seg faults here ^
fftwf_execute(plan);
fftwf_destroy_plan(plan);
Any ideas what would be causing this?
Using FFTW 3.3. Tried 2 different compilers, g++ 4.1.1 and icc 11.1. Also, the core file shows nothing of significance:
Thread 1.1: Error at 0x00000000
Stack Trace: PC: 000000, FP=Hex Address
EDIT
I reconfigured FFTW to add debug, using the following commands:
setenv CFLAGS "-fPIC -g -O0"
configure --enable-shared --enable-float --enable-debug
make
make install
When the program hits the segmentation fault, it is at a random location inside fftwf_plan_dft_1d(); however, the stack trace always shows that it is in or below the function search, which is called by mkplan.
Apparently the issue stems from multi-threading. Even though the main functions in FFTW are thread safe (e.g. fftwf_execute), the functions that create a plan are not. This doesn't fully explain why just running a test on startup failed; however, when I encapsulated the plan creation in mutex locks, the segmentation faults ceased.
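The locking pattern described here can be sketched without FFTW itself. In the code below, Plan, createPlanLocked, destroyPlanLocked and executePlan are hypothetical stand-ins for fftwf_plan, fftwf_plan_dft_1d, fftwf_destroy_plan and fftwf_execute; it uses C++11 threads and is only meant to show the shape of the fix, not my actual code.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a plan handle.
struct Plan { std::size_t size; };

static int g_livePlans = 0;        // shared planner state, not thread safe by itself
static std::mutex g_plannerMutex;  // one mutex guards ALL planner calls

// Stand-in for plan creation: serialized by the mutex.
Plan* createPlanLocked(std::size_t size)
{
    std::lock_guard<std::mutex> lock(g_plannerMutex);
    ++g_livePlans;
    return new Plan{size};
}

// Stand-in for plan destruction: serialized by the same mutex.
void destroyPlanLocked(Plan* plan)
{
    std::lock_guard<std::mutex> lock(g_plannerMutex);
    --g_livePlans;
    delete plan;
}

// Stand-in for execution: assumed thread safe, so no lock is taken.
std::size_t executePlan(const Plan* plan)
{
    return plan->size;
}

// Runs `threadCount` workers; returns the number of plans still alive
// once all workers have finished.
int runWorkers(int threadCount)
{
    std::vector<std::thread> workers;
    for (int i = 0; i < threadCount; ++i)
    {
        workers.emplace_back([] {
            Plan* plan = createPlanLocked(1024);
            executePlan(plan);
            destroyPlanLocked(plan);
        });
    }
    for (std::thread& worker : workers)
    {
        worker.join();
    }
    return g_livePlans;
}
```

The important detail is that creation and destruction share one mutex while execution runs unlocked, so only the planner calls are serialized.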
The creation and destruction of plans must be single threaded:
fftw_init_threads();
#pragma omp parallel for
for(i=0;i<n;i++) {
    // plan must be private to each thread (e.g. declared inside the loop)
    #pragma omp critical
    {
        plan = fftw_create_plan....
    }
    fftw_execute(plan); // or fftw_execute_dft for multiple in/out FFT operations
    #pragma omp critical
    {
        fftw_destroy_plan(plan);
    }
}
fftw_cleanup_threads();
I'm 3 years late, but I've just stumbled upon a very similar problem, also when using multi-threading (--enable-openmp and fftw_plan_with_nthreads(omp_get_max_threads())). Mine seg faulted on fftw_destroy_plan(p).
It turned out that I didn't pay attention when restructuring my code, and I was calling fftw_cleanup_threads() before calling fftw_destroy_plan(p) ... silly, I know, but it got me chasing my tail for about 1h.
When using multi-threading, fftw_cleanup_threads() needs to be called after all fftw* functions, just as fftw_init_threads() needs to be called before any fftw* function.