Virtual storage increases for a continuously running application - c++

Before I ask my question let me explain my environment:
I have a C/C++ application that runs continuously (Infinite loop) inside an embedded Linux device.
The application records some data from the system and stores them in text files on an SD-card (1 file per day).
The recording occurs on a specific trigger detected from the system (every 5 minutes, for example), and each trigger appends a new line to the current day's text file.
Typical datatypes used within the application are: (o/i)stream, char arrays, char*, c_str() function, structs and struct*, static string arrays, #define, enums, FILE*, vector<>, and usual ones (int, string, etc.). Some of these datatypes are passed as arguments to functions.
The application is cross compiled with a custom GCC compiler within a Buildroot and BusyBox package for the device's CPU Atmel AT91RM9200QU.
The application executes some system commands using popen and reads their output through the resulting FILE*.
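(For illustration, a simplified sketch of that popen pattern with a made-up command; the important detail is that every stream obtained from popen has to be released with pclose, otherwise each call leaks the stream and the bookkeeping for the child process.)

#include <cstdio>
#include <string>

// Hypothetical helper showing the popen/FILE* pattern described above.
std::string runCommand(const char* cmd)
{
    std::string output;
    FILE* fp = popen(cmd, "r");                 // runs cmd in a shell
    if (fp == NULL)
        return output;                          // popen failed, nothing to clean up

    char buf[256];
    while (fgets(buf, sizeof(buf), fp) != NULL)
        output += buf;                          // collect the command's stdout

    pclose(fp);                                 // reaps the child and frees the stream
    return output;
}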
The application has now been running for three days, and I noticed an increase of 32 KB per day in the virtual storage (VSZ from the top command). At one point the device restarted by mistake; I launched the application again and the VSZ value started from its usual fresh-start value (about 2532 KB).
I developed another application that monitors the VSZ value of the main application; it is scheduled via crontab to run every hour. From its logs I noticed that at some points during the day the increase happens in 4 KB steps each hour, adding up to the 32 KB I observed.
So the main question is: what could be the reason for the VSZ increase? Eventually it will reach a limit and cause the system to crash, which is my concern because the device has approximately 27 MB of RAM.
Update: Besides the VSZ value, the RSS also increases. I ran the application under valgrind --leak-check=full, aborted it after the first recording, and the following message appeared many, many times:
==28211== 28 bytes in 1 blocks are possibly lost in loss record 15 of 52
==28211== at 0x4C29670: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==28211== by 0x4EF33D8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.19)
==28211== by 0x4EF4B00: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (in /usr/lib64/libstdc++.so.6.0.19)
==28211== by 0x4EF4F17: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.19)
==28211== by 0x403842: __static_initialization_and_destruction_0 (gatewayfunctions.h:28)
*==28211== by 0x403842: _GLOBAL__sub_I__Z18szBuildUDPTelegramSsii (gatewayfunctions.cpp:396)
==28211== by 0x41AE7C: __libc_csu_init (elf-init.c:88)
==28211== by 0x5676A94: (below main) (in /lib64/libc-2.19.so)
The same message appears many times, except that the line marked with * shows a different file name each time. The other thing I notice: line 28 of gatewayfunctions.h is a static string array declaration, and this array is used in only two files. Any suggestions?
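For context, one possible reading of the trace, with a minimal hypothetical sketch (not the actual contents of gatewayfunctions.h): a static std::string array defined at namespace scope in a header is constructed once in every translation unit that includes the header, and because the application was aborted, the corresponding destructors never ran, so those startup allocations show up as "possibly lost". By themselves they do not explain a steady runtime growth, but defining the array in a single .cpp file (or using plain string literals) removes both the duplication and the valgrind noise.

// gateway_strings.h -- hypothetical illustration only, not the real gatewayfunctions.h
#ifndef GATEWAY_STRINGS_H
#define GATEWAY_STRINGS_H

#include <string>

// Every .cpp that includes this header gets its own copy of the array, each
// constructed during static initialization (these are the allocations valgrind
// reports as "possibly lost" when the process is aborted before a normal exit).
static const std::string kTelegramFields[] = { "STATUS", "ALARM", "DATA" };

// Alternative that needs no dynamic allocation at startup:
// static const char* const kTelegramFields[] = { "STATUS", "ALARM", "DATA" };

#endif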

Related

Powercenter 10x "Error : Session cannot be aborted or restarted"

I am working on Powercenter 10x transformations and workflows and faced this error. I am unable to view the session logs, and the entire system gets stuck; every time I have to force-restart my laptop. My laptop has a pretty good configuration, with 32 GB RAM and a 1 TB SSD. I even tried to recycle the Integration Service, but it was also stuck and not responsive. Any help is much appreciated.
(Thread 0x53deb940 (LWP 28161)):
0x000000385a87aefe in memcpy () from /lib64/libc.so.6
0x00002ba5bfb20def in zstrbuf::expand() () from /opt/infa/pc/v901/server/bin/libpmuti.so
0x00002ba5bfb20e5d in zstrbuf::overflow(int) () from /opt/infa/pc/v901/server/bin/libpmuti.so
0x00002ba5bfb1ee2a in zstreambuf::xsputn(unsigned short const*, int) () from
/opt/infa/pc/v901/server/bin/libpmuti.so
0x00002ba5bfb1e817 in zostream::write(unsigned short const*, int) () from
/opt/infa/pc/v901/server/bin/libpmuti.so
0x00000000005d9bdc in sendEMail(PmUString const&, PmUString const&, PmUString const&,
PMTValOrderedVector const&, SVarParamManager const*, eEmailType, unsigned int, int&) ()
0x0000000000567f8d in SSessionTask::sendPostSessionEmailForDTM(SSessionInfo*) ()
0x0000000000568a96 in SSessionTask::finishImpl() ()
0x0000000000595665 in STask::finish() ()
0x0000000000565f42 in SSessionTask::handlePrepareLBGroupNotification(STaskLBJobRequest*, ILBResult
const*, ILBRequestBase::EILBEvent, PmUString const&) ()
0x0000000000566c85 in SSessionTask::handleLBNotification(STaskLBGroup*, STaskLBJobRequest*,
ILBResult*&, ILBRequestBase::EILBEvent, PmUString const&) ()
0x0000000000582fc0 in SWorkflow::handleLBNotification(STask*, STaskLBGroup*, STaskLBJobRequest*,
ILBResult*&, ILBRequestBase::EILBEvent, PmUString const&) ()
0x00000000004facb2 in SHandleLBNotificationJob::execute()
Tracing level in Informatica defines the amount of data you wish to write in the session log when you execute the workflow. Tracing level is a very important aspect in Informatica as it helps in analyzing the error.
Terse: When you set the tracing level as terse, Informatica stores error information and information of rejected records. Terse tracing level occupies less space as compared to normal.
Default tracing level is normal. You can change the tracing level to terse to enhance the performance. Tracing level can be defined at an individual transformation level, or you can override the tracing level by defining it at the session level.
Please try to change the tracing level and run the Workflow once again. I hope this resolves your system issue.

Occasional crash in destructor when cleaning up owned (!) string member

I am trying to track down a bug that occasionally crashes my app in the destructor of this trivial C++ class:
#include <string>
using std::string;

class CrashClass {
public:
    CrashClass(double r1, double s1, double r2, double s2, double r3, double s3, string dateTime)
        : mR1(r1), mS1(s1), mR2(r2), mS2(s2), mR3(r3), mS3(s3), mDateTime(dateTime) { }
    CrashClass() : mR1(0), mS1(0), mR2(0), mS2(0), mR3(0), mS3(0) { }
    ~CrashClass() {}

    string GetDateTime() { return mDateTime; }

private:
    double mR1, mS1, mR2, mS2, mR3, mS3;
    string mDateTime;
};
A bunch of those objects is stored in a standard C++ vector and used in a second class:
class MyClass {
    (...)
private:
    vector<CrashClass> mCrashClassVec;
};
MyClass is created and dealloc'd as required many times over.
The code is using C++17 on the latest Xcode 10.1 under macOS 10.14.4.
All of this is part of a computationally intensive simulation app running for multiple hours to days. On a 6-core i7 machine running 12 calculations in parallel (using macOS' GCD framework) this frequently crashes after a couple of hours with a
pointer being freed was not allocated
error when invoking mCrashClassVec.clear() on the member in MyClass, i.e.
frame #0: 0x00007fff769a72f6 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00000001004aa80d libsystem_pthread.dylib`pthread_kill + 284
frame #2: 0x00007fff769116a6 libsystem_c.dylib`abort + 127
frame #3: 0x00007fff76a1f977 libsystem_malloc.dylib`malloc_vreport + 545
frame #4: 0x00007fff76a1f738 libsystem_malloc.dylib`malloc_report + 151
frame #5: 0x0000000100069448 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::__libcpp_deallocate(__ptr=<unavailable>) at new:236 [opt]
frame #6: 0x0000000100069443 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::allocator<char>::deallocate(__p=<unavailable>) at memory:1796 [opt]
frame #7: 0x0000000100069443 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::allocator_traits<std::__1::allocator<char> >::deallocate(__p=<unavailable>) at memory:1555 [opt]
frame #8: 0x0000000100069443 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string() at string:1941 [opt]
frame #9: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string() at string:1936 [opt]
frame #10: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] CrashClass::~CrashClass(this=<unavailable>) at CrashClass.h:61 [opt]
frame #11: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] CrashClass::~CrashClass(this=<unavailable>) at CrashClass.h:61 [opt]
frame #12: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::allocator<CrashClass>::destroy(this=<unavailable>, __p=<unavailable>) at memory:1860 [opt]
frame #13: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] void std::__1::allocator_traits<std::__1::allocator<CrashClass> >::__destroy<CrashClass>(__a=<unavailable>, __p=<unavailable>) at memory:1727 [opt]
frame #14: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] void std::__1::allocator_traits<std::__1::allocator<CrashClass> >::destroy<CrashClass>(__a=<unavailable>, __p=<unavailable>) at memory:1595 [opt]
frame #15: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::__vector_base<CrashClass, std::__1::allocator<CrashClass> >::__destruct_at_end(this=<unavailable>, __new_last=0x00000001011ad000) at vector:413 [opt]
frame #16: 0x0000000100069429 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::__vector_base<CrashClass, std::__1::allocator<CrashClass> >::clear(this=<unavailable>) at vector:356 [opt]
frame #17: 0x0000000100069422 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::vector<CrashClass, std::__1::allocator<CrashClass> >::clear(this=<unavailable>) at vector:749 [opt]
Side note: The vector being cleared might have no elements (yet).
In the stack trace (bt all) I can see other threads performing operations on their copies of CrashClass vectors, but as far as I can tell from comparing addresses in the stack trace, all of those are in fact private copies (as designed), i.e. none of this data is shared between the threads.
Naturally the bug only occurs in full production mode, i.e. all attempts to reproduce the crash
running in DEBUG mode,
running under Lldb's (Xcode's) Address Sanitizer (for many hours/overnight),
running under Lldb's (Xcode's) Thread Sanitizer (for many hours/overnight),
running a cut-down version of the class with just the critical code left/replicated,
failed and did not trigger the crash.
Why might deallocating a simple member allocated on the stack fail with a pointer being freed was not allocated error?
Also, additional hints on how to debug this, or how to trigger the bug in a more reproducible way so it can be investigated further, are very much welcome.
Update 5/2019
The bug is still around, intermittently crashing the app, and I'm starting to believe that the issues I'm experiencing are actually caused by Intel's data-corruption bug in recent CPU models:
https://mjtsai.com/blog/2019/05/17/microarchitectural-data-sampling-mds-mitigation/
https://mjtsai.com/blog/2017/06/27/bug-in-skylake-and-kaby-lake-hyper-threading/
https://www.tomshardware.com/news/hyperthreading-kaby-lake-skylake-skylake-x,34876.html
You might try a few tricks:
Run the production version using a single thread for an even longer duration (say a week or 2) to see if it crashes.
Ensure that you don't consume all available RAM taking into account the fact that you might have memory fragmentation.
Ensure that your program does not have a memory leak and that its memory usage does not grow the longer it runs.
Add some tracking: add an extra member and set it to a known value in the destructor, so you would recognize the pattern if you do a double delete (a minimal sketch of this appears after the list).
Try to run the program under another platform and compiler.
Your compiler or library might contain bugs. Try another (more recent) version.
Remove code from the original version until it crashes no more. That works better if you can consistently get the crash with a sequence that somehow corrupts memory.
Once you get a crash, run the program with the exact same data (for each thread) and see if it always crashes at the same location.
Rewrite or validate any unsafe code in your application. Avoid casting, printf and other old-school variable-argument functions, and any unsafe strcpy-like functions.
Use a checked STL version.
Try unoptimized release version.
Try optimized debug version.
Learn the differences between DEBUG and RELEASE version for your compiler.
Rewrite the problematic code from scratch. Maybe it won't have the bug.
Inspect the data when it crashes.
Review your error/exception handling to see if you ignore some potential problem.
Test how your program behaves when it runs out of memory, runs out of disk space, or when an exception is thrown…
Ensure that your debugger stops at each thrown exception, handled or not.
Ensure that your program compiles and runs without warnings, or that you understand the warnings and are sure they do not matter.
Inspect the data when it crashes to see if it looks good.
You might reserve memory to reduce fragmentation and reallocation. If your program runs for hours, it is possible that memory gets too fragmented and the system cannot find a block that is big enough.
Since your program is multithreaded, ensure that your run-time is also compatible with that.
Ensure that you don't share data across threads, or that shared data is adequately protected.
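As a rough illustration of the "known value in the destructor" idea from the list above (the class and names here are made up; adapt them to your own code):

#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical debug canary: set a magic value on construction and overwrite it
// in the destructor. If the destructor ever sees the "dead" pattern, the object
// is being destroyed twice (or something has trampled its memory).
class CrashClassDebug {
public:
    CrashClassDebug() : mCanary(kAlive) {}
    ~CrashClassDebug() {
        assert(mCanary == kAlive && "double destruction or memory corruption");
        mCanary = kDead;                 // recognizable pattern in a core dump
    }
private:
    static constexpr std::uint64_t kAlive = 0x0A11BEEFCAFEF00DULL;
    static constexpr std::uint64_t kDead  = 0xDEADDEADDEADDEADULL;
    std::uint64_t mCanary;
    std::string   mDateTime;             // ... the original members would follow ...
};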

Berkeley DB fails on iPhone in DbEnv::open(char const*, unsigned int, int)

I'm trying to use Berkeley DB for a Bitcoin project.
On the simulator the project works perfectly. On the device, it fails in DbEnv::open(char const*, unsigned int, int) with this message:
************************
EXCEPTION: 11DbException
DbEnv::open: Operation not supported on socket
bitcoin in AppInit()
All used paths correspond to sandbox restrictions:
Default data directory /private/var/mobile/Applications/CA8DA82B-0540-459F-A634-6BA4A43F9E70/Library/.bitcoin
Loading addresses...
dbenv.open strLogDir=/var/mobile/Applications/CA8DA82B-0540-459F-A634-6BA4A43F9E70/Documents/database strErrorFile=/var/mobile/Applications/CA8DA82B-0540-459F-A634-6BA4A43F9E70/Documents/db.log
UPDATE: On a jailbroken phone it still crashes, which means the issue is not caused by the sandbox.
Has anyone run into such problems while using Berkeley DB on iPhone? Does anyone know how to fix this?

Segmentation fault in __pthread_getspecific called from libcuda.so.1

Problem: Segmentation fault (SIGSEGV, signal 11)
Brief program description:
high performance gpu (CUDA) server handling requests from remote
clients
each incoming request spawns a thread that performs calculations on multiple GPUs (serially, not in parallel) and sends back a result to the client; this usually takes anywhere between 10-200 ms, as each request consists of tens or hundreds of kernel calls
request handler threads have exclusive access to the GPUs, meaning that if one thread is running something on GPU1, all others have to wait until it's done (a rough sketch of this arrangement is shown right after this list)
compiled with -arch=sm_35 -code=compute_35
using CUDA 5.0
I'm not using any CUDA atomics explicitly or any in-kernel synchronization barriers, though I'm using thrust (various functions) and, obviously, cudaDeviceSynchronize()
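To make the threading arrangement above concrete, here is a rough, hypothetical sketch of the exclusive-access pattern (simplified names, not the actual server code):

#include <mutex>
#include <cuda_runtime.h>

// Hypothetical sketch of the per-GPU exclusive access described in the list above.
static const int kNumGpus = 4;
static std::mutex gpuLock[kNumGpus];                        // one lock per physical card

void handleRequestOnGpu(int deviceId)
{
    std::lock_guard<std::mutex> guard(gpuLock[deviceId]);  // other request threads block here
    cudaSetDevice(deviceId);                                // bind the calling thread to this card
    // ... tens to hundreds of kernel launches / thrust calls for one request ...
    cudaDeviceSynchronize();                                // drain the default stream before releasing the card
}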
Nvidia driver: NVIDIA dlloader X Driver 313.30 Wed Mar 27 15:33:21 PDT 2013
OS and HW info:
Linux lub1 3.5.0-23-generic #35~precise1-Ubuntu x86_64 x86_64 x86_64 GNU/Linux
GPU's: 4x GPU 0: GeForce GTX TITAN
32 GB RAM
MB: ASUS MAXIMUS V EXTREME
CPU: i7-3770K
Crash information:
Crash occurs "randomly" after a couple of thousand requests are handled (sometimes sooner, sometimes later). Stack traces from some of the crashes look like this:
#0 0x00007f8a5b18fd91 in __pthread_getspecific (key=4) at pthread_getspecific.c:62
#1 0x00007f8a5a0c0cf3 in ?? () from /usr/lib/libcuda.so.1
#2 0x00007f8a59ff7b30 in ?? () from /usr/lib/libcuda.so.1
#3 0x00007f8a59fcc34a in ?? () from /usr/lib/libcuda.so.1
#4 0x00007f8a5ab253e7 in ?? () from /usr/local/cuda-5.0/lib64/libcudart.so.5.0
#5 0x00007f8a5ab484fa in cudaGetDevice () from /usr/local/cuda-5.0/lib64/libcudart.so.5.0
#6 0x000000000046c2a6 in thrust::detail::backend::cuda::arch::device_properties() ()
#0 0x00007ff03ba35d91 in __pthread_getspecific (key=4) at pthread_getspecific.c:62
#1 0x00007ff03a966cf3 in ?? () from /usr/lib/libcuda.so.1
#2 0x00007ff03aa24f8b in ?? () from /usr/lib/libcuda.so.1
#3 0x00007ff03b3e411c in ?? () from /usr/local/cuda-5.0/lib64/libcudart.so.5.0
#4 0x00007ff03b3dd4b3 in ?? () from /usr/local/cuda-5.0/lib64/libcudart.so.5.0
#5 0x00007ff03b3d18e0 in ?? () from /usr/local/cuda-5.0/lib64/libcudart.so.5.0
#6 0x00007ff03b3fc4d9 in cudaMemset () from /usr/local/cuda-5.0/lib64/libcudart.so.5.0
#7 0x0000000000448177 in libgbase::cudaGenericDatabase::cudaCountIndividual(unsigned int, ...
#0 0x00007f01db6d6153 in ?? () from /usr/lib/libcuda.so.1
#1 0x00007f01db6db7e4 in ?? () from /usr/lib/libcuda.so.1
#2 0x00007f01db6dbc30 in ?? () from /usr/lib/libcuda.so.1
#3 0x00007f01db6dbec2 in ?? () from /usr/lib/libcuda.so.1
#4 0x00007f01db6c6c58 in ?? () from /usr/lib/libcuda.so.1
#5 0x00007f01db6c7b49 in ?? () from /usr/lib/libcuda.so.1
#6 0x00007f01db6bdc22 in ?? () from /usr/lib/libcuda.so.1
#7 0x00007f01db5f0df7 in ?? () from /usr/lib/libcuda.so.1
#8 0x00007f01db5f4e0d in ?? () from /usr/lib/libcuda.so.1
#9 0x00007f01db5dbcea in ?? () from /usr/lib/libcuda.so.1
#10 0x00007f01dc11e0aa in ?? () from /usr/local/cuda-5.0/lib64/libcudart.so.5.0
#11 0x00007f01dc1466dd in cudaMemcpy () from /usr/local/cuda-5.0/lib64/libcudart.so.5.0
#12 0x0000000000472373 in thrust::detail::backend::cuda::detail::b40c_thrust::BaseRadixSortingEnactor
#0 0x00007f397533dd91 in __pthread_getspecific (key=4) at pthread_getspecific.c:62
#1 0x00007f397426ecf3 in ?? () from /usr/lib/libcuda.so.1
#2 0x00007f397427baec in ?? () from /usr/lib/libcuda.so.1
#3 0x00007f39741a9840 in ?? () from /usr/lib/libcuda.so.1
#4 0x00007f39741add08 in ?? () from /usr/lib/libcuda.so.1
#5 0x00007f3974194cea in ?? () from /usr/lib/libcuda.so.1
#6 0x00007f3974cd70aa in ?? () from /usr/local/cuda-5.0/lib64/libcudart.so.5.0
#7 0x00007f3974cff6dd in cudaMemcpy () from /usr/local/cuda-5.0/lib64/libcudart.so.5.0
#8 0x000000000046bf26 in thrust::detail::backend::cuda::detail::checked_cudaMemcpy(void*
As you can see, it usually ends up in __pthread_getspecific called from libcuda.so, or somewhere in that library itself. As far as I remember there has been just one case where it did not crash but instead hung in a strange way: the program was able to respond to my requests if they did not involve any GPU computation (statistics etc.), but otherwise I never got a reply. Also, nvidia-smi -L did not work; it just hung until I rebooted the computer. It looked to me like some sort of GPU deadlock, though this might be a completely different issue.
Does anyone have a clue where the problem might be or what could cause this?
Updates:
Some additional analysis:
cuda-memcheck does not print any error messages.
valgrind with leak checking does print quite a few messages, like those below (there are hundreds like them):
==2464== 16 bytes in 1 blocks are definitely lost in loss record 6 of 725
==2464== at 0x4C2B1C7: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2464== by 0x568C202: ??? (in /usr/local/cuda-5.0/lib64/libcudart.so.5.0.35)
==2464== by 0x56B859D: ??? (in /usr/local/cuda-5.0/lib64/libcudart.so.5.0.35)
==2464== by 0x5050C82: __nptl_deallocate_tsd (pthread_create.c:156)
==2464== by 0x5050EA7: start_thread (pthread_create.c:315)
==2464== by 0x6DDBCBC: clone (clone.S:112)
==2464==
==2464== 16 bytes in 1 blocks are definitely lost in loss record 7 of 725
==2464== at 0x4C2B1C7: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2464== by 0x568C202: ??? (in /usr/local/cuda-5.0/lib64/libcudart.so.5.0.35)
==2464== by 0x56B86D8: ??? (in /usr/local/cuda-5.0/lib64/libcudart.so.5.0.35)
==2464== by 0x5677E0F: ??? (in /usr/local/cuda-5.0/lib64/libcudart.so.5.0.35)
==2464== by 0x400F90D: _dl_fini (dl-fini.c:254)
==2464== by 0x6D23900: __run_exit_handlers (exit.c:78)
==2464== by 0x6D23984: exit (exit.c:100)
==2464== by 0x6D09773: (below main) (libc-start.c:258)
==2464== 408 bytes in 3 blocks are possibly lost in loss record 222 of 725
==2464== at 0x4C29DB4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2464== by 0x5A89B98: ??? (in /usr/lib/libcuda.so.313.30)
==2464== by 0x5A8A1F2: ??? (in /usr/lib/libcuda.so.313.30)
==2464== by 0x5A8A3FF: ??? (in /usr/lib/libcuda.so.313.30)
==2464== by 0x5B02E34: ??? (in /usr/lib/libcuda.so.313.30)
==2464== by 0x5AFFAA5: ??? (in /usr/lib/libcuda.so.313.30)
==2464== by 0x5AAF009: ??? (in /usr/lib/libcuda.so.313.30)
==2464== by 0x5A7A6D3: ??? (in /usr/lib/libcuda.so.313.30)
==2464== by 0x59B205C: ??? (in /usr/lib/libcuda.so.313.30)
==2464== by 0x5984544: cuInit (in /usr/lib/libcuda.so.313.30)
==2464== by 0x568983B: ??? (in /usr/local/cuda-5.0/lib64/libcudart.so.5.0.35)
==2464== by 0x5689967: ??? (in /usr/local/cuda-5.0/lib64/libcudart.so.5.0.35)
More information:
I have tried running on fewer cards (3, as that is the minimum needed for the program) and the crash still occurs.
The above is not true: I had misconfigured the application and it was still using all four cards. Re-running the experiments with really just 3 cards seems to resolve the problem; it has now been running for several hours under heavy load without crashes. I will let it run a bit longer and then attempt to use a different subset of 3 cards, to verify this and at the same time test whether the problem is related to one particular card.
I monitored GPU temperature during the test runs and there does not seem to be anything wrong. The cards reach about 78-80 °C under the highest load, with the fans at about 56%, and stay there until the crash happens (several minutes later); that does not seem too high to me.
One thing I have been thinking about is the way the requests are handled: there are quite a lot of cudaSetDevice calls, since each request spawns a new thread (I'm using the mongoose library) which then switches between cards by calling cudaSetDevice(id) with the appropriate device id. The switching can happen multiple times during one request, and I am not using any streams (so everything goes to the default (0) stream, IIRC). Can this somehow be related to the crashes occurring in pthread_getspecific?
I have also tried upgrading to the latest drivers (beta, 319.12) but that didn't help.
If you can identify 3 cards that work, try cycling the 4th card in place of one of the 3, and see if you get the failures again. This is just standard troubleshooting I think. If you can identify a single card that, when included in a group of 3, still elicits the issue, then that card is suspect.
But, my suggestion to run with fewer cards was also based on the idea that it may reduce the overall load on the PSU. Even at 1500W, you may not have enough juice. So if you cycle the 4th card in, in place of one of the 3 (i.e. still keep only 3 cards in the system or configure your app to use 3) and you get no failures, the problem may be due to overall power draw with 4 cards.
Note that the power consumption of the GTX Titan at full load can be on the order of 250W or possibly more. So it might seem that your 1500W PSU should be fine, but it may come down to a careful analysis of how much DC power is available on each rail, and how the motherboard and PSU harness is distributing the 12V DC rails to each GPU.
So if reducing to 3 GPUs seems to fix the problem no matter which 3 you use, my guess is that your PSU is not up to the task. Not all 1500W is available from a single DC rail. The 12V "rail" is actually composed of several different 12V rails, each of which delivers a certain portion of the overall 1500W. So even though you may not be pulling 1500W, you can still overload a single rail, depending on how the GPU power is connected to the rails.
I agree that temperatures in the 80C range should be fine, but that indicates (approximately) a fully loaded GPU, so if you're seeing that on all 4 GPUs at once, then you are pulling a heavy load.

opencv namedWindow leak ( c++ and opencv )

Running valgrind, I get loads of memory leaks in OpenCV, especially around the namedWindow function.
In main, I have two images, CSImg and PGImg:
std::string cs = "Computer Science Students";
std::string pg = "Politics and Government Students";
CSImg.displayImage(cs);
cv::destroyWindow(cs);
PGImg.displayImage(pg);
cv::destroyWindow(pg);
The displayImage function is:
void ImageHandler::displayImage(std::string& windowname) {
    namedWindow(windowname);
    imshow(windowname, m_image);
    waitKey(7000);
}
Valgrind is giving me enormous memory leaks when I do displayImage.
For example:
==6561== 2,359,544 bytes in 1 blocks are possibly lost in loss record 3,421 of 3,421
==6561== at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6561== by 0x4F6C94C: cv::fastMalloc(unsigned long) (in /usr/lib/libopencv_core.so.2.3.1)
==6561== by 0x4F53650: cvCreateData (in /usr/lib/libopencv_core.so.2.3.1)
==6561== by 0x4F540F0: cvCreateMat (in /usr/lib/libopencv_core.so.2.3.1)
==6561== by 0x56435AF: cvImageWidgetSetImage(_CvImageWidget*, void const*) (in /usr/lib/libopencv_highgui.so.2.3.1)
==6561== by 0x5644C14: cvShowImage (in /usr/lib/libopencv_highgui.so.2.3.1)
==6561== by 0x5642AF7: cv::imshow(std::string const&, cv::_InputArray const&) (in /usr/lib/libopencv_highgui.so.2.3.1)
==6561== by 0x40CED7: ImageHandler::displayImage(std::string&) (imagehandler.cpp:33)
==6561== by 0x408CF5: main (randomU.cpp:601)
imagehandler.cpp, line 33 is:
imshow(windowname, m_image); //the full function is written above ^
randomU.cpp line 601 is:
CSImg.displayImage(cs);
Any help is appreciated.
Ask for any further info you need.
Sorry, the stark reality is that OpenCV leaks. It leaks from its Qt interface as well, due to self-references, according to the Leaks instrument (Xcode tools).
Other proof that this is not just a false alarm: on my Mac, OpenCV 2.4.3 continuously grows in memory (according to Activity Monitor) when processing webcam input. (I am not using any pointers or data storage, so in theory my OpenCV program should remain constant in size.)
Actually, you don't need to call namedWindow anymore; just call a "naked" cv::imshow(windowname, m_image). It works fine even if you overwrite an existing window (see the sketch after the remark below).
REMARK:
waitKey has two usages:
1. To wait forever: waitKey(0);
2. To wait just a bit, for example because you are displaying input from your webcam: waitKey(30); (or less, depending on the fps of what you are playing; for movies, 30).
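So, reusing the original ImageHandler class and its m_image member, the helper could shrink to roughly this (a sketch, not tested against your exact OpenCV build):

#include <opencv2/highgui/highgui.hpp>

// Sketch of the simplified helper: imshow creates the window itself, so the
// explicit namedWindow call can simply be dropped.
void ImageHandler::displayImage(std::string& windowname)
{
    cv::imshow(windowname, m_image);   // creates the window if it does not already exist
    cv::waitKey(7000);                 // pump the GUI event loop for 7 seconds
}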