When I run my code I get
Bus error(core dumped)
When I run it with valgrind I get
==26570== Invalid read of size 8
==26570== at 0x67EDEE6: ??? (in /home/carolinaloureiro/Qt/5.4/gcc_64/lib/libQt5SerialPort.so.5.4.0)
==26570== by 0x67F34CB: ??? (in /home/carolinaloureiro/Qt/5.4/gcc_64/lib/libQt5SerialPort.so.5.4.0)
==26570== by 0x4E3D5F4: classA::function1(bool) (in /home/carolinaloureiro/catkin_ws/src/testpackage/lib/libLIB.so.1)
==26570== by 0x4E3DC75: OptoPorts_private::run() (in /home/carolinaloureiro/catkin_ws/src/testpackage/lib/libLIB.so.1)
==26570== by 0x5875383: ??? (in /home/carolinaloureiro/Qt/5.4/gcc_64/lib/libQt5Core.so.5.4.0)
==26570== by 0x55BDE99: start_thread (pthread_create.c:308)
==26570== by 0x65192EC: clone (clone.S:112)
==26570== Address 0x200000001109da98 is not stack'd, malloc'd or (recently) free'd
Could this be because I didn't include a library properly? I've been trying to solve this problem for a while, but I am not sure how to.
Thanks
I am trying to detect potential issues in a small SDL2 program written in C++17, under Linux/GCC.
Valgrind reports a lot of noisy memory leaks mentioning vg_replace_malloc.c, which the official documentation (link) suggests ignoring:
(Ignore the "vg_replace_malloc.c", that's an implementation detail.)
But later on in the analysis, there is a block like this:
==9891== 256 bytes in 4 blocks are definitely lost in loss record 2,243 of 2,414
==9891== at 0x483980B: malloc (vg_replace_malloc.c:309)
==9891== by 0x40156B3: dl_open_worker (in /usr/lib64/ld-2.30.so)
==9891== by 0x4E60407: _dl_catch_exception (in /usr/lib64/libc-2.30.so)
==9891== by 0x40148FD: _dl_open (in /usr/lib64/ld-2.30.so)
==9891== by 0x4EF139B: dlopen_doit (in /usr/lib64/libdl-2.30.so)
==9891== by 0x4E60407: _dl_catch_exception (in /usr/lib64/libc-2.30.so)
==9891== by 0x4E604D2: _dl_catch_error (in /usr/lib64/libc-2.30.so)
==9891== by 0x4EF1B08: _dlerror_run (in /usr/lib64/libdl-2.30.so)
==9891== by 0x4EF1429: dlopen@@GLIBC_2.2.5 (in /usr/lib64/libdl-2.30.so)
==9891== by 0x493CC37: ??? (in /usr/lib64/libSDL2-2.0.so.0.12.0)
==9891== by 0x4941DC5: ??? (in /usr/lib64/libSDL2-2.0.so.0.12.0)
==9891== by 0x494C3CC: ??? (in /usr/lib64/libSDL2-2.0.so.0.12.0)
I am wondering whether this is some sort of library dependency, a false positive, or an obscure pointer to something in my own code.
Could anyone give me more insight into how to interpret that "definitely lost" snippet?
The problem with that output was that I was using SDL2 from the package repository, which is compiled without debug info.
Recompiling the SDL2 library from source with debug info made the Valgrind reports a lot clearer, and that let me understand and solve the issues.
I have the following call to glXGetFBConfigs:
const GLXFBConfig * fbuf_configs = glXGetFBConfigs (
    display, screen_id, &fbuf_config_count
);
Upon inspecting it with valgrind, it produces 137 loss records that go through a function called dlopen_doit and end in a malloc. Below is an example of one such record; all 137 are essentially identical, so dlopen_doit appears to be the culprit.
==12317== 9 bytes in 1 blocks are indirectly lost in loss record 16 of 137
==12317== at 0x483579F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==12317== by 0x6B1FB5B: ??? (in /usr/lib64/libnvidia-glcore.so.435.21)
==12317== by 0x6B19F54: ??? (in /usr/lib64/libnvidia-glcore.so.435.21)
==12317== by 0x6B11934: ??? (in /usr/lib64/libnvidia-glcore.so.435.21)
==12317== by 0x6B2337F: ??? (in /usr/lib64/libnvidia-glcore.so.435.21)
==12317== by 0x58145B5: ??? (in /usr/lib64/opengl/nvidia/lib/libGLX_nvidia.so.435.21)
==12317== by 0x400F29B: call_init.part.0 (in /lib64/ld-2.29.so)
==12317== by 0x400F3D8: _dl_init (in /lib64/ld-2.29.so)
==12317== by 0x40131E2: dl_open_worker (in /lib64/ld-2.29.so)
==12317== by 0x4DC5F90: _dl_catch_exception (in /lib64/libc-2.29.so)
==12317== by 0x4012AD9: _dl_open (in /lib64/ld-2.29.so)
==12317== by 0x4E5529B: dlopen_doit (in /lib64/libdl-2.29.so)
I am unable to find any information online regarding dlopen_doit, and the manual page for glXGetFBConfigs is hardly extensive and does not mention any need to manually free its return value. I was only able to discover that glXGetFBConfigs was the cause by slowly isolating all other function calls, as it is mentioned nowhere in the valgrind trace.
What is the cause of this behaviour, and is there a solution?
When running my test C++ app against my dynamic library, which links against NVIDIA's libGL.so, I am getting the following errors (see below) reported by Valgrind. I am tempted to suppress them, but I am not sure whether this is my issue or a problem in libnvidia-glcore.so. Part of the uncertainty stems from not fully understanding Valgrind's output. I have looked into which variables might be uninitialized in my code in the call to glXCreateContextAttribsARB, but I do not see any there. If the output suggests it is my issue, what kinds of things should I be looking for? The two errors I am getting are:
==10156== Conditional jump or move depends on uninitialised value(s)
==10156== at 0x7E4CAF4: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7DEE0CD: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7DEEADC: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7F75DA1: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7F775D3: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7E279BE: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7E27D21: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7F760F5: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7F3E353: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7A8C9C0: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x4E535F2: opengl_core::render_system::init() (x11_render_system.cpp:92)
==10156== by 0x4040D8: test_render_system::run() (test_x11_render_system.cpp:10)
==10156== Uninitialised value was created by a heap allocation
==10156== at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==10156== by 0x5116428: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156== by 0x7EECF2E: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7E479C1: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7DC8C31: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x50BF331: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156== by 0x50EB72A: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156== by 0x50EEA87: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156== by 0x50E47D2: glXCreateContextAttribsARB (in /usr/lib64/nvidia/libGL.so.346.47)
==10156== by 0x4E52EF8: opengl_core::render_context::init(opengl_core::render_window&, opengl_core::fb_config&) (x11_render_context.cpp:120)
==10156== by 0x4E534D0: opengl_core::render_system::init() (x11_render_system.cpp:65)
==10156== by 0x4040D8: test_render_system::run() (test_x11_render_system.cpp:10)
==10156==
==10156== Conditional jump or move depends on uninitialised value(s)
==10156== at 0x7E4CAF4: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7DEE0CD: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7DF085F: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7F4B78B: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7F4CFBC: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7E279BE: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7E27D21: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7F4BFE0: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7F38ED5: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7B20F52: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7F3E2CB: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7A8C9C0: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== Uninitialised value was created by a heap allocation
==10156== at 0x4C29BCF: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==10156== by 0x5116428: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156== by 0x7EECF2E: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7E479C1: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x7DC8C31: ??? (in /usr/lib64/nvidia/libnvidia-glcore.so.346.47)
==10156== by 0x50BF331: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156== by 0x50EB72A: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156== by 0x50EEA87: ??? (in /usr/lib64/nvidia/libGL.so.346.47)
==10156== by 0x50E47D2: glXCreateContextAttribsARB (in /usr/lib64/nvidia/libGL.so.346.47)
==10156== by 0x4E52EF8: opengl_core::render_context::init(opengl_core::render_window&, opengl_core::fb_config&) (x11_render_context.cpp:120)
==10156== by 0x4E534D0: opengl_core::render_system::init() (x11_render_system.cpp:65)
==10156== by 0x4040D8: test_render_system::run() (test_x11_render_system.cpp:10)
==10156==
As per request:
// src/x11_render_system.cpp
91 m_impl->m_context.make_current(m_impl->m_window);
92 glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
93 glClearColor(1.0, 0.0, 0.0, 1.0);
94 glXSwapBuffers(display, window);
95 m_impl->m_context.make_not_current();
Valgrind is quite prone to false positives with low-level hardware drivers (such as GPU drivers) because of the way they work. Basically, these drivers access the GPU's memory (and even its registers) through mappings into the process's virtual address space, which are set up by the kernel driver (this is POSIX mmap at work). This way, the driver can read and write device registers at ordinary addresses, like any other variable.
The point is that some device registers are only meant to be read. For example, they may reflect some status of the device, so only the device has a reason to write them (and even if the CPU tried to, the write would fail). Most of the time the device writes them internally at power-up, and again whenever the status changes, and the values become visible in user space once the mapping is set up. In essence, these are pure volatile variables, even more volatile than in the usual thread-to-thread sense, which by the way Valgrind handles well since it emulates the CPU.
But Valgrind lives in a deterministic world (CPU and RAM), and these GPU registers are completely outside of it. When the driver reads them, Valgrind simply thinks it is accessing RAM (because of mmap), which is not the case. So at the point where the driver uses the data it read (some device status) to branch, Valgrind reports an error because nothing in its world ever wrote that data.
Let's be honest: proprietary drivers are not open source, so it's hard to know what is really happening, but it is likely something similar. What I can say for sure is that this has been happening with Valgrind and GPU drivers for ages (even with very small programs), mainly during initialization, and these reports are widely regarded as false positives. Thus, you can safely ignore them, or create a suppression file for Valgrind in your project (let's name it valgrind.supp):
{
NVidia-driver
Memcheck:Cond
obj:/usr/lib64/nvidia/libnvidia-glcore.so.346.47
}
Then call Valgrind with the option --suppressions=valgrind.supp and it will no longer report these false positives.
You may see other driver objects in similar reports; just add entries for them (repeat the whole {...} block and change the obj: line to match what Valgrind reports). You may also have to update the entries every time you update your driver, since the version number in the file name changes, though basic wildcards should let you avoid this.
Take a look here for more information on this Valgrind feature.
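As a sketch of the wildcard idea (assuming the library keeps the same base name across driver versions; the suppression name is arbitrary), an entry that does not hard-code the install path or the version could look like this. Valgrind accepts * and ? wildcards in obj: and fun: lines.

```
{
   nvidia-glcore-cond
   Memcheck:Cond
   obj:*/libnvidia-glcore.so.*
}
```

A single obj: line only has to match the innermost frame of the report, which is where libnvidia-glcore.so appears in the traces above.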
Take the following code:
bool x_init = false;
int x;
void initX() {
    x = 4;
    x_init = true;
}
bool X_initialized() {
    return x_init;  // the original snippet was missing the return
}
//...
if (X_initialized() && x < 3) {
    doSomething(x);
}
In this case it is evident that x is never used uninitialized, but the compiler or Valgrind has to prove that, and what it sees is that "x < 3" reads x without a visible initialization. Proving arbitrary properties of code is not possible in general. So if the driver is obfuscated, or simply written without ever being checked under Valgrind (driver vendors tend to have millions of tests, so they likely rely on those more than on analysis tools), it is quite possible that Valgrind cannot tell that the value is initialized. That is not a failure of Valgrind but a mathematical limit (and, if you wish, a consequence of the third party's coding style).
However, you should report this to the maintainers of the code you are using (NVIDIA?); it may be an issue that needs to be fixed.
Another possibility is that at some point their code needs "random behaviour" and deliberately uses uninitialized values as a source of non-deterministic data. (There are no silver bullets: coverage tools will show you that 100% coverage is not always achievable, and analysis tools will sooner or later fail too.)
Yet another possibility is that those "uninitialized" values are just "volatile" variables that are initialized when the driver is loaded (after system bootstrap), so the application cannot see them being initialized. This is probably the most plausible case.
You can show the code around x11_render_system.cpp:92
But in my opinion Valgrind can also make mistakes; just ignore the report if you cannot find an actual problem behind it.
I'm working on a rather complex piece of software and once in a while it segfaults on exit. I tried to investigate the problem with valgrind, but the output I get does not tell me which of the numerous usages of QString is the problematic one.
I used valgrind with --track-origins=yes, but this also does not help to see which one it is.
==28264== Invalid read of size 4
==28264== at 0x563B66: QBasicAtomicInt::deref() (qatomic_x86_64.h:133)
==28264== by 0x563DC6: QString::~QString() (in build/output/bin/qgis)
==28264== by 0x36F8A395E9: __cxa_finalize (cxa_finalize.c:55)
==28264== by 0x5B94212: ??? (in build/output/lib/libqgis_core.so.2.1.0)
==28264== by 0x36F860FB69: _dl_fini (dl-fini.c:253)
==28264== by 0x36F8A39278: __run_exit_handlers (exit.c:77)
==28264== by 0x36F8A392C4: exit (exit.c:99)
==28264== by 0x36F8A21B4B: (below main) (libc-start.c:308)
==28264== Address 0x135b30b0 is 0 bytes inside a block of size 40 free'd
==28264== at 0x4A074C4: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==28264== by 0x36C48C31F7: QString::free(QString::Data*) (qstring.cpp:1235)
==28264== by 0x563DDC: QString::~QString() (in build/output/bin/qgis)
==28264== by 0x36F8A39278: __run_exit_handlers (exit.c:77)
==28264== by 0x36F8A392C4: exit (exit.c:99)
==28264== by 0x36F8A21B4B: (below main) (libc-start.c:308)
How can I find the problematic instance of QString? Or what else can I do to track down problems where "below main" cleans up?
I recently had a very (very!) similar problem with one of my global QStrings, but it only happened in Qt 5 (5.1.1), not in Qt 4 (4.8.5). The way I solved it in the end was to run the application through gdb / ddd and let it crash to determine the offending symbol's name. After figuring this out, I simply made it a member of one of my QObject-derived classes (there was actually no reason for it to be global) and this fixed it.
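A minimal sketch of that kind of change, under some assumptions: std::string stands in for QString so it builds without Qt, and the names g_title and MainWindow are hypothetical, not taken from the original code. The idea is only that the string stops being a namespace-scope object destroyed by the exit handlers and instead lives and dies with its owning object.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Before (problematic pattern): a namespace-scope object whose destructor
// runs from the exit handlers, after main() has returned.
//     std::string g_title = "untitled";

// After: the string is owned by an object with a well-defined lifetime.
// MainWindow is a hypothetical stand-in for a QObject-derived class.
class MainWindow {
public:
    explicit MainWindow(std::string title) : m_title(std::move(title)) {
        assert(!m_title.empty());  // illustrative invariant only
    }
    const std::string& title() const { return m_title; }

private:
    std::string m_title;  // destroyed together with the MainWindow
};
```

If the owning object is created and destroyed inside main(), the string's destructor runs before any library is unloaded at exit.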
I am analyzing a core dump. I ran Valgrind and looked at the error log, but I am not able to interpret the following message. Can anyone provide some insight?
I also tried gdb but did not get much information. I looked at other threads and found that it may be a CentOS issue. I am using CentOS release 5.6 (Final). I heard that glibc is not compatible with CentOS 5.6, but I am not sure about this. Has anyone faced this issue?
==18035==
==18035== Jump to the invalid address stated on the next line
==18035== at 0x0: ???
==18035== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==18035==
==18035==
==18035== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==18035== Bad permissions for mapped region at address 0x0
==18035== at 0x0: ???
==18035== Invalid free() / delete / delete[]
==18035== at 0x47D951D: free (vg_replace_malloc.c:325)
==18035== by 0x3141CD: ??? (in /lib/libc-2.5.so)
==18035== by 0x313D46: ??? (in /lib/libc-2.5.so)
==18035== by 0x47CC3B2: _vgnU_freeres (vg_preloaded.c:62)
==18035== Address 0x198a55e0 is not stack'd, malloc'd or (recently) free'd
==18035==
Jump to the invalid address stated on the next line
This usually means one of two things:
Either you are calling a function through a function pointer, and that pointer is NULL, or
You've trashed the stack, and a return address was overwritten with 0s.
A crash stack trace from GDB might help here.
If this is a stack corruption issue, try using AddressSanitizer (which, unlike Valgrind, does an excellent job of detecting stack buffer overflows).
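To illustrate the first case, here is a small sketch of guarding a call through a function pointer. The names Callback, twice and invoke are made up for the example; your code will have its own pointer type and call sites. Calling cb(arg) with cb == nullptr is exactly what makes the CPU jump to address 0x0.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical callback type and target function, for illustration only.
using Callback = int (*)(int);

int twice(int v) { return 2 * v; }

// Calls cb only if it is set. Without the check, a null cb would produce
// the "Jump to the invalid address" report at 0x0 followed by SIGSEGV.
int invoke(Callback cb, int arg) {
    if (cb == nullptr) {
        throw std::invalid_argument("callback is null");
    }
    return cb(arg);
}
```

A check like this (or an assert in debug builds) turns the crash at address 0x0 into an error that names the call site.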