Runtime Crash in C++ when built on Mavericks GM - c++

I'm running on the Mavericks GM with Xcode 5.0.1 GM. The OS X SDK used to compile against doesn't seem to matter. I've tried recompiling my company's software with both the 10.8 and 10.9 SDKs. I get the same result compiling for Debug and Release. Oddly enough, if I compile on 10.7 or 10.8 and bring the binaries over to a 10.9 machine, everything works fine.
The software I work on is written in C++ and is about 600K lines of code. Nothing in our codebase uses libxpc directly. Some of the biggest external libraries used:
Qt 4.8.5
Boost 1.49.0
OpenCL (dynamically loaded at runtime)
OpenEXR
Growl 1.2.1
Whenever the crash happens, it's at a seemingly random place in the application's main thread.
Has anyone else run into this issue? If so, what was the cause, and how did you fix it?
Disassembly of where the crash happens:
0x7fff8f2e1e3b: leaq 98519(%rip), %rax ; "Bug in libxpc: Domain environment context has overflowed maximum inline message size."
0x7fff8f2e1e42: movq %rax, -389938449(%rip) ; gCRAnnotations + 8
0x7fff8f2e1e49: ud2 <-- crash
Backtrace from lldb:
* thread #4: tid = 0x122764, 0x00007fff8f2e1e49 libxpc.dylib`_xpc_domain_serialize + 496, queue = 'com.apple.root.default-overcommit-priority, stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
frame #0: 0x00007fff8f2e1e49 libxpc.dylib`_xpc_domain_serialize + 496
frame #1: 0x00007fff8f2e18ca libxpc.dylib`_xpc_dictionary_serialize_apply + 84
frame #2: 0x00007fff8f2e1497 libxpc.dylib`_xpc_dictionary_apply_node_f + 105
frame #3: 0x00007fff8f2e16af libxpc.dylib`_xpc_dictionary_serialize + 161
frame #4: 0x00007fff8f2e1184 libxpc.dylib`_xpc_serializer_pack + 423
frame #5: 0x00007fff8f2e0f81 libxpc.dylib`_xpc_pipe_pack_message + 118
frame #6: 0x00007fff8f2e0985 libxpc.dylib`xpc_pipe_routine + 99
frame #7: 0x00007fff8f2dff2a libxpc.dylib`_xpc_runtime_init_once + 827
frame #8: 0x00007fff9076c2ad libdispatch.dylib`_dispatch_client_callout + 8
frame #9: 0x00007fff9076c21c libdispatch.dylib`dispatch_once_f + 79
frame #10: 0x00007fff8f2e4144 libxpc.dylib`_xpc_connection_init + 64
frame #11: 0x00007fff8f2e40f6 libxpc.dylib`_xpc_connection_resume_init + 14
frame #12: 0x00007fff9076c2ad libdispatch.dylib`_dispatch_client_callout + 8
frame #13: 0x00007fff9076e09e libdispatch.dylib`_dispatch_root_queue_drain + 326
frame #14: 0x00007fff9076f193 libdispatch.dylib`_dispatch_worker_thread2 + 40
frame #15: 0x00007fff922f0ef8 libsystem_pthread.dylib`_pthread_wqthread + 314
frame #16: 0x00007fff922f3fb9 libsystem_pthread.dylib`start_wqthread + 13
Update with some more information:
I just found out something rather interesting. This issue only happens when the application is launched by Xcode. If I launch it via lldb on the command line, this crash does not occur. Likewise, if I double click on it in Finder, the issue does not occur.
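Since the annotation string in the disassembly mentions the domain environment context, and the crash depends on how the process is launched, one cheap thing to compare between the Xcode launch and the command-line launch is the size of the environment the process receives. This is only a guess at a diagnostic, not a confirmed cause. The helper names environment_bytes and environment_count below are hypothetical:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical diagnostic helper: total size in bytes (including the
// terminating NULs) of every "NAME=value" entry in a NULL-terminated
// environment array.
std::size_t environment_bytes(char* const* env) {
    if (env == nullptr) {
        return 0;
    }
    std::size_t total = 0;
    for (char* const* p = env; *p != nullptr; ++p) {
        total += std::strlen(*p) + 1;
    }
    return total;
}

// Hypothetical diagnostic helper: number of entries in the same array.
std::size_t environment_count(char* const* env) {
    if (env == nullptr) {
        return 0;
    }
    std::size_t count = 0;
    for (char* const* p = env; *p != nullptr; ++p) {
        ++count;
    }
    return count;
}
```

In the main executable this could be called early in main() as environment_bytes(environ), with extern char** environ; declared, and the result logged for each launch method. A large difference between the two numbers would at least be worth a closer look.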

Related

Qt-5.14.0: Vulkan under QML causes std::system_error:: mutex lock failed

The Vulkan under QML example runs for at most a couple of seconds before crashing with the following error:
libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
While the animation looks correct during operation, the QML label is not displayed correctly:
I'm running macOS Catalina with MoltenVK 1.1.130 (LunarG Vulkan SDK) and Qt 5.14.0. I've also tried MoltenVK 1.2.131, with the same result. When using MoltenVK with MVK_LOG_LEVEL_INFO enabled, the following message is printed twice every frame:
[mvk-info] vkCreateMacOSSurfaceMVK(): You are not calling this function from the main thread. NSView should only be accessed from the main thread. When using this function outside the main thread, consider passing the CAMetalLayer itself in VkMacOSSurfaceCreateInfoMVK::pView, instead of the NSView.
Question
Does anyone know what could cause this? Is this a bug? Has anyone run this example successfully?
The MVK message makes it appear as though the Vulkan integration of Qt is broken: not only is vkCreateMacOSSurfaceMVK called twice every frame, but it also seems to be called from the render thread (not the GUI/main thread).
Details
In order to even use Qt with Vulkan, you must compile Qt from source and provide the Vulkan headers. The configuration call I used to compile Qt is:
../qt5/configure -developer-build -skip qtquick3d -skip qtwebengine -opensource -nomake examples -nomake tests -confirm-license -vulkan -I $VULKAN_SDK/../MoltenVK/include -L $VULKAN_SDK/lib
My environment variables are set according to the LunarG documentation:
export VULKAN_SDK="$HOME/SDK/vulkansdk-macos-1.1.130.0/macOS"
export PATH="$VULKAN_SDK/bin:$PATH"
export DYLD_LIBRARY_PATH="$VULKAN_SDK/lib:$DYLD_LIBRARY_PATH"
export VK_ICD_FILENAMES="$VULKAN_SDK/etc/vulkan/icd.d/MoltenVK_icd.json"
export VK_LAYER_PATH="$VULKAN_SDK/etc/vulkan/explicit_layer.d"
export VK_INSTANCE_LAYERS="VK_LAYER_KHRONOS_validation"
export QT_VULKAN_LIB="$VULKAN_SDK/lib/libMoltenVK.dylib"
(Qt requires the QT_VULKAN_LIB to dlopen the library.)
lldb backtrace:
[mvk-info] vkCreateMacOSSurfaceMVK(): You are not calling this function from the main thread. NSView should only be accessed from the main thread. When using this function outside the main thread, consider passing the CAMetalLayer itself in VkMacOSSurfaceCreateInfoMVK::pView, instead of the NSView.
[mvk-info] vkCreateMacOSSurfaceMVK(): You are not calling this function from the main thread. NSView should only be accessed from the main thread. When using this function outside the main thread, consider passing the CAMetalLayer itself in VkMacOSSurfaceCreateInfoMVK::pView, instead of the NSView.
libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
Process 83453 stopped
* thread #10, name = 'QSGRenderThread', stop reason = signal SIGABRT
frame #0: 0x00007fff648c57fa libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
-> 0x7fff648c57fa <+10>: jae 0x7fff648c5804 ; <+20>
0x7fff648c57fc <+12>: movq %rax, %rdi
0x7fff648c57ff <+15>: jmp 0x7fff648bfa89 ; cerror_nocancel
0x7fff648c5804 <+20>: retq
Target 0: (main) stopped.
(lldb) frame info
frame #0: 0x00007fff648c57fa libsystem_kernel.dylib`__pthread_kill + 10
(lldb) frame variable
(lldb) bt
* thread #10, name = 'QSGRenderThread', stop reason = signal SIGABRT
* frame #0: 0x00007fff648c57fa libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff64982bc1 libsystem_pthread.dylib`pthread_kill + 432
frame #2: 0x00007fff6484ca1c libsystem_c.dylib`abort + 120
frame #3: 0x00007fff618e6be8 libc++abi.dylib`abort_message + 231
frame #4: 0x00007fff618e6d84 libc++abi.dylib`demangling_terminate_handler() + 238
frame #5: 0x00007fff63412792 libobjc.A.dylib`_objc_terminate() + 104
frame #6: 0x00007fff618f3dc7 libc++abi.dylib`std::__terminate(void (*)()) + 8
frame #7: 0x00007fff618f3d79 libc++abi.dylib`std::terminate() + 41
frame #8: 0x0000000103942439 libQt5Core_debug.5.dylib`qTerminate() at qglobal.cpp:3333:5
frame #9: 0x000000010341f5f8 libQt5Core_debug.5.dylib`QThreadPrivate::start(arg=0x000000011721cfb0) at qthread_unix.cpp:354:9
frame #10: 0x00007fff64982e65 libsystem_pthread.dylib`_pthread_start + 148
frame #11: 0x00007fff6497e83b libsystem_pthread.dylib`thread_start + 15
I've reported the issue: QTBUG-82600
The fix was merged into Qt 5.15.0 beta2. Although the crashes no longer occur, the text remains mangled. The fix for that is postponed until Qt6: QTBUG-83072

Occasional crash in destructor when cleaning up owned (!) string member

I am trying to track down a bug that occasionally crashes my app in the destructor of this trivial C++ class:
class CrashClass {
public:
    CrashClass(double r1, double s1, double r2, double s2, double r3, double s3, string dateTime)
        : mR1(r1), mS1(s1), mR2(r2), mS2(s2), mR3(r3), mS3(s3), mDateTime(dateTime) { }
    CrashClass() : mR1(0), mS1(0), mR2(0), mS2(0), mR3(0), mS3(0) { }
    ~CrashClass() {}
    string GetDateTime() { return mDateTime; }
private:
    double mR1, mS1, mR2, mS2, mR3, mS3;
    string mDateTime;
};
A bunch of those objects is stuck in a standard C++ vector and used in a second class:
class MyClass {
(...)
private:
vector<CrashClass> mCrashClassVec;
};
MyClass is created and dealloc'd as required many times over.
The code is using C++17 on the latest Xcode 10.1 under macOS 10.14.4.
All of this is part of a computationally intensive simulation app running for multiple hours to days. On a 6-core i7 machine running 12 calculations in parallel (using macOS' GCD framework) this frequently crashes after a couple of hours with a
pointer being freed was not allocated
error when invoking mCrashClassVec.clear() on the member in MyClass, i.e.
frame #0: 0x00007fff769a72f6 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00000001004aa80d libsystem_pthread.dylib`pthread_kill + 284
frame #2: 0x00007fff769116a6 libsystem_c.dylib`abort + 127
frame #3: 0x00007fff76a1f977 libsystem_malloc.dylib`malloc_vreport + 545
frame #4: 0x00007fff76a1f738 libsystem_malloc.dylib`malloc_report + 151
frame #5: 0x0000000100069448 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::__libcpp_deallocate(__ptr=<unavailable>) at new:236 [opt]
frame #6: 0x0000000100069443 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::allocator<char>::deallocate(__p=<unavailable>) at memory:1796 [opt]
frame #7: 0x0000000100069443 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::allocator_traits<std::__1::allocator<char> >::deallocate(__p=<unavailable>) at memory:1555 [opt]
frame #8: 0x0000000100069443 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string() at string:1941 [opt]
frame #9: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string() at string:1936 [opt]
frame #10: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] CrashClass::~CrashClass(this=<unavailable>) at CrashClass.h:61 [opt]
frame #11: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] CrashClass::~CrashClass(this=<unavailable>) at CrashClass.h:61 [opt]
frame #12: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::allocator<CrashClass>::destroy(this=<unavailable>, __p=<unavailable>) at memory:1860 [opt]
frame #13: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] void std::__1::allocator_traits<std::__1::allocator<CrashClass> >::__destroy<CrashClass>(__a=<unavailable>, __p=<unavailable>) at memory:1727 [opt]
frame #14: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] void std::__1::allocator_traits<std::__1::allocator<CrashClass> >::destroy<CrashClass>(__a=<unavailable>, __p=<unavailable>) at memory:1595 [opt]
frame #15: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::__vector_base<CrashClass, std::__1::allocator<CrashClass> >::__destruct_at_end(this=<unavailable>, __new_last=0x00000001011ad000) at vector:413 [opt]
frame #16: 0x0000000100069429 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::__vector_base<CrashClass, std::__1::allocator<CrashClass> >::clear(this=<unavailable>) at vector:356 [opt]
frame #17: 0x0000000100069422 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::vector<CrashClass, std::__1::allocator<CrashClass> >::clear(this=<unavailable>) at vector:749 [opt]
Side note: The vector being cleared might have no elements (yet).
In the stacktrace (bt all) I can see other threads performing operations on their copies of CrashClass vectors but as far as I can see from comparing addresses in the stack trace all of those are in fact private copies (as designed), i.e. none of this data is shared between the threads.
Naturally the bug only occurs in full production mode, i.e. all attempts to reproduce the crash
running in DEBUG mode,
running under Xcode's Address Sanitizer (for many hours/overnight),
running under Xcode's Thread Sanitizer (for many hours/overnight),
running a cut-down version of the class with just the critical code left/replicated,
failed and did not trigger the crash.
Why might deallocating a simple member allocated on the stack fail with a pointer being freed was not allocated error?
Additional hints on how to debug this, or on how to trigger the bug more reliably so it can be investigated further, are also very welcome.
Update 5/2019
The bug is still around, intermittently crashing the app, and I'm starting to believe that the issues I'm experiencing are actually caused by Intel's data corruption bug in recent CPU models.
https://mjtsai.com/blog/2019/05/17/microarchitectural-data-sampling-mds-mitigation/
https://mjtsai.com/blog/2017/06/27/bug-in-skylake-and-kaby-lake-hyper-threading/
https://www.tomshardware.com/news/hyperthreading-kaby-lake-skylake-skylake-x,34876.html
You might try a few tricks:
Run the production version on a single thread for an even longer duration (say a week or two) to see whether it still crashes.
Ensure that you don't consume all available RAM, taking memory fragmentation into account.
Ensure that your program has no memory leak and that its memory usage does not grow the longer it runs.
Add some tracking: add an extra member and set it to a known value in the destructor, so that you would recognize the pattern after a double delete.
Try running the program on another platform and with another compiler.
Your compiler or library might contain bugs. Try another (more recent) version.
Remove code from the original version until it no longer crashes. That works better if you can consistently get the crash with a sequence that somehow corrupts memory.
Once you get a crash, run the program with the exact same data (for each thread) and see whether it always crashes at the same location.
Rewrite or validate any unsafe code in your application. Avoid casts, printf and other old-school variadic functions, and unsafe functions such as strcpy.
Use a checked STL version.
Try an unoptimized release version.
Try an optimized debug version.
Learn the differences between DEBUG and RELEASE builds for your compiler.
Rewrite the problematic code from scratch. Maybe it won't have the bug.
Inspect the data when it crashes to see whether it looks valid.
Review your error/exception handling to see whether you ignore some potential problem.
Test how your program behaves when it runs out of memory, out of disk space, when an exception is thrown…
Ensure that your debugger stops at each thrown exception, handled or not.
Ensure that your program compiles and runs without warnings, or that you understand them and are sure they do not matter.
You might reserve memory to reduce fragmentation and reallocation. If your program runs for hours, memory might become so fragmented that the system cannot find a block that is big enough.
Since your program is multithreaded, ensure that your runtime library is also compatible with that.
Ensure that you don't share data across threads, or that shared data is adequately protected.
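To illustrate the suggestion above about setting a known value in the destructor, here is a minimal sketch. CanaryRecord, kAlive and kDead are hypothetical names, not part of the code in the question, and the same idea would have to be applied to the real class by hand. The marker is declared volatile on the assumption that an optimizing build might otherwise drop a store made in a destructor.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical debugging aid: the canary holds kAlive while the object
// is live and kDead once its destructor has run.
class CanaryRecord {
public:
    static constexpr std::uint32_t kAlive = 0xA11FE001u;
    static constexpr std::uint32_t kDead  = 0xDEADDEADu;

    explicit CanaryRecord(std::string dateTime)
        : mDateTime(std::move(dateTime)) {}

    ~CanaryRecord() {
        // A destructor running on something that is not a live object
        // (double destruction, overwritten memory) shows up here, before
        // the string member tries to free its buffer.
        if (mCanary != kAlive) {
            std::fprintf(stderr, "CanaryRecord: bad canary 0x%08x in destructor\n",
                         static_cast<unsigned>(mCanary));
            std::abort();
        }
        mCanary = kDead;  // volatile member, so the store is not elided
    }

    bool LooksAlive() const { return mCanary == kAlive; }
    const std::string& GetDateTime() const { return mDateTime; }

private:
    volatile std::uint32_t mCanary = kAlive;
    std::string mDateTime;
};
```

If the abort fires with the kDead pattern, the object was destroyed twice. Any other value suggests the memory was overwritten by something else, which points towards corruption rather than a lifetime bug.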

Nirgam runtime error using SystemC in Mac OS X

I am running nirgam 3.0, an open source SystemC-based NoC simulator, on my MacBook (Mac OS X 10.10). The nirgam source code compiles successfully, but when I try to run it, it fails with a segmentation fault, as shown:
╰─$ ./nirgam
SystemC 2.3.1-Accellera --- May 3 2015 19:32:31
Copyright (c) 1996-2014 by all Contributors,
ALL RIGHTS RESERVED
[1] 5067 segmentation fault ./nirgam
I tried using lldb to find the error and got the following hints:
(lldb) r
Process 5076 launched: './nirgam' (x86_64)
SystemC 2.3.1-Accellera --- May 3 2015 19:32:31
Copyright (c) 1996-2014 by all Contributors,
ALL RIGHTS RESERVED
Process 5076 stopped
* thread #1: tid = 0x7321, 0x00000001000c9d8d nirgam`sc_main(argc=1, argv=0x00000001004060f0) + 45 at main.cpp:62, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x7fff5eb2dd18)
frame #0: 0x00000001000c9d8d nirgam`sc_main(argc=1, argv=0x00000001004060f0) + 45 at main.cpp:62
59
60 int sc_main(int argc, char *argv[])
61 {
-> 62 system("rm -rf *.txt");
63 system("rm -rf jitter/GT/*");
64 system("rm -rf jitter/BE/*");
65 cout<<"---------------------------------------------------------------------------"<<endl;
(lldb) bt
* thread #1: tid = 0x7321, 0x00000001000c9d8d nirgam`sc_main(argc=1, argv=0x00000001004060f0) + 45 at main.cpp:62, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x7fff5eb2dd18)
* frame #0: 0x00000001000c9d8d nirgam`sc_main(argc=1, argv=0x00000001004060f0) + 45 at main.cpp:62
frame #1: 0x0000000100238778 libsystemc-2.3.1.dylib`sc_elab_and_sim + 184
frame #2: 0x00007fff879a65c9 libdyld.dylib`start + 1
frame #3: 0x00007fff879a65c9 libdyld.dylib`start + 1
The following facts may be a hint:
I can compile and run other SystemC-based programs in my environment, including Noxim and code I have written myself.
The lldb output shows that the error occurs on entry to sc_main, which is the main function in SystemC-based models.
My questions are:
What does EXC_BAD_ACCESS mean when calling the sc_main function?
Is there any method I can use to get closer to the source of this bug?

Can EXC_BAD_ACCESS crash be an artifact of iOS device running out of memory?

I'm running an app on iOS and periodically (not very often) it crashes with EXC_BAD_ACCESS.
The crash occurs while starting boost::thread:
boost::thread(boost::bind(&SomeClass::someStaticFunction, someParam));
and the call stack I see is:
* thread #35: tid = 0x2a822, 0x00d2469e NdsVgconnectTestApp`boost::(anonymous namespace)::thread_proxy(param=<unavailable>) + 246 at thread.cpp:164, stop reason = EXC_BAD_ACCESS (code=1, address=0x20000008)
* frame #0: 0x00d2469e NdsVgconnectTestApp`boost::(anonymous namespace)::thread_proxy(param=<unavailable>) + 246 at thread.cpp:164
frame #1: 0x3b877918 libsystem_pthread.dylib`_pthread_body + 140
frame #2: 0x3b87788a libsystem_pthread.dylib`_pthread_start + 102
I'm passing a static function to boost::thread, so it's hard to believe that there is a problem with addressing or pointer corruption. So my question is: can an EXC_BAD_ACCESS crash be an artifact of the iOS device running out of memory, or of the app exceeding the memory limit given by the OS?

OpenGL issue when 3rd party plug-ins also use OpenGL

I'm working on a program containing an OpenGL view (using Ogre3D); this program hosts third-party plug-ins (namely, VST) which can open their own UI. Some plug-ins also use OpenGL for their UI and make the program crash in the Ogre render system as soon as this plug-in-specific OpenGL UI is opened (no crash with the UIs of other, non-OpenGL plug-ins).
Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000000
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Thread 0 Crashed: com.apple.main-thread
0 GLEngine gleRunVertexSubmitImmediate + 722
1 GLEngine gleLLVMArrayFunc + 60
2 GLEngine gleSetVertexArrayFunc + 116
3 GLEngine gleDrawArraysOrElements_ExecCore + 1514
4 GLEngine glDrawElements_Exec + 834
5 libGL.dylib glDrawElements + 52
6 RenderSystem_GL.dylib Ogre::GLRenderSystem::_Render(...)...
...
22 Ogre Ogre::Root::renderOneFrame() + 30
23 com.mycompany.myapp MyOgreWidget::paint()
...
(apparently a third-party thread from the plug-in)
Thread 10: Dispatch queue: com.apple.opengl.glvmDoWork
0 libSystem.B.dylib mach_msg_trap + 10
1 libSystem.B.dylib mach_msg + 68
2 libCoreVMClient.dylib cvmsServ_BuildModularFunction + 195
3 libCoreVMClient.dylib CVMSBuildModularFunction + 98
4 libGLProgrammability.dylib glvm_deferred_build_modular(void*) + 254
5 libSystem.B.dylib _dispatch_queue_drain + 249
6 libSystem.B.dylib _dispatch_queue_invoke + 50
7 libSystem.B.dylib _dispatch_worker_thread2 + 249
8 libSystem.B.dylib _pthread_wqthread + 390
9 libSystem.B.dylib start_wqthread + 30
I suspected that the OpenGL Context was not properly managed, either in Ogre3D or in the plug-in's UI, but it is not possible to access the plug-ins' render callbacks.
I tested with Ogre3D 1.7.1 and 1.7.3. My UI toolkit is Qt (version 4.6.3 and 4.7.4). Same issues with MacOSX and Windows.
I know other programs with OpenGL views which don't have this issue, even with the exact same plug-ins, I wonder how they handle such situations.
Any idea how to handle that?
Thanks for any help. All the best.
I'd add a call to QGLWidget::doneCurrent right after finishing your own (=Ogre3D's) OpenGL work, and do a QGLWidget::makeCurrent before doing your own OpenGL work.
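The bracketing described above can be sketched as a small RAII guard, so that doneCurrent is called even if rendering exits early. ScopedGLContext, renderWithContext and FakeContext are hypothetical names for illustration. The guard only assumes a type with makeCurrent() and doneCurrent() members, which QGLWidget provides. FakeContext is a stand-in so the sketch compiles without Qt.

```cpp
// Hypothetical RAII helper: makes a GL context current for the lifetime
// of the guard and releases it when the guard goes out of scope.
// Context is any type exposing makeCurrent() and doneCurrent().
template <typename Context>
class ScopedGLContext {
public:
    explicit ScopedGLContext(Context& ctx) : mCtx(ctx) { mCtx.makeCurrent(); }
    ~ScopedGLContext() { mCtx.doneCurrent(); }

    ScopedGLContext(const ScopedGLContext&) = delete;
    ScopedGLContext& operator=(const ScopedGLContext&) = delete;

private:
    Context& mCtx;
};

// Stand-in for a real GL widget, used only to show the call order.
struct FakeContext {
    bool current = false;
    int makeCalls = 0;
    int doneCalls = 0;
    void makeCurrent() { current = true; ++makeCalls; }
    void doneCurrent() { current = false; ++doneCalls; }
};

// Runs the caller's own OpenGL work with the context current, and
// releases the context before returning to code that may use another one.
template <typename Context, typename RenderFn>
void renderWithContext(Context& ctx, RenderFn render) {
    ScopedGLContext<Context> guard(ctx);
    render();
}
```

In the widget from the question this would presumably look like a ScopedGLContext<QGLWidget> guard(*this); at the top of MyOgreWidget::paint(), before the call to renderOneFrame(), assuming MyOgreWidget derives from QGLWidget.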