I have had an app on the App Store since 2011, and, as you might expect, it has gone through quite a few changes since then. After the last change I started receiving crash reports (I use Crashlytics) that seem to point at CoreGraphics. It must be something I'm doing wrong rather than a bug in the framework itself, but since the reports contain no frames from my own code, it's hard to tell what that could be.
I can't reproduce the issue myself, but some users can. I've checked, and some of their devices had plenty of free memory, so as far as I can tell it's not an out-of-memory problem.
The crash reports I receive are all this one in particular, so it doesn't look like a red herring caused by unrelated memory issues triggering false positives. The app is constantly used and tested with Address Sanitizer enabled, which should catch problems of that kind.
I'm fairly well versed in C++ (the main language of the app) but know just enough Objective-C to get things done, so I'm wondering whether I'm missing something or whether there is a way to debug this, since I wouldn't even know where to start.
Since I can't trigger the issue, and I can't see where it happens, I can't think of any way to track it down.
What would be the best way to debug this? How do people approach crashes like this?
Crashed: com.apple.main-thread
0 libsystem_platform.dylib 0x184079b80 _platform_memmove + 176
1 CoreGraphics 0x185bc81f8 decode_data + 12740
2 CoreGraphics 0x185bc81f8 decode_data + 12740
3 CoreGraphics 0x185d982f4 img_decode_read + 2032
4 CoreGraphics 0x185d9bffc img_alphamerge_read + 548
5 CoreGraphics 0x185d9f818 img_data_lock + 7048
6 CoreGraphics 0x185d9dc38 CGSImageDataLock + 184
7 CoreGraphics 0x185bbe704 ripc_AcquireRIPImageData + 308
8 CoreGraphics 0x185db287c ripc_DrawImage + 644
9 CoreGraphics 0x185da2678 CGContextDrawImageWithOptions + 632
10 QuartzCore 0x188335f7c CA::Render::(anonymous namespace)::create_image_by_rendering(CGImage*, CGColorSpace*, unsigned int, double) + 1232
11 QuartzCore 0x188336f60 CA::Render::(anonymous namespace)::create_image_from_rgb_image(CGImage*, CGColorSpace*, unsigned int, double) + 676
12 QuartzCore 0x1883358e4 CA::Render::create_image(CGImage*, CGColorSpace*, unsigned int, double) + 900
13 QuartzCore 0x188338584 CA::Render::copy_image(CGImage*, CGColorSpace*, unsigned int, double, double) + 472
14 QuartzCore 0x18843ebcc -[CALayer(CALayerPrivate) _copyRenderLayer:layerFlags:commitFlags:] + 632
15 QuartzCore 0x1884434b0 CA::Layer::commit_if_needed(CA::Transaction*, void (*)(CA::Layer*, unsigned int, unsigned int, void*), void*) + 444
16 QuartzCore 0x1883a7dcc CA::Context::commit_root(CA::Layer*, void*) + 44
17 QuartzCore 0x188366fd8 x_hash_table_foreach + 72
18 QuartzCore 0x1883a8674 CA::Context::commit_transaction(CA::Transaction*) + 2208
19 QuartzCore 0x1883ce340 CA::Transaction::commit() + 540
20 QuartzCore 0x1883cf180 CA::Transaction::observer_callback(__CFRunLoopObserver*, unsigned long, void*) + 92
21 CoreFoundation 0x1843ff8b8 __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 32
22 CoreFoundation 0x1843fd270 __CFRunLoopDoObservers + 412
23 CoreFoundation 0x1843fd82c __CFRunLoopRun + 1292
24 CoreFoundation 0x18431e2d8 CFRunLoopRunSpecific + 436
25 GraphicsServices 0x1861aff84 GSEventRunModal + 100
26 UIKit 0x18d8cb880 UIApplicationMain + 208
27 (MyAppName) 0x1010cb280 main (main.m:14)
28 libdyld.dylib 0x183e4256c start + 4
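Nothing in the trace points at your code, but the stack does show the image being decoded lazily during the Core Animation commit, long after the CGImage was created. One pattern worth auditing for (this is only a hypothesis, not something the report proves) is a CGImage backed by a data provider whose buffer is released before the image is actually drawn. A minimal C++ sketch of the risky pattern and a safer variant, with made-up helper names (CoreGraphics is a C API, so this compiles in the app's C++ code):

#include <CoreGraphics/CoreGraphics.h>
#include <cstdint>
#include <vector>

// Risky (hypothetical) pattern: the provider wraps the caller's buffer without
// copying it and without a release callback. If `pixels` is destroyed before the
// image is drawn, the lazy decode reads freed memory, e.g. in a memmove like frame 0.
CGImageRef makeImageUnsafe(const std::vector<uint8_t>& pixels,
                           size_t width, size_t height)
{
    CGColorSpaceRef cs = CGColorSpaceCreateDeviceRGB();
    CGDataProviderRef provider = CGDataProviderCreateWithData(
        nullptr, pixels.data(), pixels.size(), nullptr);  // no copy, no release callback
    CGImageRef img = CGImageCreate(width, height, 8, 32, width * 4, cs,
                                   CGBitmapInfo(kCGImageAlphaPremultipliedLast),
                                   provider, nullptr, false, kCGRenderingIntentDefault);
    CGDataProviderRelease(provider);
    CGColorSpaceRelease(cs);
    return img;
}

// Safer variant: hand CoreGraphics its own copy of the bytes, so the image
// owns everything it needs at decode time.
CGImageRef makeImageSafe(const std::vector<uint8_t>& pixels,
                         size_t width, size_t height)
{
    CFDataRef data = CFDataCreate(nullptr, pixels.data(), (CFIndex)pixels.size());
    CGDataProviderRef provider = CGDataProviderCreateWithCFData(data);
    CGColorSpaceRef cs = CGColorSpaceCreateDeviceRGB();
    CGImageRef img = CGImageCreate(width, height, 8, 32, width * 4, cs,
                                   CGBitmapInfo(kCGImageAlphaPremultipliedLast),
                                   provider, nullptr, false, kCGRenderingIntentDefault);
    CGColorSpaceRelease(cs);
    CGDataProviderRelease(provider);
    CFRelease(data);
    return img;
}

If a pattern like the first one exists anywhere in the image pipeline, the crash would surface exactly like this: inside CoreGraphics, on the main thread, with no application frames.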
I am trying to create a game loop whose FPS depends on the speed of its iterations. To achieve this I wanted to use a platform-specific timer that (in the case of Windows) uses the timeGetTime function (https://learn.microsoft.com/en-us/windows/desktop/api/timeapi/nf-timeapi-timegettime) to calculate how much time has passed since the last iteration. But I found that merely calling this function already costs quite a lot of time (for a computer). Now I'm wondering if this is the right approach.
I created a simple test that looks like this:
Timer* timer = new Timer();
for (int i = 0; i < 60; i++)
    cout << timer->get_elt() << endl;
delete timer;
The timer class looks like this: (begin is a DWORD)
Timer::Timer()
{
    begin = timeGetTime();
}

int Timer::get_elt()
{
    return timeGetTime() - begin;
}
Not very interesting, but here is an example of the result:
0 0 1 3 4 14 15 15 15 16 16 17 17 17 17 18 19 19 19 19 20 20 20 20 20 21 21 21 21 21 22 22 22 22 22 22 23 23 23 25 38 39 39 55 56 56 66 68 68 69 71 71 72 73 73 73 73 73 74 74
I was expecting this to take about 10 milliseconds at most, but on average it took about 64.
What surprised me most was how erratic the results are. Sometimes it prints the same number up to 7 times in a row, whereas at other times there are gaps of 12 milliseconds between iterations. I realize this is partly because the timer is not accurate, but still. As far as I know, the PC should execute this program as fast as it possibly can; is that even true?
If you want to run your game at, say, 60 fps, you have about 16 milliseconds for every loop iteration, and if calling the timer alone takes about 2 milliseconds on average each time, and you still need to process input, update, and render, how is that even possible?
So what should I do here: is timeGetTime something you can use in a game loop (it's been suggested a lot), or should I look for another function?
I would suggest using QueryPerformanceCounter instead:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms644904(v=vs.85).aspx
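For illustration, a minimal sketch of a timer built on QueryPerformanceCounter; the class name and the get_elt method mirror the Timer in the question so it could be swapped in, but that interface is an assumption:

#include <windows.h>

// High-resolution timer sketch; same interface shape as the Timer in the question.
class Timer
{
public:
    Timer()
    {
        QueryPerformanceFrequency(&freq);   // ticks per second, fixed at boot
        QueryPerformanceCounter(&begin);
    }

    double get_elt() const                  // elapsed milliseconds since construction
    {
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        return (now.QuadPart - begin.QuadPart) * 1000.0 / freq.QuadPart;
    }

private:
    LARGE_INTEGER freq;
    LARGE_INTEGER begin;
};

Unlike timeGetTime, the counter's resolution is typically well under a microsecond, and because the frequency is fixed at boot, one QueryPerformanceFrequency call in the constructor is enough.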
The timers from the Windows Multimedia API are a good choice for animation, games, etc.
They have the greatest precision available on the Windows platform.
Qt also uses these timers and qualifies them as the precise ones:
http://doc.qt.io/qt-5/qt.html#TimerType-enum
On Windows, Qt will use Windows's Multimedia timer facility (if
available) for Qt::PreciseTimer and normal Windows timers for
Qt::CoarseTimer and Qt::VeryCoarseTimer.
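If you do keep timeGetTime, note that its default resolution can be quite coarse. A small sketch (assuming an MSVC build; otherwise link winmm yourself) of requesting 1 ms resolution from the same multimedia timer facility for the duration of the game loop:

#include <windows.h>
#include <mmsystem.h>                     // timeGetTime, timeBeginPeriod, timeEndPeriod
#pragma comment(lib, "winmm.lib")

int main()
{
    timeBeginPeriod(1);                   // request 1 ms timer resolution (affects timeGetTime)
    DWORD last = timeGetTime();
    bool running = true;
    while (running)
    {
        DWORD now = timeGetTime();
        DWORD elapsedMs = now - last;     // unsigned arithmetic handles wrap-around
        last = now;
        // processInput(); update(elapsedMs); render();   // game-specific, not shown
        running = false;                  // placeholder so the sketch terminates
    }
    timeEndPeriod(1);                     // always pair with timeBeginPeriod
    return 0;
}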
I'm trying to speed up the OpenCV SIFT algorithm with OpenMP on an Intel® Core™ i5-6500 CPU @ 3.20GHz × 4. You can find the code in sift.cpp.
The most expensive part is the descriptor computation, in particular:
static void calcDescriptors(const std::vector<Mat>& gpyr, const std::vector<KeyPoint>& keypoints,
                            Mat& descriptors, int nOctaveLayers, int firstOctave )
{
    int d = SIFT_DESCR_WIDTH, n = SIFT_DESCR_HIST_BINS;
    for( size_t i = 0; i < keypoints.size(); i++ )
    {
        KeyPoint kpt = keypoints[i];
        int octave, layer;
        float scale;
        unpackOctave(kpt, octave, layer, scale);
        CV_Assert(octave >= firstOctave && layer <= nOctaveLayers+2);
        float size = kpt.size*scale;
        Point2f ptf(kpt.pt.x*scale, kpt.pt.y*scale);
        const Mat& img = gpyr[(octave - firstOctave)*(nOctaveLayers + 3) + layer];
        float angle = 360.f - kpt.angle;
        if(std::abs(angle - 360.f) < FLT_EPSILON)
            angle = 0.f;
        calcSIFTDescriptor(img, ptf, angle, size*0.5f, d, n, descriptors.ptr<float>((int)i));
    }
}
The serial version of this function takes 52 ms on average.
This for loop has high granularity: it executes 604 times (which is keypoints.size()). The main time-consuming component inside the loop is calcSIFTDescriptor, which accounts for most of each iteration and takes about 105 µs on average, although it often takes 200 µs or 50 µs.
However, we are incredibly lucky: there is no dependency between loop iterations, so we can just add:
#pragma omp parallel for schedule(dynamic,8)
and obtain an initial speedup. The dynamic schedule is used because it seems to give slightly better performance than static (I don't know why).
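For reference, a minimal sketch of how the pragma sits on the loop, under the assumption (suggested by the code above) that each iteration writes only to its own descriptor row:

#pragma omp parallel for schedule(dynamic, 8)
for (int i = 0; i < (int)keypoints.size(); i++)   // signed index keeps older OpenMP compilers happy
{
    // kpt, octave, layer, scale, ptf, ... are declared inside the body,
    // so each thread gets its own private copies by default.
    // descriptors.ptr<float>(i) addresses a distinct row per keypoint,
    // so iterations never write to the same memory.
    // ... body identical to the serial version above ...
}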
The problem is that it's really unstable and doesn't scale. These are the times needed to compute the function in parallel mode:
25ms 43ms 32ms 15ms 27ms 53ms 21ms 24ms
As you can see, the optimal speedup for a quad-core system is reached only once (15 ms). Most of the time we reach about half of it: 25 ms on a quad-core system is only half of the theoretical optimal speedup.
Why does this happen? How can we improve it?
UPDATE:
As suggested in the comments, I tried to use a bigger dataset. Using a huge image, the serial version takes 13574 ms to compute the descriptors, while the parallel version takes 3704 ms on the same quad-core as before. Much better: even if it's not the best theoretical result, it actually scales well. But the problem remains, since the previous results were obtained from a typical image.
UPDATE 1: as suggested in the comments, I tried to benchmark without any interval between executions, in a "hot mode" (see the comments for more details). Better results are achieved more frequently, but there is still a lot of variation. These are the times (in ms) for 100 runs in hot mode:
43 42 14 26 14 43 13 26 15 51 15 20 14 40 34 15 15 31 15 22 14 21 17 15 14 27 14 16 14 22 14 22 15 15 14 43 16 16 15 28 14 24 14 36 15 32 13 21 14 23 14 15 13 26 15 35 13 32 14 36 14 34 15 40 28 14 14 15 15 35 15 22 14 17 15 23 14 24 17 16 14 35 14 29 14 25 14 32 14 28 14 34 14 30 22 14 15 24 14 31
You can see a lot of good results (14 ms, 15 ms) but also a lot of horrible ones (>40 ms). The average is 22 ms. Notice that there is at most 4 ms of variation in the sequential mode:
52 54 52 52 51 52 52 53 53 52 53 51 52 53 53 54 53 53 53 53 54 53 54 54 53 53 53 52 53 52 51 52 52 53 54 54 54 55 55 55 54 54 54 53 53 52 52 52 51 52 54 53 54 54 54 55 54 54 52 55 52 52 52 51 52 51 52 52 51 51 52 52 53 53 53 53 55 54 55 54 54 54 55 52 52 52 51 51 52 51 51 51 52 53 53 54 53 54 53 55
UPDATE 2:
I've noticed that the utilization of each CPU during the "hot mode" benchmark is quite random, and it never reaches more than 80%, as shown in the image below:
By contrast, the image below shows the CPU utilization while I compile OpenCV with make -j4. As you can see, it is much more stable and uses almost 100% of each core:
I think the variation in the first image is normal, since we execute the same short program many times, which is less stable than one big program. What I don't understand is why we never reach more than 80% CPU utilization.
I strongly suggest you use a performance tool such as Paraver (http://www.bsc.es/paraver), TAU (http://www.cs.uoregon.edu/research/tau/home.php), Vampir (https://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/projekte/vampir), or even Intel's VTune (https://software.intel.com/en-us/intel-vtune-amplifier-xe).
These tools will help you understand where the threads spend their cycles. With them, you can find out whether the application is unbalanced (either by IPC or by instruction count), whether there is a limitation due to memory bandwidth, or whether there are false-sharing problems, among many other issues.
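Before reaching for a full tracer, a lighter-weight first check is to time each thread's share of the loop with omp_get_wtime. A rough sketch (the commented-out work call is a stand-in for calcSIFTDescriptor, and the chunk size matches the one in the question):

#include <omp.h>
#include <cstddef>
#include <cstdio>
#include <vector>

// Prints wall time and per-thread busy time for a loop of n independent tasks.
// Adjacent accumulators may false-share, but for a coarse timing check that is acceptable.
void timed_parallel_loop(int n)
{
    std::vector<double> per_thread(omp_get_max_threads(), 0.0);

    double wall0 = omp_get_wtime();
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        #pragma omp for schedule(dynamic, 8)
        for (int i = 0; i < n; i++)
        {
            double t0 = omp_get_wtime();
            // do_one_keypoint(i);             // stand-in for calcSIFTDescriptor(...)
            per_thread[tid] += omp_get_wtime() - t0;
        }
    }
    double wall = omp_get_wtime() - wall0;

    std::printf("wall: %.3f ms\n", wall * 1000.0);
    for (std::size_t t = 0; t < per_thread.size(); t++)
        std::printf("thread %zu busy: %.3f ms\n", t, per_thread[t] * 1000.0);
}

If the per-thread totals differ a lot, the problem is load imbalance; if they are similar but the wall time is still high, look instead at memory bandwidth, frequency scaling, or threads being descheduled.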
I want to profile a C++ application that runs on an ARM device.
I ran my app and profiled it using ProfilerStart("googleProfBL.prof"), so the file is generated.
When I open the file from the ARM device on my local computer, I get this:
./pprof --text --add_lib=libraryIwanttoDebug.so BinaryThatLoadsThatLibrary googleProfBL.prof
Using local file /home/genius/PresControler/src-build-target/deploy/NavStartup.
Using local file ../traces/googleProfBL.prof.
Warning: address ffffffffffffffff is longer than address length 8
Warning: address ffffffffffffffff is longer than address length 8
Hexadecimal number > 0xffffffff non-portable at ./pprof line 4475.
Hexadecimal number > 0xffffffff non-portable at ./pprof line 4475.
Total: 5347 samples
258 4.8% 4.8% 258 4.8% 0x76d4c276
144 2.7% 7.5% 144 2.7% 0x76da2cc4
126 2.4% 9.9% 126 2.4% 0x5d0f8284
114 2.1% 12.0% 114 2.1% 0x76d27386
64 1.2% 13.2% 64 1.2% 0x76dba2dc
53 1.0% 14.2% 53 1.0% 0x76dba1f4
...
The .so library is compiled in debug mode (it is not stripped), so I do not know why I am not getting the symbols.
I tried this:
./pprof --text --add_lib=aFileOfTheLibrary.o BinaryThatLoadsThatLibrary googleProfBL.prof
It looks like I got a couple of symbols:
Using local file /home/genius/PresControler/src-build-target/deploy/NavStartup.
Using local file ../traces/googleProfBL.prof.
Warning: address ffffffffffffffff is longer than address length 8
Warning: address ffffffffffffffff is longer than address length 8
Hexadecimal number > 0xffffffff non-portable at ./pprof line 4475.
Hexadecimal number > 0xffffffff non-portable at ./pprof line 4475.
Total: 5347 samples
258 4.8% 4.8% 258 4.8% 0x76d4c276
144 2.7% 7.5% 144 2.7% 0x76da2cc4
126 2.4% 9.9% 126 2.4% 0x5d0f8284
114 2.1% 12.0% 114 2.1% 0x76d27386
64 1.2% 13.2% 64 1.2% 0x76dba2dc
53 1.0% 14.2% 53 1.0% 0x76dba1f4
50 0.9% 15.1% 50 0.9% 0x76dbf1bc
34 0.6% 15.8% 34 0.6% 0x72eae1b4
30 0.6% 16.3% 30 0.6% 0x76d8a32a
30 0.6% 16.9% 30 0.6% 0x76d8e2c0
..
0 0.0% 100.0% 7 0.1% std::forward_as_tuple <- I couldn't see that before!!!
I tried doing --add_lib for every .o I have, but I do not get any more symbols. Why do I not get the symbols? Does it have anything to do with the fact that I am inspecting the results on an Intel machine while they were collected on ARM? How could I fix that? Any help?
Thank you!!!
I got much more information now!
I was leaving my application by pressing Ctrl+C, so the profile file somehow got corrupted...
I did a test calling ProfilerStop() before pressing Ctrl+C and it worked (also using --lib_prefix to point at where the .so files are, of course).
I still got these warnings:
Warning: address ffffffffffffffff is longer than address length 8
Warning: address ffffffffffffffff is longer than address length 8
Hexadecimal number > 0xffffffff non-portable at ./pprof line 4475.
Hexadecimal number > 0xffffffff non-portable at ./pprof line 4475.
If someone knows why I am still getting them (I assume it is because I am analyzing a profile that was generated on another device), please let me know.
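For completeness, a minimal sketch of making sure ProfilerStop() always runs before the process exits on Ctrl+C; the SIGINT handler only sets a flag, and the loop body is a stand-in for the application's real work:

#include <gperftools/profiler.h>
#include <atomic>
#include <csignal>

static std::atomic<bool> g_stop(false);

static void on_sigint(int) { g_stop = true; }   // only set a flag inside the handler

int main()
{
    std::signal(SIGINT, on_sigint);
    ProfilerStart("googleProfBL.prof");

    while (!g_stop)
    {
        // run_one_iteration();                 // stand-in for the application's main work
    }

    ProfilerStop();                             // flushes and closes the profile cleanly
    return 0;
}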
I'm running on the Mavericks GM with Xcode 5.0.1 GM. The OS X SDK I compile against doesn't seem to matter; I've tried recompiling my company's software with both the 10.8 and 10.9 SDKs. I get the same result compiling for Debug and Release. Oddly enough, if I compile on 10.7 or 10.8 and bring the binaries over to a 10.9 machine, everything works fine.
The software I work on is written in C++ and is about 600K lines of code. Nothing in our codebase uses libxpc directly. Some of the biggest external libraries used:
Qt 4.8.5
Boost 1.49.0
OpenCL (dynamically loaded at runtime)
OpenEXR
Growl 1.2.1
Whenever the crash happens, it's at a seemingly random place in the application's main thread.
Has anyone else run into this issue? If so, what was the cause, and how did you fix it?
Disassembly of where the crash happens:
0x7fff8f2e1e3b: leaq 98519(%rip), %rax ; "Bug in libxpc: Domain environment context has overflowed maximum inline message size."
0x7fff8f2e1e42: movq %rax, -389938449(%rip) ; gCRAnnotations + 8
0x7fff8f2e1e49: ud2 <-- crash
Backtrace from lldb:
* thread #4: tid = 0x122764, 0x00007fff8f2e1e49 libxpc.dylib`_xpc_domain_serialize + 496, queue = 'com.apple.root.default-overcommit-priority, stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
frame #0: 0x00007fff8f2e1e49 libxpc.dylib`_xpc_domain_serialize + 496
frame #1: 0x00007fff8f2e18ca libxpc.dylib`_xpc_dictionary_serialize_apply + 84
frame #2: 0x00007fff8f2e1497 libxpc.dylib`_xpc_dictionary_apply_node_f + 105
frame #3: 0x00007fff8f2e16af libxpc.dylib`_xpc_dictionary_serialize + 161
frame #4: 0x00007fff8f2e1184 libxpc.dylib`_xpc_serializer_pack + 423
frame #5: 0x00007fff8f2e0f81 libxpc.dylib`_xpc_pipe_pack_message + 118
frame #6: 0x00007fff8f2e0985 libxpc.dylib`xpc_pipe_routine + 99
frame #7: 0x00007fff8f2dff2a libxpc.dylib`_xpc_runtime_init_once + 827
frame #8: 0x00007fff9076c2ad libdispatch.dylib`_dispatch_client_callout + 8
frame #9: 0x00007fff9076c21c libdispatch.dylib`dispatch_once_f + 79
frame #10: 0x00007fff8f2e4144 libxpc.dylib`_xpc_connection_init + 64
frame #11: 0x00007fff8f2e40f6 libxpc.dylib`_xpc_connection_resume_init + 14
frame #12: 0x00007fff9076c2ad libdispatch.dylib`_dispatch_client_callout + 8
frame #13: 0x00007fff9076e09e libdispatch.dylib`_dispatch_root_queue_drain + 326
frame #14: 0x00007fff9076f193 libdispatch.dylib`_dispatch_worker_thread2 + 40
frame #15: 0x00007fff922f0ef8 libsystem_pthread.dylib`_pthread_wqthread + 314
frame #16: 0x00007fff922f3fb9 libsystem_pthread.dylib`start_wqthread + 13
Update with some more information:
I just found out something rather interesting: this issue only happens when the application is launched by Xcode. If I launch it via lldb on the command line, the crash does not occur. Likewise, if I double-click it in Finder, the issue does not occur.
I'm working on a program containing an OpenGL view (using Ogre3D); this program hosts third-party plug-ins (namely VST) which can open their own UI. Some plug-ins also use OpenGL for their UI and make the program crash in the Ogre render system as soon as that plug-in-specific OpenGL UI is opened (there is no crash with the UIs of other, non-OpenGL plug-ins).
Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000000
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Thread 0 Crashed: com.apple.main-thread
0 GLEngine gleRunVertexSubmitImmediate + 722
1 GLEngine gleLLVMArrayFunc + 60
2 GLEngine gleSetVertexArrayFunc + 116
3 GLEngine gleDrawArraysOrElements_ExecCore + 1514
4 GLEngine glDrawElements_Exec + 834
5 libGL.dylib glDrawElements + 52
6 RenderSystem_GL.dylib Ogre::GLRenderSystem::_Render(...)...
...
22 Ogre Ogre::Root::renderOneFrame() + 30
23 com.mycompany.myapp MyOgreWidget::paint()
...
(apparently a third-party thread from the plug-in)
Thread 10: Dispatch queue: com.apple.opengl.glvmDoWork
0 libSystem.B.dylib mach_msg_trap + 10
1 libSystem.B.dylib mach_msg + 68
2 libCoreVMClient.dylib cvmsServ_BuildModularFunction + 195
3 libCoreVMClient.dylib CVMSBuildModularFunction + 98
4 libGLProgrammability.dylib glvm_deferred_build_modular(void*) + 254
5 libSystem.B.dylib _dispatch_queue_drain + 249
6 libSystem.B.dylib _dispatch_queue_invoke + 50
7 libSystem.B.dylib _dispatch_worker_thread2 + 249
8 libSystem.B.dylib _pthread_wqthread + 390
9 libSystem.B.dylib start_wqthread + 30
I suspected that the OpenGL context was not properly managed, either in Ogre3D or in the plug-in's UI, but it is not possible to access the plug-ins' render callbacks.
I tested with Ogre3D 1.7.1 and 1.7.3. My UI toolkit is Qt (versions 4.6.3 and 4.7.4). The same issue occurs on Mac OS X and Windows.
I know of other programs with OpenGL views that don't have this issue, even with the exact same plug-ins; I wonder how they handle such situations.
Any idea how to handle that?
Thanks for any help. All the best.
Any idea how to handle that?
I'd add a call to QGLWidget::doneCurrent right after finishing your own (i.e. Ogre3D's) OpenGL work, and a call to QGLWidget::makeCurrent before doing your own OpenGL work.
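A minimal sketch of what that could look like in the widget's paint path; MyOgreWidget and mRoot are assumed names taken from the backtrace, and the widget is assumed to derive from QGLWidget:

// Sketch only: names are assumptions based on the crash log.
// MyOgreWidget is assumed to derive from QGLWidget and to own mRoot (an Ogre::Root*).
void MyOgreWidget::paint()
{
    makeCurrent();              // rebind this widget's GL context; a plug-in UI may have
                                // made its own context current since the last frame
    mRoot->renderOneFrame();    // Ogre::Root::renderOneFrame(), as seen in the backtrace
    doneCurrent();              // release the context so the plug-in's GL work cannot
                                // accidentally run against Ogre's context
}

That way Ogre always renders against its own context, and whatever context the plug-in's UI leaves current is never picked up by the next Ogre frame.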