I recently noticed that running a program inside gdb on Windows makes it a lot slower, and I want to know why.
Here's an example:
It is a pure C++03 project, compiled with mingw32 (gcc 4.8.1, 32-bit).
It is statically linked against libstdc++ and libgcc, no other lib is used.
It is a CPU- and memory-intensive, non-parallel process (a mesh editing operation, with lots of news and deletes and queries to data structures involved).
The problem is not start-up time, the whole process is painfully slow.
Debug build (-O0 -g2) runs in 8 secs outside gdb, but in 140 secs within gdb.
Tested from command line, just launching gdb and just typing "run" (no breakpoints defined).
I also tested a release build (optimized, and without debugging information), and it is still much slower inside gdb (3 secs vs 140 secs; yes, it takes the same time as the unoptimized build inside gdb).
Tested with gdb 7.5 and 7.6 from mingw32 project, and with a gdb 7.8 compiled by me (all of them without python support).
I usually develop on a GNU/Linux box, and there I can't notice any speed difference between running with or without gdb.
I want to know what gdb is doing that makes the program run so slowly. I have a basic understanding of how a debugger works, but I cannot figure out what it is doing here, and googling didn't help me this time.
I've finally found the problem, thanks to greatwolf for asking me to test other debuggers. OllyDbg takes the same time as gdb, so it's not a gdb problem, it's a Windows problem. This tip changed my search criteria, and then I found this article* that explains the problem very well and gives a really simple solution: define an environment variable _NO_DEBUG_HEAP and set it to 1. This disables the special debug heap that Windows provides (and that C++ programs end up using) when a process is launched from a debugger.
* Here's the link: http://preshing.com/20110717/the-windows-heap-is-slow-when-launched-from-the-debugger/
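For what it's worth, the pattern that triggers it is exactly the kind of new/delete churn described above. Here is a made-up minimal sketch (not the original project) that one could use to compare the three cases: run it normally, run it under gdb, and run it under gdb with _NO_DEBUG_HEAP=1 defined before launching the debugger.

    // Hypothetical illustration (not the original project): an allocation-heavy
    // loop of the kind the Windows debug heap punishes. Time it normally, then
    // under gdb, then under gdb with _NO_DEBUG_HEAP=1 in the environment.
    #include <cstddef>
    #include <vector>

    int main()
    {
        std::vector<int*> ptrs;
        ptrs.reserve(1000);
        for (std::size_t iter = 0; iter < 20000; ++iter) {
            for (std::size_t i = 0; i < 1000; ++i)
                ptrs.push_back(new int[16]);   // many small allocations...
            for (std::size_t i = 0; i < 1000; ++i)
                delete[] ptrs[i];              // ...and matching frees
            ptrs.clear();
        }
        return 0;
    }

The variable needs to be set in the environment before the debugger creates the process (e.g. set _NO_DEBUG_HEAP=1 in the console before starting gdb), since the heap choice is made at process creation.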
I once had issues with gdb being incredibly slow, and I remember that disabling nls (native language support, i.e. the translations of all its messages) remedied it.
The configure-time option is --disable-nls. I might have been mistaken about the true cause, but it's worth a shot for you anyway.
My bug report from back then is here, although the conclusion there would be that I was mistaken. If you can provide further insight into this, that would be great!
Related
For the past couple of weeks I've been trying to profile a piece of numerical software, and I'm unable to get useful results.
The code I'm profiling results in a huge function (__attribute__((flatten))) created out of many inlined functions and a few calls to std::exp/std::log/std::pow. This function is located inside a shared library and loaded via dlopen().
I've used
the Google CPU profiler (hangs at the first fork(), which is interrupted by SIGPROF, restarted, interrupted again, and so on -- same problem with the g++ option -pg)
the Linux tool perf (initially caused a reboot of the machine; I complained and they upgraded the OS to CentOS 6.5. The results only highlight two assembler instructions out of the above-mentioned huge function, and I don't have permission to read accurate event sources (*:ppp))
an old version of VTune (difficult to operate, results are unreliable, no hardware drivers loaded)
sprof (the results do not tell me anything, as there is only a single function to profile -- and when I avoid attribute flatten, the behavior is completely different)
I'm running CentOS 6.5 and g++ (GCC) 5.3.0.
I don't have any influence over the version of the OS or the compiler version.
I complained about the ancient OS some weeks ago, and they upgraded me to what I mentioned above.
In a former life I successfully used the Google profiler -- when it was working (and not crashing or hanging due to signal-handling problems) it provided useful results.
Does anybody have any comments?
Could all these unclear results be caused by the fixes for Spectre?
Do I need to insist that certain profiling options are enabled on the machine?
Do I need to insist on the VTune drivers being loaded?
Do I need to insist on an up-to-date copy of VTune being installed?
Should I compile with -fno-omit-frame-pointer?
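One fallback, since the profilers keep failing on the single flattened function: instrument coarse regions of it by hand and print the accumulated times at exit. The sketch below is my own suggestion, not something from any of the tools above; it assumes you can build with -std=c++11 for <chrono>, and the region names are placeholders.

    // Hypothetical manual instrumentation: accumulate wall time per region
    // of the big flattened function. Region names are placeholders.
    #include <chrono>
    #include <cstdio>

    struct RegionTimer {
        double& total;
        std::chrono::steady_clock::time_point start;
        explicit RegionTimer(double& t)
            : total(t), start(std::chrono::steady_clock::now()) {}
        ~RegionTimer() {
            total += std::chrono::duration<double>(
                         std::chrono::steady_clock::now() - start).count();
        }
    };

    double g_setup = 0.0, g_exp_loop = 0.0;   // one counter per region

    void huge_function()
    {
        { RegionTimer t(g_setup);    /* ... first block of the function ... */ }
        { RegionTimer t(g_exp_loop); /* ... the exp/log/pow-heavy block ... */ }
    }

    int main()
    {
        for (int i = 0; i < 1000; ++i)
            huge_function();
        std::printf("setup: %.3f s   exp loop: %.3f s\n", g_setup, g_exp_loop);
        return 0;
    }

It is crude, but it at least tells you which part of the function dominates, independently of what perf or VTune are willing to report.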
I have Kubuntu 14.10 and 15.04 installed on my four computers, all with different hardware (the oldest machine was assembled in 2007 and the newest just a month ago; I have both 32- and 64-bit OSes installed, and the amount of RAM varies from 4 to 32 GB). I have been using Code::Blocks on them for a few months, and I experience the same problem on all 4 machines: the integrated debugger is painfully slow when debugging a C++ program.
After the debugger stops at a breakpoint, it takes between 10 seconds and 5 minutes to step through a single line of code, and while the debugger is performing a step, GDB loads one core of my CPU to 100%. Often, trying to step through a line of code hangs forever, and then I have to kill GDB and the process being debugged.
Some time ago I updated GDB to version 7.9 (from 7.8) but this did not fix the problem. And I have no slowdown when debugging with GDB from command line, so I suspect that the problem is in the Code::Blocks debugger plugin.
I saw many complaints regarding similar problems, some of which were allegedly caused by outdated libc6-dbg (more exactly, by the fact that debug symbols were not shipped with Ubuntu and other Debian-based distributions), but reinstalling libc6-dbg did not help either.
I am afraid that after a day or two of trying to fix this problem I will give up and switch to Eclipse or some other IDE. It looks like Code::Blocks and its debugger plugin have not been updated for a couple of years (at least, their Linux versions), so maybe I should not use Code::Blocks at all, because its future is not clear (while Eclipse is likely to be in service for a long time).
I wonder if anybody else experiences this problem and whether there are solutions. Overall, the Code::Blocks IDE looks decent and rather convenient, but this debugger problem prevents me from using it for anything other than writing code and compiling.
An update:
I ended up installing Eclipse for C++ (Luna release). It took some time to learn how to use it. It is slow, buggy, glitchy, and uses a lot of RAM, but it at least allows me to debug my applications in an IDE. Now I am 100% sure that the problem is in the Code::Blocks debugger plugin.
I also tried NetBeans, and it seems to work fine, but it is even slower than Eclipse and looks really ugly. So I am going to stick with Eclipse for now, because no one seems to be willing to fix the debugger plugin in Code::Blocks.
The problem turned out to be with stepping through lines that declare uninitialized std::string objects. A similar (or the same) problem is described here:
https://sourceware.org/bugzilla/show_bug.cgi?id=12555
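For anyone who wants to check whether they are hitting the same thing, here is a made-up minimal case (hypothetical, not my real code): set a breakpoint at the top of main() and step with the watch scripts enabled; the stall appears on lines that declare std::string variables before they are initialized.

    // Hypothetical minimal repro: stepping over the uninitialized std::string
    // declaration is where the debugger stalls when pretty-printing misbehaves.
    #include <string>

    int main()
    {
        std::string a;             // step over this line with watches enabled
        std::string b("hello");
        a = b + b;
        return static_cast<int>(a.size());
    }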
The problem with debugging in Code::Blocks was suddenly fixed when I followed these instructions:
http://wiki.eclipse.org/CDT/User/FAQ#How_can_I_inspect_the_contents_of_STL_containers.3F
on how to enable pretty-printing in Eclipse CDT.
I still need to follow these instructions on my other machines to make sure they fix the problem.
You can try turning off Code::Blocks pretty-printing: Settings->Debugger->Default->Enable Watch Scripts = unchecked
I have downloaded and installed Qt 5.1.0 for Windows 32-bit (MinGW 4.8) from the qt-project downloads page. I have run the installer, and am able to compile and run applications using these libraries and the MinGW 4.8 32-bit toolchain.
However, I have a large application, and when I try to debug it (using the gdb bundled with the minGW toolchain), it takes an insane amount of time to start running, and any interaction with the application takes a long time to complete. Not an annoying amount of time, but an unusable amount of time. Has anyone else had this problem and are there any solutions?
In case this helps, I get lots of output when debugging like this:
Temporarily disabling breakpoints for unloaded shared library "C:\Qt\Qt5.1.0\5.1.0\mingw48_32\plugins\somefolder\somelib.dll"
There is a gdb bug that was introduced at some point between 7.4 and 7.5 which makes it much slower. When debugging QObject-derived classes, the slowdown becomes awful.
By disabling the debugging helpers you improve it, but then you miss a lot of precious information in Locals and Expressions. For instance, you can no longer display the contents of QLists nicely, etc.
It seems that either:
building gdb from CVS, or
using an older gdb (7.4.1)
solves the issue.
Qt Creator has "attempt quick start" in its gdb options. It helps a LOT.
Or you can switch to using MSVC compiler on Windows. That also switches your debugging to CDB instead of GDB and bypasses the problem entirely. You can just install MSVC compiler and plug it into QtCreator instead of mingw if you don't like MS IDE.
P.S. This also gives you readable core dumps which is a godsend.
See the comments on Zeks' answer. There he explains that switching from the MinGW toolchain to the Microsoft toolchain (compiler, debugger) solves the problem completely. Fortunately, Qt Creator supports the Microsoft toolchain so you don't need to switch IDEs.
After I did that, debugger launch time is now 4 seconds, and on an app crash there is zero delay. It also sped up builds a lot.
For reference, I've described how I've set up my system here.
I have managed to improve the debugging speed significantly after changing several settings:
Made sure the compiler is gcc.exe and not g++.exe in the Qt5.1.0\Tools\mingw48_32\bin folder
Unchecked Use Debugging Helper in the Tools->Options->Debugger->Locals & Expressions menu
Unchecked Stop when qWarning() is called and Stop when qFatal() is called
The solution to this was found in the question "Executable runs faster on Wine than Windows -- why?": Glibc's floor() is probably implemented in terms of system libraries.
I have a very small C++ program (~100 lines) for a physics simulation. I have compiled it with gcc 4.6.1 on both Ubuntu Oneiric and Windows XP on the same computer. I used precisely the same command line options (same makefile).
Strangely, on Ubuntu, the program finishes much faster than on Windows (~7.5 s vs 13.5 s). At this point I thought it's a compiler difference (despite using the same version).
But even more strangely, if I run the Windows executable under wine, it's still faster than on Windows (I get 11 s "real" and 7.7 s "user" time -- and this includes wine startup.)
I'm confused. Surely if the same code is run on the same CPU, there shouldn't be a difference in the timing.
What can be the reason for this? What could I be doing wrong?
The program does minimal I/O (outputs a single line), and only uses a fixed-length vector from the STL (i.e. no system libraries should be involved). On Ubuntu I used the default gcc and on Windows the Nuwen distribution. I verified that the CPU usage is close to zero when doing the benchmarking (I closed most programs). On Linux I used time for timing. On Windows I used timethis.exe.
UPDATE
I did some more precise timings, comparing the running time for different inputs (run-time must be proportional to the input) of the gcc and msvc-compiled programs on Windows XP, Wine and Linux. All numbers are in seconds and are the minimum of at least 3 runs.
On Windows I used timethis.exe (wall time), on Linux and Wine I used time (CPU time). (timethis.exe is broken on Wine) I made sure no other programs were using the CPU and disabled the virus scanner.
The command line options to gcc were -march=pentium-m -Wall -O3 -fno-exceptions -fno-rtti (i.e. exceptions were disabled).
What we see from this data:
the difference is not due to process startup time, as run-times are proportional to the input
The difference between running on Wine and Windows exists only for the gcc-compiled program, not the msvc-compiled one: it can't be caused by other programs hogging the CPU on Windows or by timethis.exe being broken.
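One way to test the floor() explanation mentioned at the top (my own suggestion, not from the linked answer) is to temporarily swap std::floor in the hot loop for a cast-based floor that never calls into the C runtime, rebuild with the same flags on both systems, and see whether the Windows/Linux gap disappears.

    // Hypothetical experiment: a cast-based floor, valid only for values that
    // fit in a long long. Use it in place of std::floor(x) inside the hot loop
    // and re-run the comparison on Windows, Wine, and Linux.
    inline double fast_floor(double x)
    {
        long long i = static_cast<long long>(x);  // truncates toward zero
        return static_cast<double>(i - (x < static_cast<double>(i)));
    }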
You'd be surprised what system libraries are involved. Just do ldd on your app, and see which are used (ok, not that much, but certainly glibc).
In order to completely trust your findings about execution speed, you would need to run your app a couple of times sequentially and take the mean execution time. It might be that the OS loader is just slower (although 4s is a long loading time).
Other very possible reasons are:
Different malloc implementation
Exception handling, if used heavily, might cause a slowdown (the Windows GCC/MinGW exception-handling implementation might not be the star of the show here)
OS-dependent initialization: stuff that needs to be done at program startup on Windows, but not on Linux.
Most of these are easily benchmarkable ;-)
An update to your update: the only thing you can now do is profile. Stop guessing, and let a profiler tell you where time is being spent. Use gprof and the Visual Studio built-in profiler and compare time spent in different functions.
Do the benchmarking in code. Also try compiling with Visual Studio. On Windows, applications like Yahoo Messenger that install hooks can very easily slow down your application's loading times.
On Windows you have: QueryPerformanceCounter
On Linux: clock_gettime
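A minimal sketch using both timers -- the wrapper below is my own illustration of the two calls, not code from the question (on older glibc you may need to link with -lrt for clock_gettime):

    // Hypothetical portable wrapper around the two timers named above.
    #ifdef _WIN32
    #include <windows.h>
    double now_seconds()
    {
        LARGE_INTEGER freq, count;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&count);
        return static_cast<double>(count.QuadPart)
             / static_cast<double>(freq.QuadPart);
    }
    #else
    #include <time.h>
    double now_seconds()
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return static_cast<double>(ts.tv_sec)
             + static_cast<double>(ts.tv_nsec) / 1e9;
    }
    #endif

    // Usage:
    //   double t0 = now_seconds();
    //   /* ... code to be timed ... */
    //   double elapsed = now_seconds() - t0;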
Apparently the difference is system related.
You might use strace to understand which system calls are made, e.g.
strace -o /tmp/yourprog.tr yourprog
and then look into /tmp/yourprog.tr
(If an equivalent of strace existed on Windows, try to use it)
Perhaps your program is allocating memory (using the mmap system call), and perhaps the memory-related system calls are faster on Linux (or even on Wine) than on Windows? Or some other syscalls provide faster functionality on Linux than on Windows.
NB: I know nothing about Windows; I've been using Unix systems since 1986 and Linux since 1993.
I've finally managed to run the QtCreator debugger on Windows after struggling with the Comodo Firewall incompatibilities.
I was hoping to switch from an older version of Qt and Visual C++ to the newest version of Qt and QtCreator, but the debugger performance is atrocious.
I have created a simple GUI with one window that does nothing else but display the window. After starting up, QtCreator takes ~60 MB of RAM (private bytes in Sysinternals Process Explorer).
When I start debugging, GDB is using 180 MB. I start examining the main window pointer and it jumps to 313 MB. Every time I try to inspect something, one of the cores jumps to 100% use and I have to wait a few seconds for the information to show. This is just a toy program, and I'm afraid that the real program I want to switch over will be much worse.
Is this kind of performance normal for MinGW? Would changing to the latest MinGW release improve things?
The Visual C++ IDE + debugger + a real-world program take close to just 100 MB of RAM, and examining local variables is instantaneous.
Yesterday I built a copy of the Qt 4.5.2 libraries using MSVC 2008 and am using the QtCreator 1.2 MS CDB (Microsoft Console Debugger) support. It seems much faster than gdb. Building Qt for MSVC takes a few hours, but it might be worth trying.
Also, that means smaller Qt DLLs and EXEs as the MS compiler/linker is much better at removing unused code. Some of the Qt DLLs are less than half the size of their MinGW equivalents. Rumour has it that the C++ code the MS compiler generates is faster too.
I had to work with QtCreator a month ago. Its performance is awful: after 30 minutes of working with it, it starts to respond very slowly to everything. Maybe that's because it's still in its early days.