Since a couple of weeks I'm trying to profile a piece of numerical software and I'm unable to get useful results.
The code I'm profiling results in a huge function (__attribute__((flatten))) created out of many inlined functions and a few calls to std::exp/std::log/std::pow. This function is located inside a shared library and loaded via dlopen().
I've used
the google CPU profiler (hangs in the first fork() (interrupted by SIGPROF and restarted and interrupted and...) -- same problem with g++ option -pg)
linux tool perf (caused a reboot of the machine, I complained and they upgraded the OS (CENTOS 6.5). The results only highlight two assembler instructions out of above mentioned huge function. I don't have permissions to read accurate event sources (*:ppp))
some old version of vtune (difficult to operate, results are unreliable, no hardware drivers loaded)
sprof (results do not tell me anything as there is only a single function to profile -- when avoiding to use attribute flatten then the behavior is fully different)
I'm running
CENTOS 6.5
and
g++ (GCC) 5.3.0
I don't have any influence over the version of the OS or the compiler version.
I complained about the ancient OS some weeks ago, and they upgraded me to what I mentioned above.
In some former live I successfully used the google profiler -- when it was working (and not crashing or hanging due to signal handling problems) it provided useful results.
Anybody any comment?
Could all these unclear results be the result of the fixes for SPECTRE?
Do I need to insist, that certain profiling options are enabled on the machine?
Do I need to insist on the vtune drivers loaded?
Do I need to insist on an uptodate copy of vtune installed?
Compile with -fno-omit-frame-pointers?
Related
I have a Linux application written in C++ with proprietary source code so this question is about any mechanisms that could cause what I describe below to happen.
Anyway, the application uses some library functions that perform some data processing and spits out a result. On my machine, I get the expected result. On others, the app either segfaults or outputs nonsense.
It is distributed as a bit of source code and a static library, built on my machine, which can then be built with make and run (and possibly distributed). The OS environment is Ubuntu 18.04.4 LTS (WSL1) and is essentially the same on all of the machines.
The ldd utility indicates that the same libraries are being used and the Ubuntu and g++ versions are identical as well.
The application was run in gdb on one of the machines where it segfaults and it turned out that a specific buffer hadn't been allocated and was nullptr. The size parameter being passed to the constructor in this case is a #define macro.
I am aware that the preferred approach on Linux is to distribute source code, not precompiled binaries, but for IP protection purposes, that might not always be possible. I have also read that there could be subtle library version or kernel-level incompatibilities that can cause this kind of weirdness, but like I said, that doesn't seem to be the case here.
Q: Does anybody know why I might be observing this strange behavior?
I'm using gperftools to profile a C++ application, which was compiled using GCC 5.4.0 (with -O3).
The code is highly optimized, so I don't see a lot of branches in the output, but there's a branch called __nss_passwd_lookup(), which takes significant amount of time:
My only guess is that it's related to memory allocation somehow.
Operating system: Ubuntu 16.04 x86_64, Kernel: 4.8.
Some assembly functions in glibc occasionally have this issue (e.g. memcpy or memset). Consider installing libc6-dbg package. Also please try golang version of pprof tool (go get github.com/google/pprof).
I recently noticed that running a program inside gdb in windows makes it a lot slower, and I want to know why.
Here's an example:
It is a pure C++03 project, compiled with mingw32 (gcc 4.8.1, 32 bits).
It is statically linked against libstdc++ and libgcc, no other lib is used.
It is a cpu and memory intensive non-parallel process (a mesh edition operation, lots of news and deletes and queries to data structures involved).
The problem is not start-up time, the whole process is painfully slow.
Debug build (-O0 -g2) runs in 8 secs outside gdb, but in 140 secs within gdb.
Tested from command line, just launching gdb and just typing "run" (no breakpoints defined).
I also tested a release build (optimized, and without debugging information), and it is still much slower inside gdb (3 secs vs 140 secs; yes, it takes the same time as the not optimized version inside gdb).
Tested with gdb 7.5 and 7.6 from mingw32 project, and with a gdb 7.8 compiled by me (all of them without python support).
I usually develop on a GNU/Linux box, and there I can't notice speed differences between running with or withoud gdb.
I want to know what is gdb doing that is making it run so slowly. I have some basic understanding of how a debugger works, but I cannot figure out what is it doing here, and googling didn't helped me this time.
I've finally found the problem, thanks to greatwolf for asking me to test other debuggers. Ollydbg takes the same time as gdb, so it's not a gdb problem, its a Windows problem. This tip changed my search criteria and then I've found this article* that explains the problem very well and gives a really simple solution: define an environment varible _NO_DEBUG_HEAP to 1. This will disable the use of a special heap system windows provides and c++ programs use.
* Here's the link: http://preshing.com/20110717/the-windows-heap-is-slow-when-launched-from-the-debugger/
I once had issues with gdb being incredibly slow and I remember disabling nls (native language support, i.e. the translations of all the messages) would remedy this.
The configure time option is --disable-nls. I might have just been mistaken as to what is the true cause, but it's worth a shot for you anyways.
My bug report from back then is here, although the conclusion there would be that I was mistaken. If you can provide further insight into this, that would be great!
I have been developing a C++ application to run on a 64 bit Ubuntu 12.04. I develop the code on my 32 bit 12.04 Ubuntu laptop, then upload it to a git repository, pull it on the server and build the pulled source natively.
Until recently things worked well and I had no problems but today g++ 4.6.3 crashed when I tried to compile on the 64 bit server and I got an output telling me to submit a crash report (g++ 4.6.3 is the same version I have on my development machine also). The identical code did not cause a crash on my dev machine.
I am not asking why it crashed, but I would like to know what the problem was if possible. Does g++ produce any file logs when it encounters problems?
As far as I can tell my code is doing nothing controversial, I am not creating templates, I simply use a couple of boost libraries, mysql++, openssl and some static libraries I have written myself.
I really need to run this application every day so I want to fix this as soon as I can. I can think of the following ways to deal with things
Try and find out what aspect of my code caused the compiler to crash and rewrite my code accordingly.
Rent another server.
Upgrade (or downgrade) g++ or create an additional g++ on the server and try that. I am reluctant to do this as I have read that you can ruin your system when upgrading g++ on Ubuntu.
I use Eclipse to build everything on my dev machine and simply build code on my server using the Eclipse generated makefile that I have made part of the git project - I could write my own makefile in case something in there is causing the crash on the 64 bit server.
I would really welcome advice on how to proceed. I am not an expert on how compilers work internally and this is the first time I have encountered this kind of error so I am not quite sure what to do next.
I would really welcome advice on how to proceed
One reason for a crash might be hardware problem (faulty disk, disk controller, memory, or something else). This is hard to detect.
Another reason might be a compiler bug, but very unlikely.
What you can do is :
check the hardware of the server (run all possible checks you can think of). Try to compile many times on a different machine
make sure your system is not running out of virtual memory
upgrade or change the compiler, and see if it happens
There are various article explaining that g++ can crash because of HW problems :
crash during compiling - Most likely there is nothing wrong with your installation, your compiler or kernel. It very likely has something to do with your hardware. There are two exceptions to this "rule". You could be running low on virtual memory, or you could be installing Red Hat 5.x, 6.x or 7.x
crash during optimization
The solution to this was found in the question Executable runs faster on Wine than Windows -- why? Glibc's floor() is probably implemented in terms of system libraries.
I have a very small C++ program (~100 lines) for a physics simulation. I have compiled it with gcc 4.6.1 on both Ubuntu Oneiric and Windows XP on the same computer. I used precisely the same command line options (same makefile).
Strangely, on Ubuntu, the program finishes much faster than on Windows (~7.5 s vs 13.5 s). At this point I thought it's a compiler difference (despite using the same version).
But even more strangely, if I run the Windows executable under wine, it's still faster than on Windows (I get 11 s "real" and 7.7 s "user" time -- and this includes wine startup.)
I'm confused. Surely if the same code is run on the same CPU, there shouldn't be a difference in the timing.
What can be the reason for this? What could I be doing wrong?
The program does minimal I/O (outputs a single line), and only uses a fixed-length vector from the STL (i.e. no system libraries should be involved). On Ubuntu I used the default gcc and on Windows the Nuwen distribution. I verified that the CPU usage is close to zero when doing the benchmarking (I closed most programs). On Linux I used time for timing. On Windows I used timethis.exe.
UPDATE
I did some more precise timings, comparing the running time for different inputs (run-time must be proportional to the input) of the gcc and msvc-compiled programs on Windows XP, Wine and Linux. All numbers are in seconds and are the minimum of at least 3 runs.
On Windows I used timethis.exe (wall time), on Linux and Wine I used time (CPU time). (timethis.exe is broken on Wine) I made sure no other programs were using the CPU and disabled the virus scanner.
The command line options to gcc were -march=pentium-m -Wall -O3 -fno-exceptions -fno-rtti (i.e. exceptions were disabled).
What we see from this data:
the difference is not due to process startup time, as run-times are proportional to the input
The difference between running on Wine and Windows exists only for the gcc-compiled program, not the msvc-compiled one: it can't be casued by other programs hogging the CPU on Windows or timethis.exe being broken.
You'd be surprised what system libraries are involved. Just do ldd on your app, and see which are used (ok, not that much, but certainly glibc).
In order to completely trust your findings about execution speed, you would need to run your app a couple of times sequentially and take the mean execution time. It might be that the OS loader is just slower (although 4s is a long loading time).
Other very possible reasons are:
Different malloc implementation
Exception handling, if used to the extreme might cause slowdown (Windows GCC, MinGW, might not be the optimal exception handling star of the show)
OS-dependent initialization: stuff that needs to be done at program startup on Windows, but not on Linux.
Most of these are easily benchmarkable ;-)
An update to your update: the only thing you can now do is profile. Stop guessing, and let a profiler tell you where time is being spent. Use gprof and the Visual Studio built-in profiler and compare time spent in different functions.
Do benchmarking in code. Also try to compile with visual studio. On windows if you have some application like Yahoo Messenger, that are installing hooks, they can very easy slow down your application loading times.
On windows you have: QueryPerformanceCounter
On Linux: clock_gettime
Apparently the difference is system related.
You might use strace to understand what system calls are done, eg
strace -o /tmp/yourprog.tr yourprog
and then look into /tmp/yourprog.tr
(If an equivalent of strace existed on Windows, try to use it)
Perhaps your program is allocating memory (using mmap system call), and perhaps the memory related system calls are faster on Linux (or even on Wine) than on Windows? Or some other syscalls give faster functionality on Linux that on Windows.
NB. I know nothing about Windows, since I'm using Unix systems since 1986 and Linux since 1993.