Same program faster on Linux than Windows -- why? - c++

The solution to this was found in the question Executable runs faster on Wine than Windows -- why? Glibc's floor() is probably implemented in terms of system libraries.
I have a very small C++ program (~100 lines) for a physics simulation. I have compiled it with gcc 4.6.1 on both Ubuntu Oneiric and Windows XP on the same computer. I used precisely the same command line options (same makefile).
Strangely, on Ubuntu, the program finishes much faster than on Windows (~7.5 s vs 13.5 s). At this point I thought it's a compiler difference (despite using the same version).
But even more strangely, if I run the Windows executable under wine, it's still faster than on Windows (I get 11 s "real" and 7.7 s "user" time -- and this includes wine startup.)
I'm confused. Surely if the same code is run on the same CPU, there shouldn't be a difference in the timing.
What can be the reason for this? What could I be doing wrong?
The program does minimal I/O (outputs a single line), and only uses a fixed-length vector from the STL (i.e. no system libraries should be involved). On Ubuntu I used the default gcc and on Windows the Nuwen distribution. I verified that the CPU usage is close to zero when doing the benchmarking (I closed most programs). On Linux I used time for timing. On Windows I used timethis.exe.
UPDATE
I did some more precise timings, comparing the running time for different inputs (run-time must be proportional to the input) of the gcc and msvc-compiled programs on Windows XP, Wine and Linux. All numbers are in seconds and are the minimum of at least 3 runs.
On Windows I used timethis.exe (wall time), on Linux and Wine I used time (CPU time). (timethis.exe is broken on Wine) I made sure no other programs were using the CPU and disabled the virus scanner.
The command line options to gcc were -march=pentium-m -Wall -O3 -fno-exceptions -fno-rtti (i.e. exceptions were disabled).
What we see from this data:
the difference is not due to process startup time, as run-times are proportional to the input
The difference between running on Wine and Windows exists only for the gcc-compiled program, not the msvc-compiled one: it can't be casued by other programs hogging the CPU on Windows or timethis.exe being broken.

You'd be surprised what system libraries are involved. Just do ldd on your app, and see which are used (ok, not that much, but certainly glibc).
In order to completely trust your findings about execution speed, you would need to run your app a couple of times sequentially and take the mean execution time. It might be that the OS loader is just slower (although 4s is a long loading time).
Other very possible reasons are:
Different malloc implementation
Exception handling, if used to the extreme might cause slowdown (Windows GCC, MinGW, might not be the optimal exception handling star of the show)
OS-dependent initialization: stuff that needs to be done at program startup on Windows, but not on Linux.
Most of these are easily benchmarkable ;-)
An update to your update: the only thing you can now do is profile. Stop guessing, and let a profiler tell you where time is being spent. Use gprof and the Visual Studio built-in profiler and compare time spent in different functions.

Do benchmarking in code. Also try to compile with visual studio. On windows if you have some application like Yahoo Messenger, that are installing hooks, they can very easy slow down your application loading times.
On windows you have: QueryPerformanceCounter
On Linux: clock_gettime

Apparently the difference is system related.
You might use strace to understand what system calls are done, eg
strace -o /tmp/yourprog.tr yourprog
and then look into /tmp/yourprog.tr
(If an equivalent of strace existed on Windows, try to use it)
Perhaps your program is allocating memory (using mmap system call), and perhaps the memory related system calls are faster on Linux (or even on Wine) than on Windows? Or some other syscalls give faster functionality on Linux that on Windows.
NB. I know nothing about Windows, since I'm using Unix systems since 1986 and Linux since 1993.

Related

Is profiling still possible after SPECTRE was fixed?

Since a couple of weeks I'm trying to profile a piece of numerical software and I'm unable to get useful results.
The code I'm profiling results in a huge function (__attribute__((flatten))) created out of many inlined functions and a few calls to std::exp/std::log/std::pow. This function is located inside a shared library and loaded via dlopen().
I've used
the google CPU profiler (hangs in the first fork() (interrupted by SIGPROF and restarted and interrupted and...) -- same problem with g++ option -pg)
linux tool perf (caused a reboot of the machine, I complained and they upgraded the OS (CENTOS 6.5). The results only highlight two assembler instructions out of above mentioned huge function. I don't have permissions to read accurate event sources (*:ppp))
some old version of vtune (difficult to operate, results are unreliable, no hardware drivers loaded)
sprof (results do not tell me anything as there is only a single function to profile -- when avoiding to use attribute flatten then the behavior is fully different)
I'm running
CENTOS 6.5
and
g++ (GCC) 5.3.0
I don't have any influence over the version of the OS or the compiler version.
I complained about the ancient OS some weeks ago, and they upgraded me to what I mentioned above.
In some former live I successfully used the google profiler -- when it was working (and not crashing or hanging due to signal handling problems) it provided useful results.
Anybody any comment?
Could all these unclear results be the result of the fixes for SPECTRE?
Do I need to insist, that certain profiling options are enabled on the machine?
Do I need to insist on the vtune drivers loaded?
Do I need to insist on an uptodate copy of vtune installed?
Compile with -fno-omit-frame-pointers?

C++ program runs much slower on Windows vs Linux when doing file operations?

I use an open source project called ChatScript for natural language processing app development.
When you execute a build operation with ChatScript, it scans all the script files that comprise your chat-bot. In my case, that's hundreds of files. This process takes nearly 30 times longer on Windows 8.1 than it does on Ubuntu 16.04. Therefore I do use Linux for much of my work, but there is a part of my work that I have to do on Windows because of certain associated tools, so I would like to modify the code base so that Windows ChatScript compiles are as fast as on Linux.
Can anyone think of a reason the code would run so much slower on Windows vs. Linux? Are there some C++ file operation codes (read/write/etc.) that are known to be much slower on Windows compared to Linux due to variances in the C++ run-time libraries running on each platform?
By "code running slow" in your last paragraph I'm assuming from context that you're referring to the compiler???
I've encountered frequently and consistently over many years a general, significant performance difference between linux and Windows in disk I/O. NTFS (Windows file system) and the linux file systems handle the situation of lots of files differently, and linux is always quicker in the circumstances I've been in.
You may benefit from some of the pointers in answers to questions like How do I get Windows to go as fast as Linux for compiling C++?, like defragmenting your windows drive, and checking how the compiler optimisations are configured; some of them can slow down the compiler (although an aggressive compiler optimisation setting can slow the compiler down, you produce a faster executable at the end, but that might be something you switch to after most of your development is done).
But doing all those things for me has never got the Windows compile to be quicker than on linux using equivalent disk hardware, not once. If your code is on the one disk and sourced for both compiles, any improvement you'll see in the Windows build (e.g. because the code's put on an SSD) will likely be replicated in an improvement in the linux build as well.
Just to confirm I found the same thing. Ran the same Chatscript scripts on an average Mac and on a fast XPS 15. The Mac compiled the code 30-50 times faster than Windows. What's weird is ChatScript was originally developed for Windows. I also have not worked out why such a gigantic lag, in spite of the Windows PC hardware being much more powerful than the Mac running the script.
So I have come across both an explanation and a (partial) solution. There are two areas of lag in Windows compared to Linux:
Networking behaviour. According to the creator of ChatScript, Bruce
Wilcox, the Windows server code is worse under the hood, and is also
implemented worse in ChatScript for Windows vs ChatScript for Linux.
The lag here is however minor.
Build times. This is where building the bot in Linux takes 10-20
seconds and on Windows 4-5 minutes. Turns out that the reason is the
antivirus:
"Curious about the huge discrepancy, one of our hardware engineers
did some profiling and found the real culprit to be the anti-virus
software. Disabling the real-time virus protection feature of
Windows Defender brought the 4 minutes down to 14 seconds! Even
keeping Windows Defender active, but excluding the
ChatScript-master folder solved much of the slowdown problem,
resulting in about 20 s for :build 0 to complete."
(https://www.chatbots.org/ai_zone/viewthread/3575/)
So for OP, if you exclude CS from Windows Defender, or switch it off, the build differential will largely disappear.

Why is gdb so slow in Windows?

I recently noticed that running a program inside gdb in windows makes it a lot slower, and I want to know why.
Here's an example:
It is a pure C++03 project, compiled with mingw32 (gcc 4.8.1, 32 bits).
It is statically linked against libstdc++ and libgcc, no other lib is used.
It is a cpu and memory intensive non-parallel process (a mesh edition operation, lots of news and deletes and queries to data structures involved).
The problem is not start-up time, the whole process is painfully slow.
Debug build (-O0 -g2) runs in 8 secs outside gdb, but in 140 secs within gdb.
Tested from command line, just launching gdb and just typing "run" (no breakpoints defined).
I also tested a release build (optimized, and without debugging information), and it is still much slower inside gdb (3 secs vs 140 secs; yes, it takes the same time as the not optimized version inside gdb).
Tested with gdb 7.5 and 7.6 from mingw32 project, and with a gdb 7.8 compiled by me (all of them without python support).
I usually develop on a GNU/Linux box, and there I can't notice speed differences between running with or withoud gdb.
I want to know what is gdb doing that is making it run so slowly. I have some basic understanding of how a debugger works, but I cannot figure out what is it doing here, and googling didn't helped me this time.
I've finally found the problem, thanks to greatwolf for asking me to test other debuggers. Ollydbg takes the same time as gdb, so it's not a gdb problem, its a Windows problem. This tip changed my search criteria and then I've found this article* that explains the problem very well and gives a really simple solution: define an environment varible _NO_DEBUG_HEAP to 1. This will disable the use of a special heap system windows provides and c++ programs use.
* Here's the link: http://preshing.com/20110717/the-windows-heap-is-slow-when-launched-from-the-debugger/
I once had issues with gdb being incredibly slow and I remember disabling nls (native language support, i.e. the translations of all the messages) would remedy this.
The configure time option is --disable-nls. I might have just been mistaken as to what is the true cause, but it's worth a shot for you anyways.
My bug report from back then is here, although the conclusion there would be that I was mistaken. If you can provide further insight into this, that would be great!

Increasing (nonstack) memory used by a C/C++ program

I am running a heavy memory intensive job on a windows OS with 12 GB of RAM. By my computations, 4 GB of memory should be enough to run the program. I am running the program I've written with dynamic memory allocation (I have 2 versions of the program in C and C++ with malloc/free and new/delete respectively) using CodeBlocks.
When I pull up task manager, I see that the program only seems to use about 2 GB of RAM, even when I have a lot more available, and the pagefile size is currently set to 30 GB. Is there any way I can get CodeBlocks to use more memory? I also used DEV-C++ and I get the same bad_alloc error in the C++ code.
Any ideas? Thanks in advance.
Oh and I am using a 64-bit Windows 7.
Look at this page for memory limits based on architecture (x86, 64-bit) and Windows version. Some work-arounds are mentioned:
https://learn.microsoft.com/en-us/windows/win32/memory/memory-limits-for-windows-releases#memory_limits
First you have to make sure you are building a 64-bit executable and not 32-bit.
If using g++, make sure you use option -m64.
As for large address awareness mentioned in the MSDN page, it should be active by default on 64-bit Windows systems.
Still, the Visual C++ linker has an option to explicitly ask for it: /LARGEADDRESSAWARE
Now if you don't use the Visual C++ linker, it appears you can always use this as an extra step if you want to activate large address awareness for your executable:
editbin /LARGEADDRESSAWARE your_executable
(editbin being an M$ Visual Studio tool)
thanks to all the help so far. There was a simple workaround. I installed mingw 64bit compiler, pointed code blocks to that compiler and everything worked like a charm. yay.

Will console applications compiled on Windows work on Linux and MAC OS X using DOSBox?

The doubt
I have written some code in Microsoft Visual C++ 2010 Express as so:
#include<iostream>
int main()
{
system("cls");
char name[20];
cout<<"\nEnter your name:";
cin.getline(name,20);
system("pause");
cout<<"\nYour name is:"<<name;
system("pause");
return 0;
}
And now I have compiled it and sent it to a friend on a Linux machine. he downloads the DOSBox software and then runs this program.
THE DOUBT
Will it run as it does on my machine or will this create any problem?
why I am asking this?
I recently downloaded a linux live cd and ran it on my machine. I can't install it on this machine as it is a shared PC. Anyway, I typed cls into the terminal and there was no response. I typed pause again there was no response. So it set me wondering if the command "cls" that i am passing to the system in the above code will really have any effect on a linux machine.
There are a few reasons why this program won't work on other machines - I will summarise the two main ones:
You use system instructions which are not supported by other operating systems. If you attempt the run these instructions on a different OS, the OS will complain that it doesn't understand them and the program will crash.
(And probably more importantly,) the Windows executable you have created is a Windows .exe file which is Microsoft's Portable Executable format. Linux can only read executables in ELF format, and Mac OS X uses the Mach-O format.
These two points are worth discussion in their own right, and as Joachim pointed out in the comments, the WINE emulator is quite good at emulating a windows environment on Linux, so this may be an option for program compatibility.
EDIT: I should add here that Point 1 assumes that Point 2 has been overcome. Point 2 is the reason executables on one OS just plain "don't work" on other operating systems.
Response to comment:
Generally, yes, ELF files are the standard for all Linux distros (there may be a few rare exceptions). Similarly, PE files are the standard for all Windows versions. Provided you have a relatively up to date CPU, then if you compile an executable on one Linux distro, then it should work on others.
The exception here is, if you compile the program on a machine with a recent CPU, and wish to run it on a machine with a very old CPU, the old CPU may not support some of the instructions that the compiler creates. However, these days just compiling a program with the default settings generally works on all (Intel) CPUs. If you know for a fact that your target machine uses a very different or older CPU, you can add the -march=... compiler option so the compiler generates instructions that will definitely work on the target machine.
Finally, DOSBox is not a Windows Emulator, it is a DOS emulator. The two systems, despite their history, are quite different. DOSBox is not designed to run native Windows applications, it is designed to run native DOS applications (most of which are abandonware these days). If you'd like to run DOS programs on Linux such as Dangerous Dave (one of my nostalgic favourites), then you can. However, if you wish to run Windows applications, you will need an emulator designed for this purpose, such as WINE.
For reference, DOS uses the obsolete MZ Executable format.
pause and cls most likely will not work directly in other OSes because these are Windows/DOS-specific commands.
If you remove the DOS-specific commands and make the program generic, then the EXE file built in Windows can most probably be executed in Linux or MacOS through Wine. Please see http://www.winehq.org/about/ and http://wiki.winehq.org/MacOSX . I'm saying "most probably" because you still need to try it out to see if there are problems.
If you run your EXE executable inside a virtualized environment running Windows in it like Virtual Box, then it will work.
On Linux, the command to clear the screen is clear. Is that what you're really intending to do?