Accurate benchmark under Windows - C++

I am writing a program that needs to run a set of executables and find their execution times.
My first approach was just to run the process, start a timer, and see the difference between the start time and the moment the process returns its exit value.
Unfortunately, this program will not run on a dedicated machine, so many other processes/threads can greatly change the execution time.
I would like to get the time in milliseconds/clocks that was actually given to the process by the OS. I hope Windows stores that information somewhere, but I cannot find anything useful on MSDN.
Sure, one solution is to run the process multiple times and calculate the average time, but I want to avoid that.
Thanks.

You can take a look at the GetProcessTimes API.
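A rough sketch of how that could look ("child.exe" is just a placeholder for the executable you want to measure; error handling is minimal):

#include <windows.h>
#include <iostream>

// Convert a FILETIME (100-nanosecond units) to milliseconds.
static double FileTimeToMs(const FILETIME& ft)
{
    ULARGE_INTEGER v;
    v.LowPart  = ft.dwLowDateTime;
    v.HighPart = ft.dwHighDateTime;
    return v.QuadPart / 10000.0;
}

int main()
{
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    char cmd[] = "child.exe";   // placeholder executable to benchmark

    if (!CreateProcessA(NULL, cmd, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi))
    {
        std::cerr << "CreateProcess failed: " << GetLastError() << "\n";
        return 1;
    }

    WaitForSingleObject(pi.hProcess, INFINITE);

    FILETIME creationTime, exitTime, kernelTime, userTime;
    if (GetProcessTimes(pi.hProcess, &creationTime, &exitTime, &kernelTime, &userTime))
    {
        // Kernel + user time is the CPU time actually charged to the process,
        // regardless of how busy the rest of the machine was.
        std::cout << "kernel: " << FileTimeToMs(kernelTime) << " ms, "
                  << "user: "   << FileTimeToMs(userTime)   << " ms\n";
    }

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}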

The "High-Performance Counter" might be what you're looking for.
I've used QueryPerformanceCounter/QueryPerformanceFrequency for high-resolution timing in stuff like 3D programming where the stock functionality just doesn't cut it.
You could also try the RDTSC x86 instruction.
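A minimal sketch of the QueryPerformanceCounter/QueryPerformanceFrequency pattern. Note that this measures elapsed wall-clock time, so it is still affected by whatever else the machine is doing; combine it with GetProcessTimes above if you want the time actually charged to the process:

#include <windows.h>
#include <iostream>

int main()
{
    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);   // ticks per second
    QueryPerformanceCounter(&start);

    // ... the work you want to time goes here ...

    QueryPerformanceCounter(&end);
    double ms = (end.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
    std::cout << "elapsed: " << ms << " ms\n";
    return 0;
}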

Related

Get time used by the process only

I have a sample piece of code to check how much time it takes to execute. Hence I measure the timestamp before and after its execution and then compute the time in milliseconds.
However, the output depends on system load and process priorities, so I am not getting a correct reading.
How can I get the actual time spent by the process only for its execution?
Platform: Windows. Compiler: VC and MinGW.
Use the Win32 function:
QueryProcessCycleTime()
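A minimal sketch that reads the cycle count charged to the calling process (QueryProcessCycleTime requires Windows Vista or later; for another process you would pass a handle obtained with OpenProcess):

#include <windows.h>
#include <iostream>

int main()
{
    // CPU cycles charged to this process so far, largely independent of
    // what the rest of the machine is doing.
    ULONG64 cycles = 0;
    if (QueryProcessCycleTime(GetCurrentProcess(), &cycles))
        std::cout << "CPU cycles used so far: " << cycles << "\n";
    else
        std::cerr << "QueryProcessCycleTime failed: " << GetLastError() << "\n";
    return 0;
}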

time taken by forked child process

This is a sequel to my previous question. I am using fork to create a child process. Inside the child, I give the command to run a process as follows:
if ((childpid = fork()) == 0)
{
    system("./runBinary ");
    exit(1);
}
My runBinary has the functionality of measuring how much time it takes from start to finish.
What amazes me is that when I run runBinary directly on the command line, it takes ~60 seconds. However, when I run it as a child process, it takes longer, like ~75 seconds or more. Is there something I can do, or something I am currently doing wrong, that is leading to this?
Thanks for the help in advance.
MORE DETAILS: I am running on a Linux RHEL server with 24 cores. I am measuring CPU time. I only fork 8 children at a time (sequentially), each of which is bound to a different core using taskset (not shown in the code). The system is not loaded except for my own program.
The system() function invokes the shell. You can do anything inside it, including running a script. This gives you a lot of flexibility, but it comes at a price: you're loading a shell, and then runBinary inside it. Although I don't think loading the shell would be responsible for so much of a time difference (15 seconds is a lot, after all), since it doesn't seem you need that - just to run the app - try using something from the exec() family instead.
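For instance, a sketch of the child running runBinary via execl() instead of system() (error handling kept minimal):

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>

int main()
{
    pid_t childpid = fork();
    if (childpid == 0)
    {
        // Replace the child's image with runBinary directly,
        // instead of going through /bin/sh via system().
        execl("./runBinary", "runBinary", (char*)NULL);
        perror("execl failed");   // only reached if execl fails
        _exit(1);
    }
    else if (childpid > 0)
    {
        int status = 0;
        waitpid(childpid, &status, 0);
    }
    else
    {
        perror("fork failed");
        return 1;
    }
    return 0;
}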
Without profiling the application: if the parent process that forks has a large memory space, you might find that time is spent in the fork itself, attempting to duplicate that memory space.
This isn't a problem in Red Hat Enterprise Linux 6, but was in earlier versions of Red Hat Enterprise Linux 5.

I tried to come up with a cross-platform alternative to sleep(), but my code isn't quite working

I'm writing a little beginner C++ program based on the card game Snap.
When I output the card objects to the console, a whole list of the dealt cards appears at once, simply because of the computer's processing speed. I thought it might be nice if I could put a pause between each card deal so that a human could actually observe each card being dealt. Since I'm always working on both Linux and Windows and already had <ctime> included, I came up with this little solution:
for (;;) {
    if ((difftime(time(0), lastDealTime)) > 0.5f) { // half second passed
        cout << currentCard << endl;
        lastDealTime = time(0);
        break;
    }
}
At first I thought it had worked, but when I later tried to speed up the dealing process I realised that changing the control value of 0.5 (I was aiming for a card deal every half second) didn't seem to have any effect. I tried changing it to deal every 0.05 seconds and it made no difference; cards still seemed to be output every second, I would guess.
Any observations as to why this wouldn't be working? Thanks!
The resolution of time() is one second -- i.e., the return value is an integral number of seconds. You'll never see a difference less than a second.
usleep() is in the POSIX C library -- it has a resolution in microseconds, so use that instead.
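A tiny sketch of the idea (the loop here is just a stand-in for the question's dealing loop; usleep() is POSIX-only, so this won't compile on Windows):

#include <iostream>
#include <unistd.h>   // usleep() is POSIX-only; on Windows you would need Sleep() instead

int main()
{
    // Print a few "cards" with a half-second pause (500,000 microseconds) between them.
    for (int card = 1; card <= 5; ++card)
    {
        std::cout << "card " << card << std::endl;
        usleep(500 * 1000);
    }
    return 0;
}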
time() and difftime() have a resolution of a second, so there's no way to use them to manage intervals of less than a second; even for intervals of a second, they're not usable, since the jitter may be up to a second as well.
In this case, the solution is to define some sort of timer class, with a system-independent interface in the header file, but system-dependent source files; depending on the system, you compile one source file or the other. Both Windows and Linux have ways of managing time with higher resolution.
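A minimal single-file sketch of that idea (class name and layout are just illustrative; the platform split is done here with the preprocessor rather than separate source files, but the principle is the same):

#ifdef _WIN32
  #include <windows.h>
#else
  #include <time.h>   // clock_gettime; on older glibc, link with -lrt
#endif
#include <iostream>

// One portable interface; the platform-specific parts are selected at compile time.
class HighResTimer
{
public:
    HighResTimer() { Reset(); }

    void Reset()
    {
#ifdef _WIN32
        QueryPerformanceFrequency(&mFreq);
        QueryPerformanceCounter(&mStart);
#else
        clock_gettime(CLOCK_MONOTONIC, &mStart);
#endif
    }

    // Seconds elapsed since the last Reset().
    double Elapsed() const
    {
#ifdef _WIN32
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        return double(now.QuadPart - mStart.QuadPart) / double(mFreq.QuadPart);
#else
        timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        return (now.tv_sec - mStart.tv_sec) + (now.tv_nsec - mStart.tv_nsec) / 1e9;
#endif
    }

private:
#ifdef _WIN32
    LARGE_INTEGER mStart, mFreq;
#else
    timespec mStart;
#endif
};

int main()
{
    HighResTimer timer;
    // Busy-wait for half a second just to demonstrate sub-second resolution.
    while (timer.Elapsed() < 0.5)
        ;
    std::cout << "elapsed: " << timer.Elapsed() << " s\n";
    return 0;
}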
If you want to make sure that the cards deal at precisely the interval you request, then you should probably create a timer class too. We use:
On Windows, use QueryPerformanceFrequency to get the system tick time and QueryPerformanceCounter to get the ticks.
On Mac Carbon, use DurationToAbsolute to get the system tick time and UpTime to get the ticks.
On Linux, use clock_gettime().
For sleep use:
On Windows use Sleep();
On Mac Carbon use MPDelayUntil();
On Linux use nanosleep();
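A minimal cross-platform sleep helper along those lines (Mac Carbon's MPDelayUntil() is omitted here, and SleepMs is just an illustrative name):

#ifdef _WIN32
  #include <windows.h>
#else
  #include <time.h>
#endif

// Sleep for the given number of milliseconds.
void SleepMs(unsigned int ms)
{
#ifdef _WIN32
    Sleep(ms);                          // Sleep() takes milliseconds
#else
    timespec req;
    req.tv_sec  = ms / 1000;
    req.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&req, 0);                 // nanosleep() takes a timespec
#endif
}

int main()
{
    SleepMs(500);   // e.g. pause for half a second between card deals
    return 0;
}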
The big issue with your code, the way I see it, is not that you haven't found a cross-platform version of sleep, but that sleep is actually meant to stop the CPU from processing for a period of time, whereas your loop never stops processing, so your application will use up a lot of resources.
Of course if your computer is dedicated to just running one application it might not matter, but nowadays we expect our computers to be doing more than just one thing.

gdb: How do I pause during loop execution?

I'm writing a software renderer in g++ under mingw32 in Windows 7, using NetBeans 7 as my IDE.
I've been needing to profile it of late, and this need has reached critical mass now that I'm past laying down the structure. I looked around, and to me this answer shows the most promise in being simultaneously cross-platform and keeping things simple.
The gist of that approach is that possibly the most basic (and in many ways, the most accurate) way to profile/optimise is to simply sample the stack directly every now and then by halting execution... Unfortunately, NetBeans won't pause. So I'm trying to find out how to do this sampling with gdb directly.
I don't know a great deal about gdb. What I can tell from the man pages though, is that you set breakpoints before running your executable. That doesn't help me.
Does anyone know of a simple approach to getting gdb (or other gnu tools) to either:
Sample the stack when I say so (preferable)
Take a whole bunch of samples at random intervals over a given period
...given my stated configuration?
Have you tried simply running your executable in gdb, and then just hitting ^C (Ctrl+C) when you want to interrupt it? That should drop you to gdb's prompt, where you can simply run the where command to see where you are, and then carry on execution with continue.
If you find yourself in an irrelevant thread (e.g. a looping UI thread), use thread to see the current thread, info threads to list them, and thread n to switch to the correct one, then execute where.

Linux time sample based profiler

short version:
Is there a good time based sampling profiler for Linux?
long version:
I generally use OProfile to optimize my applications. I recently found a shortcoming that has me wondering.
The problem was a tight loop spawning c++filt to demangle a C++ name. I only stumbled upon the code by accident while chasing down another bottleneck. OProfile didn't show anything unusual about the code, so I almost ignored it, but my code sense told me to optimize the call and see what happened. I changed the popen of c++filt to abi::__cxa_demangle. The runtime went from more than a minute to a little over a second. About a 60x speedup.
Is there a way I could have configured OProfile to flag the popen call? As the profile data sits now, OProfile thinks the bottleneck was the heap and std::string calls (which, BTW, once optimized, dropped the runtime to less than a second, more than a 2x speedup).
Here is my OProfile configuration:
$ sudo opcontrol --status
Daemon not running
Event 0: CPU_CLK_UNHALTED:90000:0:1:1
Separate options: library
vmlinux file: none
Image filter: /path/to/executable
Call-graph depth: 7
Buffer size: 65536
Is there another profiler for Linux that could have found the bottleneck?
I suspect the issue is that OProfile only logs its samples to the currently running process. I'd like it to always log its samples to the process I'm profiling. So if the process is currently switched out (blocking on IO or a popen call) OProfile would just place its sample at the blocked call.
If I can't fix this, OProfile will only be useful when the executable is pushing near 100% CPU. It can't help with executables that have inefficient blocking calls.
Glad you asked. I believe OProfile can be made to do what I consider the right thing, which is to take stack samples on wall-clock time when the program is being slow and, if it won't let you examine individual stack samples, at least summarize, for each line of code that appears on samples, the percent of samples the line appears on. That is a direct measure of what would be saved if that line were not there. Here's one discussion. Here's another, and another. And, as Paul said, Zoom should do it.
If your time went from 60 sec to 1 sec, that implies every single stack sample would have had a 59/60 probability of showing you the problem.
Try Zoom - I believe it will let you profile all processes - it would be interesting to know if it highlights your problem in this case.
I wrote this a long time ago, only because I couldn't find anything better: https://github.com/dicej/profile
I just found this, too, though I haven't tried it: https://github.com/oliver/ptrace-sampler
Quickly hacked up trivial sampling profiler for linux: http://vi-server.org/vi/simple_sampling_profiler.html
It appends backtrace(3) to a file on SIGUSR1, and then converts it to annotated source.
After trying everything suggested here (except for the now-defunct Zoom, which is still available as a huge file on Dropbox), I found that NOTHING does what Mr. Dunlavey recommends. The "quick hacks" listed above in some of the answers either wouldn't build for me or didn't work for me. I spent all day trying stuff... and nothing could find fseek as a hotspot in an otherwise simple test program that was I/O bound.
So I coded up yet another profiler, this time with no build dependencies, based on GDB, so it should "just work" for almost any debuggable code. A single CPP file.
https://github.com/jasonrohrer/wallClockProfiler
It automates the manual process suggested by Mr. Dunlavey, interrupting the target process with GDB periodically and harvesting a stack trace, and then printing a report at the end about which stack traces are the most common. Those are your true wall-clock hotspots. And it actually works.