Finding out CPU activity between two timestamps under Windows - C++

I'm interested in finding the total activity of the system between two instants in time. Under Linux, I can do this by reading /proc/stat twice. Under MacOS, I can call host_statistics twice. Under BSDs, I can also manage with a sysctl.
However, I have no idea how to do this under Windows. The closest equivalent I have found is \Processor(*)\% Processor Time, but this only gives me some kind of "current activity" (10ms-scale, if I understand correctly), rather than a total activity.
I am coding in regular (unmanaged) C++. My code needs to check the system use every ~10-15 seconds.

Ah, it seems that GetSystemTimes will provide the data I need.
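For reference, here is a minimal sketch of that approach; the percentage arithmetic and the 10-second sampling interval are my own choices, and the one Windows-specific detail it relies on is that the kernel time returned by GetSystemTimes already includes the idle time:
#include <windows.h>
#include <cstdio>

// Convert a FILETIME (100-nanosecond units) to a 64-bit integer.
static unsigned long long ToUInt64(const FILETIME &ft) {
    ULARGE_INTEGER u;
    u.LowPart = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart;
}

int main() {
    FILETIME idle1, kernel1, user1, idle2, kernel2, user2;
    GetSystemTimes(&idle1, &kernel1, &user1);
    Sleep(10000);                                   // the ~10-15 second interval from the question
    GetSystemTimes(&idle2, &kernel2, &user2);

    unsigned long long idle   = ToUInt64(idle2)   - ToUInt64(idle1);
    unsigned long long kernel = ToUInt64(kernel2) - ToUInt64(kernel1);
    unsigned long long user   = ToUInt64(user2)   - ToUInt64(user1);

    // Kernel time includes idle time, so total CPU time is kernel + user.
    unsigned long long total = kernel + user;
    double busy = total ? 100.0 * (double)(total - idle) / (double)total : 0.0;
    printf("CPU busy over the interval: %.1f%%\n", busy);
    return 0;
}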

Related

Performance counter for Halide?

Is there a performance counter available for code written in the Halide language? I would like to know how many loads, stores, and ALU operations are performed by my code.
The Halide tutorial for scheduling multi-stage pipelines compares different schedules by comparing the amount of allocated memory, loads, stores, and calls to Halide Funcs, but I don't see how this information was collected. I suppose it might be possible to use trace_stores, trace_loads, and trace_realizations to print to the console every time one of these operations occurs. This isn't a great option, though, because it would greatly slow down the program's execution and would require some kind of counting script to compile the long list of console outputs into the desired counts for loads, stores, and ALU operations.
I'm pretty sure they just used the trace_xxx output and ran some scripts/programs on it.
If you're looking for real performance numbers on an x86 platform, I would go with Intel VTune Amplifier. It's pretty expensive, but may be free if you're in academia (student, teacher, researcher) or it's for an open source project.
Other than that, look at the lowered statement code by setting HL_DEBUG_CODEGEN=1 in the environment and you can get a better idea of the loop structure and data use. Note that this output goes to stderr, not stdout.
EDIT: For Linux, there's perf.
We do not have any perf counter based support at present. It is fairly difficult to make it portable. (And on mobile devices, often the OS simply doesn't allow access to the hardware.) The support in Profiling.cpp and src/profiling.cpp could likely be used to drive perf counter operation. The profiling lowering pass adds code to call routines in the runtime which update information about Func and Pipeline execution. This information is collected and aggregated by another thread.
If tracing is run to a file (e.g. using HL_TRACE_FILE) a binary format is used and it is a bit more efficient. See utils/HalideTraceViz for a tool to work with the binary format. This is generally how analyses are done within the team.
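As a hedged illustration of the tracing route, here is a toy pipeline (the Funcs and sizes are made up) with trace_stores/trace_loads enabled; the exact realize() call may differ slightly between Halide releases:
#include "Halide.h"
using namespace Halide;

int main() {
    Var x, y;
    Func producer("producer"), consumer("consumer");
    producer(x, y) = x + y;
    consumer(x, y) = producer(x, y) * 2;
    producer.compute_root();

    // Emit a trace event for every load/store these Funcs perform.
    producer.trace_stores();
    consumer.trace_loads();
    consumer.trace_stores();

    // With HL_TRACE_FILE=/tmp/trace.bin set in the environment, the events go
    // to a binary file that utils/HalideTraceViz can post-process; otherwise
    // they are printed as text.
    Buffer<int> out = consumer.realize({16, 16});
    return 0;
}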
There was a small amount of investigation of OProfile, which looked promising but I don't think we ever got code working.

check the performance of an exe through code

I want to check the performance of an application (whose exe I have, but no source code) by running it multiple times and possibly comparing the results. I didn't find much on the internet regarding this topic.
Since I have to do it with multiple inputs, and many times, I thought doing it through code (no restriction on the language used) could make things easier, as I may have to repeat the runs often.
Can anyone help me get started?
Note: by performance I mean the memory usage, CPU usage, and possibly the time taken to do it!
(I'm currently using perfmon on Windows with the necessary counters to check these parameters and manually noting the values down.)
Thanks
It strongly depends upon your operating system. On Linux, you could use the time utility. And strace might help you understand the system calls that are used.
I have no idea of the equivalent on Windows systems.
I think that you could create a bash/batch script to call your program as many times as you need and with different inputs.
You could then have your script create a CSV file that contains the time it took to execute your program (start date and end date for example). CSV files are usually compatible with most spreadsheet programs like Excel, so I think that can make it easier for you to process your data, like creating means and standard deviations.
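If you prefer doing this from code rather than a batch script, here is a rough C++ sketch of that idea; the command line, run count, and CSV layout are placeholders for whatever you need:
#include <chrono>
#include <cstdlib>
#include <fstream>
#include <string>

int main() {
    // Placeholder command line; substitute the exe and the inputs you want to test.
    const std::string command = "myapp.exe input1.dat";
    const int runs = 10;

    std::ofstream csv("timings.csv");
    csv << "run,seconds\n";

    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        std::system(command.c_str());                 // run the application to completion
        auto stop = std::chrono::steady_clock::now();
        std::chrono::duration<double> elapsed = stop - start;
        csv << i << ',' << elapsed.count() << '\n';   // one row per run, ready for a spreadsheet
    }
    return 0;
}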
I don't have much to say regarding the memory and CPU usage, but if you are on Windows it wouldn't hurt to take a look at Process Explorer and Process Monitor (you can find them on this page). I think that they might help you in your task.
Finally if you are in Linux I think that you might be able to use grep with the top command to gather some statistics.
Regards,
Felipe
If you want exact results, Rational Purify (on Windows), or valgrind (on Linux) are the best tools; these run your application in a virtual machine that can be instructed to do exact cycle counting.
In another post a utility named timethis.exe was mentioned for measuring time under Windows. Maybe it is useful for your purposes.
I ended up automating what I was manually noting down with perfmon:
that is, I used the performance counter class available in .NET, obtained samples of the particular application at regular intervals, and generated a graph from those values.
Thanks :)
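For anyone who wants to do the same sampling from unmanaged C++ instead of .NET, a hedged sketch using the PDH API follows; the counter path, the "myapp" instance name, and the sample count/interval are placeholders:
#include <windows.h>
#include <pdh.h>
#include <cstdio>
#pragma comment(lib, "pdh.lib")

int main() {
    PDH_HQUERY query;
    PDH_HCOUNTER counter;
    PdhOpenQuery(NULL, 0, &query);
    // "myapp" is a placeholder for the instance name of the process being watched.
    PdhAddEnglishCounterW(query, L"\\Process(myapp)\\% Processor Time", 0, &counter);
    PdhCollectQueryData(query);                       // rate counters need two samples

    for (int i = 0; i < 20; ++i) {                    // placeholder: 20 samples, one per second
        Sleep(1000);
        PdhCollectQueryData(query);
        PDH_FMT_COUNTERVALUE value;
        PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE, NULL, &value);
        printf("%d,%f\n", i, value.doubleValue);      // CSV-ish output to graph later
    }
    PdhCloseQuery(query);
    return 0;
}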

Schedule task on precise periods in Linux or Windows

I have this weird question.
I would like to know if it is possible to write a program in C/C++ that will run on Linux or Windows and will hook an interrupt handler on a system timer set to a specific period (2000 times per second, for example). I want this interrupt to have the highest priority, meaning that it has to be executed every half millisecond and must not be interrupted while it is executing.
We have done this under MS-DOS with Borland Turbo C 3.1. We have an interface card (our own) that runs in an ISA slot. Every half millisecond, our program reads the state of the electronics that controls an industrial process through the interface. This has worked for us for the past 15 years, but we are running out of motherboards that have an ISA slot, so we are looking for new solutions.
We also have a solution based on PIC microcontrollers, but our horizons would be widened by a general-purpose processor.
My guess is that there are some customized Linux kernels for embedded applications, so I am looking for some sources with which we can start experimenting.
Yes, you can do that in MS-DOS because it is not a multi-user or multi-tasking operating system. However, the same thing will not work in Windows because it is a multi-user, multi-tasking operating system. It's also not real-time, which means there's no guarantee that your task will be executed exactly when you ask for it to be executed. Everything is pre-emptively scheduled, meaning that any number of other processes and tasks (either user-mode or system-level) could effectively "bump" your process down the priority list and force it to wait to be executed until those other tasks completed or were themselves interrupted to give your process a chance to run for a while.
I don't know about Linux, but I imagine most of the major distributions are written similarly to Windows.
You will need to find a real-time, single-user operating system to do this. A Unix-derivative is probably the best place to start looking, but I won't be the person able to suggest one.
Alternatively, you could continue using MS-DOS (or alternatives such as FreeDOS), but switch to a different interface technology that is available on newer boards. There's no reason to update something that works for you, especially if the updates are counter-productive to your goal.
A typical OS such as a standard Linux or Windows is not designed to, and will not be able to perform to that degree of real-time accuracy and availability.
It sounds to me like you need to be investigating Real-Time Linux, or similar.
RTLinux is a modified version of the Linux kernel which is designed to perform in real time, precisely for applications such as this.
Hope that helps.
Personal and affordable computing has increased in performance incredibly over the years, except in one area, low latency. Latency has actually increased in many use cases when you compare a 486 and a modern desktop CPU.
That said, have a look at this paper, where the authors come to the conclusion that sub-millisecond scheduling is possible in Linux on commodity hardware.
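To give a feel for what such a loop looks like on a Linux kernel with the PREEMPT_RT patches (or a low-latency configuration), here is a hedged sketch of a 500-microsecond periodic task using SCHED_FIFO and absolute-deadline sleeps; the priority value and the loop body are placeholders, and real guarantees still depend on the kernel and hardware:
#include <time.h>
#include <sched.h>
#include <sys/mman.h>

// Advance a timespec by a number of nanoseconds.
static void add_ns(timespec &t, long ns) {
    t.tv_nsec += ns;
    while (t.tv_nsec >= 1000000000L) {
        t.tv_nsec -= 1000000000L;
        ++t.tv_sec;
    }
}

int main() {
    // Real-time scheduling and locked memory to avoid page faults in the loop.
    // Needs root or CAP_SYS_NICE; check the return values in real code.
    sched_param sp = {};
    sp.sched_priority = 80;                          // placeholder priority
    sched_setscheduler(0, SCHED_FIFO, &sp);
    mlockall(MCL_CURRENT | MCL_FUTURE);

    timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (;;) {
        add_ns(next, 500000);                        // 500 us period (2000 Hz)
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, nullptr);
        // read_the_interface_card();                // placeholder for the real work
    }
}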

What profiler should I use to measure _real_ time (including waiting for syscalls) spent in this function, not _CPU_ time

The application does not calculate things, but does I/O, reads files, and uses the network. I want the profiler to show that.
I expect something like callgrind that calls clock_gettime on each probe.
Or like oprofile, which interrupts my application (while it is sleeping or waiting for a socket/file/whatever) to see what it is doing.
I want things like "read", "connect", "nanosleep", "send" and especially "fsync" (And all their callers) to be bold (not things like string or number functions that perform calculations).
Platform: GNU/Linux # i386
Quickly hacked up trivial sampling profiler for Linux: http://vi-server.org/vi/simple_sampling_profiler.html
It appends backtrace(3) to a file on SIGUSR1, and then converts it to annotated source.
As it probes the program periodically, we'll see functions that wait for something.
And as it walks the stack, we'll see callers too.
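The core of that trick is small enough to sketch; the output file name is a placeholder, the samples are assumed to be triggered externally (for example with kill -USR1 from a shell loop or timer), and backtrace_symbols_fd is used because it avoids malloc inside the handler:
#include <execinfo.h>
#include <signal.h>
#include <fcntl.h>
#include <unistd.h>

static int trace_fd = -1;

// Signal handler: append the current call stack to the sample file.
static void on_sigusr1(int) {
    void *frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, trace_fd);       // writes directly, no malloc in the handler
    const char sep[] = "----\n";
    (void)write(trace_fd, sep, sizeof(sep) - 1);
}

int main() {
    // Compile with -g -rdynamic so the symbol names show up in the backtraces.
    trace_fd = open("samples.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);

    struct sigaction sa = {};
    sa.sa_handler = on_sigusr1;
    sigaction(SIGUSR1, &sa, nullptr);

    // ... the real program would run here; send SIGUSR1 periodically from
    // outside to collect samples.
    for (;;) pause();
}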
Also, people in answers to similar questions recommend Zoom.
There's no real way to answer that question without knowing your platform and a little more about what you are trying to do.
When I'm running on an Intel platform, I like to use the Time Stamp Counter (RDTSC). It is tough to beat its resolution in the sub-microsecond range. I just put a call to it before and after a chunk of code, and compare the difference.
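A minimal sketch of that before/after pattern, using the compiler's __rdtsc intrinsic; the measured loop is a placeholder, and converting the raw tick count to seconds requires knowing the TSC frequency:
#include <x86intrin.h>   // __rdtsc on GCC/Clang; MSVC has it in <intrin.h>
#include <cstdio>

int main() {
    unsigned long long start = __rdtsc();

    // Placeholder for the chunk of code being measured.
    volatile double sink = 0;
    for (int i = 0; i < 1000000; ++i) sink += i * 0.5;

    unsigned long long stop = __rdtsc();
    printf("elapsed: %llu TSC ticks\n", stop - start);
    return 0;
}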

How to profile multi-threaded C++ application on Linux?

I used to do all my Linux profiling with gprof.
However, with my multi-threaded application, its output appears to be inconsistent.
Now, I dug this up:
http://sam.zoy.org/writings/programming/gprof.html
However, it's from a long time ago, and in my gprof output it appears that gprof is listing functions used by non-main threads.
So, my questions are:
In 2010, can I easily use gprof to profile multi-threaded Linux C++ applications? (Ubuntu 9.10)
What other tools should I look into for profiling?
Edit: added another answer on poor man's profiler, which IMHO is better for multithreaded apps.
Have a look at oprofile. The profiling overhead of this tool is negligible and it supports multithreaded applications---as long as you don't want to profile mutex contention (which is a very important part of profiling multithreaded applications)
Have a look at poor man's profiler. Surprisingly, there are few other tools that do both CPU profiling and mutex contention profiling for multithreaded applications; PMP does both, while not even requiring you to install anything (as long as you have gdb).
Try the modern Linux profiling tool, perf (perf_events): https://perf.wiki.kernel.org/index.php/Tutorial and http://www.brendangregg.com/perf.html:
perf record ./application
# generates profile file perf.data
perf report
Have a look at Valgrind.
As Paul R said, have a look at Zoom. You can also use lsstack, which is a low-tech approach but surprisingly effective, compared to gprof.
Added: Since you clarified that you are running OpenGL at 33ms, my prior recommendation stands. In addition, what I personally have done in situations like that is both effective and non-intuitive. Just get it running with a typical or problematic workload, and just stop it, manually, in its tracks, and see what it's doing and why. Do this several times.
Now, if it only occasionally misbehaves, you would like to stop it only while it's misbehaving. That's not easy, but I've used an alarm-clock interrupt set for just the right delay. For example, if one frame out of 100 takes more than 33ms, at the start of a frame, set the timer for 35ms, and at the end of a frame, turn it off. That way, it will interrupt only when the code is taking too long, and it will show you why. Of course, one sample might miss the guilty code, but 20 samples won't miss it.
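Here is a hedged sketch of that alarm-clock trick on Linux using setitimer and SIGALRM; the 35 ms budget comes from the example above, and the frame work and the handler body are placeholders (in practice you would capture a stack trace or break into a debugger there):
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

// Fires only if a frame is still running when its budget expires.
static void on_alarm(int) {
    const char msg[] = "frame overran its budget\n";
    (void)write(STDERR_FILENO, msg, sizeof(msg) - 1);
}

// Arm a one-shot real-time timer, or disarm it when usec is 0.
static void arm_timer(long usec) {
    itimerval t = {};
    t.it_value.tv_usec = usec;
    setitimer(ITIMER_REAL, &t, nullptr);
}

int main() {
    struct sigaction sa = {};
    sa.sa_handler = on_alarm;
    sigaction(SIGALRM, &sa, nullptr);

    for (;;) {
        arm_timer(35000);                            // 35 ms budget for this frame
        // render_frame();                           // placeholder for the per-frame work
        arm_timer(0);                                // the frame finished in time
    }
}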
I tried valgrind and gprof. It is a crying shame that none of them work well with multi-threaded applications. Later, I found Intel VTune Amplifier. The good thing is, it handles multi-threading well, works with most of the major languages, works on Windows and Linux, and has many great profiling features. Moreover, the application itself is free. However, it only works with Intel processors.
You can randomly run pstack to find out the stack at a given point. E.g. 10 or 20 times.
The most typical stack is where the application spends most of the time (according to experience, we can assume a Pareto distribution).
You can combine that knowledge with strace or truss (Solaris) to trace system calls, and pmap for the memory print.
If the application runs on a dedicated system, you also have sar to measure CPU, memory, I/O, etc., and profile the overall system.
Since you didn't mention non-commercial, may I suggest Intel's VTune. It's not free but the level of detail is very impressive (and the overhead is negligible).
Putting a slightly different twist on matters, you can actually get a pretty good idea of what's going on in a multithreaded application using ftrace and kernelshark. Collect the right trace and press the right buttons, and you can see the scheduling of individual threads.
Depending on your distro's kernel you may have to build a kernel with the right configuration (but I think that a lot of them have it built in these days).
Microprofile is another possible answer to this. It requires hand-instrumentation of the code, but it seems like it handles multi-threaded code pretty well. And it also has special hooks for profiling graphics pipelines, including what's going on inside the card itself.