Measure actual runtime of a procedure using Qt - c++

I want to measure the runtime of a procedure that will last multiple days. This means the executing process will be interrupted many times since the Linux PC will be suspended to standby repeatedly. I don't want to include those sleep phases in my time measurement so I can't use simple time stamps.
Is there a better way to measure the net runtime of a procedure other than firing a QTimer every second and counting those timeouts?

As G.M. proposed, I used boost::timer::cpu_timer time to solve my issue. It returns the summed up computation time and the run time of the entire process which is fine for my purpose. Boost is not Qt but it is cross-platform anyway.

Related

How do I measure GPU time on Metal?

I want to see programmatically how much GPU time a part of my application consumes on macOS and iOS. On OpenGL and D3D I can use GPU timer query objects. I searched and couldn't find anything similar for Metal. How do I measure GPU time on Metal without using Instruments etc. I'm using Objective-C.
There are a couple of problems with this method:
1) You really want to know what is the GPU side latency within a command buffer most of the time, not round trip to CPU. This is better measured as the time difference between running 20 instances of the shader and 10 instances of the shader. However, that approach can add noise since the error is the sum of the errors associated with the two measurements.
2) Waiting for completion causes the GPU to clock down when it stops executing. When it starts back up again, the clock is in a low power state and may take quite a while to come up again, skewing your results. This can be a serious problem and may understate your performance in benchmark vs. actual by a factor of two or more.
3) if you start the clock on scheduled and stop on completed, but the GPU is busy running other work, then your elapsed time includes time spent on the other workload. If the GPU is not busy, then you get the clock down problems described in (2).
This problem is considerably harder to do right than most benchmarking cases I've worked with, and I have done a lot of performance measurement.
The best way to measure these things is to use on device performance monitor counters, as it is a direct measure of what is going on, using the machine's own notion of time. I favor ones that report cycles over wall clock time because that tends to weed out clock slewing, but there is not universal agreement about that. (Not all parts of the hardware run at the same frequency, etc.) I would look to the developer tools for methods to measure based on PMCs and if you don't find them, ask for them.
You can add scheduled and completed handler blocks to a command buffer. You can take timestamps in each and compare. There's some latency, since the blocks are executed on the CPU, but it should get you close.
With Metal 2.1, Metal now provides "events", which are more like fences in other APIs. (The name MTLFence was already used for synchronizing shared heap stuff.) In particular, with MTLSharedEvent, you can encode commands to modify the event's value at particular points in the command buffer(s). Then, you can either way for the event to have that value or ask for a block to be executed asynchronously when the event reaches a target value.
That still has problems with latency, etc. (as Ian Ollmann described), but is more fine grained than command buffer scheduling and completion. In particular, as Klaas mentions in a comment, a command buffer being scheduled does not indicate that it has started executing. You could put commands to set an event's value at the beginning and (with a different value) at the end of a sequence of commands, and those would only notify at actual execution time.
Finally, on iOS 10.3+ but not macOS, MTLCommandBuffer has two properties, GPUStartTime and GPUEndTime, with which you can determine how much time a command buffer took to execute on the GPU. This should not be subject to latency in the same way as the other techniques.
As an addition to Ken's comment above, GPUStartTime and GPUEndTime is now available on macOS too (10.15+):
https://developer.apple.com/documentation/metal/mtlcommandbuffer/1639926-gpuendtime?language=objc

SystemC - measure and include file parsing time in systemc simulation

I have a simple C++ function which parse CSV (10-10k rows) file line-by-line and insert each field in defined structure, array of structures to be more specific.
Now I want to measure parsing time using systemc methods (without C++ utilities, e.g clock()) and include it in simulation like any other process and generate trace file - is it possible at all?
For couple of days I am struggling doing that in every possible way. Actually I have realised that sc_time_stamp() is useless in that particular case as it only shows declared sc_start() simulation elapsed time.
I thought it will be as simple as:
wait() until pos.edg
parseFile()
signal.write(1)
doOtherStuff()
but apparently it's not...
Internet is full of adders, counters and other logic stuff examples but I did not found anything related to my problem.
Thanks in advance!
Your question "measure parsing time using systemc" and the pseudo code example you gave appear to indicate you want to measure real-time. The pseudo-code
wait some-event
parseFile()
signal.write(1)
...
Will take exactly zero simulation time since (unless parseFile() itself has wait statements) parseFile() is carried out in one delta cycle. If parseFile() does have wait statements (wait 10ms for example) then in simulation time it will take 10 ms. The only way systemc has of knowing how long a process takes to complete is if you tell it.
SystemC is a modeling library. So you can only measure simulation time with sc_time_stamp.
If you want to measure physical time you will need to measure it with some other C/C++ library (Easily measure elapsed time).
Then, if you want to add this time to your simulation time, you can put
wait( measured_parsing_time );

How to use .. QNX Momentics Application Profiler?

I'd like to profile my (multi-threaded) application in terms of timing. Certain threads are supposed to be re-activated frequently, i.e. a thread executes its main job once every fixed time interval. In other words, there's a fixed time slice in which all the threads a getting re-activated.
More precisely, I expect certain threads to get activated every 2ms (since this is the cycle period). I made some simplified measurements which confirmed the 2ms to be indeed effective.
For the purpose of profiling my app more accurately it seemed suitable to use Momentics' tool "Application Profiler".
However when I do so, I fail to interpret the timing figures that I selected. I would be interested in the average as well in the min and max time it takes before a certain thread is re-activated. So far it seems, the idea is to be only able to monitor the times certain functions occupy. However, even that does not really seem to be the case. E.g. I've got 2 lines of code that are put literally next to each other:
if (var1 && var2 && var3) var5=1; takes 1ms (avg)
if (var4) var5=0; takes 5ms (avg)
What is that supposed to tell me?
Another thing confuses me - the parent thread "takes" up 33ms on avg, 2ms on max and 1ms on min. Aside the fact that the avg shouldn't be bigger than max (i.e. even more I expect avg to be not bigger than 2ms - since this is the cycle time), it's actually increasing the longer I run the the profiling tool. So, if I would run the tool for half an hour the 33ms would actually be something like 120s. So, it seems that avg is actually the total amount of time the thread occupies the CPU.
If that is the case, I would assume to be able to offset against the total time using the count figure which doesn't work either. Mostly due to the figure being almost never available - i.e. there is only as a separate list entry (for every parent thread) called which does not represent a specific process scope.
So, I read QNX community wiki about the "Application Profiler", incl. the manual about "New IDE Application Profiler Enhancements", as well as the official manual articles about how to use the profiler tool.. but I couldn't figure out how I would use the tool to serve my interest.
Bottom line: I'm pretty sure I'm misinterpreting and misusing the tool for what it was intended to be used. Thus my question - how would I interpret the numbers or use the tool's feedback properly to get my 2ms cycle time confirmed?
Additional information
CPU: single core
QNX SDP 6.5 / Momentics 4.7.0
Profiling Method: Sampling and Call Count Instrumentation
Profiling Scope: Single Application
I enabled "Build for Profiling (Sampling and Call Count Instrumentation)" in the Build Options1
The System Profiler should give you what you are looking for. It hooks into the micro kernel and lets you see the state of all threads on the system. I used it in a similar setup to find out what our system was getting unexpected time-outs. (The cause turned out to be Page Waits on critical threads.)

how to detect if a thread or process is getting starved due to OS scheduling

This is on Linux OS. App is written in C++ with ACE library.
I am suspecting that one of the thread in the process is getting blocked for unusually long time(5 to 40 seconds) sometimes. The app runs fine most of the times except couple times a day it has this issue. There are other similar 5 apps running on the box which are also I/O bound due to heavy socket incoming data.
I would like to know if there is any thing I can do programatically to see if the thread/process are getting their time slice.
If a process is being starved out, self monitoring for that process would not be that productive. But, if you just want that process to notice it hasn't been run in a while, it can call times periodically and compare the relative difference in elapsed time with the relative difference in scheduled user time (you would sum the tms_utime and tms_cutime fields if you want to count waiting for children as productive time, and you would sum in the tms_stime and tms_cstime fields if you count kernel time spent on your behalf to be productive time). For thread times, the only way I know of is to consult the /proc filesystem.
A high priority external process or high priority thread could externally monitor processes (and threads) of interest by reading the appropriate /proc/<pid>/stat entries for the process (and /proc/<pid>/task/<tid>/stat for the threads). The user times are found in the 14th and 16th fields of the stat file. The system times are found in the 15th and 17th fields. (The field positions are accurate for my Linux 2.6 kernel.)
Between two time points, you determine the amount of elapsed time that has passed (a monitor process or thread would usually wake up at regular intervals). Then the difference between the cumulative processing times at each of those time points represents how much time the thread of interest got to run during that time. The ratio of processing time to elapsed time would represent the time slice.
One last bit of info: On Linux, I use the following to obtain the tid of the current thread for examining the right task in the /proc/<pid>/task/ directory:
tid = syscall(__NR_gettid);
I do this, because I could not find the gettid system call actually exported by any library on my system, even though it was documented. But, it might be available on yours.

Qt Get the amount of up time for current application

Is there a way in qt to get the up time of the application as well as the up time for the system?
Thanks in advance.
You can use the QElapsedTimer class from Qt 4.7 to get uptime for your app. This class will use monotonic clocks if it can.
Just create an instance, and call start on it at the start of your program. From then on, you can get the number of milliseconds your program has been running (or more precisely, since the call to start) by calling
myElapsedTimer.elapsed()
On Windows you can simply calculate by calling Winapi function to get process start datetime.
More information you can find at http://www.codeproject.com/KB/threads/ProcessTime.aspx
On Linux, you can use the times system call to tell you elapsed processor time. This will not count the time your program has been idle waiting for input, or blocked waiting for input, or the time that it's been preempted by other programs also running on the system. (Therefore, this makes it very good for benchmarks.)