I am trying to get the total time a particular thread has spent so far, programmatically.
getrusage returns a thread's CPU time, but I want the total elapsed time, i.e. including the time the thread spent blocked for whatever reason.
Please note that I will be making use of this functionality by instrumenting a given program using a profiler that I wrote.
A program may have many threads (I am focusing on profiling servers, so there can be many). At any given time I want to know how much time a particular thread has spent so far, so it's not convenient to start a timer for every thread as it is spawned. I would want something with usage similar to getrusage, e.g. a call that returns the total time of the current thread, or one that I can pass a thread id to. Manual mechanisms, like taking a timestamp when the thread is spawned and another one later and then taking their difference, won't be very helpful for me.
Can anyone suggest how to do this?
Thanks!
Save the current time at the point when the thread is started. The total time spent by the thread, counting both running and blocked time, is then just:
current_time - start_time
Of course this is almost always useless/meaningless, which is why there's no dedicated API for it.
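If you do go this route anyway, here is a minimal sketch of the idea, assuming pthreads and CLOCK_MONOTONIC (record_thread_start and thread_wall_time_so_far are made-up helper names your instrumentation would call from its thread-entry hook):

#include <pthread.h>
#include <time.h>

/* Each thread records its own start time once; the second helper then
   returns the wall-clock time elapsed since that thread started. */
static __thread struct timespec thread_start;

static void record_thread_start(void)        /* call at thread entry */
{
    clock_gettime(CLOCK_MONOTONIC, &thread_start);
}

static double thread_wall_time_so_far(void)  /* seconds since thread start */
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - thread_start.tv_sec) +
           (now.tv_nsec - thread_start.tv_nsec) / 1e9;
}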
Depending on what you want to use this for, one possibility to think about is to sum the number of clock ticks consumed during blocking, which is typically slow enough to hide a little overhead like that. From that sum and the surrounding thread interval you also measure, you can compute the real-time load on your thread over that interval. Of course, time-slicing with other processes will throw this off by some amount, and capturing all blocking may be very easy or very hard, depending on your situation.
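As a sketch of what capturing blocking might look like under pthreads (all names here are made up for illustration; a real profiler would wrap every blocking call it cares about, not just pthread_mutex_lock):

#include <pthread.h>
#include <time.h>

/* Per-thread counter of nanoseconds spent blocked; the profiler can later
   compute load = 1 - blocked_ns / interval_ns over its sampling interval. */
static __thread long long blocked_ns;

static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static int wrapped_mutex_lock(pthread_mutex_t *m)
{
    long long before = now_ns();
    int rc = pthread_mutex_lock(m);   /* the blocking call being wrapped */
    blocked_ns += now_ns() - before;
    return rc;
}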
I used Sleep(500) in my code and GetTickCount() to test the timing. I found that the sleep actually costs about 515 ms, more than 500. Does somebody know why that is?
Because the Win32 API's Sleep isn't a high-precision sleep; it is limited by the granularity of the system timer.
The best way to get a precise sleep is to sleep for slightly less than the target (say, ~50 ms less) and then busy-wait the remainder. To find the amount of slack you need, get the resolution of the system timer using timeGetDevCaps and multiply by 1.5 or 2 to be safe.
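As a rough sketch of that approach (PreciseSleep is a made-up name, and here the slack comes from the timer resolution reported by timeGetDevCaps with a safety factor of 2, as suggested above):

#include <windows.h>
#pragma comment(lib, "winmm.lib")   /* timeGetDevCaps lives in winmm */

/* Sleep most of the interval, then busy-wait on QueryPerformanceCounter
   for the remainder. */
void PreciseSleep(DWORD milliseconds)
{
    TIMECAPS tc;
    timeGetDevCaps(&tc, sizeof(tc));
    DWORD slack = tc.wPeriodMin * 2;            /* safety factor of 2 */

    LARGE_INTEGER freq, start, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    LONGLONG target = start.QuadPart + milliseconds * freq.QuadPart / 1000;

    if (milliseconds > slack)
        Sleep(milliseconds - slack);            /* coarse part */

    do {                                        /* burn the rest */
        QueryPerformanceCounter(&now);
    } while (now.QuadPart < target);
}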
sleep(500) guarantees a sleep of at least 500ms.
But it might sleep for longer than that: the upper limit is not defined.
In your case, there will also be the extra overhead of calling GetTickCount().
Your non-standard Sleep function may well behave in a different manner; but I doubt that exactness is guaranteed. To do that, you need special hardware.
As you can read in the documentation, the WinAPI function GetTickCount() is limited to the resolution of the system timer, which is typically in the range of 10 milliseconds to 16 milliseconds.
To get a more accurate time measurement, use the function GetSystemTimePreciseAsFileTime.
Also, you can not rely on Sleep(500) to sleep exactly 500 milliseconds. It will suspend the thread for at least 500 milliseconds. The operating system will then continue the thread as soon as it has a timeslot available. When there are many other tasks running on the operating system, there might be a delay.
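For illustration, here is a minimal sketch of measuring how long a Sleep(500) actually took using GetSystemTimePreciseAsFileTime (available on Windows 8 and later; FILETIME counts 100-nanosecond intervals):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    FILETIME before, after;
    ULARGE_INTEGER b, a;

    GetSystemTimePreciseAsFileTime(&before);
    Sleep(500);
    GetSystemTimePreciseAsFileTime(&after);

    b.LowPart = before.dwLowDateTime;  b.HighPart = before.dwHighDateTime;
    a.LowPart = after.dwLowDateTime;   a.HighPart = after.dwHighDateTime;

    /* 10,000 FILETIME ticks per millisecond */
    printf("Sleep(500) actually took %.3f ms\n", (a.QuadPart - b.QuadPart) / 10000.0);
    return 0;
}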
In general sleeping means that your thread goes to a waiting state and after 500ms it will be in a "runnable" state. Then the OS scheduler chooses to run something according to the priority and number of runnable processes at that time. So if you do have high precision sleep and high precision clock then it is still a sleep for at least 500ms, not exactly 500ms.
Like the other answers have noted, Sleep() has limited accuracy. Actually, no implementation of a Sleep()-like function can be perfectly accurate, for several reasons:
It takes some time to actually call Sleep(). While an implementation aiming for maximal accuracy could attempt to measure and compensate for this overhead, few bother. (And, in any case, the overhead can vary due to many causes, including CPU and memory use.)
Even if the underlying timer used by Sleep() fires at exactly the desired time, there's no guarantee that your process will actually be rescheduled immediately after waking up. Your process might have been swapped out while it was sleeping, or other processes might be hogging the CPU.
It's possible that the OS cannot wake your process up at the requested time, e.g. because the computer is in suspend mode. In such a case, it's quite possible that your 500ms Sleep() call will actually end up taking several hours or days.
Also, even if Sleep() was perfectly accurate, the code you want to run after sleeping will inevitably consume some extra time.
Thus, to perform some action (e.g. redrawing the screen, or updating game logic) at regular intervals, the standard solution is to use a compensated Sleep() loop. That is, you maintain a regularly incrementing time counter indicating when the next action should occur, and compare this target time with the current system time to dynamically adjust your sleep time.
Some extra care needs to be taken to deal with unexpected large time jumps, e.g. if the computer was temporarily suspended or if the tick counter wrapped around, as well as the situation where processing the action ends up taking more time than is available before the next action, causing the loop to lag behind.
Here's a quick example implementation (in pseudocode) that should handle both of these issues:
DWORD interval = 500, giveUpThreshold = 10 * interval;
DWORD nextTarget = GetTickCount();

bool active = doAction();
while (active) {
    nextTarget += interval;
    // Subtract as unsigned, then cast to int: this stays correct even
    // when GetTickCount() wraps around (roughly every 49.7 days).
    int delta = (int)(nextTarget - GetTickCount());
    if (delta > (int)giveUpThreshold || delta < -(int)giveUpThreshold) {
        // either we're hopelessly behind schedule, or something
        // weird happened; either way, give up and reset the target
        nextTarget = GetTickCount();
    } else if (delta > 0) {
        Sleep(delta);
    }
    active = doAction();
}
This will ensure that doAction() is called on average once every interval milliseconds, at least as long as it doesn't consistently consume more time than that, and as long as no large time jumps occur. The exact time between successive calls may vary, but any such variation will be compensated for on the next iteration.
The default timer resolution is low; you can increase the resolution if necessary (see MSDN):
#include <windows.h>
#pragma comment(lib, "winmm.lib")   // timeGetDevCaps/timeBeginPeriod are in winmm

#define TARGET_RESOLUTION 1         // 1-millisecond target resolution

TIMECAPS tc;
UINT wTimerRes;

if (timeGetDevCaps(&tc, sizeof(TIMECAPS)) != TIMERR_NOERROR)
{
    // Error; application can't continue.
}
// Clamp the request to what the timer hardware actually supports.
wTimerRes = min(max(tc.wPeriodMin, TARGET_RESOLUTION), tc.wPeriodMax);
timeBeginPeriod(wTimerRes);
// ... timing-sensitive work ...
timeEndPeriod(wTimerRes);           // restore the previous resolution when done
There are two general reasons why code might want a function like "sleep":
It has some task which can be performed at any time that is at least some distance in the future.
It has some task which should be performed as near as possible to some moment in time some distance in the future.
In a good system, there should be separate ways of issuing those kinds of requests; Windows makes the first easier than the second.
Suppose there is one CPU and three threads in the system, all doing useful work until, one second before midnight, one of the threads says it won't have anything useful to do for at least a second. At that point, the system will devote execution to the remaining two threads. If, 1 ms before midnight, one of those threads decides it won't have anything useful to do for at least a second, the system will switch control to the last remaining thread.
When midnight rolls around, the original first thread will become available to run, but since the presently-executing thread will have only had the CPU for a millisecond at that point, there's no particular reason the original first thread should be considered more "worthy" of CPU time than the other thread which just got control. Since switching threads isn't free, the OS may very well decide that the thread that presently has the CPU should keep it until it blocks on something or has used up a whole time slice.
It might be nice if there were a version of "sleep" which were easier to use than multimedia timers but would request that the system give the thread a temporary priority boost when it becomes eligible to run again, or, better yet, a variation of "sleep" which would specify a minimum time and a "priority-boost" time, for tasks which need to be performed within a certain time window. I don't know of any systems that can be easily made to work that way, though.
How can I measure the time required to create and launch a thread?
(Linux, pthreads or boost::thread.)
Thanks for any advice!
You should probably specify what exactly you want to measure, since there are at least 2 possible interpretations (the time the original thread is "busy" inside pthread_create versus the time from calling pthread_create until the other thread actually executes its first instruction).
In either case, you can query monotonic real time using clock_gettime with CLOCK_MONOTONIC before and after the call to pthread_create, or before the call and as the first thing inside the thread function. Then subtract the first value from the second.
To know how much time is spent inside pthread_create itself, CLOCK_THREAD_CPUTIME_ID is an alternative, as it only counts the time your thread actually used.
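A minimal sketch measuring both interpretations with CLOCK_MONOTONIC (now_ns is a made-up helper; error handling is omitted):

#include <pthread.h>
#include <stdio.h>
#include <time.h>

static long long now_ns(void)                 /* CLOCK_MONOTONIC in ns */
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static long long first_instruction_ns;        /* written by the new thread */

static void *thread_func(void *arg)
{
    (void)arg;
    first_instruction_ns = now_ns();          /* first thing the thread does */
    return NULL;
}

int main(void)
{
    pthread_t tid;
    long long before = now_ns();
    pthread_create(&tid, NULL, thread_func, NULL);
    long long after_create = now_ns();
    pthread_join(tid, NULL);

    printf("time spent inside pthread_create: %lld ns\n", after_create - before);
    printf("call until first instruction:     %lld ns\n", first_instruction_ns - before);
    return 0;
}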
Altogether, it's a bit pointless to measure this kind of thing, however. It tells you little to nothing about how it will behave under real conditions on your system or another system, with unknown processes and unknown scheduling strategies and priorities.
On another machine, or on another day, your thread might just be scheduled 100 or 200 milliseconds later. If you depend on the fact that this won't happen, you're dead.
EDIT:
Regarding the info added in the comment above: if you need to "perform actions on a non-regular basis" on a scale that is well within normal scheduling quantums, you can just create a single thread and nanosleep for 15 or 30 milliseconds. Of course, sleep is not terribly accurate or reliable, so you might instead want to, e.g., block on a timerfd (if portability is not the topmost priority; otherwise use a signal-delivering timer).
It's no big problem to schedule irregular intervals with a single timer/wait either; you only need to keep track of when the next event is due. This is how modern operating systems do it too (read up on "timer coalescing").
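A minimal timerfd sketch along those lines (Linux-specific; the 30 ms period is just an example value):

#include <stdint.h>
#include <stdio.h>
#include <sys/timerfd.h>
#include <unistd.h>

int main(void)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);

    struct itimerspec spec = {0};
    spec.it_value.tv_nsec    = 30 * 1000000L;   /* first expiration in 30 ms */
    spec.it_interval.tv_nsec = 30 * 1000000L;   /* then every 30 ms */
    timerfd_settime(fd, 0, &spec, NULL);

    for (int i = 0; i < 10; i++) {
        uint64_t expirations;
        /* Blocks until the timer fires; the value read is the number of
           expirations since the last read (more than 1 if we fell behind). */
        read(fd, &expirations, sizeof expirations);
        printf("tick (%llu expirations)\n", (unsigned long long)expirations);
    }
    close(fd);
    return 0;
}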
There are a bunch of other questions like this, but the only substantial answer I've seen is the one where you use SetPriorityClass to give priority to other processes. This is not what I want. I want to explicitly limit the CPU usage of my thread/process.
How can I do this?
Edit: I can't improve the efficiency of the process itself, because I'm not controlling it. I'm injecting my code into a game which I'd like to 'automate' in the background.
The best solution to limiting the cpu usage for a process or thread is to make sure that the thread or process uses less cpu.
That can best be done by improving the efficiency of the code, or by calling it less often.
The aim is to make sure that the process doesn't continually consume all of its available time slice.
Things to try:
Work out what is actually taking up all of the CPU. Optimize heavy processing areas - ideally with a change of algorithm.
Minimise polling wherever possible.
Try to rely on the operating system's ability to wake your process when necessary, e.g. by waiting on files/sockets/FIFOs/mutexes/semaphores/message queues, etc.
Have processes self-regulate their processor usage. If your process is doing a lot of work in an endless loop, insert a sched_yield() or sleep() call after every N iterations (see the sketch after this list). If there are no other processes waiting for CPU time, your process will be rescheduled almost immediately, but this still allows the rest of the system to use the CPU when necessary.
Rearrange your processing to allow lower priority activities to be run when your process is at idle.
Carefully adjust thread or process priorities. But be aware, as #Mooing Duck has said, that by doing this you may just shift the CPU usage from one place to a different place without seeing an overall improvement.
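Here is the self-regulation idea from the list above as a minimal sketch (do_one_unit_of_work stands in for whatever the loop really does, and N is an arbitrary tuning knob):

#include <sched.h>

/* Hypothetical work item; replace with the loop body you actually have. */
static void do_one_unit_of_work(void) { /* ... */ }

int main(void)
{
    const long N = 1000;            /* yield after every N iterations */
    for (long i = 0; ; i++) {
        do_one_unit_of_work();
        if (i % N == 0) {
            /* Give the CPU back to anything else that is runnable;
               a short sleep() or usleep() here would throttle harder. */
            sched_yield();
        }
    }
}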
How about issuing a sleep command at regular intervals?
Your question is broad; I don't know exactly what the process is doing. You can certainly track the thread's I/O and force it to give up the CPU after a certain threshold is passed.
I ended up enumerating a list of threads, then having a 100ms timer that suspended the list of threads two out of every five iterations (which in theory reduces CPU usage by 40%).
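Roughly, that duty-cycle idea might look like the following sketch (ThrottleLoop is a made-up name; enumerating and opening the thread handles, e.g. via CreateToolhelp32Snapshot with THREAD_SUSPEND_RESUME access, is left out):

#include <windows.h>

/* Suspend the target's threads for 2 of every 5 100 ms ticks,
   cutting its CPU usage by roughly 40%. */
void ThrottleLoop(HANDLE *threads, int count)
{
    for (;;) {
        for (int i = 0; i < count; i++) SuspendThread(threads[i]);
        Sleep(2 * 100);                       /* 2 ticks suspended */
        for (int i = 0; i < count; i++) ResumeThread(threads[i]);
        Sleep(3 * 100);                       /* 3 ticks running   */
    }
}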
Thanks for all the answers.
I was testing how long various Win32 API calls will wait when asked to wait for 1 ms. I tried:
::Sleep(1)
::WaitForSingleObject(handle, 1)
::GetQueuedCompletionStatus(handle, &bytes, &key, &overlapped, 1)
I was measuring the elapsed time using QueryPerformanceCounter and QueryPerformanceFrequency. The elapsed time was about 15 ms most of the time, which is expected and documented all over the Internet. However, for a short period of time the waits were taking only about 2 ms! This happened consistently for a few minutes, but now it is back to 15 ms. I did not use timeBeginPeriod() and timeEndPeriod() calls! Then I tried the same app on another machine, and the waits there consistently take about 2 ms! Both machines have Windows XP SP2 and the hardware should be identical. Is there something that explains why wait times vary by so much? TIA
Thread.Sleep(0) will let any threads of the same priority execute. Thread.Sleep(1) will let any threads of the same or lower priority execute.
Each thread is given an interval of time to execute in, before the scheduler lets another thread execute. As Billy ONeal states, calling Thread.Sleep will give up the rest of this interval to other threads (subject to the priority considerations above).
Windows balances threads across the entire OS, not just within your process. This means that other threads on the OS can also cause your thread to be pre-empted (i.e. interrupted, with the rest of the time interval given to another thread).
There is an article that might be of interest on the topic of Thread.Sleep(x) at:
Priority-induced starvation: Why Sleep(1) is better than Sleep(0) and the Windows balance set manager
Changing the timer's resolution can be done by any process on the system, and the effect is seen globally. See this article on how the Hotspot Java compiler deals with times on windows, specifically:
Note that any application can change the timer interrupt and that it affects the whole system. Windows only allows the period to be shortened, thus ensuring that the shortest requested period by all applications is the one that is used. If a process doesn't reset the period then Windows takes care of it when the process terminates. The reason why the VM doesn't just arbitrarily change the interrupt rate when it starts - it could do this - is that there is a potential performance impact to everything on the system due to the 10x increase in interrupts. However other applications do change it, typically multi-media viewers/players.
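To see that global effect for yourself, here is a small sketch that times Sleep(1) before and after raising the resolution with timeBeginPeriod(1) (the exact numbers will vary by machine):

#include <windows.h>
#include <stdio.h>
#pragma comment(lib, "winmm.lib")

static double SleepOneMs(void)        /* measures an actual Sleep(1) in ms */
{
    LARGE_INTEGER f, a, b;
    QueryPerformanceFrequency(&f);
    QueryPerformanceCounter(&a);
    Sleep(1);
    QueryPerformanceCounter(&b);
    return (b.QuadPart - a.QuadPart) * 1000.0 / f.QuadPart;
}

int main(void)
{
    printf("default resolution:       Sleep(1) took %.2f ms\n", SleepOneMs());
    timeBeginPeriod(1);               /* raises the timer rate system-wide */
    printf("after timeBeginPeriod(1): Sleep(1) took %.2f ms\n", SleepOneMs());
    timeEndPeriod(1);
    return 0;
}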
The biggest thing Sleep(1) does is give up the rest of your thread's quantum. How much time that is depends entirely upon how much of your thread's quantum remains when you call Sleep.
To aggregate what was said before:
CPU time is assigned in quantums (time slices)
The thread scheduler picks the thread to run. This thread may run for the entire time slice, even if threads of higher priority become ready to run.
Typical time slices are 8..15ms, depending on architecture.
The thread can "give up" the time slice - typically Sleep(0) or Sleep(1). Sleep(0) allows another thread of the same or higher priority to run for the next time slice. Sleep(1) allows "any" thread.
The time slice is global and can be affected by all processes
Even if you don't change the time slice, someone else could.
Even if the time slice doesn't change, your measurements may still jump between the two different durations.
For simplicity, assume a single core, your thread and another thread X.
If Thread X runs at the same priority as yours, crunching numbers, your Sleep(1) will take an entire time slice, 15 ms being typical on client systems.
If Thread X runs at a lower priority, and gives up its own time slice after 4 ms, your Sleep(1) will take 4 ms.
I would say it just depends on how loaded the CPU is; if there aren't many other processes/threads, it could get back to the calling thread a lot faster.
How can I measure the amount of time a mutex hands back to the OS, i.e. the time threads spend blocked on it? The main goal is to detect the mutex that blocks threads for the largest amount of time.
PS: I tried oprofile. It reports 30% of the time spent inside vmlinux/.poll_idle. This is unexpected, because the app is designed to use 100% of its core. Therefore I suspect that the time is given back to the OS while waiting for some mutex, and oprofile reports it as idle time.
Profile.
Whenever the question is "What really takes the [most|least] time?", the answer is always "Profile to find out.".
As suggested, profile, but decide beforehand what you want to measure: elapsed time (how long threads were blocked) or user/kernel time (what the synchronization itself costs you). In different scenarios you might want to measure one or the other, or both.
You could profile your program using, say, OProfile on Linux. Then filter your results to look at the time spent in pthread_mutex_lock() for each mutex, or in your higher-level function that performs the locking. Since the program will block inside the lock function call until the mutex is acquired, profiling the time spent in that function should give you an idea of which mutexes are your most expensive.
start = GetTime();
Mutex.Lock();
stop = GetTime();
elapsedTime = stop - start;
elapsedTime is the amount of time it took to grab the mutex. If it is bigger than some small value, it's because another thread had the mutex. This won't show how much time was given back to the OS overall, only that another thread held the mutex.
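Building on that, here is a sketch of a pthreads wrapper that logs how long the calling thread was blocked on a given mutex (timed_lock is a made-up name and the 1 ms reporting threshold is arbitrary):

#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Locks the mutex and reports how long the calling thread waited for it. */
static void timed_lock(pthread_mutex_t *m, const char *name)
{
    struct timespec start, stop;
    clock_gettime(CLOCK_MONOTONIC, &start);
    pthread_mutex_lock(m);
    clock_gettime(CLOCK_MONOTONIC, &stop);

    double waited_ms = (stop.tv_sec - start.tv_sec) * 1e3 +
                       (stop.tv_nsec - start.tv_nsec) / 1e6;
    if (waited_ms > 1.0)
        fprintf(stderr, "blocked %.3f ms acquiring %s\n", waited_ms, name);
}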