How can I measure the time required to create and launch a thread?
(Linux, pthreads or boost::thread.)
Thanks for any advice!
You should probably specify what exactly you want to measure, since there are at least two possible interpretations: the time the original thread is "busy" inside pthread_create, versus the time from calling pthread_create until the new thread actually executes its first instruction.
In either case, you can query monotonic real time using clock_gettime with CLOCK_MONOTONIC before and after the call to pthread_create, or before the call and as the first thing inside the thread function. Then subtract the first value from the second.
To know how much time is spent inside pthread_create itself, CLOCK_THREAD_CPUTIME_ID is an alternative, as this only counts the CPU time your calling thread actually used.
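For illustration, a minimal sketch of both measurements (diff_ms and thread_func are names I made up; error checking omitted; compile with -pthread):

    #include <pthread.h>
    #include <cstdio>
    #include <time.h>

    static timespec t_created;   // stamped just before pthread_create

    static double diff_ms(const timespec &a, const timespec &b) {
        return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
    }

    static void *thread_func(void *) {
        timespec t_running;
        clock_gettime(CLOCK_MONOTONIC, &t_running);  // first thing in the thread
        std::printf("create-to-first-instruction: %.3f ms\n",
                    diff_ms(t_created, t_running));
        return nullptr;
    }

    int main() {
        timespec t_after;
        pthread_t tid;

        clock_gettime(CLOCK_MONOTONIC, &t_created);
        pthread_create(&tid, nullptr, thread_func, nullptr);
        clock_gettime(CLOCK_MONOTONIC, &t_after);    // time "busy" in the caller

        std::printf("time spent inside pthread_create: %.3f ms\n",
                    diff_ms(t_created, t_after));
        pthread_join(tid, nullptr);
    }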
Altogether, it's a bit pointless to measure this kind of thing, however. It tells you little to nothing about how it will behave under real conditions on your system or another system, with unknown processes and unknown scheduling strategies and priorities.
On another machine, or on another day, your thread might just be scheduled 100 or 200 milliseconds later. If you depend on that not happening, you're dead.
EDIT:
Regarding the info added in the comment above: if you need to "perform actions on a non-regular basis" on a scale that is well within normal scheduling quanta, you can just create a single thread and nanosleep for 15 or 30 milliseconds. Of course, sleep is not terribly accurate or reliable, so you might instead want to e.g. block on a timerfd (if portability is not the topmost priority; otherwise use a signal-delivering timer).
It's no big problem to schedule irregular intervals with a single timer/wait either, you only need to keep track of when the next event is due. This is how modern operating systems do it too (read up on "timer coalescing").
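If portability doesn't matter, a minimal sketch of blocking on a timerfd for a 15 ms periodic wakeup might look like this (Linux-specific; error checking omitted):

    #include <cstdint>
    #include <cstdio>
    #include <sys/timerfd.h>
    #include <unistd.h>

    int main() {
        int fd = timerfd_create(CLOCK_MONOTONIC, 0);
        itimerspec spec = {};
        spec.it_value.tv_nsec    = 15'000'000;   // first expiry after 15 ms
        spec.it_interval.tv_nsec = 15'000'000;   // then every 15 ms
        timerfd_settime(fd, 0, &spec, nullptr);

        for (int i = 0; i < 10; ++i) {
            uint64_t expirations = 0;
            read(fd, &expirations, sizeof expirations);  // blocks until expiry
            std::printf("tick (%llu expirations)\n",
                        (unsigned long long)expirations);
        }
        close(fd);
    }

For irregular intervals, you would use a one-shot timer (it_interval left zero) and re-arm it with the next due time after each wakeup.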
I am writing a profiler for which I want to know the total time since the thread was created as well as the total CPU time used by that thread for a certain calculation.
By total time I mean: the time including the time a thread spends being blocked for whatever reason, so total time elapsed since something like thread my_thread(function).
getrusage gives me the CPU time of the calling thread, which is good; however, I want to know the total time that has passed since the thread was created as well. I couldn't find any C++ library for that at all.
I can take a timestamp when the thread is created/spawned by instrumenting the program and inserting a simple timestamp method (like the chrono functions), and then take another timestamp when I do the calculation; their difference is the time I want. However, even after some searching I couldn't figure out how to detect a thread's entry/spawn point using an LLVM pass.
Any suggestions on how to detect thread entry/spawn point in LLVM pass?
Best regards!
I guess you need to search for CallInsts which call pthread_create, then analyze their arguments to find out which callback function is passed in.
To make sure you catch all thread creation calls, you'd need to research how threads are created on your platform. At the lowest level, thread creation requires a syscall (well, in most cases).
For instance, FreeBSD does have a pthread_create function, but it is pure userland code and delegates thread creation to the thr_new syscall. Some programs (the Rust language runtime, IIRC) may invoke that syscall directly, bypassing pthread_create, but these are pretty rare. So if you really want to make sure you catch every thread creation, you'd need to search for CallInsts to these low-level entry points as well.
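As a starting point, here is a rough sketch of such a pass, written against the legacy pass manager (exact APIs vary between LLVM versions, so treat this as an outline rather than drop-in code):

    #include "llvm/IR/Function.h"
    #include "llvm/IR/Instructions.h"
    #include "llvm/Pass.h"
    #include "llvm/Support/raw_ostream.h"

    using namespace llvm;

    namespace {
    struct FindThreadSpawns : public FunctionPass {
      static char ID;
      FindThreadSpawns() : FunctionPass(ID) {}

      bool runOnFunction(Function &F) override {
        for (auto &BB : F) {
          for (auto &I : BB) {
            auto *Call = dyn_cast<CallInst>(&I);
            if (!Call) continue;
            Function *Callee = Call->getCalledFunction();
            if (!Callee || Callee->getName() != "pthread_create") continue;
            // pthread_create(&tid, attr, start_routine, arg):
            // the thread entry point is argument index 2.
            Value *StartRoutine = Call->getArgOperand(2)->stripPointerCasts();
            if (auto *Entry = dyn_cast<Function>(StartRoutine))
              errs() << "thread entry point: " << Entry->getName() << "\n";
          }
        }
        return false;  // analysis only, the IR is not modified
      }
    };
    }

    char FindThreadSpawns::ID = 0;
    static RegisterPass<FindThreadSpawns> X("find-thread-spawns",
                                            "Find pthread_create callbacks");

Once you know the entry functions, you can insert your timestamping call at the top of each one.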
I am trying to get the total time a particular thread has spent so far, programmatically.
getrusage returns a thread's CPU time, but I want the total time, i.e. including the time the thread spends blocked for whatever reason.
Please note that I will be making use of this functionality by instrumenting a given program using a profiler that I wrote.
A program may have many threads (I am focusing on profiling servers, so there can be many). At any given time I would want to know how much time a particular thread has spent so far, so it's not convenient to start a timer for every thread as it is spawned. I would prefer something with usage similar to getrusage, e.g. something that returns the total time of the current thread, or to which I can pass a thread id. So manual mechanisms, like taking a timestamp when the thread is spawned and another one later and computing their difference, won't be very helpful for me.
Can anyone suggest how to do this?
Thanks!
Save the current time at the point when the thread is started. The total time spent by the thread, counting both running and blocked time, is then just:
current_time - start_time
Of course this is almost always useless/meaningless, which is why there's no dedicated API for it.
Depending on what you want to use this for, one possibility to consider is to sum the number of clock ticks consumed during blocking, which is typically slow enough to hide a little overhead like that. From that sum and the surrounding thread interval you also measure, you can compute the real-time load on your thread over that interval. Of course, time-slicing with other processes will throw this off by some amount, and capturing all blocking may be very easy or very hard, depending on your situation.
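For what it's worth, a minimal sketch of the "save the start time" idea, using std::chrono and a thread_local so no external registry is needed (names are illustrative):

    #include <chrono>
    #include <cstdio>
    #include <thread>

    using Clock = std::chrono::steady_clock;

    // Note: most compilers initialize this lazily, on first use in each
    // thread, so touch it early (e.g. at the top of the thread function)
    // if you need the true spawn time rather than the first-use time.
    thread_local Clock::time_point t_thread_start = Clock::now();

    double thread_wall_time_seconds() {
        return std::chrono::duration<double>(Clock::now() - t_thread_start).count();
    }

    int main() {
        std::thread t([] {
            thread_wall_time_seconds();  // touch early: fixes the start stamp
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            std::printf("thread alive for %.3f s\n", thread_wall_time_seconds());
        });
        t.join();
    }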
I used Sleep(500) in my code and used GetTickCount() to test the timing. I found that it takes about 515 ms, more than 500. Does somebody know why that is?
Because the Win32 API's Sleep isn't a high-precision sleep; its granularity is limited by the system timer interval.
The best way to get a precision sleep is to sleep a bit less than the target (say, ~50 ms less) and do a busy-wait for the remainder. To find the exact amount of time you need to busy-wait, get the resolution of the system clock using timeGetDevCaps and multiply by 1.5 or 2 to be safe.
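A hedged sketch of that sleep-then-spin idea (SpinSleep is an illustrative name, and the 2 ms slack is a placeholder; deriving the margin from timeGetDevCaps as suggested above is safer):

    #include <windows.h>

    // Sleep for most of the interval, then busy-wait the remainder on
    // QueryPerformanceCounter for sub-millisecond precision.
    void SpinSleep(double milliseconds) {
        LARGE_INTEGER freq, start, now;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&start);
        const double targetTicks = milliseconds * freq.QuadPart / 1000.0;

        if (milliseconds > 2.0)
            Sleep(static_cast<DWORD>(milliseconds - 2.0));  // coarse part

        do {                                                // precise part
            QueryPerformanceCounter(&now);
        } while (static_cast<double>(now.QuadPart - start.QuadPart) < targetTicks);
    }

The trade-off is CPU usage: the spin burns a full core for the final stretch, so keep the slack as small as the timer granularity allows.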
sleep(500) guarantees a sleep of at least 500 ms.
But it might sleep for longer than that: the upper limit is not defined.
In your case, there will also be the extra overhead of calling GetTickCount().
Your non-standard Sleep function may well behave in a different manner; but I doubt that exactness is guaranteed. To get that, you would need special hardware.
As you can read in the documentation, the WinAPI function GetTickCount()
is limited to the resolution of the system timer, which is typically in the range of 10 milliseconds to 16 milliseconds.
To get a more accurate time measurement, use the function GetSystemTimePreciseAsFileTime.
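For example, a small sketch of reading that high-resolution timestamp (available on Windows 8 and later):

    #include <windows.h>
    #include <cstdio>

    int main() {
        FILETIME ft;
        GetSystemTimePreciseAsFileTime(&ft);   // <1 us resolution
        ULARGE_INTEGER t;
        t.LowPart  = ft.dwLowDateTime;
        t.HighPart = ft.dwHighDateTime;
        std::printf("100-ns intervals since Jan 1, 1601: %llu\n",
                    (unsigned long long)t.QuadPart);
    }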
Also, you cannot rely on Sleep(500) to sleep for exactly 500 milliseconds. It will suspend the thread for at least 500 milliseconds, and the operating system will then resume the thread as soon as it has a timeslot available. When many other tasks are running on the operating system, there might be an additional delay.
In general, sleeping means that your thread goes into a waiting state, and after 500 ms it will be in a "runnable" state. Then the OS scheduler chooses what to run according to the priorities and the number of runnable processes at that time. So even if you have a high-precision sleep and a high-precision clock, it is still a sleep of at least 500 ms, not exactly 500 ms.
Like the other answers have noted, Sleep() has limited accuracy. Actually, no implementation of a Sleep()-like function can be perfectly accurate, for several reasons:
It takes some time to actually call Sleep(). While an implementation aiming for maximal accuracy could attempt to measure and compensate for this overhead, few bother. (And, in any case, the overhead can vary due to many causes, including CPU and memory use.)
Even if the underlying timer used by Sleep() fires at exactly the desired time, there's no guarantee that your process will actually be rescheduled immediately after waking up. Your process might have been swapped out while it was sleeping, or other processes might be hogging the CPU.
It's possible that the OS cannot wake your process up at the requested time, e.g. because the computer is in suspend mode. In such a case, it's quite possible that your 500ms Sleep() call will actually end up taking several hours or days.
Also, even if Sleep() was perfectly accurate, the code you want to run after sleeping will inevitably consume some extra time.
Thus, to perform some action (e.g. redrawing the screen, or updating game logic) at regular intervals, the standard solution is to use a compensated Sleep() loop. That is, you maintain a regularly incrementing time counter indicating when the next action should occur, and compare this target time with the current system time to dynamically adjust your sleep time.
Some extra care needs to be taken to deal with unexpected large time jumps, e.g. if the computer was temporarily suspended or if the tick counter wrapped around, as well as with the situation where processing the action takes more time than is available before the next action, causing the loop to lag behind.
Here's a quick example implementation (Win32 C++, with doAction() standing in for the periodic work) that should handle both of these issues:
const int interval = 500, giveUpThreshold = 10 * interval;
DWORD nextTarget = GetTickCount();
bool active = doAction();
while (active) {
    nextTarget += interval;
    // signed difference so GetTickCount() wraparound is handled correctly
    int delta = static_cast<int>(nextTarget - GetTickCount());
    if (delta > giveUpThreshold || delta < -giveUpThreshold) {
        // either we're hopelessly behind schedule, or something
        // weird happened; either way, give up and reset the target
        nextTarget = GetTickCount();
    } else if (delta > 0) {
        Sleep(delta);
    }
    active = doAction();
}
This will ensure that doAction() is called on average once every interval milliseconds, at least as long as it doesn't consistently consume more time than that and no large time jumps occur. The exact time between successive calls may vary, but any such variation will be compensated for on the next iteration.
The default timer resolution is low; you can increase the timer resolution if necessary, as described on MSDN:
#include <windows.h>
#include <mmsystem.h>   // timeGetDevCaps / timeBeginPeriod; link winmm.lib

#define TARGET_RESOLUTION 1         // 1-millisecond target resolution

TIMECAPS tc;
UINT     wTimerRes;

if (timeGetDevCaps(&tc, sizeof(TIMECAPS)) != TIMERR_NOERROR)
{
    // Error; application can't continue.
}

wTimerRes = min(max(tc.wPeriodMin, TARGET_RESOLUTION), tc.wPeriodMax);
timeBeginPeriod(wTimerRes);
// ... timed work here ...
timeEndPeriod(wTimerRes);           // restore the resolution when done
There are two general reasons why code might want a function like "sleep":
It has some task which can be performed at any time that is at least some distance in the future.
It has some task which should be performed as near as possible to some moment in time some distance in the future.
In a good system, there should be separate ways of issuing those kinds of requests; Windows makes the first easier than the second.
Suppose there is one CPU and three threads in the system, all doing useful work until, one second before midnight, one of the threads says it won't have anything useful to do for at least a second. At that point, the system will devote execution to the remaining two threads. If, 1 ms before midnight, one of those threads decides it won't have anything useful to do for at least a second, the system will switch control to the last remaining thread.
When midnight rolls around, the original first thread will become available to run, but since the presently-executing thread will have had the CPU for only a millisecond at that point, there's no particular reason the original first thread should be considered more "worthy" of CPU time than the thread that just got control. Since switching threads isn't free, the OS may very well decide that the thread that presently has the CPU should keep it until it blocks on something or has used up a whole time slice.
It might be nice if there were a version of "sleep" that was easier to use than multimedia timers but would request that the system give the thread a temporary priority boost when it becomes eligible to run again, or better yet a variation of "sleep" that would specify a minimum time and a "priority-boost" time, for tasks which need to be performed within a certain time window. I don't know of any systems that can easily be made to work that way, though.
I want to make two pthreads in a c++ program using a single processor. Thread1 will be endlessly running but needs to be interrupted every 5 microseconds to allow thread2 to do one iteration of a while loop before switching back to thread1. I know how to make the pthreads and whatnot, but I can't figure out how to make the switching between threads based on a timer occur. Is there a way to do this?
If you don't want the two threads to run simultaneously, then you should not use two threads. The second thread can instead be a function called from the first thread every 5 microseconds.
The problem with what you want is the scale of the time interval between the interrupts. Usual OSes do not switch threads that often; the normal rate lies on the millisecond scale. To achieve 5-microsecond intervals you need either:
A real-time OS with low-cost, high-precision interrupts, or your own kernel module for the same purpose.
Proactive scheduling of the threads, i.e. more or less what Samer suggested.
I'm not familiar with RTOSes or drivers, and I doubt you need such a specialized solution. As for the second approach:
pthreads
You can still do it with pthreads, but make sure thread1 calls into the OS at least every 5 microseconds to give the OS a chance to reschedule to thread2. You'd also need to assign a higher priority to thread2, or set a different scheduling policy such as Round-Robin instead of the default fair scheduling.
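For instance, a hedged sketch of switching a thread to the round-robin policy with pthreads (make_round_robin is an illustrative name; this requires privileges on Linux, e.g. CAP_SYS_NICE or root, and return values should be checked in real code):

    #include <pthread.h>
    #include <sched.h>

    // Give an already-created thread the SCHED_RR (round-robin) policy
    // with a modest real-time priority.
    void make_round_robin(pthread_t t) {
        sched_param sp = {};
        sp.sched_priority = sched_get_priority_min(SCHED_RR) + 1;
        pthread_setschedparam(t, SCHED_RR, &sp);
    }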
However, this threads approach will suffer from significant context-switch overhead: assuming roughly 1300 ns per context switch and two switches per 5-microsecond interval, about 50% of the time goes to overhead alone!
coroutines
Another approach could be user-level thread (coroutine/fiber) scheduling. It has significantly less overhead and lets thread2 be written as a loop that yields periodically instead of returning from its function completely. It still requires thread1 to take care of having checkpoints frequent enough to allow a switch every 5 microseconds, though.
function
And finally, if thread2 can easily be implemented as a function that keeps state between calls in order to emulate the loop, that is the best choice from both the overhead and the complexity points of view.
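A minimal sketch of that function-with-state approach, with thread2's loop body turned into a callable object that keeps its state between calls (all names here are illustrative):

    #include <chrono>
    #include <cstdio>

    // thread2's loop body as a callable object whose state persists across
    // calls; thread1 invokes it roughly every 5 microseconds.
    struct Thread2Step {
        int iteration = 0;                 // state formerly local to the loop
        void operator()() {
            ++iteration;                   // one iteration of the old while loop
        }
    };

    int main() {
        using Clock = std::chrono::steady_clock;
        Thread2Step step;
        auto next = Clock::now();
        for (long i = 0; i < 1000000; ++i) {
            // ... a chunk of thread1's endless work goes here ...
            if (Clock::now() >= next) {
                step();                                // run one thread2 iteration
                next += std::chrono::microseconds(5);  // schedule the next one
            }
        }
        std::printf("thread2 ran %d iterations\n", step.iteration);
    }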
I was testing how long a various win32 API calls will wait for when asked to wait for 1ms. I tried:
::Sleep(1)
::WaitForSingleObject(handle, 1)
::GetQueuedCompletionStatus(handle, &bytes, &key, &overlapped, 1)
I was detecting the elapsed time using QueryPerformanceCounter and QueryPerformanceFrequency. The elapsed time was about 15 ms most of the time, which is expected and documented all over the Internet. However, for short periods of time the waits were taking only about 2 ms! This happened consistently for a few minutes, but now it is back to 15 ms. I did not use timeBeginPeriod() and timeEndPeriod() calls! Then I tried the same app on another machine, and there the waits consistently take about 2 ms! Both machines have Windows XP SP2 and the hardware should be identical. Is there something that explains why wait times vary by so much? TIA
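A minimal sketch of such a measurement, reconstructed from the description above (not the asker's actual code):

    #include <windows.h>
    #include <cstdio>

    int main() {
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&t0);
        ::Sleep(1);                      // the wait being measured
        QueryPerformanceCounter(&t1);
        double ms = 1000.0 * (t1.QuadPart - t0.QuadPart) / freq.QuadPart;
        std::printf("Sleep(1) took %.3f ms\n", ms);
    }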
Thread.Sleep(0) will let any threads of the same priority execute. Thread.Sleep(1) will let any threads of the same or lower priority execute.
Each thread is given an interval of time to execute in, before the scheduler lets another thread execute. As Billy ONeal states, calling Thread.Sleep will give up the rest of this interval to other threads (subject to the priority considerations above).
Windows balances threads across the entire OS, not just in your process. This means that other threads on the OS can also cause your thread to be pre-empted (i.e. interrupted, with the rest of the time interval given to another thread).
There is an article that might be of interest on the topic of Thread.Sleep(x) at:
Priority-induced starvation: Why Sleep(1) is better than Sleep(0) and the Windows balance set manager
Changing the timer's resolution can be done by any process on the system, and the effect is seen globally. See this article on how the Hotspot Java compiler deals with times on windows, specifically:
Note that any application can change the timer interrupt and that it affects the whole system. Windows only allows the period to be shortened, thus ensuring that the shortest requested period by all applications is the one that is used. If a process doesn't reset the period then Windows takes care of it when the process terminates. The reason why the VM doesn't just arbitrarily change the interrupt rate when it starts - it could do this - is that there is a potential performance impact to everything on the system due to the 10x increase in interrupts. However other applications do change it, typically multi-media viewers/players.
The biggest thing Sleep(1) does is give up the rest of your thread's quantum. How much that is depends entirely upon how much of your thread's quantum remains when you call Sleep.
To aggregate what was said before:
CPU time is assigned in quantums (time slices)
The thread scheduler picks a thread to run. That thread may run for its entire time slice unless it blocks, yields, or a higher-priority thread becomes ready and pre-empts it.
Typical time slices are 8..15 ms, depending on the architecture.
The thread can "give up" its time slice, typically via Sleep(0) or Sleep(1). Sleep(0) allows another thread of the same or higher priority to run for the next time slice; Sleep(1) allows "any" thread.
The timer resolution, and with it the effective time slice, is global and can be affected by any process.
Even if you don't change it, someone else could.
Even if it doesn't change, your measured wait may "jump" between the two different durations (e.g. ~2 ms and ~15 ms), as observed in the question.
For simplicity, assume a single core, your thread, and another thread X.
If thread X runs at the same priority as yours, crunching numbers, your Sleep(1) will take an entire time slice, 15 ms being typical on client systems.
If thread X runs at a lower priority and gives up its own time slice after 4 ms, your Sleep(1) will take 4 ms.
I would say it just depends on how loaded the CPU is; if there aren't many other processes/threads, control could get back to the calling thread a lot faster.