I am writing a profiler for which I want to know the total time since the thread was created as well as the total CPU time used by that thread for a certain calculation.
By total time I mean: the time including the time a thread spends being blocked for whatever reason, so total time elapsed since something like thread my_thread(function).
getrusage gives me the CPU time for the calling thread, which is good; however, I also want to know the total time that has passed since the thread was created. I couldn't find any C++ library for this at all.
I can take a timestamp when the thread is created/spawned by instrumenting the program and inserting a simple timestamping call such as the chrono functions, and then take another timestamp when I do the calculation; their difference is the time I want. However, even after some searching I couldn't figure out how to detect the thread entry/spawn point using an LLVM pass.
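The manual timestamping I describe could be sketched like this with std::chrono (the function names are just illustrative; the instrumentation pass would insert the calls):

```cpp
#include <chrono>

using clock_type = std::chrono::steady_clock;

// Hypothetical instrumentation hook: would be inserted at the thread's
// entry point to record when the thread started running.
inline clock_type::time_point thread_entry_timestamp() {
    return clock_type::now();
}

// Wall-clock nanoseconds elapsed since the recorded entry timestamp;
// would be called at the point of the measured calculation.
inline long long elapsed_ns(clock_type::time_point start) {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               clock_type::now() - start).count();
}
```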
Any suggestions on how to detect thread entry/spawn point in LLVM pass?
Best regards!
I guess you need to search for CallInsts that call pthread_create, then analyze their arguments to find out which callback function is passed in.
To make sure you catch all thread-creation calls, you'd need to research how threads are created on your platform. At the lowest level, thread creation requires a syscall (well, in most cases).
For instance, FreeBSD does have a pthread_create function, but it is purely userland and delegates thread creation to the thr_new syscall. Some programs (the Rust language runtime, IIRC) may invoke that syscall directly, bypassing pthread_create, but these are pretty rare. So, if you really want to make sure you catch every thread creation, you'd need to search for CallInsts to these low-level functions as well.
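In a real pass you'd iterate over instructions, check CallInst::getCalledFunction()->getName() against "pthread_create", and take the third operand (the start routine). As a self-contained stand-in that needs no LLVM headers, here is the same extraction applied to textual IR — a simplification, since real code would use the LLVM C++ API rather than string matching:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Find the start-routine symbols passed to pthread_create in textual IR.
// In IR the start routine is the third argument, written as @name.
std::vector<std::string> find_thread_entry_points(const std::string& ir) {
    std::vector<std::string> entries;
    std::istringstream lines(ir);
    std::string line;
    while (std::getline(lines, line)) {
        auto call = line.find("@pthread_create(");
        if (call == std::string::npos) continue;
        // Skip the first two arguments (thread handle, attributes)...
        auto pos = line.find(',', call);
        pos = line.find(',', pos + 1);
        // ...then grab the @symbol that names the start routine.
        auto at = line.find('@', pos);
        auto end = line.find_first_of(",)", at);
        entries.push_back(line.substr(at + 1, end - at - 1));
    }
    return entries;
}
```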
Related
I want to test my idea, wherein I execute some code in the context of another process at some interval. What API call, kernel functionality, or technique should I look into to execute code in another process at some interval?
Seems like I need to halt the process and modify the instruction pointer value before continuing it, if that’s remotely possible. Alternatively, I could hook into the kernel code which schedules time on the CPU for each process, and run the code each time the next time slot happens for a process. But PatchGuard probably prevents that.
This time interval doesn’t need to be precise.
The wording of the question tells me you're fairly new to programming. A remote process doesn't have AN instruction pointer; it typically has many, one per executing thread. That's why the normal approach is to not mess with any of those instruction pointers. Instead, you create a new thread in the remote process with CreateRemoteThreadEx.
Since this thread is under your control, it can just run an infinite loop alternating between Sleep and the function you want to call.
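The loop that injected thread would run could look like this; sketched here portably with std::chrono sleeps and an explicit stop flag standing in for the Win32 Sleep call (the names are illustrative):

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Runs `work` repeatedly, sleeping `interval` between calls, until `stop`
// is set -- the same shape as the injected remote thread's payload, with
// Win32 Sleep() replaced by std::this_thread::sleep_for().
void periodic_runner(std::function<void()> work,
                     std::chrono::milliseconds interval,
                     std::atomic<bool>& stop) {
    while (!stop.load()) {
        work();
        std::this_thread::sleep_for(interval);
    }
}
```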
I am trying to get, programmatically, the total time a particular thread has spent so far.
getrusage returns a thread's CPU time but I want the total time i.e. including the time spent by the thread being blocked for whatever reason.
Please note that I will be making use of this functionality by instrumenting a given program using a profiler that I wrote.
A program may have many threads (I am focusing on profiling servers, so there can be many). At any given time I would want to know how much time a particular thread has spent (so far). So it's not convenient to start a timer for every thread as it is spawned. I would want something with usage similar to getrusage, e.g. it returns the total time of the current thread, or maybe I can pass it a thread ID. Manual mechanisms like taking a timestamp when the thread is spawned, taking another one later, and computing their difference won't be very helpful for me.
Can anyone suggest how to do this?
Thanks!
Save the current time at the point when the thread is started. The total time spent by the thread, counting both running and blocked time, is then just:
current_time - start_time
Of course this is almost always useless/meaningless, which is why there's no dedicated API for it.
Depending on what you want to use this for, one possibility to think about is to sum the number of clock ticks consumed during blocking, which typically is slow enough to hide a little overhead like that. From that sum and the surrounding thread interval you also measure, you can compute the real-time load on your thread over that interval. Of course, time-slicing with other processes will throw this off by some amount, and capturing all blocking may be very easy or very hard, depending on your situation.
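The current_time - start_time subtraction, plus the thread's actual CPU time for comparison, can be sketched with POSIX clock_gettime (assuming the start timespec was captured when the thread began):

```cpp
#include <time.h>

// Wall-clock seconds elapsed since `start` (captured at thread start),
// using the monotonic clock so system clock changes don't matter.
double wall_seconds_since(const timespec& start) {
    timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start.tv_sec) + (now.tv_nsec - start.tv_nsec) / 1e9;
}

// CPU seconds the calling thread has actually executed.
double thread_cpu_seconds() {
    timespec t;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t);
    return t.tv_sec + t.tv_nsec / 1e9;
}
// blocked time over the interval == wall_seconds_since(start) - thread_cpu_seconds()
```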
So I am writing a profiler, and the time the threads were not executing is relevant to my calculation. However, all the libraries I know provide the time for which the threads were executing, e.g. getrusage.
getrusage returns the time the threads were executing, so if I can somehow get the time elapsed since the thread was made (or the time the thread was made at), in principle, I can subtract getrusage's time from it and that will be the blocking time?
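Yes, in principle: that subtraction could look like the sketch below, assuming Linux's RUSAGE_THREAD extension to getrusage, with the wall time since creation supplied by the instrumentation:

```cpp
#include <sys/resource.h>

// CPU seconds (user + system) used by the calling thread, per getrusage.
// RUSAGE_THREAD is a Linux-specific extension.
double thread_cpu_from_getrusage() {
    rusage ru;
    getrusage(RUSAGE_THREAD, &ru);
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6 +
           ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

// Blocked time = wall time since the thread was created minus its CPU time.
double blocked_seconds(double wall_since_creation) {
    return wall_since_creation - thread_cpu_from_getrusage();
}
```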
I instrument the code to know my metrics through an LLVM pass.
While profiling my code to find out what is going slow, I have 3 functions that are apparently taking forever; at least that's what Very Sleepy says.
These functions are:
ZwDelayExecution 20.460813 20.460813 19.987685 19.987685
MsgWaitForMultipleObjects 20.460813 20.460813 19.987685 19.987685
WaitForSingleObject 20.361805 20.361805 19.890967 19.890967
Can anybody tell me what these functions are? Why they are taking so long, and how to fix them.
Thanks
These functions are probably used to put a thread to sleep in the Win32 API. They might also be used for thread synchronization, so check for that.
They are taking so much time because they are designed to wait.
The WaitForSingleObject function can wait for the following objects:
Change notification
Console input
Event
Memory resource notification
Mutex
Process
Semaphore
Thread
Waitable timer
So the other possible thing it can be used for is waiting for console user input.
ZwDelayExecution is an internal Windows function. As can be seen, it is used to implement the Sleep function. Here is the call stack for Sleep so you can see it with your own eyes:
0 ntdll.dll ZwDelayExecution
1 kernel32.dll SleepEx
2 kernel32.dll Sleep
It probably uses low-level assembly features to achieve this, so it can delay a thread with a precision of 100 ns.
MsgWaitForMultipleObjects has a goal similar to that of WaitForSingleObject.
Judging by the names, all 3 functions seem to block, so they take a long time because they are designed to do so; however, they shouldn't use any CPU while waiting.
One of the first steps should always be to check the documentation:
WaitForSingleObject:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms687032.aspx
Waits for an object like a thread, process, or mutex.
MsgWaitForMultipleObjects:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms684242.aspx
Simply waits for multiple objects, just like WaitForSingleObject.
ZwDelayExecution:
There doesn't seem to be documentation for ZwDelayExecution, but I think it is an internal method that gets called when you call Sleep.
Anyway, the names already reveal part of it: "Wait" and "Delay" functions are supposed to take time. If you want to reduce the waiting time, you have to find out what is calling these functions.
To give you an example:
If you start a new thread and then wait for it to finish in your main thread, you will call WaitForSingleObject one way or another in WINAPI programming. It doesn't even have to be you who starts the thread; it could be the runtime itself. The function will wait until the thread finishes. Therefore it will take time and block the program in WaitForSingleObject until the thread is done or a timeout occurs. This is nothing bad; this is intended behaviour.
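The same pattern in portable C++, where std::thread::join blocks the caller much like WaitForSingleObject on a thread handle does:

```cpp
#include <thread>

// The time spent inside join() would show up in a profiler as blocking,
// just like WaitForSingleObject on the thread's handle in WINAPI code.
int run_and_wait() {
    int result = 0;
    std::thread worker([&result] { result = 42; });  // some background work
    worker.join();  // blocks until the worker finishes -- intended behaviour
    return result;
}
```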
Before you start zooming in on these functions, you might first want to determine what kind of slowness your program is suffering from. It is pretty normal for a Windows program to have one or more threads spending most of their time in blocking functions.
You would first need to determine whether your actual critical thread is CPU bound. If it is, you don't want to zoom in on the functions that take a lot of wall-clock time; you want to find the functions that take CPU time.
I don't have much experience with Very Sleepy, but IIRC it is a sampling profiler, and those are typically not so good at measuring CPU usage.
Only after you've determined that your program is not CPU bound should you zoom in on the functions that wait a lot.
How can I measure time required to create and launch thread?
(Linux, pthreads or boost::thread).
Thanks for advices!
You should probably specify what exactly you want to measure, since there are at least 2 possible interpretations (the time the original thread is "busy" inside pthread_create versus the time from calling pthread_create until the other thread actually executes its first instruction).
In either case, you can query monotonic real time using clock_gettime with CLOCK_MONOTONIC: take one reading before the call to pthread_create, and another either right after the call or as the first thing inside the thread function. Then subtract the first value from the second.
To know what time is spent inside pthread_create itself, CLOCK_THREAD_CPUTIME_ID is an alternative, as this only counts the CPU time your calling thread actually used.
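The first interpretation (time the caller spends inside pthread_create) can be sketched like this; for the second interpretation you would take the closing timestamp inside the thread function instead:

```cpp
#include <pthread.h>
#include <time.h>

static void* thread_fn(void*) { return nullptr; }  // trivial thread body

// Nanoseconds the calling thread spends inside pthread_create itself,
// measured with the monotonic clock.
long long creation_cost_ns() {
    timespec before, after;
    clock_gettime(CLOCK_MONOTONIC, &before);
    pthread_t t;
    pthread_create(&t, nullptr, thread_fn, nullptr);
    clock_gettime(CLOCK_MONOTONIC, &after);
    pthread_join(t, nullptr);  // clean up; not part of the measurement
    return (after.tv_sec - before.tv_sec) * 1000000000LL +
           (after.tv_nsec - before.tv_nsec);
}
```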
Altogether, it's a bit pointless to measure this kind of thing, however. It tells you little to nothing about how it will behave under real conditions on your system or another system, with unknown processes and unknown scheduling strategies and priorities.
On another machine, or on another day, your thread might just be scheduled 100 or 200 milliseconds later. If you depend on the fact that this won't happen, you're dead.
EDIT:
Regarding the info added in the comment above: if you need to "perform actions on a non-regular basis" at a scale that is well within normal scheduling quanta, you can just create a single thread and nanosleep for 15 or 30 milliseconds. Of course, sleep is not terribly accurate or reliable, so you might instead want to e.g. block on a timerfd (if portability is not the topmost priority; otherwise use a signal-delivering timer).
It's no big problem to schedule irregular intervals with a single timer/wait either; you only need to keep track of when the next event is due. This is how modern operating systems do it too (read up on "timer coalescing").
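A minimal timerfd sketch (Linux-specific): arm a periodic timer, then block on the fd; each read tells you how many intervals expired since the last wakeup. For irregular intervals you would re-arm with timerfd_settime before each wait instead of setting it_interval.

```cpp
#include <sys/timerfd.h>
#include <unistd.h>
#include <cstdint>

// Create a timer fd that first fires after `ms` and then every `ms`.
int make_periodic_timer(long ms) {
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    itimerspec spec{};
    spec.it_value.tv_sec = ms / 1000;
    spec.it_value.tv_nsec = (ms % 1000) * 1000000L;
    spec.it_interval = spec.it_value;  // periodic; zero would mean one-shot
    timerfd_settime(fd, 0, &spec, nullptr);
    return fd;
}

// Block until the next expiration; returns how many intervals elapsed.
uint64_t wait_for_tick(int fd) {
    uint64_t expirations = 0;
    read(fd, &expirations, sizeof expirations);  // blocks until the timer fires
    return expirations;
}
```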