High-precision timed operations with a multiprocess application on Windows/C++

I have multiple processes (each a separate exe file built from a subproject) created by my main program.
What I want to do is run each process for about 1-2 milliseconds within every 40-50 millisecond major frame. When I use SuspendThread/ResumeThread to suspend one process (by suspending all of its threads, though each has only one) and resume the next, a single context switch (suspend the old one, resume the new one) takes about 60 milliseconds, which is longer than my entire major frame. By the way, I know that using Sleep is not advised here, since a single sleep/wake operation takes 15-30 ms, and I don't use it.
If I lower the priority of the running process and raise the priority of the next one, is Windows guaranteed to perform the context switch within microseconds?
Or what should I consider to achieve a process switch with microsecond precision?
I also wonder how long a simple SuspendThread/ResumeThread operation normally takes.
Currently I can't use threads instead of processes, since I need the memory isolation of a process and my processes may spawn and terminate their own threads. Do synchronization methods such as wait handles give me this high-precision timing?
Edit: The proposed sync objects have at best millisecond resolution (waitable timers, multimedia timers, etc. all take their parameter in ms and give you ms). I need to use QueryPerformanceCounter and other means to achieve high resolution, as I mentioned.

As Remy says, you should be doing this with synchronisation objects - that's what they're for. Let's suppose that process A executes first and wants to 'hand over' to process B at some point. It can then do this:
SECURITY_ATTRIBUTES sa = { sizeof (SECURITY_ATTRIBUTES), NULL, TRUE };
// Auto-reset events (bManualReset = FALSE), so each hand-off is consumed by exactly one wait
HANDLE hHandOffToA = CreateEventW (&sa, FALSE, FALSE, L"HandOffToA");
HANDLE hHandOffToB = CreateEventW (&sa, FALSE, FALSE, L"HandOffToB");
// Start process B
CreateProcess (...);
while (!quit)
{
// Do work, and then:
SetEvent (hHandOffToB);
WaitForSingleObject (hHandOffToA, INFINITE);
}
CloseHandle (hHandOffToA);
CloseHandle (hHandOffToB);
And process B can then do:
// Object names are case-sensitive, so they must match the names used by process A exactly
HANDLE hHandOffToA = OpenEventW (EVENT_MODIFY_STATE, FALSE, L"HandOffToA");
HANDLE hHandOffToB = OpenEventW (SYNCHRONIZE, FALSE, L"HandOffToB");
while (!quit)
{
WaitForSingleObject (hHandOffToB, INFINITE);
// Do work, and then:
SetEvent (hHandOffToA);
}
CloseHandle (hHandOffToA);
CloseHandle (hHandOffToB);
You should, of course, include proper error checking and I've left it up to you to decide how process A should tell process B to shut down (I guess it could just kill it). Remember also that event names are system-wide so choose them more carefully than I have done.

For very high precision one can use the function below:
// Returns the current QueryPerformanceCounter reading converted to nanoseconds.
void get_clock(LONGLONG* SYSTEM_TIME)
{
static double multiplier = 1.0; // nanoseconds per counter tick
static BOOL alreadyCalculated = FALSE;
if (!alreadyCalculated)
{
LARGE_INTEGER frequency;
if (QueryPerformanceFrequency(&frequency))
{
multiplier = 1000000000.0 / frequency.QuadPart;
}
// On failure the multiplier stays at 1.0 and raw ticks are returned.
alreadyCalculated = TRUE;
}
LARGE_INTEGER time;
QueryPerformanceCounter(&time);
*SYSTEM_TIME = static_cast<LONGLONG>(time.QuadPart * multiplier);
}
In my case sync objects didn't fit very well (though I have used them where timing is not critical). Instead, I redesigned my logic to put placeholders where my thread needs to take action, and calculated the time using the function above.
But I am still not sure how long it takes Windows to schedule a higher-priority task onto the CPU and preempt the running one when it arrives.
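One detail about the conversion: get_clock above multiplies the counter by a floating-point factor, and a double only holds 53 bits of integer precision, so very large tick counts can lose their low bits. A minimal sketch of an integer-only conversion is below. The helper name ticks_to_ns is made up for illustration, and it assumes the tick count and frequency have already been read from QueryPerformanceCounter and QueryPerformanceFrequency.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of any Windows API): converts a raw
// performance-counter reading to nanoseconds using integer arithmetic only.
// Splitting into whole seconds and a remainder keeps the intermediate
// products small enough to fit in 64 bits for typical counter frequencies.
inline std::int64_t ticks_to_ns(std::int64_t ticks, std::int64_t frequency)
{
    assert(frequency > 0);
    const std::int64_t seconds = ticks / frequency;     // whole seconds
    const std::int64_t remainder = ticks % frequency;   // leftover ticks
    return seconds * 1000000000LL + (remainder * 1000000000LL) / frequency;
}
```

With a 10 MHz counter, for example, one tick maps to 100 ns and 10,000,000 ticks map to exactly one second.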

Related

Light event in WinAPI / C++

Is there some light (thus fast) event in WinAPI / C++ ? Particularly, I'm interested in minimizing the time spent on waiting for the event (like WaitForSingleObject()) when the event is set. Here is a code example to clarify further what I mean:
#include <Windows.h>
#include <chrono>
#include <stdio.h>
int main()
{
const int64_t nIterations = 10 * 1000 * 1000;
HANDLE hEvent = CreateEvent(nullptr, true, true, nullptr);
auto start = std::chrono::high_resolution_clock::now();
for (int64_t i = 0; i < nIterations; i++) {
WaitForSingleObject(hEvent, INFINITE);
}
auto elapsed = std::chrono::high_resolution_clock::now() - start;
double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
printf("%.3lf Ops/sec\n", nIterations / nSec);
return 0;
}
On 3.85GHz Ryzen 1800X I'm getting 7209623.405 operations per second, meaning 534 CPU clocks (or 138.7 nanoseconds) are spent on average for a check whether the event is set.
However, I want to use the event in performance-critical code where most of the time the event is actually set, so it's just a check for a special case and in that case the control flow goes to code which is not performance-critical (because this situation is seldom).
WinAPI events which I know (created with CreateEvent) are heavy-weight because of security attributes and names. They are intended for inter-process communication. Perhaps WaitForSingleObject() is so slow because it switches from user to kernel mode and back, even when the event is set. Furthermore, this function has to behave differently for manual- and auto-reset events, and a check for the type of the event takes time too.
I know that a fast user-mode mutex (spin lock) can be implemented with atomic_flag . Its spinning loop can be extended with a std::this_thread::yield() in order to let other threads run while spinning.
With the event I wouldn't like a complete equivalent of a spin lock, because when the event is not set, it may take substantial time until it becomes set again. If every thread that needs the event set starts spinning until it's set again, that would be an epic waste of CPU electricity (though it shouldn't affect system performance if they call std::this_thread::yield).
So I would rather like an analogy of a critical section, which usually just does the work in user mode and when it realizes it needs to wait (out of spins), it switches to kernel mode and waits on a heavy synchronization object like a mutex.
UPDATE1: I've found that .NET has ManualResetEventSlim , but couldn't find an equivalent in WinAPI / C++.
UPDATE2: because there were details of event usage requested, here they are. I'm implementing a knowledge base that can be switched between regular and maintenance mode. Some operations are maintenance-only, some operations are regular-only, some can work in both modes, but of them some are faster in maintenance and some are faster in regular mode. Upon its start each operation needs to know whether it is in maintenance or regular mode, as the logic changes (or the operation refuses to execute at all). From time to time user can request a switch between maintenance and regular mode. This is rare. When this request arrives, no new operations in the old mode can start (a request to do so fails) and the app waits for the current operations in the old mode to finish, then it switches mode. So light event is a part of this data structure: the operations except mode switching have to be fast, so they need to set/reset/wait event quickly.
Starting with Windows 8, the best solution for you is to use WaitOnAddress (in place of WaitForSingleObject), WakeByAddressAll (which works like SetEvent on a NotificationEvent) and WakeByAddressSingle (which works like SetEvent on a SynchronizationEvent). For more, read "WaitOnAddress lets you create a synchronization object".
An implementation could look like this:
class LightEvent
{
BOOLEAN _Signaled;
public:
LightEvent(BOOLEAN Signaled)
{
_Signaled = Signaled;
}
void Reset()
{
_Signaled = FALSE;
}
void Set(BOOLEAN bWakeAll)
{
_Signaled = TRUE;
(bWakeAll ? WakeByAddressAll : WakeByAddressSingle)(&_Signaled);
}
BOOL Wait(DWORD dwMilliseconds = INFINITE)
{
BOOLEAN Signaled = FALSE;
while (!_Signaled)
{
if (!WaitOnAddress(&_Signaled, &Signaled, sizeof(BOOLEAN), dwMilliseconds))
{
return FALSE;
}
}
return TRUE;
}
};
Don't forget to add Synchronization.lib to the linker input.
The code behind this new API is very efficient: it does not create internal kernel objects (like an event) for the wait, but uses the new ZwAlertThreadByThreadId and ZwWaitForAlertByThreadId APIs, which were designed specifically for this purpose.
How would you implement this yourself before Windows 8? At first glance it looks trivial: a boolean variable plus an event handle. It would look like this:
void Set()
{
SetEvent(_hEvent);
// Sleep(1000); // simulate the thread being interrupted here
_Signaled = true;
}
void Reset()
{
_Signaled = false;
// Sleep(1000); // simulate the thread being interrupted here
ResetEvent(_hEvent);
}
void Wait(DWORD dwMilliseconds = INFINITE)
{
if(!_Signaled) WaitForSingleObject(_hEvent, dwMilliseconds);
}
But this code is actually incorrect. The problem is that Set (and Reset) performs two operations: it changes the state of _Signaled and of _hEvent. There is no way to do this from user mode as a single atomic/interlocked operation, which means a thread can be interrupted between the two operations. Assume two different threads call Set and Reset concurrently. In most cases the operations will execute in an order like this:
SetEvent(_hEvent);
_Signaled = true;
_Signaled = false;
ResetEvent(_hEvent);
Here everything is fine. But the following order is also possible (uncomment one of the Sleep calls to test this):
SetEvent(_hEvent);
_Signaled = false;
ResetEvent(_hEvent);
_Signaled = true;
As a result, _hEvent will be in the reset state while _Signaled is true.
Implementing this atomically yourself, without OS support, is not simple, though it is possible. But I would first look at the usage: what is this for? Is event-like behavior exactly what you need for this task?
The other answer is very good if you can drop support of Windows 7.
However on Win7, if you set/reset the event many times from multiple threads, but only need to sleep rarely, the proposed method is quite slow.
Instead, I use a boolean guarded by a critical section, with condition variable to wake / sleep.
The wait method will go to the kernel for sleep on SleepConditionVariableCS API, that’s expected and what you want.
However, the set and reset methods will work entirely in user mode: setting a single boolean variable is very fast, i.e. in 99% of cases the critical section will do its user-mode lock-free magic.
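A minimal sketch of that idea follows. To keep it portable it is written with std::mutex and std::condition_variable rather than CRITICAL_SECTION and SleepConditionVariableCS, so treat it as an analogue of the approach and not as the Win32 code itself. The class name LightEventCv and its method names are made up for illustration, and the event behaves like a manual-reset event (Set wakes every waiter).

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical manual-reset style event: a boolean guarded by a mutex,
// with a condition variable used only when a waiter actually has to sleep.
class LightEventCv
{
public:
    explicit LightEventCv(bool signaled = false) : signaled_(signaled) {}

    // Marks the event as set and wakes all current waiters.
    void Set()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_all();
    }

    // Marks the event as not set; later Wait calls will block.
    void Reset()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        signaled_ = false;
    }

    // Returns true if the event is (or becomes) set within the timeout.
    // When the event is already set, no sleep takes place.
    bool Wait(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return signaled_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_;
};
```

Because the flag and the wake-up are both handled under the same lock, the Set/Reset ordering problem shown for the boolean-plus-event version does not arise here.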

Does a while loop always take full CPU usage?

I need to create a server-side game loop; the problem is how to limit the loop's CPU usage.
In my programming experience, a busy loop always takes as much CPU as it can. But I am reading the code of SDL (Simple DirectMedia Layer): it has a function SDL_Delay(UINT32 ms) that contains a while loop. Does it take maximum CPU usage, and if not, why?
https://github.com/eddieringle/SDL/blob/master/src/timer/unix/SDL_systimer.c#L137-158
do {
errno = 0;
#if HAVE_NANOSLEEP
tv.tv_sec = elapsed.tv_sec;
tv.tv_nsec = elapsed.tv_nsec;
was_error = nanosleep(&tv, &elapsed);
#else
/* Calculate the time interval left (in case of interrupt) */
now = SDL_GetTicks();
elapsed = (now - then);
then = now;
if (elapsed >= ms) {
break;
}
ms -= elapsed;
tv.tv_sec = ms / 1000;
tv.tv_usec = (ms % 1000) * 1000;
was_error = select(0, NULL, NULL, NULL, &tv);
#endif /* HAVE_NANOSLEEP */
} while (was_error && (errno == EINTR));
This code uses select for a timeout. select usually takes sets of file descriptors and makes the caller wait until an I/O event occurs on one of them. It also takes a timeout argument for the maximum time to wait. Here nfds is 0 and all three descriptor sets are NULL, so no events can occur, and the function returns when the timeout is reached (or earlier if a signal interrupts it, which is what the EINTR check handles).
The select(3) that you get from the C library is a wrapper around the select(2) system call, which means calling select(3) eventually gets you in the kernel. The kernel then doesn't schedule the process unless an IO event occurs, or the timeout is reached. So the process is not using the CPU while waiting.
Obviously, the jump into the kernel and process scheduling introduce delays. So if you must have very low latency (nanoseconds) you should use busy waiting.
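A common middle ground between the two is to sleep for most of the interval and busy-wait only for the last stretch. Below is a rough sketch of that idea using the standard library; the function name precise_wait and the 2 ms default margin are assumptions chosen for illustration, and the right margin depends on how far your platform's sleep tends to overshoot.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: blocks for at least `total`. The bulk of the interval
// is spent in a real sleep (no CPU use); only the final `spin_margin` is
// spent busy-waiting so the wake-up lands close to the deadline.
inline void precise_wait(std::chrono::nanoseconds total,
                         std::chrono::nanoseconds spin_margin = std::chrono::milliseconds(2))
{
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + total;

    if (total > spin_margin)
    {
        // Let the scheduler run other work for most of the interval.
        std::this_thread::sleep_for(total - spin_margin);
    }

    // Busy-wait for whatever time is left. If the sleep overshot the
    // deadline, this loop exits immediately.
    while (clock::now() < deadline)
    {
    }
}
```

The spinning part still burns a core for up to spin_margin per call, so this only makes sense when the extra accuracy is worth that cost.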
That loop won't take up all the CPU. It uses one of two different functions to tell the operating system to pause the thread for a given amount of time and let another thread use the CPU:
// First function call - if HAVE_NANOSLEEP is defined.
was_error = nanosleep(&tv, &elapsed);
// Second function call - fallback without nanosleep.
was_error = select(0, NULL, NULL, NULL, &tv);
While the thread is blocked in SDL_Delay, it yields the CPU to other tasks. If the delay is long enough, the operating system will even put the CPU in an idle or halt mode if there is no other work to do. Note that this won't work well if the delay time isn't at least 20 milliseconds or so.
However, this is usually not the right way to do whatever it is you are trying to do. What is your outer problem? Why doesn't your game loop ever finish doing whatever needs to be done at this time and so then need to wait for something to happen so that it has more work to do? How can it always have an infinite amount of work to do immediately?

Scheduler using Timer Queues

I am working on an application where I need to schedule tasks based on the time set by the user. The user may add/modify/delete the schedules. To implement it I am considering using timer queues. Initially I thought of using waitable timers, which suit my purpose very well, but I can't make my thread sleep to complete the APC.
Now with the timer queue I am not sure how to set the timer to signal based on system time. I tried the following code, but the callback function is never called:
SYSTEMTIME st, lt;
GetSystemTime(&st);
FILETIME ft;
SystemTimeToFileTime(&st, &ft);
ULONGLONG qwResult;
// Copy the time into a quadword.
qwResult = (((ULONGLONG) ft.dwHighDateTime) << 32) + ft.dwLowDateTime;
// Add 20 seconds.
qwResult += 20 * _SECOND;
HANDLE hTimerQueue = CreateTimerQueue();
HANDLE hTimer;
// Set a timer to call the timer routine in 20 seconds.
if (!CreateTimerQueueTimer( &hTimer, hTimerQueue ,(WAITORTIMERCALLBACK)TimerAPCProc, NULL , qwResult, 0, 0))
{
printf("CreateTimerQueueTimer failed (%d)\n", GetLastError());
return 3;
}
The callback routine will be called in qwResult milliseconds, and a FILETIME gives you the time in 100-nanosecond units. You do the math. GetSystemTimeAsFileTime will give you a FILETIME directly, if that is the path you want to take.
Personally, I would keep a list of structures holding the times when the routines should be called and pointers to those routines, iterate through the list once in a while, and if the time of execution is due just call the function (or create a thread). That way your users can always review the scheduled tasks and change them.
It needs to be backed by WaitForSingleObject, or by putting the thread into an alertable state (using SleepEx, for example).
You're passing in an absolute time, but the docs say you need to pass in the number of milliseconds from the current time.
If you want the timer to go off in 20 seconds, pass 20000 instead of qwResult.
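If the schedule really is stored as an absolute time, the conversion to a relative delay is just a subtraction and a division by 10,000 (the number of 100 ns units in a millisecond). Here is a small sketch of that arithmetic; the helper name delay_ms_until is made up, and it assumes both times are already packed into 64-bit FILETIME values the way the question does with qwResult.

```cpp
#include <cstdint>

// Number of 100-nanosecond FILETIME units in one millisecond.
constexpr std::uint64_t kFiletimeUnitsPerMs = 10000;

// Hypothetical helper: given "now" and a due time, both as 64-bit FILETIME
// values, returns the relative delay in milliseconds. Times in the past
// yield 0, and the result is clamped to what fits in a 32-bit DWORD.
inline std::uint32_t delay_ms_until(std::uint64_t now_ft, std::uint64_t due_ft)
{
    if (due_ft <= now_ft)
    {
        return 0;
    }
    const std::uint64_t ms = (due_ft - now_ft) / kFiletimeUnitsPerMs;
    return ms > UINT32_MAX ? UINT32_MAX : static_cast<std::uint32_t>(ms);
}
```

The returned value is what would go in the DueTime argument of CreateTimerQueueTimer in place of the absolute qwResult.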

What is the cleanest way to create a timeout for a while loop?

Windows API/C/C++
1. ....
2. ....
3. ....
4. while (flag1 != flag2)
5. {
6. SleepEx(100,FALSE);
//waiting for flags to be equal (flags are set from another thread).
7. }
8. .....
9. .....
If the flags don't equal each other after 7 seconds, I would like to continue to line 8.
Any help is appreciated. Thanks.
If you are waiting for a particular flag to be set or a time to be reached, a much cleaner solution may be to use an auto / manual reset event. These are designed for signalling conditions between threads and have very rich APIs designed on top of them. For instance you could use the WaitForMultipleObjects API which takes an explicit timeout value.
Do not poll for the flags to change. Even with a sleep or yield during the loop, this just wastes CPU cycles.
Instead, get the thread which sets the flags to signal you that they've been changed, probably using an event. Your wait on the event takes a timeout, which you can tweak to allow waiting of 7 seconds total.
For example:
Thread1:
flag1 = foo;
SetEvent(hEvent);
Thread2:
DWORD timeOutTotal = 7000; // 7 second timeout to start.
while (flag1 != flag2 && timeOutTotal > 0)
{
// Wait for flags to change
DWORD start = GetTickCount();
WaitForSingleObject(hEvent, timeOutTotal);
DWORD end = GetTickCount();
// Don't let timeOutTotal accidentally wrap below 0 (it is unsigned).
if ((end - start) > timeOutTotal)
{
timeOutTotal = 0;
}
else
{
timeOutTotal -= (end - start);
}
}
You can use QueryPerformanceCounter from WinAPI. Read it before the while loop starts, then query it inside the loop to check whether the required amount of time has passed. However, this is a high-resolution timer. For lower resolution use GetTickCount (milliseconds).
All depends whether you are actively waiting (doing something) or passively waiting for an external process. If the latter, then the following code using Sleep will be a lot easier:
int count = 0;
while ( flag1 != flag2 && count < 700 )
{
Sleep( 10 ); // wait 10ms
++count;
}
If you don't use Sleep (or a yield) and your app is constantly checking a condition, then you'll hog the CPU the app is running on.
If you use WinAPI extensively, you should try out a more native solution, read about WinAPI's Synchronization Functions.
You failed to mention what will happen if the flags are equal.
Also, if you just test them with no memory barriers then you cannot guarantee to see any writes made by the other thread.
Your best bet is to use an Event, and use the WaitForSingleObject function with a 7000 millisecond time out.
Make sure you do a sleep() or yield() in there, or you will eat up the entire CPU (or core) while waiting.
If your application does some networking stuff, have a look at the POSIX select() call, especially the timeout functionality!
I would say: check the time, and if nothing has happened after seven seconds, break out of the loop.
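For the polling variant, here is a sketch that checks a deadline instead of counting iterations, so the total wait does not drift with how long each sleep really takes. It uses std::atomic for the flags, which also takes care of the visibility concern raised above. The function name wait_until_equal and the 10 ms default poll interval are assumptions for illustration, and an event with a timeout remains the cleaner option when the setting thread can signal.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper: polls two atomic flags until they are equal or the
// timeout expires. Returns true if they became equal, false on timeout.
inline bool wait_until_equal(const std::atomic<int>& flag1,
                             const std::atomic<int>& flag2,
                             std::chrono::milliseconds timeout,
                             std::chrono::milliseconds poll = std::chrono::milliseconds(10))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;

    while (flag1.load(std::memory_order_acquire) != flag2.load(std::memory_order_acquire))
    {
        if (std::chrono::steady_clock::now() >= deadline)
        {
            return false; // timed out; the caller continues with the next step
        }
        std::this_thread::sleep_for(poll); // give up the CPU between checks
    }
    return true;
}
```

For the question as asked, the call would be wait_until_equal(flag1, flag2, std::chrono::seconds(7)), with execution continuing at line 8 whatever the result.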

When and why sleep() is needed?

cout<<"abcd";
sleep(100);
cout<<'\b';
If I want to print the string and then go back one character,
why is a sleep() needed here?
But when using printf in C, it seems that it is not necessary. Why?
char* a = "12345";
char* b = "67890";
threadA(){cout<<a;}
threadB(){cout<<b;}
beginthread (threadA);
sleep(100);
beginthread (threadB);
In the second pseudocode above, is it right to use sleep()?
For calculating tomorrow date:
void get_tomorrow_date( struct timeval *date )
{
sleep( 86400 ); // 60 * 60 * 24
gettimeofday( date, 0 );
}
;)
There are two subtle issues that you need to understand:
Multi-threading
I/O and Buffering
I'll try to give you some idea:
Multi-threading and sleep
Having a sleep in a threaded environment makes sense. The sleep call makes you wait thereby giving the initial thread some scope to have completed its processing i.e. writing out the string abcd to the standard output before the other thread inserts the backspace character. If you didn't wait for the first thread to complete its processing, you'd have written the backspace character first, and then the string abcd and wouldn't notice any difference.
Buffered I/O
I/O typically happens in buffered, non-buffered and semi-buffered states. This can influence how long, if at all, you have to wait for the output to appear on the console.
Your implementation of cout is probably using a buffered model. Try adding a newline or the endl at the end of your cout statements to print a new line and have it flush immediately, or use cout << "abcd" << flush; to flush without printing a new line.
In the second case without the sleep there's a slim chance that the second thread could start working before the first, resulting in the output "6789012345".
However a "sleep" isn't really the way to handle synchronisation between threads. You'd normally use a semaphore or similar in threadA() which threadB() has to wait for before doing its work.
The reason that the call to sleep makes your code work is because you are using it to turn the potentially parallel execution of the two output stream actions into a single, sequential action. The call to sleep() will allow the scheduler to switch away from the main thread of execution and execute thread A.
If you don't put sleep() in, the order of thread execution is not guaranteed and thread B could well start executing/printing before thread A had a chance to do that.
I think you need to understand what sleep does in general, and understand why it might exist.
sleep does what it sounds like. It instructs the OS to put the requesting task (where a task is a thread of execution) to sleep by removing it from the list of currently running processes and putting it on some sort of wait queue.
Note that there are also times when the OS will put you to sleep whether you like it or not. An example would be any form of blocking I/O, like reading a file from disk. The OS does this so that other tasks may get the CPU's attention while you're off waiting for your data.
One would use sleep voluntarily for similar purposes that the OS would. For example, if you have multiple threads and they're waiting on the completion of some computation, you'll probably want to voluntarily relinquish the CPU so that the computation can complete. You may also voluntarily relinquish the CPU so that other threads have a chance to run. For example, if you have a tight loop that's highly CPU-bound, you'll want to sleep now and then to give other threads a chance to run.
What it looks like you're doing is sleeping for the sake of something being flushed to stdout so that some other thread won't write to stdout before you. This, however, isn't guaranteed to work. It might work incidentally, but it's certainly not what you'd want to do by design. You'd either want to explicitly flush your buffer and not sleep at all, or use some form of synchronization.
As for why printf doesn't exhibit those issues... well, it's a crapshoot. Both printf and cout use some form of buffered output, but the implementation of each may be different.
In summary, it's probably best to remember the following:
When you want to synchronize, use synchronization primitives.
When you want to give someone else a chance to run, sleep.
The OS is better at deciding whether an I/O operation is blocking or not.
If you're having problems seeing the "abcd" being printed, it's because you're not giving cout an endl to flush the buffer.
If you put
cout << "abcd" << endl;
you would be able to see the characters before the backspace is written. No sleep necessary.
while( true )
{
msgStack.Lock();
process( msgStack.pop_msg());
msgStack.Unlock();
sleep(0);
}
The sleep in the first example just lets you see the message for a moment before the "backspace" takes effect. In the second example sleep "can" help, but it is odd: you won't be able to synchronize console output with sleep in any more complex case.
In the code that launches two threads:
beginthread (threadA);
sleep(100);
beginthread (threadB);
the sleep() waits for 100 ms and then continues. The programmer probably did this in order to give threadA a chance to start up before launching threadB. If you must wait for threadA to be initialized and running before starting threadB, then you need a mechanism that waits for threadA to start, but this is the wrong way to do it.
100 is a magic number, chosen arbitrarily, probably accompanying a thought like "it should never take threadA more than 100 ms to start up." Assumptions like this are faulty because you have no way of knowing how long it will take for threadA to start. If the machine is busy or if the implementation of threadA changes, it could easily take longer than 100 ms for the thread to launch, run its startup code, and get to its main loop (if it is that kind of thread).
Instead of sleeping for some arbitrary amount of time, threadA needs to tell the main thread when it is up & running. One common way of doing this is by signaling an event.
Sample code that illustrates how to do this:
#include "stdafx.h"
#include <windows.h>
#include <process.h>
struct ThreadParam
{
HANDLE running_;
HANDLE die_;
};
DWORD WINAPI threadA(void* pv)
{
ThreadParam* param = reinterpret_cast<ThreadParam*>(pv);
if( !param )
return 1;
// do some initialization
// : :
SetEvent(param->running_);
WaitForSingleObject(param->die_, INFINITE);
return 0;
}
DWORD WINAPI threadB(void* pv)
{
ThreadParam* param = reinterpret_cast<ThreadParam*>(pv);
if( !param )
return 1;
// do some initialization
// : :
SetEvent(param->running_);
WaitForSingleObject(param->die_, INFINITE);
return 0;
}
int main(int argc, char** argv)
{
ThreadParam
paramA = {CreateEvent(0, 1, 0, 0), CreateEvent(0, 1, 0, 0) },
paramB = {CreateEvent(0, 1, 0, 0), CreateEvent(0, 1, 0, 0) };
DWORD idA = 0, idB = 0;
// start thread A, wait for it to initialize
HANDLE a = CreateThread(0, 0, threadA, (void*)&paramA, 0, &idA);
WaitForSingleObject(paramA.running_, INFINITE);
// start thread B, wait for it to initialize
HANDLE b = CreateThread(0, 0, threadB, (void*)&paramB, 0, &idB);
WaitForSingleObject(paramB.running_, INFINITE);
// tell both threads to die
SetEvent(paramA.die_);
SetEvent(paramB.die_);
CloseHandle(a);
CloseHandle(b);
return 0;
}
It's not needed - what output do you get if you omit it?
The only thing sleep does is pause execution of the calling thread for the specified number of milliseconds. It will in no way affect the outcome of any printing you might do.
Sleep can be used to avoid a certain thread/process (yeah, i know they are different things) hogging the processor.
On the other hand, printf is thread safe. Cout is not. That may explain differences in their behaviour.