Does a while loop always take full CPU usage? - c++

I need to create a server-side game loop; the problem is how to limit the loop's CPU usage.
In my programming experience, a busy loop always takes as much CPU as it can. But I am reading the code of SDL (Simple DirectMedia Layer), and it has a function SDL_Delay(Uint32 ms) that contains a while loop. Does it take maximal CPU usage? If not, why not?
https://github.com/eddieringle/SDL/blob/master/src/timer/unix/SDL_systimer.c#L137-158
do {
    errno = 0;
#if HAVE_NANOSLEEP
    tv.tv_sec = elapsed.tv_sec;
    tv.tv_nsec = elapsed.tv_nsec;
    was_error = nanosleep(&tv, &elapsed);
#else
    /* Calculate the time interval left (in case of interrupt) */
    now = SDL_GetTicks();
    elapsed = (now - then);
    then = now;
    if (elapsed >= ms) {
        break;
    }
    ms -= elapsed;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    was_error = select(0, NULL, NULL, NULL, &tv);
#endif /* HAVE_NANOSLEEP */
} while (was_error && (errno == EINTR));

This code uses select for a timeout. select normally takes sets of file descriptors and makes the caller wait until an I/O event occurs on one of them; it also takes a timeout argument giving the maximum time to wait. Here no descriptors are passed at all (nfds is 0 and the three fd sets are NULL), so no event can ever be reported and the call always returns when the timeout expires.
The select(3) you get from the C library is a wrapper around the select(2) system call, which means calling select(3) eventually gets you into the kernel. The kernel then doesn't schedule the process until an I/O event occurs or the timeout is reached, so the process is not using the CPU while it waits.
Of course, the jump into the kernel and the process scheduling introduce delays. So if you must have very low latency (nanoseconds), you should use busy waiting.
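As an illustration of the same trick SDL uses, here is a minimal sketch (the helper name sleep_ms is mine, and real code would restart on EINTR with the remaining time, as SDL does):

#include <sys/select.h>
#include <cstddef>

// Sleep for roughly `ms` milliseconds by calling select() with no file
// descriptors: the kernel blocks the caller until the timeout expires,
// so no CPU is consumed while waiting.
void sleep_ms(long ms)
{
    struct timeval tv;
    tv.tv_sec  = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    select(0, NULL, NULL, NULL, &tv);
}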

That loop won't take up all the CPU. It uses one of two functions to tell the operating system to pause the thread for a given amount of time and let another thread use the CPU:
// First function call - if HAVE_NANOSLEEP is defined.
was_error = nanosleep(&tv, &elapsed);
// Second function call - fallback without nanosleep.
was_error = select(0, NULL, NULL, NULL, &tv);

While the thread is blocked in SDL_Delay, it yields the CPU to other tasks. If the delay is long enough and there is no other work to do, the operating system will even put the CPU into an idle or halt mode. Note that this doesn't work well for delay times much shorter than 20 milliseconds or so.
However, this is usually not the right way to do whatever it is you are actually trying to do. What is your outer problem? Why doesn't your game loop ever finish the work that needs to be done right now and then wait for something to happen so that it has more work to do? How can it always have an infinite amount of work to do immediately?
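That said, for the server-side game loop in the question, the usual answer is a fixed-rate tick loop that does its work for the tick and then sleeps until the next tick is due, so CPU is only used while there is actual work. A minimal sketch, where the tick rate and update_game() are placeholders of my own:

#include <chrono>
#include <thread>

void update_game() { /* one simulation step (placeholder) */ }

void run_server_loop()
{
    using clock = std::chrono::steady_clock;
    const auto tick = std::chrono::milliseconds(50);   // e.g. 20 ticks per second
    auto next = clock::now() + tick;

    for (;;) {
        update_game();                        // do this tick's work
        std::this_thread::sleep_until(next);  // block; no CPU used while waiting
        next += tick;
    }
}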

Related

Posix Timer on SCHED_RR Thread is using 100% CPU

I have the following code snippet:
#include <iostream>
#include <thread>
#include <cstdint>
#include <pthread.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>

int main() {
    std::thread rr_thread([](){
        struct sched_param params = {5};
        pthread_setschedparam(pthread_self(), SCHED_RR, &params);

        struct itimerspec ts;
        struct epoll_event ev;
        int tfd, epfd;
        uint64_t missed;

        ts.it_interval.tv_sec = 0;
        ts.it_interval.tv_nsec = 0;
        ts.it_value.tv_sec = 0;
        ts.it_value.tv_nsec = 20000; // 50 kHz timer

        tfd = timerfd_create(CLOCK_MONOTONIC, 0);
        timerfd_settime(tfd, 0, &ts, NULL);

        epfd = epoll_create(1);
        ev.events = EPOLLIN;
        epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev);

        while (true) {
            epoll_wait(epfd, &ev, 1, -1); // wait forever for the timer
            read(tfd, &missed, sizeof(missed));
            // Here I have a blocking function (dummy in this example) which
            // takes on average 15 ns to execute, less than the timer period anyway
            func15ns();
        }
    });
    rr_thread.join();
}
I have a POSIX thread using the SCHED_RR policy, and on this thread a POSIX timer is running with a timeout of 20000 ns = 50 kHz = 50000 ticks/sec.
After the timer fires I execute a function that takes roughly 15 ns, so less than the timer period, but this doesn't really matter.
When I run this I get 100% CPU usage and the whole system becomes slow, but I don't understand why this is happening, and some things are confusing:
Why 100% CPU usage, since the thread is supposed to be sleeping while waiting for the timer to fire, so other tasks can be scheduled in theory, right? Even if this is a high-priority thread.
I checked the number of context switches using pidstat and it seems to be very small, close to 0, both voluntary and involuntary ones. Is this normal? While waiting for the timer to fire the scheduler should schedule other tasks, right? I should see at least 20000 * 2 context switches / sec.
As presented, your program does not behave as you describe. This is because you program the timer as a one-shot, not a repeating timer. For a timer that fires every 20000 ns, you want to set a 20000-ns interval:
ts.it_interval.tv_nsec = 20000;
Having modified that, I get a program that produces heavy load on one core.
Why 100% CPU usage, since the thread is supposed to be sleeping while waiting for the timer to fire, so other tasks can be scheduled in theory, right? Even if this is a high-priority thread.
Sure, your thread blocks in epoll_wait() to await timer ticks, if in fact it manages to loop back there before the timer ticks again. On my machine, your program consumes only about 30% of one core, which seems to confirm that such blocking will indeed happen. That you see 100% CPU use suggests that my computer runs the program more efficiently than yours does, for whatever reason.
But you have to appreciate that the load is very heavy. You are asking to perform all the processing of the timer itself, the epoll call, the read, and func15ns() once every 20000 ns. Yes, whatever time may be left, if any, is available to be scheduled for another task, but the task swap takes a bit more time again. 20000 ns is not very much time. Consider that just fetching a word from main memory costs about 100 ns (though reading one from cache is of course faster).
In particular, do not neglect the work other than func15ns(). If the latter indeed takes only 15 ns to run then it's the least of your worries. You're performing two system calls, and these are expensive. Just how expensive depends on a lot of factors, but consider that removing the epoll_wait() call reduces the load for me from 30% to 25% of a core (and note that the whole epoll setup is superfluous here because simply allowing the read() to block serves the purpose).
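For illustration, a minimal sketch of that reduced loop, which blocks in read() on the timerfd and drops the epoll setup entirely (an assumption about how one might simplify it, not code from the original program):

#include <cstdint>
#include <unistd.h>
#include <sys/timerfd.h>

// Periodic timerfd loop with a blocking read() and no epoll.
// read() blocks until the timer expires and returns the number of missed ticks.
void timer_loop()
{
    struct itimerspec ts = {};
    ts.it_interval.tv_nsec = 20000;  // repeat every 20000 ns
    ts.it_value.tv_nsec    = 20000;  // first expiry after 20000 ns

    int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
    timerfd_settime(tfd, 0, &ts, NULL);

    uint64_t missed;
    while (true) {
        read(tfd, &missed, sizeof(missed));  // blocks until the next tick
        // do the per-tick work here
    }
}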
I checked the number of context switches using pidstat and it seems to be very small, close to 0, both voluntary and involuntary ones. Is this normal? While waiting for the timer to fire the scheduler should schedule other tasks, right? I should see at least 20000 * 2 context switches / sec.
You're occupying a full CPU with a high priority task, so why do you expect switching?
On the other hand, I'm also observing a low number of context switches for the process running your (modified) program, even though it's occupying only 25% of a core. I'm not prepared at the moment to reason about why that is.

High precision timed operations with multiprocess application on windows/c++

I have multiple processes (in different exe files generated by subprojects) created by my main program.
What I want to do is run each process for about 1-2 milliseconds within every 40-50 millisecond major frame. When I use SuspendThread/ResumeThread to suspend one process (by suspending all the threads it has, though each has only one) and resume the next, a single switch (suspend old, resume new) lasts about 60 milliseconds, which is longer than my entire major frame. By the way, I know that using Sleep is not advised here, since a single sleep/wake operation lasts 15-30 ms, and I don't use any.
If I change the priority of the running process to lower and of the next process to higher, is it guaranteed that Windows will perform the context switch within microseconds?
Or what should I consider in order to achieve a process switch with microsecond precision?
And I wonder how long a simple SuspendThread/ResumeThread operation normally takes?
Currently I can't use threads instead of processes, since I need the memory isolation of a process, and my processes may spawn and terminate their own threads. Do wait handles and similar synchronization methods give me the high-precision timing I need?
Edit: The proposed sync objects have at best millisecond resolution (waitable timers, multimedia timers, etc. all take parameters in ms and report in ms). I need to use QueryPerformanceCounter and other means to achieve high resolution, as I mentioned.
As Remy says, you should be doing this with synchronisation objects - that's what they're for. Let's suppose that process A executes first and wants to 'hand over' to process B at some point. It can then do this:
SECURITY_ATTRIBUTES sa = { sizeof (SECURITY_ATTRIBUTES), NULL, TRUE };

// Auto-reset events (bManualReset = FALSE) so each SetEvent wakes exactly one wait.
HANDLE hHandOffToA = CreateEventW (&sa, FALSE, FALSE, L"HandOffToA");
HANDLE hHandOffToB = CreateEventW (&sa, FALSE, FALSE, L"HandOffToB");

// Start process B
CreateProcess (...);

while (!quit)
{
    // Do work, and then:
    SetEvent (hHandOffToB);
    WaitForSingleObject (hHandOffToA, INFINITE);
}

CloseHandle (hHandOffToA);
CloseHandle (hHandOffToB);
And process B can then do:
HANDLE hHandOffToA = OpenEventW (EVENT_MODIFY_STATE, FALSE, L"HandOffToA");
HANDLE hHandOffToB = OpenEventW (SYNCHRONIZE, FALSE, L"HandOffToB");

while (!quit)
{
    WaitForSingleObject (hHandOffToB, INFINITE);
    // Do work, and then:
    SetEvent (hHandOffToA);
}

CloseHandle (hHandOffToA);
CloseHandle (hHandOffToB);
You should, of course, include proper error checking and I've left it up to you to decide how process A should tell process B to shut down (I guess it could just kill it). Remember also that event names are system-wide so choose them more carefully than I have done.
For very high precision one can use the function below:

#include <windows.h>

// Returns the current value of the high-resolution performance counter,
// converted to nanoseconds.
void get_clock(LONGLONG* SYSTEM_TIME)
{
    static double multiplier = 1.0;
    static BOOL alreadyCalculated = FALSE;

    if (alreadyCalculated == FALSE)
    {
        LARGE_INTEGER frequency;
        BOOL result = QueryPerformanceFrequency(&frequency);
        if (result == TRUE)
        {
            // Counts per second -> nanoseconds per count.
            multiplier = 1000000000.0 / frequency.QuadPart;
        }
        else
        {
            DWORD error = GetLastError();
        }
        alreadyCalculated = TRUE;
    }

    LARGE_INTEGER time;
    QueryPerformanceCounter(&time);
    *SYSTEM_TIME = static_cast<LONGLONG>(time.QuadPart * multiplier);
}
In my case sync objects didn't fit very well (though I have used them where time is not critical); instead I redesigned my logic to put placeholders where my thread needs to take action, and calculated the time using the function above.
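For example, a hypothetical placeholder check using get_clock() from above (the 1.5 ms budget is just an illustration):

// Measure how many nanoseconds of the current time slot have been used.
LONGLONG slotStart, now;
get_clock(&slotStart);

// ... do part of the work ...

get_clock(&now);
if (now - slotStart >= 1500000LL)   // 1.5 ms budget used up (example value)
{
    // the time slot is over: hand off / take the next action here
}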
But I'm still not sure, if a higher-priority task arrives, how long it takes Windows to bring it onto the CPU and preempt the running one.

Scheduler using Timer Queues

I am working on an application where I need to schedule tasks based on times set by the user. The user may add/modify/delete the schedules. To implement it I am considering using Timer Queues. Initially I thought of using waitable timers, which suit my purpose very well, but I can't make my thread sleep in an alertable state to complete the APC.
Now, with the Timer Queue, I am not sure how to set the timer to signal based on the system time. I tried the following code, but the callback function is never called:
SYSTEMTIME st, lt;
GetSystemTime(&st);

FILETIME ft;
SystemTimeToFileTime(&st, &ft);

ULONGLONG qwResult;

// Copy the time into a quadword.
qwResult = (((ULONGLONG) ft.dwHighDateTime) << 32) + ft.dwLowDateTime;

// Add 20 seconds.
qwResult += 20 * _SECOND;

HANDLE hTimerQueue = CreateTimerQueue();
HANDLE hTimer;

// Set a timer to call the timer routine in 20 seconds.
if (!CreateTimerQueueTimer(&hTimer, hTimerQueue, (WAITORTIMERCALLBACK)TimerAPCProc,
                           NULL, qwResult, 0, 0))
{
    printf("CreateTimerQueueTimer failed (%d)\n", GetLastError());
    return 3;
}
The callback routine will be called in qwResult milliseconds, while a FILETIME gives you the time in 100-nanosecond units, so the value you are passing is far too large. You do the math. GetSystemTimeAsFileTime will give you a FILETIME right away, if that is the path you want to go.
Personally, I would keep a list of structures holding the times when the routines should be called and pointers to the routines, and iterate through the list once in a while; if a time of execution is due, I would just call the function (or create a thread). That way your users can always review the scheduled tasks and change them.
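A minimal sketch of that idea (the Task struct and the polling interval are my own illustration, not from the answer):

#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical task record: when to run it, and what to call.
struct Task {
    std::chrono::system_clock::time_point due;
    std::function<void()> run;
};

void scheduler_loop(std::vector<Task>& tasks)
{
    while (true) {
        auto now = std::chrono::system_clock::now();
        for (auto it = tasks.begin(); it != tasks.end(); ) {
            if (it->due <= now) {
                it->run();              // or hand off to a worker thread
                it = tasks.erase(it);   // one-shot: remove after running
            } else {
                ++it;
            }
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(500)); // poll "once in a while"
    }
}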
The waitable-timer approach needs to be backed by an alertable wait, for example WaitForSingleObjectEx or SleepEx, so that the APC can be delivered.
You're passing in an absolute time, but the docs say you need to pass in the number of milliseconds from the current time.
If you want the timer to go off in 20 seconds, pass 20000 (milliseconds) instead of qwResult.
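In other words, keeping the question's hTimer, hTimerQueue, and TimerAPCProc names, the call might look like this sketch:

// Fire TimerAPCProc once, 20 seconds (20000 ms) from now; Period = 0 means one-shot.
if (!CreateTimerQueueTimer(&hTimer, hTimerQueue,
                           (WAITORTIMERCALLBACK)TimerAPCProc,
                           NULL,        // callback parameter
                           20 * 1000,   // DueTime: milliseconds relative to now
                           0,           // Period: 0 = fire only once
                           0))          // Flags
{
    printf("CreateTimerQueueTimer failed (%d)\n", GetLastError());
}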

Make select based loop as responsive as possible

This thread will be very responsive to network activity but can be guaranteed to process the message queue only as often as 100 times a second. I can keep reducing the timeout but after a certain point I will be busy-waiting and chewing up CPU. Is it true that this solution is about as good as I'll get without switching to another method?
// semi pseudocode
while (1) {
    process_thread_message_queue(); // function returns near-instantly

    struct timeval t;
    t.tv_sec = 0;
    t.tv_usec = 10 * 1000; // 10ms = 0.01s

    if (select(n, &fdset, 0, 0, &t)) // see if there are incoming packets for next 1/100 sec
    {
        ... // respond with more packets or processing
    }
}
It depends on what your OS provides for you. On Windows you can wait for a thread message and a bunch of handles simultaneously using MsgWaitForMultipleObjectsEx. This solves your problem. Other OSes should have something similar.
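A rough Windows sketch of that pattern (the network event handle and the message handling are placeholders; with a socket you would typically tie it to the event via WSAEventSelect):

#include <windows.h>

// Block until either the network event is signalled or a message arrives in
// this thread's queue, instead of polling with a short select() timeout.
void message_and_socket_loop(HANDLE hNetworkEvent)
{
    for (;;)
    {
        DWORD r = MsgWaitForMultipleObjectsEx(1, &hNetworkEvent, INFINITE,
                                              QS_ALLINPUT, 0);
        if (r == WAIT_OBJECT_0)
        {
            // network event signalled: handle the socket activity here
        }
        else if (r == WAIT_OBJECT_0 + 1)
        {
            // thread message(s) available: pump the queue
            MSG msg;
            while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
            {
                TranslateMessage(&msg);
                DispatchMessage(&msg);
            }
        }
    }
}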

What is the cleanest way to create a timeout for a while loop?

Windows API/C/C++
1. ....
2. ....
3. ....
4. while (flag1 != flag2)
5. {
6. SleepEx(100,FALSE);
//waiting for flags to be equal (flags are set from another thread).
7. }
8. .....
9. .....
If the flags don't equal each other after 7 seconds, I would like to continue to line 8.
Any help is appreciated. Thanks.
If you are waiting for a particular flag to be set or a time to be reached, a much cleaner solution may be to use an auto / manual reset event. These are designed for signalling conditions between threads and have very rich APIs designed on top of them. For instance you could use the WaitForMultipleObjects API which takes an explicit timeout value.
Do not poll for the flags to change. Even with a sleep or yield during the loop, this just wastes CPU cycles.
Instead, get the thread which sets the flags to signal you that they've been changed, probably using an event. Your wait on the event takes a timeout, which you can tweak to allow waiting of 7 seconds total.
For example:
Thread1:

flag1 = foo;
SetEvent(hEvent);

Thread2:

DWORD timeOutTotal = 7000; // 7 second timeout to start.

while (flag1 != flag2 && timeOutTotal > 0)
{
    // Wait for flags to change
    DWORD start = GetTickCount();
    WaitForSingleObject(hEvent, timeOutTotal);
    DWORD end = GetTickCount();

    // Don't let timeOutTotal accidentally wrap past zero.
    if ((end - start) > timeOutTotal)
    {
        timeOutTotal = 0;
    }
    else
    {
        timeOutTotal -= (end - start);
    }
}
You can use QueryPerformanceCounter from the WinAPI. Check it before the while loop starts, and inside the loop check whether the allotted time has passed. However, this is a high-resolution timer; for lower resolution use GetTickCount (milliseconds).
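A minimal sketch of that approach (the 7-second limit matches the question; the SleepEx inside is only there to avoid spinning):

#include <windows.h>

// Poll the flags (set from another thread, as in the question) but give up
// after roughly 7 seconds measured with the high-resolution counter.
extern volatile LONG flag1, flag2;   // assumed to be defined elsewhere

void wait_for_flags_with_timeout()
{
    LARGE_INTEGER freq, start, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);

    while (flag1 != flag2)
    {
        QueryPerformanceCounter(&now);
        if ((now.QuadPart - start.QuadPart) >= 7 * freq.QuadPart)  // 7 seconds elapsed
            break;                   // give up and fall through to "line 8"
        SleepEx(100, FALSE);         // don't burn the CPU while polling
    }
}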
It all depends on whether you are actively waiting (doing something) or passively waiting for an external process. If the latter, then the following code using Sleep will be a lot easier:
int count = 0;
while ( flag1 != flag2 && count < 700 )
{
    Sleep( 10 ); // wait 10ms
    ++count;
}
If you don't use Sleep (or Yield) and your app is constantly checking the condition, you'll hog the CPU (or core) the app is running on.
If you use the WinAPI extensively, you should try a more native solution; read about the WinAPI's Synchronization Functions.
You failed to mention what will happen if the flags are equal.
Also, if you just test them with no memory barriers then you cannot guarantee to see any writes made by the other thread.
Your best bet is to use an Event, and use the WaitForSingleObject function with a 7000 millisecond time out.
Make sure you do a sleep() or yield() in there or you will eat up the entire CPU (or core) waiting.
If your application does some networking stuff, have a look at the POSIX select() call, especially the timeout functionality!
I would say "check the time and if nothing has happened in seven seconds later, then break the loop.