Best way to implement a periodic Linux task in C++20

I have a periodic task in C++, running on an embedded Linux platform, that has to run at 5 ms intervals. It seems to be working as expected, but is my current solution good enough?
I have implemented the scheduling using sleep_until(), but some comments I have received say that setitimer() is better. As I would like the application to be at least somewhat portable, I would prefer the C++ standard library... unless, of course, there are other problems.
I have found plenty of sites that show an implementation with each, but I have not found any arguments for why one solution is better than the other. As I see it, sleep_until() will use an "optimal" wait on any (supported) platform, and I get the feeling the comments I have received were aimed more at usleep() (which I do not use).
My implementation looks a little like this:
#include <chrono>
#include <cstdlib>
#include <ratio>
#include <thread>

bool is_submilli_capable() {
    return std::ratio_greater<std::milli,
                              std::chrono::system_clock::period>::value;
}

int main() {
    if (not is_submilli_capable())
        exit(1);

    while (true) {
        auto next_time = next_period_start();
        do_the_magic();
        std::this_thread::sleep_until(next_time);
    }
}
A short summary of the issue:
I have an embedded Linux platform, built with Yocto and with RT capabilities
The application needs to read and process incoming data every 5 ms
Building with gcc 11.2.0
Using C++20
All the "hard work" is done in separate threads, so this question only regards triggering the task periodically and with minimal jitter

Since the application is supposed to read and process the data every 5 ms, it is possible that, some of the time, it does not perform the required operations. What I mean is that in a time interval of 20 ms, do_the_magic() is supposed to be invoked 4 times... but if the time taken to execute do_the_magic() is 10 ms, it will get invoked only 2 times. If that is an acceptable outcome, the current implementation is good enough.
Since the application is reading data, it probably receives it from the network or from disk. Adding the overhead of processing it, it may well take more than 5 ms (depending on the size of the data). If it is not acceptable to miss any invocation of do_the_magic, the current implementation is not good enough.
What you could do instead is create a few threads. Each thread executes the do_the_magic function and then goes back to sleep. Every 5 ms, you wake one sleeping thread, which itself takes far less than 5 ms. This way no invocation of do_the_magic is missed. The number of threads you need depends on how long do_the_magic takes to execute.
bool is_submilli_capable() {
    return std::ratio_greater<std::milli,
                              std::chrono::system_clock::period>::value;
}

void wake_some_thread () {
    static int i = 0;
    release_semaphore (i); // Release semaphore associated with thread i (pseudocode)
    i++;
    i = i % NUM_THREADS;
}

void * thread_func (void * args) {
    while (true) {
        // Wait for the semaphore associated with this thread (pseudocode)
        do_the_magic();
    }
    return nullptr;
}

int main() {
    if (not is_submilli_capable())
        exit(1);

    // Threads and their semaphores are created before this loop (omitted)

    while (true) {
        auto next_time = next_period_start();
        wake_some_thread (); // Releases a semaphore to wake a thread
        std::this_thread::sleep_until(next_time);
    }
}
Create as many semaphores as there are threads, where thread i waits on semaphore i. wake_some_thread then releases the semaphores in order, from index 0 up to NUM_THREADS - 1, and starts over. A C++20 sketch of the idea is shown below.
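For reference, here is a minimal C++20 sketch of that scheme. It is not the exact code above: instead of one POSIX semaphore per thread it uses a single std::counting_semaphore that wakes whichever worker happens to be idle, which is simpler but drops the strict round-robin order. do_the_magic() is the question's placeholder, and next_period_start() is written out here as an absolute 5 ms deadline.

#include <chrono>
#include <semaphore>
#include <thread>
#include <vector>

constexpr int NUM_THREADS = 4;
std::counting_semaphore<NUM_THREADS> work_ready{0};

void do_the_magic();  // the question's work function, defined elsewhere

std::chrono::steady_clock::time_point next_period_start() {
    static auto next = std::chrono::steady_clock::now();
    next += std::chrono::milliseconds(5);  // absolute deadline, so errors don't accumulate
    return next;
}

void worker() {
    while (true) {
        work_ready.acquire();  // sleep until the scheduler releases a slot
        do_the_magic();        // may take longer than one 5 ms period
    }
}

int main() {
    std::vector<std::jthread> pool;
    for (int i = 0; i < NUM_THREADS; ++i)
        pool.emplace_back(worker);

    while (true) {
        auto next_time = next_period_start();
        work_ready.release();  // wake one idle worker for this period
        std::this_thread::sleep_until(next_time);
    }
}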

5 ms is pretty tight timing.
You can get a jitter-free 5 ms tick only if you do the following:
Isolate a CPU for this thread. Configure it with nohz_full and rcu_nocbs
Pin your thread to this CPU, assign it a real-time scheduling policy (e.g., SCHED_FIFO)
Do not let any other threads run on this CPU core.
Do not allow any context switches in this thread. This includes avoiding system calls altogether. I.e., you cannot use std::this_thread::sleep_until(...) or anything else.
Do a busy wait in between processing (ensure 100% CPU utilisation)
Use lock-free communication to transfer data from this thread to other, non-real-time threads, e.g., for storing the data to files, accessing network, logging to console, etc.
Now, the question is how you're going to "read and process data" without system calls. It depends on your system. If you can do any user-space I/O (map the physical register addresses into your process address space, use DMA without interrupts, etc.), you'll have perfectly real-time processing. Otherwise, any system call will trigger a context switch, and the latency of that context switch will be unpredictable.
For example, you can do this with certain Ethernet devices (SolarFlare, etc.), with 100% user-space drivers. For anything else you're likely to have to write your own user-space driver, or even implement your own interrupt-free device (e.g., if you're running on an FPGA SoC).
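To make the list above a bit more concrete, here is a rough sketch (mine, not the answer author's) of pinning a SCHED_FIFO thread to an isolated core and busy-waiting on a 5 ms period. It assumes a core already isolated via isolcpus/nohz_full, sufficient privileges (CAP_SYS_NICE), and g++ on glibc (where steady_clock::now() goes through the vDSO rather than a full syscall); do_the_magic() is again the question's placeholder.

#include <chrono>
#include <pthread.h>
#include <sched.h>

void do_the_magic();  // the question's work function, defined elsewhere

void periodic_rt_loop(int isolated_cpu) {
    // Pin the calling thread to the isolated core.
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(isolated_cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    // Give it a real-time FIFO priority.
    sched_param sp{};
    sp.sched_priority = 80;
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);

    using clock = std::chrono::steady_clock;
    constexpr auto period = std::chrono::milliseconds(5);
    auto next = clock::now() + period;
    while (true) {
        do_the_magic();
        while (clock::now() < next) { /* busy wait, no blocking calls */ }
        next += period;
    }
}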


Empty loop which waits for condition (busy-waiting)

I have spent the last 20 minutes researching empty loops whose only purpose is to wait for a condition to become true.
I have a function called "waitForLoaded" which runs in a thread created by CreateThread.
The function:
void waitForLoaded(){
    while(!isLoaded){
        Sleep(500); // < my question
    }
    Sleep(500); //sleep another 500ms to ensure everything is loaded.
    //continue on here
}
I am using Sleep(500) to be easy on the CPU as I believe that using either 0 or 1 would drain the processor.
I have seen "Sleep(0)" used in many people's code and I never understood why not skip the sleep entirely and just do "while(condition){}".
I can't find any solid answer on which is more CPU friendly, so I am asking here: what is the difference between busy-waiting with 0 ms, 1 ms or 500 ms, and which is the most CPU friendly?
In my opinion it would be best to sleep at least half a second, which is barely noticeable by the user.
On windows a Sleep(0) will not spend any time sleeping, but allows the OS to relinquish the CPU to another waiting thread. It's kind of like saying "If someone is waiting in line let them go ahead, otherwise I'd like to go right away."
If I understand your question, you are asking which of these wait methods is superior:
sleep(500)
sleep(1)
sleep(0)
// (do nothing)
If you have the time to afford a sleep(500), then the answer is "sleep(500)".
That said, a simple synchronization primitive built around an event or something similar would use less CPU, AND your thread would get to work sooner than the worst-case 500 ms you get with your 500 ms wait.
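As a minimal sketch of that idea in standard C++ (the question uses CreateThread, but the principle is the same with Win32 events): the loader signals once, and waitForLoaded blocks on a condition variable instead of polling isLoaded with Sleep(500).

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool isLoaded = false;

void signalLoaded() {  // called by the loading thread when it is done
    {
        std::lock_guard<std::mutex> lk(m);
        isLoaded = true;
    }
    cv.notify_all();
}

void waitForLoaded() {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return isLoaded; });  // wakes promptly, no 500 ms polling
    // continue on here
}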
First you need to study your problem.
Do you need a busy wait?
Can you use a dispatcher?
Can you detect the exact moment when the data is available or the operation has finished?
I would take a look at different approaches, like an event file descriptor or a condition variable.
Condition variable approach:
boost::mutex::scoped_lock lock(m_mutex);
while(queue.empty() && !m_quit) {
    m_condition.wait(lock);
}
Event file descriptor approach:
// m_epollFD is assumed to have been created earlier, e.g. with epoll_create1()
m_loopFD = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
if(m_loopFD < 0) {
    close(m_epollFD);
    throw ...
}

struct epoll_event event;
memset(&event, 0, sizeof(struct epoll_event));
event.data.fd = m_loopFD;
event.events = EPOLLIN;

if(epoll_ctl(m_epollFD, EPOLL_CTL_ADD, m_loopFD, &event) != 0) {
    throw ...
}
Later you may have something like this
int res = epoll_wait(m_epollFD, events, MAX_EVENTS, timeout);
and to wake it up:
uint64_t value = 0x01;
write(m_loopFD, &value, sizeof(value));
Busy waiting basically means running code that calculates something only to burn time. Sleep, on the other hand, goes through the OS scheduler, which is designed for waiting fairly long periods; it is not reliable for periods shorter than the scheduler's time quantum, which is ~15 ms on Windows.
This is not acceptable, for example, in the case of a spinlock.
The simplest code I could come up with is:
#include <cstdlib>

inline void noop_by_rand(int num)
{
    while (num--) rand();
}
Pros:
rand() is a standard library function with a fixed calculation time far below the OS scheduler's time quantum.
Can easily be scaled to longer times.
The compiler cannot optimize the call away or reduce the loop, because rand() is an external function.
It relies on CPU performance rather than on elapsed time, which means the delay scales with the performance of the CPU.
Cons:
It does not bypass the OS scheduler. If the busy-wait time is too long, the scheduler will preempt the thread anyway, which means it can lose much more time than requested.

In C++, how would I make a random number (either 1 or 2) that changes every 5 minutes?

I'm trying to make a simple game and I have a shop in the game. I want it so that every 5 minutes (if the function changeItem() is called) the item in the shop either switches or stays the same. I have no problem generating the random number, but I have yet to find a thread that shows how to make it generate a new one every 5 minutes. Thank you.
In short, keep track of the last time the changeItem() function generated a new number. If it has been more than 5 minutes since then, use your random number generator to generate a new number; otherwise, use the saved number from the last time it was generated.
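One possible shape of that idea (names here are illustrative, not from the question): cache the value and the time it was generated, and only re-roll once 5 minutes have passed.

#include <chrono>
#include <random>

int changeItem() {
    using clock = std::chrono::steady_clock;
    static std::mt19937 gen{std::random_device{}()};
    static std::uniform_int_distribution<int> dist(1, 2);  // either 1 or 2
    static int current = dist(gen);
    static auto last = clock::now();

    if (clock::now() - last >= std::chrono::minutes(5)) {
        current = dist(gen);   // more than 5 minutes passed: generate a new number
        last = clock::now();
    }
    return current;            // otherwise keep returning the saved number
}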
You've already accepted an answer, but I would like to say that for apps that need simple timing like this and don't need great accuracy, a simple calculation in the main loop is all you need.
Kicking off a thread for a single timer is a lot of unnecessary overhead.
So, here's the code showing how you'd go about doing it.
#include <time.h>

#define FIVE_MINUTES (60*5)

int main(int argc, char** argv){
    time_t lastChange = 0, tick;
    bool run_game_loop = true;

    while (run_game_loop){
        // ... game loop

        tick = time(NULL);
        if ((tick - lastChange) >= FIVE_MINUTES){
            changeItem();
            lastChange = tick;
        }
    }
    return 0;
}
It somewhat assumes the loop runs reasonably regularly, though. If, on the other hand, you need it to be accurate, then a thread would be better. And depending on the platform, there are timer APIs whose callbacks get invoked by the system.
Standard and portable approach:
You could consider C++11 threads. The general idea would be:
#include <thread>
#include <chrono>

void myrandomgen () // function that refreshes your random number:
                    // will be executed as a thread
{
    while (! gameover ) {
        std::this_thread::sleep_for (std::chrono::minutes(5)); // wait 5 minutes
        ... // generate your random number and update your game data structure
    }
}
in the main function, you would then instantiate a thread with your function:
thread t1 (myrandomgen); // create and launch the thread
... // do your stuff until game over
t1.join (); // wait until thread returns
Of course you could also pass parameters (references to shared variables, etc...) when you create the thread, like this:
thread t1 (myrandomgen, param1, param2, ....);
The advantage of this approach is that it's standard and portable.
Non-portable alternatives:
I'm less familiar with these, but:
In an MS Windows environment, you could use SetTimer(...) to define a function to be called at a regular interval (and KillTimer(...) to delete it). But this requires a program structure built around the Windows event processing loop.
In a Linux environment, you could similarly define a callback function with signal(SIGALRM, ...) and activate periodic calls with alarm(); a sketch follows below.
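A hedged sketch of that signal(SIGALRM)/alarm() route (alarm() is one-shot, so the handler re-arms it; the handler only sets a flag, because very little else is async-signal-safe):

#include <signal.h>
#include <unistd.h>

volatile sig_atomic_t refresh_due = 0;

void on_alarm(int) {
    refresh_due = 1;
    alarm(300);               // re-arm for the next 5 minutes
}

int main() {
    signal(SIGALRM, on_alarm);
    alarm(300);               // first alarm in 5 minutes
    while (true) {
        if (refresh_due) {
            refresh_due = 0;
            // regenerate the random number here
        }
        // ... rest of the game loop
    }
}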
Small update on performance considerations:
Following several remarks about threads being overkill and about performance, I ran a benchmark, executing 1 billion loop iterations and waiting 1 microsecond every 100K iterations. The whole thing was run on an i7 multicore CPU:
Non-threaded execution yielded 213K iterations per millisecond.
Two-thread execution yielded 209K iterations per millisecond per thread, so slightly slower for each thread. The total execution time was however only 70 to 90 ms longer, so the overall throughput is 418K iterations per millisecond.
How come? Because the second thread is using an otherwise unused core on the processor. This means that with an adequate architecture, a game could process many more calculations when using multithreading...

Simulating CPU Load In C++

I am currently writing an application in Windows using C++ and I would like to simulate CPU load.
I have the following code:
void task1(void *param) {
    unsigned elapsed = 0;
    unsigned t0;
    while(1){
        if ((t0 = clock()) >= 50 + elapsed){ //if time elapsed is 50ms
            elapsed = t0;
            Sleep(50);
        }
    }
}

int main(){
    int ThreadNr;
    for(int i = 0; i < 4; i++){ //for each core (i.e. 4 cores)
        _beginthread( task1, 0, &ThreadNr ); //create a new thread and run the "task1" function
    }
    while(1){}
}
I wrote this code using the same methodology as in the answers given in this thread: Simulate steady CPU load and spikes
My questions are:
Have I translated the C# code from the other post correctly over to C++?
Will this code generate an average CPU load of 50% on a quad-core processor?
How can I, within reasonable accuracy, find out the load percentage of the CPU? (is task manager my only option?)
EDIT: The reason I ask this question is that I want to eventually be able to generate CPU loads of 10, 20, 30, ..., 90% within a reasonable tolerance. This code seems to work well for generating loads above 70%, but seems to be very inaccurate at any load below 70% (as measured by the Task Manager CPU load readings).
Would anyone have any ideas as to how I could generate said loads but still be able to use my program on different computers (i.e. with different CPUs)?
At first sight, this looks like not-pretty-but-correct C++ or C (an easy way to be sure is to compile it). Includes are missing (<windows.h>, <process.h>, and <time.h>) but otherwise it compiles fine.
Note that clock and Sleep are not terribly accurate, and Sleep is not terribly reliable either. On the average, the thread function should kind of work as intended, though (give or take a few percent of variation).
However, regarding question 2), you should replace the last while(1){} with something that blocks rather than spins (e.g. WaitForSingleObject or Sleep if you will). Otherwise the entire program will not have a 50% load on a quad-core. You will have 100% load on one core due to the main thread, plus the 4x 50% from your four workers. This obviously sums to more than 50% per core (and will cause threads to bounce from one core to the other, resulting in nasty side effects).
Using Task Manager or a similar utility to verify whether you get the load you want is a good option (and since it's the easiest solution, it's also the best one).
Also do note that simulating load in such a way will probably kind of work, but is not 100% reliable.
There might be effects (memory, execution units) that are hard to predict. Assume for example that you're using 100% of the CPU's integer execution units with this loop (a reasonable assumption) but zero of its floating point or SSE units. Modern CPUs may share resources between real or logical cores, and you might not be able to predict exactly what effects you get. Or, another thread may be memory bound or have significant page faults, so taking away CPU time won't affect it nearly as much as you think (and might in fact give it enough time to make prefetching work better). Or, it might block on AGP transfers. Or, something else you can't tell.
EDIT:
Improved version, shorter code that fixes a few issues and also works as intended:
Uses clock_t for the value returned by clock (which is technically "more correct" than using a not-specially-typedef'd integer). Incidentally, that's probably the very reason why the original code does not work as intended: since clock_t is a signed integer under Win32, the condition in if() always evaluates true, so the workers sleep almost all the time, consuming no CPU.
Less code, less complicated math when spinning. Computes a wakeup time 50 ticks in the future and spins until that time is reached.
Uses getchar to block the program at the end. This does not burn CPU time, and it allows you to end the program by pressing Enter. Threads are not properly ended as one would normally do, but in this simple case it's probably OK to just let the OS terminate them as the process exits.
Like the original code, this assumes that clock and Sleep use the same ticks. That is admittedly a bold assumption, but it holds true under Win32 which you used in the original code (both "ticks" are milliseconds). C++ doesn't have anything like Sleep (without boost::thread, or C++11 std::thread), so if non-Windows portability is intended, you'd have to rethink anyway.
Like the original code, it relies on functions (clock and Sleep) which are imprecise and unreliable. Sleep(50) equals Sleep(63) on my system without using timeBeginPeriod. Nevertheless, the program works "almost perfectly", resulting in a 50% +/- 0.5% load on my machine.
Like the original code, this does not take thread priorities into account. A process that has a higher than normal priority class will be entirely unimpressed by this throttling code, because that is how the Windows scheduler works.
#include <windows.h>
#include <process.h>
#include <time.h>
#include <stdio.h>

void task1(void *)
{
    while(1)
    {
        clock_t wakeup = clock() + 50;
        while(clock() < wakeup) {}
        Sleep(50);
    }
}

int main(int, char**)
{
    int ThreadNr;
    for(int i = 0; i < 4; i++) _beginthread( task1, 0, &ThreadNr );

    (void) getchar();
    return 0;
}
Here is a code sample which loaded my CPU to 100% on Windows.
#include "windows.h"

DWORD WINAPI thread_function(void* data)
{
    float number = 1.5;
    while(true)
    {
        number *= number;   // pointless math just to keep the core busy
    }
    return 0;
}

int main()
{
    // Note: this keeps creating spinning threads without bound, which will
    // saturate every core (and eventually exhaust resources).
    while (true)
    {
        CreateThread(NULL, 0, &thread_function, NULL, 0, NULL);
    }
}
When you build the app and run it, press Ctrl-C to kill it.
You can use the Windows performance counter API to get the CPU load, either for the entire system or for your process.
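A sketch of reading the system-wide load through the PDH flavour of that API; link with pdh.lib. Two collections are needed because "% Processor Time" is a rate counter.

#include <windows.h>
#include <pdh.h>
#include <cstdio>

int main() {
    PDH_HQUERY query;
    PDH_HCOUNTER counter;
    PdhOpenQueryW(nullptr, 0, &query);
    PdhAddEnglishCounterW(query, L"\\Processor(_Total)\\% Processor Time", 0, &counter);

    PdhCollectQueryData(query);          // first sample
    Sleep(1000);
    PdhCollectQueryData(query);          // second sample, one second later

    PDH_FMT_COUNTERVALUE value;
    PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE, nullptr, &value);
    printf("CPU load: %.1f%%\n", value.doubleValue);

    PdhCloseQuery(query);
    return 0;
}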

gettimeofday on uLinux weird behaviour

Recently I've been trying to create a wait function that waits for 25 ms using the wall clock as reference. I looked around and found "gettimeofday", but I've been having problems with it. My code (simplified):
while(1)
{
    timeval start, end;
    double t_us;
    bool release = false;

    while (release == false)
    {
        gettimeofday(&start, NULL);
        DoStuff();
        {
            gettimeofday(&end, NULL);
            t_us = ((end.tv_sec - start.tv_sec) * 1000*1000) + (end.tv_usec - start.tv_usec);
            if (t_us >= 25000) //25 ms
            {
                release = true;
            }
        }
    }
}
This code runs in a thread (POSIX) and, on its own, works fine. DoStuff() is called every 25 ms. It does however eat all the CPU it can (as you might expect), so obviously this isn't a good idea.
When I tried throttling it by adding a Sleep(1); in the wait loop after the if statement, the entire thing slows by about 50% (that is, it calls DoStuff every 37 ms or so). This makes no sense to me: assuming DoStuff and any other threads complete their tasks in under (25 - 1) ms, the call rate of DoStuff shouldn't be affected (allowing a 1 ms error margin).
I also tried Sleep(0), usleep(1000) and usleep(0), but the behaviour is the same.
The same behaviour occurs whenever another higher priority thread needs CPU time (without the sleep). It's as if the clock stops counting when the thread relinquishes the CPU.
I'm aware that gettimeofday is vulnerable to things like NTP updates etc., so I tried using clock_gettime, but linking with -lrt on my system causes problems, so I don't think that is an option.
Does anyone know what I'm doing wrong?
The part that's missing here is how the kernel does thread scheduling based on time slices. In rough numbers, if you sleep at the beginning of your time slice for 1ms and the scheduling is done on a 35ms clock rate, your thread may not execute again for 35ms. If you sleep for 40ms, your thread may not execute again for 70ms. You can't really change that without changing the scheduling, but that's not recommended due to overall performance implications of the system. You could use a "high-resolution" timer, but often that's implemented in a tight cycle-wasting loop of "while it's not time yet, chew CPU" so that's not really desirable either.
If you used a high-resolution clock and queried it frequently inside of your DoStuff loop, you could potentially play some tricks like run for 30ms, then do a sleep(1) which could effectively relinquish your thread for the remainder of your timeslice (e.g. 5ms) to let other threads run. Kind of a cooperative/preemptive multitasking if you will. It's still possible you don't get back to work for an extended period of time though...
All variants of sleep()/usleep() involve yielding the CPU to other runnable tasks. Your program can then run only after it is rescheduled by the kernel, which in your case seems to take about 37 ms.
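If clock_gettime/clock_nanosleep do become usable on the target (they live in librt on older toolchains, hence the -lrt link flag), a common alternative to the busy wait is to sleep until an absolute deadline that is advanced by 25 ms each cycle, so scheduling delays don't accumulate. A minimal sketch, with DoStuff() as the question's placeholder:

#include <time.h>

void DoStuff();  // the question's work function, defined elsewhere

void periodic_25ms_loop() {
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    while (1) {
        DoStuff();
        next.tv_nsec += 25 * 1000 * 1000;       // advance the deadline by 25 ms
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec += 1;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}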

How to make thread sleep less than a millisecond on Windows

On Windows I have a problem I never encountered on Unix. That is how to get a thread to sleep for less than one millisecond. On Unix you typically have a number of choices (sleep, usleep and nanosleep) to fit your needs. On Windows, however, there is only Sleep with millisecond granularity.
On Unix, I can use the select system call to create a microsecond sleep, which is pretty straightforward:
int usleep(long usec)
{
    struct timeval tv;
    tv.tv_sec = usec / 1000000L;
    tv.tv_usec = usec % 1000000L;
    return select(0, 0, 0, 0, &tv);
}
How can I achieve the same on Windows?
This indicates a misunderstanding of sleep functions. The parameter you pass is a minimum time for sleeping. There's no guarantee that the thread will wake up after exactly the time specified. In fact, threads don't "wake up" at all, but are rather chosen for execution by the OS scheduler. The scheduler might choose to wait much longer than the requested sleep duration to activate a thread, especially if another thread is still active at that moment.
As Joel says, you can't meaningfully 'sleep' (i.e. relinquish your scheduled CPU) for such short periods. If you want to delay for some short time, then you need to spin, repeatedly checking a suitably high-resolution timer (e.g. the 'performance timer') and hoping that something of high priority doesn't pre-empt you anyway.
If you really care about accurate delays of such short times, you should not be using Windows.
Use the high resolution multimedia timers available in winmm.lib. See this for an example.
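For what it's worth, a sketch of that winmm route: raise the system timer resolution with timeBeginPeriod(1) and let timeSetEvent() call you back every millisecond (link with winmm.lib; the API is old and officially deprecated in favour of timer queues, and it will not get you below one millisecond).

#include <windows.h>
#include <mmsystem.h>

void CALLBACK onTick(UINT, UINT, DWORD_PTR, DWORD_PTR, DWORD_PTR) {
    // periodic work here; keep it short
}

int main() {
    timeBeginPeriod(1);                                      // 1 ms system timer resolution
    MMRESULT timer = timeSetEvent(1, 0, onTick, 0, TIME_PERIODIC);

    Sleep(10000);                                            // let it run for a while

    timeKillEvent(timer);
    timeEndPeriod(1);
    return 0;
}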
#include <Windows.h>

static NTSTATUS(__stdcall *NtDelayExecution)(BOOL Alertable, PLARGE_INTEGER DelayInterval) =
    (NTSTATUS(__stdcall*)(BOOL, PLARGE_INTEGER)) GetProcAddress(GetModuleHandle("ntdll.dll"), "NtDelayExecution");

static NTSTATUS(__stdcall *ZwSetTimerResolution)(IN ULONG RequestedResolution, IN BOOLEAN Set, OUT PULONG ActualResolution) =
    (NTSTATUS(__stdcall*)(ULONG, BOOLEAN, PULONG)) GetProcAddress(GetModuleHandle("ntdll.dll"), "ZwSetTimerResolution");

static void SleepShort(float milliseconds) {
    static bool once = true;
    if (once) {
        ULONG actualResolution;
        ZwSetTimerResolution(1, true, &actualResolution);
        once = false;
    }

    LARGE_INTEGER interval;
    interval.QuadPart = -1 * (int)(milliseconds * 10000.0f);
    NtDelayExecution(false, &interval);
}
Works very well for sleeping extremely short times. Remember though that at a certain point the actual delays will never be consistent because the system can't maintain consistent delays of such a short time.
Yes, you need to understand your OS' time quantums. On Windows, you won't even be getting 1ms resolution times unless you change the time quantum to 1ms. (Using for example timeBeginPeriod()/timeEndPeriod()) That still won't really guarantee anything. Even a little load or a single crappy device driver will throw everything off.
SetThreadPriority() helps, but is quite dangerous. Bad device drivers can still ruin you.
You need an ultra-controlled computing environment to make this ugly stuff work at all.
Generally a sleep will last at least until the next system interrupt occurs. However, this depends on the settings of the multimedia timer resources. It may be set to something close to 1 ms; some hardware even allows running at interrupt periods of 0.9765625 ms (the ActualResolution provided by NtQueryTimerResolution will show 0.9766, but that's actually wrong: they just can't fit the correct number into the ActualResolution format. It's 0.9765625 ms at 1024 interrupts per second).
There is one exception which allows us to escape the fact that it may be impossible to sleep for less than the interrupt period: the famous Sleep(0). This is a very powerful tool, and it is not used as often as it should be! It relinquishes the remainder of the thread's time slice. This way the thread will stop until the scheduler forces the thread to get CPU service again. Sleep(0) is an asynchronous service; the call will force the scheduler to react independently of an interrupt.
A second way is the use of a waitable object. A wait function like WaitForSingleObject() can wait for an event. In order to have a thread sleep for any time, including times in the microsecond regime, the thread needs to set up a service thread which will generate an event at the desired delay. The "sleeping" thread sets up this thread and then pauses at the wait function until the service thread sets the event signaled.
This way any thread can "sleep" or wait for any time. The service thread can be of considerable complexity and it may offer system-wide services such as timed events at microsecond resolution. However, microsecond resolution may force the service thread to spin on a high-resolution time service for at most one interrupt period (~1 ms). If care is taken, this can run very well, particularly on multi-processor or multi-core systems. A one-millisecond spin does not hurt considerably on a multi-core system when the affinity masks for the calling thread and the service thread are carefully handled.
Code, description, and testing can be visited at the Windows Timestamp Project
As several people have pointed out, sleep and other related functions are by default dependent on the "system tick". This is the minimum unit of time between OS tasks; the scheduler, for instance, will not run faster than this. Even with a realtime OS, the system tick is not usually less than 1 ms. While it is tunable, this has implications for the entire system, not just your sleep functionality, because your scheduler will be running more frequently, and potentially increasing the overhead of your OS (amount of time for the scheduler to run, vs. amount of time a task can run).
The solution to this is to use an external, high-speed clock device. Most Unix systems will let you tell your timers (and the like) to use a different clock than the default system clock.
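On the POSIX side, "specifying a different clock to your timers" can look like this sketch: a timer_create() timer on CLOCK_MONOTONIC that fires a callback thread every 500 microseconds (link with -lrt on older systems). Whether such a short period is actually honoured still depends on the kernel's timer resolution.

#include <signal.h>
#include <time.h>
#include <unistd.h>

void on_tick(union sigval) {
    // periodic work here
}

int main() {
    struct sigevent sev{};
    sev.sigev_notify = SIGEV_THREAD;        // run the callback on a helper thread
    sev.sigev_notify_function = on_tick;

    timer_t timer_id;
    timer_create(CLOCK_MONOTONIC, &sev, &timer_id);

    struct itimerspec spec{};
    spec.it_value.tv_nsec = 500 * 1000;     // first expiry after 500 us
    spec.it_interval.tv_nsec = 500 * 1000;  // then every 500 us
    timer_settime(timer_id, 0, &spec, nullptr);

    for (;;) pause();                       // keep the process alive
}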
What are you waiting for that requires such precision? In general if you need to specify that level of precision (e.g. because of a dependency on some external hardware) you are on the wrong platform and should look at a real time OS.
Otherwise you should be considering whether there is an event you can synchronize on, or in the worst case just busy-wait the CPU and use the high-performance counter API to measure the elapsed time.
If you want so much granularity you are in the wrong place (in user space).
Remember that if you are in user space your time is not always precise.
The scheduler can start your thread (or app) and schedule it, so you are dependent on the OS scheduler.
If you are looking for something precise you have to go:
1) In kernel space (like drivers)
2) Choose an RTOS.
Anyway, if you are looking for some granularity (but remember the problem with user space), look at the
QueryPerformanceCounter and QueryPerformanceFrequency functions on MSDN.
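A sketch of that busy-wait-plus-performance-counter idea: spin on QueryPerformanceCounter() until the requested number of microseconds has elapsed. It burns a whole core while waiting, which is the price of sub-millisecond precision in user space.

#include <windows.h>

void spin_wait_us(long long usec) {
    LARGE_INTEGER freq, start, now;
    QueryPerformanceFrequency(&freq);        // ticks per second
    QueryPerformanceCounter(&start);
    const long long target = start.QuadPart + (usec * freq.QuadPart) / 1000000LL;
    do {
        QueryPerformanceCounter(&now);
    } while (now.QuadPart < target);
}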
Actually, using this usleep function will cause a big memory/resource leak (depending on how often it is called).
Use this corrected version (sorry, can't edit?):
bool usleep(unsigned long usec)
{
    struct timeval tv;
    fd_set dummy;
    SOCKET s = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    FD_ZERO(&dummy);
    FD_SET(s, &dummy);
    tv.tv_sec = usec / 1000000ul;
    tv.tv_usec = usec % 1000000ul;
    bool success = (0 == select(0, 0, 0, &dummy, &tv));
    closesocket(s);
    return success;
}
I have the same problem and nothing seems to be faster than a ms, even the Sleep(0). My problem is the communication between a client and a server application where I use the _InterlockedExchange function to test and set a bit and then I Sleep(0).
I really need to perform thousands of operations per second this way and it doesn't work as fast as I planned.
Since I have a thin client dealing with the user, which in turn invokes an agent which then talks to a thread, I will move soon to merge the thread with the agent so that no event interface will be required.
Just to give you guys an idea how slow this Sleep is, I ran a test for 10 seconds performing an empty loop (getting something like 18,000,000 loops) whereas with the event in place I only got 180,000 loops. That is, 100 times slower!
Try using SetWaitableTimer...
Like everybody mentioned, there are indeed no guarantees about the sleep time.
But nobody wants to admit that sometimes, on an idle system, the usleep command can be very precise. Especially with a tickless kernel. Windows Vista has it, and Linux has had it since 2.6.16.
Tickless kernels exist to help improve laptop battery life: cf. Intel's powertop utility.
Under those conditions, I happened to measure the Linux usleep command respecting the requested sleep time very closely, down to half a dozen microseconds.
So, maybe the OP wants something that will roughly work most of the time on an idling system, and be able to ask for microsecond scheduling!
I actually would want that on Windows too.
Also, Sleep(0) sounds like boost::thread::yield(), whose terminology is clearer.
I wonder if Boost timed locks have better precision, because then you could just lock a mutex that nobody ever releases, and when the timeout is reached, continue on...
Timeouts are set with boost::system_time + boost::milliseconds and the like (xtime is deprecated).
If your goal is to "wait for a very short amount of time" because you are doing a spinwait, then there are increasing levels of waiting you can perform.
void SpinOnce(ref Int32 spin)
{
    /*
      SpinOnce is called each time we need to wait.
      But the action it takes depends on how many times we've been spinning:

      1..12 spins: spin 2..4096 cycles
      12..32 spins: call SwitchToThread (allow another thread ready to go on the core to execute)
      over 32 spins: Sleep(0) (give up the remainder of our timeslice to any other thread ready to run, also allows APC and I/O callbacks)
    */
    spin += 1;

    if (spin > 32)
        Sleep(0); //give up the remainder of our timeslice
    else if (spin > 12)
        SwitchToThread(); //allow another thread on our CPU to have the remainder of our timeslice
    else
    {
        int loops = (1 << spin); //1..12 ==> 2..4096
        while (loops > 0)
            loops -= 1;
    }
}
So if your goal is actually to wait only for a little bit, you can use something like:
int spin = 0;
while (!TryAcquireLock())
{
    SpinOnce(ref spin);
}
The virtue here is that we wait longer each time, eventually going completely to sleep.
Just use Sleep(0). 0 is clearly less than a millisecond. Now, that sounds funny, but I'm serious. Sleep(0) tells Windows that you don't have anything to do right now, but that you do want to be reconsidered as soon as the scheduler runs again. And since obviously the thread can't be scheduled to run before the scheduler itself runs, this is the shortest delay possible.
Note that you can pass a microsecond count to your usleep, but so can void usleep(__int64 t) { Sleep(t/1000); } - there is no guarantee that it actually sleeps for that period.
Sleep function that is way less than a millisecond - maybe
I found that sleep(0) worked for me. On a system with near 0% CPU load in Task Manager, I wrote a simple console program and the sleep(0) function slept for a consistent 1-3 microseconds, which is way less than a millisecond.
But from the above answers in this thread, I know that the amount sleep(0) sleeps can vary much more wildly than this on systems with a large CPU load.
But as I understand it, the sleep function should not be used as a timer. It should be used to make the program use as small a percentage of the CPU as possible while executing as frequently as possible. For my purposes, such as moving a projectile across the screen in a video game much faster than one pixel a millisecond, sleep(0) works, I think.
You would just make sure the sleep interval is way smaller than the largest amount of time it would sleep. You don't use the sleep as a timer, just to make the game use the minimum CPU percentage possible. You would use a separate function that has nothing to do with sleep to know when a particular amount of time has passed, and then move the projectile one pixel across the screen - at intervals of, say, 1/10th of a millisecond, or 100 microseconds.
The pseudo-code would go something like this.
while (timer1 < 100 microseconds) {
sleep(0);
}
if (timer2 >=100 microseconds) {
move projectile one pixel
}
//Rest of code in iteration here
I know the answer may not work for advanced issues or programs but may work for some or many programs.
If the machine is running Windows 10 version 1803 or later then you can use CreateWaitableTimerExW with the CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag.
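A sketch of what that looks like (requires a recent SDK for the CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag). The due time is relative (negative) and in 100-nanosecond units, so -5000 means "in 500 microseconds":

#include <windows.h>

void sleep_us_highres(long long usec) {
    HANDLE timer = CreateWaitableTimerExW(
        nullptr, nullptr,
        CREATE_WAITABLE_TIMER_HIGH_RESOLUTION,
        TIMER_ALL_ACCESS);
    if (!timer) return;                      // e.g. flag unsupported before Windows 10 1803

    LARGE_INTEGER due;
    due.QuadPart = -(usec * 10);             // relative due time in 100 ns units
    SetWaitableTimer(timer, &due, 0, nullptr, nullptr, FALSE);
    WaitForSingleObject(timer, INFINITE);
    CloseHandle(timer);
}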
On Windows the use of select forces you to include the Winsock library which has to be initialized like this in your application:
WORD wVersionRequested = MAKEWORD(1,0);
WSADATA wsaData;
WSAStartup(wVersionRequested, &wsaData);
And then select won't let you call it without any socket, so you have to do a little more to create a microsleep method:
int usleep(long usec)
{
    struct timeval tv;
    fd_set dummy;
    SOCKET s = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    FD_ZERO(&dummy);
    FD_SET(s, &dummy);
    tv.tv_sec = usec / 1000000L;
    tv.tv_usec = usec % 1000000L;
    return select(0, 0, 0, &dummy, &tv);
}
All these home-made usleep methods return zero when successful and non-zero on error.