Looking at several videos and the documentation example, we unlock the mutex before calling notify_all(). Would it be better to instead unlock it after the notify_all()?
The common way:
Inside the Notifier thread:
//prepare data for several worker-threads;
//and now, awaken the threads:
std::unique_lock<std::mutex> lock2(sharedMutex);
_threadsCanAwaken = true;
lock2.unlock();
_conditionVar.notify_all(); //awaken all the worker threads;
//wait until all threads completed;
//cleanup:
_threadsCanAwaken = false;
//prepare new batches once again, etc, etc
Inside one of the worker threads:
while(true){
// wait for the next batch:
std::unique_lock<std::mutex> lock1(sharedMutex);
_conditionVar.wait(lock1, [](){ return _threadsCanAwaken; });
lock1.unlock(); //let sibling worker-threads work on their part as well
//perform the final task
//signal the notifier that one more thread has completed;
//loop back and wait until the next task
}
Notice how lock2 is unlocked before we notify the condition variable - should we instead unlock it after the notify_all()?
Edit
From my comment below: My concern is that, what if the worker spuriously wakes, sees that the mutex is unlocked, super-quickly completes the task, and loops back to the start of the while? Now the slow-poke Notifier finally calls notify_all(), causing the worker to loop an additional time (excessive and undesired).
There are no advantages to unlocking the mutex before signaling the condition variable unless your implementation is unusual. There are two disadvantages to unlocking before signaling:
If you unlock before you signal, the signal may wake a thread that chose to block on the condition variable after you unlocked. This can lead to a deadlock if you use the same condition variable to signal more than one logical condition. This kind of bug is hard to create, hard to diagnose, and hard to understand. It is trivially avoided by always signaling before unlocking. This ensures that the change of shared state and the signal are an atomic operation and that race conditions and deadlocks are impossible.
There is a performance penalty for unlocking before signaling that is avoided by unlocking after signaling. If you signal before you unlock, a good implementation will know that your signal cannot possibly render any thread ready-to-run, because the mutex is held by the calling thread and any thread affected by the condition variable necessarily cannot make forward progress without the mutex. This permits a significant optimization (often called "wait morphing") that is not possible if you unlock first.
So signal while holding the lock unless you have some unusual reason to do otherwise.
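To make the recommended order concrete, here is a minimal sketch of the notifier side (hypothetical names, simplified from the question's code):

#include <condition_variable>
#include <mutex>

std::mutex sharedMutex;
std::condition_variable conditionVar;
bool threadsCanAwaken = false; // protected by sharedMutex

void notifier()
{
    std::unique_lock<std::mutex> lock(sharedMutex);
    threadsCanAwaken = true;   // change the shared state...
    conditionVar.notify_all(); // ...and signal while still holding the lock,
                               // so the state change and the signal are atomic
}                              // the lock is released here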
should we instead unlock it after the notify_all() ?
It is correct to do it either way, but you may get different behavior in different situations. It is quite difficult to predict how it will affect the performance of your program - I've seen both positive and negative effects for different applications. So it is better to profile your program and make a decision for your particular situation based on the profiling.
As mentioned here: cppreference.com
The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock.
That said, the documentation for wait says:
At the moment of blocking the thread, the function automatically calls lck.unlock(), allowing other locked threads to continue. Once notified (explicitly, by some other thread), the function unblocks and calls lck.lock(), leaving lck in the same state as when the function was called. Then the function returns (notice that this last mutex locking may block the thread again before returning).
So when notified, wait will re-attempt to acquire the lock, and in that process it will block again until the original notifying thread releases the lock.
So I suggest releasing the lock before calling notify, as done in the example on cppreference.com, and most importantly:
Don't be Pessimistic.
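For comparison, a minimal sketch of the unlock-first pattern this answer suggests (hypothetical names):

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool ready = false; // protected by m

void notifier()
{
    {
        std::lock_guard<std::mutex> lock(m);
        ready = true;    // change the shared state under the lock
    }                    // release the lock first...
    cv.notify_all();     // ...then notify, so a woken thread can acquire
                         // the mutex without blocking on the notifier
}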
David's answer seems wrong to me.
First, assuming the simple case of two threads, one waiting for the other on a condition variable: if the notifier unlocks first, that by itself will not wake the waiting thread, as the signal has not arrived yet. The notify call that follows will then immediately wake the waiting thread. You do not need any special optimizations.
On the other hand, signalling first has the potential of waking up a thread only for it to go back to sleep immediately, as it cannot acquire the lock - unless wait morphing is implemented.
Wait morphing does not exist in Linux at least, according to the answer under this StackOverflow question: Which OS / platforms implement wait morphing optimization?
The cppreference example also unlocks first before signalling: https://en.cppreference.com/w/cpp/thread/condition_variable/notify_all
It explicitly says:
The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s). Doing so may be a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock, though some implementations recognize the pattern and do not attempt to wake up the thread that is notified under lock.
should we instead unlock it after the notify_all()?
After reading several related posts, I've formed the opinion that it's purely a performance issue. If the OS supports "wait morphing", unlock after notifying; otherwise, unlock before.
I'm adding an answer here to augment that of @DavidSchwartz. Particularly, I'd like to clarify his point 1.
If you unlock before you signal, the signal may wake a thread that chose to block on the condition variable after you unlocked. This can lead to a deadlock if you use the same condition variable to signal more than one logical condition. This kind of bug is hard to create, hard to diagnose, and hard to understand. It is trivially avoided by always signaling before unlocking. This ensures that the change of shared state and the signal are an atomic operation and that race conditions and deadlocks are impossible.
The first thing I said is that, because it's a CV and not a mutex, a better term for the so-called "deadlock" might be "sleep paralysis": a mistake some programs make is that a thread that's supposed to wake goes back to sleep because it did not recheck the condition it has been waiting for before waiting again.
The second thing is that, when waking some other thread(s), the default choice should be broadcast/notify_all (broadcast is the POSIX term, equivalent to its C++ counterpart). signal/notify_one is an optimized special case for when only one other thread is waiting.
Third and finally, David is adamant that it's better to unlock after notifying, because it avoids the "deadlock" which I've been referring to as "sleep paralysis".
If you unlock and then notify, there's a window where another thread (let's call it the "wrong" thread) may (i) acquire the mutex, (ii) go into wait, and (iii) wake up. If steps (i), (ii), and (iii) happen quickly enough, the wrong thread consumes the signal, leaving the intended ("correct") thread asleep.
I discussed this extensively with David; he clarified that "sleep paralysis" occurs only when all three points are violated: (1) the condvar is associated with several separate conditions and/or the condition isn't rechecked before waiting again; (2) only one thread is signaled/notified when more than one thread is using the condvar; (3) unlocking before notifying creates a window for a race condition.
Finally, my recommendation is that points 1 and 2 are essential for the correctness of the program, and fixing issues associated with 1 and 2 should be prioritized over 3, which should only be an augmentative "last resort".
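To make points 1 and 2 concrete, here is a minimal sketch (hypothetical names); the predicate recheck inside wait is what prevents the "sleep paralysis" described above:

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool taskReady = false; // one condvar per logical condition (point 1)

void worker()
{
    std::unique_lock<std::mutex> lock(m);
    // The predicate overload rechecks the condition on every wakeup, so a
    // spurious or consumed signal cannot strand this thread (point 1).
    cv.wait(lock, []{ return taskReady; });
    // ... perform the task ...
}

void notifier()
{
    {
        std::lock_guard<std::mutex> lock(m);
        taskReady = true;
    }
    cv.notify_all(); // broadcast as the default choice (point 2)
}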
For the purpose of providing a reference, the manpages for signal/broadcast and wait contain some info from version 3 of the Single Unix Specification that gives some explanation of points 1 and 2, and partly of 3. Although specified for POSIX/Unix/Linux in C, its concepts are applicable to C++.
As of this writing (2023-01-31), the 2018 edition of version 4 of the Single Unix Specification has been released, and the drafting of version 5 is underway.
Related
I have a program structured like this: one thread receives tasks and writes them to an input queue, multiple threads process them and write to an output queue, and one thread responds with results from it. When the queue is empty, a thread sleeps for several milliseconds. The queue has a mutex inside it; pushing does lock(), and popping does try_lock() and returns if there is nothing in the queue.
This is the processing thread, for example:
//working - atomic bool
while (working) {
if (!inputQue_->pop(msg)) {
std::this_thread::sleep_for(std::chrono::milliseconds(200));
continue;
} else {
string reply = messageHandler_->handle(msg);
if (!reply.empty()) {
outputQue_->push(reply);
}
}
}
And the thing that I don't like is that the time from receiving a task to responding, as I have measured with high_resolution_clock, is almost 0 when there is no sleeping; when there is sleeping, it becomes bigger.
I don't want CPU resources to be wasted, so I want to do something like this: when the receiving thread gets a task, it notifies one of the processing threads, which does wait_for; and when processing a task is done, it notifies the responding thread the same way. As a result, I think less time will be spent and CPU resources will not be wasted. And I have some questions:
Will this work the way that I expect, with the only difference being waking up on notify?
To do this, do I have to create 2 condition variables: the first shared between the receiving thread and all processing threads, the second shared between all processing threads and the responding thread? And does the mutex in the processing threads have to be common for all of them, or unique to each?
Can I place the creation of the unique_lock(mutex) and the wait_for() in the if branch, just instead of the sleep_for?
If some processing threads are busy, is it possible that notify_one() will try to wake up one of them, but not the free thread? Do I need to use notify_all()?
Is it possible that notify will not wake up any of the threads? If yes, does it have a high probability?
Will this work the way that I expect, with the only difference being waking up on notify?
Yes, assuming you do it correctly.
To do this, do I have to create 2 condition variables: the first shared between the receiving thread and all processing threads, the second shared between all processing threads and the responding thread? And does the mutex in the processing threads have to be common for all of them, or unique to each?
You can use a single mutex and a single condition variable, but that makes it a bit more complex. I'd suggest a single mutex, but one condition variable for each condition a thread might want to wait for.
Can I place the creation of the unique_lock(mutex) and the wait_for() in the if branch, just instead of the sleep_for?
Absolutely not. You need to hold the mutex while you check whether the queue is empty and continue to hold it until you call wait_for. Otherwise, you destroy the entire logic of the condition variable. The mutex associated with the condition variable must protect the condition that the thread is going to wait for, which in this case is the queue being non-empty.
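As a hedged sketch of what that looks like for the asker's queue (the names blockingPop, inputQue, queueMutex, and queueCv are hypothetical), the emptiness check and the wait happen under one lock:

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

std::queue<std::string> inputQue;
std::mutex queueMutex;
std::condition_variable queueCv;
bool working = true; // protected by queueMutex in this sketch

// The mutex is held while checking whether the queue is empty and is
// still held when wait() is entered; wait() releases it atomically.
bool blockingPop(std::string& msg)
{
    std::unique_lock<std::mutex> lock(queueMutex);
    queueCv.wait(lock, []{ return !inputQue.empty() || !working; });
    if (inputQue.empty())
        return false; // woken up for shutdown
    msg = std::move(inputQue.front());
    inputQue.pop();
    return true;
}

// Producer side: push under the lock, then notify one waiting worker.
void push(std::string msg)
{
    {
        std::lock_guard<std::mutex> lock(queueMutex);
        inputQue.push(std::move(msg));
    }
    queueCv.notify_one();
}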
If some processing threads are busy, is it possible that notify_one() will try to wake up one of them, but not the free thread? Do I need to use notify_all()?
I don't know what you mean by the "free thread". As a general rule, you can use notify_one if it's not possible for a thread to be blocked on the condition variable that can't handle the condition. You should use notify_all if either more than one thread might need to be awoken or there's a possibility that more than one thread will be blocked on the condition variable and the "wrong thread" could be woken, that is, there could be at least one thread that can't do whatever it is that needs to be done.
Is it possible that notify will not wake up any of the threads? If yes, does it have a high probability?
Sure, it's quite possible. But that would mean no threads were blocked on the condition variable. And in that case no thread can go on to block on it and miss the notification, because threads must check the condition before they wait, and they do so while holding the mutex. Providing this atomic "unlock and wait" semantic is the entire purpose of a condition variable.
The mechanism you have is called polling. The thread repeatedly checks (polls) if there is data available. As you mentioned, it has the drawback of wasting time. (But it is simple). What you mentioned you would like to use is called a blocking mechanism. This deschedules the thread until the moment that work becomes available.
1) Yes (although I don't know exactly what you're imagining)
2) a) Yes, 2 condition variables is one way to do it. b) Common mutex is best
3) You would probably place those within pop, and calling pop would have the potential to block.
4) No. notify_one will only wake a thread that is currently waiting from having called wait. Also, if multiple are waiting, it is not necessarily guaranteed which will receive the notification. (OS/library dependent)
5) No. If 1+ threads are waiting, notify_one is guaranteed to wake one. BUT if no threads are waiting, the notification is consumed (and has no effect). Note that under certain edge conditions, notify_one may actually wake more than one. Also, a thread may wake from wait without anyone having called notify_one (a "spurious wakeup"). The fact that this can happen at all means that you always have to do additional checking for it.
This is called the producer/consumer problem btw.
In general, your considerations about condition variables are correct. My proposal is more about the design and reusability of such functionality.
The main idea is to implement the ThreadPool pattern, which has a constructor taking the number of worker threads and the methods submitTask, shutdown, and join.
Having such a class, you will use 2 instances of pools: one multithreaded for processing, and a second (single-threaded, by your choice) for result sending.
The pool consists of a blocking queue of Tasks and an array of worker threads, each performing the same "pop Task and run" loop. The blocking queue encapsulates the mutex and cond_var. The Task is a common functor.
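A minimal sketch of such a blocking queue (hypothetical and stripped down to the mutex/cond_var essentials; a real one would also need shutdown support):

#include <condition_variable>
#include <mutex>
#include <queue>

template <typename Task>
class BlockingQueue {
public:
    void push(Task t)
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(t));
        }
        cv_.notify_one(); // wake one idle worker
    }

    Task pop() // blocks until a task is available
    {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this]{ return !q_.empty(); });
        Task t = std::move(q_.front());
        q_.pop();
        return t;
    }

private:
    std::queue<Task> q_;
    std::mutex m_;
    std::condition_variable cv_;
};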
This also brings your design toward a Task-oriented approach, which has a lot of advantages for the future of your application.
You are welcome to ask more questions about implementation details if you like this idea.
Best regards, Daniel
On my never-ending quest to understand std::condition_variable I've run into the following. On this page it says the following:
void print_id (int id) {
std::unique_lock<std::mutex> lck(mtx);
while (!ready) cv.wait(lck);
// ...
std::cout << "thread " << id << '\n';
}
And after that it says this:
void go() {
std::unique_lock<std::mutex> lck(mtx);
ready = true;
cv.notify_all();
}
Now as I understand it, both of these functions will halt on the std::unique_lock line until a unique lock is acquired - that is, until no other thread holds the lock.
So say the print_id function is executed first. The unique lock will be acquired and the function will halt on the wait line.
If the go function is then executed (on a separate thread), the code there will halt on the unique lock line, since the mutex is locked by the print_id function already.
Obviously this wouldn't work if the code was like that. But I really don't see what I'm not getting here. So please enlighten me.
What you're missing is that wait unlocks the mutex and then waits for the signal on cv.
It locks the mutex again before returning.
You could have found this out by clicking on wait on the page where you found the example:
At the moment of blocking the thread, the function automatically calls lck.unlock(), allowing other locked threads to continue.
Once notified (explicitly, by some other thread), the function unblocks and calls lck.lock(), leaving lck in the same state as when the function was called.
There's one point you've missed—calling wait() unlocks the mutex. The thread atomically (releases the mutex + goes to sleep). Then, when woken by the signal, it tries to re-acquire the mutex (possibly blocking); once it acquires it, it can proceed.
Notice that it's not necessary to have the mutex locked for calling notify_*, only for wait*
To answer the question as posed, which seems necessary given the claims that you should not acquire a lock on notification for performance reasons (isn't correctness more important than performance?): the necessity to lock on "wait" and the recommendation to always lock around "notify" is to protect the user from himself and his program from data and logical races.
Without the lock in "go", the program you posted would immediately have a data race on "ready". However, even if ready were itself synchronized (e.g. atomic) you would have a logical race with a missed notification, because without the lock in "go" it is possible for the notify to occur just after the check for "ready" and just before the actual wait, and the waiting thread may then remain blocked indefinitely. The synchronization on the atomic variable itself is not enough to prevent this. This is why helgrind will warn when a notification is done without holding the lock.
There are some fringe cases where the mutex lock is really not required around the notify. In all of these cases, there needs to be a bidirectional synchronization beforehand so that the producing thread can know for sure that the other thread is already waiting. IMO these cases are for experts only. Actually, I have seen an expert, giving a talk about multi-threading, getting this wrong; he thought an atomic counter would suffice.
That said, the lock around the wait is always necessary for correctness (or, at least, an operation that is atomic with the wait), and this is why the standard library enforces it and atomically unlocks the mutex on entering the wait.
POSIX condition variables are, unlike Windows events, not "idiot-proof" because they are stateless (apart from being aware of waiting threads). The recommendation to use a lock on the notify is there to protect you from the worst and most common screwups. You can build a Windows-like stateful event using a mutex + condition var + bool variable if you like, of course.
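As a sketch of that last suggestion (a hypothetical, simplified manual-reset event, not a drop-in Windows replacement):

#include <condition_variable>
#include <mutex>

class Event {
public:
    void set() // "sticky": waiters arriving after set() still see it
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            signaled_ = true;
        }
        cv_.notify_all();
    }

    void reset()
    {
        std::lock_guard<std::mutex> lock(m_);
        signaled_ = false;
    }

    void wait()
    {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this]{ return signaled_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};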
I want to know how it is possible to wait for a piece of work to be done and then continue and create a new one:
while(!stop)
{
CreateWork();
waitForWorkToDone();
}
The wait must not keep the calling thread busy.
How can I achieve this?
To achieve this, you can rely on the operating system providing a facility to block until notified with or without a timeout. Thus, your thread correctly does not use unnecessary CPU cycles by performing a busy wait, but is still able to respond to program state changes. With POSIX threads, you can use a condition timed wait. I'll illustrate with the boost implementation, but the concept extends generally.
do
{
boost::unique_lock<boost::mutex> lock(state_change_mutex);
boost::system_time const timeout = boost::get_system_time() + boost::posix_time::seconds(5);
state_change_cond.timed_wait(lock,timeout);
...
} while(!done);
Overall this thread will loop until the done sentinel value becomes true. Other threads can signal this thread by calling
state_change_cond.notify_all();
Or in this example if no signal happens in 5 seconds then the thread wakes up by itself.
Note that condition variables require locking with a mutex. This guarantees that the thread is awoken atomically and that it behaves correctly in a mutually exclusive section, which inter-thread signaling implicitly is.
How about creating a signal? Create a handler that runs CreateWork() and signals when the job is done. Just a suggestion.
void WorkHandler::addWork(Work* w){
printf("WorkHandler::insertWork Thread, insertWork locking \n");
lock();
printf("WorkHandler::insertWork Locked, and inserting into queue \n");
m_workQueue.push(w);
signal();
unLock();
}
I followed a tutorial and I got this. I was wondering if it is OK to change the order of signal() and unLock(), like this:
void WorkHandler::addWork(Work* w){
printf("WorkHandler::insertWork Thread, insertWork locking \n");
lock();
printf("WorkHandler::insertWork Locked, and inserting into queue \n");
m_workQueue.push(w);
unLock();
signal();
}
If I can't do this, could you please give me details why I am not allowed to do this?
Thanks in advance.
First, there is no correctness issue here. Either order will work. Recall that whenever you use condition variables, you must loop on a predicate while waiting:
pthread_mutex_lock(&mutex);
while (!predicate)
    pthread_cond_wait(&cvar, &mutex);
pthread_mutex_unlock(&mutex);
By signalling after the unlock, you don't introduce any correctness issues; the thread is still guaranteed to wake up, and the worst case is that another wakeup comes first - at which point it sees that the predicate is true and proceeds.
However, there are two possible performance issues that can come up.
"Hurry up and wait". Basically, if you signal while the lock is held, the other thread still needs to wait until the mutex is available. Many pthreads implementations will, instead of waking up the other thread, simply move it to the wait queue of the mutex, saving an unnecessary wakeup->wait cycle. In some cases, however this is unimplemented or unavailable, leading to a potential spurious context switch or IPI.
Spurious wakeups. If you signal after the unlock, it's possible for another thread to issue another wakeup. Consider the following scenario:
Thread A starts waiting for items to be added to a threadsafe queue.
Thread B inserts an item on the queue. After unlocking the queue, but before it issues the signal, a context switch occurs.
Thread C inserts an item on the queue, and issues the cvar signal.
Thread A wakes up, and processes both items. It then goes back to waiting on the queue.
Thread B resumes, and signals the cvar.
Thread A wakes up, then immediately goes back to sleep, because the queue is empty.
As you can see, this can introduce a spurious wakeup, which might waste some CPU time.
Personally, I don't think it's worth worrying too much about it either way. You don't often know offhand whether your implementation supports moving waiters from the condition variable to the mutex wait queue, which is the only real criterion you could use to decide which to use.
My gut feeling would be that, if I had to choose, signalling after the unlock is marginally less likely to introduce an inefficiency, as the inefficiency requires a three-thread race, rather than a two-thread race for the "hurry up and wait" condition. However, this is not really worth worrying about, unless benchmarks show too much context switch overhead or something.
This article is really worth reading for your question:
Signal with mutexed or not?
Assuming you use the same mutex with the condition variable to make the condition change atomic, there are two cases whose behavior you should know:
Waiting on the condition variable while holding the mutex: the thread joins the condition variable's queue and then goes to sleep.
Being signaled while the notifier still holds the mutex: in this case the woken thread won't go back to sleep on the condition variable, but will block on the mutex. (A mistake I made here was thinking it would sleep again - that if the producer signals and a context switch happens right before it releases the mutex, all the threads would wake up, see that they can't lock the mutex, and go to sleep forever. This is wrong: they don't sleep, they wait, blocked on the mutex.)
Some pthreads implementations use wait morphing: instead of waking threads upon signaling, they simply transfer them from the condition variable's queue to the attached mutex's queue. With such an implementation, signaling while holding the lock is preferable and has little performance impact.
Signaling after unlocking the mutex, on the other hand, may cause spurious wakeups. If your code is not well designed to handle predicate changes made by spurious wakeups, you should choose to signal while holding the lock.
The answer to your question is "Yes". In fact, it's slightly preferable (as you've probably guessed) as it avoids the 'hurry up and wait' issue of waking up a thread to test a condition only to have it immediately block on the mutex it needs to acquire before testing the condition.
This answer is predicated on the guess that these things hold true:
lock is a thin wrapper for pthread_mutex_lock.
unLock is a thin wrapper for pthread_mutex_unlock.
signal is a thin wrapper for pthread_cond_signal.
The mutex you're locking and unlocking is the one that you're giving to pthread_cond_wait.
Is there any downside to calling pthread_cond_timedwait without taking a lock on the associated mutex first, and also not taking a mutex lock when calling pthread_cond_signal?
In my case there is really no condition to check, I want a behavior very similar to Java wait(long) and notify().
According to the documentation, there can be "unpredictable scheduling behavior". I am not sure what that means.
An example program seems to work fine without locking the mutexes first.
The first is not OK:
The pthread_cond_timedwait() and pthread_cond_wait() functions shall block on a condition variable. They shall be called with mutex locked by the calling thread or undefined behavior results.
http://opengroup.org/onlinepubs/009695399/functions/pthread_cond_timedwait.html
The reason is that the implementation may want to rely on the mutex being locked in order to safely add you to a waiter list. And it may want to release the mutex without first checking it is held.
The second is disturbing:
if predictable scheduling behaviour is required, then that mutex is locked by the thread calling pthread_cond_signal() or pthread_cond_broadcast().
http://www.opengroup.org/onlinepubs/007908775/xsh/pthread_cond_signal.html
Off the top of my head, I'm not sure what the specific race condition is that messes up scheduler behaviour if you signal without taking the lock. So I don't know how bad the undefined scheduler behaviour can get: for instance maybe with broadcast the waiters just don't get the lock in priority order (or however your particular scheduler normally behaves). Or maybe waiters can get "lost".
Generally, though, with a condition variable you want to set the condition (at least a flag) and signal, rather than just signal, and for this you need to take the mutex. The reason is that otherwise, if you're concurrent with another thread calling wait(), then you get completely different behaviour according to whether wait() or signal() wins: if the signal() sneaks in first, then you'll wait for the full timeout even though the signal you care about has already happened. That's rarely what users of condition variables want, but may be fine for you. Perhaps this is what the docs mean by "unpredictable scheduler behaviour" - suddenly the timeslice becomes critical to the behaviour of your program.
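A hedged sketch of the set-the-condition-then-signal pattern described above, written with the C++ equivalents (hypothetical names); because the flag is set under the lock, a signal that "sneaks in first" is still visible to the waiter:

#include <chrono>
#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool happened = false; // the condition, protected by m

void notify()
{
    {
        std::lock_guard<std::mutex> lock(m);
        happened = true; // record the event before signaling
    }
    cv.notify_all();
}

// Returns false on timeout. If notify() ran before we got here, the flag
// is already true and we return at once instead of waiting the full timeout.
bool waitFor(std::chrono::milliseconds timeout)
{
    std::unique_lock<std::mutex> lock(m);
    return cv.wait_for(lock, timeout, []{ return happened; });
}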
Btw, in Java you have to have the lock in order to notify() or notifyAll():
This method should only be called by a thread that is the owner of this object's monitor.
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Object.html#notify()
The Java synchronized {/}/wait/notifty/notifyAll behaviour is analogous to pthread_mutex_lock/pthread_mutex_unlock/pthread_cond_wait/pthread_cond_signal/pthread_cond_broadcast, and not by coincidence.
Butenhof's excellent "Programming with POSIX Threads" discusses this right at the end of chapter 3.3.3.
Basically, signalling the condvar without locking the mutex is a potential performance optimisation: if the signalling thread has the mutex locked, then the thread waking on the condvar has to immediately block on the mutex that the signalling thread has locked even if the signalling thread is not modifying any of the data the waiting thread will use.
The reason that "unpredictable scheduler behavior" is mentioned is that if you have a high-priority thread waiting on the condvar (which another thread is going to signal to wake up the high-priority thread), any other lower-priority thread can come and lock the mutex, so that when the condvar is signalled and the high-priority thread is awakened, it has to wait on the lower-priority thread to release the mutex. If the mutex is locked whilst signalling, then the higher-priority thread will be scheduled on the mutex before the lower-priority thread: basically you know that when you "awaken" the high-priority thread it will awaken as soon as the scheduler allows it (of course, you might have to wait on the mutex before signalling the high-priority thread, but that's a different issue).
The point of waiting on a condition variable paired with a mutex is to atomically enter the wait and release the lock, i.e. allow other threads to modify the protected state, then again atomically receive the notification of the state change and acquire the lock. What you describe can be done with many other methods like pipes, sockets, signals, or - probably the most appropriate - semaphores.
I think this should work (note untested code):
// requires <semaphore.h>, <sys/timeb.h>, <errno.h>
// initialize a semaphore
sem_t sem;
sem_init(&sem,
0, // not shared
0 // initial value of 0
);
// thread A: wait with a timeout of msecs milliseconds (msecs supplied by the caller)
struct timespec tm;
struct timeb tp;
const long sec = msecs / 1000;
const long millisec = msecs % 1000;
ftime(&tp);
tp.time += sec;
tp.millitm += millisec;
if(tp.millitm > 999) {
tp.millitm -= 1000;
tp.time++;
}
tm.tv_sec = tp.time;
tm.tv_nsec = tp.millitm * 1000000;
// wait until timeout or woken up
errno = 0;
while((sem_timedwait(&sem, &tm)) == -1 && errno == EINTR) {
continue;
}
return errno == ETIMEDOUT; // returns true if a timeout occurred
// thread B
sem_post(&sem); // wake up Thread A early
Conditions should be signaled outside of the mutex whenever possible. Mutexes are a necessary evil in concurrent programming. Their use leads to contention which robs the system of the maximum performance that it can gain from the use of multiple processors.
The purpose of a mutex is to guard access to some shared variables in the program so that they behave atomically. When a signaling operation is done inside a mutex, hundreds of irrelevant machine cycles that have nothing to do with guarding the shared data are added to the critical section. Potentially, it calls from user space all the way into the kernel.
The notes about "predictable scheduler behavior" in the standard are completely bogus.
When we want the machine to execute statements in a predictable, well-defined order, the tool for that is the sequencing of statements within a single thread of execution: S1 ; S2. Statement S1 is "scheduled" before S2.
We use threads when we realize that some actions are independent and their scheduling order is not important, and there are performance benefits to be realized, like more timely response to real time events or computing on multiple processors.
At times when scheduling orders do become important among multiple threads, this falls under a concept called priority. Priority resolves what happens first when any one of N statements could potentially be scheduled to execute. Another tool for ordering under multithreading is queuing. Events are placed into a queue by one or more threads and a single service thread processes the events in the queue order.
The bottom line is, the placement of pthread_cond_broadcast is not an appropriate tool for controlling execution order. It will not make execution order predictable in the sense that the program suddenly has exactly the same, reproducible behavior on every platform.
"unpredictable scheduling behavior" means just that. You don't know what's going to happen.
Nor does the implementation. It could work as expected. It could crash your app. It could work fine for years, then a race condition could make your app go haywire. It could deadlock.
Basically, if any docs suggest that anything undefined/unpredictable can happen unless you do what the docs tell you to do, you had better do it. Else stuff might blow up in your face.
(And it won't blow up until you put the code into production, just to annoy you even more. At least, that's my experience.)