condition_variable workaround for wait_until with system time change - c++

I have a timer class which uses std::condition_variable::wait_until (I have also tried wait_for), with std::chrono::steady_clock, to wait until a specific time in the future.
steady_clock is meant to be monotonic, but there has been a long-standing issue where the wait actually uses the system clock and so fails to work correctly when the system time is changed.
It has been fixed in libstdc++ as discussed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861.
The issue is that the fix is still pretty new (~2019) and only available in GCC 10. I have some cross compilers that are only up to around GCC 8.
I am wondering if there is a way to get this fix into my versions of GCC (I have quite a few cross compilers)? But this might prove difficult to maintain if I have to re-build the cross compilers each time I update them.
So a better question might be: what is a solution to this issue until I can get all my tools up to GCC 10? How can I make my timer resistant to system time changes?
Updated notes
rustyx mentions the version of glibc needed is 2.30+ (more for my reference: use ldd --version to check that)
libstdc++ changelog showing the relevant entry for the fix, supplied by Daniel Langr: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/ChangeLog-2019#L2093
The required libstdc++ patch (supplied by rustyx): https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=ad4d1d21ad5c515ba90355d13b14cbb74262edd2

Create a data structure that contains a list of condition variables each with a use count protected by a mutex.
When a thread is about to block on a condition variable, first acquire the mutex and add the condition variable to the list (or bump its use count if it's already on the list).
When done blocking on the condition variable, have the thread again acquire the mutex that protects the list and decrement the use count of the condition variable it was blocked on. Remove the condition variable from the list if its use count drops to zero.
Have a dedicated thread to watch the system clock. If it detects a clock jump, acquire the mutex that protects the list of condition variables and broadcast every condition variable.
That's it. That solves the problem.
If necessary, you can also add a boolean to each entry in the list and set it to false when the entry is added. If the clock-watcher thread has broadcast the condition variable, have it set the bool to true so the woken threads will know why they were woken.
If you wish, you can just add the condition variable to the list when it's created and remove it from the list when it's destroyed. This will result in broadcasting condition variables no threads are blocked on if the clock jumps, but that's harmless.
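A minimal sketch of such a registry (the names here are mine, not from any library):

#include <condition_variable>
#include <map>
#include <mutex>

// Tracks every condition variable a thread is blocked on, with a use
// count, so a clock-watcher thread can broadcast all of them on a jump.
class CvRegistry {
public:
    void add(std::condition_variable* cv) {
        std::lock_guard<std::mutex> lk(m_);
        ++uses_[cv];                   // insert with count 1, or bump
    }
    void remove(std::condition_variable* cv) {
        std::lock_guard<std::mutex> lk(m_);
        if (--uses_[cv] == 0) uses_.erase(cv);
    }
    void broadcast_all() {             // called when a clock jump is seen
        std::lock_guard<std::mutex> lk(m_);
        for (auto& e : uses_) e.first->notify_all();
    }
private:
    std::mutex m_;
    std::map<std::condition_variable*, int> uses_;
};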
Here are some implementation suggestions:
Use a dedicated thread to watch the clock. An easy thing to look at is the offset between wall time and the system's uptime clock.
One simple thing to do is to keep a count of the number of time jumps observed and increment it each time you sense a time jump. When you wait for a condition, you can use the following logic:
Note the number of time jumps.
Block on the condition.
When you wake up, recheck the condition.
If the condition isn't satisfied, check the number of time jumps.
If the counts from steps 1 and 4 don't match, handle it as a time-jump wakeup.
You can wrap this all up so that there's no ugliness in the calling code. It just becomes another possible return value from your version of wait_for.
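Here is a hedged sketch of both pieces: a watcher that detects jumps by sampling the offset between the system clock and the steady clock, and a wait_for wrapper implementing steps 1-5. All names are illustrative, and recomputing the remaining timeout after a wakeup is omitted for brevity:

#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

std::atomic<unsigned> g_time_jumps{0};  // bumped by the watcher thread

// Watcher thread: sample the offset between the system (wall) clock and
// the steady clock; a sudden change in that offset means wall time jumped.
void clock_watcher() {
    using namespace std::chrono;
    auto offset = [] {
        return duration_cast<milliseconds>(system_clock::now().time_since_epoch())
             - duration_cast<milliseconds>(steady_clock::now().time_since_epoch());
    };
    auto last = offset();
    for (;;) {
        std::this_thread::sleep_for(milliseconds(100));
        auto now = offset();
        auto jump = now - last;
        if (jump < milliseconds(0)) jump = -jump;
        if (jump > seconds(1)) {
            ++g_time_jumps;
            // registry.broadcast_all();  // wake everyone, see sketch above
        }
        last = now;
    }
}

enum class WaitResult { ready, timeout, time_jump };

// Steps 1-5 from above, wrapped so a time jump is just another return value.
template <class Pred>
WaitResult wait_for_or_jump(std::condition_variable& cv,
                            std::unique_lock<std::mutex>& lk,
                            std::chrono::milliseconds dur, Pred pred) {
    unsigned before = g_time_jumps.load();               // 1. note jump count
    while (!pred()) {                                    // 2./3. block, recheck
        if (cv.wait_for(lk, dur) == std::cv_status::timeout)
            return pred() ? WaitResult::ready : WaitResult::timeout;
        if (!pred() && g_time_jumps.load() != before)    // 4./5. count mismatch
            return WaitResult::time_jump;
    }
    return WaitResult::ready;
}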

Related

Is it possible that a store with memory_order_relaxed never reaches other threads?

Suppose I have a thread A that writes to an atomic_int x = 0, using x.store(1, std::memory_order_relaxed);. Without any other synchronization methods, how long would it take before other threads can see this, using x.load(std::memory_order_relaxed);? Is it possible that the value written to x stays entirely thread-local, given the current definition of the C/C++ memory model that the standard gives?
The practical case that I have at hand is where a thread B frequently reads an atomic_bool to check if it has to quit; another thread, at some point, writes true to this bool and then calls join() on thread B. Clearly I do not mind calling join() before thread B has even seen that the atomic_bool was set, nor do I mind if thread B already saw the change and exited before I call join(). But I am wondering: using memory_order_relaxed on both sides, is it possible to call join() and block "forever" because the change is never propagated to thread B?
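Concretely, the pattern looks roughly like this (a minimal sketch of the setup described above):

#include <atomic>
#include <thread>

std::atomic<bool> quit{false};

int main() {
    std::thread b([] {
        while (!quit.load(std::memory_order_relaxed)) {
            // ... do a unit of work ...
        }
    });
    quit.store(true, std::memory_order_relaxed);  // some time later
    b.join();   // the question: can this block "forever"?
}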
Edit
I contacted Mark Batty (the brain behind mathematically verifying and subsequently fixing the C++ memory model requirements). Originally it was about something else (which turned out to be a known bug in cppmem and his thesis), so fortunately I didn't make a complete fool of myself, and I took the opportunity to ask him about this too. His answer was:
Q: Can it theoretically be that such a store [memory_order_relaxed without (any following) release operation] never reaches the other thread?
Mark: Theoretically, yes, but I don't think that has been observed.
Q: In other words, do relaxed stores make no sense whatsoever unless you combine them with some release operation (and acquire on the other thread), assuming you want another thread to see it?
Mark: Nearly all of the use cases for them do use release and acquire, yes.
This is all the standard has to say on the matter, I believe:
[intro.multithread]/25 An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.
In practice
Without any other synchronization methods, how long would it take before other threads can see this, using x.load(std::memory_order_relaxed);?
No time. It's a normal write: it goes to the store buffer, so it will be available in the L1d cache in less time than a blink. But that only holds once the assembly instruction is actually run.
Instructions can be reordered by the compiler, but no reasonable compiler would reorder an atomic operation across an arbitrarily long loop.
In theory
Q: Can it theoretically be that such a store [memory_order_relaxed without (any following) release operation] never reaches the other thread?
Mark: Theoretically, yes,
You should have asked him what would happen if the "following release fence" was added back, or with an atomic store-release operation.
Why wouldn't these be reordered and delayed a loooong time? (so long that it seems like an eternity in practice)
Is it possible that the value written to x stays entirely thread-local given the current definition of the C/C++ memory model that the standard gives?
If an imaginary and especially perverse implementation wanted to delay the visibility of atomic operations, why would it do that only for relaxed operations? It could well do it for all atomic operations.
Or never run some threads.
Or run some threads so slowly that you would believe they aren't running.
This is what the standard says in 29.3.12:
Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.
There is no guarantee that a store will become visible in another thread, no guaranteed timing, and no formal relationship with memory order.
Of course, on every regular architecture a store will become visible, but on rare platforms that do not support cache coherency it may never become visible to a load.
In that case, you would have to reach for an atomic read-modify-write operation to get the latest value in the modification order.

What is the best architecture to frequently communicate values between multiple threads?

I am writing an application in C++14 that consists of a master thread and multiple slave threads. The master thread coordinates the slave threads, which cooperatively perform a search, each exploring a part of the search space. A slave thread sometimes encounters a bound on the search. Then it communicates this bound to the master thread, which sends the bound to all other slave threads so that they can possibly narrow their searches.
A slave thread must very frequently check whether there is a new bound available, possibly at the entrance of a loop.
What would be the best way to communicate the bound to the slave threads? I can think of using std::atomic<int>, but I am afraid of the performance implications this has whenever the variable is read inside the loop.
The simplest way here is IMO not to overthink this. Just use a std::mutex for each thread, protecting a std::queue that the boundary information goes in. Have the main thread wait on a std::condition_variable that each child can signal after locking and writing to a "new boundary" queue; the main thread then wakes up and copies the value to each child one at a time. As you said in your question, at the top of their loops the child threads can check their thread-specific queue to see if there are additional bounding conditions.
You actually don't NEED the "main thread" in this. You could have the children write to all the other children's queues directly (still mutex-protected); as long as you're careful to avoid deadlock, it would work that way too.
All of these classes can be seen in the thread support library, with decent documentation here.
Yes, there are interrupt-based ways of doing things, but in this case polling is relatively cheap because it's not a lot of threads smashing on one mutex, but mostly thread-specific mutexes, and mutexes aren't all that expensive to lock, check, and unlock quickly. You're not "holding" on to them for long periods, and thus it's OK. It's a bit of a test really: do you NEED the additional complexity of lock-free? If it's only a dozen (or fewer) threads, then probably not.
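A rough sketch of that arrangement, assuming the bound is an int (all names here are mine):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <vector>

// One mutex-protected queue per slave thread for incoming bounds.
struct BoundQueue {
    std::mutex m;
    std::queue<int> q;

    void push(int bound) {
        std::lock_guard<std::mutex> lk(m);
        q.push(bound);
    }
    bool try_pop(int& bound) {      // non-blocking poll, top of search loop
        std::lock_guard<std::mutex> lk(m);
        if (q.empty()) return false;
        bound = q.front();
        q.pop();
        return true;
    }
};

std::vector<BoundQueue> slave_queues(8);   // one per slave thread

// Master side: slaves push here and signal; the master fans the bound out.
std::mutex master_m;
std::condition_variable master_cv;
std::queue<int> master_q;

void master_loop() {
    std::unique_lock<std::mutex> lk(master_m);
    for (;;) {
        master_cv.wait(lk, [] { return !master_q.empty(); });
        int bound = master_q.front();
        master_q.pop();
        for (auto& sq : slave_queues) sq.push(bound);  // fan out to slaves
    }
}

void slave_found_bound(int bound) {        // called from a slave thread
    {
        std::lock_guard<std::mutex> lk(master_m);
        master_q.push(bound);
    }
    master_cv.notify_one();
}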
Basically you could make a bet with your architecture that a single write to a primitive datatype is atomic. As you only have one writer, your program would not break if you used the volatile keyword to prevent the compiler optimizations that might otherwise keep updates in a register instead of writing them out.
However, everybody serious about doing things right(tm) will tell you otherwise. Have a look at this article to get a pretty good risk assessment: http://preshing.com/20130618/atomic-vs-non-atomic-operations/
So if you want to be on the safe side, which I recommend, you need to follow the C++ standard. As the C++ standard does not guarantee any atomicity even for the simplest operations, you are stuck with using std::atomic. But honestly, I don't think it is too bad: std::atomic<int> is lock-free on common platforms, and you can balance out the reading frequency with the benefit of knowing the new boundary early.
To avoid polling the atomic variable, you could use the POSIX signal mechanism to notify slave threads of an update (make sure it works with the platform you are programming for). Whether that benefits performance remains to be seen.
This is actually very simple. You only have to be aware of how things work to be confident the simple solution is not broken. So, what you need is two things:
1. Be sure the variable is written/read to/from memory every time you access it.
2. Be sure you read it in an atomic way, which means you have to read the full value in one go, or if it is not done naturally, have a cheap test to verify it.
To address #1, you have to declare it volatile. Make sure the volatile keyword is applied to the variable itself, not to its pointer or anything like that.
To address #2, it depends on the type. On x86/64, accesses to integer types are atomic as long as they are aligned to their size. That is, an int32_t has to be aligned to a 4-byte boundary, and an int64_t has to be aligned to an 8-byte boundary.
So you may have something like this:
struct Params {
    volatile uint64_t bound __attribute__((aligned(8)));
};
If your bounds variable is more complex (a struct) but still fits in 64 bits, you may union it with uint64_t and use the same attribute and volatile as above.
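For instance (a sketch of that union trick; note that type punning through a union is a documented GCC extension rather than strictly portable C++):

#include <stdint.h>

// Bounds that fit in 64 bits, unioned with a uint64_t so that one aligned
// 64-bit access moves the whole thing.
union Bounds {
    struct {
        int32_t lower;
        int32_t upper;
    } b;
    uint64_t raw;
};

struct Params {
    volatile Bounds bound __attribute__((aligned(8)));
};

// Writer: p.bound.raw = new_raw;   Readers: uint64_t v = p.bound.raw;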
If it's too big for 64 bits, you will need some sort of a lock to ensure you do not read a half-stale value. The best lock for your circumstances (single writer, multiple readers) is a sequence lock. A sequence lock is simply a volatile int, like above, that serves as the version of the data. Its value starts from 0 and advances by 2 on every update: you increment it by 1 before updating the protected value, and by 1 again afterwards. The net result is that even numbers are stable states and odd numbers are transient (value updating). In the readers you do this:
1. Read the version. If it hasn't changed since your last read, the data hasn't changed either - return.
2. Otherwise, read the version until you get an even number.
3. Read the protected variable.
4. Read the version again. If you get the same number as before - you're good.
5. Otherwise - go back to step 2.
This is actually one of the topics in my next article. I'll implement it in C++ and let you know. Meanwhile, you can look at the seqlock in the Linux kernel.
Another word of caution: you need compiler barriers between your memory accesses so that the compiler does not reorder things it really should not. This is how you do it in GCC:
asm volatile ("":::"memory");
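Putting it together, here is a sketch of the single-writer sequence lock described above, following the answer in using compiler barriers only (sufficient on x86/64; other architectures need real memory fences). The step-1 shortcut of returning early when the version hasn't changed since the last read is omitted:

#include <stdint.h>

struct Bounds { int64_t lower, upper; };  // too big for one atomic access

volatile uint32_t g_seq = 0;   // even = stable, odd = writer mid-update
Bounds g_bounds;

static inline void barrier() { asm volatile("" ::: "memory"); }

// Single writer.
void write_bounds(const Bounds& b) {
    g_seq = g_seq + 1;      // now odd: readers will retry
    barrier();
    g_bounds = b;
    barrier();
    g_seq = g_seq + 1;      // even again: new stable version
}

// Any number of readers.
Bounds read_bounds() {
    Bounds out;
    uint32_t seq;
    do {
        do { seq = g_seq; } while (seq & 1);  // wait for an even version
        barrier();
        out = g_bounds;
        barrier();
    } while (g_seq != seq);   // version moved during the read: retry
    return out;
}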

What could happen if two threads access the same bool variable at the same time?

I have a cross-platform C++ program where I'm using the Boost libraries to create an asynchronous timer.
I have a global variable:
bool receivedInput = false;
One thread waits for and processes input
string argStr;
while (1)
{
    getline(cin, argStr);
    processArguments(argStr);
    receivedInput = true;
}
The other thread runs a timer where a callback gets called every 10 seconds. In that callback, I check to see if I've received a message
if (receivedInput)
{
    //set up timer to fire again in 10 seconds
    receivedInput = false;
}
else
    exit(1);
So is this safe? For the read in thread 2, I think it wouldn't matter since the condition will evaluate to either true or false. But I'm unsure what would happen if both threads try to set receivedInput at the same time. I also made my timer 3x longer than the period I expect to receive input so I'm not worried about a race condition.
Edit:
To solve this I used boost::unique_lock when I set receivedInput and boost::shared_lock when I read receivedInput. I used an example from here
This is fundamentally unsafe. After thread 1 has written true to receivedInput, it isn't guaranteed that thread 2 will see the new value. For example, the compiler may optimize your code, making certain assumptions about the value of receivedInput at the time it is used as the if condition, or caching it in a register, so you are not guaranteed that main memory will actually be read when the if condition is evaluated. Also, both the compiler and the CPU may change the order of reads and writes for optimization; for example, true may be written to receivedInput before getline() and processArguments().
Moreover, relying on timing for synchronization is a very bad idea since often you have no guarantees as to the amount of CPU time each thread will get in a given time interval or whether it will be scheduled in a given time interval at all.
A common mistake is to think that making receivedInput volatile may help here. In fact, volatile guarantees that values are actually read/written to the main memory (instead of for example being cached in a register) and that reads and writes of the variable are ordered with respect to each other. However, it does not guarantee that the reads and writes of the volatile variable are ordered with respect to other instructions.
You need memory barriers or a proper synchronization mechanism for this to work as you expect.
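For example, with C++11 the minimal fix is to make the flag a std::atomic<bool> (a sketch; boost::atomic works the same way if C++11 isn't available):

#include <atomic>
#include <cstdlib>

std::atomic<bool> receivedInput{false};

void onInput() {                 // input thread, after processing a line
    receivedInput.store(true);   // sequentially consistent by default
}

void onTimer() {                 // timer callback, every 10 seconds
    // exchange() reads and clears the flag in one atomic step, so an
    // input arriving between the check and the reset cannot be lost.
    if (receivedInput.exchange(false)) {
        // set up timer to fire again in 10 seconds
    } else {
        std::exit(1);
    }
}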
You would have to check your threading standard. Assuming we're talking about POSIX threads, this is explicitly undefined behavior -- an object may not be accessed by one thread while another thread is or might be modifying it. Anything can happen.
If your threads use the value of receivedInput to control independent code blocks, but not to synchronize with each other, there is one simple solution:
add "volatile" before receivedInput, so the compiler will not apply the optimizations that prevent the threads from sharing the value of receivedInput.

How do I safely read a variable from one thread and modify it from another?

I have a class instance which is being used in multiple threads. I am updating multiple member variables from one thread and reading the same member variables from another thread. What is the correct way to maintain thread safety?
eg:
pthread_mutex_lock(&mutex1);
obj1.memberV1 = 1;
//unlock here?
Should I unlock the mutex over here? (If another thread accesses obj1's member variables 1 and 2 now, the data it sees might not be correct, because memberV2 has not yet been updated. However, if I do not release the lock, the other thread might block, because there is a time-consuming operation below.)
//perform some time consuming operation which must be done before the assignment to memberV2 and after the assignment to memberV1
obj1.memberV2 = update field 2 from some calculation
pthread_mutex_unlock(&mutex1); //should I only unlock here?
Thanks
Your locking is correct. You should not release the lock early just to allow another thread to proceed (because that would allow the other thread to see the object in an inconsistent state.)
Perhaps it would be better to do something like:
//perform time consuming calculation
pthread_mutex_lock(&mutex1);
obj1.memberV1 = 1;
obj1.memberV2 = result;
pthread_mutex_unlock(&mutex1);
This of course assumes that the values used in the calculation won't be modified on any other thread.
It's hard to tell what you are doing that is causing problems. The mutex pattern is pretty simple: you lock the mutex, access the shared data, unlock the mutex. This protects the data, because the mutex will only let one thread get the lock at a time. Any thread that fails to get the lock has to wait till the mutex is unlocked. Unlocking wakes the waiters up. They will then fight to attain the lock; losers go back to sleep. The time it takes to wake up might be multiple ms or more from the time the lock is released. Make sure you always unlock the mutex eventually.
Make sure you don't keep locks locked for a long period of time. Most of the time, a long period of time is like a microsecond. I prefer to keep it down around "a few lines of code." That's why people have suggested that you do the long-running calculation outside the lock. The reason for not holding locks a long time is that you increase the number of times other threads will hit the lock and have to spin or sleep, which decreases performance. You also increase the probability that your thread might be pre-empted while owning the lock, which means the lock is held while that thread sleeps. That's even worse for performance.
Threads that fail a lock don't have to sleep. Spinning means a thread encountering a locked mutex doesn't sleep, but loops repeatedly testing the lock for a predefined period before giving up and sleeping. This is a good idea if you have multiple cores, or cores capable of multiple simultaneous threads. Multiple active threads mean two threads can be executing the code at the same time. If the lock is around a small amount of code, then the thread that got the lock is going to be done real soon, and the other thread need only wait a couple of nanoseconds before it gets the lock. Remember, sleeping your thread is a context switch, plus the code to attach your thread to the waiters on the mutex; all of that has a cost. And once your thread sleeps, you have to wait for the scheduler to wake it up, which could take multiple ms. Look up spinlocks.
If you only have one core, then if a thread encounters a lock it means another, currently sleeping, thread owns it, and no matter how long you spin it ain't going to unlock. So you would use a lock that puts a waiter to sleep immediately, in hopes that the thread owning the lock will wake up and finish.
You should assume that a thread can be preempted at any machine-code instruction. Also, you should assume that each line of C code is probably many machine-code instructions. The classic example is i++. This is one statement in C, but a read, an increment, and a store in machine-code land.
If you really care about performance, try to use atomic operations first. Look to mutexes as a last resort. Most concurrency problems are easily solved with atomic operations (google gcc atomic operations to start learning) and very few problems really need mutexes. Mutexes are way way way slower.
Protect your shared data wherever it is written and wherever it is read. else...prepare for failure. You don't have to protect shared data during periods of time when only a single thread is active.
It's often useful to be able to run your app with 1 thread as well as N threads. That way you can debug race conditions more easily.
Minimize the shared data that you protect with locks. Try to organize the data into structures such that a single thread can gain exclusive access to the entire structure (perhaps by setting a single locked flag, or a version number, or both) and not have to worry about anything after that. Then most of the code isn't cluttered with locks and race conditions.
Functions that ultimately write to shared variables should use temp variables until the last moment and then copy the results in. Not only will the compiler generate better code, but accesses to shared variables, especially changes to them, cause cache-line traffic between L2 and main RAM and all sorts of other performance issues. Again, if you don't care about performance, disregard this. However, I recommend you google the paper "What Every Programmer Should Know About Memory" if you want to know more.
If you are reading a single variable from the shared data you probably don't need to lock as long as the variable is an integer type and not a member of a bitfield (bitfield members are read/written with multiple instructions). Read up on atomic operations. When you need to deal with multiple values, then you need a lock to make sure you didn't read version A of one value, get preempted, and then read version B of the next value. Same holds true for writing.
You will find that copies of data, even copies of entire structures, come in handy. You can be working on building a new copy of the data and then swap it in by changing a pointer with one atomic operation. You can make a copy of the data and then do calculations on it without worrying whether it changes.
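For instance, the pointer-swap idea with std::atomic (a sketch; safely reclaiming the old copy is a separate problem (hazard pointers, RCU, shared_ptr) and is deliberately ignored here):

#include <atomic>

struct Data {
    int bound = 0;
    // ... more fields ...
};

std::atomic<Data*> g_current{new Data};

// Writer: build a complete new copy privately, then publish it with one
// atomic pointer swap.
void update(int new_bound) {
    Data* fresh = new Data(*g_current.load());  // copy the current snapshot
    fresh->bound = new_bound;                   // modify at leisure, no lock
    Data* old = g_current.exchange(fresh);      // publish atomically
    (void)old;  // NOTE: freeing 'old' safely needs a reclamation scheme
}

// Reader: grab the current snapshot and use it without locking.
int read_bound() {
    Data* snap = g_current.load();
    return snap->bound;   // *snap never changes once published
}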
So maybe what you want to do is:
lock the mutex
Make a copy of the input data to the long running calculation.
unlock the mutex
L1: Do the calculation
Lock the mutex
if the input data has changed and this matters
read the input data, unlock the mutex and go to L1
update the data
unlock mutex
Maybe, in the example above, you still store the result if the input changed, but go back and recalculate. It depends on whether other threads can use a slightly out-of-date answer. Maybe other threads, when they see that a thread is already doing the calculation, simply change the input data and leave it to the busy thread to notice that and redo the calculation (there will be a race condition you need to handle if you do that, an easy one). That way the other threads can do other work rather than just sleep.
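Expressed with the same pthread calls as the question, that recipe might look like this (a sketch; long_calculation and the int data are placeholders):

#include <pthread.h>

pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
int shared_input = 0;    // protected by mutex1
int shared_result = 0;   // protected by mutex1

static int long_calculation(int input) {
    // ... the slow part, done with no lock held ...
    return input * 2;
}

void* worker(void*) {
    pthread_mutex_lock(&mutex1);
    int input = shared_input;               // copy the input
    pthread_mutex_unlock(&mutex1);

    for (;;) {
        int result = long_calculation(input);   // L1: no lock held here

        pthread_mutex_lock(&mutex1);
        if (shared_input != input) {        // input changed and it matters:
            input = shared_input;           // re-read, unlock, back to L1
            pthread_mutex_unlock(&mutex1);
            continue;
        }
        shared_result = result;             // update the data
        pthread_mutex_unlock(&mutex1);
        return nullptr;
    }
}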
cheers.
Probably the best thing to do is:
temp = ...; //perform some time consuming operation which must be done before the assignment to memberV2
pthread_mutex_lock(&mutex1);
obj1.memberV1 = 1;
obj1.memberV2 = temp; //result from previous calculation
pthread_mutex_unlock(&mutex1);
What I would do is separate the calculation from the update:
temp = some calculation
pthread_mutex_lock(&mutex1);
obj.memberV1 = 1;
obj.memberV2 = temp;
pthread_mutex_unlock(&mutex1);

Why do condition variables sometimes erroneously wake up?

I've known for eons that the way you use a condition variable is
lock
while not task_done
    wait on condition variable
unlock
Because sometimes condition variables will spontaneously wake. But I've never understood why that's the case. In the past I've read it's expensive to make a condition variable that doesn't have that behavior, but nothing more than that.
So... why do you need to worry about falsely being woken up when waiting on a condition variable?
It isn't that the condition variable erroneously wakes up; the condition variable will only wake up if it has been signalled from another thread. However, it is possible that by the time the thread has been re-scheduled for execution, some other thread has already managed to nab the resource you were waiting on, so it is necessary to double-check. For example, suppose a group of threads x, y, z is waiting on some resource R that w was previously holding, and x, y, z, w communicate through a condition variable. When w is done with R, it signals x, y, z, so all three are taken off the wait queue and placed on the run queue to be scheduled. Suppose x is scheduled first: it acquires R, and then it might be put to sleep. When y is scheduled next, the resource R it was waiting on is still not available, so y has to go to sleep again. Then z wakes up, also finds R still in use, and needs to go back to sleep, and so on.
If you have exactly two threads, and the condition variable is shared between just the two of them, there are sometimes situations where it is OK not to perform that check. However, if you want your application to be dynamic and capable of scaling up to an arbitrary number of threads, then it's good to be in the habit (not to mention much simpler and less worrisome) of doing that extra check, as it is required in most situations.
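That re-check is exactly what the predicate overload of std::condition_variable::wait does for you; for example:

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool task_done = false;

void waiter() {
    std::unique_lock<std::mutex> lk(m);
    // Equivalent to: while (!task_done) cv.wait(lk);
    // The predicate is rechecked after every wakeup, so spurious (or
    // "someone nabbed the resource first") wakeups are harmless.
    cv.wait(lk, [] { return task_done; });
    // here task_done is true and the lock is held
}

void signaler() {
    {
        std::lock_guard<std::mutex> lk(m);
        task_done = true;
    }
    cv.notify_all();
}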
Threads can wake up without being signaled. This is called a spurious wakeup. However, precisely why spurious wakeups occur is a question that seems to be mired in superstition and uncertainty. Reasons I have seen include being a side effect of the way threading implementations work, or being intentionally added to force programmers to properly use loops instead of conditionals around wait.