What prevents DOMTimerCoordinator::NextID from entering an endless loop? - c++

I had a look into the Blink codebase to answer this question about the maximum possible number of timers in JavaScript.
New timers are created by DOMTimerCoordinator::InstallNewTimeout(). It calls NextID() to retrieve an available integer key. Then, it inserts the new timer and the corresponding key into timers_.
int timeout_id = NextID();
timers_.insert(timeout_id, DOMTimer::Create(context, action, timeout,
                                            single_shot, timeout_id));
NextID() gets the next ID in a circular sequence from 1 to 2^31 - 1:
int DOMTimerCoordinator::NextID() {
  while (true) {
    ++circular_sequential_id_;
    if (circular_sequential_id_ <= 0)
      circular_sequential_id_ = 1;
    if (!timers_.Contains(circular_sequential_id_))
      return circular_sequential_id_;
  }
}
What happens if all the IDs are in use?
What prevents NextID() from entering an endless loop?
The whole process is explained in more detail in my answer to that question.

I needed a while to understand this, but I believe I've got it.
These are the steps that made it make sense for me.
circular_sequential_id_ is used as a unique identifier. Its declaration isn't shown, but from the other info I suspect it's a 32-bit int (e.g. std::int32_t).
I suspect circular_sequential_id_ is a member variable of the class (or struct) DOMTimerCoordinator. Hence, between calls of NextID() it “remembers” the last returned value. When NextID() is entered, circular_sequential_id_ is incremented first:
++circular_sequential_id_;
The increment ++circular_sequential_id_; may sooner or later cause an overflow (Uuuh, if I remember right signed overflow is considered Undefined Behavior, but in the real world it usually just wraps around) and turn the value negative. The next lines handle exactly this case:
if (circular_sequential_id_ <= 0)
circular_sequential_id_ = 1;
The last statement in the loop checks whether the generated ID is still in use by any timer:
if (!timers_.Contains(circular_sequential_id_))
return circular_sequential_id_;
If it's not in use, the ID is returned. Otherwise: “Play it again, Sam.”
This brings me to the most reasonable answer:
Yes, this can become an endless loop...
...if 2^31 - 1 timers are occupied and, hence, all IDs have been consumed.
I assume that with 2^31 - 1 timers you have far more serious problems. (Just imagine the storage those timers would need and the time it takes to handle all of them...)
Even if 2^31 - 1 timers are not a fatal problem, the function may keep cycling until one of the timers releases its ID so it can be occupied again. So, NextID() would block while a resource (a free ID for a timer) is temporarily unavailable.
Thinking twice, the second option is rather theoretical. I cannot believe that anybody would manage limited resources this way.
I guess this code works under the assumption that there will never be 2^31 - 1 concurrent timers, and hence that it will find a free ID within a few iterations.
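A standalone sketch of the same allocation scheme (the names are mine, not Blink's, and the ID space is shrunk to 3 to make the exhaustion case easy to see) shows how the loop behaves:
#include <iostream>
#include <set>

// Same scheme as NextID(), with the wrap point lowered from 2^31 - 1 to 3.
int next_id(int& counter, const std::set<int>& in_use)
{
    while (true) {
        ++counter;
        if (counter > 3)
            counter = 1;        // wrap around instead of overflowing
        if (!in_use.count(counter))
            return counter;     // spins forever if in_use is {1, 2, 3}
    }
}

int main()
{
    int counter = 0;
    std::set<int> in_use{1, 3};
    std::cout << next_id(counter, in_use) << "\n";   // prints 2
}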

Related

Large performance difference between comparing a variable to a fixed value and reading or writing from mapped memory address

I'm developing software that runs on a DE10 board, on an ARM Cortex-A9 processor.
This software has to access physical memory addresses in order to communicate with the FPGA on the DE10, and this is done by mapping /dev/mem; the method is described here.
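For reference, a minimal sketch of that mapping approach (the base address and span below are placeholders, not the DE10's actual values):
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

volatile uint8_t* map_fpga_bridge(off_t base, size_t span)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);   // O_SYNC: uncached access
    if (fd < 0)
        return nullptr;
    void* p = mmap(nullptr, span, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    close(fd);                                    // the mapping survives the close
    return p == MAP_FAILED ? nullptr : static_cast<volatile uint8_t*>(p);
}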
I have a situation where I have to select which of 4 addresses to send some values to, and this could be done in one of two ways:
Using an if statement, checking an integer variable (which is always 0 or 1 at that part of the loop), and only writing if it's 1.
Multiplying the values that should be sent by the aforementioned variable and writing to all addresses without any conditional, because writing zero doesn't have any effect on my system. (Both options are sketched below.)
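A hypothetical illustration of the two options; the names (send_conditional, enable) are mine, and address is assumed to be a register mapped as above:
// Strategy 1: branch on the flag and write only when needed.
void send_conditional(volatile uint8_t* address, int enable, uint8_t value)
{
    if (enable == 1)
        *address = value;
}

// Strategy 2: always write; multiplying by 0 makes the write a no-op
// on this particular system.
void send_unconditional(volatile uint8_t* address, int enable, uint8_t value)
{
    *address = value * enable;
}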
I was curious about which would be faster, so I tried this:
First, I made this loop:
int test = 0;
for (int i = 0; i < 1000000; i++)
{
    if (test == 9)
    {
        test = 15;
    }
    test++;
    if (test == 9)
    {
        test = 0;
    }
}
The first if statement should never be satisfied, so its only contribution to the time taken in the loop is from its comparison itself.
The increment and the second if statement are just things I added in an attempt to prevent the compiler from just "optimizing out" the first if statement.
This loop is run once without being benchmarked (just in case there's any frequency-scaling ramp-up, although I'm pretty sure there is none) and then run again while being benchmarked; it takes around 18350 μs to complete.
Without the first if statement, it takes around 17260 μs.
Now, if I replace that first if statement with a line that sets the value of a memory-mapped address to the value of the integer test, like this:
for (int i = 0; i < 1000000; i++)
{
    *(uint8_t*)address = test;
    test++;
    if (test == 9)
    {
        test = 0;
    }
}
This loop takes around 253600 μs to complete, almost 14× slower.
Reading that address instead of writing on it barely changes anything.
Is the difference real, or is some kind of compiler optimization frustrating my benchmark?
Should I expect this difference in performance (and thus favor the comparison method) in the actual software?

Why isn't my std::atomic<int> variable thread-safe?

I don't know why my code isn't thread-safe, as it outputs some inconsistent results.
value 48
value 49
value 50
value 54
value 51
value 52
value 53
My understanding of an atomic object is that it prevents its intermediate state from being exposed, so it should solve the problem of one thread reading it while another thread is writing it.
I used to think I could use std::atomic, without a mutex, to solve the multi-threaded counter-increment problem, but that doesn't seem to be the case.
I probably misunderstood what an atomic object is. Can someone explain?
void inc(std::atomic<int>& a)
{
    while (true) {
        a = a + 1;
        printf("value %d\n", a.load());
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
    }
}
int main()
{
    std::atomic<int> a(0);
    std::thread t1(inc, std::ref(a));
    std::thread t2(inc, std::ref(a));
    std::thread t3(inc, std::ref(a));
    std::thread t4(inc, std::ref(a));
    std::thread t5(inc, std::ref(a));
    std::thread t6(inc, std::ref(a));
    t1.join();
    t2.join();
    t3.join();
    t4.join();
    t5.join();
    t6.join();
    return 0;
}
I used to think I could use std::atomic, without a mutex, to solve the multi-threaded counter-increment problem, but that doesn't seem to be the case.
You can, just not the way you have coded it. You have to think about where the atomic accesses occur. Consider this line of code …
a = a + 1;
1. First, the value of a is fetched atomically. Let's say the value fetched is 50.
2. We add one to that value, getting 51.
3. Finally, we atomically store that value into a using the = operator.
4. a ends up being 51.
5. We atomically load the value of a by calling a.load().
6. We print the value we just loaded by calling printf().
So far so good. But between steps 1 and 3 some other threads may have changed the value of a - for example to the value 54. So, when step 3 stores 51 into a it overwrites the value 54 giving you the output you see.
As #Sopel and #Shawn suggest in the comments, you can atomically increment the value in a using one of the appropriate member functions (like fetch_add) or operator overloads (like operator++ or operator+=). See the std::atomic documentation for details.
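A minimal sketch of that fix, keeping the structure of the posted inc() (the printf ordering caveats discussed below still apply):
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

void inc(std::atomic<int>& a)
{
    while (true) {
        // fetch_add performs the load, add, and store as one atomic step;
        // it returns the old value, so add 1 to get the value we produced.
        int ours = a.fetch_add(1) + 1;
        printf("value %d\n", ours);   // print our own result, not a fresh load
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
    }
}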
Update
I added steps 5 and 6 above. Those steps can also lead to results that may not look correct.
Between the store at step 3 and the call to a.load() at step 5, other threads can modify the contents of a. After our thread stores 51 into a at step 3, it may find that a.load() returns some different number at step 5. Thus the thread that set a to the value 51 may not pass the value 51 to printf().
Another source of problems is that nothing coordinates the execution of steps 5 and 6 between two threads. So, for example, imagine two threads X and Y running on a single processor. One possible execution order might be this …
1. Thread X executes steps 1 through 5 above, incrementing a from 50 to 51 and getting the value 51 back from a.load().
2. Thread Y executes steps 1 through 5 above, incrementing a from 51 to 52 and getting the value 52 back from a.load().
3. Thread Y executes printf(), sending 52 to the console.
4. Thread X executes printf(), sending 51 to the console.
We've now printed 52 on the console, followed by 51.
Finally, there's another problem lurking at step 6, because printf() doesn't make any promises about what happens if two threads call printf() at the same time (at least I don't think it does).
On a multiprocessor system threads X and Y above might call printf() at exactly the same moment (or within a few ticks of exactly the same moment) on two different processors. We can't make any prediction about which printf() output will appear first on the console.
Note: The documentation for printf mentions a lock introduced in C++17, "… used to prevent data races when multiple threads read, write, position, or query the position of a stream." In the case of two threads simultaneously contending for that lock, we still can't tell which one will win.
Besides the increment of a being done non-atomically, the fetch of the value to display after the increment is non-atomic with respect to the increment. It is possible that one of the other threads increments a after the current thread has incremented it but before the fetch of the value to display. This would possibly result in the same value being shown twice, with the previous value skipped.
Another issue here is that the threads do not necessarily run in the order they were created. Thread 6 could execute its output before threads 3, 4, and 5, but after all four threads have incremented a. Since the thread that did the last increment displays its output earlier, you end up with the output not being sequential. This is more likely to happen on a system with fewer than six hardware threads available to run on.
Adding a small sleep between the various thread creations (e.g., sleep_for(10)) would make this less likely to occur, but would still not eliminate the possibility. The only sure way to keep the output ordered is to use some sort of exclusion (like a mutex) to ensure only one thread has access to the increment and output code, and to treat both the increment and the output as a single transaction that must run together before another thread tries to do an increment.
The other answers point out the non-atomic increment and various problems. I mostly want to point out some interesting practical details about exactly what we see when running this code on a real system. (x86-64 Arch Linux, gcc9.1 -O3, i7-6700k 4c8t Skylake).
It can be useful to understand why certain bugs or design choices lead to certain behaviours, for troubleshooting / debugging.
Use int tmp = ++a; to capture the fetch_add result in a local variable instead of reloading it from the shared variable. (And as 1202ProgramAlarm says, you might want to treat the whole increment and print as an atomic transaction if you insist on having your counts printed in order as well as being done properly.)
Or you might want to have each thread record the values it saw in a private data structure to be printed later, instead of also serializing the threads with printf during the increments. (In practice, all the threads trying to increment the same atomic variable will serialize anyway, waiting for access to the cache line; the ++a operations will go in order, so you can tell from the modification order which thread went when. A sketch of this idea follows.)
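A sketch of that record-then-print idea (the names and the iteration count are mine, not from the question):
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

void inc_n(std::atomic<int>& a, std::vector<int>& seen, int n)
{
    for (int i = 0; i < n; ++i)
        seen.push_back(++a);   // ++a is an atomic RMW returning the new value
}

int main()
{
    std::atomic<int> a(0);
    std::vector<int> seen1, seen2;
    std::thread t1(inc_n, std::ref(a), std::ref(seen1), 1000);
    std::thread t2(inc_n, std::ref(a), std::ref(seen2), 1000);
    t1.join();
    t2.join();
    // No printf during the increments: each thread's log is printed afterwards.
    for (int v : seen1) printf("t1 saw %d\n", v);
    for (int v : seen2) printf("t2 saw %d\n", v);
}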
Fun fact: a.store(1 + a.load(std::memory_order_relaxed), std::memory_order_release) is what you might do for a variable that was only written by 1 thread, but read by multiple threads. You don't need an atomic RMW because no other thread ever modifies it. You just need a thread-safe way to publish updates. (Or better, in a loop keep a local counter and just .store() it without loading from the shared variable.)
If you use the default a = ... for a sequentially-consistent store, you might as well do an atomic RMW on x86: the good way to compile a seq-cst store is an atomic xchg, and mov + mfence is as expensive (or more).
What's interesting is that despite the massive problems with your code, no counts were lost or stepped on (no duplicate counts), merely printing reordered. So in practice the danger wasn't encountered because of other effects going on.
I tried it on my own machine and did lose some counts. But after removing the sleep, I just got reordering. (I copy-pasted about 1000 lines of the output into a file, and sort -u to uniquify the output didn't change the line count. It did move some late prints around though; presumably one thread got stalled for a while.) My testing didn't check for the possibility of lost counts, skipped by not saving the value being stored into a, and instead reloading it. I'm not sure there's a plausible way for that to happen here without multiple threads reading the same count, which would be detected.
Store + reload, even a seq-cst store which has to flush the store buffer before it can reload, is very fast compared to printf making a write() system call. (The format string includes a newline and I didn't redirect output to a file so stdout is line-buffered and can't just append the string to a buffer.)
(write() system calls on the same file descriptor are serializing in POSIX: write(2) is atomic. Also, printf(3) itself is thread-safe on GNU/Linux, as required by C++17, and probably by POSIX long before that.)
Stdio locking in printf happens to be enough serialization in almost all cases: the thread that just unlocked stdout and left printf can do the atomic increment and then try to take the stdout lock again.
The other threads were all blocked trying to take the lock on stdout. One (other?) thread can wake up and take the lock on stdout, but for its increment to race with the other thread it would have to enter and leave printf and load a the first time before that other thread commits its a = ... seq-cst store.
This does not mean it's actually safe
Just that testing this specific version of the program (at least on x86) doesn't easily reveal the lack of safety. Interrupts or scheduling variations, including competition from other things running on the same machine, certainly could block a thread at just the wrong time.
My desktop has 8 logical cores so there were enough for every thread to get one, not having to get descheduled. (Although normally that would tend to happen on I/O or when waiting on a lock anyway).
With the sleep there, it is not unlikely for multiple threads to wake up at nearly the same time and race with each other in practice on real x86 hardware. It's so long that timer granularity becomes a factor, I think. Or something like that.
Redirecting output to a file
With stdout open on a non-TTY file, it's full-buffered instead of line-buffered, and doesn't always make a system call while holding the stdout lock.
(I got a 17MiB file in /tmp from hitting control-C a fraction of a second after running ./a.out > output.)
This makes it fast enough for threads to actually race with each other in practice, showing the expected bugs of duplicate values. (A thread reads a but loses ownership of the cache line before it stores (tmp)+1, resulting in two or more threads doing the same increment. And/or multiple threads reading the same value when they reload a after flushing their store buffer.)
1228589 unique lines (sort -u | wc) out of 1291035 total lines, so ~5% of the output lines were duplicates.
I didn't check if it was usually one value duplicated multiple times or if it was usually only one duplicate. Or how far backward the value ever jumped. If a thread happened to be stalled by an interrupt handler after loading but before storing val+1, it could be quite far. Or if it actually slept or blocked for some reason, it could rewind indefinitely far.

While loop with empty body checking volatile ints - what does this mean?

I am looking at a C++ class which has the following lines:
while( x > y );
return x - y;
x and y are member variables of type volatile int. I do not understand this construct.
I found the code stub here: https://gist.github.com/r-lyeh/cc50bbed16759a99a226. I guess it is not guaranteed to be correct or even work.
Since x and y have been declared as volatile, the programmer expects that they will be changed from outside the program.
In that case, your code will remain in the loop
while(x>y);
and will return the value x-y after the values are changed from outside such that x <= y. The exact reason why this is written can be guessed after you tell us more about your code and where you saw it. The while loop in this case is a waiting technique for some other event to occur.
It seems
while( x > y );
is a spinning loop. It won't stop until x <= y. As x and y are volatile, they may be changed outside of this routine. So, once x <= y becomes true, x - y will be returned. This technique is used to wait for some event.
Update
According to the gist you added, it seems the idea was to implement a thread-safe, lock-free circular buffer. Yes, the implementation is incorrect. For example, the original code snippet is
unsigned count() const {
    while( tail > head );
    return head - tail;
}
Even if tail becomes less than or equal to head, it is not guaranteed that head - tail returns a positive number. The scheduler may switch execution to another thread immediately after the while loop, and that thread may change the value of head. Anyway, there are a lot of other issues related to how reading and writing shared memory works (memory reordering, etc.), so just ignore this code.
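For contrast, a sketch of what a count() along these lines could look like with std::atomic instead of volatile. This is an assumption about the intended design (a single producer advancing head, a single consumer advancing tail), not a fix of the full linked class:
#include <atomic>

struct ring_counts {
    std::atomic<unsigned> head{0};   // advanced only by the producer
    std::atomic<unsigned> tail{0};   // advanced only by the consumer

    unsigned count() const {
        // Take one coherent snapshot of each index instead of spinning;
        // unsigned subtraction stays correct even after wrap-around.
        unsigned h = head.load(std::memory_order_acquire);
        unsigned t = tail.load(std::memory_order_acquire);
        return h - t;
    }
};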
Other replies have already pointed out in detail what the construct does, but just to recap: since y (or head in the linked example) is declared as volatile, changes made to that variable from a different thread will cause the while loop to finish once the condition has been met.
However, even though the linked code example is very short, it's a near perfect example of how NOT to write code.
First of all the line
while( tail > head );
will waste enormous amounts of CPU cycles, pretty much locking up one core until the condition has been met.
The code gets even better as we go along.
buffer[head++ % N] = item;
Thanks to JAB for pointing out that I mistook post- for pre-increment here. Corrected the implications.
Since there are no locks or mutexes we obviously will have to assume the worst. The thread will switch after assigning the value in item and before head++ executes. Murphy will then call the function containing this statement again, assigning the value of item at the same head position.
After that head increments. Now we switch back to the first thread and increment head again. So instead of
buffer[original_value_of_head+1] = item_from_thread1;
buffer[original_value_of_head+2] = item_from_thread2;
we end up with
buffer[original_value_of_head+1] = item_from_thread2;
buffer[original_value_of_head+2] = whatever_was_there_previously;
You might get away with sloppy coding like this on the client side with few threads, but on the server side this could only be considered a ticking time bomb. Please use synchronisation constructs such as locks or mutexes instead.
And well, just for the sake of completeness, the line
while( tail > head );
in the method pop_back() should be
while( tail >= head );
unless you want to be able to pop one more element than you actually pushed in (or even pop one element before pushing anything in).
Sorry for writing what basically boils down to a long rant, but if this keeps just one person from copying and pasting that obscene code it was worthwhile.
Update: Thought I might as well give an example where code like while(x>y); actually makes perfect sense.
Actually, you used to see code like that fairly often in the "good old" days. *cough* DOS.
It wasn't used in the context of threading, though. Mainly as a fallback in case registering an interrupt hook was not possible (you kids might translate that as "not possible to register an event handler").
startAsynchronousDeviceOperation(..);
That might be pretty much anything, e.g. telling the hard disk to read data via DMA, telling the sound card to record via DMA, possibly even invoking functions on a different processor (like the GPU). Typically initiated via something like outb(2).
while(byteswritten==0); // or while (status!=DONE);
If the only communication channel with a device is shared memory, then so be it. I wouldn't expect to see code like that nowadays outside of device drivers and microcontrollers, though. Obviously this assumes the specs state that that memory location is the last one written to.
The volatile keyword is designed to prevent certain optimisations. In this case, without the keyword, the compiler could unroll your while loop into a concrete sequence of instructions which will obviously break in reality since the values could be modified externally.
Imagine the following:
int i = 2;
while (i-- > 0) printf("%d", i);
Most compilers will look at this and simply generate two calls to printf; adding the volatile keyword instead makes the compiler generate CPU instructions that maintain an actual counter set to 2 and check its value after every iteration.
For example,
volatile int i = 2;
this_function_runs_on_another_process_and_modifies_the_value(&i);
while(i-- > 0) printf("%d", i);

Concurrent/Asynchronous access to shared data

I searched around a bit, but could not find anything useful. Could someone help me with this concurrency/synchronization problem?
Given five instances of the program below running asynchronously, with s being shared data with an initial value of 0 and i a local variable, which values can s end up with?
for (i = 0; i < 5; i++) {
    s = s + 1;
}
1. 2
2. 1
3. 6
I would like to know which values, and why exactly.
The non-answering answer is: Uaaaagh, don't do such a thing.
An answer more in the sense of your question is: In principle, any value is possible, because it is totally undefined. You have no strict guarantee that concurrent writes are atomic in any way and don't result in complete garbage.
In practice, writes of less than machine-word size are atomic everywhere (as far as I know, at least), but they do not have a defined order. Also, you usually don't know in which order threads/processes are scheduled. So you will never see a "random garbage" value, but you also cannot know what it will be. It will be anything from 5 up to 25.
Since no atomic increment is used, there is a race between reading the value, incrementing it, and writing it back. If the value is being written by another instance before the result is written back, the write (and thus increment) that finished earlier has no effect. If that does not happen, both increments are effective.
Nevertheless, each instance increments the value at least 5 times, so apart from the theoretical "total garbage" possibility, there is no way a value less than 5 could result in the end. Therefore (1) and (2) are not possible, but (3) is.
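A quick way to observe this in practice (a sketch using std::thread in place of five separate program instances):
#include <cstdio>
#include <thread>
#include <vector>

int s = 0;   // shared, unsynchronized: increments can be lost

int main()
{
    std::vector<std::thread> instances;
    for (int t = 0; t < 5; ++t)
        instances.emplace_back([] {
            for (int i = 0; i < 5; ++i)
                s = s + 1;   // racy read-modify-write
        });
    for (auto& th : instances)
        th.join();
    printf("s = %d\n", s);   // usually 25, but nothing guarantees it
}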

pthreads - previously created thread uses new value (updated after thread creation)

So here's my scenario. First, I have a structure -
struct interval
{
    double lower;
    double higher;
};
Now my thread function -
void* thread_function(void* i)
{
    interval* in = (interval*)i;
    double a = in->lower;
    cout << a;
    pthread_exit(NULL);
}
In main, let's say I create these 2 threads -
pthread_t one, two;
interval i;
i.lower = 0; i.higher = 5;
pthread_create(&one, NULL, thread_function, &i);
i.lower = 10; i.higher = 20;
pthread_create(&two, NULL, thread_function, &i);
pthread_join(one, NULL);
pthread_join(two, NULL);
Here's the problem. Ideally, thread "one" should print out 0 and thread "two" should print out 10. However, this doesn't happen. Occasionally, I end up getting two 10s.
Is this by design? In other words, by the time the thread is created, the value in i.lower has been changed already in main, therefore both threads end up using the same value?
Is this by design?
Yes. It's unspecified when exactly the threads start and when they will access that value. You need to give each one of them their own copy of the data.
Your application is non-deterministic.
There is no telling when a thread will be scheduled to run.
Note: Creating a thread does not mean it will start executing immediately (or even first). The second thread created may actually start running before the first (it all depends on the OS and hardware).
To get deterministic behavior each thread must be given its own data (that is not modified by the main thread).
pthread_t one, two;
interval oneData, twoData;
oneData.lower = 0; oneData.higher = 5;
pthread_create(&one, NULL, thread_function, &oneData);
twoData.lower = 10; twoData.higher = 20;
pthread_create(&two, NULL, thread_function, &twoData);
pthread_join(one, NULL);
pthread_join(two, NULL);
I would not call it by design.
I would rather refer to it as a side-effect of scheduling policy. But the observed behavior is what I would expect.
This is the classic 'race condition'; where the results vary depending on which thread wins the 'race'. You have no way of knowing which thread will 'win' each time.
Your analysis of the problem is correct; you simply don't have any guarantees that the first thread created will be able to read i.lower before the data is changed on the next line of your main function. This is in some sense the heart of why it can be hard to think about multithreaded programming at first.
The straight forward solution to your immediate problem is to keep different intervals with different data, and pass a separate one to each thread, i.e.
interval i, j;
i.lower = 0; j.lower = 10;
pthread_create(&one, NULL, thread_function, &i);
pthread_create(&two, NULL, thread_function, &j);
This will of course solve your immediate problem. But soon you'll probably wonder what to do if you want multiple threads actually using the same data. What if thread 1 wants to make changes to i and thread 2 wants to take those into account? There would hardly be much point in doing multithreaded programming if each thread had to keep its memory separate from the others (well, leaving message passing out of the picture for now). Enter mutex locks! I thought I'd give you a heads-up that you'll want to look into this topic sooner rather than later, as it will also help you understand the basics of threads in general and the required change in mentality that goes along with multithreaded programming.
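As a small taste of that (a hypothetical extension of the code above, not part of the original question), a pthread_mutex_t can make the main thread's update and the worker's read mutually exclusive:
#include <iostream>
#include <pthread.h>

struct shared_interval
{
    pthread_mutex_t lock;
    double lower;
    double higher;
};

void* thread_function(void* p)
{
    shared_interval* s = (shared_interval*)p;
    pthread_mutex_lock(&s->lock);     // wait until no one else holds the lock
    std::cout << s->lower << std::endl;
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

int main()
{
    shared_interval s;
    pthread_mutex_init(&s.lock, NULL);
    s.lower = 0; s.higher = 5;
    pthread_t one;
    pthread_create(&one, NULL, thread_function, &s);
    pthread_mutex_lock(&s.lock);      // the thread sees the old pair or the
    s.lower = 10; s.higher = 20;      // new pair, never a half-updated mix
    pthread_mutex_unlock(&s.lock);
    pthread_join(one, NULL);
    pthread_mutex_destroy(&s.lock);
    return 0;
}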
I seem to recall that this is a decent short introduction to pthreads, including getting started with understanding locking etc.