Concurrent incrementing of an int variable - C++

A question from a job-interview
int count = 0;

void func1()
{
    for (int i = 0; i < 10; ++i)
        count = count + 1;
}

void func2()
{
    for (int i = 0; i < 10; ++i)
        count++;
}

void func3()
{
    for (int i = 0; i < 10; ++i)
        ++count;
}

int main()
{
    thread(func1);
    thread(func2);
    thread(func3);
    // joining all the threads
    return 0;
}
The question is: what's the range of values count might theoretically take? The upper bound apparently is 30, but what's the lower one? They told me it's 10, but I'm not sure about it. Otherwise, why would we need memory barriers?
So, what's the lower bound of the range?

It's undefined behavior, so count could take on any value
imaginable. Or the program could crash.

James Kanze's answer is the right one for all practical purposes, but in this particular case, if the code is exactly as written and the thread used here is std::thread from C++11, the behavior is actually defined.
In particular, thread(func1); will start a thread running func1. Then, at the end of the expression, the temporary thread object will be destroyed, without join or detach having been called on it. So the thread is still joinable, and the standard defines that in such a case, the destructor calls std::terminate. (See [thread.thread.destr]: "If joinable() then terminate(), otherwise no effects.") So your program aborts.
Because this happens before the second thread is even started, there is no actual race condition - the first thread is the only one that ever touches count, if it even gets that far.
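For reference, a version that actually joins the threads (and therefore does exhibit the intended data race) might look like this; a sketch assuming C++11, with the thread objects named so that join() can be called before they are destroyed:
#include <thread>

int count = 0;

void func1() { for (int i = 0; i < 10; ++i) count = count + 1; }
void func2() { for (int i = 0; i < 10; ++i) count++; }
void func3() { for (int i = 0; i < 10; ++i) ++count; }

int main()
{
    // Named thread objects, so no destructor ever runs on a joinable temporary.
    std::thread t1(func1), t2(func2), t3(func3);
    t1.join();
    t2.join();
    t3.join();
    return 0;
}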

Starting with the easy part: the obvious upper bound is 30 since, if everything goes right, you have 3 functions, each capable of incrementing count 10 times. Overall: 3*10 = 30.
As to the lower bound, they are correct, and this is why: the worst-case scenario is that each time one thread tries to increment count, the other threads do so at exactly the same time. Keep in mind that ++count is actually the following pseudocode:
    count_temp = count;
    count_temp = count_temp + 1;
    count = count_temp;
It should be obvious that if they all perform the same code at the same time, you have only 10 real increments, since they all read the same initial value of count and all write back the same added value.
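By contrast, if count were a std::atomic<int>, each increment would be one indivisible read-modify-write and no update could be lost; a minimal sketch (assuming C++11 and properly joined threads):
#include <atomic>
#include <thread>

std::atomic<int> count{0};

void add10()
{
    for (int i = 0; i < 10; ++i)
        ++count; // atomic increment: cannot be lost to a concurrent writer
}

int main()
{
    std::thread t1(add10), t2(add10), t3(add10);
    t1.join();
    t2.join();
    t3.join();
    // count is now guaranteed to be exactly 30.
}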

First of all, I'd like to thank you guys for giving me a reason to read the standard in depth. I would not be able to continue this debate otherwise.
The standard states quite clearly in section 1.10 clause 21: The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
However, the term undefined behavior is also defined in the standard, section 1.3.24: behavior for which this International Standard imposes no requirements... Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment...
Taking Sebastian's answer regarding std::terminate into account, and working under the assumption that these threads will not throw an exception and thereby cause premature termination: while the standard doesn't define the result, it is fairly evident what it may be, because of the simplicity of the algorithm. In other words, while the 100% accurate answer is that the result is undefined, I still maintain that the range of possible outcomes is well defined and is 10-30, due to the characteristics of the environment.
BTW - I really wanted to make this a comment instead of another answer, but it was too long.

Related

Question regarding race conditions while multithreading

So I'm reading through a book, and in the chapter that covers multithreading and concurrency they gave me a question that does not really make sense to me.
I'm supposed to create 3 functions with a param x that simply calculate x * x: one using a mutex, one using atomic types, and one using neither. And create 3 global variables holding the values.
The first two functions will prevent race conditions but the third might not.
After that I create N threads and then loop through and tell each thread to calculate its x function (3 separate loops, one for each function, so I'm creating N threads 3 times).
Now the book tells me that using functions 1 & 2 I should always get the correct answer, but using function 3 I won't always get the right answer. However, I am always getting the right answer for all of them. I assume this is because I am just calculating x * x, which is all it does.
As an example, when N=3, the correct value is 0 * 0 + 1 * 1 + 2 * 2 = 5.
This is the atomic function:
void squareAtomic(atomic<int> x)
{
    accumAtomic += x * x;
}
And this is how I call the function:
thread threadsAtomic[N];
for (int i = 0; i < N; i++) // i will be the current thread that represents x
{
    threadsAtomic[i] = thread(squareAtomic, i);
}
for (int i = 0; i < N; i++)
{
    threadsAtomic[i].join();
}
This is the function that should sometimes create race conditions:
void squareNormal(int x)
{
    accumNormal += x * x;
}
Here's how I call that:
thread threadsNormal[N];
for (int i = 0; i < N; i++) // i will be the current thread that represents x
{
    threadsNormal[i] = thread(squareNormal, i);
}
for (int i = 0; i < N; i++)
{
    threadsNormal[i].join();
}
This is all my own code so I might not be doing this question correctly, and in that case I apologize.
One problem with race conditions (and with undefined behavior in general) is that their presence doesn't guarantee that your program will behave incorrectly. Rather, undefined behavior only voids the guarantee that your program will behave according to the rules of the C++ language spec. That can make undefined behavior very difficult to detect via empirical testing. (Every multithreading programmer's worst nightmare is the bug that was never seen once during the program's intensive three-month testing period, and only appears in the form of a mysterious crash during the big on-stage demo in front of a live audience.)
In this case your racy program's race condition comes in the form of multiple threads reading and writing accumNormal simultaneously; in particular, you might get an incorrect result if thread A reads the value of accumNormal, and then thread B writes a new value to accumNormal, and then thread A writes a new value to accumNormal, overwriting thread B's value.
If you want to be able to demonstrate to yourself that race conditions really can cause incorrect results, you'd want to write a program where multiple threads hammer on the same shared variable for a long time. For example, you might have half the threads increment the variable 1 million times, while the other half decrement the variable 1 million times, and then check afterwards (i.e. after joining all the threads) to see if the final value is zero (which is what you would expect it to be), and if not, run the test again, and let that test run all night if necessary. (and even that might not be enough to detect incorrect behavior, e.g. if you are running on hardware where increments and decrements are implemented in such a way that they "just happen to work" for this use case)
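A minimal sketch of such a stress test (the names and iteration counts are just illustrative):
#include <iostream>
#include <thread>
#include <vector>

long long counter = 0; // deliberately NOT atomic, to expose the race

void incrementMany() { for (int i = 0; i < 1000000; ++i) ++counter; }
void decrementMany() { for (int i = 0; i < 1000000; ++i) --counter; }

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) threads.emplace_back(incrementMany);
    for (int i = 0; i < 4; ++i) threads.emplace_back(decrementMany);
    for (auto& t : threads) t.join();
    // Without lost updates the final value would be exactly zero.
    std::cout << "counter = " << counter << " (expected 0)\n";
}
On most hardware this prints a nonzero value within a run or two, but as noted above, even repeated zeros would not prove the code is race-free.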

How are these two pieces of code different?

I tried these lines of code and found the output surprising. I expect the reason is related to initialization, either in general or in the for loop.
1.)
int i = 0;
for (i++; i++; i++) {
    if (i > 10) break;
}
printf("%d", i);
Output - 12
2.)
int i;
for (i++; i++; i++) {
    if (i > 10) break;
}
printf("%d", i);
Output - 1
I expected the statements "int i = 0" and "int i" to be the same. What is the difference between them?
I expected the statements "int i = 0" and "int i" to be the same.
No, that was a wrong expectation on your part. If a variable is declared outside of a function (as a "global" variable), or if it is declared with the static keyword, it's guaranteed to be initialized to 0 even if you don't write = 0. But variables defined inside functions (ordinary "local" variables without static) do not have this guaranteed initialization. If you don't explicitly initialize them, they start out containing indeterminate values.
(Note, though, that in this context "indeterminate" does not mean "random". If you write a program that uses or prints an uninitialized variable, often you'll find that it starts out containing the same value every time you run your program. By chance, it might even be 0. On most machines, what happens is that the variable takes on whatever value was left "on the stack" by the previous function that was called.)
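A minimal sketch of the difference (variable names are just illustrative; this compiles as both C and C++):
#include <stdio.h>

int global_i;            // static storage duration: guaranteed zero-initialized

int main(void)
{
    static int static_i; // also zero-initialized, thanks to the static keyword
    int local_i;         // automatic storage: starts out indeterminate
    printf("%d %d\n", global_i, static_i); // prints "0 0"
    // printf("%d\n", local_i);            // undefined behavior if uncommented
    return 0;
}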
See also these related questions:
Non-static variable initialization
Static variable initialization?
See also section 4.2 and section 4.3 in these class notes.
See also question 1.30 in the C FAQ list.
Addendum: Based on your comments, it sounds like when you fail to initialize i, the indeterminate value it happens to start out with is 0, so your question is now:
"Given the program
#include <stdio.h>
int main()
{
    int i;             // note uninitialized
    printf("%d\n", i); // prints 0
    for (i++; i++; i++) {
        if (i > 10) break;
    }
    printf("%d\n", i); // prints 1
}
what possible sequence of operations could the compiler be emitting that would cause it to compute a final value of 1?"
This can be a difficult question to answer. Several people have tried to answer it, in this question's other answer and in the comments, but for some reason you haven't accepted that answer.
That answer again is, "An uninitialized local variable leads to undefined behavior. Undefined behavior means anything can happen."
The important thing about this answer is that it says that "anything can happen", and "anything" means absolutely anything. It absolutely does not have to make sense.
The second question, as I have phrased it, does not really even make sense, because it contains an inherent contradiction: it asks "what possible sequence of operations could the compiler be emitting", but since the program contains undefined behavior, the compiler isn't even obliged to emit a sensible sequence of operations at all.
If you really want to know what sequence of operations your compiler is emitting, you'll have to ask it. Under Unix/Linux, compile with the -S flag. Under other compilers, I don't know how to view the assembly-language output. But please don't expect the output to make any sense, and please don't ask me to explain it to you (because I already know it won't make any sense).
Because the compiler is allowed to do anything, it might be emitting code as if your program had been written, for example, as
#include <stdio.h>
int main()
{
    int i;             // note uninitialized
    printf("%d\n", i); // prints 0
    i++;
    printf("%d\n", i); // prints 1
}
"But that doesn't make any sense!", you say. "How could the compiler turn "for(i++; i++; i++) ..." into just "i++"? And the answer -- you've heard it, but maybe you still didn't quite believe it -- is that when a program contains undefined behavior, the compiler is allowed to do anything.
The difference is what you already observed. The first code initializes i; the other does not. Using an uninitialized value is undefined behaviour (UB) in C++. The compiler assumes UB does not happen in a correct program, and hence is allowed to emit code that does whatever.
A simpler example is:
int i;
i++;
The compiler knows that this i++ cannot happen in a correct program, and it does not bother to emit correct output for wrong input; hence when you run this code anything could happen.
For further reading see here: https://en.cppreference.com/w/cpp/language/ub
There is a rule of thumb that (among other things) helps to avoid uninitialized variables. It is called Almost-Always-Auto, and it suggests using auto almost always. If you write
auto i = 0;
you cannot forget to initialize i, because auto requires an initializer to be able to deduce the type.
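A minimal illustration:
auto i = 0;  // OK: i is deduced as int, and is necessarily initialized
int  j;      // compiles, but j holds an indeterminate value
// auto k;   // error: cannot deduce a type without an initializer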
PS: C and C++ are two different languages with different rules. Your second code is UB in C++, but I cannot answer your question for C.

Does the upper bound in a for loop need to be saved in an extra variable?

I often see this in older code:
DWORD num = someOldListImplementation.GetNum();
for (DWORD i = 0; i < num; i++)
{
    someOldListImplementation.Get(i);
}
rather than
for (DWORD i = 0; i < someOldListImplementation.GetNum(); i++)
{
    someOldListImplementation.Get(i);
}
I guess the first implementation should prevent a call to GetNum() on each iteration. But are there cases where a C++11 compiler does some optimization to the second code snippet that makes the num variable obsolete?
Should I always prefer the first implementation?
If that question duplicates another question for 100%, tell me and close this one. No problem.
The C++ (C++11 in this case) standard doesn't seem to leave much room for this. It states explicitly in 6.5.3 The for statement /1 that (my emphasis):
the condition specifies a test, made before each iteration, such that the loop is exited when the condition becomes false;
However, as stated in 1.9 Program execution /1:
Conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
This provision is sometimes called the "as-if" rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program.
So, if an implementation can ascertain that the return value from GetNum() will not change during the loop, it can quite freely optimise all but the first call out of existence.
Whether an implementation can ascertain that depends on a great many things, and that number of things seems to expand with each new iteration of the standard. The things I'm thinking of are volatility, access from multiple threads, constexpr and so on.
You may well have to work rather hard to notify the compiler that it is free to do this (and, even then, it's not required to do so).
The only problem I can see with the first code sample you have is that num may exist for longer than is necessary (if its reason for existence is only to manage this loop). However, this can easily be fixed by explicitly restricting the scope, such as with the "small-change":
{
    DWORD num = someOldListImplementation.GetNum();
    for (DWORD i = 0; i < num; i++)
    {
        someOldListImplementation.Get(i);
    }
}
// That num is no longer in existence.
or by incorporating it into the for statement itself:
for (DWORD num = someOldListImplementation.GetNum(), i = 0; i < num; ++i)
{
    someOldListImplementation.Get(i);
}
// That num is no longer in existence.
The exact answer to my question is to use
for (DWORD i = 0, num = x.GetNum(); i < num; ++i)
{
}
(note that auto i = 0, num = x.GetNum(); would be ill-formed here, because i would deduce as int while num deduces as DWORD) because:
it prevents a call to GetNum() on each iteration
the variable num is not exposed outside of the loop
Credits go to user Nawaz!
I also used ++i instead of i++. Reason: https://www.quora.com/Why-doesnt-it-matter-if-we-use-I++-or-++I-for-a-for-loop

Why doesn't my C++ compiler optimize these memory writes away?

I created this program. It does nothing of interest but use processing power.
Looking at the output with objdump -d, I can see the three rand calls and corresponding mov instructions near the end, even when compiling with -O3.
Why doesn't the compiler realize that memory isn't going to be used and just replace the bottom half with while(1){}? I'm using gcc, but I'm mostly interested in what is required by the standard.
/*
 * Create a program that does nothing except slow down the computer.
 */
#include <cstdlib>
#include <unistd.h>

int getRand(int max) {
    return rand() % max;
}

int main() {
    for (int thread = 0; thread < 5; thread++) {
        fork();
    }
    int len = 1000;
    int *garbage = (int*)malloc(sizeof(int)*len);
    for (int x = 0; x < len; x++) {
        garbage[x] = x;
    }
    while (true) {
        garbage[getRand(len)] = garbage[getRand(len)] - garbage[getRand(len)];
    }
}
Because GCC isn't smart enough to perform this optimization on dynamically allocated memory. However, if you change garbage to be a local array instead, GCC compiles the loop to this:
.L4:
call rand
call rand
call rand
jmp .L4
This just calls rand repeatedly (which is needed because the call has side effects), but optimizes out the reads and writes.
If GCC were even smarter, it could also optimize out the rand calls, because their side effects only affect later rand calls, and in this case there aren't any. However, this sort of optimization would probably be a waste of compiler writers' time.
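For reference, the source-level change that enables this would look roughly like the following (a sketch; whether a given GCC version actually performs the optimization may vary, and the fork loop is omitted for brevity):
int main() {
    const int len = 1000;
    int garbage[len]; // a local array instead of malloc: GCC can see the
                      // array never escapes, so the writes are dead stores
    for (int x = 0; x < len; x++) {
        garbage[x] = x;
    }
    while (true) {
        garbage[getRand(len)] = garbage[getRand(len)] - garbage[getRand(len)];
    }
}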
It can't, in general, tell that rand() doesn't have observable side-effects here, and it isn't required to remove those calls.
It could remove the writes, but it may be that the use of arrays is enough to suppress that.
The standard neither requires nor prohibits what it is doing. As long as the program has the correct observable behaviour any optimisation is purely a quality of implementation matter.
This code causes undefined behaviour because it has an infinite loop with no observable behaviour. Therefore any result is permissible.
In C++14 the text is 1.10/27:
The implementation may assume that any thread will eventually do one of the following:
terminate,
make a call to a library I/O function,
access or modify a volatile object, or
perform a synchronization operation or an atomic operation.
[Note: This is intended to allow compiler transformations such as removal of empty loops, even when termination cannot be proven. —end note ]
I wouldn't say that rand() counts as an I/O function.
Related question
There's also the chance of a crash through an array overflow! The compiler won't speculate on the range of getRand's outputs.

Atomic counter in gcc

I must be just having a moment, because this should be easy but I can't seem to get it working right.
What's the correct way to implement an atomic counter in GCC?
i.e. I want a counter that runs from zero to 4 and is thread safe.
I was doing this (which is further wrapped in a class, but not here):
static volatile int _count = 0;
const int limit = 4;

int get_count() {
    // Atomically fetch the current value and increment the shared counter
    int save_count = __sync_fetch_and_add(&_count, 1);
    if (save_count >= limit) {
        __sync_fetch_and_and(&_count, 0); // Set it back to zero
    }
    return save_count;
}
But it's running from 1 to 4 inclusive and then wrapping around to zero. It should go from 0 to 3. Normally I'd do a counter with a mod operator, but I don't know how to do that safely.
Perhaps this version is better. Can you see any problems with it, or offer a better solution?
int get_count() {
    // Take a local copy of the counter
    int save_count = _count;
    if (save_count >= limit) {
        __sync_fetch_and_and(&_count, 0); // Set it back to zero
        return 0;
    }
    return save_count;
}
Actually, I should point out that it's not absolutely critical that each thread get a different value. If two threads happened to read the same value at the same time that wouldn't be a problem. But they can't exceed limit at any time.
Your code isn't atomic (and your second get_count doesn't even increment the counter value)!
Say count is 3 at the start and two threads simultaneously call get_count. One of them will get its atomic add done first and increments count to 4. If the second thread is fast enough, it can increment it to 5 before the first thread resets it to zero.
Also, in your wraparound processing, you reset count to 0 but not save_count. This is clearly not what's intended.
This is easiest if limit is a power of 2. Don't ever do the reduction yourself, just use
return (unsigned) __sync_fetch_and_add(&count, 1) % (unsigned) limit;
or alternatively
return __sync_fetch_and_add(&count, 1) & (limit - 1);
This only does one atomic operation per invocation, is safe and very cheap. For generic limits, you can still use %, but that will break the sequence if the counter ever overflows. You can try using a 64-bit value (if your platform supports 64-bit atomics) and just hope it never overflows; this is a bad idea though. The proper way to do this is using an atomic compare-exchange operation. You do this:
int old_count, new_count;
do {
    old_count = count;
    new_count = old_count + 1;
    if (new_count >= limit) new_count = 0; // or use %
} while (!__sync_bool_compare_and_swap(&count, old_count, new_count));
This approach generalizes to more complicated sequences and update operations too.
That said, this type of lockless operation is tricky to get right, relies on undefined behavior to some degree (all current compilers get this right, but no C/C++ standard before C++0x actually has a well-defined memory model) and is easy to break. I recommend using a simple mutex/lock unless you've profiled it and found it to be a bottleneck.
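For reference, the same compare-exchange loop expressed against the C++11 <atomic> interface instead of the GCC builtins might look like this (a sketch, assuming a C++11 compiler is available):
#include <atomic>

static std::atomic<int> count(0);
const int limit = 4;

int get_count()
{
    int old_count = count.load();
    int new_count;
    do {
        new_count = (old_count + 1) % limit;
        // On failure, compare_exchange_weak reloads the current value
        // into old_count, so the loop retries with fresh data.
    } while (!count.compare_exchange_weak(old_count, new_count));
    return old_count;
}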
You're in luck, because the range you want happens to fit into exactly 2 bits.
Easy solution: Let the volatile variable count up forever. But after you read it, use just the lowest two bits (val & 3). Presto, atomic counter from 0-3.
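A sketch of that idea, using the same GCC builtin as the question (with the counter made unsigned, an added assumption, so the eventual wraparound is well-defined; since 4 divides 2^32, the masked value stays consistent across the wrap):
static volatile unsigned ucount = 0; // illustrative name

int get_count()
{
    // Let the counter grow forever; only the low two bits are used.
    return (int)(__sync_fetch_and_add(&ucount, 1u) & 3u);
}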
It's impossible to create anything atomic in pure C, even with volatile. You need asm. C1x will have special atomic types, but until then you're stuck with asm.
You have two problems.
__sync_fetch_and_add will return the previous value (i.e., before adding one). So at the step where _count becomes 3, your local save_count variable is getting back 2. So you actually have to increment _count up to 4 before it'll come back as a 3.
But even on top of that, you're specifically looking for it to be >= 4 before you reset it back to 0. That's just a question of using the wrong limit if you're only looking for it to get as high as three.
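To make the first point concrete (a small sketch):
static volatile int _count = 0;

void demo()
{
    int prev = __sync_fetch_and_add(&_count, 1); // returns the OLD value
    // Here: prev == 0 and _count == 1.
    // __sync_add_and_fetch(&_count, 1) would return the NEW value instead.
}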