thread synchronization - delicate issue - c++

Let's say I have this loop:
static int a;
for (static int i = 0; i < 10; i++)
{
    a++;
    ///// point A
}
Two threads enter this loop...
I'm not sure about something: what happens if thread 1 reaches point A and stays there, while thread 2 goes around the loop and, on its tenth pass, increments i to 10 but has not yet checked whether i is less than 10? At that moment thread 1 leaves point A, finishes its iteration, and goes to increment i again before looping.
Which value of i will thread 1 see and increment? Will it be 10 or 0?
Is it possible that thread 1 will see i as 0 and increment it to 1, and then thread 2 will go around the loop again 9 more times (and then maybe 8, 7, etc.)?
Thanks

You have to realize that an increment operation is really three steps:
read the value
add 1
write the value back
You have to ask yourself, what happens if two of these happen in two independent threads at the same time:
static int a = 0;
thread 1 reads a (0)
adds 1 (value is 1)
thread 2 reads a (0)
adds 1 (value is 1)
thread 1 writes (1)
thread 2 writes (1)
For two simultaneous increments, you can see that it is possible that one of them gets lost because both threads read the pre-incremented value.
The example you gave is complicated by the static loop index, which I didn't notice at first.
Since this is C++ code, the standard behavior is that static variables are shared by (visible to) all threads, so there is only one loop counter for all the threads. The sane thing to do would be to use a normal automatic (local) loop variable: each thread would then have its own copy, and no locking would be required for the loop itself.
That means that while you will lose increments sometimes, you may also gain them, because the loop itself may lose count and iterate extra times. All in all, a great example of what not to do.
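As a rough sketch of that saner version (the names are mine; I've also made the shared counter a a std::atomic so its increments aren't lost):
#include <atomic>

std::atomic<int> a{0};   // shared counter; atomic increments are never lost

void thread_body()
{
    for (int i = 0; i < 10; i++)   // automatic variable: each thread has its own i
    {
        a.fetch_add(1, std::memory_order_relaxed);   // safe shared increment
        ///// point A
    }
}
With this, each thread performs exactly 10 iterations, and a ends up at exactly 10 times the number of threads.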

If i is shared between multiple threads, all bets are off. It's possible for any thread to increment i at essentially any point during another thread's execution (including halfway through that thread's increment operation). There is no meaningful way to reason about the contents of i in the above code. Don't do that. Either give each thread its own copy of i, or make the increment and comparison with 10 a single atomic operation.
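For instance, here is a rough sketch (the names are mine, not from the original post) of making the increment and the comparison with 10 a single atomic step, using a compare-exchange loop:
#include <atomic>

std::atomic<int> i{0};   // shared iteration counter

// Atomically claim one of the 10 iterations, or report that they are used up.
bool try_iterate()
{
    int cur = i.load(std::memory_order_relaxed);
    while (cur < 10)
    {
        if (i.compare_exchange_weak(cur, cur + 1, std::memory_order_relaxed))
            return true;    // we claimed iteration number cur
        // compare_exchange_weak reloaded cur with the current value; re-check the bound
    }
    return false;           // all 10 iterations have been claimed
}
Across all threads, try_iterate() returns true exactly 10 times, no matter how the threads interleave.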

It's not really a delicate issue because you would never allow this in real code if the synchronization was going to be an issue.

I'm just going to use i++ in your loop:
for (static int i=0; i<10; i++)
{
}
Because it mimics a. (Note: static here is very strange.)
Consider what happens if Thread A is suspended just as it reaches i++. Thread B gets i all the way to 9, goes into i++ and makes it 10. If it got to move on, the loop would exit. Ah, but now Thread A is resumed! So it continues where it left off: increment i! So i becomes 11, and your loop is borked.
Any time threads share data, it needs to be protected. You could also make i++ and i < 10 happen atomically (never be interrupted), if your platform supports it.

You should use mutual exclusion to solve this problem.

And that is why, on multi-threaded environment, we are suppose to use locks.
In your case, you should write:
#include <mutex>

std::mutex m;   // one mutex shared by every thread that touches i

bool test_increment(int& i)
{
    std::lock_guard<std::mutex> guard(m);   // locks on entry, unlocks on return
    ++i;
    return i < 10;
}

static int a;
for (static int i = -1; test_increment(i); )
{
    ++a;
    // Point A
}
Now the problem disappears. Note that the mutex must be common to all threads trying to access i; here that is the global m, which the lock_guard locks and unlocks around the increment-and-test.

Yes, it's possible that either thread can do the majority of the work in that loop. But as Dynite explained, this would (and should) never show up in real code. If synchronization is an issue, you should provide mutual exclusion (a Boost, pthread, or Windows threads mutex) to prevent race conditions such as this.

Why would you use a static loop counter?
This smells like homework, and a bad one at that.

Each thread can end up working with its own cached copy of i (in a register, for example), so the behavior can be anything at all. That's part of why it's such a problem.
When you use a mutex or critical section the threads will generally sync up, but even that is not absolutely guaranteed if the variable is not volatile.
And someone will no doubt point out "volatile has no use in multithreading!" but people say lots of stupid things. You don't have to have volatile, but it is helpful for some things.

If your "int" is not the atomic machine word size (think 64 bit address + data emulating a 32-bit VM) you will "word-tear". In that case your "int" is 32 bits, but the machine addresses 64 atomically. Now you have to read all 64, increment half, and write them all back.
This is a much larger issue; bone up on processor instruction sets, and grep gcc for how it implements "volatile" everywhere if you really want the gory details.
Add "volatile" and see how the machine code changes. If you aren't looking down at the chip registers, please just use boost libraries and be done with it.

If you need to increment a value from multiple threads at the same time, then look up "atomic operations". For Linux, look up "gcc atomic operations". There is hardware support on most platforms to atomically increment, add, compare-and-swap, and more. LOCKING WOULD BE OVERKILL for this: an atomic increment is orders of magnitude faster than lock/increment/unlock. If you have to change a lot of fields at the same time you may need a lock, although you can change up to 128 bits' worth of fields at a time with most atomic ops.
volatile is not the same as an atomic operation. volatile helps the compiler know when it's a bad idea to use a cached copy of a variable. Among its uses, volatile is important when you have multiple threads changing data that you would like to read the "most up to date version" of without locking. volatile will still not fix your a++ problem: two threads can read the value of "a" at the same time, both increment the same value, and the last one to write "a" wins, so you lose an increment. volatile will also slow down optimized code by not letting the compiler hold values in registers and whatnot.
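As a minimal sketch of that advice, assuming C++11 is available (the GCC builtin shown in the comment, __sync_fetch_and_add, is one of the "gcc atomic operations" referred to above):
#include <atomic>

std::atomic<int> a{0};

void worker()
{
    // Hardware-supported atomic increment: no mutex, no lost updates.
    a.fetch_add(1, std::memory_order_relaxed);
    // GCC-specific equivalent on a plain int would be: __sync_fetch_and_add(&plain_a, 1);
}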

Related

std::atomic - behaviour of relaxed ordering

Can the following call to print result in outputting stale/unintended values?
std::mutex g;
std::atomic<int> seq;
int g_s = 0;
int i = 0, j = 0, k = 0; // ignore fact that these could easily made atomic
// Thread 1
void do_work() // seldom called
{
// avoid over
std::lock_guard<std::mutex> lock{g};
i++;
j++;
k++;
seq.fetch_add(1, std::memory_order_relaxed);
}
// Thread 2
void consume_work() // spinning
{
const auto s = g_s;
// avoid overhead of constantly acquiring lock
g_s = seq.load(std::memory_order_relaxed);
if (s != g_s)
{
// no lock guard
print(i, j, k);
}
}
TL;DR: this is super broken; use a Seq Lock instead. Or RCU if your data structure is bigger.
Yes, you have data-race UB, and in practice stale values are likely; so are inconsistent values (from different increments). ISO C++ has nothing to say about what will happen, so it depends on how it happens to compile for some real machine, and interrupts / context switches in the reader that happen in the middle of reading some of these multiple vars. e.g. if the reader sleeps for any reason between reading i and j, you could miss many updates, or at least get a j that doesn't match your i.
Relaxed seq with writer+reader using lock_guard
I'm assuming the writer would look the same, so the atomic RMW increment is inside the critical section.
I'm picturing the reader checking seq like it is now, and only taking a lock after that, inside the block that runs print.
Even if you did use lock_guard to make sure the reader got a consistent snapshot of all three variables (something you couldn't get from making each of them separately atomic), I'm not sure relaxed would be sufficient in theory. It might be in practice on most real implementations for real machines (where compilers have to assume there might be a reader that synchronizes a certain way, even if there isn't in practice). I'd use at least release/acquire for seq, if I was going to take a lock in the reader.
Taking a mutex is an acquire operation, same as a std::memory_order_acquire load on the mutex object. A relaxed increment inside a critical section can't become visible to other threads until after the writer has taken the lock.
But in the reader, with if( xyz != seq.load(relaxed) ) { take_lock; ... }, the load is not guaranteed to "happen before" taking the lock. In practice on many ISAs it will, especially x86 where all atomic RMWs are full memory barriers. But in ISO C++, and maybe some real implementations, it's possible for the relaxed load to reorder into the reader's critical section. Of course, ISO C++ doesn't define things in terms of "reordering", only in terms of syncing with and values loads are allowed to see.
(This reordering may not be fully plausible; it would mean the read side would have to actually take the lock based on branch prediction / speculation on the load result. Maybe with lock elision like x86 did with transactional memory, except without x86's strong memory ordering?)
Anyway, it's pretty hairy to reason about, and release / acquire ops are quite cheap on most CPUs. If you expected it to be expensive, and for the check to often be false, you could check again with an acquire load, or put an acquire fence inside the if so it doesn't happen on the no-new-work path.
Use a Seq Lock
Your problem is better solved by using your sequence counter as part of a Seq Lock, so neither reader nor writer needs a mutex. (Summary: increment before writing, then touch the payload, then increment again. In the reader, read i, j, and k into local temporaries, then check the sequence number again to make sure it's the same, and an even number. With appropriate memory barriers.
See the wikipedia article and/or link below for actual details, but the real change from what you have now is that the sequence number has to increment by 2. If you can't handle that, use a separate counter for the actual lock, with seq as part of the payload.)
If you don't want to use a mutex in the reader, using one in the writer only helps in terms of implementation-detail side-effects, like making sure stores to memory actually happen, not keeping i in a register across calls if do_work inlines into some caller.
BTW, updating seq doesn't need to be an atomic RMW if there's only one writer. You can relaxed load and separately store an incremented temporary (with release semantics).
A Seq Lock is good for cheap reads and occasional writes that make the reader retry. Implementing 64 bit atomic counter with 32 bit atomics shows appropriate fencing.
It relies on non-atomic reads that may have a data race, but not using the result if your sequence counter detects tearing. C++ doesn't define the behaviour in that case, but it works in practice on real implementations. (C++ is mostly keeping its options open in case of hardware race detection, which normal CPUs don't do.)
If you have multiple writers, you'd still use a normal lock to give mutual exclusion between them, or use the sequence counter as a spinlock, which a writer acquires by making the count odd. Otherwise you just need the sequence counter.
Your global g_s is just to track the latest sequence number the reader has seen? Storing it next to the data defeats some of the purpose/benefit, since it means the reader is writing the same cache line as the writer, assuming that variables declared near each other all end up together. Consider making it static inside the function, or separate it with other stuff, or with padding, like alignas(64) or 128. (That wouldn't guarantee that a compiler doesn't put it right before the other vars, though; a struct would let you control the layout of all of them. With enough alignment, you can make sure they're not in the same aligned pair of cache lines.)
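For illustration, here is a minimal single-writer Seq Lock sketch along the lines described above (the names, layout, and fence choices are mine, not the poster's code; see the Wikipedia article and the linked answer for the real details):
#include <atomic>

struct Payload { int i, j, k; };

std::atomic<unsigned> seq{0};   // even = stable, odd = write in progress
Payload data{};                 // plain, non-atomic payload

// Single writer: make the count odd, write the payload, make it even again.
void publish(int i, int j, int k)
{
    unsigned s = seq.load(std::memory_order_relaxed);
    seq.store(s + 1, std::memory_order_relaxed);            // odd: write in progress
    std::atomic_thread_fence(std::memory_order_release);    // seq store becomes visible before payload stores
    data = {i, j, k};
    seq.store(s + 2, std::memory_order_release);            // even: write complete
}

// Reader: copy the payload, then confirm no write started or completed in the meantime.
bool try_read(Payload& out)
{
    unsigned s1 = seq.load(std::memory_order_acquire);
    if (s1 & 1)
        return false;                                       // writer active; retry later
    out = data;                                             // technically a data race; result discarded on retry
    std::atomic_thread_fence(std::memory_order_acquire);    // payload reads complete before the re-check
    return s1 == seq.load(std::memory_order_relaxed);
}
The reader retries (or goes off and does other work) whenever try_read() returns false, which only happens when a write overlapped the read.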
Even ignoring the staleness, this causes a data race and UB.
Thread 2 can read i, j, k while thread 1 is modifying them; you don't synchronize the access to those variables. If thread 2 doesn't respect the mutex g, there's no point in locking it in thread 1.
Yes, it can.
First of all, the lock guard does not have any effect on your code. A lock has to be used by at least two threads to have any effect.
Thread 2 can read at any moment. It can read an incremented i and not-yet-incremented j and k. In theory, it can even read a weird partial value obtained by reading in between the updates of the various bytes that compose i - for example, an increment from 0xFF to 0x100 could be read as 0x1FF or 0x0 - but not on x86, where these updates happen to be atomic.

C++ atomics memory ordering for some specific use case

I'm in the following situation: I use an atomic<uint64_t> as a counter, increment it from 5 or more threads, and use the value before the increment to make some decision.
atomic<uint64_t> global_counter;
void thread_funtion(){
uint64_t local_counter = global_counter.fetch_add(1,std::memory_order_relaxed);
if(local_counter == 24)
do_somthing(local_counter);
}
thread_funtion() will be executed by 5 different threads. Once I have local_counter, my code doesn't care anymore if global_counter changes again while thread_funtion() is running (the business logic is such that I only need a unique, incrementing value per thread_funtion() call).
Is std::memory_order_relaxed safe to be used in this case ?
atomic<...>::fetch_add(..., std::memory_order_relaxed) guarantees the atomic execution, but nothing more.
But even with memory_order_relaxed, there will be one, and only one thread calling do_something(). Since this fetch_add is the only operation on global_counter, and it is executed atomically, the value 24 must be reached exactly once. But there is no guarantee which thread it will be.
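A small self-contained demo of that guarantee (the names and counts here are arbitrary): even with memory_order_relaxed, no two calls ever receive the same ticket, so any particular value such as 24 is returned to exactly one thread.
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

std::atomic<std::uint64_t> global_counter{0};
std::set<std::uint64_t> seen;   // tickets observed so far
std::mutex seen_mutex;          // protects `seen` only; not needed for the counter itself

void worker(int iterations)
{
    for (int n = 0; n < iterations; ++n)
    {
        std::uint64_t ticket = global_counter.fetch_add(1, std::memory_order_relaxed);
        std::lock_guard<std::mutex> lk(seen_mutex);
        assert(seen.insert(ticket).second);   // every ticket is unique
    }
}

int main()
{
    std::vector<std::thread> threads;
    for (int t = 0; t < 5; ++t)
        threads.emplace_back(worker, 10000);
    for (auto& th : threads)
        th.join();
    assert(seen.size() == 5u * 10000u);       // no ticket was lost or duplicated
}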

Is x++; threadsafe?

If I update a variable in one thread like this:
receiveCounter++;
and then from another thread I only ever read this variable and write its value to a GUI.
Is that safe? Or could this instruction be interrupted in the middle, so the value in receiveCounter is wrong when it is read by another thread? It must be possible, right, since ++ is not atomic; it is several instructions.
I don't care about synchronizing the reads and writes; it just needs to be incremented and then updated in the GUI, but these do not have to happen directly after each other.
What I care about is that the value cannot be wrong. Like the ++ operation being interrupted in the middle so the read value is completely off.
Do I need to lock this variable? I really do not want to, since it is updated very often. I could solve this by just posting a message to the main thread and copying the value to a queue (which would then need to be locked, but I would not do this on every update), I guess.
But I am interested in the above problem anyway.
If one thread changes the value in a variable and another thread reads that value and the program does not synchronize the accesses it has a data race, and the behavior of the program is undefined. Change the type of receiveCounter to std::atomic<int> (assuming it's an int to begin with)
At its core it is a read-modify-write operation, they are not atomic. There are some processors around that have a dedicated instruction for it. Like Intel/AMD cores, very common, they have an INC instruction.
While that sounds like that could be atomic, since it is a single instruction, it still isn't. The x86/x64 instruction set doesn't have much to do anymore with the way the execution engine is actually implemented. Which executes RISC-like "micro-ops", the INC instruction is translated to multiple micro-ops. It can be made atomic with the LOCK prefix on the instruction. But compilers don't emit it unless they know that an atomic update is desired.
You will thus need to be explicit about it. The C++11 std::atomic<> standard addition is a good way. Your operating system or compiler will have intrinsics for it, usually named something like "Interlocked" or "__builtin".
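A minimal sketch of that suggestion (receiveCounter is the question's name; the relaxed memory orders are my assumption, since the asker only needs an untorn, eventually-up-to-date value, not ordering with other data):
#include <atomic>

std::atomic<int> receiveCounter{0};

void on_receive()                     // worker thread
{
    receiveCounter.fetch_add(1, std::memory_order_relaxed);   // atomic RMW: increments are never lost
}

int read_for_gui()                    // GUI thread
{
    return receiveCounter.load(std::memory_order_relaxed);    // may be slightly stale, but never torn
}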
Simple answer: NO.
i++; is the same as i = i + 1;: it involves a load, an arithmetic operation, and a store, so in general it is not atomic.
The operations actually executed depend on the CPU's instruction set and might happen to be atomic on some architectures, but the default assumption must be that it is not atomic.
Generally speaking, it is not thread-safe because the ++ operator consists in one read and one write, the pair of which is not atomic and can be interrupted in between.
Then, it probably also depends on the language/compiler/architecture, but in a typical case the increment operation is probably not thread safe.
As for the read and write operations themselves, as long as the value is not wider than the native word (for example, a 64-bit value on a 32-bit machine), they should be atomic, so if you don't mind other threads occasionally seeing a value that is off by 1, it may be OK for your case.
No. Increment operation is not atomic, hence not thread-safe.
In your case it is safe to use this operation if you don't care about its value at any specific point in time (and you only read this variable from another thread, never write to it from there). It will eventually increment receiveCounter's value; you just don't have any guarantees about the ordering of the operations.
++ is equivalent to i = i + 1.
++ is not an atomic operation, so it's NOT thread-safe. Only reads and writes of primitive variables (except long and double) are atomic and hence thread-safe.

Synchronizing access to variable

I need to provide synchronization to some members of a structure.
If the structure is something like this
struct SharedStruct {
int Value1;
int Value2;
}
and I have a global variable
SharedStruct obj;
I want the write from one processor
obj.Value1 = 5; // Processor B
to be immediately visible to the other processors, so that when I test the value
if(obj.Value1 == 5) { DoSmth(); } // Processor A
else DoSmthElse();
to get the new value, not some old value from the cache.
First I thought that using volatile when writing/reading the values would be enough, but I read that volatile can't solve this kind of issue.
The members are guaranteed to be properly aligned on 2/4/8-byte boundaries, and writes should be atomic in this case, but I'm not sure how the cache could interfere with this.
Would using memory barriers (mfence, sfence, etc.) be enough? Or are some interlocked operations required?
Or maybe something like
lock mov addr, REGISTER
?
The easiest would obviously be some locking mechanism, but speed is critical and I can't afford locks :(
Edit
Maybe I should clarify a bit. The value is set only once (it behaves like a flag). All the other threads just need to read it. That's why I think there may be a way to force the read of this new value without using locks.
Thanks in advance!
There Ain't No Such Thing As A Free Lunch. If your data is being accessed from multiple threads, and it is necessary that updates are immediately visible by those other threads, then you have to protect the shared struct by a mutex, or a readers/writers lock, or some similar mechanism.
Performance is a valid concern when synchronizing code, but it is trumped by correctness. Generally speaking, aim for correctness first and then profile your code. Worrying about performance when you haven't yet nailed down correctness is premature optimization.
Use explicitly atomic instructions. I believe most compilers offer these as intrinsics. Compare and Exchange is another good one.
If you intend to write a lockless algorithm, you need to write it so that your changes only take effect when conditions are as expected.
For example, if you intend to insert a linked list object, use the compare/exchange stuff so that it only inserts if the pointer still points at the same location when you actually do the update.
Or if you are going to decrement a reference count and free the memory at count 0, you will want to pre-free it by making it unavailable somehow, check that the count is still 0 and then really free it. Or something like that.
Using a lock, operate, unlock design is generally a lot easier. The lock-free algorithms are really difficult to get right.
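As a rough illustration of that compare/exchange insert, here is the classic lock-free "push onto a list head" sketch in C++11 (the types and names are assumed, not taken from the answer):
#include <atomic>

struct Node
{
    int value;
    Node* next;
};

std::atomic<Node*> head{nullptr};

void push(int value)
{
    Node* node = new Node{value, head.load(std::memory_order_relaxed)};
    // Only link the node in if head still points where we last saw it;
    // on failure, compare_exchange_weak refreshes node->next with the current head and we retry.
    while (!head.compare_exchange_weak(node->next, node,
                                       std::memory_order_release,
                                       std::memory_order_relaxed))
    {
    }
}
The same retry-until-unchanged shape applies to the reference-count example: re-check the expected state inside the CAS loop before committing the change.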
All the other answers here seem to hand-wave about the complexities of updating shared variables using mutexes, etc. It is true that you want the update to be atomic.
And you could use various OS primitives to ensure that, and that would be good programming style.
However, on most modern processors (certainly the x86), writes of small, aligned scalar values are atomic and immediately visible to other processors due to cache coherency.
So in this special case, you don't need all the synchronizing junk; the hardware does the atomic operation for you. Certainly this is safe with 4-byte values (e.g., "int" in 32-bit C compilers).
So you could just initialize Value1 with an uninteresting value (say 0) before you start the parallel threads, and simply write other values there. If the question is exiting the loop on a fixed value (e.g., if Value1 == 5) this will be perfectly safe.
If you insist on capturing the first value written, this won't work. But if you have a parallel set of threads, and any value written other than the uninteresting one will do, this is also fine.
I second peterb's answer to aim for correctness first. Yes, you can use memory barriers here, but they will not do what you want.
You said "immediately". However, no matter how immediate that update can ever be, you could (and will) end up with the if() test being executed, then the flag being set, and then DoSmthElse() being executed afterwards. This is called a race condition...
You want to synchronize something, it seems, but it is not this flag.
Making the field volatile should make the change "immediately" visible in other threads, but there is no guarantee that the instant at which thread A executes the update doesn't occur after thread B tests the value but before thread B executes the body of the if/else statement.
It sounds like what you really want to do is make that if/else statement atomic, and that will require either a lock, or an algorithm that is tolerant of this sort of situation.
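For what it's worth, here is a minimal C++11 sketch of the write-once flag from the question (DoSmth and DoSmthElse are the question's names; the memory orders are my assumption). The release/acquire pair makes the new value reliably visible to readers without a lock, but, as noted above, it does not make the surrounding if/else atomic:
#include <atomic>

void DoSmth();
void DoSmthElse();

struct SharedStruct
{
    std::atomic<int> Value1{0};
    int Value2 = 0;
};

SharedStruct obj;

void writer()                                            // Processor B
{
    obj.Value1.store(5, std::memory_order_release);
}

void reader()                                            // Processor A
{
    if (obj.Value1.load(std::memory_order_acquire) == 5)
        DoSmth();
    else
        DoSmthElse();
}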

I've heard i++ isn't thread safe, is ++i thread-safe?

I've heard that i++ isn't a thread-safe statement since in assembly it reduces down to storing the original value as a temp somewhere, incrementing it, and then replacing it, which could be interrupted by a context switch.
However, I'm wondering about ++i. As far as I can tell, this would reduce to a single assembly instruction, such as 'add r1, r1, 1' and since it's only one instruction, it'd be uninterruptable by a context switch.
Can anyone clarify? I'm assuming that an x86 platform is being used.
You've heard wrong. It may well be that "i++" is thread-safe for a specific compiler and specific processor architecture but it's not mandated in the standards at all. In fact, since multi-threading isn't part of the ISO C or C++ standards (a), you can't consider anything to be thread-safe based on what you think it will compile down to.
It's quite feasible that ++i could compile to an arbitrary sequence such as:
load r0,[i] ; load memory into reg 0
incr r0 ; increment reg 0
stor [i],r0 ; store reg 0 back to memory
which would not be thread-safe on my (imaginary) CPU that has no memory-increment instructions. Or it may be smart and compile it into:
lock ; disable task switching (interrupts)
load r0,[i] ; load memory into reg 0
incr r0 ; increment reg 0
stor [i],r0 ; store reg 0 back to memory
unlock ; enable task switching (interrupts)
where lock disables and unlock enables interrupts. But, even then, this may not be thread-safe in an architecture that has more than one of these CPUs sharing memory (the lock may only disable interrupts for one CPU).
The language itself (or libraries for it, if it's not built into the language) will provide thread-safe constructs and you should use those rather than depend on your understanding (or possibly misunderstanding) of what machine code will be generated.
Things like Java synchronized and pthread_mutex_lock() (available to C/C++ under some operating systems) are what you need to look into (a).
(a) This question was asked before the C11 and C++11 standards were completed. Those iterations have now introduced threading support into the language specifications, including atomic data types (though they, and threads in general, are optional, at least in C).
You can't make a blanket statement about either ++i or i++. Why? Consider incrementing a 64-bit integer on a 32-bit system. Unless the underlying machine has a quad word "load, increment, store" instruction, incrementing that value is going to require multiple instructions, any of which can be interrupted by a thread context switch.
In addition, ++i isn't always "add one to the value." In a language like C, incrementing a pointer actually adds the size of the thing pointed to. That is, if i is a pointer to a 32-byte structure, ++i adds 32 bytes. Whereas almost all platforms have an "increment value at memory address" instruction that is atomic, not all have an atomic "add arbitrary value to value at memory address" instruction.
They are both thread-unsafe.
A CPU cannot do math directly with memory. It does that indirectly by loading the value from memory and doing the math with CPU registers.
i++
register int a1, a2;
a1 = *(&i) ; // One cpu instruction: LOAD from memory location identified by i;
a2 = a1;
a1 += 1;
*(&i) = a1;
return a2; // 4 cpu instructions
++i
register int a1;
a1 = *(&i) ;
a1 += 1;
*(&i) = a1;
return a1; // 3 cpu instructions
In both cases, there is a race condition that results in an unpredictable value of i.
For example, let's assume there are two concurrent ++i threads, each using register a1 and b1 respectively, and that context switching makes the execution interleave like the following:
register int a1, b1;
a1 = *(&i);
a1 += 1;
b1 = *(&i);
b1 += 1;
*(&i) = a1;
*(&i) = b1;
As a result, i doesn't become i+2; it becomes i+1, which is incorrect.
To remedy this, modern CPUs provide some kind of LOCK/UNLOCK CPU instructions, during which context switching is disabled.
On Win32, use InterlockedIncrement() to do i++ for thread-safety. It's much faster than relying on mutex.
If you are sharing even an int across threads in a multi-core environment, you need proper memory barriers in place. This can mean using interlocked instructions (see InterlockedIncrement in win32 for example), or using a language (or compiler) that makes certain thread-safe guarantees. With CPU level instruction-reordering and caches and other issues, unless you have those guarantees, don't assume anything shared across threads is safe.
Edit: One thing you can assume with most architectures is that if you are dealing with properly aligned single words, you won't end up with a single word containing a combination of two values that were mashed together. If two writes happen over top of each other, one will win, and the other will be discarded. If you are careful, you can take advantage of this, and see that either ++i or i++ are thread-safe in the single writer/multiple reader situation.
If you want an atomic increment in C++ you can use C++0x libraries (the std::atomic datatype) or something like TBB.
There was once a time that the GNU coding guidelines said updating datatypes that fit in one word was "usually safe" but that advice is wrong for SMP machines, wrong for some architectures, and wrong when using an optimizing compiler.
To clarify the "updating one-word datatype" comment:
It is possible for two CPUs on an SMP machine to write to the same memory location in the same cycle, and then try to propagate the change to the other CPUs and the cache. Even if only one word of data is being written so the writes only take one cycle to complete, they also happen simultaneously so you cannot guarantee which write succeeds. You won't get partially updated data, but one write will disappear because there is no other way to handle this case.
Compare-and-swap properly coordinates between multiple CPUs, but there is no reason to believe that every variable assignment of one-word datatypes will use compare-and-swap.
And while an optimizing compiler doesn't affect how a load/store is compiled, it can change when the load/store happens, causing serious trouble if you expect your reads and writes to happen in the same order they appear in the source code (the most famous being double-checked locking does not work in vanilla C++).
NOTE My original answer also said that Intel 64 bit architecture was broken in dealing with 64 bit data. That is not true, so I edited the answer, but my edit claimed PowerPC chips were broken. That is true when reading immediate values (i.e., constants) into registers (see the two sections named "Loading pointers" under listing 2 and listing 4) . But there is an instruction for loading data from memory in one cycle (lmw), so I've removed that part of my answer.
Even if it is reduced to a single assembly instruction, incrementing the value directly in memory, it is still not thread safe.
When incrementing a value in memory, the hardware does a "read-modify-write" operation: it reads the value from the memory, increments it, and writes it back to memory. The x86 hardware has no way of incrementing directly on the memory; the RAM (and the caches) is only able to read and store values, not modify them.
Now suppose you have two separate cores, either on separate sockets or sharing a single socket (with or without a shared cache). The first processor reads the value, and before it can write back the updated value, the second processor reads it. After both processors write the value back, it will have been incremented only once, not twice.
There is a way to avoid this problem; x86 processors (and most multi-core processors you will find) are able to detect this kind of conflict in hardware and sequence it, so that the whole read-modify-write sequence appears atomic. However, since this is very costly, it is only done when requested by the code, on x86 usually via the LOCK prefix. Other architectures can do this in other ways, with similar results; for instance, load-linked/store-conditional and atomic compare-and-swap (recent x86 processors also have this last one).
Note that using volatile does not help here; it only tells the compiler that the variable might have been modified externally and that reads of that variable must not be cached in a register or optimized out. It does not make the compiler use atomic primitives.
The best way is to use atomic primitives (if your compiler or libraries have them), or do the increment directly in assembly (using the correct atomic instructions).
On x86/Windows in C/C++, you should not assume it is thread-safe. You should use InterlockedIncrement() and InterlockedDecrement() if you require atomic operations.
If your programming language says nothing about threads, yet runs on a multithreaded platform, how can any language construct be thread-safe?
As others pointed out: you need to protect any multithreaded access to variables by platform specific calls.
There are libraries out there that abstract away the platform specificity, and the upcoming C++ standard has adapted its memory model to cope with threads (and thus can guarantee thread-safety).
Never assume that an increment will compile down to an atomic operation. Use InterlockedIncrement or whatever similar functions exist on your target platform.
Edit: I just looked up this specific question and increment on X86 is atomic on single processor systems, but not on multiprocessor systems. Using the lock prefix can make it atomic, but it's much more portable just to use InterlockedIncrement.
According to this assembly lesson on x86, you can atomically add a register to a memory location, so potentially your code may atomically execute '++i' or 'i++'.
But as said in another post, ANSI C does not guarantee atomicity for the '++' operation, so you cannot be sure of what your compiler will generate.
The 1998 C++ standard has nothing to say about threads, although the next standard (due this year or the next) does. Therefore, you can't say anything intelligent about thread-safety of operations without referring to the implementation. It's not just the processor being used, but the combination of the compiler, the OS, and the thread model.
In the absence of documentation to the contrary, I wouldn't assume that any action is thread-safe, particularly with multi-core processors (or multi-processor systems). Nor would I trust tests, as thread synchronization problems are likely to come up only by accident.
Nothing is thread-safe unless you have documentation that says it is for the particular system you're using.
Throw i into thread local storage; it isn't atomic, but it then doesn't matter.
AFAIK, according to the C++ standard, reads/writes to an int are atomic.
However, all that this does is get rid of the undefined behavior that's associated with a data race.
But there still will be a data race if both threads try to increment i.
Imagine the following scenario:
Let i = 0 initially:
Thread A reads the value from memory and stores in its own cache.
Thread A increments the value by 1.
Thread B reads the value from memory and stores in its own cache.
Thread B increments the value by 1.
If this is all a single thread you would get i = 2 in memory.
But with both threads, each thread writes its changes and so Thread A writes i = 1 back to memory, and Thread B writes i = 1 to memory.
It's well defined, there's no partial destruction or construction or any sort of tearing of an object, but it's still a data race.
In order to atomically increment i you can use:
std::atomic<int>::fetch_add(1, std::memory_order_relaxed)
Relaxed ordering can be used because we don't care where this operation takes place; all we care about is that the increment operation is atomic.
You say "it's only one instruction, it'd be uninterruptible by a context switch." - that's all well and good for a single CPU, but what about a dual core CPU? Then you can really have two threads accessing the same variable at the same time without any context switches.
Without knowing the language, the answer is to test the heck out of it.
I think that if the expression "i++" is the only thing in a statement, it's equivalent to "++i"; the compiler is smart enough to not keep a temporary value, etc. So if you can use them interchangeably (otherwise you wouldn't be asking which one to use), it doesn't matter which you use as they're almost the same (except for aesthetics).
Anyway, even if the increment operator is atomic, that doesn't guarantee that the rest of the computation will be consistent if you don't use the correct locks.
If you want to experiment by yourself, write a program where N threads increment concurrently a shared variable M times each... if the value is less than N*M, then some increment was overwritten. Try it with both preincrement and postincrement and tell us ;-)
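A sketch of that experiment (the thread and iteration counts are arbitrary): each of N threads increments both a plain int and a std::atomic<int> M times; the plain counter usually finishes below N*M because increments get lost, while the atomic one always reaches it.
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main()
{
    const int N = 4;                 // threads
    const int M = 1000000;           // increments per thread
    int plain = 0;                   // shared and unsynchronized: increments can be lost (this is a data race)
    std::atomic<int> atomic_counter{0};

    std::vector<std::thread> threads;
    for (int t = 0; t < N; ++t)
        threads.emplace_back([&] {
            for (int n = 0; n < M; ++n)
            {
                ++plain;                                                   // racy increment
                atomic_counter.fetch_add(1, std::memory_order_relaxed);    // never loses an increment
            }
        });
    for (auto& th : threads)
        th.join();

    std::cout << "plain:  " << plain << " (expected " << N * M << ")\n";
    std::cout << "atomic: " << atomic_counter.load() << "\n";
}
It makes no difference whether you write ++plain or plain++ here; both lose increments.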
For a counter, I recommend using the compare-and-swap idiom, which is both non-locking and thread-safe.
Here it is in Java:
public class IntCompareAndSwap {
    private int value = 0;

    public synchronized int get() { return value; }

    public synchronized int compareAndSwap(int p_expectedValue, int p_newValue) {
        int oldValue = value;
        if (oldValue == p_expectedValue)
            value = p_newValue;
        return oldValue;
    }
}

public class IntCASCounter {
    public IntCASCounter() {
        m_value = new IntCompareAndSwap();
    }

    private IntCompareAndSwap m_value;

    public int getValue() { return m_value.get(); }

    public void increment() {
        int temp;
        do {
            temp = m_value.get();
        } while (temp != m_value.compareAndSwap(temp, temp + 1));
    }

    public void decrement() {
        int temp;
        do {
            temp = m_value.get();
        } while (temp > 0 && temp != m_value.compareAndSwap(temp, temp - 1));
    }
}