do integer reads need to be critical section protected?

do integer reads need to be critical section protected? - c++

I have come across C++03 some code that takes this form:
struct Foo {
int a;
int b;
CRITICAL_SECTION cs;
}
// DoFoo::Foo foo_;
void DoFoo::Foolish()
{
if( foo_.a == 4 )
{
PerformSomeTask();
EnterCriticalSection(&foo_.cs);
foo_.b = 7;
LeaveCriticalSection(&foo_.cs);
}
}
Does the read from foo_.a need to be protected? e.g.:
void DoFoo::Foolish()
{
EnterCriticalSection(&foo_.cs);
int a = foo_.a;
LeaveCriticalSection(&foo_.cs);
if( a == 4 )
{
PerformSomeTask();
EnterCriticalSection(&foo_.cs);
foo_.b = 7;
LeaveCriticalSection(&foo_.cs);
}
}
If so, why?
Please assume the integers are 32-bit aligned. The platform is ARM.

Technically yes, but no on many platforms. First, let us assume that int is 32 bits (which is pretty common, but not nearly universal).
It is possible that the two words (16 bit parts) of a 32 bit int will be read or written to separately. On some systems, they will be read separately if the int isn't aligned properly.
Imagine a system where you can only do 32-bit aligned 32 bit reads and writes (and 16-bit aligned 16 bit reads and writes), and an int that straddles such a boundary. Initially the int is zero (ie, 0x00000000)
One thread writes 0xBAADF00D to the int, the other reads it "at the same time".
The writing thread first writes 0xBAAD to the high word of the int. The reader thread then reads the entire int (both high and low) getting 0xBAAD0000 -- which is a state that the int was never put into on purpose!
The writer thread then writes the low word 0xF00D.
As noted, on some platforms all 32 bit reads/writes are atomic, so this isn't a concern. There are other concerns, however.
Most lock/unlock code includes instructions to the compiler to prevent reordering across the lock. Without that prevention of reordering, the compiler is free to reorder things so long as it behaves "as-if" in a single threaded context it would have worked that way. So if you read a then b in code, the compiler could read b before it reads a, so long as it doesn't see an in-thread opportunity for b to be modified in that interval.
So possibly the code you are reading is using these locks to make sure that the read of the variable happens in the order written in the code.
Other issues are raised in the comments below, but I don't feel competent to address them: cache issues, and visibility.

Looking at this it seems that arm has quite relaxed memory model so you need a form of memory barrier to ensure that writes in one thread are visible when you'd expect them in another thread. So what you are doing, or else using std::atomic seems likely necessary on your platform. Unless you take this into account you can see updates out of order in different threads which would break your example.

I think you can use C++11 to ensure that integer reads are atomic, using (for example) std::atomic<int>.

The C++ standard says that there's a data race if one thread writes to a variable at the same time as another thread reads from that variable, or if two threads write to the same variable at the same time. It further says that a data race produces undefined behavior. So, formally, you must synchronize those reads and writes.
There are three separate issues when one thread reads data that was written by another thread. First, there is tearing: if writing requires more than a single bus cycle, it's possible for a thread switch to occur in the middle of the operation, and another thread could see a half-written value; there's an analogous problem if a read requires more than a single bus cycle. Second, there's visibility: each processor has its own local copy of the data that it's been working on recently, and writing to one processor's cache does not necessarily update another processor's cache. Third, there's compiler optimizations that reorder reads and writes in ways that would be okay within a single thread, but will break multi-threaded code. Thread-safe code has to deal with all three problems. That's the job of synchronization primitives: mutexes, condition variables, and atomics.

Although the integer read/write operation indeed will most likely be atomic, the compiler optimizations and processor cache will still give you problems if you don't do it properly.
To explain - the compiler will normally assume that the code is single-threaded and make many optimizations that rely on that. For example, it might change the order of instructions. Or, if it sees that the variable is only written and never read, it might optimize it away entirely.
The CPU will also cache that integer, so if one thread writes it, the other one might not get to see it until a lot later.
There are two things you can do. One is to wrap in in critical section like in your original code. The other is to mark the variable as volatile. That will signal the compiler that this variable will be accessed by multiple threads and will disable a range of optimizations, as well as placing special cache-sync instructions (aka "memory barriers") around accesses to the variable (or so I understand). Apparently this is wrong.
Added: Also, as noted by another answer, Windows has Interlocked APIs that can be used to avoid these issues for non-volatile variables.

Related

volatile variable updated from multiple threads C++

volatile bool b;
Thread1: //only reads b
void f1() {
while (1) {
if (b) {do something};
else { do something else};
}
}
Thread2:
//only sets b to true if certain condition met
// updated by thread2
void f2() {
while (1) {
//some local condition evaluated - local_cond
if (!b && (local_cond == true)) b = true;
//some other work
}
}
Thread3:
//only sets b to false when it gets a message on a socket its listening to
void f3() {
while (1) {
//select socket
if (expected message came) b = false;
//do some other work
}
}
If thread2 updates b first at time t and later thread3 updates b at time t+5:
will thread1 see the latest value "in time" whenever it is reading b?
for example: reads from t+delta to t+5+delta should read true and
reads after t+5+delta should read false.
delta is the time for the store of "b" into memory when one of threads 2 or 3 updated it

The effect of volatile keyword is principally two things (I avoid scientifically strict formulations here):
1) Its accesses can't be cached or combined. (UPD: on suggestion, I underline this is for caching in registers or another compiler-provided location, not the RAM cache in CPU.) For example, the following code:
x = 1;
x = 2;
for a volatile x will never be combined into single x = 2, whatever optimization level is required; but if x is not volatile, even low levels will likely cause this collapse into a single write. The same for reads: each read operation will access the variable value without any attempt to cache it.
2) All volatile operations are relayed onto machine command layer in the same order between them (to underline, only between volatile operations), as they are defined in source code.
But this is not true for accesses between non-volatile and volatile memory. For the following code:
int *x;
volatile int *vy;
void foo()
{
*x = 1;
*vy = 101;
*x = 2;
*vy = 102;
}
gcc (9.4) with -O2 and clang (10.0) with -O produce something similar to:
movq x(%rip), %rax
movq vy(%rip), %rcx
movl $101, (%rcx)
movl $2, (%rax)
movl $102, (%rcx)
retq
so one access to x is already gone, despite its presence between two volatile accesses. If one need the first x = 1 to succeed before first write to vy, let him put an explicit barrier (since C11, atomic_signal_fence is the platform-independent mean for this).
That was the common rule but without regarding multithread issues. What happens here with multithreading?
Well, imagine as you declare that thread 2 writes true to b, so, this is writing of value 1 to single-byte location. But, this is ordinary write without any memory ordering requirements. What you provided with volatile is that compiler won't optimize it. But what for processor?
If this was a modern abstract processor, or one with relaxed rules, like ARM, I'd say nothing prevent it from postponing the real write for an indefinite time. (To clarify, "write" is exposing the operation to RAM-and-all-caches conglomerate.) It's fully up to processor's deliberation. Well, processors are designed to flush their stockpiling of pending writes as fast as possible. But what affects real delay, you can't know: for example, it could "decide" to fill instruction cache with a few next lines, or flush another queued writings... lots of variants. The only thing we know it provides "best effort" to flush all queued operations, to avoid getting buried under previous results. That's truly natural and nothing more.
With x86, there is an additional factor. Nearly every memory write (and, I guess, this one as well) is "releasing" write in x86, so, all previous reads and writes shall be completed before this write. But, the gut fact is that the operations to complete are before this write. So when you write true to volatile b, you will be sure all previous operations have already got visible to other participants... but this one still could be postponed for a while... how long? Nanoseconds? Microseconds? Any other write to memory will flush and so publish this write to b... do you have writes in cycle iteration of thread 2?
The same affects thread 3. You can't be sure this b = false will be published to other CPUs when you need it. Delay is unpredictable. The only thing is guaranteed, if this is not a realtime-aware hardware system, for an indefinite time, and the ISA rules and barriers provide ordering but not exact times. And, x86 is definitely not for such a realtime.
Well, all this means you also need an explicit barrier after write which affects not only compiler, but CPU as well: barrier before previous write and following reads or writes. Among C/C++ means, full barrier satifies this - so you have to add std::atomic_thread_fence(std::memory_order_seq_cst) or use atomic variable (instead of plain volatile one) with the same memory order for write.
And, all this still won't provide you with exact timings like you described ("t" and "t+5"), because the visible "timestamps" of the same operation can differ for different CPUs! (Well, this resembles Einstein's relativity a bit.) All you could say in this situation is that something is written into memory, and typically (not always) the inter-CPU order is what you expected (but the ordering violation will punish you).
But, I can't catch the general idea of what do you want to implement with this flag b. What do you want from it, what state should it reflect? Let you return to the upper level task and reformulate. Is this (I'm just guessing on coffee grounds) a green light to do something, which is cancelled by an external order? If so, an internal permission ("we are ready") from the thread 2 shall not drop this cancellation. This can be done using different approaches, as:
1) Just separate flags and a mutex/spinlock around their set. Easy but a bit costly (or even substantially costly, I don't know your environment).
2) An atomically modified analog. For example, you can use a bitfield variable which is modified using compare-and-swap. Assign bit 0 to "ready" but bit 1 for "cancelled". For C, atomic_compare_exchange_strong is what you'll need here at x86 (and at most other ISAs). And, volatile is not needed anymore here if you keep residing with memory_order_seq_cst.

Will thread1 see the latest value "in time" whenever it is reading b?
Yes, the volatile keyword denotes that it can be modified outside of the thread or hardware without the compiler being aware thus every access (both read and write) will be made through an lvalue expression of volatile-qualified type is considered an observable side effect for the purpose of optimization and is evaluated strictly according to the rules of the abstract machine (that is, all writes are completed at some time before the next sequence point). This means that within a single thread of execution, a volatile access cannot be optimized out or reordered relative to another visible side effect that is separated by a sequence point from the volatile access.
Unfortunately, the volatile keyword is not thread-safe and operation will have to be taken with care, it is recommended to use atomic for this, unless in an embedded or bare-metal scenario.
Also the whole struct should be atomic struct X {int a; volatile bool b;};.

Say I have a system with 2 cores. The first core runs thread 2, the second core runs thread 3.
reads from t+delta to t+5+delta should read true and reads after t+5+delta should read false.
Problem is that thread 1 will read at t + 10000000 when the kernel decides one of the threads has run long enough and schedules a different thread. So it likely thread1 will not see the change a lot of the time.
Note: this ignores all the additional problems of synchronicity of caches and observability. If the thread isn't even running all of that becomes irrelevant.

Synchronization with "versioning" in c++

Please consider the following synchronization problem:
initially:
version = 0 // atomic variable
data = 0 // normal variable (there could be many)
Thread A:
version++
data = 3
Thread B:
d = data
v = version
assert(d != 3 || v == 1)
Basically, if thread B sees data = 3 then it must also see version++.
What's the weakest memory order and synchronization we must impose so that the assertion in thread B is always satisfied?
If I understand C++ memory_order correctly, the release-acquire ordering won't do because that guarantees that operations BEFORE version++, in thread A, will be seen by the operations AFTER v = version, in thread B.
Acquire and release fences also work in the same directions, but are more general.
As I said, I need the other direction: B sees data = 3 implies B sees version = 1.
I'm using this "versioning approach" to avoid locks as much as possible in a data structure I'm designing. When I see something has changed, I step back, read the new version and try again.
I'm trying to keep my code as portable as possible, but I'm targeting x86-64 CPUs.

You might be looking for a SeqLock, as long as your data doesn't include pointers. (If it does, then you might need something more like RCU to protect readers that might load a pointer, stall / sleep for a while, then deref that pointer much later.)
You can use the SeqLock sequence counter as the version number. (version = tmp_counter >> 1 since you need two increments per write of the payload to let readers detect tearing when reading the non-atomic data. And to make sure they see the data that goes with this sequence number. Make sure you don't read the atomic counter a 3rd time; use the local tmp that you read it into to verify match before/after copying data.)
Readers will have to retry if they happen to attempt a read while data is being modified. But it's non-atomic, so there's no way if thread B sees data = 3 can ever be part of what creates synchronization; it can only be something you see as a result of synchronizing with a version number from the writer.
See:
Implementing 64 bit atomic counter with 32 bit atomics - my attempt at a SeqLock in C++, with lots of comments. It's a bit of a hack because ISO C++'s data-race UB rules are overly strict; a SeqLock relies on detecting possible tearing and not using torn data, rather than avoiding concurrent access entirely. That's fine on a machine without hardware race detection so that doesn't fault (like all real CPUs), but C++ still calls that UB, even with volatile (although that puts it more into implementation-defined territory). In practice it's fine.
GCC reordering up across load with `memory_order_seq_cst`. Is this allowed? - A GCC bug fixed in 8.1 that could break a seqlock implementation.
If you have multiple writers, you can use the sequence-counter itself as a spinlock for mutual exclusion between writers. e.g. using an atomic_fetch_or or CAS to attempt to set the low bit to make the counter odd. (tmp = seq.fetch_or(1, std::memory_order_acq_rel);, hopefully compiling to x86 lock bts). If it previously didn't have the low bit set, this writer won the race, but if it did then you have to try again.
But with only a single writer, you don't need to RMW the atomic sequence counter, just store new values (ordered with writes to the payload), so you can either keep a local copy of it, or just do a relaxed load of it, and store tmp+1 and tmp+2.

Can I use char variable without lock in the multi-threading case

As c/c++ standard said, the size of char must be 1. As my understanding, that means CPU guarantees that any read or write on a char must be done in one instruction.
Let's say we have many threads, which share a char variable:
char target = 1;
// thread a
target = 0;
// thread b
target = 1;
// thread 1
while (target == 1) {
// do something
}
// thread 2
while (target == 1) {
// do something
}
In a word, there are two kinds of threads: some of them are to set target into 0 or 1, and the others are to do some tasks if target == 1. The goal is that we can control the task-threds through modifying the value of target.
As my understanding, it doesn't seem that we need to use mutex/lock at all. But my coding experience gave me a strong feeling that we must use mutex/lock in this case.
I'm confused now. Should I use mutex/lock or not in this case?
You see, I can understand why we need mutex/lock in other cases, such as i++. Because i++ can't be done in only one instruction. So can target = 0 be done in one instruction, right? If so, does it mean that we don't need mutex/lock in this case?
Well, I know that we could use std::atomic, so my question is: is it OK to not use neither mutex/lcok nor std::atomic.

std::atomic guarantees that accessing a variable is atomic. From cppreference:
Each instantiation and full specialization of the std::atomic template
defines an atomic type. If one thread writes to an atomic object while
another thread reads from it, the behavior is well-defined (see memory
model for details on data races).
When a char actually is atomic (being size 1, is not sufficient), then std::atomic<char> needs no extra synchronization. However, on a platform where char is not atomic, std::atomic<char> guarantees that it can be read and written atomically by using a mutex or similar.
In practice, I'd expect char to be atomic, but the standard does not guarantee that.
Also consider that operations like eg += read and write the value, hence atomic reads and writes alone are not sufficient to safely call +=, while std::atomic<T> has a proper operator+=.
TL;DR
I'm confused now. Should I use mutex/lock or not in this case?
Let someone else take that decision for you. When you want something atomic, use a std::atomic<something> unless you want fine grained control over the synchronisation.

TL;DR
is it OK to not use neither mutex/lcok nor std::atomic
No
In general, it's not ok to assume things. If you need guarantees for something, then make sure you have them.
This is closely related to a common logical fallacy. Just because you cannot imagine why something could be true, that does not mean that it's true.
Longer version
As c/c++ standard said
There's no such thing as "C/C++" and definitely not "the C/C++ standard". They are two completely different languages with different standards. However, they do agree on this point. sizeof (char) is 1 in both languages.
(Sidenote: sizeof 'a' will yield different results.)
As my understanding, that means CPU guarantees that any read or write on a char must be done in one instruction.
That's not correct. The CPU has it's own specification, completely separate from the language standards. And there's nothing that says that this has to be true, even if it probably are in most or all cases.
Because i++ can't be done in only one instruction.
That is CPU dependent. The x86 architecture has an instruction for this. https://c9x.me/x86/html/file_module_x86_id_140.html
As my understanding, it doesn't seem that we need to use mutex/lock at all. But my coding experience gave me a strong feeling that we must use mutex/lock in this case.
Even if the target CPU does read and write in one instruction, which it probably does, there's nothing that says that the C or C++ code needs to be compiled to just that instruction.
The standards for both C and C++ describes the behavior of the code. Not how it is converted to assembly.
So no, you cannot make the assumptions you're doing.

In general, it cannot be assumed that reading or writing a char is an atomic operation. However, the target architecture may provide that guarantee. For embedded C programs it is common practice to rely on such underlying guarantees to avoid the overhead of synchronization mechanisms in certain situations.
In the example in the question it must be noted that even if reading/writing target is an atomic operation, the value could be changed at any time, so there is no guarantee that it will be 1 inside the while loops.

"pseudo-atomic" operations in C++

So I'm aware that nothing is atomic in C++. But I'm trying to figure out if there are any "pseudo-atomic" assumptions I can make. The reason is that I want to avoid using mutexes in some simple situations where I only need very weak guarantees.
1) Suppose I have globally defined volatile bool b, which
initially I set true. Then I launch a thread which executes a loop
while(b) doSomething();
Meanwhile, in another thread, I execute b=true.
Can I assume that the first thread will continue to execute? In other words, if b starts out as true, and the first thread checks the value of b at the same time as the second thread assigns b=true, can I assume that the first thread will read the value of b as true? Or is it possible that at some intermediate point of the assignment b=true, the value of b might be read as false?
2) Now suppose that b is initially false. Then the first thread executes
bool b1=b;
bool b2=b;
if(b1 && !b2) bad();
while the second thread executes b=true. Can I assume that bad() never gets called?
3) What about an int or other builtin types: suppose I have volatile int i, which is initially (say) 7, and then I assign i=7. Can I assume that, at any time during this operation, from any thread, the value of i will be equal to 7?
4) I have volatile int i=7, and then I execute i++ from some thread, and all other threads only read the value of i. Can I assume that i never has any value, in any thread, except for either 7 or 8?
5) I have volatile int i, from one thread I execute i=7, and from another I execute i=8. Afterwards, is i guaranteed to be either 7 or 8 (or whatever two values I have chosen to assign)?

There are no threads in standard C++, and Threads cannot be implemented as a library.
Therefore, the standard has nothing to say about the behaviour of programs which use threads. You must look to whatever additional guarantees are provided by your threading implementation.
That said, in threading implementations I've used:
(1) yes, you can assume that irrelevant values aren't written to variables. Otherwise the whole memory model goes out the window. But be careful that when you say "another thread" never sets b to false, that means anywhere, ever. If it does, that write could perhaps be re-ordered to occur during your loop.
(2) no, the compiler can re-order the assignments to b1 and b2, so it is possible for b1 to end up true and b2 false. In such a simple case I don't know why it would re-order, but in more complex cases there might be very good reasons.
[Edit: oops, by the time I got to answering (2) I'd forgotten that b was volatile. Reads from a volatile variable won't be re-ordered, sorry, so yes on a typical threading implementation (if there is any such thing), you can assume that you won't end up with b1 true and b2 false.]
(3) same as 1. volatile in general has nothing to do with threading at all. However, it is quite exciting in some implementations (Windows), and might in effect imply memory barriers.
(4) on an architecture where int writes are atomic yes, although volatile has nothing to do with it. See also...
(5) check the docs carefully. Likely yes, and again volatile is irrelevant, because on almost all architectures int writes are atomic. But if int write is not atomic, then no (and no for the previous question), even if it's volatile you could in principle get a different value. Given those values 7 and 8, though, we're talking a pretty weird architecture for the byte containing the relevant bits to be written in two stages, but with different values you could more plausibly get a partial write.
For a more plausible example, suppose that for some bizarre reason you have a 16 bit int on a platform where only 8bit writes are atomic. Odd, but legal, and since int must be at least 16 bits you can see how it could come about. Suppose further that your initial value is 255. Then increment could legally be implemented as:
read the old value
increment in a register
write the most significant byte of the result
write the least significant byte of the result.
A read-only thread which interrupted the incrementing thread between the third and fourth steps of that, could see the value 511. If the writes are in the other order, it could see 0.
An inconsistent value could be left behind permanently if one thread is writing 255, another thread is concurrently writing 256, and the writes get interleaved. Impossible on many architectures, but to know that this won't happen you need to know at least something about the architecture. Nothing in the C++ standard forbids it, because the C++ standard talks about execution being interrupted by a signal, but otherwise has no concept of execution being interrupted by another part of the program, and no concept of concurrent execution. That's why threads aren't just another library - adding threads fundamentally changes the C++ execution model. It requires the implementation to do things differently, as you'll eventually discover if for example you use threads under gcc and forget to specify -pthreads.
The same could happen on a platform where aligned int writes are atomic, but unaligned int writes are permitted and not atomic. For example IIRC on x86, unaligned int writes are not guaranteed atomic if they cross a cache line boundary. x86 compilers will not mis-align a declared int variable, for this reason and others. But if you play games with structure packing you could probably provoke an example.
So: pretty much any implementation will give you the guarantees you need, but might do so in quite a complicated way.
In general, I've found that it is not worth trying to rely on platform-specific guarantees about memory access, that I don't fully understand, in order to avoid mutexes. Use a mutex, and if that's too slow use a high-quality lock-free structure (or implement a design for one) written by someone who really knows the architecture and compiler. It will probably be correct, and subject to correctness will probably outperform anything I invent myself.

Most of the answers correctly address the CPU memory ordering issues you're going to experience, but none have detailed how the compiler can thwart your intentions by re-ordering your code in ways that break your assumptions.
Consider an example taken from this post:
volatile int ready;
int message[100];
void foo(int i)
{
message[i/10] = 42;
ready = 1;
}
At -O2 and above, recent versions of GCC and Intel C/C++ (don't know about VC++) will do the store to ready first, so it can be overlapped with computation of i/10 (volatile does not save you!):
leaq _message(%rip), %rax
movl $1, _ready(%rip) ; <-- whoa Nelly!
movq %rsp, %rbp
sarl $2, %edx
subl %edi, %edx
movslq %edx,%rdx
movl $42, (%rax,%rdx,4)
This isn't a bug, it's the optimizer exploiting CPU pipelining. If another thread is waiting on ready before accessing the contents of message then you have a nasty and obscure race.
Employ compiler barriers to ensure your intent is honored. An example that also exploits the relatively strong ordering of x86 are the release/consume wrappers found in Dmitriy Vyukov's Single-Producer Single-Consumer queue posted here:
// load with 'consume' (data-dependent) memory ordering
// NOTE: x86 specific, other platforms may need additional memory barriers
template<typename T>
T load_consume(T const* addr)
{
T v = *const_cast<T const volatile*>(addr);
__asm__ __volatile__ ("" ::: "memory"); // compiler barrier
return v;
}
// store with 'release' memory ordering
// NOTE: x86 specific, other platforms may need additional memory barriers
template<typename T>
void store_release(T* addr, T v)
{
__asm__ __volatile__ ("" ::: "memory"); // compiler barrier
*const_cast<T volatile*>(addr) = v;
}
I suggest that if you are going to venture into the realm of concurrent memory access, use a library that will take care of these details for you. While we all wait for n2145 and std::atomic check out Thread Building Blocks' tbb::atomic or the upcoming boost::atomic.
Besides correctness, these libraries can simplify your code and clarify your intent:
// thread 1
std::atomic<int> foo; // or tbb::atomic, boost::atomic, etc
foo.store(1, std::memory_order_release);
// thread 2
int tmp = foo.load(std::memory_order_acquire);
Using explicit memory ordering, foo's inter-thread relationship is clear.

May be this thread is ancient, but the C++ 11 standard DOES have a thread library and also a vast atomic library for atomic operations. The purpose is specifically for concurrency support and avoid data races.
The relevant header is atomic

It's generally a really, really bad idea to depend on this, as you could end up with bad things happening and only one some architectures. The best solution would be to use a guaranteed atomic API, for example the Windows Interlocked api.

If your C++ implementation supplies the library of atomic operations specified by n2145 or some variant thereof, you can presumably rely on it. Otherwise, you cannot in general rely on "anything" about atomicity at the language level, since multitasking of any kind (and therefore atomicity, which deals with multitasking) is not specified by the existing C++ standard.

Volatile in C++ do not plays the same role than in Java. All the cases are undefined behavior as Steve saids. Some cases can be Ok for a compiler, oa given processor architecture and with a multi-threading system, but switching the optimization flags can make your program behave differently, as the C++03 compilers don't know about threads.
C++0x defines the rules that avoid race conditions and the operations that help you to master that, but to may knowledge there is no yet a compiler that implement yet all the part of the standard related to this subject.

My answer is going to be frustrating: No, No, No, No, and No.
1-4) The compiler is allowed to do ANYTHING it pleases with a variable it writes to. It may store temporary values in it, so long as ends up doing something that would do the same thing as that thread executing in a vacuum. ANYTHING is valid
5) Nope, no guarantee. If a variable is not atomic, and you write to it on one thread, and read or write to it on another, it is a race case. The spec declares such race cases to be undefined behavior, and absolutely anything goes. That being said, you will be hard pressed to find a compiler that does not give you 7 or 8, but it IS legal for a compiler to give you something else.
I always refer to this highly comical explanation of race cases.
http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong

Are C++ Reads and Writes of an int Atomic?

I have two threads, one updating an int and one reading it. This is a statistic value where the order of the reads and writes is irrelevant.
My question is, do I need to synchronize access to this multi-byte value anyway? Or, put another way, can part of the write be complete and get interrupted, and then the read happen.
For example, think of a value = 0x0000FFFF that gets incremented value of 0x00010000.
Is there a time where the value looks like 0x0001FFFF that I should be worried about? Certainly the larger the type, the more possible something like this to happen.
I've always synchronized these types of accesses, but was curious what the community thinks.

Boy, what a question. The answer to which is:
Yes, no, hmmm, well, it depends
It all comes down to the architecture of the system. On an IA32 a correctly aligned address will be an atomic operation. Unaligned writes might be atomic, it depends on the caching system in use. If the memory lies within a single L1 cache line then it is atomic, otherwise it's not. The width of the bus between the CPU and RAM can affect the atomic nature: a correctly aligned 16bit write on an 8086 was atomic whereas the same write on an 8088 wasn't because the 8088 only had an 8 bit bus whereas the 8086 had a 16 bit bus.
Also, if you're using C/C++ don't forget to mark the shared value as volatile, otherwise the optimiser will think the variable is never updated in one of your threads.

At first one might think that reads and writes of the native machine size are atomic but there are a number of issues to deal with including cache coherency between processors/cores. Use atomic operations like Interlocked* on Windows and the equivalent on Linux. C++0x will have an "atomic" template to wrap these in a nice and cross-platform interface. For now if you are using a platform abstraction layer it may provide these functions. ACE does, see the class template ACE_Atomic_Op.

IF you're reading/writing 4-byte value AND it is DWORD-aligned in memory AND you're running on the I32 architecture, THEN reads and writes are atomic.

Yes, you need to synchronize accesses. In C++0x it will be a data race, and undefined behaviour. With POSIX threads it's already undefined behaviour.
In practice, you might get bad values if the data type is larger than the native word size. Also, another thread might never see the value written due to optimizations moving the read and/or write.

You must synchronize, but on certain architectures there are efficient ways to do it.
Best is to use subroutines (perhaps masked behind macros) so that you can conditionally replace implementations with platform-specific ones.
The Linux kernel already has some of this code.

On Windows, Interlocked***Exchange***Add is guaranteed to be atomic.

To echo what everyone said upstairs, the language pre-C++0x cannot guarantee anything about shared memory access from multiple threads. Any guarantees would be up to the compiler.

No, they aren't (or at least you can't assume they are). Having said that, there are some tricks to do this atomically, but they typically aren't portable (see Compare-and-swap).

I agree with many and especially Jason. On windows, one would likely use InterlockedAdd and its friends.

Asside from the cache issue mentioned above...
If you port the code to a processor with a smaller register size it will not be atomic anymore.
IMO, threading issues are too thorny to risk it.

Lets take this example
int x;
x++;
x=x+5;
The first statement is assumed to be atomic because it translates to a single INC assembly directive that takes a single CPU cycle. However, the second assignment requires several operations so it's clearly not an atomic operation.
Another e.g,
x=5;
Again, you have to disassemble the code to see what exactly happens here.

tc,
I think the moment you use a constant ( like 6) , the instruction wouldn't be completed in one machine cycle.
Try to see the instruction set of x+=6 as compared to x++

Some people think that ++c is atomic, but have a eye on the assembly generated. For example with 'gcc -S' :
movl cpt.1586(%rip), %eax
addl $1, %eax
movl %eax, cpt.1586(%rip)
To increment an int, the compiler first load it into a register, and stores it back into the memory. This is not atomic.

Definitively NO !
That answer from our highest C++ authority, M. Boost:
Operations on "ordinary" variables are not guaranteed to be atomic.

The only portable way is to use the sig_atomic_t type defined in signal.h header for your compiler. In most C and C++ implementations, that is an int. Then declare your variable as "volatile sig_atomic_t."

Reads and writes are atomic, but you also need to worry about the compiler re-ordering your code. Compiler optimizations may violate happens-before relationship of statements in your code. By using atomic you don't have to worry about that.
...
atomic i;
soap_status = GOT_RESPONSE ;
i = 1
In the above example, the variable 'i' will only be set to 1 after we get a soap response.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js