Are C++ Reads and Writes of an int Atomic? - c++

I have two threads, one updating an int and one reading it. This is a statistic value where the order of the reads and writes is irrelevant.
My question is, do I need to synchronize access to this multi-byte value anyway? Or, put another way, can part of the write be complete and get interrupted, and then the read happen.
For example, think of a value = 0x0000FFFF that gets incremented value of 0x00010000.
Is there a time where the value looks like 0x0001FFFF that I should be worried about? Certainly the larger the type, the more possible something like this to happen.
I've always synchronized these types of accesses, but was curious what the community thinks.

Boy, what a question. The answer to which is:
Yes, no, hmmm, well, it depends
It all comes down to the architecture of the system. On an IA32 a correctly aligned address will be an atomic operation. Unaligned writes might be atomic, it depends on the caching system in use. If the memory lies within a single L1 cache line then it is atomic, otherwise it's not. The width of the bus between the CPU and RAM can affect the atomic nature: a correctly aligned 16bit write on an 8086 was atomic whereas the same write on an 8088 wasn't because the 8088 only had an 8 bit bus whereas the 8086 had a 16 bit bus.
Also, if you're using C/C++ don't forget to mark the shared value as volatile, otherwise the optimiser will think the variable is never updated in one of your threads.

At first one might think that reads and writes of the native machine size are atomic but there are a number of issues to deal with including cache coherency between processors/cores. Use atomic operations like Interlocked* on Windows and the equivalent on Linux. C++0x will have an "atomic" template to wrap these in a nice and cross-platform interface. For now if you are using a platform abstraction layer it may provide these functions. ACE does, see the class template ACE_Atomic_Op.

IF you're reading/writing 4-byte value AND it is DWORD-aligned in memory AND you're running on the I32 architecture, THEN reads and writes are atomic.

Yes, you need to synchronize accesses. In C++0x it will be a data race, and undefined behaviour. With POSIX threads it's already undefined behaviour.
In practice, you might get bad values if the data type is larger than the native word size. Also, another thread might never see the value written due to optimizations moving the read and/or write.

You must synchronize, but on certain architectures there are efficient ways to do it.
Best is to use subroutines (perhaps masked behind macros) so that you can conditionally replace implementations with platform-specific ones.
The Linux kernel already has some of this code.

On Windows, Interlocked***Exchange***Add is guaranteed to be atomic.

To echo what everyone said upstairs, the language pre-C++0x cannot guarantee anything about shared memory access from multiple threads. Any guarantees would be up to the compiler.

No, they aren't (or at least you can't assume they are). Having said that, there are some tricks to do this atomically, but they typically aren't portable (see Compare-and-swap).

I agree with many and especially Jason. On windows, one would likely use InterlockedAdd and its friends.

Asside from the cache issue mentioned above...
If you port the code to a processor with a smaller register size it will not be atomic anymore.
IMO, threading issues are too thorny to risk it.

Lets take this example
int x;
x++;
x=x+5;
The first statement is assumed to be atomic because it translates to a single INC assembly directive that takes a single CPU cycle. However, the second assignment requires several operations so it's clearly not an atomic operation.
Another e.g,
x=5;
Again, you have to disassemble the code to see what exactly happens here.

tc,
I think the moment you use a constant ( like 6) , the instruction wouldn't be completed in one machine cycle.
Try to see the instruction set of x+=6 as compared to x++

Some people think that ++c is atomic, but have a eye on the assembly generated. For example with 'gcc -S' :
movl cpt.1586(%rip), %eax
addl $1, %eax
movl %eax, cpt.1586(%rip)
To increment an int, the compiler first load it into a register, and stores it back into the memory. This is not atomic.

Definitively NO !
That answer from our highest C++ authority, M. Boost:
Operations on "ordinary" variables are not guaranteed to be atomic.

The only portable way is to use the sig_atomic_t type defined in signal.h header for your compiler. In most C and C++ implementations, that is an int. Then declare your variable as "volatile sig_atomic_t."

Reads and writes are atomic, but you also need to worry about the compiler re-ordering your code. Compiler optimizations may violate happens-before relationship of statements in your code. By using atomic you don't have to worry about that.
...
atomic i;
soap_status = GOT_RESPONSE ;
i = 1
In the above example, the variable 'i' will only be set to 1 after we get a soap response.

Related

Can I use char variable without lock in the multi-threading case

As c/c++ standard said, the size of char must be 1. As my understanding, that means CPU guarantees that any read or write on a char must be done in one instruction.
Let's say we have many threads, which share a char variable:
char target = 1;
// thread a
target = 0;
// thread b
target = 1;
// thread 1
while (target == 1) {
// do something
}
// thread 2
while (target == 1) {
// do something
}
In a word, there are two kinds of threads: some of them are to set target into 0 or 1, and the others are to do some tasks if target == 1. The goal is that we can control the task-threds through modifying the value of target.
As my understanding, it doesn't seem that we need to use mutex/lock at all. But my coding experience gave me a strong feeling that we must use mutex/lock in this case.
I'm confused now. Should I use mutex/lock or not in this case?
You see, I can understand why we need mutex/lock in other cases, such as i++. Because i++ can't be done in only one instruction. So can target = 0 be done in one instruction, right? If so, does it mean that we don't need mutex/lock in this case?
Well, I know that we could use std::atomic, so my question is: is it OK to not use neither mutex/lcok nor std::atomic.
std::atomic guarantees that accessing a variable is atomic. From cppreference:
Each instantiation and full specialization of the std::atomic template
defines an atomic type. If one thread writes to an atomic object while
another thread reads from it, the behavior is well-defined (see memory
model for details on data races).
When a char actually is atomic (being size 1, is not sufficient), then std::atomic<char> needs no extra synchronization. However, on a platform where char is not atomic, std::atomic<char> guarantees that it can be read and written atomically by using a mutex or similar.
In practice, I'd expect char to be atomic, but the standard does not guarantee that.
Also consider that operations like eg += read and write the value, hence atomic reads and writes alone are not sufficient to safely call +=, while std::atomic<T> has a proper operator+=.
TL;DR
I'm confused now. Should I use mutex/lock or not in this case?
Let someone else take that decision for you. When you want something atomic, use a std::atomic<something> unless you want fine grained control over the synchronisation.
TL;DR
is it OK to not use neither mutex/lcok nor std::atomic
No
In general, it's not ok to assume things. If you need guarantees for something, then make sure you have them.
This is closely related to a common logical fallacy. Just because you cannot imagine why something could be true, that does not mean that it's true.
Longer version
As c/c++ standard said
There's no such thing as "C/C++" and definitely not "the C/C++ standard". They are two completely different languages with different standards. However, they do agree on this point. sizeof (char) is 1 in both languages.
(Sidenote: sizeof 'a' will yield different results.)
As my understanding, that means CPU guarantees that any read or write on a char must be done in one instruction.
That's not correct. The CPU has it's own specification, completely separate from the language standards. And there's nothing that says that this has to be true, even if it probably are in most or all cases.
Because i++ can't be done in only one instruction.
That is CPU dependent. The x86 architecture has an instruction for this. https://c9x.me/x86/html/file_module_x86_id_140.html
As my understanding, it doesn't seem that we need to use mutex/lock at all. But my coding experience gave me a strong feeling that we must use mutex/lock in this case.
Even if the target CPU does read and write in one instruction, which it probably does, there's nothing that says that the C or C++ code needs to be compiled to just that instruction.
The standards for both C and C++ describes the behavior of the code. Not how it is converted to assembly.
So no, you cannot make the assumptions you're doing.
In general, it cannot be assumed that reading or writing a char is an atomic operation. However, the target architecture may provide that guarantee. For embedded C programs it is common practice to rely on such underlying guarantees to avoid the overhead of synchronization mechanisms in certain situations.
In the example in the question it must be noted that even if reading/writing target is an atomic operation, the value could be changed at any time, so there is no guarantee that it will be 1 inside the while loops.

do integer reads need to be critical section protected?

I have come across C++03 some code that takes this form:
struct Foo {
int a;
int b;
CRITICAL_SECTION cs;
}
// DoFoo::Foo foo_;
void DoFoo::Foolish()
{
if( foo_.a == 4 )
{
PerformSomeTask();
EnterCriticalSection(&foo_.cs);
foo_.b = 7;
LeaveCriticalSection(&foo_.cs);
}
}
Does the read from foo_.a need to be protected? e.g.:
void DoFoo::Foolish()
{
EnterCriticalSection(&foo_.cs);
int a = foo_.a;
LeaveCriticalSection(&foo_.cs);
if( a == 4 )
{
PerformSomeTask();
EnterCriticalSection(&foo_.cs);
foo_.b = 7;
LeaveCriticalSection(&foo_.cs);
}
}
If so, why?
Please assume the integers are 32-bit aligned. The platform is ARM.
Technically yes, but no on many platforms. First, let us assume that int is 32 bits (which is pretty common, but not nearly universal).
It is possible that the two words (16 bit parts) of a 32 bit int will be read or written to separately. On some systems, they will be read separately if the int isn't aligned properly.
Imagine a system where you can only do 32-bit aligned 32 bit reads and writes (and 16-bit aligned 16 bit reads and writes), and an int that straddles such a boundary. Initially the int is zero (ie, 0x00000000)
One thread writes 0xBAADF00D to the int, the other reads it "at the same time".
The writing thread first writes 0xBAAD to the high word of the int. The reader thread then reads the entire int (both high and low) getting 0xBAAD0000 -- which is a state that the int was never put into on purpose!
The writer thread then writes the low word 0xF00D.
As noted, on some platforms all 32 bit reads/writes are atomic, so this isn't a concern. There are other concerns, however.
Most lock/unlock code includes instructions to the compiler to prevent reordering across the lock. Without that prevention of reordering, the compiler is free to reorder things so long as it behaves "as-if" in a single threaded context it would have worked that way. So if you read a then b in code, the compiler could read b before it reads a, so long as it doesn't see an in-thread opportunity for b to be modified in that interval.
So possibly the code you are reading is using these locks to make sure that the read of the variable happens in the order written in the code.
Other issues are raised in the comments below, but I don't feel competent to address them: cache issues, and visibility.
Looking at this it seems that arm has quite relaxed memory model so you need a form of memory barrier to ensure that writes in one thread are visible when you'd expect them in another thread. So what you are doing, or else using std::atomic seems likely necessary on your platform. Unless you take this into account you can see updates out of order in different threads which would break your example.
I think you can use C++11 to ensure that integer reads are atomic, using (for example) std::atomic<int>.
The C++ standard says that there's a data race if one thread writes to a variable at the same time as another thread reads from that variable, or if two threads write to the same variable at the same time. It further says that a data race produces undefined behavior. So, formally, you must synchronize those reads and writes.
There are three separate issues when one thread reads data that was written by another thread. First, there is tearing: if writing requires more than a single bus cycle, it's possible for a thread switch to occur in the middle of the operation, and another thread could see a half-written value; there's an analogous problem if a read requires more than a single bus cycle. Second, there's visibility: each processor has its own local copy of the data that it's been working on recently, and writing to one processor's cache does not necessarily update another processor's cache. Third, there's compiler optimizations that reorder reads and writes in ways that would be okay within a single thread, but will break multi-threaded code. Thread-safe code has to deal with all three problems. That's the job of synchronization primitives: mutexes, condition variables, and atomics.
Although the integer read/write operation indeed will most likely be atomic, the compiler optimizations and processor cache will still give you problems if you don't do it properly.
To explain - the compiler will normally assume that the code is single-threaded and make many optimizations that rely on that. For example, it might change the order of instructions. Or, if it sees that the variable is only written and never read, it might optimize it away entirely.
The CPU will also cache that integer, so if one thread writes it, the other one might not get to see it until a lot later.
There are two things you can do. One is to wrap in in critical section like in your original code. The other is to mark the variable as volatile. That will signal the compiler that this variable will be accessed by multiple threads and will disable a range of optimizations, as well as placing special cache-sync instructions (aka "memory barriers") around accesses to the variable (or so I understand). Apparently this is wrong.
Added: Also, as noted by another answer, Windows has Interlocked APIs that can be used to avoid these issues for non-volatile variables.

Thread safety and bit-field

I know that bit-field are compiler dependant, but I haven't find documentation about thread safety on bit-field with the latest g++ and Visual C++ 2010.
Does the operations on a bit-field member are atomic ?
"Thread safe" is unfortunately a very overloaded term in programming.
If you mean atomic access to bit-fields, the answer is no (at least on all processors I'm aware of). You have atomic access to 32bit memory locations on 32bit machines, but that only means you'll read or write a whole 32 bit value. This does not mean another thread won't do the same thing. If you're looking to stop that you likely want synchronization.
If you mean synchronized access to bit-fields, the answer is also no, unless you wrap your access in a higher level synchronization primitive (which are often built on atomic operations).
In short, the compiler does not provide atomic or synchronized access to bit fields without extra work on your part.
Does that help?
Edit: Dr. Dan Grossman has two nice lectures on atomicity and synchronization I found on UOregon's CS department page.
When writing to a bit-field, there may be a time window in which any attempt by another thread to access (read or write) any (same or different) bit-field in the same structure will result in Undefined Behavior, meaning anything can happen. When reading a bit field, there may be a time window when any attempt by another thread to write any bit-field in the same structure will result in Undefined Behavior.
If you cannot practically use separate variables for the bit-fields in question, you may be able to store multiple bit-fields in an integer and update them atomically by creating a union between the bit-field structure and a 32-bit integer, and then using a CompareExchange sequence:
Read the value of the bitfield as an Int32.
Convert it to a bitfield structure
Update the structure
Convert the structure back to an Int32.
Use CompareExchange to overwrite the variable with the new value only if it still holds the value read in (1); if the value has changed, start over with step (1).
For this approach to work well, steps 2-4 must be fast. The longer they take, the greater the likelihood that the CompareExchange in step 5 will fail, and thus the more times steps 2-4 will have to be re-executed before the CompareExchange succeeds.
If you want to update bitfields in thread-safe way you need to split your bit-fields into separate flags and use regular ints to store them. Accessing separate machine-words is thread-safe (although you need to consider optimizations and cache coherency on multiprocessor systems).
See the Windows Interlocked Functions
Also see this related SO question
just simple use atomic.AddInt32
example:
atomic.AddInt32(&intval, 1 << 0) //set the first bit
atomic.AddInt32(&intval, 1 << 1) //set the second bit
atomic.AddInt32(&intval, -(1 << 1 + 1 << 0)) //clear the first and second bit
the code is in Go, i think c++ also has some thing like atomic.AddInt32

"pseudo-atomic" operations in C++

So I'm aware that nothing is atomic in C++. But I'm trying to figure out if there are any "pseudo-atomic" assumptions I can make. The reason is that I want to avoid using mutexes in some simple situations where I only need very weak guarantees.
1) Suppose I have globally defined volatile bool b, which
initially I set true. Then I launch a thread which executes a loop
while(b) doSomething();
Meanwhile, in another thread, I execute b=true.
Can I assume that the first thread will continue to execute? In other words, if b starts out as true, and the first thread checks the value of b at the same time as the second thread assigns b=true, can I assume that the first thread will read the value of b as true? Or is it possible that at some intermediate point of the assignment b=true, the value of b might be read as false?
2) Now suppose that b is initially false. Then the first thread executes
bool b1=b;
bool b2=b;
if(b1 && !b2) bad();
while the second thread executes b=true. Can I assume that bad() never gets called?
3) What about an int or other builtin types: suppose I have volatile int i, which is initially (say) 7, and then I assign i=7. Can I assume that, at any time during this operation, from any thread, the value of i will be equal to 7?
4) I have volatile int i=7, and then I execute i++ from some thread, and all other threads only read the value of i. Can I assume that i never has any value, in any thread, except for either 7 or 8?
5) I have volatile int i, from one thread I execute i=7, and from another I execute i=8. Afterwards, is i guaranteed to be either 7 or 8 (or whatever two values I have chosen to assign)?
There are no threads in standard C++, and Threads cannot be implemented as a library.
Therefore, the standard has nothing to say about the behaviour of programs which use threads. You must look to whatever additional guarantees are provided by your threading implementation.
That said, in threading implementations I've used:
(1) yes, you can assume that irrelevant values aren't written to variables. Otherwise the whole memory model goes out the window. But be careful that when you say "another thread" never sets b to false, that means anywhere, ever. If it does, that write could perhaps be re-ordered to occur during your loop.
(2) no, the compiler can re-order the assignments to b1 and b2, so it is possible for b1 to end up true and b2 false. In such a simple case I don't know why it would re-order, but in more complex cases there might be very good reasons.
[Edit: oops, by the time I got to answering (2) I'd forgotten that b was volatile. Reads from a volatile variable won't be re-ordered, sorry, so yes on a typical threading implementation (if there is any such thing), you can assume that you won't end up with b1 true and b2 false.]
(3) same as 1. volatile in general has nothing to do with threading at all. However, it is quite exciting in some implementations (Windows), and might in effect imply memory barriers.
(4) on an architecture where int writes are atomic yes, although volatile has nothing to do with it. See also...
(5) check the docs carefully. Likely yes, and again volatile is irrelevant, because on almost all architectures int writes are atomic. But if int write is not atomic, then no (and no for the previous question), even if it's volatile you could in principle get a different value. Given those values 7 and 8, though, we're talking a pretty weird architecture for the byte containing the relevant bits to be written in two stages, but with different values you could more plausibly get a partial write.
For a more plausible example, suppose that for some bizarre reason you have a 16 bit int on a platform where only 8bit writes are atomic. Odd, but legal, and since int must be at least 16 bits you can see how it could come about. Suppose further that your initial value is 255. Then increment could legally be implemented as:
read the old value
increment in a register
write the most significant byte of the result
write the least significant byte of the result.
A read-only thread which interrupted the incrementing thread between the third and fourth steps of that, could see the value 511. If the writes are in the other order, it could see 0.
An inconsistent value could be left behind permanently if one thread is writing 255, another thread is concurrently writing 256, and the writes get interleaved. Impossible on many architectures, but to know that this won't happen you need to know at least something about the architecture. Nothing in the C++ standard forbids it, because the C++ standard talks about execution being interrupted by a signal, but otherwise has no concept of execution being interrupted by another part of the program, and no concept of concurrent execution. That's why threads aren't just another library - adding threads fundamentally changes the C++ execution model. It requires the implementation to do things differently, as you'll eventually discover if for example you use threads under gcc and forget to specify -pthreads.
The same could happen on a platform where aligned int writes are atomic, but unaligned int writes are permitted and not atomic. For example IIRC on x86, unaligned int writes are not guaranteed atomic if they cross a cache line boundary. x86 compilers will not mis-align a declared int variable, for this reason and others. But if you play games with structure packing you could probably provoke an example.
So: pretty much any implementation will give you the guarantees you need, but might do so in quite a complicated way.
In general, I've found that it is not worth trying to rely on platform-specific guarantees about memory access, that I don't fully understand, in order to avoid mutexes. Use a mutex, and if that's too slow use a high-quality lock-free structure (or implement a design for one) written by someone who really knows the architecture and compiler. It will probably be correct, and subject to correctness will probably outperform anything I invent myself.
Most of the answers correctly address the CPU memory ordering issues you're going to experience, but none have detailed how the compiler can thwart your intentions by re-ordering your code in ways that break your assumptions.
Consider an example taken from this post:
volatile int ready;
int message[100];
void foo(int i)
{
message[i/10] = 42;
ready = 1;
}
At -O2 and above, recent versions of GCC and Intel C/C++ (don't know about VC++) will do the store to ready first, so it can be overlapped with computation of i/10 (volatile does not save you!):
leaq _message(%rip), %rax
movl $1, _ready(%rip) ; <-- whoa Nelly!
movq %rsp, %rbp
sarl $2, %edx
subl %edi, %edx
movslq %edx,%rdx
movl $42, (%rax,%rdx,4)
This isn't a bug, it's the optimizer exploiting CPU pipelining. If another thread is waiting on ready before accessing the contents of message then you have a nasty and obscure race.
Employ compiler barriers to ensure your intent is honored. An example that also exploits the relatively strong ordering of x86 are the release/consume wrappers found in Dmitriy Vyukov's Single-Producer Single-Consumer queue posted here:
// load with 'consume' (data-dependent) memory ordering
// NOTE: x86 specific, other platforms may need additional memory barriers
template<typename T>
T load_consume(T const* addr)
{
T v = *const_cast<T const volatile*>(addr);
__asm__ __volatile__ ("" ::: "memory"); // compiler barrier
return v;
}
// store with 'release' memory ordering
// NOTE: x86 specific, other platforms may need additional memory barriers
template<typename T>
void store_release(T* addr, T v)
{
__asm__ __volatile__ ("" ::: "memory"); // compiler barrier
*const_cast<T volatile*>(addr) = v;
}
I suggest that if you are going to venture into the realm of concurrent memory access, use a library that will take care of these details for you. While we all wait for n2145 and std::atomic check out Thread Building Blocks' tbb::atomic or the upcoming boost::atomic.
Besides correctness, these libraries can simplify your code and clarify your intent:
// thread 1
std::atomic<int> foo; // or tbb::atomic, boost::atomic, etc
foo.store(1, std::memory_order_release);
// thread 2
int tmp = foo.load(std::memory_order_acquire);
Using explicit memory ordering, foo's inter-thread relationship is clear.
May be this thread is ancient, but the C++ 11 standard DOES have a thread library and also a vast atomic library for atomic operations. The purpose is specifically for concurrency support and avoid data races.
The relevant header is atomic
It's generally a really, really bad idea to depend on this, as you could end up with bad things happening and only one some architectures. The best solution would be to use a guaranteed atomic API, for example the Windows Interlocked api.
If your C++ implementation supplies the library of atomic operations specified by n2145 or some variant thereof, you can presumably rely on it. Otherwise, you cannot in general rely on "anything" about atomicity at the language level, since multitasking of any kind (and therefore atomicity, which deals with multitasking) is not specified by the existing C++ standard.
Volatile in C++ do not plays the same role than in Java. All the cases are undefined behavior as Steve saids. Some cases can be Ok for a compiler, oa given processor architecture and with a multi-threading system, but switching the optimization flags can make your program behave differently, as the C++03 compilers don't know about threads.
C++0x defines the rules that avoid race conditions and the operations that help you to master that, but to may knowledge there is no yet a compiler that implement yet all the part of the standard related to this subject.
My answer is going to be frustrating: No, No, No, No, and No.
1-4) The compiler is allowed to do ANYTHING it pleases with a variable it writes to. It may store temporary values in it, so long as ends up doing something that would do the same thing as that thread executing in a vacuum. ANYTHING is valid
5) Nope, no guarantee. If a variable is not atomic, and you write to it on one thread, and read or write to it on another, it is a race case. The spec declares such race cases to be undefined behavior, and absolutely anything goes. That being said, you will be hard pressed to find a compiler that does not give you 7 or 8, but it IS legal for a compiler to give you something else.
I always refer to this highly comical explanation of race cases.
http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong

Are C++ int operations atomic on the mips architecture

I wonder if I could read or write shared int value without locking on mips cpu (especially Amazon or Danube). What I mean is if such a read or write are atomic (other thread can't interrupt them). To be clear - I don't want to prevent the race between threads, but I care if int value itself is not corrupted.
Assuming that the compiler aligns all ints at the boundaries of cpu word, it should be possible. I use gcc (g++). Tests also shows that it seems work correctly. But maybe someone knows it for sure?
Use gcc's builtin atomic operations and you'll get warnings if they're not supported:
http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
It looks like combinations of addition/subtraction and testing (at least) are possible on the hardware:
http://rswiki.csie.org/lxr/http/source/include/asm-mips/atomic.h
Which operations? It's plausible that int a; a=42; is atomic. There's no guarantee that a= a+42; is atomic, or in any variants like with ++. Also, you have to be concerned about what the optimizer might do, say by holding an intermediate value in a register when convenient.
Depends on the operation. Having stared at enough disassembled programs in MIPS, I know that only some operations are atomic.
assignment of a new value could be 1 op or more, you'd have to look at the assembly.
eg:
x = 0; // move $a0, $0
x = 0x50000; // lui $a0, 0x0005
x = 0x50234; // lui $a0, 0x0005
// ori $a0, 0x0234
MIPS assembley reference or here
see here to see danube and amazon are MIPS32, which my example covers, and therefore not all 32bit integer immediate can be written atomically.
see R10000 in above posting is MIPS64. Since a 32bit value would be half the register size, it could be an atomic load/write.
The question invites misleading answers.
You can only authoritatively answer "is it atomic" questions about assembly/machine language.
Any given C/C++ code fragment makes no guarantees, can vary depending on exactly which compiler (and version) you use, etc. (Unless you call some platform-specific intrinsic or whatnot that is guaranteed to compile to a known atomic machine instruction.)