CRITICAL_SECTION for set and get single bool value - c++

now writing complicated class and felt that I use to much CRITICAL_SECTION.
As far as I know there are atomic operations for some types, that are always executed without any hardware or software interrupt.
I want to check if I understand everything correctly.
To set or get atomic value we don't need CRITICAL_SECTION because doing that there won't be interrupts.
bool is atomic.
So there are my statements, want to ask, if they are correct, also if they are correct, what types variables may be also set or get without CRITICAL_SECTION?
P. S. I'm talking about getting or setting one single value per method, not two, not five, but one.

You don't need locks round atomic data, but internally they might lock. Note for example, C++11's std::atomic has a is_lock_free function.
bool may not be atomic. See here and here

Note: This answer applies to Windows and says nothing of other platforms.
There are no InterlockedRead or InterlockedWrite functions; simple reads and writes with the correct integer size (and alignment) are atomic on Windows ("Simple reads and writes to properly-aligned 32-bit variables are atomic operations.").
(and there are no cache problems since a properly-aligned variable is always on a single cache line).
However, reading and modifying such variables (or any other variable) are not atomic:
Read a bool? Fine. Test-And-Set a bool? Better use
InterlockedCompareExchange.
Overwrite an integer? great! Add to
it? Critical section.

Here this can be found:
Simple reads and writes to properly aligned 64-bit variables are
atomic on 64-bit Windows. Reads and writes to 64-bit values are not
guaranteed to be atomic on 32-bit Windows. Reads and writes to
variables of other sizes are not guaranteed to be atomic on any
platform.
Result should be correct but in programming it is better not to trust should. There still remains small possibility of failure because of CPU cache.

You cannot guarantee for all implementations/platforms/compilers that bool, or any other type, or most operations, are atomic. So, no, I don't believe your statements are correct. You can retool your logic or use other means of establishing atomicity, but you probably can't get away with just removing CRITICAL_SECTION usage if you rely on it.

Related

In C11/C++11, possible to mix atomic/non-atomic ops on the same memory?

Is it possible to perform atomic and non-atomic ops on the same memory location?
I ask not because I actually want to do this, but because I'm trying to understand the C11/C++11 memory model. They define a "data race" like so:
The execution of a program contains a data race if it contains two
conflicting actions in different threads, at least one of which is not
atomic, and neither happens before the other. Any such data race
results in undefined behavior.
-- C11 §5.1.2.4 p25, C++11 § 1.10 p21
Its the "at least one of which is not atomic" part that is troubling me. If it weren't possible to mix atomic and non-atomic ops, it would just say "on an object which is not atomic."
I can't see any straightforward way of performing non-atomic operations on atomic variables. std::atomic<T> in C++ doesn't define any operations with non-atomic semantics. In C, all direct reads/writes of an atomic variable appear to be translated into atomic operations.
I suppose memcpy() and other direct memory operations might be a way of performing a non-atomic read/write on an atomic variable? ie. memcpy(&atomicvar, othermem, sizeof(atomicvar))? But is this even defined behavior? In C++, std::atomic is not copyable, so would it be defined behavior to memcpy() it in C or C++?
Initialization of an atomic variable (whether through a constructor or atomic_init()) is defined to not be atomic. But this is a one-time operation: you're not allowed to initialize an atomic variable a second time. Placement new or an explicit destructor call could would also not be atomic. But in all of these cases, it doesn't seem like it would be defined behavior anyway to have a concurrent atomic operation that might be operating on an uninitialized value.
Performing atomic operations on non-atomic variables seems totally impossible: neither C nor C++ define any atomic functions that can operate on non-atomic variables.
So what is the story here? Is it really about memcpy(), or initialization/destruction, or something else?
I think you're overlooking another case, the reverse order. Consider an initialized int whose storage is reused to create an std::atomic_int. All atomic operations happen after its ctor finishes, and therefore on initialized memory. But any concurrent, non-atomic access to the now-overwritten int has to be barred as well.
(I'm assuming here that the storage lifetime is sufficient and plays no role)
I'm not entirely sure because I think that the second access to int would be invalid anyway as the type of the accessing expression int doesn't match the object's type at the time (std::atomic<int>). However, "the object's type at the time" assumes a single linear time progression which doesn't hold in a multi-threaded environment. C++11 in general has that solved by making such assumptions about "the global state" Undefined Behavior per se, and the rule from the question appears to fit in that framework.
So perhaps rephrasing: if a single memory location contains an atomic object as well as a non-atomic object, and if the destruction of the earliest created (older) object is not sequenced-before the creation of the other (newer) object, then access to the older object conflicts with access to the newer object unless the former is scheduled-before the latter.
disclaimer: I am not a parallelism guru.
Is it possible to mix atomic/non-atomic ops on the same memory, and if
so, how?
you can write it in the code and compile, but it will probably yield undefined behaviour.
when talking about atomics, it is important to understand what kind o problems do they solve.
As you might know, what we call in shortly "memory" is multi-layered set of entities which are capable to hold memory.
first we have the RAM, then the cache lines , then the registers.
on mono-core processors, we don't have any synchronization problem. on multi-core processors we have all of them. every core has it own set of registers and cache lines.
this casues few problems.
First one of them is memory reordering - the CPU may decide on runtime to scrumble some reading/writing instructions to make the code run faster. this may yield some strange results that are completly invisible on the high-level code that brought this set of instruction. the most classic example of this phenomanon is the "two threads - two integer" example:
int i=0;
int j=0;
thread a -> i=1, then print j
thread b -> j=1 then print i;
logically, the result "00" cannot be. either a ends first, the result may be "01", either b ends first, the result may be "10". if both of them ends in the same time, the result may be "11". yet, if you build small program which imitates this situtation and run it in a loop, very quicly you will see the result "00"
another problem is memory invisibility. like I mentioned before, the variable's value may be cached in one of the cache lines, or be stored in one of the registered. when the CPU updates a variables value - it may delay the writing of the new value back to the RAM. it may keep the value in the cache/regiter because it was told (by the compiler optimizations) that that value will be updated again soon, so in order to make the program faster - update the value again and only then write it back to the RAM. it may cause undefined behaviour if other CPU (and consequently a thread or a process) depends on the new value.
for example, look at this psuedo code:
bool b = true;
while (b) -> print 'a'
new thread -> sleep 4 seconds -> b=false;
the character 'a' may be printed infinitly, because b may be cached and never be updated.
there are many more problems when dealing with paralelism.
atomics solves these kind of issues by (in a nutshell) telling the compiler/CPU how to read and write data to/from the RAM correctly without doing un-wanted scrumbling (read about memory orders). a memory order may force the cpu to write it's values back to the RAM, or read the valuse from the RAM even if they are cached.
So, although you can mix non atomics actions with atomic ones, you only doing part of the job.
for example let's go back to the second example:
atomic bool b = true;
while (reload b) print 'a'
new thread - > b = (non atomicly) false.
so although one thread re-read the value of b from the RAM again and again but the other thread may not write false back to the RAM.
So although you can mix these kind of operations in the code, it will yield underfined behavior.
I'm interested in this topic since I have code in which sometimes I need to access a range of addresses serially, and at other times to access the same addresses in parallel with some way of managing contention.
So not exactly the situation posed by the original question which (I think) implies concurrent, or nearly so, atomic and non atomic operationsin parallel code, but close.
I have managed by some devious casting to persuade my C11 compiler to allow me to access an integer and much more usefully a pointer both atomically and non-atomically ("directly"), having established that both types are officially lock-free on my x86_64 system. That is that the sizes of the atomic and non atomic types are the same.
I definitely would not attempt to mix both types of access to an address in a parallel context, that would be doomed to fail. However I have been successful in using "direct" syntax operations in serial code and "atomic" syntax in parallel code, giving me the best of both worlds of the fastest possible access (and much simpler syntax) in serial, and safely managed contention when in parallel.
So you can do it so long as you don't try to mix both methods in parallel code and you stick to using lock-free types, which probably means up to the size of a pointer.
I'm interested in this topic since I have code in which sometimes I need to access a range of addresses serially, and at other times to access the same addresses in parallel with some way of managing contention.
So not exactly the situation posed by the original question which (I think) implies concurrent, or nearly so, atomic and non atomic operations in parallel code, but close.
I have managed by some devious casting to persuade my C11 compiler to allow me to access an integer and much more usefully a pointer both atomically and non-atomically ("directly"), having established that both types are officially lock-free on my x86_64 system. My, possibly simplistic, interpretation of that is that the sizes of the atomic and non atomic types are the same and that the hardware can update such types in a single operation.
I definitely would not attempt to mix both types of access to an address in a parallel context, i think that would be doomed to fail. However I have been successful in using "direct" syntax operations in serial code and "atomic" syntax in parallel code, giving me the best of both worlds of the fastest possible access (and much simpler syntax) in serial, and safely managed contention when in parallel.
So you can do it so long as you don't try to mix both methods in parallel code and you stick to using lock-free types, which probably means up to the size of a pointer.

Is x++; threadsafe?

If I update a variable in one thread like this:
receiveCounter++;
and then from another thread I only ever read this variable and write its value to a GUI.
Is that safe? Or could this instruction be interrupted in the middle so the value in receiveCounter is wrong when it is read by another thread? it must be so right since ++ is not atomic, it is several instructions.
I don't care about synchronizing reads and writes, it just needs to be incremented and then update in the GUI but this does not have to happen directly after each other.
What I care about is that the value cannot be wrong. Like the ++ operation being interrupted in the middle so the read value is completely off.
Do I need to lock this variable? I really do not want to since it is update very often. I could solves this by just posting a message to a MAIN thread and copy the value to a Queue (which then would need to be locked but I would not do this on every update) I guess.
But I am interested in the above problem anyway.
If one thread changes the value in a variable and another thread reads that value and the program does not synchronize the accesses it has a data race, and the behavior of the program is undefined. Change the type of receiveCounter to std::atomic<int> (assuming it's an int to begin with)
At its core it is a read-modify-write operation, they are not atomic. There are some processors around that have a dedicated instruction for it. Like Intel/AMD cores, very common, they have an INC instruction.
While that sounds like that could be atomic, since it is a single instruction, it still isn't. The x86/x64 instruction set doesn't have much to do anymore with the way the execution engine is actually implemented. Which executes RISC-like "micro-ops", the INC instruction is translated to multiple micro-ops. It can be made atomic with the LOCK prefix on the instruction. But compilers don't emit it unless they know that an atomic update is desired.
You will thus need to be explicit about it. The C++11 std::atomic<> standard addition is a good way. Your operating system or compiler will have intrinsics for it, usually named something like "Interlocked" or "__built_in".
Simple answer: NO.
because i++; is the same as i = i + 1;, it contains load, math-operation, and saving the value. so it is in general not atomic.
However the really executed operations depend on the instruction set of the CPU and might be atomic, depending on the CPU architecture. But the default is still not atomic.
Generally speaking, it is not thread-safe because the ++ operator consists in one read and one write, the pair of which is not atomic and can be interrupted in between.
Then, it probably also depends on the language/compiler/architecture, but in a typical case the increment operation is probably not thread safe.
(edited)
As for the read and write operations themselves, as long as you are not in 64-bit mode, they should be atomic, so if you don't care about other threads having the value wrong by 1, it may be ok for your case.
No. Increment operation is not atomic, hence not thread-safe.
In your case it is safe to use this operation if you don't care about its value in specific time (if you only read this variable from another thread and not trying to write to it). It will eventually increment receiveCounter's value, you just don't have any guarantees about operations order.
++ is equivalent to i=i+1.
++ is not an atomic operation so its NOT thread safe. Only read and write of primitive variables (except long and double) are atomic and hence thread safe.

std::atomic treat a pair of atomic int32 as one atomic int64?

I have a pair of unsigned int32
std::atomic<u32> _start;
std::atomic<u32> _end;
Sometimes I want to set start or end with compare exchange, so I don't want spurious failures that could be caused by using CAS on the entire 64bit pair. I just want to use 32 bit CAS.
_end.compare_exchange_strong(old_end, new_end);
Now I could fetch both start and end as one atomic 64bit read. Or two separate 32 bit reads. Wouldn't it be faster to do one 64bit atomic fetch (with the compiler adding the appropriate memory fence) rather than two separate 32 atomic bit reads with two memory fences (or would the compiler optimize that away?)
If so, how would I do that in c++11?
The standard doesn't guarantee that std::atomics have the same size as the underlying type, nor that the operations on atomic are lockfree (although they are likely to be for uint32 at least). Therefore I'm pretty sure there isn't any conforming way to combine them into one 64bit atomic operation. So you need to decide whether you want to manually combine the two variables into a 64bit one (and only work with 64bit operations) or not.
As an example the platform might not support 64bit CAS (for x86 that was added with the first Pentium IIRC, so it would not be availible when compiling 486 compatible. In that case it needs to lock somehow, so the atomic might contain both the 64bit variable and the lock. Of
Concerning the fences: Well that depends on the memory_order you specify for your operation. If the memory order specifies that the two operations need to be visible in the order they are excuted in, the compiler will obviously not be able to optimize a fence away, otherwise it might. Of course assuming you target x86 only memory_order_seq_cst will actually emit a barrierinstruction from what I remember, so anything less would impede with instruction reordering done by the compiler, but wouldn't have an actual penalty).
Of course depending on your platform you might get away with treating two std::atomic<int32> as one of int64 doing the casting either via union or reinterpret_cast, just be advised that this behaviour is not required by the standard and can (at least theoretically) stop working at anytime (new compiler verison, different optimization settings,...)
If your two ints require atomic updates, then you must treat them as a single atomic 64-bit value, you really have no other choice. Separate integer updates are not atomic and not viable. I concur that Unions are not relevent here, and suggest that instead you simply cast the pair of integers as (INT64) and perform your Cas64.
Using a critical section is overkill--use the Cas64, they only cost about 33 machine cycles (unoposed), while critical sections cost more like 100 cycles unoposed.
Note that it is commonplace to perform this exact same operation on versioned pointers, which in 32-bit consist of a 32-bit pointer and a 32-bit pointer, updated together as one, using Cas64 as described. Also, they actually do have to "line up right", because you never want such values to overhang a cache line boundary.

Synchronizing access to variable

I need to provide synchronization to some members of a structure.
If the structure is something like this
struct SharedStruct {
int Value1;
int Value2;
}
and I have a global variable
SharedStruct obj;
I want that the write from a processor
obj.Value1 = 5; // Processor B
to be immediately visible to the other processors, so that when I test the value
if(obj.Value1 == 5) { DoSmth(); } // Processor A
else DoSmthElse();
to get the new value, not some old value from the cache.
First I though that if I use volatile when writing/reading the values, it is enough. But I read that volatile can't solve this kind o issues.
The members are guaranteed to be properly aligned on 2/4/8 byte boundaries, and writes should be atomic in this case, but I'm not sure how the cache could interfere with this.
Using memory barriers (mfence, sfence, etc.) would be enough ? Or some interlocked operations are required ?
Or maybe something like
lock mov addr, REGISTER
?
The easiest would obviously be some locking mechanism, but speed is critical and can't afford locks :(
Edit
Maybe I should clarify a bit. The value is set only once (behaves like a flag). All the other threads need just to read it. That's why I think that it may be a way to force the read of this new value without using locks.
Thanks in advance!
There Ain't No Such Thing As A Free Lunch. If your data is being accessed from multiple threads, and it is necessary that updates are immediately visible by those other threads, then you have to protect the shared struct by a mutex, or a readers/writers lock, or some similar mechanism.
Performance is a valid concern when synchronizing code, but it is trumped by correctness. Generally speaking, aim for correctness first and then profile your code. Worrying about performance when you haven't yet nailed down correctness is premature optimization.
Use explicitly atomic instructions. I believe most compilers offer these as intrinsics. Compare and Exchange is another good one.
If you intend to write a lockless algorithm, you need to write it so that your changes only take effect when conditions are as expected.
For example, if you intend to insert a linked list object, use the compare/exchange stuff so that it only inserts if the pointer still points at the same location when you actually do the update.
Or if you are going to decrement a reference count and free the memory at count 0, you will want to pre-free it by making it unavailable somehow, check that the count is still 0 and then really free it. Or something like that.
Using a lock, operate, unlock design is generally a lot easier. The lock-free algorithms are really difficult to get right.
All the other answers here seem to hand wave about the complexities of updating shared variables using mutexes, etc. It is true that you want the update to be atomic.
And you could use various OS primitives to ensure that, and that would be good
programming style.
However, on most modern processors (certainly the x86), writes of small, aligned scalar values is atomic and immediately visible to other processors due to cache coherency.
So in this special case, you don't need all the synchronizing junk; the hardware does the
atomic operation for you. Certainly this is safe with 4 byte values (e.g., "int" in 32 bit C compilers).
So you could just initialize Value1 with an uninteresting value (say 0) before you start the parallel threads, and simply write other values there. If the question is exiting the loop on a fixed value (e.g., if value1 == 5) this will be perfectly safe.
If you insist on capturing the first value written, this won't work. But if you have a parallel set of threads, and any value written other than the uninteresting one will do, this is also fine.
I second peterb's answer to aim for correctness first. Yes, you can use memory barriers here, but they will not do what you want.
You said immediately. However, how immediate this update ever can be, you could (and will) end up with the if() clause being executed, then the flag being set, and than the DoSmthElse() being executed afterwards. This is called a race condition...
You want to synchronize something, it seems, but it is not this flag.
Making the field volatile should make the change "immediately" visible in other threads, but there is no guarantee that the instant at which thread A executes the update doesn't occur after thread B tests the value but before thread B executes the body of the if/else statement.
It sounds like what you really want to do is make that if/else statement atomic, and that will require either a lock, or an algorithm that is tolerant of this sort of situation.

Is this code thread-safe?

This is a simplified version of some code I'm currently maintaining:
int SomeFunc()
{
const long lIndex = m_lCurrentIndex;
int nSum = 0;
nSum += m_someArray[lIndex];
nSum += m_someArray[lIndex];
return nSum;
}
lCurrentIndex is updated periodically by another thread. The question is; will making a local copy of m_CurrentIndex make sure both accesses to m_someArray uses the same index?
Please note that this is a simplified example; I'm thinking about the concept of making a local copy, not the exact piece of code shown here. I know the compiler will put the value in a register, but that is still a local copy as opposed to reading from lCurrentIndex twice.
Thanks!
Edit: The initial assignment is safe, both are guaranteed to be 32 bit in our setup.
Edit2: And they are correctly aligned on a 32bit boundary (forgot about that one)
No, the initialisation of the local, which reads a shared variable, is not necessarily atomic. (consider what code would be needed on an 8-bit platform, for example) In general, the only way to write thread safe code is to use atomic operations specified by your compiler and/or OS, or to use OS locking features.
will making a local copy of
m_CurrentIndex make sure both accesses
to m_someArray uses the same index?
In the same execution of SomeFunc, yes, of course. A local integer variable (lIndex) will not magically change its value in the middle of the function.
Of course, the following are also true: the actual value of m_someArray[lIndex] (as opposed to that of lIndex) might change; m_someArray in itself might change; and what Neil said about the validity of lIndex's initial value.
Yes the copy of the current index will make sure both array accesses use the same index. That is not really what I would have thought of as "thread safe" though. You need to concern yourself with concurrent access to shared variables. To me that looks like the access to the array could be an area of potential concern as well.
This should be thread safe (at least it will on all the compilers/OSes I've worked with). However, to be extra doubly sure you could declare m_lCurrentIndex as volatile. Then the compiler will know that it might change at any time.
Another question to ask here is: is the copy operation from m_lCurrentIndex to lIndex an atomic operation? If it isn’t you might end up using very weird values which will probably do nothing good. :)
Point is: when you are using multiple threads there won’t be a way around some kind of locking or synchronization.
Concept of local copy
I'm thinking about the concept of making a local copy, not the exact piece of code shown here.
This question cannot be answered without knowing more details. It boils down into the questions if this "making a local copy" of m_lCurrentIndex into lIndex is atomic.
Assuming x86 and assuming m_lCurrentIndex is DWORD aligned and assuming long represents DWORD (which is true on most x86 compilers), then yes, this is atomic. Assuming x64 and assuming long represents DWORD and m_lCurrentIndex is DWORD aligned or long represents 64b word and m_lCurrentIndex is 64b aligned again yes, this is atomic. On other platforms or without the alignment guarantee two or more physical reads may be required for the copy.
Even without local copy being atomic you still may be able to make it lock-less and thread safe using CAS style loop (be optimistic and assume you can do without locking, after doing the operation check if everything went OK, if not, rollback and try again), but it may be a lot more work and the result will be lock-less, but not wait-less.
Memory barries
A word of caution: once you will move one step forward, you will most likely be handling multiple variables simultaneously, or handling shared variables which act as pointers or indices to access other shared variables. At that point things will start more and more complicated, as from that point you need to consider things like read / write reordering and memory barriers. See also How can I write a lock free structure
Yes, it will access the same element of the array. It is like you are taking a snapshot of the value of m_lCurrentIndex into the local variable. Since the local variable has its own memory whatever you do to m_lCurrentIndex will have no effect on the local variable. However, note that since the assignment operation is not guaranteed to be atomic you may very well end up with an invalid value in lIndex. This happens if you update m_lCurrentIndex from one thread and at the same time try the assignement into lIndex from the other thread.