Could optimization break thread safety? - c++

class Foo{
public:
void fetch(void)
{
int temp=-1;
someSlowFunction(&temp);
bar=temp;
}
int getBar(void)
{
return bar;
}
void someSlowFunction(int *ptr)
{
usleep(10000);
*ptr=0;
}
private:
int bar;
};
I'm new to atomic operations so I may get some concepts wrong.
Considering above code, assuming loading and storing int type are atomic[Note 1], then getBar() could only get the bar before or after a fetch().
However, if a compiler is smart enough, it could optimize away temp and change it to:
void Foo::fetch(void)
{
bar=-1;
someSlowFunction(&bar);
}
Then in this case getBar() could get -1 or other intermediate state inside someSlowFunction() under certain timing conditions.
Is this risk possible? Does the standard prevent such optimizations?
Note 1: http://preshing.com/20130618/atomic-vs-non-atomic-operations/
The language standards have nothing to say about atomicity in this
case. Maybe integer assignment is atomic, maybe it isn’t. Since
non-atomic operations don’t make any guarantees, plain integer
assignment in C is non-atomic by definition.
In practice, we usually know more about our target platforms than
that. For example, it’s common knowledge that on all modern x86, x64,
Itanium, SPARC, ARM and PowerPC processors, plain 32-bit integer
assignment is atomic as long as the target variable is naturally
aligned. You can verify it by consulting your processor manual and/or
compiler documentation. In the games industry, I can tell you that a
lot of 32-bit integer assignments rely on this particular guarantee.
I'm targeting ARM Cortex-A8 here, so I consider this a safe assumption.

Compiler optimization can not break thread safety!
You might however experience issues with optimizations in code that appeared to be thread safe but really only worked because of pure luck.
If you access data from multiple threads, you must either
Protect the appropriate sections using std::mutex or the like.
or, use std::atomic.
If not, the compiler might do optimizations that is next to impossible to expect.
I recommend watching CppCon 2014: Herb Sutter "Lock-Free Programming (or, Juggling Razor Blades), Part I" and Part II

After answering question in comments, it makes more sense. Let's analyze thread-safety here given that fetch() and getBar() are called from different threads. Several points will need to be considered:
'Dirty reads', or garabage reading due to interrupted write. While a general possibility, does not happen on 3 chip families I am familiar with for aligned ints. Let's discard this possibility for now, and just assume read values are alwats clean.
'Improper reads', or an option of reading something from bar which was never written there. Would it be possible? Optimizing away temp on the compiler part is, in my opinion, possible, but I am no expert in this matter. Let's assume it does not happen. The caveat would still be there - you might NEVER see the new value of bar. Not in a reasonable time, simply never.

The compiler can apply any transformation that results in the same observable behavior. Assignments to local non-volatile variables are not part of the observable behavior. The compiler may just decide to eliminate temp completely and just use bar directly. It may also decide that bar will always end up with the value zero, and set at the beginning of the function (at least in your simplified example).
However, as you can read in James' answer on a related question the situation is more complex because modern hardware also optimizes the executed code. This means that the CPU re-orders instructions, and neither the programmer or the compiler has influence on that without using special instructions. You need either a std::atomic, you memory fences explicitly (I wouldn't recommend it because it is quite tricky), or use a mutex which also acts as a memory fence.

It probably wouldn't optimize that way because of the function call in the middle, but you can define temp as volatile, this will tell the compiler not to perform these kinds of optimizations.
Depending on the platform, you can certainly have cases where multibyte quantities are in an inconsistent state. It doesn't even need to be thread related. For example, a device experiencing low voltage during a power brown-out can leave memory in an inconsistent state. If you have pointers getting corrupted, then it's usually bad news.
One way I approached this on a system without mutexes was to ensure every piece of data could be verified. For example, for every datum T, there would be a validation checksum C and a backup U.
A set operation would be as follows:
U = T
T = new value
C = checksum(T)
And a get operation would be as follows:
is checksum(T) == C
yes: return T
no: return U
This guarantees that the whatever is returned is in a consistent state. I would apply this algorithm to the entire OS, so for example, entire files could be restored.
If you want to ensure atomicity without getting into complex mutexes and whatever, try to use the smallest types possible. For example, does bar need to be an int or will unsigned char or bool suffice?

Related

Two Different Processes With 2 std::atomic Variables on Same Address?

I read C++ Standard (n4713)'s § 32.6.1 3:
Operations that are lock-free should also be address-free. That is,
atomic operations on the same memory location via two different
addresses will communicate atomically. The implementation should not
depend on any per-process state. This restriction enables
communication by memory that is mapped into a process more than once
and by memory that is shared between two processes.
So it sounds like it is possible to perform a lock-free atomic operation on the same memory location. I wonder how it can be done.
Let's say I have a named shared memory segment on Linux (via shm_open() and mmap()). How can I perform a lockfree operation on the first 4 bytes of the shared memory segment for example?
At first, I thought I could just reinterpret_cast the pointer to std::atomic<int32_t>*. But then I read this. It first points out that std::atomic might not have the same size of T or alignment:
When we designed the C++11 atomics, I was under the misimpression that
it would be possible to semi-portably apply atomic operations to data
not declared to be atomic, using code such as
int x; reinterpret_cast<atomic<int>&>(x).fetch_add(1);
This would clearly fail if the representations of atomic and int
differ, or if their alignments differ. But I know that this is not an
issue on platforms I care about. And, in practice, I can easily test
for a problem by checking at compile time that sizes and alignments
match.
Tho, it is fine with me in this case because I use a shared memory on the same machine and casting the pointer in two different processes will "acquire" the same location. However, the article states that the compiler might not treat the casted pointer as a pointer to an atomic type:
However this is not guaranteed to be reliable, even on platforms on
which one might expect it to work, since it may confuse type-based
alias analysis in the compiler. A compiler may assume that an int is
not also accessed as an atomic<int>. (See 3.10, [Basic.lval], last
paragraph.)
Any input is welcome!
The C++ standard doesn't concern itself with multiple processes and no guarantees were given outside of a multi-threaded environment.
However, the standard does recommend that implementations of lock-free atomics be usable across processes, which is the case in most real implementations.
This answer will assume atomics behave more or less the same with processes as with threads.
The first solution requires C++20 atomic_ref
void* shared_mem = /* something */
auto p1 = new (shared_mem) int; // For creating the shared object
auto p2 = (int*)shared_mem; // For getting the shared object
std::atomic_ref<int> i{p2}; // Use i as if atomic<int>
You need to make sure the shared int has std::atomic_ref<int>::required_alignment alignment; typically the same as sizeof(int). Normally you'd use alignas() on a struct member or variable, but in shared memory the layout is up to you (relative to a known page boundary).
This prevents the presence of opaque atomic types existing in the shared memory, which gives you precise control over what exactly goes in there.
A solution prior C++20 would be
auto p1 = new (shared_mem) atomic<int>; // For creating the shared object
auto p2 = (atomic<int>*)shared_mem; // For getting the shared object
auto& i = *p2;
Or using C11 atomic_load and atomic_store
_Atomic int* i = (_Atomic int*)shared_mem;
atomic_store(i, 42);
int i2 = atomic_load(i);
Alignment requirements are the same here, alignof(std::atomic<int>) or _Alignof(atomic_int).
Yes, the C++ standard is a bit mealy-mouthed about all this.
If you are on Windows (which you probably aren't) then you can use InterlockedExchange() etc, which offer all the required semantics and don't care where the referenced object is (it's a LONG *).
On other platforms, gcc has some atomic builtins which might help with this. They might free you from the tyranny of the standards writers. Trouble is, it's hard to test if the resulting code is bullet-proof.
On all mainstream platforms, std::atomic<T> does have the same size as T, although possibly higher alignment requirement if T has alignof < sizeof.
You can check these assumptions with:
static_assert(sizeof(T) == sizeof(std::atomic<T>),
"atomic<T> isn't the same size as T");
static_assert(std::atomic<T>::is_always_lock_free, // C++17
"atomic<T> isn't lock-free, unusable on shared mem");
auto atomic_ptr = static_cast<atomic<int>*>(some_ptr);
// beware strict-aliasing violations
// don't also access the same memory via int*
// unless you're aware of possible issues
// also make sure that the ptr is aligned to alignof(atomic<T>)
// otherwise you might get tearing (non-atomicity)
On exotic C++ implementations where these aren't true, people that want to use your code on shared memory will need to do something else.
Or if all accesses to shared memory from all processes consistently use atomic<T> then there's no problem, you only need lock-free to guarantee address-free. (You do need to check this: std::atomic uses a hash table of locks for non-lock-free. This is address-dependent, and separate processes will have separate hash tables of locks.)

Should a pointer to stack variable be volatile?

I know that I should use the volatile keyword to tell the compiler not to optimize memory read\write to variables. I also know that in most cases it should only be used to talk to non-C++ memory.
However, I would like to know if I have to use volatile when holding a pointer to some local (stack) variable.
For example:
//global or member variable
/* volatile? */bool* p_stop;
void worker()
{
/* volatile? */ bool stop = false;
p_stop = &stop;
while(!stop)
{
//Do some work
//No usage of "stop" or p_stop" here
}
}
void stop_worker()
{
*p_stop = true;
}
It looks to me like a compiler with some optimization level might see that stop is a local variable, that is never changed and could replace while(!stop) with a while(true) and thus changing *p_stop while do nothing.
So, is it required to mark the pointer as volatile in such a case?
P.S: Please do not lecture me on why not to use this, the real code that uses this hack does so for a (complex-to-explain) reason.
EDIT:
I failed to mention that these two functions run on different threads.
The worker() is a function of the first thread, and it should be stopped from another thread using the p_stop pointer.
I am not interested in knowing what better ways there are to solve the real reason that is behind this sort of hack. I simply want to know if this is defined\undefined behavior in C++ (11 for that matter), and also if this is compiler\platform\etc dependent. So far I see #Puppy saying that everyone is wrong and that this is wrong, but without referencing a specific standard that denoted this.
I understand that some of you are offended by the "don't lecture me" part, but please stick to the real question - Should I use volatile or not? or is this UB? and if you can please help me (and others) learn something new by providing a complete answer.
I simply want to know if this is defined\undefined behavior in C++ (11 for that matter)
Ta-da (from N3337, "quasi C++11")
Two expression evaluations conflict if one of them modifies a memory location [..] and the other one accesses or modifies the same memory location.
§1.10/4
and:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. [..]
§1.10/21
You're accessing the (memory location of) object stop from different threads, both accesses are not atomic, thus also in no "happens before" relation. Simply put, you have a data race and thus undefined behavior.
I am not interested in knowing what better ways there are to solve the real reason that is behind this sort of hack.
Atomic operations (as defined by the C++ standard) are the only way to (reliably) solve this.
So, is it required to mark the pointer as volatile in such a case?
No. It's not required, principally because volatile doesn't even remotely cover what you need it to do in this case. You must use an actual synchronization primitive, like an atomic operation or mutex. Using volatile here is undefined behaviour and your program will explode.
volatile is NOT useful for concurrency. It may be useful for implementing concurrent primitives but it is far from sufficient.
Frankly, whether or not you want to use actual synchronization primitives is irrelevant. If you want to write correct code, you have no choice.
P.S: Please do not lecture me on why not to use this,
I am not sure what we are supposed to say. The compiler manages the stack, so anything you are doing with it is technically undefined behavior and may not work when you upgrade to the next version of the compiler.
You are also making assumptions that may be different than the compiler's assumptions when it optimizes. This is the real reason to use (or not use) volatile; you give guidance to the compiler that helps it decide whether optimizations are safe. The use of volatile tells the compiler that it should assume that these variables may change due to external influences (other threads or special hardware behavior).
So yes, in this case, it looks like you would need to mark both p_stop and stop with a volatile qualifier.
(Note: this is necessary but not sufficient, as it does not cause the appropriate behaviors to happen in a language implementation with a relaxed memory model that requires barriers to ensure correctness. See https://en.wikipedia.org/wiki/Memory_ordering#Runtime_memory_ordering )
This question simply cannot be answered from the details provided.
As is stated in the question this is an entirely unsupported way of communicating between threads.
So the only answer is:
Specify the compiler versions you're using and hope someone knows its darkest secrets or refer to your documentation. All the C++ standard will tell you is this won't work and all anyone can tell you is "might work but don't".
There isn't a "oh, come on guys everyone knows it pretty much works what do I do as the workaround? wink wink" answer.
Unless your compiler doesn't support atomics or suitably concurrent mechanisms there is no justifiable reason for doing this.
"It's not supported" isn't "complex-to-explain" so I'd be fascinated based on that code fragment to understand what possible reason there is for not doing this properly (other than ancient compiler).

C++ memory protect pointers

Is it possible to "memory protect" pointers so that it's actually impossible to change them in the code, so that attempting to change them in the code results in a bus error? I am not referring to const but some type of deeper immutability assurance on the OS level. The question applies to any OS.
(Carmack mentions something like that here: https://youtu.be/Uooh0Y9fC_M?t=1h41m)
Since your question is quite general, here are some general ideas.
From within your application, the language gives you a number of constructs to protect your data. For example, data that is not meant to be exposed should be flagged as private. This does not prevent clients from reverse engineering your class though, for example if you have
class C {
public: int a;
private: int b;
};
then usually you will be able to access b through int* pB = &(c.a) + 1. This is not standard by any definition, but on most systems this will probably work. In general, because C++ allows very low-level memory management you can basically access any part of your applications memory from anywhere within the application, though making sensible (ab)use of this requires some reverse engineering.
When you expose public data, you can return const references and pointers, but of course it is very easy to still change this memory:
const* T myImmutable = new T();
const_cast<T*>(myImmutable)->change(); // Oops
Again, this is not standard C++ as compilers will use the const qualifier to perform optimizations, and things may break when you circumvent that, but the language does not stop you from doing this.
If you want to protect your memory from external changes, things become a bit trickier. In general, the OS makes sure that the memory assigned to different processes is separated and you cannot just go and write in the memory space of other processes. However, there are some functions in the Windows API (Read/WriteProcessMemory) that one may use. Of course, this requires heavy reverse engineering to determine exactly where in memory the pointer to be changed is located, but it is possible. Some ways of protecting against this are VirtualProtect, as mentioned in Dani's answer. There are increasingly complex things that you can do, like keeping checksums of important data or
[writing a] driver that monitors the SSDT and then catches when WriteProcessMemory or ReadProcessMemory is executed and you can squash those calls if they're pointed at the game
but as the first answer on the page I found that on correctly points out:
Safety doesn't exist. The only thing you can do is make it as hard as possible to crack
which is a lesson that you should never forget!

Has anyone ever considered a more "strict" flavor of C++ in which variables are / need to be initialized by default?

While C++ is an extraordinary language it's also a language with many pitfalls, especially for inexperienced programmers. I'm talking about things like uninitialized variables of primitive types in a class, e.g.
class Data {
std::string name;
unsigned int version;
};
// ...
Data data;
if (data.version) { ... } // use of uninitialized member
I know this example is oversimplified but in practice even experienced developers sometimes forget to initialize their member variables in constructors. While leaving primitives uninitialized by default is probably a relic from C, it provides us the choice between performance (leave some data uninitialized) and correctness (initialize all data).
OK, but what if the logic was inverted? I mean what if all primitives were either initialized with zeros? Or would require explicit initialization whose lack would generate a compile error. Of course for full flexibility one would have a special syntax/type to leave a variable/member uninitialized, e.g.
unsigned int x = std::uninitialized_value;
or
Data::Data() : name(), version(std::uninitialized_value) {}
I understand this could cause problems with existing C++ code which allows uninitialized data but the new code could be wrapped in a special block (extern "C" comes to me mind as an example) to let the compiler know a particular piece of code shall be strictly checked for uninitialized data.
Putting compatibility issues aside, such an approach would result in less bugs in our code, which is what we are all interested in.
Have you ever heard about any proposal like this?
Does such a proposal make sense at all?
Do you see any downsides of this approach?
Note 1: I used the term "strict" as this idea is related to the "strict mode" from JavaScript language which as mentioned on Mozilla Developer Network site
eliminates some JavaScript silent errors by changing them to throw
errors
Note 2: Please don't pay attention to the proposed syntax used in the proposal, it's there just to make a point.
Note 3: I'm aware of the fact that tools like cppcheck can easily find uninitialized member variables but my idea is about compile-time support for this kind of checks.
Use -Werror=uninitialized, it does exactly what you want.
You can then “unitialize” a variable with unsigned int x = x;
Have you ever heard about any proposal like this?
Does such a proposal make sense at all?
Do you see any downsides of this approach?
I haven't heard of any such proposal. To answer the next two, I'll first broadly claim that neither makes sense. To more specifically answer, we need to break the proposal down into two separate proposals:
No uninitialized variables of any kind are allowed.
"Uninitialized" variables are secretly initialized to some well-defined value (e.g. as in Java, which uses false, 0, or null).
Case 1 has a simple counterexample: performance. Examples abound; here's a (simplified) pattern I use in my code:
Foo x;
if (/*something*/) {
x = Foo(...);
} else {
x = Foo(...);
}
//[do something with x]
x is "uninitialized" here. The default constructor does get called (if Foo is an object), but this might just do nothing. If it's a big structure, then all I really pay for is the cost of the assignment operator--not the cost of some nontrivial constructor plus the cost of the assignment operator. If it's a typedef for an int or something, I need to load it into cache before xor eax,eaxing or whatever to clear it. If there's more code in the if-blocks, that's potentially a valuable cache miss if the compiler can't elide it.
Case 2 is trickier.
It turns out that modern operating systems actually do change the value of memory when it is allocated to processes (it's a security thing). So, there is a rudimentary form of this happening anyway.
You mention in particular that such an approach would result in [fewer] bugs in our code. I fundamentally disagree. Why would magically initializing things to some well-defined value make our code more robust? In fact, this semi-automatic initialization causes an enormous quantity of errors.
Storytime!
Exactly how this works depends on how the program is compiled (so debug vs. release and compiler). When I was first learning C++, I wrote a software rasterizer and concluded it worked--since it worked perfectly in debug mode. But, the instant I switched to release mode, I got a completely different result. Had the OS not initialized everything to zero so consistently in debug mode, I might have realized this sooner. This is an example of a bug caused by what you suggest.
By some miracle I managed to re-find this depressing question, which demonstrates similar confusions. This is a widespread problem.
Nowadays, some debugging environments put debug values into memory. E.g. MSVC puts 0xDEADBEEF and 0xFEEEFEEE. Lots more here. This, in combination with some OS sorcery, allows them to find use of uninitialized values. Running your code in a VM (e.g. Valgrind) gives you the same effect for free.
The larger point here is that, in my opinion, automatically initializing something to a well-defined value when you forget to initialize it is just as bad (if not worse) than getting some bogus value. The problem is the programmer is expecting something when he has no justification to--not that the value he's expecting is well-defined.

Understanding cost of multiple . and -> operator use?

Out of habit, when accessing values via . or ->, I assign them to variables anytime the value is going to be used more than once. My understanding is that in scripting languages like actionscript, this is pretty important. However, in C/C++, I'm wondering if this is a meaningless chore; am I wasting effort that the compiler is going to handle for me, or am I exercising a good practice, and why?
public struct Foo
{
public:
Foo(int val){m_intVal = val;)
int GetInt(){return m_intVal;}
int m_intVal; // not private for sake of last example
};
public void Bar()
{
Foo* foo = GetFooFromSomewhere();
SomeFuncUsingIntValA(foo->GetInt()); // accessing via dereference then function
SomeFuncUsingIntValB(foo->GetInt()); // accessing via dereference then function
SomeFuncUsingIntValC(foo->GetInt()); // accessing via dereference then function
// Is this better?
int val = foo->GetInt();
SomeFuncUsingIntValA(val);
SomeFuncUsingIntValB(val);
SomeFuncUsingIntValC(val);
///////////////////////////////////////////////
// And likewise with . operator
Foo fooDot(5);
SomeFuncUsingIntValA(fooDot.GetInt()); // accessing via function
SomeFuncUsingIntValB(fooDot.GetInt()); // accessing via function
SomeFuncUsingIntValC(fooDot.GetInt()); // accessing via function
// Is this better?
int valDot = foo.GetInt();
SomeFuncUsingIntValA(valDot);
SomeFuncUsingIntValB(valDot);
SomeFuncUsingIntValC(valDot);
///////////////////////////////////////////////
// And lastly, a dot operator to a member, not a function
SomeFuncUsingIntValA(fooDot.m_intVal); // accessing via member
SomeFuncUsingIntValB(fooDot.m_intVal); // accessing via member
SomeFuncUsingIntValC(fooDot.m_intVal); // accessing via member
// Is this better?
int valAsMember = foo.m_intVal;
SomeFuncUsingIntValA(valAsMember);
SomeFuncUsingIntValB(valAsMember);
SomeFuncUsingIntValC(valAsMember);
}
Ok so I try to go for an answer here.
Short version: you definitely don’t need to to this.
Long version: you might need to do this.
So here it goes: in interpreted programs like Javascript theese kind of things might have a noticeable impact. In compiled programs, like C++, not so much to the point of not at all.
Most of the times you don’t need to worry with these things because an immense amount of resources have been pulled into compiler optimization algorithms (and actual implementations) that the compiler will correctly decide what to do: allocate an extra register and save the result in order to reuse it or recompute every time and save that register space, etc.
There are instances where the compiler can’t do this. That is when it can’t prove multiple calls produce the same result. Then it has no choice but to make all the calls.
Now let’s assume that the compiler makes the wrong choice and you as a precaution make the effort of micro–optimizations. You make the optimization and you squish a 10% performance increase (which is already an overly overly optimistic figure for this kind of optimization) on that portion of code. But what do you know, your code spends only 1% of his time in that portion of code. The rest of the time is most likely spend in some hot loops and waiting for data fetch. So you spend a non-negligible amount of effort to optimize yourself the code only to get a 0.1% performance increase in total time, which won’t even be observable due to the external factors that vary the execution time by way more than that amount.
So don’t spend time with micro-optimizations in C++.
However there are cases where you might need to do this and even crazier things. But this is only after properly profiling your code and this is another discussion.
So worry about readability, don’t worry about micro–optimizations.
The question is not really related to -> and . operators, but rather about repetitive expressions in general. Yes, it is true that most modern compilers are smart enough to optimize the code that evaluates the same expression repeatedly (assuming it has no observable side-effects).
However, using an explicit intermediate variable typically makes the program much more readable, since it explicitly exposes the fact that the same value is supposed to be used in all contexts. It exposes the fact the it was your intent to use the same value in all contexts.
If you repeat using the same expression to generate that value again and again, this fact becomes much less obvious. Firstly, it is difficult to say at the first sight whether the expressions are really identical (especially when they are long). Secondly, it is not obvious whether sequential evaluations of the seemingly the same expression produce identical results.
Finally, slicing long expressions into smaller ones by using intermediate variables can significantly simply debugging the code in step-by-step debugger, since it give the user much greater degree of control through "step in" and "step over" commands.
It's for sure better in terms of readability and maintainability to have such temporary variable.
In terms of performance, you shouldn't worry about such micro-optimization at this stage (premature optimization). Moreover, modern C++ compilers can optimize it anyway, so you really shouldn't worry about it.