OpenMP: how to flush pointer target?

OpenMP: how to flush pointer target? - c++

I’ve just noticed that the following code doesn’t compile in OpenMP (under GCC 4.5.1):
struct job {
unsigned busy_children;
};
job* j = allocateJob(…);
// …
#pragma omp flush(j->busy_children)
The compiler complains about the -> in the argument list to flush, and according to the OpenMP specification it’s right: flush expects as arguments a list of “id-expression”s, which basically means only (qualified) IDs are allowed, no expressions.
Furthermore, the spec says this about flush and pointers:
If a pointer is present in the list, the pointer itself is flushed, not the memory block to which the pointer refers.
Of course. However, since OpenMP also doesn’t allow me to dereference the pointers I basically cannot flush a pointee (pointer target).
– So what about references? The spec doesn’t mention them but I’m not confident that the following is conformant, and will actually flush the pointee.
unsigned& busy_children = j->busy_children;
#pragma omp flush(busy_children)
Is this guaranteed to work?
If not, how can I flush a pointee?

The flush directive has given the OpenMP ARB a headache for a long time. So much so, that there has been talk about removing it completely - though that creates other problems. Using flush(list), is extremely difficult to get correct and even the OpenMP experts have a great deal of trouble getting it correct. The problem with it, is that the way it is defined it can be moved around in your code by the compiler. That means that you should stay away from using flush(list).
As for your question about being able to flush a pointee, there is only one way to do that and that is to use flush (without a list). This will flush your entire thread environment and as such, can not be moved by the compiler. It seems "heavy handed", but the compilers are actually pretty good about flushing what is necessary when using flush without a list.

OpenMP specification doesn't directly says about the type of a variable, but MSDN says that "A variable specified in a flush directive must not have a reference type". This make me think that this is not guaranteed to work. The flush directive with empty variable list should flush all memory so this is what you can safely use.

The reason you cannot flush a dereferenced pointer is because flush is only needed for values in hardware registers. There is never any need under OpenMP to flush something that is NOT in a hardware register (for example, in cache or memory), since OpenMP assumes a coherent cache memory that guarantees that all threads will always see the same value when the same address is dereferenced. Hardware protocols guarantee cache coherency, making the multiple local caches behave like one shared global cache.

Related

Should a pointer to stack variable be volatile?

I know that I should use the volatile keyword to tell the compiler not to optimize memory read\write to variables. I also know that in most cases it should only be used to talk to non-C++ memory.
However, I would like to know if I have to use volatile when holding a pointer to some local (stack) variable.
For example:
//global or member variable
/* volatile? */bool* p_stop;
void worker()
{
/* volatile? */ bool stop = false;
p_stop = &stop;
while(!stop)
{
//Do some work
//No usage of "stop" or p_stop" here
}
}
void stop_worker()
{
*p_stop = true;
}
It looks to me like a compiler with some optimization level might see that stop is a local variable, that is never changed and could replace while(!stop) with a while(true) and thus changing *p_stop while do nothing.
So, is it required to mark the pointer as volatile in such a case?
P.S: Please do not lecture me on why not to use this, the real code that uses this hack does so for a (complex-to-explain) reason.
EDIT:
I failed to mention that these two functions run on different threads.
The worker() is a function of the first thread, and it should be stopped from another thread using the p_stop pointer.
I am not interested in knowing what better ways there are to solve the real reason that is behind this sort of hack. I simply want to know if this is defined\undefined behavior in C++ (11 for that matter), and also if this is compiler\platform\etc dependent. So far I see #Puppy saying that everyone is wrong and that this is wrong, but without referencing a specific standard that denoted this.
I understand that some of you are offended by the "don't lecture me" part, but please stick to the real question - Should I use volatile or not? or is this UB? and if you can please help me (and others) learn something new by providing a complete answer.

I simply want to know if this is defined\undefined behavior in C++ (11 for that matter)
Ta-da (from N3337, "quasi C++11")
Two expression evaluations conflict if one of them modifies a memory location [..] and the other one accesses or modifies the same memory location.
§1.10/4
and:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. [..]
§1.10/21
You're accessing the (memory location of) object stop from different threads, both accesses are not atomic, thus also in no "happens before" relation. Simply put, you have a data race and thus undefined behavior.
I am not interested in knowing what better ways there are to solve the real reason that is behind this sort of hack.
Atomic operations (as defined by the C++ standard) are the only way to (reliably) solve this.

So, is it required to mark the pointer as volatile in such a case?
No. It's not required, principally because volatile doesn't even remotely cover what you need it to do in this case. You must use an actual synchronization primitive, like an atomic operation or mutex. Using volatile here is undefined behaviour and your program will explode.
volatile is NOT useful for concurrency. It may be useful for implementing concurrent primitives but it is far from sufficient.
Frankly, whether or not you want to use actual synchronization primitives is irrelevant. If you want to write correct code, you have no choice.

P.S: Please do not lecture me on why not to use this,
I am not sure what we are supposed to say. The compiler manages the stack, so anything you are doing with it is technically undefined behavior and may not work when you upgrade to the next version of the compiler.
You are also making assumptions that may be different than the compiler's assumptions when it optimizes. This is the real reason to use (or not use) volatile; you give guidance to the compiler that helps it decide whether optimizations are safe. The use of volatile tells the compiler that it should assume that these variables may change due to external influences (other threads or special hardware behavior).
So yes, in this case, it looks like you would need to mark both p_stop and stop with a volatile qualifier.
(Note: this is necessary but not sufficient, as it does not cause the appropriate behaviors to happen in a language implementation with a relaxed memory model that requires barriers to ensure correctness. See https://en.wikipedia.org/wiki/Memory_ordering#Runtime_memory_ordering )

This question simply cannot be answered from the details provided.
As is stated in the question this is an entirely unsupported way of communicating between threads.
So the only answer is:
Specify the compiler versions you're using and hope someone knows its darkest secrets or refer to your documentation. All the C++ standard will tell you is this won't work and all anyone can tell you is "might work but don't".
There isn't a "oh, come on guys everyone knows it pretty much works what do I do as the workaround? wink wink" answer.
Unless your compiler doesn't support atomics or suitably concurrent mechanisms there is no justifiable reason for doing this.
"It's not supported" isn't "complex-to-explain" so I'd be fascinated based on that code fragment to understand what possible reason there is for not doing this properly (other than ancient compiler).

Could optimization break thread safety?

class Foo{
public:
void fetch(void)
{
int temp=-1;
someSlowFunction(&temp);
bar=temp;
}
int getBar(void)
{
return bar;
}
void someSlowFunction(int *ptr)
{
usleep(10000);
*ptr=0;
}
private:
int bar;
};
I'm new to atomic operations so I may get some concepts wrong.
Considering above code, assuming loading and storing int type are atomic[Note 1], then getBar() could only get the bar before or after a fetch().
However, if a compiler is smart enough, it could optimize away temp and change it to:
void Foo::fetch(void)
{
bar=-1;
someSlowFunction(&bar);
}
Then in this case getBar() could get -1 or other intermediate state inside someSlowFunction() under certain timing conditions.
Is this risk possible? Does the standard prevent such optimizations?
Note 1: http://preshing.com/20130618/atomic-vs-non-atomic-operations/
The language standards have nothing to say about atomicity in this
case. Maybe integer assignment is atomic, maybe it isn’t. Since
non-atomic operations don’t make any guarantees, plain integer
assignment in C is non-atomic by definition.
In practice, we usually know more about our target platforms than
that. For example, it’s common knowledge that on all modern x86, x64,
Itanium, SPARC, ARM and PowerPC processors, plain 32-bit integer
assignment is atomic as long as the target variable is naturally
aligned. You can verify it by consulting your processor manual and/or
compiler documentation. In the games industry, I can tell you that a
lot of 32-bit integer assignments rely on this particular guarantee.
I'm targeting ARM Cortex-A8 here, so I consider this a safe assumption.

Compiler optimization can not break thread safety!
You might however experience issues with optimizations in code that appeared to be thread safe but really only worked because of pure luck.
If you access data from multiple threads, you must either
Protect the appropriate sections using std::mutex or the like.
or, use std::atomic.
If not, the compiler might do optimizations that is next to impossible to expect.
I recommend watching CppCon 2014: Herb Sutter "Lock-Free Programming (or, Juggling Razor Blades), Part I" and Part II

After answering question in comments, it makes more sense. Let's analyze thread-safety here given that fetch() and getBar() are called from different threads. Several points will need to be considered:
'Dirty reads', or garabage reading due to interrupted write. While a general possibility, does not happen on 3 chip families I am familiar with for aligned ints. Let's discard this possibility for now, and just assume read values are alwats clean.
'Improper reads', or an option of reading something from bar which was never written there. Would it be possible? Optimizing away temp on the compiler part is, in my opinion, possible, but I am no expert in this matter. Let's assume it does not happen. The caveat would still be there - you might NEVER see the new value of bar. Not in a reasonable time, simply never.

The compiler can apply any transformation that results in the same observable behavior. Assignments to local non-volatile variables are not part of the observable behavior. The compiler may just decide to eliminate temp completely and just use bar directly. It may also decide that bar will always end up with the value zero, and set at the beginning of the function (at least in your simplified example).
However, as you can read in James' answer on a related question the situation is more complex because modern hardware also optimizes the executed code. This means that the CPU re-orders instructions, and neither the programmer or the compiler has influence on that without using special instructions. You need either a std::atomic, you memory fences explicitly (I wouldn't recommend it because it is quite tricky), or use a mutex which also acts as a memory fence.

It probably wouldn't optimize that way because of the function call in the middle, but you can define temp as volatile, this will tell the compiler not to perform these kinds of optimizations.
Depending on the platform, you can certainly have cases where multibyte quantities are in an inconsistent state. It doesn't even need to be thread related. For example, a device experiencing low voltage during a power brown-out can leave memory in an inconsistent state. If you have pointers getting corrupted, then it's usually bad news.
One way I approached this on a system without mutexes was to ensure every piece of data could be verified. For example, for every datum T, there would be a validation checksum C and a backup U.
A set operation would be as follows:
U = T
T = new value
C = checksum(T)
And a get operation would be as follows:
is checksum(T) == C
yes: return T
no: return U
This guarantees that the whatever is returned is in a consistent state. I would apply this algorithm to the entire OS, so for example, entire files could be restored.
If you want to ensure atomicity without getting into complex mutexes and whatever, try to use the smallest types possible. For example, does bar need to be an int or will unsigned char or bool suffice?

is it possible to use function pointers this way?

This is something that recently crossed my mind, quoting from wikipedia: "To initialize a function pointer, you must give it the address of a function in your program."
So, I can't make it point to an arbitrary memory address but what if i overwrite the memory at the address of the function with a piece of data the same size as before and than invoke it via pointer ? If such data corresponds to an actual function and the two functions have matching signatures the latter should be invoked instead of the first.
Is it theoretically possible ?
I apologize if this is impossible due to some very obvious reason that i should be aware of.

If you're writing something like a JIT, which generates native code on the fly, then yes you could do all of those things.
However, in order to generate native code you obviously need to know some implementation details of the system you're on, including how its function pointers work and what special measures need to be taken for executable code. For one example, on some systems after modifying memory containing code you need to flush the instruction cache before you can safely execute the new code. You can't do any of this portably using standard C or C++.
You might find when you come to overwrite the function, that you can only do it for functions that your program generated at runtime. Functions that are part of the running executable are liable to be marked write-protected by the OS.

The issue you may run into is the Data Execution Prevention. It tries to keep you from executing data as code or allowing code to be written to like data. You can turn it off on Windows. Some compilers/oses may also place code into const-like sections of memory that the OS/hardware protect. The standard says nothing about what should or should not work when you write an array of bytes to a memory location and then call a function that includes jmping to that location. It's all dependent on your hardware and your OS.

While the standard does not provide any guarantees as of what would happen if you make a function pointer that does not refer to a function, in real life and in your particular implementation and knowing the platform you may be able to do that with raw data.
I have seen example programs that created a char array with the appropriate binary code and have it execute by doing careful casting of pointers. So in practice, and in a non-portable way you can achieve that behavior.

It is possible, with caveats given in other answers. You definitely do not want to overwrite memory at some existing function's address with custom code, though. Not only is typically executable memory not writeable, but you have no guarantees as to how the compiler might have used that code. For all you know, the code may be shared by many functions that you think you're not modifying.
So, what you need to do is:
Allocate one or more memory pages from the system.
Write your custom machine code into them.
Mark the pages as non-writable and executable.
Run the code, and there's two ways of doing it:
Cast the address of the pages you got in #1 to a function pointer, and call the pointer.
Execute the code in another thread. You're passing the pointer to code directly to a system API or framework function that starts the thread.

Your question is confusingly worded.
You can reassign function pointers and you can assign them to null. Same with member pointers. Unless you declare them const, you can reassign them and yes the new function will be called instead. You can also assign them to null. The signatures must match exactly. Use std::function instead.
You cannot "overwrite the memory at the address of a function". You probably can indeed do it some way, but just do not. You're writing into your program code and are likely to screw it up badly.

Can you force a crash if a write occurs to a given memory location with finer than page granularity?

I'm writing a program that for performance reasons uses shared memory (sockets and pipes as alternatives have been evaluated, and they are not fast enough for my task, generally speaking any IPC method that involves copies is too slow). In the shared memory region I am writing many structs of a fixed size. There is one program responsible for writing the structs into shared memory, and many clients that read from it. However, there is one member of each struct that clients need to write to (a reference count, which they will update atomically). All of the other members should be read only to the clients.
Because clients need to change that one member, they can't map the shared memory region as read only. But they shouldn't be tinkering with the other members either, and since these programs are written in C++, memory corruption is possible. Ideally, it should be as difficult as possible for one client to crash another. I'm only worried about buggy clients, not malicious ones, so imperfect solutions are allowed.
I can try to stop clients from overwriting by declaring the members in the header they use as const, but that won't prevent memory corruption (buffer overflows, bad casts, etc.) from overwriting. I can insert canaries, but then I have to constantly pay the cost of checking them.
Instead of storing the reference count member directly, I could store a pointer to the actual data in a separate mapped write only page, while keeping the structs in read only mapped pages. This will work, the OS will force my application to crash if I try to write to the pointed to data, but indirect storage can be undesirable when trying to write lock free algorithms, because needing to follow another level of indirection can change whether something can be done atomically.
Is there any way to mark smaller areas of memory such that writing them will cause your app to blow up? Some platforms have hardware watchpoints, and maybe I could activate one of those with inline assembly, but I'd be limited to only 4 at a time on 32-bit x86 and each one could only cover part of the struct because they're limited to 4 bytes. It'd also make my program painful to debug ;)
Edit: I found this rather eye popping paper, but unfortunately it requires using ECC memory and a modified Linux kernel.

I don't think its possible to make a few bits read only like that at the OS level.
One thing that occurred to me just now is that you could put the reference counts in a different page like you suggested. If the structs are a common size, and are all in sequential memory locations you could use pointer arithmetic to locate a reference count from the structures pointer, rather than having a pointer within the structure. This might be better than having a pointer for your use case.
long *refCountersBase;//The start address of the ref counters page
MyStruct *structsBase;//The start address of your structures page
//get address to reference counter
long *getRefCounter(MyStruct *myStruct )
{
size_t n = myStruct - structsBase;
long *ref = refCountersBase + n;
return ref;
}

You would need to add a signal handler for SIGSEGV which recovers from the exception, but only for certain addresses. A starting point might be http://www.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html and the corresponding documentation for your OS.
Edit: I believe what you want is to perform the write and return if the write address is actually OK, and tail-call the previous exception handler (the pointer you get when you install your exception handler) if you want to propagate the exception. I'm not experienced in these things though.

I have never heard of enforcing read-only at less than a page granularity, so you might be out of luck in that direction unless you can put each struct on two pages. If you can afford two pages per struct you can put the ref count on one of the pages and make the other read-only.
You could write an API rather than just use headers. Forcing clients to use the API would remove most corruption issues.
Keeping the data with the reference count rather than on a different page will help with locality of data and so improve cache performance.
You need to consider that a reader may have a problem and fail to properly update its ref count. Also that the writer may fail to complete an update. Coping with these things requires extra checks. You can combine such checks with the API. It may be worth experimenting to measure the performance implications of some kind of integrity checking. It may be fast enough to keep a checksum, something as simple as adler32.

slightly weird C++ code

Sorry if this is simple, my C++ is rusty.
What is this doing? There is no assignment or function call as far as I can see. This code pattern is repeated many times in some code I inherited. If it matters it's embedded code.
*(volatile UINT16 *)&someVar->something;
edit: continuing from there, does the following additional code confirm Heaths suspicions? (exactly from code, including the repetition, except the names have been changed to protect the innocent)
if (!WaitForNotBusy(50))
return ERROR_CODE_X;
*(volatile UINT16 *)& someVar->something;
if (!WaitForNotBusy(50))
return ERROR_CODE_X;
*(volatile UINT16 *)& someVar->something;
x = SomeData;

This is a fairly common idiom in embedded programming (though it should be encapsulated in a set of functions or macros) where a device register needs to be accessed. In many architectures, device registers are mapped to a memory address and are accessed like any other variable (though at a fixed address - either pointers can be used or the linker or a compiler extension can help with fixing the address). However, if the C compiler doesn't see a side effect to a variable access it can optimize it away - unless the variable (or the pointer used to access the variable) is marked as volatile.
So the expression;
*(volatile UINT16 *)&someVar->something;
will issue a 16-bit read at some offset (provided by the something structure element's offset) from the address stored in the someVar pointer. This read will occur and cannot be optimized away by the compiler due to the volatile keyword.
Note that some device registers perform some functionality even if they are simply read - even if the data read isn't otherwise used. This is quite common with status registers, where an error condition might be cleared after the read of the register that indicates the error state in a particular bit.
This is probably one of the more common reasons for the use of the volatile keyword.

So here's a long shot.
If that address points to a memory mapped region on a FPGA or other device, then the device might actually be doing something when you read that address.

I think the author's intent was to cause the compiler to emit memory barriers at these points. By evaluating the expression result of a volatile, the indication to the compiler is that this expression should not be optimized away, and should 'instantiate' the semantics of access to a volatile location (memory barriers, restrictions on optimizations) at each line where this idiom occurs.
This type of idiom could be "encapsulated" in a pre-processor macro (#define) in case another compile has a different way to cause the same effect. For example, a compiler with the ability to directly encode read or write memory barriers might use the built-in mechanism rather than this idiom. Implementing this type of code inside a macro enables changing the method all over your code base.
EDIT: User sharth has a great point that if this code runs in an environment where the address of the pointer is a physical rather than virtual address (or a virtual address mapped to a specific physical address), then performing this read operation might cause some action at a peripheral device.

Generally this is bad code.
In C and C++ volatile means very few and does not provide implicit memory barrier. So this code is just quite wrong uness it is written as
memory_barrier();
*(volatile UINT16 *)&someVar->something;
It is just bad code.
Expenation: volatile does not make variable atomic!
Reed this article: http://www.mjmwired.net/kernel/Documentation/volatile-considered-harmful.txt
This is why volatile should almost never be used in proper code.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js