Volatile keyword in GLSL - opengl

In the OpenGL Wiki, it is written:
The compiler normally is free to assume that values accessed through variables will only change after memory barriers or other synchronization. With this qualifier, the compiler assumes that the contents of the storage represented by the variable could be changed at any time.
Reading it that way, I understand that when you are using synchronization like memory barriers or atomic functions, you do not need to use volatile variables.
However, when are volatile variables useful? In my understanding, they never seem to be useful... Or maybe if the host or other commands make an update, but I do not see which kind of algorithm would do such a thing...

Should a pointer to stack variable be volatile?

I know that I should use the volatile keyword to tell the compiler not to optimize memory reads/writes to variables. I also know that in most cases it should only be used to talk to non-C++ memory.
However, I would like to know if I have to use volatile when holding a pointer to some local (stack) variable.
For example:
// global or member variable
/* volatile? */ bool* p_stop;

void worker()
{
    /* volatile? */ bool stop = false;
    p_stop = &stop;
    while (!stop)
    {
        // Do some work
        // No usage of "stop" or "p_stop" here
    }
}

void stop_worker()
{
    *p_stop = true;
}
It looks to me like a compiler with some optimization level might see that stop is a local variable that is never changed, and could replace while(!stop) with while(true), so that changing *p_stop would do nothing.
So, is it required to mark the pointer as volatile in such a case?
P.S: Please do not lecture me on why not to use this, the real code that uses this hack does so for a (complex-to-explain) reason.
EDIT:
I failed to mention that these two functions run on different threads.
The worker() is a function of the first thread, and it should be stopped from another thread using the p_stop pointer.
I am not interested in knowing what better ways there are to solve the real reason that is behind this sort of hack. I simply want to know if this is defined/undefined behavior in C++ (11, for that matter), and also whether this is compiler/platform/etc. dependent. So far I see @Puppy saying that everyone is wrong and that this is wrong, but without referencing a specific part of the standard that says so.
I understand that some of you are offended by the "don't lecture me" part, but please stick to the real question: should I use volatile or not? Is this UB? And if you can, please help me (and others) learn something new by providing a complete answer.
I simply want to know if this is defined/undefined behavior in C++ (11 for that matter)
Ta-da (from N3337, "quasi C++11")
Two expression evaluations conflict if one of them modifies a memory location [..] and the other one accesses or modifies the same memory location.
§1.10/4
and:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. [..]
§1.10/21
You're accessing the (memory location of the) object stop from different threads; neither access is atomic, and there is no "happens before" relation between them. Simply put, you have a data race and thus undefined behavior.
I am not interested in knowing what better ways there are to solve the real reason that is behind this sort of hack.
Atomic operations (as defined by the C++ standard) are the only way to (reliably) solve this.
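For illustration, a minimal sketch of that approach, using a plain global std::atomic<bool> as the stop flag instead of the question's stack-variable-through-a-pointer layout:

#include <atomic>

// Shared stop flag: std::atomic<bool> instead of a (volatile) bool.
std::atomic<bool> stop{false};

void worker()
{
    while (!stop)        // atomic load (sequentially consistent by default)
    {
        // Do some work
    }
}

void stop_worker()
{
    stop = true;         // atomic store; guaranteed to become visible to worker()
}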
So, is it required to mark the pointer as volatile in such a case?
No. It's not required, principally because volatile doesn't even remotely cover what you need it to do in this case. You must use an actual synchronization primitive, like an atomic operation or mutex. Using volatile here is undefined behaviour and your program will explode.
volatile is NOT useful for concurrency. It may be useful for implementing concurrent primitives but it is far from sufficient.
Frankly, whether or not you want to use actual synchronization primitives is irrelevant. If you want to write correct code, you have no choice.
P.S: Please do not lecture me on why not to use this,
I am not sure what we are supposed to say. The compiler manages the stack, so anything you are doing with it is technically undefined behavior and may not work when you upgrade to the next version of the compiler.
You are also making assumptions that may be different than the compiler's assumptions when it optimizes. This is the real reason to use (or not use) volatile; you give guidance to the compiler that helps it decide whether optimizations are safe. The use of volatile tells the compiler that it should assume that these variables may change due to external influences (other threads or special hardware behavior).
So yes, in this case, it looks like you would need to mark both p_stop and stop with a volatile qualifier.
(Note: this is necessary but not sufficient, as it does not cause the appropriate behaviors to happen in a language implementation with a relaxed memory model that requires barriers to ensure correctness. See https://en.wikipedia.org/wiki/Memory_ordering#Runtime_memory_ordering )
This question simply cannot be answered from the details provided.
As is stated in the question this is an entirely unsupported way of communicating between threads.
So the only answer is:
Specify the compiler versions you're using and hope someone knows their darkest secrets, or refer to your documentation. All the C++ standard will tell you is that this won't work, and all anyone can tell you is "might work, but don't".
There isn't a "oh, come on guys everyone knows it pretty much works what do I do as the workaround? wink wink" answer.
Unless your compiler doesn't support atomics or suitably concurrent mechanisms there is no justifiable reason for doing this.
"It's not supported" isn't "complex-to-explain" so I'd be fascinated based on that code fragment to understand what possible reason there is for not doing this properly (other than ancient compiler).

Embedded C++11 code — do I need volatile?

An embedded device with a Cortex-M3 MCU (STM32F1). It has 64K of embedded flash.
The MCU firmware can reprogram flash sectors at runtime; this is done through the Flash Memory Controller (FMC) registers (so it's not as easy as a = b). The FMC gets a buffer pointer and burns the data to some flash sector.
I want to use the last flash sector for device configuration parameters.
Parameters are stored in a packed struct with arrays and contain some custom classes.
Parameters can be changed at runtime (copy to RAM, change and burn back to flash using FMC).
So there are some questions about that:
The (bitwise) state of the parameters struct is changed by the FMC hardware.
The C++ compiler does not know whether it was changed or not.
Does this mean I should declare all struct members as volatile?
I think YES.
The struct should be statically initialized (default parameters) at compile time. The struct should be POD (TriviallyCopyable with standard layout). Remember, there are some custom classes in there, so I keep in mind that these classes should be POD too.
BUT there are some problems:
cppreference.com:
The only trivially copyable types are scalar types, trivially copyable classes, and arrays of such types/classes (possibly const-qualified, but not volatile-qualified).
That means I can't keep my class both POD and volatile?
So how would I solve the problem?
It is possible to use only scalar types in the parameters struct, but that may result in much less clean code around config processing...
P.S.
It works even without volatile, but I am afraid that someday some smart LTO compiler will see a statically initialized struct that is never changed (by C++) and optimize out some accesses to the underlying memory addresses. That would mean freshly programmed parameters would not be applied because they had been inlined by the compiler.
EDIT: It is possible to solve the problem without using volatile, and it seems to be more correct.
You need to define the config struct variable in a separate translation unit (.cpp file) and not initialize the variable, to avoid value substitution during LTO. If you are not using LTO, everything will be OK, because optimizations are done one translation unit at a time, so a variable with static storage duration and external linkage defined in a dedicated translation unit should not be optimized out. Only LTO can throw it away or substitute its values without issuing memory fetches, especially when the variable is defined as const. I think it is OK to initialize the variable if you are not using LTO.
You have some choices depending on your compiler:
You can declare a pointer to the structure and initialize the pointer to the region.
Tell the compiler where the variable should reside.
Pointer to Flash
Declare a pointer to the structure.
Assign the pointer to the proper address in Flash.
Access the variables by dereferencing the pointer.
The pointer should be declared, and assigned, as a constant pointer to constant data.
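A minimal sketch of that approach; the flash address and the Settings layout below are assumptions for illustration, not values taken from the question:

#include <cstdint>

struct Settings                      // stand-in for the real config struct
{
    std::uint32_t baud_rate;
    std::uint8_t  node_id;
};

// Hypothetical address of the last 1K flash page of a 64K STM32F1 part.
static const volatile Settings* const settings =
    reinterpret_cast<const volatile Settings*>(0x0800FC00u);

std::uint32_t current_baud()
{
    return settings->baud_rate;      // every access goes through the volatile pointer
}

Because both the pointer and the pointed-to data are const, the code cannot accidentally write to flash through it, while the volatile qualifier keeps reads from being cached or folded away.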
Telling the compiler the address of the variable.
Some compilers allow you to place a variable in a specific memory region. The first step is to create a region in the linker command file. The next step is to tell the compiler that the variable is in that region.
Again, the variable should be declared as "static const": "static" because there is only one instance, and "const" because the Flash memory is read-only most of the time.
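A sketch of that variant, assuming a GCC-style toolchain; the ".config_flash" section name is hypothetical and must be mapped onto the last flash page in the linker script:

struct Settings
{
    unsigned      baud_rate;
    unsigned char node_id;
};

// Default parameters, statically initialized and placed in the dedicated section.
__attribute__((section(".config_flash")))
static const Settings settings = { 115200u, 1u };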
Flash Memory: Volatile vs. Const
In most cases, the Flash memory, however it is programmed, is read-only. In fact, the only way you can read the data in Flash is to lock it, a.k.a. make it read-only. In general, it won't be changed without the consent of the program.
Most Flash memories are programmed by the software. Normally, this is your program. If your program is going to reprogram the Flash, it knows the values have been changed. This is akin to writing to RAM. The program changed the value, not the hardware. Thus the Flash is not volatile.
My experience is that Flash can be programmed by another means, usually when your program is not running. In that case, it is still not volatile because your program is not running. The Flash is still read-only.
The Flash will be volatile if, and only if, another task or thread of execution programs the flash while your thread of execution is active. I still would not consider this case volatile. This would be a case of synchronicity: if the flash is modified, then some listeners should be notified.
Summary
The Flash memory is best treated as read-only memory. Variables residing in Flash are accessed via pointer for best portability, although some compilers and linkers allow you to declare variables at specific, hard-coded addresses. The variables should be declared as const static so that the compiler can emit code to access the variables directly, vs. copying on the stack. If the Flash is programmed by another task or thread of execution, this is a synchronicity issue, not one of volatile. In rare cases, the Flash is programmed by an external source while your program is executing.
Your program should provide checksums or other methods to determine if the content has changed, since the last time it was checked.
DO NOT HAVE COMPILER INITIALIZE VARIABLES FROM FLASH.
This is not really portable. A better method is to have your initialization code load the variable(s) from flash. Making the compiler load your variable from a different segment requires a lot of work with the internals of the compiler and linker; a lot more than initializing a pointer to the address in the Flash.
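A minimal sketch of that initialization step, under the same assumptions as above (hypothetical flash address, illustrative Settings type):

#include <cstring>

struct Settings
{
    unsigned      baud_rate;
    unsigned char node_id;
};

// Hypothetical location of the stored configuration in flash.
static const Settings* const flash_settings =
    reinterpret_cast<const Settings*>(0x0800FC00u);

static Settings ram_settings;        // working copy in RAM

void load_settings()
{
    // Copy the persisted parameters into RAM once at startup.
    std::memcpy(&ram_settings, flash_settings, sizeof(Settings));
}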
By reprogramming the flash, you are changing the underlying object's representation. The volatile qualifier is the appropriate solution for the situation, to ensure the changes in data are not optimized away.
You would like a declaration to be: const volatile Settings settings;
The drawback is that volatile prevents static initialization of your object. This stops you from using the linker to put the initialized object in its appropriate memory address.
You would like the definition to be: const Settings settings = { ... };
Luckily, you can initialize a const object and access it as a const volatile.
// Header file
struct Settings { ... };
extern const volatile Settings& settings;
// Source file
static const Settings init_settings = { ... };
const volatile Settings& settings = init_settings;
The init_settings object is statically initialized, but all accesses through the settings reference are treated as volatile.
Please note, though, that modifying an object defined as const is undefined behavior.

C++ - Global variable performance when it is likely in the cache

I'm trying to understand whether my global variable usage, which is being done for convenience and ease of assembly generation, has a positive side-effect or not (I guess I'm looking to rid myself of the guilt of having these globals).
Program Details:
Broken up into "operations". Each operation reads I/O, then does heavy mathematical compute, with lots of special-casing of code paths via hand-written assembly.
Single-threaded, will never be multi-threaded
One global variable, a fixed-size pre-allocated array (128K)
One global variable, an integer that acts as a pointer
My justification for using global variables here is primarily that I can then just generate call instructions without having to pass parameters, set up the stack, etc.
The calls will be to functions like this:
void DoSomething1()
{
    // access global1's memory ...
    // increment global2 ...
    // reset code
}
I can of course generate code for parameters, but then I thought the global variables will likely have a perf benefit as well, since the compiler is going to use a constant address for the access. Of course, my global is extremely likely to be in the cache as well.
Am I thinking about this right? Is it possible that using the globals the way I describe will make the compiler do loads/stores as opposed to enregistering them? In fact, can the compiler enregister a global variable?

OpenMP: how to flush pointer target?

I’ve just noticed that the following code doesn’t compile in OpenMP (under GCC 4.5.1):
struct job {
    unsigned busy_children;
};
job* j = allocateJob(…);
// …
#pragma omp flush(j->busy_children)
The compiler complains about the -> in the argument list to flush, and according to the OpenMP specification it’s right: flush expects as arguments a list of “id-expression”s, which basically means only (qualified) IDs are allowed, no expressions.
Furthermore, the spec says this about flush and pointers:
If a pointer is present in the list, the pointer itself is flushed, not the memory block to which the pointer refers.
Of course. However, since OpenMP also doesn’t allow me to dereference the pointers I basically cannot flush a pointee (pointer target).
So what about references? The spec doesn't mention them, but I'm not confident that the following is conformant and will actually flush the pointee.
unsigned& busy_children = j->busy_children;
#pragma omp flush(busy_children)
Is this guaranteed to work?
If not, how can I flush a pointee?
The flush directive has given the OpenMP ARB a headache for a long time. So much so that there has been talk about removing it completely, though that creates other problems. Using flush(list) is extremely difficult to get correct, and even OpenMP experts have a great deal of trouble getting it right. The problem with it is that, the way it is defined, it can be moved around in your code by the compiler. That means you should stay away from using flush(list).
As for your question about being able to flush a pointee, there is only one way to do that, and that is to use flush (without a list). This will flush your entire thread environment and, as such, cannot be moved by the compiler. It seems "heavy handed", but the compilers are actually pretty good about flushing what is necessary when using flush without a list.
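For illustration, a minimal sketch of that suggestion, reusing the job type from the question (the atomic decrement is an assumption about how busy_children might be updated):

struct job {
    unsigned busy_children;
};

void child_finished(job* j)
{
    #pragma omp atomic
    --j->busy_children;     // update the shared counter atomically

    #pragma omp flush       // no list: the thread's entire view of memory,
                            // including *j, is made consistent
}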
The OpenMP specification doesn't directly say anything about the type of the variable, but MSDN says that "A variable specified in a flush directive must not have a reference type". This makes me think that this is not guaranteed to work. The flush directive with an empty variable list flushes all memory, so that is what you can safely use.
The reason you cannot flush a dereferenced pointer is because flush is only needed for values in hardware registers. There is never any need under OpenMP to flush something that is NOT in a hardware register (for example, in cache or memory), since OpenMP assumes a coherent cache memory that guarantees that all threads will always see the same value when the same address is dereferenced. Hardware protocols guarantee cache coherency, making the multiple local caches behave like one shared global cache.

slightly weird C++ code

Sorry if this is simple, my C++ is rusty.
What is this doing? There is no assignment or function call as far as I can see. This code pattern is repeated many times in some code I inherited. If it matters, it's embedded code.
*(volatile UINT16 *)&someVar->something;
edit: continuing from there, does the following additional code confirm Heath's suspicions? (exactly from the code, including the repetition, except the names have been changed to protect the innocent)
if (!WaitForNotBusy(50))
    return ERROR_CODE_X;
*(volatile UINT16 *)& someVar->something;
if (!WaitForNotBusy(50))
    return ERROR_CODE_X;
*(volatile UINT16 *)& someVar->something;
x = SomeData;
This is a fairly common idiom in embedded programming (though it should be encapsulated in a set of functions or macros) where a device register needs to be accessed. In many architectures, device registers are mapped to a memory address and are accessed like any other variable (though at a fixed address - either pointers can be used or the linker or a compiler extension can help with fixing the address). However, if the C compiler doesn't see a side effect to a variable access it can optimize it away - unless the variable (or the pointer used to access the variable) is marked as volatile.
So the expression:
*(volatile UINT16 *)&someVar->something;
will issue a 16-bit read at some offset (provided by the something structure element's offset) from the address stored in the someVar pointer. This read will occur and cannot be optimized away by the compiler due to the volatile keyword.
Note that some device registers perform some functionality even if they are simply read - even if the data read isn't otherwise used. This is quite common with status registers, where an error condition might be cleared after the read of the register that indicates the error state in a particular bit.
This is probably one of the more common reasons for the use of the volatile keyword.
So here's a long shot.
If that address points to a memory mapped region on a FPGA or other device, then the device might actually be doing something when you read that address.
I think the author's intent was to cause the compiler to emit memory barriers at these points. By evaluating the expression result of a volatile, the indication to the compiler is that this expression should not be optimized away, and should 'instantiate' the semantics of access to a volatile location (memory barriers, restrictions on optimizations) at each line where this idiom occurs.
This type of idiom could be "encapsulated" in a pre-processor macro (#define) in case another compiler has a different way to cause the same effect. For example, a compiler with the ability to directly encode read or write memory barriers might use the built-in mechanism rather than this idiom. Implementing this type of code inside a macro enables changing the method all over your code base.
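A minimal sketch of such a wrapper; the macro name, the UINT16 typedef, and the Device struct are illustrative, not taken from the inherited code:

#include <stdint.h>

typedef uint16_t UINT16;                  // stand-in for the project's typedef

// Force a 16-bit read through a volatile pointer; the compiler may not elide it.
#define READ_REG16(addr)  (*(volatile UINT16 *)(addr))

struct Device { UINT16 something; };      // hypothetical register block

void poke_status(Device* someVar)
{
    (void)READ_REG16(&someVar->something);   // read-to-clear; value intentionally discarded
}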
EDIT: User sharth has a great point that if this code runs in an environment where the address of the pointer is a physical rather than virtual address (or a virtual address mapped to a specific physical address), then performing this read operation might cause some action at a peripheral device.
Generally this is bad code.
In C and C++, volatile means very little and does not provide an implicit memory barrier. So this code is just quite wrong unless it is written as
memory_barrier();
*(volatile UINT16 *)&someVar->something;
It is just bad code.
Explanation: volatile does not make a variable atomic!
Read this article: http://www.mjmwired.net/kernel/Documentation/volatile-considered-harmful.txt
This is why volatile should almost never be used in proper code.