slightly weird C++ code - c++

Sorry if this is simple, my C++ is rusty.
What is this doing? There is no assignment or function call as far as I can see. This code pattern is repeated many times in some code I inherited. If it matters it's embedded code.
*(volatile UINT16 *)&someVar->something;
edit: continuing from there, does the following additional code confirm Heath's suspicions? (exactly from code, including the repetition, except the names have been changed to protect the innocent)
if (!WaitForNotBusy(50))
return ERROR_CODE_X;
*(volatile UINT16 *)& someVar->something;
if (!WaitForNotBusy(50))
return ERROR_CODE_X;
*(volatile UINT16 *)& someVar->something;
x = SomeData;

This is a fairly common idiom in embedded programming (though it should be encapsulated in a set of functions or macros) where a device register needs to be accessed. In many architectures, device registers are mapped to a memory address and are accessed like any other variable (though at a fixed address - either pointers can be used or the linker or a compiler extension can help with fixing the address). However, if the C compiler doesn't see a side effect to a variable access it can optimize it away - unless the variable (or the pointer used to access the variable) is marked as volatile.
So the expression:
*(volatile UINT16 *)&someVar->something;
will issue a 16-bit read at some offset (provided by the something structure element's offset) from the address stored in the someVar pointer. This read will occur and cannot be optimized away by the compiler due to the volatile keyword.
Note that some device registers perform some functionality even if they are simply read - even if the data read isn't otherwise used. This is quite common with status registers, where an error condition might be cleared after the read of the register that indicates the error state in a particular bit.
This is probably one of the more common reasons for the use of the volatile keyword.
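As a minimal sketch of how the idiom might be encapsulated (the register layout and names here are hypothetical, not from the question's code):
#include <cstdint>

// Hypothetical device layout; real offsets come from the datasheet.
struct UartRegs {
    std::uint16_t status;   // assumption: reading this register clears the error bits
    std::uint16_t data;
};

// A named helper makes the intent of the dummy read obvious.
static inline void clear_uart_errors(UartRegs *regs)
{
    (void)*(volatile std::uint16_t *)&regs->status; // read performed, result discarded
}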

So here's a long shot.
If that address points to a memory mapped region on a FPGA or other device, then the device might actually be doing something when you read that address.

I think the author's intent was to cause the compiler to emit memory barriers at these points. By evaluating the expression result of a volatile, the indication to the compiler is that this expression should not be optimized away, and should 'instantiate' the semantics of access to a volatile location (memory barriers, restrictions on optimizations) at each line where this idiom occurs.
This type of idiom could be "encapsulated" in a pre-processor macro (#define) in case another compiler has a different way to cause the same effect. For example, a compiler with the ability to directly encode read or write memory barriers might use the built-in mechanism rather than this idiom. Implementing this type of code inside a macro lets you change the method in one place for your entire code base.
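For example, a minimal sketch of such a macro (the UINT16 typedef and the struct layout are assumptions based on the question's code):
typedef unsigned short UINT16;   // assumption: matches the project's typedef

#define REG_READ16(addr) (*(volatile UINT16 *)(addr))

struct Dev { UINT16 something; }; // hypothetical layout

void poll(Dev *someVar)
{
    (void)REG_READ16(&someVar->something); // the read happens; the value is deliberately discarded
}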
EDIT: User sharth has a great point that if this code runs in an environment where the address of the pointer is a physical rather than virtual address (or a virtual address mapped to a specific physical address), then performing this read operation might cause some action at a peripheral device.

Generally this is bad code.
In C and C++, volatile means very little and does not provide an implicit memory barrier. So this code is just wrong unless it is written as:
memory_barrier();
*(volatile UINT16 *)&someVar->something;
Otherwise it is just bad code.
Explanation: volatile does not make a variable atomic!
Read this article: http://www.mjmwired.net/kernel/Documentation/volatile-considered-harmful.txt
This is why volatile should almost never be used in proper code.
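If ordering really is the intent, a hedged sketch of a more explicit C++11 rendering (the Device struct here is hypothetical) might look like this:
#include <atomic>
#include <cstdint>

struct Device { std::uint16_t something; }; // hypothetical layout

void read_with_barrier(Device *someVar)
{
    std::atomic_thread_fence(std::memory_order_seq_cst);  // explicit full fence
    (void)*(volatile std::uint16_t *)&someVar->something; // the device read itself
}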

Related

Volatile keyword in GLSL

In the OpenGL Wiki, it is written:
The compiler normally is free to assume that values accessed through variables will only change after memory barriers or other synchronization. With this qualifier, the compiler assumes that the contents of the storage represented by the variable could be changed at any time.
Going that way, I understand that when you are using synchronization like memory barriers or atomic functions you do not need to use volatile variables.
However, when are volatile variables useful? In my understanding, they never seem to be... Or maybe if the host or another command makes an update, but I do not see what kind of algorithm would do such a thing...

How to set a constexpr pointer to a physical Address

In embedded programming you often need to set pointers that point to a physical address. The address is non-relocatable and fixed. These are not set by the linker, as typically they represent registers or, in this case, calibration data located at a predetermined address in OTP memory. This data is set when the device is first tested in production by the chip manufacturer.
so the first attempt was:
static constexpr uint16_t *T30_CAL = reinterpret_cast<uint16_t *>(0x1FFFF7B8u);
But that leads to the following warning / error under GCC, and is 'illegal' according to the standard (C++14).
..xyz/xxxx/calibration.cpp:23:40: error: reinterpret_cast from integer to pointer
Now I can fudge it with:
constexpr uint32_t T30_ADDR = 0x1FFFF7B8u;
static constexpr inline uint16_t *T30_CAL() {
    return reinterpret_cast<uint16_t *>(T30_ADDR);
}
which compiles without warnings but ......
I suppose GCC may compile this to an actual function instead of folding it at compile time, though in practice it inlines it every time.
Is there a simpler and more standards-compliant way of doing this?
For embedded code these definitions are required all the time, so it would be nice if there were a simple way of doing this that does not require function definitions.
The answers to the previous questions generally resulted in an answer that says this is not allowed by the standard and leaves it at that.
That is not really what I want. I need a compliant way of using C++ to generate compile-time constant pointers to a fixed address. I want to do it without using macros, as that sprinkles my code with casts that cause problems with compliance checkers. It results in the need to get compliance exceptions in multiple places rather than one. Each exception is a process and takes time and effort.
constexpr guarantees, on embedded systems, that the constant is placed in the .text section (flash), whilst const does not. It may be placed in valuable RAM and initialised by the startup code. Typically embedded devices have much more flash than RAM. Also, the code to access variables in RAM is often much less efficient, as it typically involves at least two memory accesses on embedded targets such as ARM: one to load the variable's RAM address and a second to load the actual constant pointer value from the variable's location. constexpr results in the constant pointer being encoded directly in the instruction stream, or in a single constant load.
If this were just a single instance it would not be an issue, but you generally have many different peripherals, each controlled via their own register set, and then this becomes a problem.
A lot of the embedded code ends up reading and writing peripheral registers.
Use this instead:
static uint16_t * const T30_CAL = reinterpret_cast<uint16_t *>(0x1FFFF7B8u);
GCC will store T30_CAL in flash on an ARM target, not RAM. The point is that the 'const' must come after the '*' because it is T30_CAL that is const, not what T30_CAL points to.
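As a minimal sketch of that pattern applied to several fixed addresses (the second name and address below are made up for illustration):
#include <cstdint>

static std::uint16_t * const T30_CAL  = reinterpret_cast<std::uint16_t *>(0x1FFFF7B8u);
static std::uint16_t * const VREF_CAL = reinterpret_cast<std::uint16_t *>(0x1FFFF7BAu); // hypothetical

std::uint16_t read_t30()
{
    return *T30_CAL; // with optimization, the literal address is folded into the load
}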
As was already pointed out in the comments, reinterpret_cast is not allowed in a constant expression. This is because the compiler has to be able to evaluate a constexpr at compile time, but the reinterpret_cast might need runtime instructions to do its job.
You already suggested using macros. That seems a fine way to me, because the compiler will definitely not produce any overhead. However, I would not suggest your second way of hiding the reinterpret_cast, because, as you said, a function is generated. This function will likely take up more memory than an additional pointer would.
In any case, the most reasonable way seems to me to just declare a const pointer. As soon as you enable optimizations, the compiler will just insert the memory location into your executable instead of using a variable. (See https://godbolt.org/g/8KnUKg )

Why can the address-of operator ('&') be used with objects that are declared with the register storage class specifier in C++?

In the C programming language we are not allowed to use the address-of operator (&) with variables that are declared with the register storage class specifier.
It gives: error: address of register variable ‘var_name’ requested
But if we write a C++ program and perform the same task (i.e. use & with a register storage variable), it doesn't give us any error.
e.g.:
#include <iostream>
using namespace std;

int main()
{
    register int a;
    int *ptr;
    a = 5;
    ptr = &a;
    cout << ptr << endl;
    return 0;
}
Output:
0x7ffcfed93624
Well, this must be an extra feature of C++, but the question is about the difference between the register storage class in C and C++.
The restriction on taking the address was deliberately removed in C++ - there was no benefit to it, and it made the language more complicated. (E.g. what would happen if you bound a reference to a register variable?)
The register keyword hasn't been of much use for many years - compilers are very good at figuring out what to put in registers by themselves. Indeed, in C++ the keyword is currently deprecated and will eventually be removed.
The register storage class originally hinted to the compiler that the variable so qualified was to be used so frequently that keeping its value in memory would be a performance drawback. The vast majority of CPU architectures (maybe not SPARC? Not even certain there's a counterexample) cannot perform any operation between two variables without first loading one or both from memory into its registers. Loading variables from memory into registers and writing them back to memory once operated upon takes many times more CPU cycles than the operations themselves. Thus, if a variable is used frequently, one can achieve a performance gain by setting aside a register for it and not bothering with memory at all.
Doing so, however, has a variety of requirements. Many are different for every CPU architecture:
All processors have a fixed number of registers, but each processor model has a different number. In the 80s you might have had 4 that could reasonably be used for a register variable.
Most processors do not support the use of every register for every instruction. In the 80s it was not uncommon to have only one register that you could use for addition and subtraction, and you probably couldn't use that same register as a pointer.
Calling conventions dictated differing sets of registers that could be expected to be overwritten by subroutines i.e. function calls.
The size of a register differs between processors, so there are cases where a register variable will not fit in a register.
Because C is intended to be independent of platform, these restrictions could not be enforced by the standard. In other words, while it may be impossible to compile a procedure with 20 register variables for a system that only had 4 machine registers, the C program itself should not be "wrong", as there is no logical reason a machine cannot have 20 registers. Thus, the register storage class was always just a hint that the compiler could ignore if the specific target platform would not support it.
The inability to reference a register is different. A register is specifically not kept updated in memory and not kept current if changes are made to memory; that's the whole point of the storage class. Since they are not intended to have a guaranteed representation in memory, they cannot logically have an address in memory that will be meaningful to external code that may obtain the pointer. Registers have no address to their own CPU, and they almost never have an address accessible to any coprocessor. Therefore, any attempt to obtain a reference to a register is always a mistake. The C standard could comfortably enforce this rule.
As computing evolved, however, some trends developed that weakened the purpose of the register storage class itself:
Processors came with greater numbers of registers. Today you probably have at least 16, and they can probably all be used interchangeably for most purposes.
Multi-core processors and distributed code execution have become very common; only one core has access to any one register, and cores never share registers without involving memory anyway.
Algorithms for allocating registers to variables became very effective.
Indeed, compilers are now so good at allocating variables to registers that they will usually do a better job of optimization than any human. They certainly know which variables you are using most frequently without you telling them. It would be more complicated for the compiler (i.e. not for the standard or for the programmer) to produce these optimizations if it were required to honor your manual register hints. It became increasingly common for compilers to categorically ignore them. By the time C++ existed, the hint was obsolete. It is included in the standard for backward compatibility, to keep C++ as close as possible to a proper superset of C. The requirements for a compiler to honor the hint, and thus the requirements to enforce the conditions under which the hint could be honored, were weakened accordingly. Today, the storage class itself is deprecated.
Therefore, even though it is still the case today (and will be until computers don't even have registers) that you cannot logically have a reference to a CPU register, the expectation that the register storage class will be honored is so long gone that it is unreasonable for the standard to require compilers to require you to be logical in your use of it.
A referenced register would be the register itself. If the calling function passed ESI as a referenced parameter, then the called function would use ESI as the parameter. As pointed out by Alan Stokes, the issue is if another function also calls the same function, but this time with EDI as the same referenced parameter.
In order for this to work, two overload-like instances of the called function would need to be created, one taking ESI as the parameter and one taking EDI. I don't know if any actual C++ compiler implements such an optimization in general, but that is how it could be done.
One example of register-by-reference is the way std::swap() gets optimized (both parameters are references), which often ends up as inlined code. Sometimes no actual swap takes place: with std::swap(a, b), for example, the sense of a and b is simply swapped in the code that follows (references to what was a become references to b and vice versa).
Otherwise, a reference parameter will force the variable to be located in memory instead of a register.
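A small illustrative example (assuming the call is not inlined; all names are made up) of how a reference can force a local into memory:
#include <iostream>

void bump(int &n) { ++n; } // the reference parameter needs an address to bind to

int main()
{
    int counter = 0;   // could otherwise live purely in a register
    bump(counter);     // if bump is not inlined, counter must be given a memory location
    std::cout << counter << '\n';
    return 0;
}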

Could optimization break thread safety?

#include <unistd.h> // for usleep

class Foo {
public:
    void fetch(void)
    {
        int temp = -1;
        someSlowFunction(&temp);
        bar = temp;
    }
    int getBar(void)
    {
        return bar;
    }
    void someSlowFunction(int *ptr)
    {
        usleep(10000);
        *ptr = 0;
    }
private:
    int bar;
};
I'm new to atomic operations so I may get some concepts wrong.
Considering the above code, and assuming that loading and storing an int is atomic [Note 1], getBar() could only return the value bar held before or after a fetch().
However, if a compiler is smart enough, it could optimize away temp and change it to:
void Foo::fetch(void)
{
    bar = -1;
    someSlowFunction(&bar);
}
Then in this case getBar() could get -1 or other intermediate state inside someSlowFunction() under certain timing conditions.
Is this risk possible? Does the standard prevent such optimizations?
Note 1: http://preshing.com/20130618/atomic-vs-non-atomic-operations/
The language standards have nothing to say about atomicity in this case. Maybe integer assignment is atomic, maybe it isn’t. Since non-atomic operations don’t make any guarantees, plain integer assignment in C is non-atomic by definition.
In practice, we usually know more about our target platforms than that. For example, it’s common knowledge that on all modern x86, x64, Itanium, SPARC, ARM and PowerPC processors, plain 32-bit integer assignment is atomic as long as the target variable is naturally aligned. You can verify it by consulting your processor manual and/or compiler documentation. In the games industry, I can tell you that a lot of 32-bit integer assignments rely on this particular guarantee.
I'm targeting ARM Cortex-A8 here, so I consider this a safe assumption.
Compiler optimization cannot break thread safety!
You might however experience issues with optimizations in code that appeared to be thread safe but really only worked because of pure luck.
If you access data from multiple threads, you must either
Protect the appropriate sections using std::mutex or the like.
or, use std::atomic.
If not, the compiler might perform optimizations that are next to impossible to anticipate.
I recommend watching CppCon 2014: Herb Sutter "Lock-Free Programming (or, Juggling Razor Blades), Part I" and Part II
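As a minimal sketch of the std::atomic route (keeping the shape of the class from the question):
#include <atomic>
#include <unistd.h>

class Foo {
public:
    void fetch()
    {
        int temp = -1;
        someSlowFunction(&temp);
        bar.store(temp, std::memory_order_release); // no torn writes, no surprising reordering
    }
    int getBar()
    {
        return bar.load(std::memory_order_acquire);
    }
    void someSlowFunction(int *ptr)
    {
        usleep(10000);
        *ptr = 0;
    }
private:
    std::atomic<int> bar{0};
};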
After the clarification in the comments, this makes more sense. Let's analyze thread safety here, given that fetch() and getBar() are called from different threads. Several points need to be considered:
'Dirty reads', or garbage reads due to an interrupted write. While a general possibility, this does not happen for aligned ints on the three chip families I am familiar with. Let's discard this possibility for now, and just assume read values are always clean.
'Improper reads', or the possibility of reading something from bar which was never written there. Would it be possible? Optimizing away temp on the compiler's part is, in my opinion, possible, but I am no expert in this matter. Let's assume it does not happen. One caveat would still remain: you might NEVER see the new value of bar. Not in a reasonable time; simply never.
The compiler can apply any transformation that results in the same observable behavior. Assignments to local non-volatile variables are not part of the observable behavior. The compiler may just decide to eliminate temp completely and use bar directly. It may also decide that bar will always end up with the value zero and set it at the beginning of the function (at least in your simplified example).
However, as you can read in James' answer to a related question, the situation is more complex, because modern hardware also optimizes the executed code. This means that the CPU reorders instructions, and neither the programmer nor the compiler has influence on that without using special instructions. You need either a std::atomic, explicit memory fences (I wouldn't recommend them because they are quite tricky), or a mutex, which also acts as a memory fence.
It probably wouldn't optimize that way because of the function call in the middle, but you can declare temp as volatile; this will tell the compiler not to perform these kinds of optimizations.
Depending on the platform, you can certainly have cases where multibyte quantities are in an inconsistent state. It doesn't even need to be thread related. For example, a device experiencing low voltage during a power brown-out can leave memory in an inconsistent state. If you have pointers getting corrupted, then it's usually bad news.
One way I approached this on a system without mutexes was to ensure every piece of data could be verified. For example, for every datum T, there would be a validation checksum C and a backup U.
A set operation would be as follows:
U = T
T = new value
C = checksum(T)
And a get operation would be as follows:
is checksum(T) == C
yes: return T
no: return U
This guarantees that whatever is returned is in a consistent state. I would apply this algorithm to the entire OS, so that, for example, entire files could be restored.
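A hedged C++ sketch of the scheme above (checksum() here is a trivial stand-in, not the real function):
#include <cstdint>

struct Guarded {
    std::int32_t  value;   // T
    std::int32_t  backup;  // U
    std::uint32_t check;   // C
};

static std::uint32_t checksum(std::int32_t v)
{
    return static_cast<std::uint32_t>(v) ^ 0xA5A5A5A5u; // illustrative only
}

void set(Guarded &g, std::int32_t v)
{
    g.backup = g.value;      // U = T
    g.value  = v;            // T = new value
    g.check  = checksum(v);  // C = checksum(T)
}

std::int32_t get(const Guarded &g)
{
    return (checksum(g.value) == g.check) ? g.value : g.backup; // fall back to last good value
}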
If you want to ensure atomicity without getting into complex mutexes and whatever, try to use the smallest types possible. For example, does bar need to be an int or will unsigned char or bool suffice?

Is there any way a valgrind message "Conditional jump or move depends on uninitialized value" can be a so-called 'false positive'?

Most questions I find here provide a piece of code and get answered by someone pointing to the actual error. My question is about conditional jumps on uninitialized values in general. I can understand that a piece of memory should not necessarily be cleaned at the end of a program if one is sure this allocation is done only once and will probably be needed during the lifetime of a program. As far as I remember the GType system leaves a lot of unfreed memory when the program terminates. These unfreed blocks can be seen as 'false positives'. But can a 'conditional jump or move on uninitialized value' be a false positive? The only thing I can come up with is someone implementing a (bad) randomize function by just reading a random address (where the random address itself is the tricky part ;). Another example could be hardware mapped to a part of the memory which is then read, but this is mostly done by drivers and not by normal user applications. Is there any other example (preferably C) which could cause such a false positive?
What valgrind is reporting is that it sees a jump based on a read from a location for which it knows that it was allocated by the program but for which it hasn't seen an initialization. This might happen if the object is initialized by some magic that valgrind doesn't know about. Architectures evolve constantly and maybe you have an instruction or register type that valgrind doesn't know enough about.
Another difficult source of such non-initializations is unions. There are two sources:
Per default, only the first member of a union is initialized, so where another field extends beyond that first member, that part might be uninitialized.
If the members of the union are structs, they may have padding bytes in different places, and so part of one member may be uninitialized if you assigned to a different member.
In some cases it might even be legitimate to read these things (through an unsigned char[], for example), so whether you consider such things a bug (a false positive) or not is a matter of perspective.
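A small made-up example of the union case that valgrind would flag:
union U {
    short s;
    int   i;
};

int main()
{
    U u;
    u.s = 1;                  // only sizeof(short) bytes are initialized
    return (u.i > 0) ? 0 : 1; // conditional depends on the uninitialized upper bytes
}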
Absolutely! I once had C code of the form
// compute a and, possibly, b
if (a && b) {
// do stuff
}
in which b was guaranteed to be initialized if a were true. Thus, there was no way that an uninitialized value of b could cause a problem. However, gcc, when optimizing sufficiently aggressively, decided to check the value of b first. This was acceptable since neither check had any side effects, but it still caused valgrind to complain.
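A self-contained reconstruction of that pattern (the rand()-based condition is just a stand-in for the original computation):
#include <cstdlib>

int main()
{
    int a = std::rand() % 2;
    int b;                    // deliberately left uninitialized when a == 0
    if (a)
        b = std::rand() % 2;  // b is guaranteed initialized whenever a is true
    if (a && b)               // correct code, yet an aggressive compiler may test b first
        return 1;
    return 0;
}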