C++ volatile: guaranteed 32-bit accesses?

In my Linux C++ project, I have a hardware memory region mapped somewhere in the physical address space, which I access through uint32_t pointers after doing an mmap.
The release build of the application crashes with a SIGBUS (bus error).
This happens because the compiler optimizes accesses to the aforementioned hardware memory by using 64-bit accesses instead of sticking to 32-bit ones; the hardware memory can only be accessed with 32-bit reads/writes, hence the bus error.
I marked the uint32_t pointer as volatile.
It works, at least for this one specific code portion: the compiler is told not to reorder the accesses, and most of the time it would have to reorder them to optimize.
I know that volatile controls when the compiler accesses the memory. The question is: does volatile also tell the compiler how to access the memory, i.e. to access it exactly as the programmer instructs? Am I guaranteed that the compiler will always stick to doing 32-bit accesses to volatile uint32_t buffers?
E.g., does volatile also guarantee that the 2 consecutive writes to the 2 consecutive 32-bit values in the following code snippet will be performed using 32-bit reads/writes as well?
void aFunction(volatile uint32_t* hwmem_array)
{
[...]
// Are we guaranteed by volatile that the following 2 consecutive writes, in consecutive memory regions
// are not merged into a single 64-bit write by the compiler?
hwmem_array[0] = 0x11223344u;
hwmem_array[1] = 0xaabbccddu;
[...]
}

I think I answered my own question; please correct me if I'm wrong.
C99 standard draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
Quotes:
“
6 An object that has volatile-qualified type may be modified in ways
unknown to the implementation or have other unknown side effects.
Therefore any expression referring to such an object shall be
evaluated strictly according to the rules of the abstract machine, as
described in 5.1.2.3. Furthermore, at every sequence point the value
last stored in the object shall agree with that prescribed by the
abstract machine, except as modified by the unknown factors mentioned
previously.
”
Section 5.1.2.3:
“
2 Accessing a volatile object, modifying an object, modifying a file,
or calling a function that does any of those operations are all side
effects, which are changes in the state of the execution environment.
Evaluation of an expression may produce side effects. At certain
specified points in the execution sequence called sequence points, all
side effects of previous evaluations shall be complete and no side
effects of subsequent evaluations shall have taken place. (A summary
of the sequence points is given in annex C.)
5 The least requirements on a conforming implementation are: — At
sequence points, volatile objects are stable in the sense that
previous accesses are complete and subsequent accesses have not yet
occurred.
”
Annex C (informative) Sequence points:
“
1 The following are the sequence points described in 5.1.2.3:
[…]
— The end of a full expression: an initializer (6.7.8); the expression
in an expression statement (6.8.3); the controlling expression of a
selection statement (if or switch) (6.8.4); the controlling expression
of a while or do statement (6.8.5); each of the expressions of a for
statement (6.8.5.3); the expression in a return statement (6.8.6.4).
”
So, in theory, we are guaranteed that at the end of any full expression involving a volatile object, the volatile object is written/read exactly as the code instructs.
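For illustration, here is a minimal sketch of the kind of mapping described in the question (the /dev/mem path, physical address and map size are assumptions made up for the example, and error handling is omitted):

#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main()
{
    const off_t kPhysAddr = 0x40000000;  // hypothetical register block address
    const size_t kMapSize = 4096;

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    void* base = mmap(nullptr, kMapSize, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, kPhysAddr);

    // Each access through the volatile-qualified pointer is a distinct
    // 32-bit load/store, evaluated per the rules of the abstract machine.
    volatile uint32_t* regs = static_cast<volatile uint32_t*>(base);
    regs[0] = 0x11223344u;
    regs[1] = 0xaabbccddu;

    munmap(base, kMapSize);
    close(fd);
    return 0;
}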

Related

Why is *ptr = (*ptr)++ Undefined Behavior

I am trying to learn how to explain the cause (if any) of undefined behavior in the cases given below.
int i = 0, *ptr = &i;
i = ++i; // is this UB? If yes, then why, according to C++11?
*ptr = (*ptr)++; // I think this is UB, but I am unable to explain exactly why
*ptr = ++(*ptr); // I think this is not UB, but can't explain why
I have looked at many SO posts describing UB for different pointer cases similar to the ones above, but I am still unable to explain why exactly they result in UB (i.e. which point(s) from the standard prove that they do).
I am looking for explanations according to C++11(or C++14) but not C++17 and not Pre-C++11.
Undefined behavior stems from this:
C++11 [intro.execution]/15 Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced... If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
C++17 [intro.execution]/17 Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced... If a side effect on a memory location (4.4) is unsequenced relative to either another side effect on the same memory location or a value computation using the value of any object in the same memory location, and they are not potentially concurrent (4.7), the behavior is undefined.
This text is similar. The main difference lies in the "except where noted" part; in C++17, the order of evaluation of operands is specified for more operators than in C++11. Thus:
C++17 [expr.ass]/1 In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression. The right operand is sequenced before the left operand.
C++11 lacks the final sentence ("The right operand is sequenced before the left operand"). That sentence is what makes i = i++ well-defined in C++17, but undefined in C++11. That's because for postfix increment, the side effect is not part of the value computation of the expression:
C++11 and C++17 [expr.post.incr]/1 The value computation of the ++ expression is sequenced before the modification of the operand object.
So "the assignment is sequenced after the value computation of the right and left operands" is not by itself sufficient: the assignment is sequenced after the value computation of i++, and the side effect is also sequenced after that same value computation, but nothing says how they are sequenced relative to each other. Therefore, they are unsequenced, and they are both modifying the same object (here, i). This exhibits undefined behavior.
The addition of "the right operand is sequenced before the left operand" in C++17 means that the side effect of i++ is sequenced before the value computation of i, and both are sequenced before the assignment.
On the other hand, for pre-increment the side effect is necessarily part of the evaluation of the expression:
C++11 and C++17 [expr.pre.incr]/1 ... The result is the updated operand; it is an lvalue ...
So the value computation of ++i involves incrementing i first, and then applying an lvalue-to-rvalue conversion to obtain the updated value. This value computation is sequenced before the assignment in both C++11 and C++17, and so the two side effects on i are sequenced relative to each other; no undefined behavior.
Nothing changes in this analysis if i is replaced with (*ptr). That's just another way to refer to the same object or memory location.
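Condensing the analysis so far into one annotated snippet (the verdicts are those derived above):

int i = 0, *ptr = &i;
i = ++i;         // OK in C++11 and C++17: the side effect of ++i is part of its value
                 // computation, which is sequenced before the assignment
i = i++;         // UB in C++11; well-defined in C++17, where the right operand
                 // (including its side effect) is sequenced before the left operand
*ptr = (*ptr)++; // same analysis as i = i++: UB in C++11, fine in C++17
*ptr = ++(*ptr); // same analysis as i = ++i: fine in both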
The C++ Standard is based upon the C Standard, whose authors didn't need any particular "reason" to say that implementations may process a construct in whatever fashion would be most useful to their customers [which is what they intended the phrase "Undefined Behavior" to mean]. Many platforms can cheaply guarantee, for small primitive types, that race conditions involving a read and a conflicting write to the same object will always yield either old or new data, and that race conditions involving conflicting writes will result in every individual subsequent read seeing one of the written values. Rather than trying to identify all of the cases where implementations should or should not be expected to uphold such guarantees, the Standard allows implementations to, at their leisure, process code "in a documented manner characteristic of the environment". Because it's not practical for all implementations to offer such guarantees in all possible scenarios, and because the range of scenarios where such guarantees would be practical would differ between platforms, the authors of the Standard allowed implementations to weigh the pros and cons of offering various behavioral guarantees on their particular target platforms, rather than trying to write precise rules that would be appropriate for all possible implementations.
Note also that if one were to do something like:
*p = (*q)++;
return q[0] + q[i]; // where 'i' is some object of type `int`.
when p and q are equal and i is zero, a compiler might quite plausibly generate code where the assignment would undo the effect of the increment, but which would return the sum of the old value of *q, plus 1, plus the actual stored value of *q (which would be the old value, rather than the incremented value). Although this would be a logical consequence of the specified race-condition semantics, trying to specify it precisely would have been sufficiently awkward that the Standard simply allows implementations to specify the behavior as tightly or loosely as they see fit.

C/C++ unions and undefined behaviour

Is the following undefined behaviour?
union {
    int foo;
    float bar;
} baz;

baz.foo = 3.14 * baz.bar;
I remember that writing and reading from the same underlying memory between two sequence points is UB, but I am not certain.
Reading and writing to the same memory location in the same expression does not invoke undefined behavior unless that location is modified more than once between two sequence points, or the side effect is unsequenced relative to a value computation using the value at the same location.
C11: 6.5 Expressions:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. [...]
The expression
baz.foo = 3.14 * baz.bar;
has well-defined behaviour, provided bar was initialized beforehand. The reason is that the side effect of updating baz.foo is sequenced after the value computations of the objects baz.foo and baz.bar.
C11: 6.5.16/3 Assignment operators:
[...] The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
Disclaimer: This answer addresses C++.
You're accessing an object whose lifetime hasn't begun yet - baz.bar - which induces UB by [basic.life]/(6.1).
Assuming bar has been brought to life (e.g. by initializing it), your code is fine; before the assignment, foo need not be alive as no operation is performed that depends on its value, and during it, the active member is changed by reusing the memory and effectively initializing it. The current rules aren't clear about the latter; see CWG #1116. However, the status quo is that such assignments are indeed setting the target member as active (=alive).
Note that the assignment is sequenced (i.e. guaranteed to happen) after the value computation of the operands - see [expr.ass]/1.
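A minimal sketch of the C++ rules just described (the union and member names are adapted from the question's snippet):

#include <cstdio>

union Baz {
    int foo;
    float bar;
};

int main()
{
    Baz baz;
    baz.bar = 1.0f;               // begin bar's lifetime; bar is the active member
    baz.foo = 3.14 * baz.bar;     // read bar, then reuse the storage: foo becomes active
    std::printf("%d\n", baz.foo); // prints 3
    return 0;
}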
Answering for C, not C++
I thought this was Defined Behavior, but then I read the following paragraph from ISO C2x (which I guess is also present in older C standards, but I didn't check):
6.5.16.1/3 (Assignment operators::Simple Assignment::Semantics):
If the value being stored in an object is read from another object
that overlaps in any way the storage of the first object, then the
overlap shall be exact and the two objects shall have qualified or
unqualified versions of a compatible type; otherwise, the behavior is
undefined.
So, let's consider the following:
union {
    int a;
    const int b;
} db;

union {
    int a;
    float b;
} ub1;

union {
    uint32_t a;
    int32_t b;
} ub2;
Then, it is Defined Behavior to do:
db.a = db.b + 1;
But it is Undefined Behavior to do:
ub1.a = ub1.b + 1;
or
ub2.a = ub2.b + 1;
The definition of compatible types is in 6.2.7/1 (Compatible type and composite type). See also: __builtin_types_compatible_p().
The Standard uses the phrase "Undefined Behavior", among other things, as a catch-all for situations where many implementations would process a construct in at least somewhat predictable fashion (e.g. yielding a not-necessarily-predictable value without side effects), but where the authors of the Standard thought it impractical to try to anticipate everything that implementations might do. It wasn't intended as an invitation for implementations to behave gratuitously nonsensically, nor as an indication that code was erroneous (the phrase "non-portable or erroneous" was very much intended to include constructs that might fail on some machines, but would be correct in code that was not intended to be suitable for use with those machines).
On some platforms like the 8051, if a compiler were given a construct like someInt16 += *someUnsignedCharPtr << 4; the most efficient way to process it, if it didn't have to accommodate the possibility that the pointer might point to the lower byte of someInt16, would be to fetch *someUnsignedCharPtr, shift it left four bits, add it to the LSB of someInt16 (capturing the carry), reload *someUnsignedCharPtr, shift it right four bits, and add it along with the earlier carry to the MSB of someInt16. Loading the value from *someUnsignedCharPtr twice would be faster than loading it, storing its value to a temporary location before doing the shift, and then having to load its value from that temporary location. If, however, someUnsignedCharPtr were to point to the lower byte of someInt16, then the modification of that lower byte before the second load of someUnsignedCharPtr would corrupt the upper bits of that byte, which would, after shifting, be added to the upper byte of someInt16.
The Standard would allow a compiler to generate such code, even though character pointers are exempt from aliasing rules, because it does not require that compilers handle all situations where unsequenced reads and writes affect regions of storage that partially overlap. If such accesses were performed using a union instead of a character pointer, a compiler might recognize that the character-type access would always overlap the least significant byte of the 16-bit value, but I don't think the authors of the Standard wanted to require that compilers invest the time and effort that might be necessary to handle such obscure cases meaningfully.
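A sketch of the overlap hazard just described, spelled out as code (the little-endian assumption in the comment is mine; the names mirror the example above):

#include <cstdint>

int16_t someInt16 = 0x0102;
// On a little-endian machine this points at the low byte of someInt16.
unsigned char* someUnsignedCharPtr =
    reinterpret_cast<unsigned char*>(&someInt16);

void demo()
{
    // The value stored into someInt16 is read from an object that overlaps
    // its storage inexactly, so per the C rule quoted earlier (6.5.16.1/3)
    // this is undefined behaviour in C.
    someInt16 += *someUnsignedCharPtr << 4;
}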

Is store reordering allowed by C++ as-if rule?

The "as-if" rule basically defines what transformations an implementation is allowed to perform on a legal C++ program. In short, all transformations that do not affect a program's observable behavior are allowed.
As to what exactly "observable behavior" stands for, cppreference.com seems to have a different definition from the one given by the Standard regarding input/output. I'm not sure whether that's a reinterpretation of the Standard or a mistake.
"as-if" rule by cppreference.com:
All input and output operations occur in the same order and with the
same content as if the program was executed as written.
"as-if" rule by the Standard:
The input and output dynamics of interactive devices shall take place
in such a fashion that prompting output is actually delivered before a
program waits for input. What constitutes an interactive device is
implementation-defined
This difference is important to me because I want to know if a normal store reordering is a valid compiler optimization or not. Per cppreference's wording, a memory store should count among the output operations it mentions. But according to the Standard, a memory store doesn't seem to be part of the input and output dynamics of interactive devices. (What are interactive devices, anyway?)
An example follows.
int A = 0;
int B = 0;

void foo()
{
    A = B + 1; // (1)
    B = 1;     // (2)
}
A modern compiler may generate the following code for function foo:
mov 0x804a018, %eax  ; load B
movl $0x1, 0x804a018 ; store 1 to B
add $0x1, %eax       ; compute B + 1 in %eax
mov %eax, 0x804a01c  ; store B + 1 (= 1) to A
ret
As seen, the store to A is reordered with the store to B. Is it compliant to the "as-if" rule? Is this kind of reordering permitted by the Standard?
If cppreference.com disagrees with the actual text of the C++ standard, cppreference.com is wrong. The only things that can supersede the text of the standard are a newer version of the standard, and official resolutions of defect reports (which sometimes get rolled up into documents called "technical corrigenda", which is a fancy name for a minor release of the standard).
However, in this case, you have misunderstood what cppreference.com means by "input and output operations". (If memory serves, that text is taken verbatim from an older version of the standard.) Stores to memory are NOT output operations. Only writing to a file (that is, any stdio.h or iostream output stream, or other implementation-defined mechanism, e.g. a Unix file descriptor) counts as output for purposes of this rule.
The C and C++ standards, prior to their 2011 revisions, assumed a single-threaded abstract machine, and therefore did not bother specifying anything about store ordering, because there was no way to observe stores out of program order. C(++)11 added a whole bunch of rules for store ordering as part of the new multithreading specification.
The real articulation of the as-if rule is found at §1.9/8 in the standard:
— Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
— At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
— The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
Since A and B are not volatile, this reordering is allowed.
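Conversely, a quick sketch of how to forbid the reordering, per the first bullet of the quoted paragraph (volatile accesses are evaluated strictly according to the abstract machine):

volatile int A = 0;
volatile int B = 0;

void foo()
{
    A = B + 1; // (1) this volatile store must now be emitted before...
    B = 1;     // (2) ...this one; the reordering shown above is no longer allowed
}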

Is it undefined behaviour to allocate overlarge stack structures?

This is a C specification question.
We all know this is legal C and should work fine on any platform:
/* Stupid way to count the length of a number */
int count_len(int val) {
    char buf[256];
    return sprintf(buf, "%d", val);
}
But this is almost guaranteed to crash:
/* Stupid way to count the length of a number */
int count_len(int val) {
    char buf[256000000];
    return sprintf(buf, "%d", val);
}
The difference is that the latter program blows the stack and will probably crash. But, purely semantically, it really isn't any different than the previous program.
According to the C spec, is the latter program actually undefined behavior? If so, what distinguishes it from the former? If not, what in the C spec says it's OK for a conforming implementation to crash?
(If this differs between C89/C99/C11/C++*, this would be interesting too).
Language standards for C(89,99,11) begin with a scope section with this wording (also found in some C++, C#, Fortran and Pascal standards):
This International Standard does not specify
— the size or complexity of a program and its data that will exceed the capacity of any specific data-processing system or the capacity of a particular processor;
— all minimal requirements of a data-processing system that is capable of supporting a conforming implementation.
The gcc compiler does offer an option to check for stack overflow at runtime
21.1 Stack Overflow Checking
For most operating systems, gcc does not perform stack overflow checking by default. This means that if the main environment task or some other task exceeds the available stack space, then unpredictable behavior will occur. Most native systems offer some level of protection by adding a guard page at the end of each task stack. This mechanism is usually not enough for dealing properly with stack overflow situations because a large local variable could “jump” above the guard page. Furthermore, when the guard page is hit, there may not be any space left on the stack for executing the exception propagation code. Enabling stack checking avoids such situations.
To activate stack checking, compile all units with the gcc option -fstack-check. For example:
gcc -c -fstack-check package1.adb
Units compiled with this option will generate extra instructions to check that any use of the stack (for procedure calls or for declaring local variables in declare blocks) does not exceed the available stack space. If the space is exceeded, then a Storage_Error exception is raised.
There was an attempt during the standardization process for C99 to make a stronger statement within the standard that, while size and complexity are beyond the scope of the standard, the implementer has a responsibility to document the limits.
The rationale was
The definition of conformance has always been a problem with the C
Standard, being described by one author as "not even rubber teeth, more
like rubber gums". Though there are improvements in C9X compared with C89,
many of the issues still remain.
This paper proposes changes which, while not perfect, hopefully improve the
situation.
The following wording was suggested for inclusion to section 5.2.4.1
translation or execution might fail if the size or complexity of a program or its data exceeds the capacity of the implementation.
The implementation shall document a way to determine if the size or complexity of a correct program exceeds or might exceed the capacity of the implementation.
5.2.4.1. An implementation is always free to state that a given program is
too large or too complex to be translated or executed. However, to stop
this being a way to claim conformance while providing no useful facilities
whatsoever, the implementer must provide a way to determine whether a
program is likely to exceed the limits. The method need not be perfect, so
long as it errs on the side of caution.
One way to do this would be to have a formula which converted values such
as the number of variables into, say, the amount of memory the compiler
would need. Similarly, if there is a limit on stack space, the formula need
only show how to determine the stack requirements for each function call
(assuming this is the only place the stack is allocated) and need not work
through every possible execution path (which would be impossible in the
face of recursion). The compiler could even have a mode which output a
value for each function in the program.
The proposed wording did not make it into the C99 standard, and therefore this area remained outside the scope of the standard. Section 5.2.4.1 of C99 does list these limits
The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:
127 nesting levels of blocks
63 nesting levels of conditional inclusion
12 pointer, array, and function declarators (in any combinations) modifying an arithmetic, structure, union, or incomplete type in a declaration
63 nesting levels of parenthesized declarators within a full declarator
63 nesting levels of parenthesized expressions within a full expression
63 significant initial characters in an internal identifier or a macro name (each universal character name or extended source character is considered a single character)
31 significant initial characters in an external identifier (each universal character name specifying a short identifier of 0000FFFF or less is considered 6 characters, each universal character name specifying a short identifier of 00010000 or more is considered 10 characters, and each extended source character is considered the same number of characters as the corresponding universal character name, if any)
4095 external identifiers in one translation unit
511 identifiers with block scope declared in one block
4095 macro identifiers simultaneously defined in one preprocessing translation unit
127 parameters in one function definition
127 arguments in one function call
127 parameters in one macro definition
127 arguments in one macro invocation
4095 characters in a logical source line
4095 characters in a character string literal or wide string literal (after concatenation)
65535 bytes in an object (in a hosted environment only)
15 nesting levels for #included files
1023 case labels for a switch statement (excluding those for any nested switch statements)
1023 members in a single structure or union
1023 enumeration constants in a single enumeration
63 levels of nested structure or union definitions in a single struct-declaration-list
In C++, Annex B indicates that the maximum size of an object is an implementation-specific finite number. That would tend to limit arrays with automatic storage class.
However, I'm not seeing something specifically for space accumulated by all automatic variables on the call stack, which is where a stack overflow ought to be triggered. I'm also not seeing a recursion limit in Annex B, although that would be closely related.
The C standard is silent on all issues relating to stack overflow. This is a bit strange since it's very vocal in just about every other corner of C programming. AFAIK there is no specification that a certain amount of automatic storage must be available, and no way of detecting or recovering from exhaustion of the space available for automatic storage. The abstract machine is assumed to have an unlimited amount of automatic storage.
I believe the behavior is undefined by omission, if a 256,000,000-byte local object actually exceeds the implementation's capacity.
Quoting the 2011 ISO C standard, section 1 (Scope), paragraph 2:
This International Standard does not specify
[...]
- the size or complexity of a program and its data that will exceed the capacity of any
specific data-processing system or the capacity of a particular processor
So the standard explicitly acknowledges that a program may exceed the capacity of an implementation.
I think we can safely assume that a program that exceeds the capacity of an implementation is not required to behave the same way as one that does not; otherwise there would be no point in mentioning it.
Since nothing in the standard defines the behavior of such a program, the behavior is undefined. This is specified by section 4 (Conformance), paragraph 2:
[...] Undefined behavior is otherwise indicated in this
International Standard by the words "undefined behavior" or by the
omission of any explicit definition of behavior. There is no
difference in emphasis among these three; they all describe "behavior
that is undefined".
Of course an implementation on most modern computers could easily allocate 256 million bytes of memory; that's only a small fraction of the available RAM on the computer I'm typing this on, for example. But many operating systems place a fairly low limit on the amount of stack space that a program can allocate.
(Incidentally, I'm assuming that the code in the question is a fragment of some complete program that actually calls the function. As it stands, the code has no behavior, since there's no call to count_len, nor is there a main function. I might not normally mention that, but you did use the "language-lawyer" tag.)
Anyone who would argue that the behavior is not undefined should explain either (a) why having the program crash does not make the implementation non-conforming, or (b) how a program crash is within the scope of defined behavior (even if it's implementation-defined or unspecified).
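As a practical aside (separate from the language-lawyer question): the conventional way to sidestep the issue is to give the large buffer dynamic storage duration, whose limits are typically far larger than the stack's. A minimal sketch:

#include <cstdio>
#include <vector>

/* Same idea as the question's second snippet, but the big buffer
   lives on the heap instead of the stack. */
int count_len(int val)
{
    std::vector<char> buf(256000000);
    return std::snprintf(buf.data(), buf.size(), "%d", val);
}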

Why does C/C++ not define expression evaluation order?

As you may know, C/C++ does not specify expression evaluation order. What are the reasons for leaving it undefined?
It allows compiler optimizations.
One example would be the reordering of arithmetic instructions for the maximum usage of the ALUs or hiding memory latency with calculations.
One of the C/C++ design goals is efficient implementation by compilers. So compilers are given relatively free rein in choosing the evaluation order of the various subexpressions within a complicated expression; this order is not constrained by operator precedence and associativity, as one might think. In such a situation, when we modify the same variable in multiple subexpressions, the behaviour becomes undefined. An increment or decrement operation is not guaranteed to be performed immediately after giving up the previous value and before any other part of the expression is evaluated. The only guarantee is that the update will be performed before the expression is considered finished.
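A small illustration of that latitude (f and g are hypothetical functions made up for the example; both output orders are conforming):

#include <cstdio>

int f() { std::puts("f"); return 1; }
int g() { std::puts("g"); return 2; }

int main()
{
    // The order in which f() and g() are called here is unspecified;
    // a conforming implementation may print "f" then "g", or "g" then "f".
    int x = f() + g();
    std::printf("%d\n", x);
    return 0;
}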
Undefined behaviour means undefined: anything can happen.
Source: C Programming FAQ by Steve Summit