Why GCC warns when casting an uninitialized volatile pointer to `void`? - c++

Consider following:
int *volatile x;
(void)x;
GCC (from 5.x to 7.x) complains about it when -Wall is enabled:
warning: 'x' is used uninitialized in this function [-Wuninitialized]
The clang is silent about it.
For some reason, removing the volatile eliminates the warning.
Does the standard say that casting a volatile pointer even to void is undefined, while casting a normal pointer is fine? Or is that a GCC bug?
Disclaimer: The question is tagged as C/C++ on purpose. The GCC gives the same warning for both languages, and I'm interested of there is any difference.

One of the behaviours of volatile for plain old data type like int * is to prevent the compiler from optimizing away the reading and writing to the variable. Please notice that int * here could be whatever like float or int.
So (void)x is meaning "read x and do nothing with the result" because x is volatile. If you read x and it's not pinned to a fixed position in memory (which the compiler might not know, only the linker does), then you're actually using it uninitialized.
If it's not volatile, although the compiler might read x anyway, it will likely avoid/optimize this (since it's a no-op), and silent the warning.
clang takes the safe road here, and since the linker directive could pin the variable x to some position (without clang knowing about it), consider that it's not worth triggering a warning without more evidence it's an issue.

If the variable is declared volatile then to cast away the volatile, just as it is undefined behaviour to cast away the const from a variable declared const. See Annex J.2 of the C Standard:
The behavior is undefined in the following circumstances:
— An attempt is made to refer to an object defined with a volatile-qualified type through
use of an lvalue with non-volatile-qualified type (6.7.3).
Somewhere I have read and noted down about rules of using volatile are:
Use volatile for variables that might change "unexpectedly".
Use volatile for automatic variables in routines that use setjmp().
To force volatile semantics on a particular access, take the address of the variable and cast it to (volatile WHATEVER *), dereferencing the cast expression to get the value.
Sometimes volatile is a reasonable way to get around problems with code generation in compilers that have conformance problems in some areas, eg, the gcc compiler on x86 with semantics of assigning or casting to double. Don't do this just haphazardly, since if it's unnecessary code quality will very likely go down.
Unless you really know what you're doing and why you're doing it, if you're using volatile you're likely doing something wrong. Try to find another way to solve the problem, and if you still have to use volatile code
up a nice small example and post to comp.lang.c and ask for helpful suggestions.

Any access to a volatile object is part of your program's observable behavior in both C and C++. Observable behavior is an important concept in both C and C++.
Your code formally reads a volatile pointer x. I would guess that GCC considers it a rather serious issue when part of program observable behavior involves an uninitialized value.
The moment you remove volatile, reading of x ceases to be a part of observable behavior. Hence the warning disappears as well.

For C++, this is controlled by [expr]/12, which says that (void) x applies the lvalue-to-rvalue conversion to x (i.e., reads the value of x) only if x is a glvalue of volatile-qualified type. Therefore, if x is volatile-qualified, then (void) x reads its value (which yields an indeterminate value and triggers undefined behavior). If x isn't volatile-qualified, then (void) x doesn't apply the lvalue-to-rvalue conversion and the behavior is well-defined.

If the misaligned pointer is dereferenced, the program may terminate abnormally. On some architectures, the cast alone may cause a loss of information even if the value is not dereferenced if the types involved have differing alignment requirements.
The C Standard, 6.3.2.3, paragraph 7 say's:
A pointer to an object or incomplete type may be converted to a
pointer to a different object or incomplete type. If the resulting
pointer is not correctly aligned for the referenced type, the behavior
is undefined.

Related

Unaligned access through reinterpret_cast

I'm in the middle of a discussion trying to figure out whether unaligned access is allowable in C++ through reinterpret_cast. I think not, but I'm having trouble finding the right part(s) of the standard which confirm or refute that. I have been looking at C++11, but I would be okay with another version if it is more clear.
Unaligned access is undefined in C11. The relevant part of the C11 standard (§ 6.3.2.3, paragraph 7):
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
Since the behavior of an unaligned access is undefined, some compilers (at least GCC) take that to mean that it is okay to generate instructions which require aligned data. Most of the time the code still works for unaligned data because most x86 and ARM instructions these days work with unaligned data, but some don't. In particular, some vector instructions don't, which means that as the compiler gets better at generating optimized instructions code which worked with older versions of the compiler may not work with newer versions. And, of course, some architectures (like MIPS) don't do as well with unaligned data.
C++11 is, of course, more complicated. § 5.2.10, paragraph 7 says:
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1, or if either type is void. Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. The result of any other such pointer conversion is unspecified.
Note that the last word is "unspecified", not "undefined". § 1.3.25 defines "unspecified behavior" as:
behavior, for a well-formed program construct and correct data, that depends on the implementation
[Note: The implementation is not required to document which behavior occurs. The range of possible behaviors is usually delineated by this International Standard. — end note]
Unless I'm missing something, the standard doesn't actually delineate the range of possible behaviors in this case, which seems to indicate to me that one very reasonable behavior is that which is implemented for C (at least by GCC): not supporting them. That would mean the compiler is free to assume unaligned accesses do not occur and emit instructions which may not work with unaligned memory, just like it does for C.
The person I'm discussing this with, however, has a different interpretation. They cite § 1.9, paragraph 5:
A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
Since there is no undefined behavior, they argue that the C++ compiler has no right to assume unaligned access don't occur.
So, are unaligned accesses through reinterpret_cast safe in C++? Where in the specification (any version) does it say?
Edit: By "access", I mean actually loading and storing. Something like
void unaligned_cp(void* a, void* b) {
*reinterpret_cast<volatile uint32_t*>(a) =
*reinterpret_cast<volatile uint32_t*>(b);
}
How the memory is allocated is actually outside my scope (it is for a library which can be called with data from anywhere), but malloc and an array on the stack are both likely candidates. I don't want to place any restrictions on how the memory is allocated.
Edit 2: Please cite sources (i.e., the C++ standard, section and paragraph) in answers.
Looking at 3.11/1:
Object types have alignment requirements (3.9.1, 3.9.2) which place restrictions on the addresses at which an object of that type may be allocated.
There's some debate in comments about exactly what constitutes allocating an object of a type. However I believe the following argument works regardless of how that discussion is resolved:
Take *reinterpret_cast<uint32_t*>(a) for example. If this expression does not cause UB, then (according to the strict aliasing rule) there must be an object of type uint32_t (or int32_t) at the given location after this statement. Whether the object was already there, or this write created it, does not matter.
According to the above Standard quote, objects with alignment requirement can only exist in a correctly aligned state.
Therefore any attempt to create or write an object that is not correctly aligned causes UB.
EDIT This answers the OP's original question, which was "is accessing a misaligned pointer safe". The OP has since edited their question to "is dereferencing a misaligned pointer safe", a far more practical and less interesting question.
The round-trip cast result of the pointer value is unspecified under those circumstances. Under certain limited circumstances (involving alignment), converting a pointer to A to a pointer to B, and then back again, results in the original pointer, even if you didn't have a B in that location.
If the alignment requirements are not met, than that round trip -- the pointer-to-A to pointer-to-B to pointer-to-A results in a pointer with an unspecified value.
As there are invalid pointer values, dereferencing a pointer with an unspecified value can result in undefined behavior. It is no different than *(int*)0xDEADBEEF in a sense.
Simply storing that pointer is not, however, undefined behavior.
None of the above C++ quotes talk about actually using a pointer-to-A as a pointer-to-B. Using a pointer to the "wrong type" in all but a very limited number of circumstances is undefined behavior, period.
An example of this involves creating a std::aligned_storage_t<sizeof(T), alignof(T)>. You can construct your T in that spot, and it will live their happily, even though it "actually" is an aligned_storage_t<sizeof(T), alignof(T)>. (You may, however, have to use the pointer returned from the placement new for full standard compliance; I am uncertain. See strict aliasing.)
Sadly, the standard is a bit lacking in terms of what object lifetime is. It refers to it, but does not define it well enough last I checked. You can only use a T at a particular location while a T lives there, but what that means is not made clear in all circumstances.
All of your quotes are about the pointer value, not the act of dereferencing.
5.2.10, paragraph 7 says that, assuming int has a stricter alignment than char, then the round trip of char* to int* to char* generates an unspecified value for the resulting char*.
On the other hand, if you convert int* to char* to int*, you are guaranteed to get back the exact same pointer as you started with.
It doesn't talk about what you get when you dereference said pointer. It simply states that in one case, you must be able to round trip. It washes its hands of the other way around.
Suppose you have some ints, and alignof(int) > 1:
int some_ints[3] ={0};
then you have an int pointer that is offset:
int* some_ptr = (int*)(((char*)&some_ints[0])+1);
We'll presume that copying this misaligned pointer doesn't cause undefined behavior for now.
The value of some_ptr is not specified by the standard. We'll be generous and presume it actually points to some chunk of bytes within some_bytes.
Now we have a int* that points to somewhere an int cannot be allocated (3.11/1). Under (3.8) the use of a pointer to an int is restricted in a number of ways. Usual use is restricted to a pointer to an T whose lifetime has begun allocated properly (/3). Some limited use is permitted on a pointer to a T which has been allocated properly, but whose lifetime has not begun (/5 and /6).
There is no way to create an int object that does not obey the alignment restrictions of int in the standard.
So the theoretical int* which claims to point to a misaligned int does not point to an int. No restrictions are placed on the behavior of said pointer when dereferenced; usual dereferencing rules provide behavior of a valid pointer to an object (including an int) and how it behaves.
And now our other assumptions. No restrictions on the value of some_ptr here are made by the standard: int* some_ptr = (int*)(((char*)&some_ints[0])+1);.
It is not a pointer to an int, much like (int*)nullptr is not a pointer to an int. Round tripping it back to a char* results in a pointer with unspecified value (it could be 0xbaadf00d or nullptr) explicitly in the standard.
The standard defines what you must do. There are (nearly? I guess evaluating it in a boolean context must return a bool) no requirements placed on the behavior of some_ptr by the standard, other than converting it back to char* results in an unspecified value (of the pointer).

Optimizer removing pointer de-reference lines

I have a problem where the optimizer seems to be removing lines of code that are quite necessary. Some background: I have a program that interfaces a PCIe driver. I have an integer pointer UINT32 *bar_reg; that points to the user space address of the BAR register I am communicating to. To write to the register I just de-reference the pointer. *(bar_reg + OFFSET) = value;
With no optimizations, this works fine. However as soon as I turn on any level of optimization, all the lines that de-reference the pointer get removed. The way I finally discovered this was by stepping through in Visual Studio. However it happens independent of platform. I've been able to get by up to now with the optimizer off, but someone using my library code in Linux wants to turn on optimization now. So I'm curious as to why this problem occurs and what the most reasonable fix/workaround is.
Use volatile keyword in order to prevent the optimization of that variable.
For example:
volatile UINT32 *bar_reg;
The issue here is that the compiler assumes that since the memory is not accessed by the program, then it means that the memory will remain unchanged and hence it might try to optimize some of the writes to that memory.
The issue you are running into involves the as-if rule which allows the optimizer to transform your code in any way as long as it does not effect the observable behavior of the program.
So if you only write to the variable but never actually use in your program the optimizer believes there no observable behavior and assumes it can validly optimize away the writes.
In your case the data is being observed outside of your program and the way to indicate this to the compiler and optimizer is through the volatile qualifier, cppreference tells us (emphasis mine going forwrd):
an object whose type is volatile-qualified, or a subobject of a
volatile object, or a mutable subobject of a const-volatile object.
Every access (read or write operation, member function call, etc.) on
the volatile object is treated as a visible side-effect for the
purposes of optimization [...]
For reference the as-if rule is covered in the draft C++ standard in section 1.9 which says:
[...]Rather, conforming implementations are required to emulate (only)
the observable behavior of the abstract machine as explained below.
and with respect to the as-if rule volatile is also covered in section 1.9 and it says:
Access to volatile objects are evaluated strictly according to the
rules of the abstract machine.

Initialization of boolean variables

This question might seem naive (hell, I think it is) but I am unable to find an answer that satisfies me.
Take this simple C++ program:
#include<iostream>
using namespace std;
int main ()
{
bool b;
cout << b;
return 0;
}
When compiled and executed, it always prints 0.
The problem is that is not what I'm expecting it to do: as far as I know, a local variable has no initialization value, and I believe that a random byte has more chances of being different rather than equal to 0.
What am I missing?
That is undefined behavior, because you are using the value of an uninitialized variable. You cannot expect anything out of a program with undefined behavior.
In particular, your program necessitates a so-called lvalue-to-rvalue conversion when initializing the parameter of operator << from b. Paragraph 4.1/1 of the C++11 Standard specifies:
A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete
type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not
an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program
that necessitates this conversion has undefined behavior. If T is a non-class type, the type of the prvalue is
the cv-unqualified version of T. Otherwise, the type of the prvalue is T.
The behaviour is undefined; there is no requirement for it to be assigned a random value, and certainly not a uniformly-distributed one.
What is probably happening is that the memory allocated to the process is zero-initialised by the operating system, and this is the first time that that byte is used, so it still contains zero.
But, like all undefined behaviour, you can't rely on it and there's little point speculating about the details.
As Andy said, it's undefined behaviour. I think the fact that you are so lucky and always receive 0 is implementation defined. Probably the stack is empty and clean (initialized with zeros) when you program starts. So it happens that you get zero when allocation a variable there.
This may be guaranteed to succeed in your current implementation (as others said, maybe it initializes the stack with zeros) but it's also guaranteed to fail in, say, a Visual C++ debug build (which initializes local variables to 0xCCCCCCCC).
C++ static and global variables are initialized by default as C89 and C99 says - and specifically arithmetic variables are initialized by 0.
Auto variables are indeterminate which by C89 and C99 at 3.17.2 means that they are either an unspecified value or a trap representation. Trap representation in the context of bool type might mean 0 - this is compiler specific.

Undefined behaviour with const_cast

I was hoping that someone could clarify exactly what is meant by undefined behaviour in C++. Given the following class definition:
class Foo
{
public:
explicit Foo(int Value): m_Int(Value) { }
void SetValue(int Value) { m_Int = Value; }
private:
Foo(const Foo& rhs);
const Foo& operator=(const Foo& rhs);
private:
int m_Int;
};
If I've understood correctly the two const_casts to both a reference and a pointer in the following code will remove the const-ness of the original object of type Foo, but any attempts made to modify this object through either the pointer or the reference will result in undefined behaviour.
int main()
{
const Foo MyConstFoo(0);
Foo& rFoo = const_cast<Foo&>(MyConstFoo);
Foo* pFoo = const_cast<Foo*>(&MyConstFoo);
//MyConstFoo.SetValue(1); //Error as MyConstFoo is const
rFoo.SetValue(2); //Undefined behaviour
pFoo->SetValue(3); //Undefined behaviour
return 0;
}
What is puzzling me is why this appears to work and will modify the original const object but doesn't even prompt me with a warning to notify me that this behaviour is undefined. I know that const_casts are, broadly speaking, frowned upon, but I can imagine a case where lack of awareness that C-style cast can result in a const_cast being made could occur without being noticed, for example:
Foo& rAnotherFoo = (Foo&)MyConstFoo;
Foo* pAnotherFoo = (Foo*)&MyConstFoo;
rAnotherFoo->SetValue(4);
pAnotherFoo->SetValue(5);
In what circumstances might this behaviour cause a fatal runtime error? Is there some compiler setting that I can set to warn me of this (potentially) dangerous behaviour?
NB: I use MSVC2008.
I was hoping that someone could clarify exactly what is meant by undefined behaviour in C++.
Technically, "Undefined Behaviour" means that the language defines no semantics for doing such a thing.
In practice, this usually means "don't do it; it can break when your compiler performs optimisations, or for other reasons".
What is puzzling me is why this appears to work and will modify the original const object but doesn't even prompt me with a warning to notify me that this behaviour is undefined.
In this specific example, attempting to modify any non-mutable object may "appear to work", or it may overwrite memory that doesn't belong to the program or that belongs to [part of] some other object, because the non-mutable object might have been optimised away at compile-time, or it may exist in some read-only data segment in memory.
The factors that may lead to these things happening are simply too complex to list. Consider the case of dereferencing an uninitialised pointer (also UB): the "object" you're then working with will have some arbitrary memory address that depends on whatever value happened to be in memory at the pointer's location; that "value" is potentially dependent on previous program invocations, previous work in the same program, storage of user-provided input etc. It's simply not feasible to try to rationalise the possible outcomes of invoking Undefined Behaviour so, again, we usually don't bother and instead just say "don't do it".
What is puzzling me is why this appears to work and will modify the original const object but doesn't even prompt me with a warning to notify me that this behaviour is undefined.
As a further complication, compilers are not required to diagnose (emit warnings/errors) for Undefined Behaviour, because code that invokes Undefined Behaviour is not the same as code that is ill-formed (i.e. explicitly illegal). In many cases, it's not tractible for the compiler to even detect UB, so this is an area where it is the programmer's responsibility to write the code properly.
The type system — including the existence and semantics of the const keyword — presents basic protection against writing code that will break; a C++ programmer should always remain aware that subverting this system — e.g. by hacking away constness — is done at your own risk, and is generally A Bad Idea.™
I can imagine a case where lack of awareness that C-style cast can result in a const_cast being made could occur without being noticed.
Absolutely. With warning levels set high enough, a sane compiler may choose to warn you about this, but it doesn't have to and it may not. In general, this is a good reason why C-style casts are frowned upon, but they are still supported for backwards compatibility with C. It's just one of those unfortunate things.
Undefined behaviour depends on the way the object was born, you can see Stephan explaining it at around 00:10:00 but essentially, follow the code below:
void f(int const &arg)
{
int &danger( const_cast<int&>(arg);
danger = 23; // When is this UB?
}
Now there are two cases for calling f
int K(1);
f(k); // OK
const int AK(1);
f(AK); // triggers undefined behaviour
To sum up, K was born a non const, so the cast is ok when calling f, whereas AK was born a const so ... UB it is.
Undefined behaviour literally means just that: behaviour which is not defined by the language standard. It typically occurs in situations where the code is doing something wrong, but the error can't be detected by the compiler. The only way to catch the error would be to introduce a run-time test - which would hurt performance. So instead, the language specification tells you that you mustn't do certain things and, if you do, then anything could happen.
In the case of writing to a constant object, using const_cast to subvert the compile-time checks, there are three likely scenarios:
it is treated just like a non-constant object, and writing to it modifies it;
it is placed in write-protected memory, and writing to it causes a protection fault;
it is replaced (during optimisation) by constant values embedded in the compiled code, so after writing to it, it will still have its initial value.
In your test, you ended up in the first scenario - the object was (almost certainly) created on the stack, which is not write protected. You may find that you get the second scenario if the object is static, and the third if you enable more optimisation.
In general, the compiler can't diagnose this error - there is no way to tell (except in very simple examples like yours) whether the target of a reference or pointer is constant or not. It's up to you to make sure that you only use const_cast when you can guarantee that it's safe - either when the object isn't constant, or when you're not actually going to modify it anyway.
What is puzzling me is why this appears to work
That is what undefined behavior means.
It can do anything including appear to work.
If you increase your optimization level to its top value it will probably stop working.
but doesn't even prompt me with a warning to notify me that this behaviour is undefined.
At the point it were it does the modification the object is not const. In the general case it can not tell that the object was originally a const, therefore it is not possible to warn you. Even if it was each statement is evaluated on its own without reference to the others (when looking at that kind of warning generation).
Secondly by using cast you are telling the compiler "I know what I am doing override all your safety features and just do it".
For example the following works just fine: (or will seem too (in the nasal deamon type of way))
float aFloat;
int& anIntRef = (int&)aFloat; // I know what I am doing ignore the fact that this is sensable
int* anIntPtr = (int*)&aFloat;
anIntRef = 12;
*anIntPtr = 13;
I know that const_casts are, broadly speaking, frowned upon
That is the wrong way to look at them. They are a way of documenting in the code that you are doing something strange that needs to be validated by smart people (as the compiler will obey the cast without question). The reason you need a smart person to validate is that it can lead to undefined behavior, but the good thing you have now explicitly documented this in your code (and people will definitely look closely at what you have done).
but I can imagine a case where lack of awareness that C-style cast can result in a const_cast being made could occur without being noticed, for example:
In C++ there is no need to use a C style cast.
In the worst case the C-Style cast can be replaced by reinterpret_cast<> but when porting code you want to see if you could have used static_cast<>. The point of the C++ casts is to make them stand out so you can see them and at a glance spot the difference between the dangerous casts the benign casts.
A classic example would be trying to modify a const string literal, which may exist in a protected data segment.
Compilers may place const data in read only parts of memory for optimization reasons and attempt to modify this data will result in UB.
Static and const data are often stored in another part of you program than local variables. For const variables, these areas are often in read-only mode to enforce the constness of the variables. Attempting to write in a read-only memory results in an "undefined behavior" because the reaction depends on your operating system. "Undefined beheavior" means that the language doesn't specify how this case is to be handled.
If you want a more detailed explanation about memory, I suggest you read this. It's an explanation based on UNIX but similar mecanism are used on all OS.

Removing volatile nature using const_cast

I heard that volatile nature of a variable can be removed using const_cast operator.
In which scenarios we need to remove volatile nature of a variable ?
are there any good use cases ?
Is it dangerours operation, because we declared it as volatile thinking that it would be modified by external factors and removing volatile nature could stop modifications to it. Specially when volatile pointers are registers etc.
The moment you do that, behavior is undefined. Note that removing volatile from an expression that really refers to a non-volatile variable and removing volatile from an expression that refers to a volatile variable are different. The latter thing is what you asked about, and it causes undefined behavior. The Standard laws
If an attempt is made to refer to an object defined with a volatile-qualified type through the use of an lvalue with a non-volatile-qualified type, the program behaviour is undefined.