I'm fiddling a bit with Any and casting just to get a deeper understanding of Rust. From C# I'm used to the fact that casting can lead to runtime exceptions, because casting in C# basically means "Dear compiler, trust me, I know what I'm doing - please cast this into an int32 because I know it will work."
However, if you're doing an invalid cast the program will blow up with an Exception at runtime. So I was wondering if casting in (safe) Rust can equally lead to a runtime exception.
So, I came up with this code to try it out.
use std::any::Any;

fn main() {
    let some_int = 4;
    let some_str = "foo";
    {
        let mut v = Vec::<&dyn Any>::new();
        v.push(&some_int);
        v.push(&some_str);

        // this gives a None
        let x = v[0].downcast_ref::<String>();
        println!("foo {:?}", x);

        // this gives Some(4)
        let y = v[0].downcast_ref::<i32>();
        println!("foo {:?}", y);

        // the compiler doesn't let me do this cast (which would lead to a runtime error)
        // let z = v[1] as i32;
    }
}
My observation so far is that the compiler seems to prevent this kind of runtime exception because I have to cast through downcast_ref, which returns an Option and therefore makes it safe again. Sure, I can unwrap on a None to blow it up, but that's not my point ;)
The compiler prevents me from writing let z = v[1] as i32; which could lead to a runtime error. So, is it correct to assume that casting in safe Rust will never result in a runtime error?
I know that preventing runtime errors is exactly what Rust is all about so it makes perfect sense, I just want to validate my observation :)
Casting with as in Rust is very limited. It only allows casting between primitive numeric and character types, between pointers and references, and for creating trait objects out of values of concrete types - and that's all; as is not overloadable, for example. Therefore, casting with as is always panic-free, though you can observe numeric overflow if you cast a value which can't be represented in the target type, which may or may not be desirable.
In Rust there is no such thing as the cast operator from C# or Java. The closest thing to it would be std::mem::transmute() which is a lot like reinterpret_cast from C++. It is unsafe, however, and even it has its limitations - it can only transform values of types having the same size.
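For comparison, here is a minimal C++ sketch of the same idea - reinterpreting the bits of a value as another same-size type - using std::memcpy, which is the well-defined way to do it in C++:

#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    float f = 1.0f;
    std::uint32_t bits;
    // Same-size requirement, just like transmute() enforces in Rust.
    static_assert(sizeof bits == sizeof f, "types must have the same size");
    std::memcpy(&bits, &f, sizeof bits);    // reinterpret the bit pattern
    std::cout << std::hex << bits << '\n';  // prints 3f800000
}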
Well, that depends on how you define "runtime error" and "result in".
As Vladimir said, as is just for primitive conversions that can't really fail. But there is currently (Rust 1.3) an evil little hole in this: casting floating point values to an integer.
If you try to cast a floating point value that doesn't have a corresponding integer value, the result is that you end up with an integer containing "something". A weird "something" that LLVM assumes cannot exist (because of course you checked that the conversion made sense before you did it). And compilers optimise based on things that can't happen.
The net result is that you can crash or corrupt memory by using really big values to create weird undefined integers that produce inconsistent results at runtime, which might include both controlled and uncontrolled crashes.
I mean, it's not supposed to do that, and the solution is probably going to involve making as panic, because what else do you do when someone asks the compiler to evaluate f32::NAN as i32?
Here is a MWE of something I came across in some C++ code.
int a = (int)(b/c);
Why is (int) after the assignment operator?
Is it not recommended to use it like this?
This is simply a C-style typecast. It is used to make the author's intentions explicit, especially when the result of b/c is of another type (such as unsigned or float).
Without the cast, you will often get a compiler warning about an implicit conversion which can sometimes have consequences. By using the explicit cast, you are stating that you accept this conversion is fine within whatever other limits your program enforces, and the compiler will perform the conversion without emitting a warning.
In C++, we use static_cast<int>(b/c) to make the cast even more explicit and intentional.
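A minimal sketch of the difference (the variable values are illustrative):

#include <iostream>

int main() {
    double b = 7.0, c = 2.0;
    // int a = b / c;                   // implicit conversion: compilers with
    //                                  // -Wconversion warn that 3.5 is truncated
    int a = static_cast<int>(b / c);    // explicit cast: truncates to 3, no warning
    std::cout << a << '\n';             // prints 3
}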
This is a cast used to convert a variable or expression to a given type. In this case, if b and c are floating point numbers, the cast (int) forces the result to an integer.
Specifically, this is a "C-style cast"; modern C++ has additional casts to give even more control (static_cast, dynamic_cast, const_cast, etc.).
It is not "(int) after the assignment operator".
It is "(int) before a float - the result of b/c".
It casts the float to an int.
This is a mistake. In the code:
int a = b/c;
the division may cause undefined behaviour if the result is a floating point value that is out of range of int (e.g. it exceeds INT_MAX after truncation). Compilers may warn about this if you use warning flags.
Changing the code to int a = (int)(b/c); has no effect on the behaviour of the code, but it may cause the compiler to suppress the warning (compilers sometimes treat a cast as the programmer expressing the intent that they do not want to see the warning).
So now you just have silent undefined behaviour, unless the previous code is designed in such a way that the division result can never be out of range.
A better solution to the problem would be:
long a = std::lrint(b/c);
If the quotient is out of range then this will store an unspecified value in a and you can detect the error using floating point error handling. Reference for std::lrint
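A sketch of what that error handling could look like, using the standard floating-point exception flags (this assumes the implementation supports them; strictly speaking, testing the flags also calls for #pragma STDC FENV_ACCESS ON):

#include <cfenv>
#include <cmath>
#include <iostream>

int main() {
    double b = 1e30, c = 1.0;            // quotient far exceeds LONG_MAX
    std::feclearexcept(FE_ALL_EXCEPT);   // clear any stale flags first
    long a = std::lrint(b / c);
    if (std::fetestexcept(FE_INVALID))
        std::cout << "quotient out of range; a holds an unspecified value\n";
    else
        std::cout << "a = " << a << '\n';
}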
My code is something like:
BYTE *byteVar;
_CERT_TEMPLATE_EXT *structVar;
// assign byteVar
structVar = (_CERT_TEMPLATE_EXT*)byteVar;
// Here I would like to check whether byteVar actually points to a _CERT_TEMPLATE_EXT
// (i.e. whether the cast was successful or not).
I don't know what kind of cast I should use.
Assuming that you are trying to do crypto on Windows, and _CERT_TEMPLATE_EXT is as defined here, then there is no way you can reliably check.
You can look at structVar->pszObjId, and check if it looks like a valid OID. If not, things have gone badly wrong with your application, and you need to debug it. (If the pointer wasn't a CERT_TEMPLATE_EXT, the check will involve undefined behaviour - but so will using the converted value, so you are no worse off).
In general, you need reinterpret_cast to cast between unrelated pointer types, but at least GCC appears to allow just static_cast from unsigned char * (I don't have MSVC here to check readily.). Personally, I think I would stick to a big scary reinterpret_cast.
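A sketch of the pszObjId sanity check suggested above - purely a debugging aid, since if byteVar never pointed at a CERT_TEMPLATE_EXT even reading the field is undefined behaviour. looks_like_oid is a hypothetical helper, not part of any Windows API:

#include <cctype>

// Hypothetical helper: does the string look like a dotted-decimal OID,
// e.g. "1.3.6.1.4.1.311.21.8"?
bool looks_like_oid(const char* s) {
    if (s == nullptr || *s == '\0') return false;
    for (; *s != '\0'; ++s)
        if (!std::isdigit(static_cast<unsigned char>(*s)) && *s != '.')
            return false;
    return true;
}

// Usage after the cast:
//   auto structVar = reinterpret_cast<_CERT_TEMPLATE_EXT*>(byteVar);
//   if (!looks_like_oid(structVar->pszObjId)) { /* something went badly wrong */ }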
I was writing some code recently and found myself doing a lot of c-style casts, such as the following:
Client* client = (Client*)GetWindowLong(hWnd, GWL_USERDATA);
I thought to myself; why do we actually need to do these?
I can somewhat understand why this would be needed in circumstances where there is a lot of code and the compiler may not know what types can be converted to what, such as when using reflection.
but when casting from a long to a pointer where both types are of the same size, I don't understand why the compiler would not allow us to do this?
when casting from a long to a pointer where both types are of the same size, I don't understand why the compiler would not allow us to do this?
Ironically, this is the place where compiler's intervention is most important!
In vast majority of situations, converting between long and a pointer is a programming error which you don't want to go unnoticed, even if your platform allows it.
For example, when you write this
unsigned long *ptr = getLongPtr();
unsigned long val = ptr; // Probably an error
it is almost a certainty that you are missing an asterisk in front of ptr:
unsigned long val = *ptr; // This is what it should be
Finding errors like this without the compiler's help is very hard, hence the compiler wants you to tell it explicitly that you know what you are doing on conversions like that.
Moreover, something that is fine on one platform may not work on other platforms. For example, an integral type and a pointer may have the same size on 32-bit platforms, but have different sizes on 64-bit platform. If you want to maintain any degree of portability, the compiler should warn you of the conversion even on the 32-bit platform, where the sizes are identical. Compiler warning will help you identify an error, and switch to a portable pointer-as-integer type intptr_t.
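When you genuinely do need to store a pointer in an integer, a minimal sketch of the portable round-trip looks like this (intptr_t is technically optional in the standard, but present on mainstream platforms):

#include <cstdint>
#include <iostream>

int main() {
    int x = 42;
    // Round-trip through an integer type guaranteed wide enough for a pointer.
    std::intptr_t bits = reinterpret_cast<std::intptr_t>(&x);
    int* back = reinterpret_cast<int*>(bits);
    std::cout << *back << '\n';   // prints 42
}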
I think the idea is that we want the compiler to tell us when we are doing something dodgy and/or potentially unintended, so that we don't do it by accident. The compiler complains unless we explicitly tell it that this is what we really, really want. We do that by using a cast.
Edited to add:
It might be better to ask why we are allowed to cast between types. Originally C was created as a strongly typed language. Although it allows promotion/conversion between related object types (like between ints and floats) it is supposed to prevent access and assignment to the wrong type as a language feature, a safety measure. However occasionally this is useful so casting was put in the language to allow us to circumvent the type rules on those occasions when we need to.
Does there exist any implementation of C++ (and/or C) that guarantees that anytime undefined behavior is invoked, it will signal an error? Obviously, such an implementation could not be as efficient as a standard C++ implementation, but it could be a useful debugging/testing tool.
If such an implementation does not exist, then are there any practical reasons that would make it impossible to implement? Or is it just that no one has done the work to implement it yet?
Edit: To make this a little more precise: I would like to have a compiler that allows me to make the assertion, for a given run of a C++ program that ran to completion, that no part of that run involved undefined behavior.
Yes, and no.
I am fairly certain that, for practical purposes, an implementation could make C++ a safe language, meaning every operation has well-defined behavior. Of course, this comes at a huge overhead, and there are probably some cases where it's simply unfeasible, such as race conditions in multithreaded code.
Now, the problem is that this can't guarantee your code is defined in other implementations! That is, it could still invoke UB. For instance, observe the following code:
#include <iostream>

int a;
int* b;

int foo() {
    a = 5;
    b = &a;
    return 0;
}

int bar() {
    *b = a;   // if bar() runs first, b is still a null pointer
    return 0;
}

int main() {
    std::cout << foo() << bar() << std::endl;
}
According to the standard, the order in which foo and bar are called is left to the implementation. Now, in a safe implementation this order would have to be defined, likely as left-to-right evaluation. The problem is that evaluating right-to-left invokes UB, because bar() would then dereference b while it is still a null pointer - and this wouldn't be caught until you ran the code on an unsafe implementation. The safe implementation could simply compile each permutation of evaluation order or do some static analysis, but this quickly becomes unfeasible and possibly undecidable.
So in conclusion, if such an implementation existed it would give you a false sense of security.
The new C standard has an interesting list in the new Annex L with the crude title "Analyzability". It talks about UB that is so-called critical UB. This includes among others:
An object is referred to outside of its lifetime (6.2.4).
A pointer is used to call a function whose type is not compatible with the referenced type.
The program attempts to modify a string literal.
All of these are UB that are impossible or very hard to capture, since they usually can't be completely tested at compile time. This is due to the fact that a valid C (or C++) program is composed of several compilation units that may not know much of each other - e.g. one part of the program passes a pointer to a string literal into a function with a char* parameter, or, even worse, casts away const-ness from a static variable.
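For instance, a minimal sketch of the const-casting case - nothing looks obviously wrong at the point of the cast, which is exactly why it is so hard to detect:

static const int limit = 10;

void bump(const int* p) {
    int* q = const_cast<int*>(p);   // compiles without complaint
    *q += 1;                        // undefined behaviour: the object really is const
}

int main() {
    bump(&limit);   // may crash, may silently "work", may do anything
}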
Two C interpreters that detect a large class of undefined behaviors for a large subset of sequential C are KCC and Frama-C's value analysis. They are both used to make sure that automatically generated, automatically reduced random C programs are appropriate to report bugs in C compilers.
From the webpage for KCC:
One of the main aims of this work is the ability to detect undefined programs (e.g., programs that read invalid memory).
A third interpreter for a dialect of C is CompCert's interpreter mode (a writeup). This one detects all behaviors that are undefined in the input language of the certified C compiler CompCert. The input language of CompCert is essentially C, but it renders defined some behaviors that are undefined in the standard (signed arithmetic overflow is defined as computing 2's complement results, for instance).
In truth, all three of the interpreters mentioned in this answer have had difficult choices to make in the name of pragmatism.
The whole point of defining something as "undefined behaviour" is to avoid having to detect the situation in the compiler. It is defined that way so that compilers can be built for a wide variety of platforms and architectures, and so that the hardware and software doesn't need specific features "just to detect undefined behaviour". Imagine a memory subsystem that can't detect whether you are writing to real memory or not - how would the compiler or runtime system detect that you have just done somepointer = rand(); *somepointer = 42;?
You can detect SOME situations. But to require that ALL are detected would make life very difficult.
Given the Edit in the original question: I still don't think this is plausible to achieve in C. There is so much freedom to do almost anything - you can make pointers to almost anything, and these pointers can be converted, indexed, recalculated, and manipulated in all manner of other ways - and any of it can cause all manner of undefined behaviour.
There is a list of all undefined behaviour in C here - it lists 186 different circumstances of undefined behaviour, ranging from a backslash as the last character of the file (likely to cause a compiler error, but not defined as one) to "The comparison function called by the bsearch or qsort function returns ordering values inconsistently".
How on earth do you write a compiler to check that the function passed into bsearch or qsort is ordering values consistently? Of course, if the data passed into the comparison function is of a simple type, such as integers, then it's not that difficult, but if the data type is a complex type such as
struct {
    char name[20];
    char street[20];
    int age;
    char post_code[10];
};
and the programmer decides to sort the data based on ascending name, ascending street, descending age and ascending postcode, in that order? If that's what you want, but somehow the code got messed up and the post code comparison returns inconsistent results, things will go wrong, and it's very hard to formally inspect that case. There are lots of others that are similarly obscure and complex. Sure, YOUR code may not sort names and addresses etc, but someone will probably write something like that at some point or another.
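To make the comparison-function point concrete, here is a minimal sketch of a comparator that orders values inconsistently: it compiles cleanly and no mainstream implementation will flag it, yet handing it to qsort is undefined behaviour:

#include <cstdlib>

// Returns "less", "equal" or "greater" at random - an inconsistent ordering.
int bad_cmp(const void*, const void*) {
    return std::rand() % 3 - 1;
}

// qsort(arr, n, sizeof *arr, bad_cmp) is then undefined behaviour:
// it may loop forever, scramble the array, or appear to work.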
This is a C++ disaster, check out this code sample:
#include <iostream>

void func(const int* shouldnotChange)
{
    int* canChange = (int*) shouldnotChange;
    *canChange += 2;
    return;
}

int main() {
    int i = 5;
    func(&i);
    std::cout << i;
    return 0;
}
The output was 7!
So, how can we make sure of the behavior of C++ functions, if it was able to change a supposed-to-be-constant parameter!?
EDIT: I am not asking how I can make sure that my own code works as expected; rather, I am wondering how to trust that someone else's function (for instance some function in some dll library) isn't going to change a parameter or possess some unexpected behavior...
Based on your edit, your question is "how can I trust 3rd party code not to be stupid?"
The short answer is "you can't." If you don't have access to the source, or don't have time to inspect it, you can only trust the author to have written sane code. In your example, the author of the function declaration specifically claims that the code will not change the contents of the pointer by using the const keyword. You can either trust that claim, or not. There are ways of testing this, as suggested by others, but if you need to test large amounts of code, it will be very labour intensive - perhaps more so than reading the code.
If you are working on a team and you have a team member writing stuff like this, then you can talk to them about it and explain why it is bad.
By writing sane code.
If you write code you can't trust, then obviously your code won't be trustworthy.
Similar stupid tricks are possible in pretty much any language. In C#, you can modify the code at runtime through reflection. You can inspect and change private class members. How do you protect against that? You don't, you just have to write code that behaves as you expect.
Apart from that, write a unit test verifying that the function does not change its parameter.
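A minimal sketch of such a test, using the func from the question (since the modification is UB, the assert firing is what happens in practice rather than a guarantee):

#include <cassert>

// The function under test, from the question.
void func(const int* shouldnotChange) {
    int* canChange = (int*) shouldnotChange;
    *canChange += 2;
}

int main() {
    int i = 5;
    func(&i);
    assert(i == 5 && "func must not modify what its argument points to");
    return 0;
}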
The general rule in C++ is that the language is designed to protect you from Murphy, not Machiavelli. In other words, it's meant to keep a maintenance programmer from accidentally changing a variable marked as const, not to keep someone from deliberately changing it, which can be done in many ways.
A C-style cast means all bets are off. It's sort of like telling the compiler "Trust me, I know this looks bad, but I need to do this, so don't tell me I'm wrong." Also, what you've done is actually undefined. Casting off const-ness and then modifying the value means the compiler/runtime can do anything, including e.g. crash your program.
The only thing I can suggest is to allocate the variable shouldnotChange points to from a memory page that is marked as read-only. This will force the OS/CPU to raise an error if the application attempts to write to that memory. I don't really recommend this as a general method of validating functions - it's just an idea you may find useful.
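A POSIX-specific sketch of that idea using mmap and mprotect (on Windows the equivalent would be VirtualProtect). This is a debugging trick, not something for production code:

#include <sys/mman.h>
#include <cstdio>

int main() {
    // Put the value in its own page so we can change the page's protection.
    void* mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 1;
    int* p = static_cast<int*>(mem);
    *p = 5;
    mprotect(p, sizeof(int), PROT_READ);   // page is now read-only
    std::printf("%d\n", *p);               // reading is fine
    // *p = 7;                             // writing would now raise SIGSEGV
    munmap(p, sizeof(int));
    return 0;
}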
The simplest way to enforce this would be to just not pass a pointer:
void func(int shouldnotChange);
Now a copy will be made of the argument. The function can change the value all it likes, but the original value will not be modified.
If you can't change the function's interface then you could make a copy of the value before calling the function:
int i = 5;
int copy = i;
func(&copy);
Don't use C style casts in C++.
We have 4 cast operators in C++, listed here in order of danger (static_cast appears twice because its safety depends entirely on how it is used):
static_cast<> Safe (when used to convert between numeric data types).
dynamic_cast<> Safe (but throws an exception / returns NULL on failure; see the sketch after this list).
const_cast<> Dangerous (when removing const).
static_cast<> Very Dangerous (when used to cast between pointer types. Not a very good idea!!!!!)
reinterpret_cast<> Very Dangerous. Use this only if you understand the consequences.
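For example, the checked nature of dynamic_cast mentioned in the list (a minimal sketch):

#include <iostream>

struct Base { virtual ~Base() = default; };
struct Derived : Base {};
struct Other : Base {};

int main() {
    Base* b = new Other;
    // dynamic_cast consults the actual runtime type: a failed pointer cast
    // yields nullptr instead of undefined behaviour.
    if (Derived* d = dynamic_cast<Derived*>(b))
        std::cout << "b points to a Derived at " << d << '\n';
    else
        std::cout << "not a Derived\n";   // this branch runs
    delete b;
    return 0;
}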
You can always tell the compiler that you know better than it does and the compiler will accept you at face value (the reason being that you don't want the compiler getting in the way when you actually do know better).
Power over the compiler is a two-edged sword. If you know what you are doing, it is a powerful tool that will help you, but if you get things wrong it will blow up in your face.
Unfortunately, the compiler has reasons for most things, so if you override its default behavior you had better know what you are doing. Casting is one of those things. A lot of the time it is fine. But if you start casting away const(ness), then you had better know what you are doing.
(int*) is the casting syntax from C. C++ supports it fully, but it is not recommended.
In C++ the equivalent cast should've been written like this:
int* canChange = static_cast<int*>(shouldnotChange);
And indeed, if you wrote that, the compiler would NOT have allowed such a cast.
What you're doing is writing C code and expecting the C++ compiler to catch your mistake, which is sort of unfair if you think about it.