How do I convert between numeric types safely and idiomatically? - casting

Editor's note: This question is from a version of Rust prior to 1.0 and references some items that are not present in Rust 1.0. The answers still contain valuable information.
What's the idiomatic way to convert from (say) a usize to a u32?
For example, casting with 4294967295us as u32 works, and the Rust 0.12 reference docs on type casting say:
A numeric value can be cast to any numeric type. A raw pointer value can be cast to or from any integral type or raw pointer type. Any other cast is unsupported and will fail to compile.
but 4294967296us as u32 will silently overflow and give a result of 0.
I found ToPrimitive and FromPrimitive which provide nice functions like to_u32() -> Option<u32>, but they're marked as unstable:
#[unstable(feature = "core", reason = "trait is likely to be removed")]
What's the idiomatic (and safe) way to convert between numeric (and pointer) types?
The platform-dependent size of isize / usize is one reason why I'm asking this question - the original scenario was that I wanted to convert from u32 to usize so I could represent a tree in a Vec<u32> (e.g. let t = vec![0u32, 0u32, 1u32]; then the grand-parent of node 2 would be t[t[2us] as usize]), and I wondered how it would fail if usize were less than 32 bits.

Converting values
From a type that fits completely within another
There's no problem here. Use the From trait to be explicit that there's no loss occurring:
fn example(v: i8) -> i32 {
    i32::from(v) // or v.into()
}
You could choose to use as, but it's recommended to avoid it when you don't need it (see below):
fn example(v: i8) -> i32 {
    v as i32
}
From a type that doesn't fit completely in another
There isn't a single method that makes general sense - you are asking how to fit two things in a space meant for one. One good initial attempt is to use an Option — Some when the value fits and None otherwise. You can then fail your program or substitute a default value, depending on your needs.
Since Rust 1.34, you can use TryFrom:
use std::convert::TryFrom;

fn example(v: i32) -> Option<i8> {
    i8::try_from(v).ok()
}
Before that, you'd have to write similar code yourself:
fn example(v: i32) -> Option<i8> {
    if v > std::i8::MAX as i32 || v < std::i8::MIN as i32 {
        None
    } else {
        Some(v as i8)
    }
}
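Either way, the caller decides what to do when the value doesn't fit, for example by substituting a default. A minimal sketch (the helper name and the fallback value 0 are just for illustration):

use std::convert::TryFrom;

fn example_with_default(v: i32) -> i8 {
    i8::try_from(v).unwrap_or(0) // fall back to 0 when v is out of range for i8
}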
From a type that may or may not fit completely within another
The range of numbers isize / usize can represent changes based on the platform you are compiling for. You'll need to use TryFrom regardless of your current platform.
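For example, for the u32 → usize case from the question, a sketch using TryFrom (the helper name and the expect message are illustrative):

use std::convert::TryFrom;

fn index_from(v: u32) -> usize {
    // Only fails on targets where usize is smaller than 32 bits.
    usize::try_from(v).expect("u32 does not fit in usize on this target")
}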
See also:
How do I convert a usize to a u32 using TryFrom?
Why is type conversion from u64 to usize allowed using `as` but not `From`?
What as does
but 4294967296us as u32 will silently overflow and give a result of 0
When converting to a smaller type, as just takes the lower bits of the number, disregarding the upper bits, including the sign:
fn main() {
    let a: u16 = 0x1234;
    let b: u8 = a as u8;
    println!("0x{:04x}, 0x{:02x}", a, b); // 0x1234, 0x34

    let a: i16 = -257;
    let b: u8 = a as u8;
    println!("0x{:04x}, 0x{:02x}", a, b); // 0xfeff, 0xff
}
See also:
What is the difference between From::from and as in Rust?
About ToPrimitive / FromPrimitive
RFC 369, Num Reform, states:
Ideally [...] ToPrimitive [...] would all be removed in favor of a more principled way of working with C-like enums
In the meantime, these traits live on in the num crate:
ToPrimitive
FromPrimitive
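For instance, a minimal sketch assuming the num-traits crate is added as a dependency; its to_u32() returns an Option, much like the original unstable method:

use num_traits::ToPrimitive;

fn main() {
    assert_eq!(1234u64.to_u32(), Some(1234));    // fits in u32
    assert_eq!(4_294_967_296u64.to_u32(), None); // doesn't fit in u32
}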

Related

The best way in C++ to cast different signedness types each other?

There is a uint64_t data field sent by the communication peer; it carries an order ID that I need to store into a PostgreSQL 11 DB that does NOT support unsigned integer types. Although a real value may exceed 2^63, I think an INT8 field in PostgreSQL 11 can hold it, if I do some casting carefully.
Let's say we have:
uint64_t order_id = 123; // received
int64_t to_db; // to be written into the db
I plan to use one of the following methods to cast a uint64_t value into an int64_t value:
to_db = order_id; // directly assigning;
to_db = (int64_t)order_id; //c-style casting;
to_db = static_cast<int64_t>(order_id);
to_db = *reinterpret_cast<const int64_t*>( &order_id );
and when I need to load it from the db, I can do a reversed casting.
I know they all work; I'm just interested in which one meets the C++ standard most closely.
In other words, which method will always work on any 64-bit platform with any compiler?
It depends on where it would be compiled and run... none of those are fully portable without C++20 support.
The safest way without that would be to do the conversion yourself by shifting the range of values, something like this:
int64_t to_db = (order_id > (uint64_t)LLONG_MAX)
    ? int64_t(order_id - (uint64_t)LLONG_MAX - 1)
    : int64_t(order_id) + LLONG_MIN;
uint64_t from_db = (to_db < 0)
    ? uint64_t(to_db - LLONG_MIN)
    : uint64_t(to_db) + (uint64_t)LLONG_MAX + 1;
If order_id is greater than (2^63 - 1), then order_id - (uint64_t)LLONG_MAX - 1 yields a non-negative value that fits in int64_t. If not, then the cast to signed is well defined and adding LLONG_MIN shifts the value into the negative range.
During the reverse conversion, to_db - LLONG_MIN places the value back into the [0, LLONG_MAX] range, and the other branch adds 2^63 back onto the non-negative values.
and do the opposite on reading. The database platform or compiler you use may do something awful with the binary representation of unsigned values when converting them to signed, not to mention that different formats of signed integers do exist.
For the same reason, inter-platform protocols often involve the use of string formatting or a "least bit's value" for representing floating point values as integers, i.e. as encoded fixed point.
I would go with memcpy. It avoids undefined behavior (though see the caveats discussed in the comments), and compilers typically optimize the byte copy away:
#include <cstring> // memcpy

int64_t uint64_t_to_int64_t(uint64_t u)
{
    int64_t i;
    memcpy(&i, &u, sizeof(int64_t));
    return i;
}

to_db = uint64_t_to_int64_t(order_id);
GCC with -O2 generated the optimal assembly for uint64_t_to_int64_t:
mov rax, rdi
ret
Live demo: https://godbolt.org/z/Gbvhzh
All four methods will always work, as long as the value is within range. The first will generate warnings on many compilers, so should probably not be used. The second is more a C idiom than a C++ idiom, but is widely used in C++. The last one is ugly and relies on subtle details from the standard, and should not be used.
This function seems UB-free
int64_t fromUnsignedTwosComplement(uint64_t u)
{
    if (u <= std::numeric_limits<int64_t>::max()) return static_cast<int64_t>(u);
    else return -static_cast<int64_t>(-u);
}
It reduces to a no-op under optimisations.
Conversion in the other direction is a straight cast to uint64_t. It is always well-defined.

Rust: How to cast from a signed integer type to a larger signed integer type *without* sign extension

Suppose we have an i8 which we want to cast to an i16 without sign extension.
We can't use a simple as cast, as this will sign-extend:
println!("{:b}", -128i8); // 10000000 <- we want this zero-extended in an i16.
println!("{:b}", -128i8 as i16); // 1111111110000000 sign extended :(
We cannot transmute, because the types are of different size:
println!("{:b}", unsafe {mem::transmute::<_, i16>(128i8)});
// ^ error[E0512]: cannot transmute between types of different sizes, or dependently-sized types :(
The best I've come up with is the following convoluted casting chain:
println!("{:b}", -128i8 as u8 as u16 as i16); // 10000000 :), but :( because convoluted.
The intermediate cast to u8 means that casting up to a u16 will zero-extend instead of sign extend, then casting from u16 to i16 is fine, as the types are the same size and no extension is required.
But there must be a better way? Is there?
Keep in mind that for this case (printing the number's bits) you can just do the unsigned conversion.
However, in general Rust doesn't really like doing implicit conversions, but you can always write
let n : i8 = -128;
let m : i32 = n as u8 as i32;
You pretty much can't get better than this in general, as double casting is common in areas like changing pointer types. Also, consider not using unsafe when the operation can be done safely without downsides (aside from maybe a slight code smell).
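If you do this in more than one place, the double cast can be wrapped in a small helper so the intent is named once (a sketch; the helper name is made up):

fn zero_extend_i8_to_i16(v: i8) -> i16 {
    i16::from(v as u8) // u8 -> i16 is lossless, so From zero-extends
}

fn main() {
    println!("{:b}", zero_extend_i8_to_i16(-128)); // 10000000
}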

Encoding tagged unions (sum types) in LLVM

I'm trying to encode a tagged union (also known as a sum type) in LLVM and it doesn't seem possible while keeping the compiler frontend platform agnostic. Imagine I had this tagged union (expressed in Rust):
enum T {
    C1(i32, i64),
    C2(i64)
}
To encode this in LLVM I need to know the size of the largest variant. That in turn requires that I know the alignment and size of all fields. In other words, my frontend would need to
track the size and alignment of all things,
create a dummy struct (properly padded) to represent the biggest type that can fit any variant (e.g. {[2 x i64]}, assuming the tag can fit in the same word as the i32 field),
and finally either use packed structs or tell LLVM which "data layout" I assumed, so my computations match LLVM's.
What is the best way currently to encode tagged unions in LLVM?
Conceptually, I don't think there's a better way than what you described, except that I wouldn't bother with using the constructed type at the declaration site, since actually accessing the union would be easiest to do through a bitcast anyway.
Here's a code snippet from Clang's getTypeExpansion(), showing it also does this - manually finding the largest field:
const FieldDecl *LargestFD = nullptr;
CharUnits UnionSize = CharUnits::Zero();
for (const auto *FD : RD->fields()) {
  // Skip zero length bitfields.
  if (FD->isBitField() && FD->getBitWidthValue(Context) == 0)
    continue;
  assert(!FD->isBitField() &&
         "Cannot expand structure with bit-field members.");
  CharUnits FieldSize = Context.getTypeSizeInChars(FD->getType());
  if (UnionSize < FieldSize) {
    UnionSize = FieldSize;
    LargestFD = FD;
  }
}
if (LargestFD)
  Fields.push_back(LargestFD);

C++: Why is the value assignment interpretation always int?

I'd like to assign a value to a variable like this:
double var = 0xFFFFFFFF;
As a result, var gets the value 4294967295.0 assigned. Since the compiler assumes a 64-bit target system, the number literal (i.e. all respective 32 bits) is interpreted as significand (precision) bits. However, since 0xFFFF FFFF is just a notation for a bit pattern, without any hint about the representation, it could be interpreted quite differently with respect to becoming a floating point value. Thus, I was wondering if there is a way to manipulate this fixed interpretation of the value. In other words, give a hint about the desired representation. (Maybe someone could also point me to the part in the standard where this implicit interpretation is defined.)
So far, the default precision interpretation on my system seems to be
(int)0xFFFFFFFF × 10^0.
Only the fraction field is getting filled.¹
So maybe (here: for 16-bit cross-compilation) I want it to be a different representation like:
(int)0xFFFFFF × 10^((int)0xFF)
(ignoring the sign bit for a moment).
Thus my question: How can I force a custom double interpretation of the hex literal notation?
¹ Even when my hex literal is 0xFFFF FFFF FFFF FFFF, the value is only interpreted as the fraction part - so clearly, bits should be used for the exponent and sign fields. But it seems the literal just gets cut off.
C++ doesn't specify the in-memory representation of double; moreover, it doesn't even specify the in-memory representation of integer types (and it can really be different on systems with different endianness). So if you want to interpret the bytes 0xFF, 0xFF as a double, you can do something like:
uint8_t bytes[sizeof(double)] = {0xFF, 0xFF};
double var;
memcpy(&var, bytes, sizeof(double));
Note that using unions or reinterpret_casting pointers is, strictly speaking, undefined behavior, though in practice it also works.
"I was wondering if there is a way to manipulate this interpretation."
Yes, you can use a reinterpret_cast<double&> via address, to force type (re-)interpretation from a certain bit pattern in memory.
"Thus my question: How can I force double interpretation of the hex notation?"
You can also use a union, to make it clearer:
union uint64_2_double {
    uint64_t bits;
    double dValue;
};

uint64_2_double x;
x.bits = 0x000000000000FFFF;
std::cout << x.dValue << std::endl;
There does not seem to be a direct way to initialize a double variable with a hexadecimal pattern: the C-style cast is equivalent to a C++ static_cast, and reinterpret_cast will complain that it can't perform the conversion. I will give you two options: one simple solution that does not initialize the variable directly, and a complicated one. You can do the following:
double var;
*reinterpret_cast<long *>(&var) = 0xFFFF;
Note: watch out, I would expect you to want to initialize all 64 bits of the double; your constant 0xFFFF seems small, and it gives 3.23786e-319.
A literal value that begins with 0x is a hexadecimal number of type unsigned int. You should use the suffix ul to make it a literal of unsigned long, which on most architectures will mean a 64-bit unsigned; or #include <stdint.h> and write, for example, uint64_t(0xABCDFE13).
Now for the complicated stuff: In old C++ you can program a function that converts the integral constant to a double, but it won't be constexpr.
In constexpr functions you can't use reinterpret_cast. So your only choice for making a constexpr converter to double is to use a union in the middle, for example:
struct longOrDouble {
    union {
        unsigned long asLong;
        double asDouble;
    };
    constexpr longOrDouble(unsigned long v) noexcept : asLong(v) {}
};

constexpr double toDouble(long v) { return longOrDouble(v).asDouble; }
This is a bit complicated, but this answers your question. Now, you can write:
double var = toDouble(0xFFFF);
And this will insert the given binary pattern into the double.
Using union to write to one member and read from another is undefined behavior in C++, there is an excellent question and excellent answers on this right here:
Accessing inactive union member and undefined behavior?

How to cast a byte array to a primitive type in Rust?

How would I cast a byte array to a primitive type in Rust?
let barry = [0, 0];
let shorty: u16 = barry;
I have already tried let shorty: u16 = barry as u16; but that didn't work due to a "non scalar cast".
You can use bitwise operations. Note that this depends on endianness.
fn main() {
    let barry: [u8; 2] = [0, 0];
    // Widen each byte before shifting; this assumes little-endian byte order.
    let shorty: u16 = barry[0] as u16 | ((barry[1] as u16) << 8);
    println!("{}", shorty);
}
There is an unsafe way to do it with raw pointers too. The benefit is that it works for any type that exists only on the stack. It's completely safe as long as the byte array is correctly formatted, since no pointers are being kept around or modified. Just make sure that no mutable references also exist while this is going on; I would recommend wrapping it in a function that takes a reference to a byte array and a type parameter (with the Clone trait) so that the borrow checker can deal with that case.
let bytes = [0u8, 0u8];
let value = unsafe {
    let byte_ptr = bytes.as_ptr();
    let ptr = byte_ptr as *const u16; // This could be any type
    // We clone it so that no matter what type it is, it gets dereferenced.
    (*ptr).clone()
};
The byteorder crate is great for this. You can specify endianness, etc.
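A minimal sketch, assuming the byteorder crate is listed in Cargo.toml; on Rust 1.32 and later the standard library's u16::from_le_bytes / from_be_bytes can do the same without a dependency:

use byteorder::{ByteOrder, LittleEndian};

fn main() {
    let barry = [0x34u8, 0x12u8];
    let shorty = LittleEndian::read_u16(&barry);
    assert_eq!(shorty, 0x1234);

    // Standard-library equivalent (Rust 1.32+), no external crate needed:
    assert_eq!(u16::from_le_bytes(barry), 0x1234);
}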