How to interpret an unsigned integer as a signed one and vice versa in Crystal? - crystal-lang

In C, to interpret the very bits of an integer as that of an integer of another type (not casting the integer, but rather using the same bits for a different type), one can do this:
uint8_t x = 0xFF;
int8_t y = *((int8_t*) &x);
The best I can come up with in Crystal is this, but it's highly unsafe, and y only lives as long as x does (and in fact, you can't reuse the same variable).
x = 0xFF_u8
y = pointerof(x).as(Int8*).value
Is there an idiomatic and safe way to do this conversion in Crystal? I wouldn't mind the value being copied.
(And yeah, you betcha I'm writing an emulator in Crystal!)

You can use unsafe_as, for example:
x = 0xFF_u8 # 255 : UInt8
y = x.unsafe_as(Int8) # -1 : Int8
But as the name says, you have to be careful when using it. To quote the documentation:
This method is unsafe because it behaves unpredictably when the given type doesn't have the same bytesize as the receiver, or when the given type representation doesn't semantically match the underlying bytes.
If you want to detect overflows, prefer to_i8.
x = 0xFF_u8
z = x.to_i8 # raises OverflowError (Arithmetic overflow)
You have to decide whether the first version is safe enough for your use case. While the C code in your question might not be undefined behavior, it should be implementation-dependent behavior. That means it is not portable in a strict sense, but you should get consistent results for a specific architecture. And given that two's complement is now practically a standard, I would expect to see the same behavior on most of today's hardware. The Crystal code should fall into the same category.
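For comparison, here is a minimal C sketch of the same bit reinterpretation done with memcpy instead of a pointer cast, so the value is copied and no pointer to x has to stay alive. The function name reinterpret_u8_as_i8 is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Copy the single byte of x into an int8_t without converting the value.
 * Hypothetical helper name; not part of any library. */
static int8_t reinterpret_u8_as_i8(uint8_t x)
{
    int8_t y;
    memcpy(&y, &x, sizeof y);  /* same bits, different type */
    return y;
}
```

Because the exact-width type int8_t is specified as two's complement without padding bits, reinterpret_u8_as_i8(0xFF) gives -1 wherever int8_t exists.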

Related

Why can't I put a float into a ptr of any type without any kind of conversion going on?

I'm currently writing a runtime for my compiler project and I want general and easy to use struct for encoding different types (the source language is scheme).
My current approach is:
struct SObj {
SType type;
uint64_t *value;
};
Pointers are always 64 or 32 bits wide, so shouldn't it be possible to literally put a float into my value? Then, if I want the actual value of the float, I just take the raw bytes and interpret them as a float.
Thanks in advance.
Not really.
When you write C++ you're programming an abstraction. You're describing a program. Contrary to popular belief, it's not "all just bytes".
Compilers are complex. They can, and will, assume that you follow the rules, and use that assumption to produce the most efficient "actual" code (read: machine code) possible.
One of those rules is that a uint64_t* is a pointer that points to a uint64_t. When you chuck arbitrary bits into there — whether they are identical to the bits that form a valid float, or something else — it is no longer a valid pointer, and simply evaluating it has undefined behaviour.
There are language facilities that can do what you want, like union. But you have to be careful not to violate aliasing rules. You'd store a flag (presumably, that's what your type is) that tells you which union member you're using. Make life easier and have a std::variant instead, which does all this for you.
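The tagged-union idea can be sketched in plain C roughly as follows. The tag names, the payload members and the helper functions are all hypothetical; only SObj and SType come from the question, and this is one possible layout rather than the layout the runtime must use.

```c
#include <stdint.h>

/* Hypothetical tags; a real Scheme runtime would have many more. */
typedef enum { STYPE_INT, STYPE_FLOAT } SType;

typedef struct {
    SType type;          /* tells us which union member is active */
    union {
        int64_t i;
        double  f;
    } as;
} SObj;

static SObj sobj_from_int(int64_t v)
{
    SObj o;
    o.type = STYPE_INT;
    o.as.i = v;
    return o;
}

static SObj sobj_from_float(double v)
{
    SObj o;
    o.type = STYPE_FLOAT;
    o.as.f = v;
    return o;
}

/* Always read the member that the tag says was written last. */
static double sobj_as_double(SObj o)
{
    return o.type == STYPE_FLOAT ? o.as.f : (double)o.as.i;
}
```

The point of the tag is that the program never reads a member other than the one it stored, so no reinterpretation of bits is involved at all.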
That being said, you can std::memcpy/std::copy bits in and copy bits out, in say a uint64_t as long as they are a valid representation of the type you've chosen on your system. Just don't expect reinterpret_cast to be valid: it won't be.
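A minimal C sketch of copying bits in and out of a uint64_t, assuming float is no larger than 64 bits (checked at compile time). The helper names pack_float and unpack_float are invented for this example.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) <= sizeof(uint64_t), "float must fit in 64 bits");

/* Store the object representation of f in the first bytes of a uint64_t. */
static uint64_t pack_float(float f)
{
    uint64_t bits = 0;
    memcpy(&bits, &f, sizeof f);
    return bits;
}

/* Copy the same bytes back out into a float. */
static float unpack_float(uint64_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Since the same bytes are copied in and out, the round trip does not depend on byte order. The numeric value of the intermediate uint64_t does, so it should be treated as opaque storage.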
Pointers are always 64 or 32 bits wide
No.
so shouldn't it be possible to literally put a float into my value?
Yes, that is possible, although it is very strongly advised against. C++ has many, many other facilities so you do not have to resort to such things yourself. Anyway, you can interpret the bytes inside a pointer as another type. Like this:
static_assert(sizeof(float*) >= sizeof(float));
static_assert(std::is_trivially_copyable<float>::value); // overdramatic; this is what memcpy actually requires
float *ptr; // just allocate sizeof(float*) bytes on stack
float a = 5;
// use the memory of the pointer to store float value
std::memcpy(&ptr, &a, sizeof(float));
float b;
std::memcpy(&b, &ptr, sizeof(float));
a == b; // true

Correct way to initialize a complex array to zero

What would be a quick and accurate way of initializing a complex array? Are there any differences in the following:
complex(real64) :: x(2,2)
x = 0
x = 0d0
x = (0d0,0d0)
Are they the same and can I assume I get the same result on every compiler? Does the standard say anything about it? The question is specifically about initializing to zero, not any other number.
Before coming to the real answer, a note about terminology. In Fortran what you have here is not initialization but assignment. These are two distinct things. Yes, it's clear what is meant here but it's perhaps good to become aware of the distinction.
The lines like x=0 are intrinsic assignment statements. To the left of = is the variable and to the right the expression. For each of these in the question there are two considerations:
the variable and expression may differ in type or type parameter;
the variable is an array and the expression a scalar.
As stated in a related answer, when the variable and expression differ in type or type parameter (but are type conformable as in this case) the expression is converted to the type and type parameter of the variable.
The first two assignments certainly involve type conversion: 0 is an integer and 0d0 is a (double precision) real. Although (0d0, 0d0) is a complex - so there is no type conversion - this will be of different kind unless double precision is the same as real(real64).
So, for at least the first two we have conversion like the equivalent
x = CMPLX(0, KIND=real64)
x = CMPLX(0d0, KIND=real64)
and for the third perhaps
x = CMPLX((0d0, 0d0), KIND=real64)
With
x = (0._real64, 0._real64)
we can be sure the kinds of the variable and the expression are the same.
As long as CMPLX(0, KIND=real64) has the same value as (0._real64, 0._real64) you can be sure that the assignment x=0 (for now, for scalar x) has the same effect as x=(0._real64, 0._real64). This is mandated by the Fortran standard.
Zero is somewhat of a special case in that it appears in every model number set, but whenever a (mathematical) number is used which can be exactly represented in all three the effect will be the same. Equally,
x = 3.14_real64
and
x = (3.14_real64, 0._real64)
are equivalent.
Perhaps worth noting, though, is that some compilers with some options may offer warnings about implicit conversion. That is one difference to be observed even with the numeric values being equivalent.
Coming to the array/scalar aspect: the expression is treated as though it is an array of the same shape as the variable with every element equal to that scalar value.
To conclude: each of those three assignments has the same Fortran-effect. After the assignment x is an array of given shape with each element having complex value (0._real64, 0._real64).
Now, that isn't to say that there won't be a possibility of something exciting happening at a low level, and that zero-setting may be a special case. But that's something between you and your compiler (and system).

Shift-left a 'double' operand

The following function shifts-left a double operand:
double shl(double x,unsigned long long n)
{
unsigned long long* p = (unsigned long long*)&x;
*p += n << 52;
return x;
}
Is this function guaranteed to work correctly on all platforms?
You may assume a valid combination of x and n (i.e., x*2^n does not overflow).
In other words, I am asking whether or not the language standard dictates the following:
Exponent bits of type double are 52,53,54,55,56,57,58,59,60,61,62
Bit-size of double equals bit-size of unsigned long long equals 64
Strict-aliasing rule for double and unsigned long long is not broken
If the answers for C and for C++ are different, then I would like to know each one of them.
No, neither C nor C++ dictates the representation of floating-point types in the language standard, although the IEEE format is recommended and a macro (__STDC_IEC_559__) is supposed to be available to detect whether it is in use.
Your solution has multiple problems in addition to different representations. You already spotted the strict aliasing violation... the optimizer might turn your whole function into a no-op, since no double is modified between the beginning of the function and the return value, it can assume x isn't changed. You could additionally have an overflow problem -- you would need some form of saturating arithmetic that doesn't allow the result to carry into the sign bit.
However, you don't need to mess with any of that, since the standard library already contains the function you're trying to write.
It is named ldexp (and ldexpf for float).
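A minimal sketch of the replacement, wrapping ldexp from <math.h>. The wrapper name shl_double is hypothetical and only there to mirror the function in the question.

```c
#include <math.h>

/* Scale x by 2^n through the standard library instead of editing exponent bits.
 * Works for any floating-point representation the implementation uses. */
static double shl_double(double x, int n)
{
    return ldexp(x, n);
}
```

For example, shl_double(1.5, 4) computes 1.5 * 2^4 = 24.0. Unlike the bit-twiddling version, ldexp also handles zero, subnormals and overflow as the library defines them.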
For C++:
No. C++ does not require IEEE floating point. It doesn't even require binary exponents.
Absolutely not. Unsigned long long may be more than 64 bits. Double need not be 64 bits.
That sort of type punning is unsafe.
This code definitely doesn't shift a double operand to the left. It does some kind of bit manipulation, probably in the hope that the exponent of a double number would be changed.
As it is, the code invokes undefined behaviour, because an lvalue is written using the type unsigned long long and then read using the type double. As a result, anything could happen. That's the most unportable code you could get.

Why can I cast int and BOOL to void*, but not float?

void* is a useful feature of C and derivative languages. For example, it's possible to use void* to store objective-C object pointers in a C++ class.
I was working on a type conversion framework recently and due to time constraints was a little lazy - so I used void*... That's how this question came up:
Why can I typecast int to void*, but not float to void* ?
BOOL is not a C++ type. It's probably typedef or defined somewhere, and in these cases, it would be the same as int. Windows, for example, has this in Windef.h:
typedef int BOOL;
so your question reduces to, why can you typecast int to void*, but not float to void*?
int to void* is OK because they are inherently the same in representation, although it is generally not recommended (and some compilers will warn about it). A pointer is basically an integer that points to an address in memory.
float to void* is not ok because the interpretation of the float value and the actual bits representing it are different. For example, if you do:
float x = 1.0;
what it does is it sets the 32 bit memory to 00 00 80 3f (the actual representation of the float value 1.0 in IEEE single precision). When you cast a float to a void*, the interpretation is ambiguous. Do you mean the pointer that points to location 1 in memory? or do you mean the pointer that points to location 3f800000 (assuming little endian) in memory?
Of course, if you are sure which of the two cases you want, there is always a way to get around the problem. For example:
void* u = (void*)((int)x); // first case
void* u = (void*)(((unsigned short*)(&x))[0] | (((unsigned int)((unsigned short*)(&x))[1]) << 16)); // second case
Pointers are usually represented internally by the machine as integers. C allows you to cast back and forth between pointer type and integer type. (A pointer value may be converted to an integer large enough to hold it, and back.)
Using void* to hold integer values is unconventional. It's not guaranteed by the language to work, but if you want to be sloppy and constrain yourself to Intel and other commonplace platforms, it will basically scrape by.
Effectively what you're doing is using void* as a generic container of however many bytes are used by the machine for pointers. This differs between 32-bit and 64-bit machines. So converting long long to void* would lose bits on a 32-bit platform.
As for floating-point numbers, the intention of (void*) 10.5f is ambiguous. Do you want to round 10.5 to an integer, then convert that to a nonsense pointer? No, you want the bit-pattern used by the FPU to be placed into a nonsense pointer. This can be accomplished by assigning float f = 10.5f; void *vp = (void *)(uintptr_t) *(uint32_t *) &f;, but be warned that this is just nonsense: pointers aren't generic storage for bits.
The best generic storage for bits is char arrays, by the way. The language standards guarantee that memory can be manipulated through char*. But you have to mind data alignment requirements.
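A short sketch of the char-array approach, with hypothetical helper names. The bytes are copied with memcpy, so the alignment of the buffer does not matter here.

```c
#include <string.h>

/* Copy the object representation of f into a byte buffer. */
static void float_to_bytes(float f, unsigned char out[sizeof(float)])
{
    memcpy(out, &f, sizeof f);
}

/* Rebuild the float from the same bytes. */
static float float_from_bytes(const unsigned char in[sizeof(float)])
{
    float f;
    memcpy(&f, in, sizeof f);
    return f;
}

/* Convenience wrapper: push a float through the byte buffer and back. */
static float roundtrip_through_bytes(float f)
{
    unsigned char buf[sizeof(float)];
    float_to_bytes(f, buf);
    return float_from_bytes(buf);
}
```

Alignment becomes a concern only if the buffer is accessed through a float* directly instead of being copied out.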
The standard says (C99 6.3.2.3): "An integer may be converted to any pointer type." It doesn't say anything about pointer-float conversion.
If you want to transfer a float value as a void *, there is a workaround using type punning.
Here is an example:
struct mfloat {
union {
float fvalue;
int ivalue;
};
};
void print_float(void *data)
{
struct mfloat mf;
mf.ivalue = (int)(intptr_t)data; // go through intptr_t so the pointer-to-int size mismatch is explicit
printf("%.2f\n", mf.fvalue);
}
struct mfloat mf;
mf.fvalue = 1.99f;
print_float((void *)(intptr_t)mf.ivalue);
We have used a union to pass our float value (fvalue) as an integer (ivalue) through a void*, and vice versa.
The question is based on a false premise, namely that void * is somehow a "generic" or "catch-all" type in C or C++. It is not. It is a generic object pointer type, meaning that it can safely store pointers to any type of data, but it cannot itself contain any type of data.
You could use a void * pointer to generically manipulate data of any type by allocating sufficient memory to hold an object of any given type, then using a void * pointer to point to it. In some cases you could also use a union, which is of course designed to be able to contain objects of multiple types.
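As a rough illustration of the first option, here the void * really is a pointer: it points at heap memory that holds the float. The names box_float and unbox_float_and_free are made up for this sketch.

```c
#include <stdlib.h>

/* Allocate storage for one float and return it as a generic object pointer.
 * Returns NULL if the allocation fails. */
static void *box_float(float f)
{
    float *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = f;
    }
    return p;
}

/* Read the float back and release the storage. The caller must know that
 * p came from box_float and must not pass NULL. */
static float unbox_float_and_free(void *p)
{
    float f = *(float *)p;
    free(p);
    return f;
}
```

Converting float * to void * and back is guaranteed to yield the original pointer, so nothing here depends on how floats or pointers are represented.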
Now, because pointers can be thought of as integers (and indeed, on conventionally-addressed architectures, typically are integers) it is possible and in some circles fashionable to stuff an integer into a pointer. Some library APIs have even documented and supported this usage — one notable example was X Windows.
Conversions between pointers and integers are implementation-defined, and these days typically draw warnings, and so typically require an explicit cast, not so much to force the conversion as simply to silence the warning. For example, both the code fragments below print 77, but the first one probably draws compiler warnings.
/* fragment 1: */
int i = 77;
void *p = i;
int j = p;
printf("%d\n", j);
/* fragment 2: */
int i = 77;
void *p = (void *)(uintptr_t)i;
int j = (int)(uintptr_t)p;
printf("%d\n", j);
In both cases, we are not really using the void * pointer p as a pointer at all: we are merely using it as a vessel for some bits. This relies on the fact that on a conventionally-addressed architecture, the implementation-defined behavior of a pointer/integer conversion is the obvious one, which to an assembly-language programmer or an old-school C programmer doesn't seem like a "conversion" at all. And if you can stuff an int into a pointer, it's not surprising if you can stuff in other integral types, like bool, as well.
But what about trying to stuff a floating-point value into a pointer? That's considerably more problematic. Stuffing an integer value into a pointer, though implementation-defined, makes perfect sense if you're doing bare-metal programming: you're taking the numeric value of the integer, and using it as a memory address. But what would it mean to try to stuff a floating-point value into a pointer?
It's so meaningless that the C Standard doesn't even label it "undefined".
It's so meaningless that a typical compiler won't even attempt it.
And if you think about it, it's not even obvious what it should do.
Would you want to use the numeric value, or the bit pattern, as the thing to try to stuff into the pointer? Stuffing in the numeric value is closer to how floating-point-to-integer conversions work, but you'd lose your fractional part. Using the bit pattern is what you'd probably want, but accessing the bit pattern of a floating-point value is never something that C makes easy, as generations of programmers who have attempted things like
uint32_t hexval = (uint32_t)3.0;
have discovered.
Nevertheless, if you were bound and determined to store a floating-point value in a void * pointer, you could probably accomplish it, using sufficiently brute-force casts, although the results are probably both undefined and machine-dependent. (That is, I think there's a strict aliasing violation here, and if pointers are bigger than floats, as of course they are on a 64-bit architecture, I think this will probably only work if the architecture is little-endian.)
float f = 77.75;
void *p = (void *)(uintptr_t)*(uint32_t *)&f;
float f2 = *(float *)&p;
printf("%f\n", f2);
dmr help me, this actually does print 77.75 on my machine.
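For what it's worth, a variant of the same trick that avoids both the aliasing violation and the endianness dependence might look like the sketch below. It still relies on the implementation-defined integer/pointer conversion round-tripping, and it assumes a 32-bit float and a uintptr_t of at least 32 bits. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");
_Static_assert(sizeof(uintptr_t) >= sizeof(uint32_t), "uintptr_t too small");

/* Copy the float's bits into an integer, then convert that integer to a pointer. */
static void *float_to_ptr(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (void *)(uintptr_t)bits;  /* implementation-defined conversion */
}

/* Reverse the steps: pointer to integer, integer bits to float. */
static float ptr_to_float(void *p)
{
    uint32_t bits = (uint32_t)(uintptr_t)p;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The resulting pointer is still only a vessel for bits and must never be dereferenced.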

Double data type represented by int data type (low, high) struct in a union

Is the following a valid representation? I'm aware of byte order, this is a Windows environment. If I define Int32Double myVar; will myVar.int32.low always be the same if myVar.d is a computed value?
E.g.: myVar.d = 0.4 * log(4); printf("%08X\n", myVar.int32.low);
union Int32Double
{
struct
{
int low;
int high;
} int32;
double d;
};
No, it's undefined behavior writing into d and reading from int32.
Firstly, the object representations of integral types and floating-point types are typically very different. Reinterpreting any part of double object as an int object will not usually produce any value that would resemble the original double value. The result will not be meaningful, unless you really know what you are doing. And if one does know what one's doing, one uses unsigned integral types for reinterpretation.
Secondly, using unions for memory reinterpretation is illegal in C++. It leads to undefined behavior. One of the latest technical corrigenda to the C99 specification actually made it legal in the C language (with implementation-defined behavior, of course, and as long as we don't attempt to access a trap representation). But AFAIK it is not in C++ yet. So, use at your own risk.
P.S. I'm not sure what you mean by your "will always be the same"...
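A hedged sketch of reading the low and high halves without a union, using memcpy and unsigned types as suggested above. It assumes a 64-bit double whose byte order matches that of 64-bit integers, which holds on mainstream platforms but is not guaranteed by the standard. The helper names are invented.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes a 64-bit double");

/* Copy the double's object representation into an unsigned 64-bit integer. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Least significant 32 bits of the representation. */
static uint32_t double_low32(double d)
{
    return (uint32_t)(double_bits(d) & 0xFFFFFFFFu);
}

/* Most significant 32 bits (sign, exponent, top of the mantissa in IEEE 754). */
static uint32_t double_high32(double d)
{
    return (uint32_t)(double_bits(d) >> 32);
}
```

Under the additional assumption of IEEE 754 binary64, double_high32(1.0) is 0x3FF00000 and double_low32(1.0) is 0. Whether a computed value such as 0.4 * log(4) always produces the same low word is a separate question about floating-point reproducibility, which this sketch does not answer.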