For legacy reasons I have code that casts a double to an unsigned byte, and we are seeing a significant difference between two platforms. Short of saying "don't try to stuff a signed value into an unsigned value; don't try to stuff a double into an integer", is there something else that can be done?
unsigned char newR1 = -40.16;
The value of newR1 is 216 on Windows (as we have long expected), but on ARM64 it is 0.
Disassembly on Win:
00007FF75E388818 cvttsd2si eax,mmword ptr [R]
00007FF75E38881D mov byte ptr [newR1],al
On ARM64
00007FF6F9E800DC ldr d16,[sp,#0x38]
00007FF6F9E800E0 fcvtzu w8,d16
00007FF6F9E800E4 uxtb w8,w8
00007FF6F9E800E8 strb w8,[sp,#0x43]
I will try these as well, but I just wanted some other opinions:
unsigned char newR1 = -40.16;
unsigned char newR2 = (int)-40.16;
unsigned char newR3 = (unsigned char)-40.16;
unsigned char newR4 = static_cast<int>(-40.16);
or maybe
int i = -40.16;
unsigned char c = i;
What the C standard says (and there's similar text in the C++ one):
When a finite value of real floating type is converted to an integer
type other than _Bool, the fractional part is discarded (i.e., the
value is truncated toward zero). If the value of the integral part
cannot be represented by the integer type, the behavior is undefined.
So, getting 216 out of -40.16 with a single cast from double to unsigned char is already UB; in fact, getting any result at all in this case is UB. That is why the compiler is free to produce anything, not necessarily the 216 you desire.
You may want to do two casts:
(unsigned char)(int)-40.16
Again, the first cast (to int) is still subject to the restriction quoted above, but here it is fine: -40.16 truncates to -40, which is representable in int, and the subsequent int-to-unsigned-char conversion is well defined (it wraps modulo UCHAR_MAX + 1), yielding 216.
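For instance, a minimal sketch of the two-cast approach (well defined here because the truncated value fits in int, and int-to-unsigned-char conversion wraps modulo 256):

#include <stdio.h>

int main(void) {
    double r = -40.16;
    int i = (int)r;                     /* well defined: -40 fits in int */
    unsigned char c = (unsigned char)i; /* wraps modulo 256: 256 - 40 = 216 */
    printf("%d %d\n", i, c);            /* prints: -40 216 */
    return 0;
}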
Related
I am trying to print a char as a positive value:
char ch = 212;
printf("%u", ch);
but I get:
4294967252
How can I get 212 in the output?
Declare your ch as
unsigned char ch = 212;
And your printf will work.
This is because the char type is signed on your system*. When this happens, the value gets sign-extended during the default argument promotions while being passed to a function with a variable number of arguments. Since 212 is at least 0x80, the high bit is set, the value is treated as negative, and %u interprets the resulting bit pattern as a large positive number:
212 = 0xD4
When it is sign-extended, FF bytes are prepended to your number, so it becomes
0xFFFFFFD4 = 4294967252
which is the number that gets printed.
Note that this behavior is specific to your implementation. According to the C99 specification, all char types are promoted to (signed) int, because an int can represent all values of a char, signed or unsigned:
6.1.1.2: If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int.
This results in passing an int to a format specifier %u, which expects an unsigned int.
To avoid undefined behavior in your program, add explicit type casts as follows:
unsigned char ch = (unsigned char)212;
printf("%u", (unsigned int)ch);
* In general, the standard leaves the signedness of char up to the implementation. See this question for more details.
There are two bugs in this code. First, in most C implementations with signed char, there is a problem in char ch = 212 because 212 does not fit in an 8-bit signed char, and the C standard does not fully define the behavior (it requires the implementation to define the behavior). It should instead be:
unsigned char ch = 212;
Second, in printf("%u",ch), ch will be promoted to an int in normal C implementations. However, the %u specifier expects an unsigned int, and the C standard does not define behavior when the wrong type is passed. It should instead be:
printf("%hhu", ch);
(For %hhu, printf expects an unsigned char that has, in normal C implementations, been promoted to int.)
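Putting both fixes together, a minimal sketch:

#include <stdio.h>

int main(void) {
    unsigned char ch = 212; /* fits: unsigned char holds at least 0..255 */
    printf("%hhu\n", ch);   /* prints 212; %hhu matches the promoted argument */
    return 0;
}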
In case you cannot change the declaration for whatever reason, you can do:
char ch = 212;
printf("%d", (unsigned char) ch);
The range of char here is -128 to 127. If you assign 212, ch stores -44 (212 - 256), not 212. If you then try to print a negative number as unsigned, you get UINT_MAX + 1 - abs(number), which in this case is 4294967252.
So if you want to store 212 as-is in ch, the only thing you can do is declare ch as
unsigned char ch;
now the range of ch is 0 to 255.
Because char is signed by default on your implementation, the range of the variable is
-128 to +127
so your value overflows. To get the desired value you have to declare the variable with the unsigned modifier. An unsigned char's range is:
0 to 255
To get the range of any integer type, raise 2 to the number of bits: char is 8 bits long, so it can represent 2^8 = 256 distinct values.
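Rather than working the powers of two out by hand, you can print the limits your implementation actually uses; a small sketch using <limits.h>:

#include <stdio.h>
#include <limits.h>

int main(void) {
    printf("char:          %d to %d\n", CHAR_MIN, CHAR_MAX);
    printf("signed char:   %d to %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("unsigned char: 0 to %d\n", UCHAR_MAX);
    return 0;
}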
I have the following simple C++ code:
#include <stdio.h>
int main() {
    char c1 = 130;
    unsigned char c2 = 130;
    printf("1: %+u\n", c1);
    printf("2: %+u\n", c2);
    printf("3: %+d\n", c1);
    printf("4: %+d\n", c2);
    ...
    return 0;
}
the output is like that:
1: 4294967170
2: 130
3: -126
4: +130
Can someone please explain the results on output lines 1 and 3 to me?
I'm using Linux gcc compiler with all default settings.
(This answer assumes that, on your machine, char ranges from -128 to 127, that unsigned char ranges from 0 to 255, and that unsigned int ranges from 0 to 4294967295, which happens to be the case.)
char c1 = 130;
Here, 130 is outside the range of numbers representable by char. The value of c1 is implementation-defined. In your case, the number happens to "wrap around," initializing c1 to static_cast<char>(-126).
In
printf("1: %+u\n", c1);
c1 is promoted to int, resulting in -126. Then, it is interpreted by the %u specifier as unsigned int. This is undefined behavior. This time the resulting number happens to be the unique number representable by unsigned int that is congruent to -126 modulo 4294967296, which is 4294967170.
In
printf("3: %+d\n", c1);
The int value -126 is interpreted by the %d specifier as int directly, and outputs -126 as expected (?).
In cases 1, 2 the format specifier doesn't match the type of the argument, so the behaviour of the program is undefined (on most systems). On most systems char and unsigned char are smaller than int, so they promote to int when passed as variadic arguments. int doesn't match the format specifier %u which requires unsigned int.
On exotic systems (which your target is not) where unsigned char is as large as int, it will be promoted to unsigned int instead, in which case 4 would have UB since it requires an int.
Explanation for 3 depends a lot on implementation specified details. The result depends on whether char is signed or not, and it depends on the representable range.
If 130 was a representable value of char, such as when it is an unsigned type, then 130 would be the correct output. That appears to not be the case, so we can assume that char is a signed type on the target system.
Initialising a signed integer with an unrepresentable value (such as char with 130 in this case) results in an implementation defined value.
On systems with 2's complement representation for signed numbers - which is the ubiquitous representation these days - the implementation-defined value is typically the representable value that is congruent with the unrepresentable value modulo the number of representable values. -126 is congruent with 130 modulo 256 and is a representable value of char.
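You can observe that wrap directly; a small sketch (the result is implementation-defined, but this is the typical 2's complement outcome):

#include <stdio.h>

int main(void) {
    char c = 130;                     /* implementation-defined result */
    printf("%d\n", c);                /* typically prints -126 (130 - 256) */
    printf("%d\n", (unsigned char)c); /* typically prints 130 again */
    return 0;
}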
A char is (on typical platforms) 8 bits, which means it can represent 2^8 = 256 unique values. An unsigned char represents 0 to 255, and a signed char represents -128 to 127 (it could, in principle, represent anything, but this is the typical platform implementation). Thus, assigning 130 to a char is out of range by 3, and the value overflows and wraps to -126 when it is interpreted as a signed char. The compiler sees 130 as an integer and performs an implicit conversion from int to char. On most platforms an int is 32 bits and the sign bit is the MSB; the value 130 easily fits into the low 8 bits, but the compiler then has to chop off 24 bits to squeeze it into a char. When this happens, and you've told the compiler you want a signed char, the MSB of those 8 bits takes the weight -128. Uh oh! You now have 1000 0010 in memory, which when interpreted as a signed char is -128 + 2 = -126. My linter on my platform screams about this.
I make that important point about interpretation because in memory, both values are identical. You can confirm this by casting the value in the printf statements, i.e., printf("3: %+d\n", (unsigned char)c1);, and you'll see 130 again.
The reason you see the large value in your first printf statement is that the signed char, which has already overflowed to -126, is being read as an unsigned int. The machine interprets the char as -126 first, and then reinterprets those bits as an unsigned int, which cannot represent that negative value, so you get 2^32 minus 126.
2^32 - 126 = 4294967170. Bingo.
In printf statement 2, all the machine has to do is prepend 24 zero bits to reach 32 bits and then interpret the value as an int. In statement 1, you've told it that you have a signed value, so it first sign-extends that to a 32-bit -126, and then interprets that negative integer as an unsigned integer. Again, it flips how it interprets the most significant bit. There are two steps:
The signed char is promoted to signed int, because variadic arguments are passed as ints. The char (is probably copied and) has 24 bits added; since the value is negative, sign extension fills those added bits with ones, so the memory here looks quite different.
The new signed int memory is interpreted as unsigned, so the machine looks at the MSB and gives it the weight +2^31 instead of the -2^31 it had in the promotion.
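The two steps can be made visible by printing the same value before and after reinterpretation; a sketch assuming 32-bit int and 2's complement:

#include <stdio.h>

int main(void) {
    signed char c = -126;                   /* byte 1000 0010 */
    int promoted = c;                       /* sign-extended to 0xFFFFFF82 */
    printf("%d\n", promoted);               /* -126 */
    printf("%u\n", (unsigned int)promoted); /* 4294967170 */
    return 0;
}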
An interesting bit of trivia: you can suppress the clang-tidy linter warning if you write char c1 = 130u;, but you still get the same garbage by the above logic (i.e. the implicit conversion throws away the top 24 bits, and the sign bit was zero anyhow). I have submitted an LLVM clang-tidy missing-functionality report based on exploring this question (issue 42137 if you really wanna follow it) 😉.
I have this unsigned char sumBinaryFigure function that calculates the sum of the digits of the binary representation of an unsigned long long number. When I call this function from the main function, for an unsigned long long it should return an integer (or another numeric data type) although the return type of the function is unsigned char. Is that possible? I tried function overloading and it didn't work. If it sounds absurd, it's not my fault.
unsigned char sumBinaryFigure(unsigned long long number)
{
    unsigned int S = 0;
    while (number)
    {
        S += number % 2;
        number /= 2;
    }
    return S;
}
When I call this function from the main function, for an unsigned long long it should return an integer although the return type of the function is unsigned char. Is that possible?
Yes. The question is not absurd, C types are just confusing. unsigned char and int both represent integers.
Your code is correct.
unsigned char is a 1-byte datatype. It can be used to represent a letter, or it can be used to represent a number.
The following statements are equivalent.
unsigned char ch = 'A';
unsigned char ch = 65;
Whether you use unsigned char as a character or integer, the machine does not care.
char does not necessarily contain a character. It can also represent small numbers.
The posted implementation of sumBinaryFigure returns a number in the range of 0-255, nothing wrong with that. Because a long long is almost certainly less than 256 bits, you don't need to worry about unsigned char not being large enough.
If I can suggest one change to your program in order to make it less confusing, change this line
unsigned int S = 0;
to this...
unsigned char S = 0;
Addendum
Just to be clear, consider the following code.
#include <stdio.h>

int main (void) {
    char ch_num = 65;         // ch_num is the byte 0100 0001
    char ch_char = 'A';       // ch_char is the byte 0100 0001
    printf ("%d\n", ch_num);  // Prints 65
    printf ("%d\n", ch_char); // Prints 65
    printf ("%c\n", ch_num);  // Prints A
    printf ("%c\n", ch_char); // Prints A
}
A char is a byte. It's a sequence of bits with no meaning except what we impose on it.
That byte can be interpreted as either a number or a character, but that decision is up to the programmer. The %c format specifier says "interpret this as a character". The %d format specifier says "interpret this as a number".
Whether it's an integer or character is decided by the output function, not the data type.
unsigned char can be converted to int without narrowing on all platforms that I can think of. You don't need to overload anything, just assign the result of the function to an int variable:
int popcnt = sumBinaryFigure(1023);
In fact, taking the function's semantics into account, there's no way the result will fail to fit into an int, which is guaranteed to be at least 16 bits wide, meaning the minimal numeric_limits<int>::max() value is 32767. You'd have to have a data type capable of storing over 32767 binary digits for that to be even remotely possible (int on most platforms is 32-bit).
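For instance, a minimal usage sketch (link it against the posted function):

#include <stdio.h>

unsigned char sumBinaryFigure(unsigned long long number); /* as posted above */

int main(void) {
    int popcnt = sumBinaryFigure(1023ULL); /* 1023 is ten 1-bits in binary */
    printf("%d\n", popcnt);                /* prints 10 */
    return 0;
}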
I'm working with Visual C++ 2008 here (9.x) and I was preparing a fixed point value when I ran into the compiler generating a DIV instead of an IDIV. I collapsed the code into a tiny piece to exactly reproduce:
short a = -255;
unsigned short divisor16 = 640; // unsigned, 16-bit
unsigned int divisor32 = 640; // unsigned, 32-bit
short s_divisor16 = 640; // signed, 16-bit
int s_divisor32 = 640; // signed, 32-bit
int16_t test1 = (a<<8)/divisor16; // == -102, generates IDIV -> OK
int16_t test2 = (a<<8)/s_divisor16; // == -102, generates IDIV -> OK
int16_t test3 = (a<<8)/divisor32; // == bogus, generates DIV -> FAIL!
int16_t test4 = (a<<8)/s_divisor32; // == -102, generates IDIV -> OK
int bitte_ein_breakpoint=1;
I won't bother you with the simple disassembly.
Now instead of taking the shortcut and just changing the divisor's type (it is a function parameter, unsigned int numPixels), I wonder what makes the compiler pick DIV over IDIV in the third (test3) case, since it does not do so with an unsigned 16-bit divisor and there really isn't anything that would call for unsigned arithmetic anyway. At least that's what I think and I hope I'm wrong :)
The code that is generated for the / operator depends on the operands.
First, the expression (a << 8) has type int, since the integer promotions are performed on each of the operands (ISO C99, 6.5.7p3), and then the operation is int << int, which results in an int.
Now there are four expressions:
int / short: the right hand side is promoted to int, therefore the idiv instruction.
int / unsigned short: the right hand side is promoted to int, therefore the idiv instruction.
int / unsigned int: the left hand side is converted to unsigned int (by the usual arithmetic conversions), therefore the div instruction.
int / int: nothing is promoted, therefore the idiv instruction is appropriate.
The integer promotions are defined in ISO C99 6.3.1.1p3:
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.
Left-shifting a negative value results in undefined behaviour. So I'm not sure you can draw many conclusions from what the compiler chooses to do in this scenario.
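You can reproduce the unsigned-division effect without the questionable shift by multiplying instead; a sketch assuming 32-bit int:

#include <stdio.h>

int main(void) {
    int a = -255;
    unsigned int divisor32 = 640;
    int s_divisor32 = 640;
    printf("%d\n", (a * 256) / s_divisor32); /* int / int: -102 */
    printf("%u\n", (a * 256) / divisor32);   /* int / unsigned int: the left
                                                side converts to unsigned,
                                                giving 6710784 */
    return 0;
}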
Is it safe to convert, say, from an unsigned char * to a signed char * (or just a char *)?
The access is well-defined, you are allowed to access an object through a pointer to signed or unsigned type corresponding to the dynamic type of the object (3.10/15).
Additionally, signed char is guaranteed not to have any trap values and as such you can safely read through the signed char pointer no matter what the value of the original unsigned char object was.
You can, of course, expect that the values you read through one pointer will be different from the values you read through the other one.
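For instance, a sketch of reading the same byte through both pointer types (the -56 is the typical 2's complement result):

#include <stdio.h>

int main(void) {
    unsigned char u = 200;
    signed char *p = (signed char *)&u; /* character types may alias */
    printf("%d %d\n", u, *p);           /* typically prints: 200 -56 */
    return 0;
}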
Edit: regarding sellibitze's comment, this is what 3.9.1/1 says.
A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.9); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers.
So indeed it seems that signed char may have trap values. Nice catch!
The conversion should be safe, as all you're doing is converting from one type of character to another, which should have the same size. Just be aware of what sort of data your code is expecting when you dereference the pointer, as the numeric ranges of the two data types are different. (i.e. if your number pointed by the pointer was originally positive as unsigned, it might become a negative number once the pointer is converted to a signed char* and you dereference it.)
Casting changes the type, but does not affect the bit representation. Casting from unsigned char to signed char does not change the value at all, but it affects the meaning of the value.
Here is an example:
#include <stdio.h>

int main(int argc, char** argv) {
    /* example 1 */
    unsigned char a_unsigned_char = 192;
    signed char a_signed_char = a_unsigned_char;
    printf("%d, %d\n", a_unsigned_char, a_signed_char); // 192, -64

    /* example 2 */
    unsigned char b_unsigned_char = 32;
    signed char b_signed_char = b_unsigned_char;
    printf("%d, %d\n", b_unsigned_char, b_signed_char); // 32, 32
    return 0;
}
In the first example, you have an unsigned char with value 192, or 11000000 in binary. After the cast to signed char, the value is still 11000000, but that happens to be the two's complement representation of -64. Signed values are (on typical platforms) stored in two's complement representation.
In the second example, our unsigned initial value (32) is less than 128, so it appears unaffected by the cast. The binary representation is 00100000, which is still 32 in two's complement representation.
To "safely" cast from unsigned char to signed char, ensure the value is less than 128.
It depends on how you are going to use the pointer. You are just converting the pointer type.
You can safely convert an unsigned char * to a char *, since the function you are calling will expect char-pointer behavior. However, if your value goes over 127, you will get a result that is not what you expected, so make certain that what you have in your unsigned array is valid for a signed array.
I've seen it go wrong in a few ways, converting to a signed char from an unsigned char.
First, if you're using it as an index into an array, that index could go negative.
Second, if it is fed to a switch statement, it may produce a negative input, which is often something the switch isn't expecting.
Third, it behaves differently under an arithmetic right shift:
int x = ...;
char c = 128;
unsigned char u = 128;
c >> x;
has a different result than
u >> x;
Because the former is sign-extended and the latter isn't.
Fourth, a signed character causes underflow at a different point than an unsigned character.
So a common overflow check,
(c + x > c)
could return a different result than
(u + x > u)
Safe if you are dealing with only ASCII data.
I'm astonished it hasn't been mentioned yet: Boost numeric cast should do the trick - but only for the data of course.
Pointers are always pointers. By casting them to a different type, you only change the way the compiler interprets the data pointed to.
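A small sketch of that point: the pointer value is unchanged by the cast, only the interpretation of the pointed-to byte changes (the -44 is the typical result where plain char is signed):

#include <stdio.h>

int main(void) {
    unsigned char buf[1] = { 0xD4 };           /* 212 */
    char *p = (char *)buf;                     /* same address, new type */
    printf("%p %p\n", (void *)buf, (void *)p); /* identical addresses */
    printf("%d\n", *p);                        /* typically -44 if char
                                                  is signed */
    return 0;
}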