uint32_t a = 0xFF << 8;
uint32_t b = 0xFF;
uint32_t c = b << 8;
I'm compiling for the Uno (1.0.x and 1.5) and it would seem obvious that a and c should be the same value, but they are not... at least not when running on the target. I compile the same code on the host and have no issues.
Right shift works fine; left shift only works when I'm shifting a variable rather than a constant.
Can anyone confirm this?
I'm using Visual Micro with VS2013. Compiling with either 1.0.x or 1.5 Arduino results in the same failure.
EDIT:
On the target:
a = 0xFFFFFF00
c = 0x0000FF00
The problem is related to the signed/unsigned implicit cast.
With uint32_t a = 0xFF << 8; the compiler does the following:
0xFF is declared; it is a signed char;
there is a << operation, so that value is converted to int. Since it was a signed char (and so its value was -1), it is padded with 1s to preserve the sign, so the value is 0xFFFFFFFF;
it is then shifted, so a = 0xFFFFFF00.
NOTE: this is slightly wrong, see below for the "more correct" version
If you want to reproduce the same behaviour, try this code:
uint32_t a = 0xFF << 8;
uint32_t b = (signed char)0xFF;
uint32_t c = b << 8;
Serial.println(a, HEX);
Serial.println(b, HEX);
Serial.println(c, HEX);
The result is
FFFFFF00
FFFFFFFF
FFFFFF00
Or, in the other way, if you write
uint32_t a = (unsigned)0xFF << 8;
you get that a = 0x0000FF00.
There are just two weird things with the compiler:
uint32_t a = (unsigned char)0xFF << 8; returns a = 0xFFFFFF00
uint32_t a = 0x000000FF << 8; returns a = 0xFFFFFF00 too.
Maybe it's a wrong cast in the compiler....
EDIT:
As phuclv pointed out, the above explanation is slightly wrong. The correct explanation is that, with uint32_t a = 0xFF << 8;, the compiler does these operations:
0xFF is declared; it is an int;
There is a << operation, and thus this becomes 0xFF00; since int is 16 bits here, that value is negative;
it is then promoted to uint32_t. Since it was negative, 1s are prepended, resulting in 0xFFFFFF00.
The difference with the above explanation is that if you write uint32_t a = 0xFF << 7; you get 0x7F80 rather than 0xFFFFFF80.
This also explains the two "weird" things I wrote in the end of the previous answer.
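You can reproduce the 16-bit behaviour on a normal PC by forcing the intermediate types by hand. This is a minimal host-side sketch; the casts emulate Arduino's 16-bit int, they are not what the Arduino compiler literally does:
#include <cstdint>
#include <cstdio>

int main() {
    int16_t avr_int = (int16_t)(0xFF << 8); // 0xFF00 doesn't fit a positive int16_t -> -256
    uint32_t a = avr_int;                   // sign-extended: 0xFFFFFF00
    uint32_t c = (uint16_t)0xFF << 8;       // unsigned 16-bit arithmetic: 0x0000FF00
    printf("%08lX %08lX\n", (unsigned long)a, (unsigned long)c);
    return 0;
}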
For reference, the thread linked in the comment has some more explanation of how the compiler interprets literals. In particular, this answer has a table with the types the compiler assigns to literals. In this case (no suffix, hexadecimal value) the compiler assigns the first type in the following list whose range fits the value:
int
unsigned int
long int
unsigned long int
long long int
unsigned long long int
This leads to some more considerations:
uint32_t a = 0x7FFF << 8; the literal fits in a 16-bit int, so it is interpreted as a signed integer; the shift makes the value negative (0xFF00), and the promotion to the bigger integer extends the sign, so the result is 0xFFFFFF00;
uint32_t b = 0xFFFF << 8; the literal does not fit in a 16-bit int, so it is interpreted as an unsigned integer. The result of the promotion to the 32-bit integer is therefore 0x0000FF00. Both cases are checked in the sketch below.
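To see both results on the target, here is a quick Arduino-style check; the commented values are what the 16-bit-int rules above predict:
uint32_t a = 0x7FFF << 8; // signed 16-bit literal; predicted output: FFFFFF00
uint32_t b = 0xFFFF << 8; // unsigned 16-bit literal; predicted output: FF00
Serial.println(a, HEX);
Serial.println(b, HEX);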
The most important thing here is that in Arduino int is a 16-bit type. That'll explain everything
For uint32_t a = 0xFF << 8: 0xFF is of type int [1]. 0xFF << 8 results in 0xFF00 which is a signed negative value in 16-bit int [2]. When assigning the int value to a uint32_t variable again it'll be sign-extended [3] when upcasting, thus the result becomes 0xFFFFFF00U.
For the following lines
uint32_t b = 0xFF;
uint32_t c = b << 8;
0xFF is positive in 16-bit int, therefore b also contains 0xFF. Then shifting it left 8 bits results in 0x0000FF00, because b << 8 is a uint32_t expression. It's wider than int, so no promotion to int happens here.
Similarly, with uint32_t a = (unsigned)0xFF << 8 the output is 0x0000FF00, because the positive 0xFF, when converted to unsigned int, is still positive. Upcasting unsigned int to uint32_t does a zero extension; but the sign bit is already zero, so even if you do int32_t b = 0xFF; uint32_t c = b << 8; the high bits are still zero. The same applies to the "weird" uint32_t a = 0x000000FF << 8. Instead of (unsigned)0xFF you can just use the exact equivalent (but shorter) 0xFFU.
OTOH if you declare b as uint8_t b = 0xFF or int8_t b = 0xFF then things will be different: integer promotion occurs and the result will be similar to the first line (0xFFFFFF00U). And if you cast 0xFF to signed char like this
uint32_t b = (signed char)0xFF;
uint32_t c = b << 8;
then upon promoting to int it'll be sign-extended to 0xFFFF. Similarly casting it to int32_t or uint32_t will result in a sign-extension from signed char to the 32-bit wide value 0xFFFFFFFF
If you cast to unsigned char like in uint32_t a = (unsigned char)0xFF << 8; instead then the (unsigned char)0xFF will be promoted to int using zero extension [4], therefore the result will be exactly the same as uint32_t a = 0xFF << 8;
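To make the variable cases concrete, here is a short sketch assuming AVR's 16-bit int (on a 32-bit host the first result would be 0x0000FF00 instead, because uint8_t would be promoted to a 32-bit int):
uint8_t u = 0xFF;               // 255, positive
uint32_t a = u << 8;            // u promotes to 16-bit int; 0xFF00 is negative -> 0xFFFFFF00
uint32_t b = (uint16_t)u << 8;  // unsigned 16-bit arithmetic -> 0x0000FF00
uint32_t c = (signed char)0xFF; // -1, sign-extended -> 0xFFFFFFFF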
In summary: When in doubt, consult the standard. The compiler rarely lies to you
[1] Type of integer literals not int by default?
The type of an integer constant is the first of the corresponding list in which its value can be represented.
Suffix     Decimal Constant           Octal or Hexadecimal Constant
--------------------------------------------------------------------
none       int                        int
           long int                   unsigned int
           long long int              long int
                                      unsigned long int
                                      long long int
                                      unsigned long long int
[2] Strictly speaking, shifting into the sign bit like that is undefined behavior:
1 << 31 produces the error, "The result of the '<<' expression is undefined"
Defining (1 << 31) or using 0x80000000? Result is different
[3] The rule is to add UINT_MAX + 1:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Signed to unsigned conversion in C - is it always safe?
[4] A cast will always preserve the input value if the value fits in the target type, so casting a signed type to a wider signed type is done by sign-extension, and casting an unsigned type to a wider type is done by zero-extension.
[Credit goes to Mats Petersson]
Using a cast operator to force the compiler to treat the 0xFF as a uint32_t addresses the issue. It seems the Arduino cross-compiler treats constants a little differently, since I've never had to cast before a shift.
Thanks!
Related
I need to read binary data which contains a column of numbers (time tags), using 8 bytes to record each number. I know that they are recorded in little-endian order. If read correctly they should be decoded as (for example)
...
2147426467
2147426635
2147512936
...
I recognize that the above numbers are on the 2^31 - 1 threshold.
I try to read the data and invert the endianness with:
(length is the total number of bytes and buffer is a pointer to an array that contains the bytes)
unsigned long int tag;
//uint64_t tag;
for (int j = 0; j < length; j = j + 8)   // read the whole file in 8-byte blocks
{
    tag = 0;
    for (int i = 0; i <= 7; i++)         // read each block, byte by byte
    {
        tag ^= ((unsigned char)buffer[j+i]) << 8*i;  // shift each byte to invert endianness and combine with ^=
    }
}
when run, the code gives:
...
2147426467
2147426635
18446744071562097256
similar big numbers
...
The last number is not (2^64 - 1 - correct value).
Same result using uint64_t tag.
The code succeeds with declaring tag as
unsigned int tag;
but fails for tags greater than 2^32 -1. At least this makes sense.
I suppose I need some kind of casting on buffer[i+j] but I don't know how to do it.
(static_cast<uint64_t>(buffer[j+i]))
also doesn't work.
I read a similar question but still need some help.
We assume that buffer[j+i] is a char, and that chars are signed on your platform. Casting to unsigned char converts buffer[j+i] into an unsigned type. However, when applying the << operator, the unsigned char value gets promoted to int so long as an int can hold all values representable by unsigned char.
Your attempt to cast buffer[j+i] directly to uint64_t fails because if char is signed, the sign extension is still applied before the value is converted to the unsigned type.
A double cast may work (that is, cast to unsigned char and then to unsigned long), but using an unsigned long variable to hold the intermediate value should make the intention of the code more clear. For me, the code would look like:
decltype(tag) val = static_cast<unsigned char>(buffer[j+i]);
tag ^= val << 8*i;
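Put together, here is a self-contained sketch of the fixed loop; the function name decode_le64 is mine, not part of the original code:
#include <cstdint>

// Decode one little-endian 8-byte block. Casting to unsigned char stops the
// sign extension; widening to uint64_t BEFORE the shift keeps all 64 bits.
uint64_t decode_le64(const char *block) {
    uint64_t tag = 0;
    for (int i = 0; i < 8; ++i) {
        uint64_t val = static_cast<unsigned char>(block[i]);
        tag |= val << (8 * i);
    }
    return tag;
}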
The problem is the temporary value. The intermediate result of the shift is an int, because the unsigned char operand is promoted to int before the shift; on your platform that is 32 bits. Once you shift a byte past bit 31 it is shifted into oblivion (and the shift is undefined behavior). To fix this you need to explicitly store the value in a 64-bit integer first.
So instead of
    tag ^= ((unsigned char)buffer[j+i]) << 8*i;
you should use something like this:
    unsigned long long tmp = (unsigned char)buffer[j+i];
    tmp <<= 8*i;
    tag ^= tmp;
I am developing C++ libraries for the Arduino 2560 Mega and I have come across an interesting bug.
uint8_t resolution = 15;
uint32_t numDiscreteLevels = (1 << resolution); //yields a value of 0xFFFF8000
uint32_t numDiscreteLevels = ((uint32_t)1 << resolution); //yields 0x8000 (correct value)
It seems that in the first line, sign bits are padded onto the value before being assigned to the variable. According to promotion rules I believe that the 1 should be converted to an unsigned integer. But even then, I thought sign padding only occurs when you shift right.
On the AVR architecture, an int is 16 bits -- not 32! This means that all numbers, including integer constants, are treated as an int16_t unless otherwise specified.
This means that 1 << 15 is (int16_t) 0x8000, not (int32_t) 0x00008000 as it would be on a 32-bit platform. Since this is a signed value and it has its high bit set, it's negative (specifically, -32768), and sign-extending it to a uint32_t gives 0xffff8000.
You can provide the constant as an unsigned value directly, which behaves as expected:
uint8_t resolution = 15;
uint32_t numDiscreteLevels = 1u << resolution;
1u << 15 is 0x8000u, whereas 1 << 15 as a 16-bit value is -32768.
I have 3 unsigned bytes that are coming over the wire separately.
[byte1, byte2, byte3]
I need to convert these to a signed 32-bit value but I am not quite sure how to handle the sign of the negative values.
I thought of copying the bytes to the upper 3 bytes in the int32 and then shifting everything to the right but I read this may have unexpected behavior.
Is there an easier way to handle this?
Assuming the representation is two's complement, you could use:
uint32_t sign_extend_24_32(uint32_t x) {
const int bits = 24;
uint32_t m = 1u << (bits - 1);
return (x ^ m) - m;
}
This works because:
if the old sign was 1, then the XOR makes it zero and the subtraction will set it and borrow through all higher bits, setting them as well.
if the old sign was 0, the XOR will set it, the subtract resets it again and doesn't borrow so the upper bits stay 0.
Templated version
template<class T>
T sign_extend(T x, const int bits) {
T m = 1;
m <<= bits - 1;
return (x ^ m) - m;
}
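Hypothetical usage with the three wire bytes from the question (my assumption: byte1 is the least significant byte):
// byte1..byte3 are the unsigned char wire bytes, byte1 least significant
uint32_t raw = (uint32_t)byte3 << 16 | (uint32_t)byte2 << 8 | byte1;
int32_t value = (int32_t)sign_extend(raw, 24); // T is deduced as uint32_t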
Assuming both representations are two's complement, simply
upper_byte = (Signed_byte(incoming_msb) >= 0? 0 : Byte(-1));
where
using Signed_byte = signed char;
using Byte = unsigned char;
and upper_byte is a variable representing the missing fourth byte.
The conversion to Signed_byte is formally implementation-dependent, but a two's complement implementation doesn't have a choice, really.
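For completeness, here is a sketch of how upper_byte slots into the final value; the byte order (byte1 = LSB) and the function name are my assumptions:
#include <cstdint>

using Signed_byte = signed char;  // aliases as above
using Byte = unsigned char;

int32_t from_3_bytes(Byte byte1, Byte byte2, Byte byte3) {
    Byte upper_byte = Signed_byte(byte3) >= 0 ? 0 : Byte(-1);
    uint32_t u = uint32_t(upper_byte) << 24 | uint32_t(byte3) << 16
               | uint32_t(byte2) << 8 | uint32_t(byte1);
    return int32_t(u); // two's complement assumed, as above
}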
You could let the compiler handle the sign extension itself. Assuming that the least significant byte is byte1 and the most significant byte is byte3:
int val = (signed char) byte3;             // C guarantees the sign extension
val <<= 16;                                // shift the byte to its final place
val |= ((int)(unsigned char) byte2) << 8;  // place the second byte
val |= (int)(unsigned char) byte1;         // and the least significant one
I have used C style cast here when static_cast would have been more C++ish, but as an old dinosaur (and Java programmer) I find C style cast more readable for integer conversions.
This is a pretty old question, but I recently had to do the same (while dealing with 24-bit audio samples), and wrote my own solution for it. It's using a similar principle as this answer, but more generic, and potentially generates better code after compiling.
#include <cstddef>
#include <type_traits>

template <std::size_t Bits, typename T>
inline constexpr T sign_extend(const T& v) noexcept {
    static_assert(std::is_integral<T>::value, "T is not integral");
    static_assert((sizeof(T) * 8u) >= Bits, "T is smaller than the specified width");
    if constexpr ((sizeof(T) * 8u) == Bits) return v;
    else {
        using S = struct { signed Val : Bits; };
        return reinterpret_cast<const S*>(&v)->Val;
    }
}
This has no hard-coded math, it simply lets the compiler do the work and figure out the best way to sign-extend the number. With certain widths, this can even generate a native sign-extension instruction in the assembly, such as MOVSX on x86.
This function assumes you copied your N-bit number into the lower N bits of the type you want to extend it to. So for example:
int16_t a = -42;
int32_t b{};
memcpy(&b, &a, sizeof(a));
b = sign_extend<16>(b);
Of course it works for any number of bits, extending it to the full width of the type that contained the data.
Here's a method that works for any bit count, even if it's not a multiple of 8. This assumes you've already assembled the 3 bytes into an integer value.
const int bits = 24;
int mask = (1 << bits) - 1;
bool is_negative = (value & ~(mask >> 1)) != 0;
value |= -is_negative & ~mask;
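The same thing wrapped in a function for reuse (the name sign_extend_any is mine):
#include <cstdint>

int32_t sign_extend_any(uint32_t value, int bits) { // 0 < bits <= 32
    uint32_t mask = bits < 32 ? ((uint32_t)1 << bits) - 1 : ~(uint32_t)0;
    bool is_negative = (value & ~(mask >> 1)) != 0;
    if (is_negative) value |= ~mask;
    return (int32_t)value;
}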
You can use a bitfield:
#include <cstring>  // for memcpy
template<size_t L>
inline int32_t sign_extend_to_32(const char *x)
{
    struct { int32_t i : L; } s;
    memcpy(&s, x, 3);  // assumes L == 24, i.e. 3 bytes of input
    return s.i;
    // or, without memcpy (assuming little endian):
    // return s.i = ((unsigned char)x[2] << 16) | ((unsigned char)x[1] << 8) | (unsigned char)x[0];
}
Easy, and no undefined behavior is invoked:
int32_t r = sign_extend_to_32<24>(your_3byte_array);
Of course, copying the bytes to the upper 3 bytes of the int32 and then shifting everything to the right, as you thought, is also a good idea. There's no undefined behavior if you use memcpy as above. An alternative is reinterpret_cast in C++ or a union in C, which can avoid the memcpy. However, that is implementation-defined behavior, because right shift is not always a sign-extending shift (although almost all modern compilers do it that way).
Assuming your 24-bit value is stored in the low 24 bits of an int32_t variable val, you can extend the sign as follows:
val = (val << 8) >> 8;
I'm trying to convert a byte array to integer:
QByteArray b = QByteArray::fromHex("00008000");
quint32 result = b[3];
result += b[2] << 8;
result += b[1] << 16;
result += b[0] << 24;
but I'm getting 4294934528 instead of 32768. What is the problem here?
QByteArray is an array of chars. Apparently, char on your platform is signed and 8-bit wide. Thus, your problem can be distilled to:
char c = 0x80;
quint32 result = c << 8;
The standard mandates that:
N4606 § 4.8 [conv.integral] / 3
If the destination type is signed, the value is unchanged if it can be
represented in the destination type; otherwise, the value is
implementation-defined.
In this case (as usual on 2's complement systems), 0x80 is mapped to std::numeric_limits<char>::min() == -128, which is only logical because they share the same underlying bit pattern.
Now, -128 << 8 evaluates to -128 * 2^8, which is -32768.
Finally, the conversion from -32768 to a 32-bit unsigned integer is well defined and yields 4294934528 (that is, 2^32 - 32768).
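A minimal sketch of one fix: force each byte through Qt's unsigned 8-bit quint8 before it is widened and shifted:
QByteArray b = QByteArray::fromHex("00008000");
quint32 result = quint32(quint8(b[3]))
               | quint32(quint8(b[2])) << 8
               | quint32(quint8(b[1])) << 16
               | quint32(quint8(b[0])) << 24;
// result == 32768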
I need to combine two signed 8-bit _int8 values into a signed short (16-bit) value. It is important that the sign is not lost.
My code is:
unsigned short lsb = -13;
unsigned short msb = 1;
short combined = (msb << 8 )| lsb;
The result I get is -13. However, I expect it to be 499.
For the following examples, I get the correct results with the same code:
msb = -1; lsb = -6; combined = -6;
msb = 1; lsb = 89; combined = 345;
msb = -1; lsb = 13; combined = -243;
However, msb = 1; lsb = -84; combined = -84; where I would expect 428.
It seems that if the lsb is negative and the msb is positive, something goes wrong!
What is wrong with my code? How does the computer get to these unexpected results (Win7, 64 Bit and VS2008 C++)?
Your lsb in this case contains 0xfff3. When you OR it with 1 << 8 nothing changes because there is already a 1 in that bit position.
Try short combined = (msb << 8 ) | (lsb & 0xff);
Or using a union:
#include <iostream>
union Combine
{
short target;
char dest[ sizeof( short ) ];
};
int main()
{
Combine cc;
cc.dest[0] = -13, cc.dest[1] = 1;
std::cout << cc.target << std::endl;
}
It is possible that lsb is being automatically sign-extended to 16 bits. I notice you only have a problem when it is negative and msb is positive, and that is what you would expect to happen given the way you're using the OR operator. You're clearly doing something strange here, though. What are you actually trying to do?
The Raisonance C compiler for STM8 (and possibly many other compilers) generates ugly code from classic C when writing 16-bit variables into 8-bit hardware registers.
Note: STM8 is big-endian; for little-endian CPUs the code must be slightly modified. Read/write byte order matters too.
So, this standard C code:
unsigned int ch1Sum;
...
TIM5_CCR1H = ch1Sum >> 8;
TIM5_CCR1L = ch1Sum;
is compiled to:
;TIM5_CCR1H = ch1Sum >> 8;
LDW X,ch1Sum
CLR A
RRWA X,A
LD A,XL
LD TIM5_CCR1,A
;TIM5_CCR1L = ch1Sum;
MOV TIM5_CCR1+1,ch1Sum+1
Too long, too slow.
My version:
unsigned int ch1Sum;
...
TIM5_CCR1H = ((u8*)&ch1Sum)[0];
TIM5_CCR1L = ch1Sum;
That compiles into an adequate pair of MOVs:
;TIM5_CCR1H = ((u8*)&ch1Sum)[0];
MOV TIM5_CCR1,ch1Sum
;TIM5_CCR1L = ch1Sum;
MOV TIM5_CCR1+1,ch1Sum+1
Opposite direction:
unsigned int uSonicRange;
...
((unsigned char *)&uSonicRange)[0] = TIM1_CCR2H;
((unsigned char *)&uSonicRange)[1] = TIM1_CCR2L;
instead of
unsigned int uSonicRange;
...
uSonicRange = TIM1_CCR2H << 8;
uSonicRange |= TIM1_CCR2L;
Some things you should know about the datatypes (un)signed short and char:
char is an 8-bit value; that's what you were looking for for lsb and msb. short is 16 bits in length.
You should also not store signed values in unsigned variables unless you know what you are doing.
You can take a look at two's complement. It describes the representation of negative integer values (not floating-point values) in C/C++ and many other programming languages.
There are multiple versions of making your own two's complement:
int a;
// setting a
a = -a; // Clean version. Easier to understand and read. Use this one.
a = (~a)+1; // The arithmetical version. Does the same, but takes more steps.
// Don't use the last one unless you need it!
// It can be 'optimized away' by the compiler.
stdint.h (along with inttypes.h) exists to give your variables exact widths. If you really need a variable to have a specific bit width, you should use it (and here you do).
You should always use the datatypes that best fit your needs. Your code should therefore look like this:
signed char lsb; // signed 8-bit value
signed char msb; // signed 8-bit value
signed short combined = msb << 8 | (lsb & 0xFF); // signed 16-bit value
or like this:
#include <stdint.h>
int8_t lsb; // signed 8-bit value
int8_t msb; // signed 8-bit value
int16_t combined = msb << 8 | (lsb & 0xFF); // signed 16-bit value
For the latter the compiler will use signed 8/16-bit values every time, regardless of how wide int is on your platform. Wikipedia has a nice explanation of the int8_t and int16_t datatypes (and all the others).
By the way: cppreference.com is useful for looking up the ANSI C standards and other things worth knowing about C/C++.
You wrote that you need to combine two 8-bit values. Why are you using unsigned short then?
As Dan already said, lsb is automatically sign-extended to 16 bits. Try the following code:
uint8_t lsb = -13;
uint8_t msb = 1;
int16_t combined = (msb << 8) | lsb;
This gives you the expected result: 499.
If this is what you want:
msb: 1, lsb: -13, combined: 499
msb: -6, lsb: -1, combined: -1281
msb: 1, lsb: 89, combined: 345
msb: -1, lsb: 13, combined: -243
msb: 1, lsb: -84, combined: 428
Use this:
short combine(unsigned char msb, unsigned char lsb) {
return (msb<<8u)|lsb;
}
I don't understand why you would want msb -6 and lsb -1 to generate -6 though.