Mini questions on 'unsigned char' and the shift operators - c++

These questions should be basic, but I'm surprised I had some trouble with them just now. The first one came up when I glanced at chapter 5.3, The Bitwise Operators, of the 'C++ Primer' book, where the author uses the code below as an example to explain the shift operations:
unsigned char bits = 1; // '10011011' is the corresponding bit pattern
bits << 1; // left shift
My head spun a little when I looked at this: where is '10011011' coming from? Isn't the bit pattern of '1' just '0x01'?
Another question comes from http://c-faq.com/strangeprob/ptralign.html, where the author tries to unpack the structure:
struct mystruct {
    char c;
    long int i32;
    int i16;
} s;
using
unsigned char *p = buf;
s.c = *p++;
s.i32 = (long)*p++ << 24;
s.i32 |= (long)*p++ << 16;
s.i32 |= (unsigned)(*p++ << 8); // this line !
s.i32 |= *p++;
s.i16 = *p++ << 8;
s.i16 |= *p++;
My question is: p is a pointer to unsigned char (which is 8 bits), right? When building the higher bytes of s.i32 (bits 24~31 and 16~23), *p++ is converted to long (32 bits) before the left shift, so the shift doesn't lose any bits of *p++. But in
s.i32 |= (unsigned)(*p++ << 8);
*p++ is shifted first and only then converted to unsigned int. Wouldn't all the bits of *p++ be lost during the shift?
Again, I realize I may be missing some C basics here. I hope someone can give me a hand.
Thanks,

To answer your second question: performing any arithmetic operation on a char (including shifts and other bitwise operations) first promotes it to a (possibly unsigned) int, so it is the int-sized value that is shifted, not the 8-bit char value.

Yes, the bit pattern of 1 is 0x01; 10011011 is the bit pattern of decimal 155, so the comment doesn't match the initializer in your quote (the book presumably initializes bits with 155, e.g. as octal 0233).
As for the shifts, you're missing something called the "integral promotions", which are performed in arithmetic and bitwise operations like shifts. In this case, there's an unsigned char operand (*p++) and an int operand (the constant), so the unsigned char operand is converted to int. The cast (whether to long or to unsigned int) is applied only after the promotion and the shift have been performed. So no, in none of the shifts are any of the bits lost.
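A minimal sketch of the promotion at work (the value 0xAB here is just for illustration):
#include <cstdio>

int main() {
    unsigned char byte = 0xAB;
    // 'byte' is promoted to int before the shift, so no bits are lost:
    int shifted = byte << 8;                      // 0xAB00, computed at int width
    unsigned cast_after = (unsigned)(byte << 8);  // the cast sees the already-shifted int
    printf("%x %x\n", shifted, cast_after);       // prints: ab00 ab00
}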

Related

Sign extension from 24 bit to 32 bit in C++

I have 3 unsigned bytes that are coming over the wire separately.
[byte1, byte2, byte3]
I need to convert these to a signed 32-bit value but I am not quite sure how to handle the sign of the negative values.
I thought of copying the bytes into the upper 3 bytes of the int32 and then shifting everything to the right, but I read that this may have unexpected behavior.
Is there an easier way to handle this?
The representation is using two's complement.
You could use:
#include <cstdint>

uint32_t sign_extend_24_32(uint32_t x) {
    const int bits = 24;
    uint32_t m = 1u << (bits - 1);  // only the 24-bit sign bit set
    return (x ^ m) - m;             // flip the sign bit, then subtract it back out
}
This works because:
if the old sign bit was 1, the XOR clears it, and the subtraction sets it again and borrows through all the higher bits, setting them as well;
if the old sign bit was 0, the XOR sets it, the subtraction clears it again and doesn't borrow, so the upper bits stay 0.
Templated version:
template<class T>
T sign_extend(T x, const int bits) {
    T m = 1;
    m <<= bits - 1;       // only the sign bit of the narrow value set
    return (x ^ m) - m;
}
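A self-contained usage sketch, assuming the three wire bytes arrive most significant first (the byte names and sample values are hypothetical):
#include <cstdint>
#include <cstdio>

uint32_t sign_extend_24_32(uint32_t x) {
    const int bits = 24;
    uint32_t m = 1u << (bits - 1);
    return (x ^ m) - m;
}

int main() {
    uint8_t byte1 = 0xFF, byte2 = 0xFF, byte3 = 0xD8; // -40 as a 24-bit value
    uint32_t raw = ((uint32_t)byte1 << 16) | ((uint32_t)byte2 << 8) | byte3;
    int32_t value = (int32_t)sign_extend_24_32(raw); // two's complement assumed
    printf("%d\n", value); // -40
}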
Assuming both representations are two's complement, simply
upper_byte = (Signed_byte(incoming_msb) >= 0? 0 : Byte(-1));
where
using Signed_byte = signed char;
using Byte = unsigned char;
and upper_byte is a variable representing the missing fourth byte.
The conversion to Signed_byte is formally implementation-dependent, but a two's complement implementation doesn't have a choice, really.
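A minimal sketch of the whole assembly under that scheme, assuming the bytes arrive most significant first and a two's complement target (the byte names are hypothetical):
#include <cstdint>
#include <cstring>
#include <cstdio>

using Signed_byte = signed char;
using Byte = unsigned char;

int main() {
    Byte incoming_msb = 0xFF, mid_byte = 0xFF, low_byte = 0xD8; // -40 in 24 bits
    // The missing fourth byte: 0xFF if the 24-bit value is negative, else 0x00
    Byte upper_byte = (Signed_byte(incoming_msb) >= 0 ? 0 : Byte(-1));
    uint32_t u = ((uint32_t)upper_byte << 24) | ((uint32_t)incoming_msb << 16)
               | ((uint32_t)mid_byte << 8) | (uint32_t)low_byte;
    int32_t value;
    memcpy(&value, &u, sizeof value); // reinterpret the bit pattern as signed
    printf("%d\n", (int)value);       // -40
}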
You could let the compiler itself handle the sign extension. Assuming that the least significant byte is byte1 and the most significant byte is byte3:
int val = (signed char)byte3;            // the conversion to int sign-extends
val <<= 16;                              // shift the byte to its final place
val |= ((int)(unsigned char)byte2) << 8; // place the second byte
val |= (int)(unsigned char)byte1;        // and the least significant one
I have used C-style casts here, where static_cast would have been more C++-ish, but as an old dinosaur (and Java programmer) I find C-style casts more readable for integer conversions.
This is a pretty old question, but I recently had to do the same thing (while dealing with 24-bit audio samples) and wrote my own solution for it. It uses a principle similar to the answer above, but is more generic and potentially generates better code after compiling.
#include <cstddef>
#include <type_traits>

template <std::size_t Bits, typename T>
inline constexpr T sign_extend(const T& v) noexcept {
    static_assert(std::is_integral<T>::value, "T is not integral");
    static_assert((sizeof(T) * 8u) >= Bits, "T is smaller than the specified width");
    if constexpr ((sizeof(T) * 8u) == Bits) return v;
    else {
        using S = struct { signed Val : Bits; };
        return reinterpret_cast<const S*>(&v)->Val;
    }
}
This has no hard-coded math, it simply lets the compiler do the work and figure out the best way to sign-extend the number. With certain widths, this can even generate a native sign-extension instruction in the assembly, such as MOVSX on x86.
This function assumes you have copied your N-bit number into the lower N bits of the type you want to extend it to. So for example:
#include <cstdint>
#include <cstring>

int16_t a = -42;
int32_t b{};
memcpy(&b, &a, sizeof(a)); // lands in the low bytes on a little-endian machine
b = sign_extend<16>(b);
Of course it works for any number of bits, extending it to the full width of the type that contained the data.
Here's a method that works for any bit count, even if it's not a multiple of 8. This assumes you've already assembled the 3 bytes into an integer value.
const int bits = 24;
int mask = (1 << bits) - 1;                     // low 24 bits set
bool is_negative = (value & ~(mask >> 1)) != 0; // test the 24-bit sign bit
value |= -is_negative & ~mask;                  // fill the high bits if negative
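For instance, wrapped in a hypothetical helper (valid for widths smaller than the width of int):
#include <cstdio>

int sign_extend_any(int value, int bits) {
    int mask = (1 << bits) - 1;                     // low 'bits' bits set
    bool is_negative = (value & ~(mask >> 1)) != 0; // tests the narrow sign bit
    value |= -is_negative & ~mask;                  // fill the high bits if negative
    return value;
}

int main() {
    printf("%d\n", sign_extend_any(0xFFFFD8, 24)); // -40
    printf("%d\n", sign_extend_any(0x00000A, 24)); // 10
}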
You can use a bitfield
#include <cstddef>
#include <cstdint>
#include <cstring>

template<std::size_t L>
inline int32_t sign_extend_to_32(const char *x)
{
    struct { int32_t i : L; } s;
    memcpy(&s, x, 3);  // fill the low 3 bytes (24-bit value, little endian assumed)
    return s.i;        // reading the bitfield sign-extends
    // or, without memcpy:
    // return s.i = ((x[2] & 0xFF) << 16) | ((x[1] & 0xFF) << 8) | (x[0] & 0xFF);
}
This is easy, and no undefined behavior is invoked:
int32_t r = sign_extend_to_32<24>(your_3byte_array);
Of course, copying the bytes into the upper 3 bytes of the int32 and then shifting everything to the right, as you proposed, is also a good idea; there is no undefined behavior if you use memcpy as above. An alternative is reinterpret_cast in C++ or a union in C, which avoids the memcpy. However, that involves implementation-defined behavior, because a right shift is not always a sign-extending shift (although almost all modern compilers make it one).
Assuming your 24-bit value is stored in the variable int32_t val, you can extend the sign as follows:
val = (val << 8) >> 8;
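For completeness, here is a sketch of the "copy into the upper 3 bytes, then shift right" idea from the question, assuming a little-endian target and an arithmetic (sign-extending) right shift, which is implementation-defined but near-universal:
#include <cstdint>
#include <cstring>
#include <cstdio>

int main() {
    unsigned char bytes[3] = { 0xD8, 0xFF, 0xFF }; // -40, least significant first
    int32_t val = 0;
    memcpy((char*)&val + 1, bytes, 3); // occupy the upper 3 bytes (little endian)
    val >>= 8;                         // arithmetic shift brings the sign down
    printf("%d\n", (int)val);          // -40
}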

Bitwise or before casting to int32

I was messing about with arrays and noticed this, e.g.:
int32_t array[1];
int16_t value = -4000;
When I tried to write the value into the top and bottom half of the int32 array value,
array[0] = (value << 16) | value;
the compiler casts the value to a 32-bit value before doing the bit shift and the bitwise OR. Thus, instead of 16-bit -4000 being written into the top and bottom halves, the top half ends up as -1 and the bottom as -4000.
Is there a way to OR in the 16 bit value of -4000 so both halves are -4000? It's not really a huge problem. I am just curious to know if it can be done.
Sure thing, just undo the sign-extension:
array[0] = (value << 16) | (value & 0xFFFF);
Don't worry, the compiler should handle this reasonably.
To avoid shifting a negative number:
array[0] = ((value & 0xFFFF) << 16) | (value & 0xFFFF);
Fortunately that extra, seemingly useless & (even more of a no-op than the one on the right) doesn't show up in the generated code.
Left shift on signed types is defined only in some cases. From the standard (6.5.7/4):
[...] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
According to this definition, it seems what you have is undefined behaviour.
Use an unsigned value:
const uint16_t uvalue = value;                 // same bits, no sign
array[0] = ((uint32_t)uvalue << 16) | uvalue;  // cast before shifting so the promoted operand isn't a signed int
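A quick sketch to verify, assuming a 32-bit int (the cast to uint32_t before the shift keeps the shift out of signed-int territory):
#include <cstdint>
#include <cstdio>

int main() {
    int16_t value = -4000;
    uint16_t uvalue = (uint16_t)value; // 0xF060: same bits, no sign
    uint32_t packed = ((uint32_t)uvalue << 16) | uvalue;
    printf("0x%08x\n", packed); // 0xf060f060
    printf("%d %d\n", (int16_t)(packed >> 16), (int16_t)(packed & 0xFFFF)); // -4000 -4000
}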
Normally when faced with this kind of issue I first set the resultant value to zero, then bitwise assign the values in.
So the code would be:
int32_t array[1];
int16_t value = -4000;
array[0] = 0x0000FFFF;      // keep only the low half...
array[0] &= value;          // ...of the sign-extended value
array[0] |= (value << 16);  // then OR the value into the high half
Cast the 16 too, otherwise the int type is contagious:
array[0] = (value << (int16_t)16) | value;
Edit: I don't have a compiler handy to test. Per the comment below, this may not be right, but it should get you in the right direction.

Bits shifted by bit-shifting operators (<<, >>) in C, C++

Can we access the bits shifted out by the bit-shifting operators (<<, >>) in C and C++?
For example:
23 >> 1
Can we access the last bit shifted out (1 in this case)?
No, the shift operators only give the value after shifting. You'll need to do other bitwise operations to extract the bits that are shifted out of the value; for example:
unsigned all_lost = value & ((1u << shift) - 1); // all bits to be removed by the shift
unsigned last_lost = (value >> (shift - 1)) & 1; // the last bit to be removed by the shift
unsigned remaining = value >> shift;             // lose those bits
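For example, with the 23 >> 1 case from the question:
#include <cstdio>

int main() {
    unsigned value = 23;  // binary 10111
    unsigned shift = 1;
    unsigned all_lost = value & ((1u << shift) - 1); // 1
    unsigned last_lost = (value >> (shift - 1)) & 1; // 1
    unsigned remaining = value >> shift;             // 11 (binary 1011)
    printf("%u %u %u\n", all_lost, last_lost, remaining); // 1 1 11
}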
By using 23>>1, the bit 0x01 is purged - you have no way of retrieving it after the bit shift.
That said, nothing's stopping you from checking for the bit before shifting:
int value = 23;
bool bit1 = value & 0x01;
int shifted = value >> 1;
You can access the bits before shifting, e.g.
value = 23; // start with some value
lsbits = value & 1; // extract the LSB
value >>= 1; // shift
It's worth noting that on the MSVC compiler there is an intrinsic function, _bittest, that can speed up this operation.
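A minimal MSVC-only sketch (assuming <intrin.h> is available):
#include <intrin.h>
#include <cstdio>

int main() {
    long value = 23;
    // _bittest reads the bit at position 0, i.e. the bit that 'value >> 1' discards
    unsigned char lowest = _bittest(&value, 0);
    printf("%u\n", (unsigned)lowest); // 1
}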

Arduino left shift not working as expected, compiler bug?

uint32_t a = 0xFF << 8;
uint32_t b = 0xFF;
uint32_t c = b << 8;
I'm compiling for the Uno (1.0.x and 1.5) and it would seem obvious that a and c should be the same value, but they are not... at least not when running on the target. I compile the same code on the host and have no issues.
Right shift works fine, left shift only works when I'm shifting a variable versus a constant.
Can anyone confirm this?
I'm using Visual Micro with VS2013. Compiling with either 1.0.x or 1.5 Arduino results in the same failure.
EDIT:
On the target:
A = 0xFFFFFF00
C = 0x0000FF00
The problem is related to the implicit signed/unsigned conversions.
With uint32_t a = 0xFF << 8; what happens is:
0xFF is declared; it is a signed char;
there is a << operation, so that variable is converted to int. Since it was a signed char (and so its value was -1), it is padded with 1s to preserve the sign, so the value is 0xFFFFFFFF;
it is shifted, so a = 0xFFFFFF00.
NOTE: this is slightly wrong, see below for the "more correct" version
If you want to reproduce the same behaviour, try this code:
uint32_t a = 0xFF << 8;
uint32_t b = (signed char)0xFF;
uint32_t c = b << 8;
Serial.println(a, HEX);
Serial.println(b, HEX);
Serial.println(c, HEX);
The result is
FFFFFF00
FFFFFFFF
FFFFFF00
Or, going the other way, if you write
uint32_t a = (unsigned)0xFF << 8;
you get a = 0x0000FF00.
There are just two weird things with the compiler:
uint32_t a = (unsigned char)0xFF << 8; returns a = 0xFFFFFF00
uint32_t a = 0x000000FF << 8; returns a = 0xFFFFFF00 too.
Maybe it's a wrong cast in the compiler....
EDIT:
As phuclv pointed out, the above explanation is slightly wrong. The correct explanation is that, with uint32_t a = 0xFF << 8;, the compiler performs these operations:
0xFF is declared; it is an int;
there is a << operation, and thus this becomes 0xFF00; it is an int, so it is negative;
it is then converted to uint32_t. Since it was negative, 1s are prepended, resulting in 0xFFFFFF00.
The difference with the above explanation is that if you write uint32_t a = 0xFF << 7; you get 0x7F80 rather than 0xFFFFFF80.
This also explains the two "weird" things I wrote in the end of the previous answer.
For reference, in the thread linked in the comment there are some more explanations of how the compiler interprets literals. In particular, in this answer there is a table with the types the compiler assigns to literals. In this case (no suffix, hexadecimal value) the compiler assigns the first of these types that fits the value:
int
unsigned int
long int
unsigned long int
long long int
unsigned long long int
This leads to some more considerations:
uint32_t a = 0x7FFF << 8; this means that the literal is interpreted as a signed integer; the promotion to the bigger integer extends the sign, and so the result is 0xFFFFFF00
uint32_t b = 0xFFFF << 8; the literal in this case is interpreted as an unsigned integer. The result of the promotion to the 32-bit integer is therefore 0x0000FF00
The most important thing here is that in Arduino int is a 16-bit type. That'll explain everything
For uint32_t a = 0xFF << 8: 0xFF is of type int [1]. 0xFF << 8 results in 0xFF00, which is a negative value in a 16-bit int [2]. When the int value is then assigned to a uint32_t variable it is sign-extended [3] during the widening conversion, so the result becomes 0xFFFFFF00U.
For the following lines
uint32_t b = 0xFF;
uint32_t c = b << 8;
0xFF is positive in a 16-bit int, therefore b also contains 0xFF. Then shifting it left 8 bits results in 0x0000FF00, because b << 8 is a uint32_t expression; it's wider than int, so no promotion to int happens here.
Similarly, with uint32_t a = (unsigned)0xFF << 8 the output is 0x0000FF00, because the positive 0xFF converted to unsigned int is still positive. Upcasting unsigned int to uint32_t does a zero extension, but the sign bit is already zero, so even if you do int32_t b = 0xFF; uint32_t c = b << 8 the high bits are still zero. The same goes for the "weird" uint32_t a = 0x000000FF << 8. Instead of (unsigned)0xFF you can just use the exactly equivalent (but shorter) 0xFFU.
OTOH if you declare b as uint8_t b = 0xFF or int8_t b = 0xFF then things will be different, integer promotion occurs and the result will be similar to the first line (0xFFFFFF00U). And if you cast 0xFF to signed char like this
uint32_t b = (signed char)0xFF;
uint32_t c = b << 8;
then upon promotion to int it'll be sign-extended to 0xFFFF. Similarly, casting it to int32_t or uint32_t results in a sign extension from signed char to the 32-bit-wide value 0xFFFFFFFF.
If you instead cast to unsigned char, as in uint32_t a = (unsigned char)0xFF << 8;, then the (unsigned char)0xFF is promoted to int using zero extension [4], so the result is exactly the same as for uint32_t a = 0xFF << 8;
In summary: When in doubt, consult the standard. The compiler rarely lies to you
[1] Type of integer literals not int by default?
The type of an integer constant is the first of the corresponding list in which its value can be represented.
Suffix   Decimal Constant        Octal or Hexadecimal Constant
--------------------------------------------------------------
none     int                     int
         long int                unsigned int
         long long int           long int
                                 unsigned long int
                                 long long int
                                 unsigned long long int
[2] Strictly speaking, shifting into the sign bit like that is undefined behavior:
1 << 31 produces the error, "The result of the '<<' expression is undefined"
Defining (1 << 31) or using 0x80000000? Result is different
[3] The rule is to add UINT_MAX + 1:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Signed to unsigned conversion in C - is it always safe?
[4] A cast will always preserve the input value if the value fits in the target type, so casting a signed type to a wider signed type is done by sign-extension, and casting an unsigned type to a wider type is done by zero-extension.
[Credit goes to Mats Petersson]
Using a cast operator to force the compiler to treat the 0xFF as a uint32_t addresses the issue. It seems the Arduino cross-compiler treats constants a little differently, since I've never had to cast before a shift.
Thanks!
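For reference, either of these forms sidesteps the 16-bit int promotion on the Uno:
uint32_t a = (uint32_t)0xFF << 8; // 0x0000FF00: the shift happens at 32-bit width
uint32_t b = 0xFFUL << 8;         // 0x0000FF00: the UL suffix widens the literal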

Integer |= Char; operation ignoring high order byte in Integer

Just a quick and specific question, this has stumped me for half an hour almost.
char bytes[] = { 0x01, (char)0xD8 };
int value = 0;
value = bytes[0]; // result is 1 (0x0001)
value <<= 8; // result is 256 (0x0100)
value |= bytes[1]; // result is -40? (0xFFD8) How is this even happening?
The last operation is the one of interest to me, how is it turning a signed integer of 256 into -40?
edit: changed a large portion of the example code for brevity
In your case the type char is equivalent to signed char, which means that when you store the value 0xD8 in a char, it comes out as a negative number.
The usual arithmetic conversions that happen during the |= operation are value-preserving, so the negative value is preserved.
To solve the problem, you can either make all your data types unsigned when doing binary arithmetic, or you can write value |= (unsigned char)bytes[1]; or value |= bytes[1] & 0xFF;.
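A quick sketch of the masking fix applied to the question's example:
#include <cstdio>

int main() {
    char bytes[] = { 0x01, (char)0xD8 };
    int value = bytes[0];              // 0x0001
    value <<= 8;                       // 0x0100
    value |= bytes[1] & 0xFF;          // mask off the sign extension before OR-ing
    printf("%d 0x%x\n", value, value); // 472 0x1d8
}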
In order to perform the |= operation, the operands on both sides need to be the same size. Since char is smaller than int, it has to be converted to an int; and since char is signed here, it is widened by sign extension.
That is, 0xD8 becomes 0xFFD8 (and, in a 32-bit int, 0xFFFFFFD8) before the OR operation even happens.
I think I see the problem: here char is a signed character type, and a signed char can only store values in the range [-128, 127]. So 216 (11011000), whose most significant bit is 1, is interpreted as a negative number; taking the two's complement gives 00101000 (40), so the value is -40.
So when you do
value |= bytes[1];
you are actually taking the OR of 256 and -40, and since -40 already has all its high bits set, (256 | -40) is equal to -40.