C/C++ Bitwise Operations not resulting in expected output?

C/C++ Bitwise Operations not resulting in expected output? - c++

I'm currently working on bitwise operations but I am confused right now... Here's the scoop and why
I have a byte 0xCD in bits this is 1100 1101
I am shifting the bits left 7, then I'm saying & 0xFF since 0xFF in bits is 1111 1111
unsigned int bit = (0xCD << 7) & 0xFF<<7;
Now I would make the assumption that both 0xCD and 0xFF would get shifted to the left 7 times and the remaining bit would be 1&1 = 1 but I'm not getting that for output also I would also make the assumption that shifting 6 would give me bits 0&1 = 0 but I'm getting again a number above 1 like 205 0.o Is there something incorrect about the way I am trying to process bit shifting in my head? If so what is it that I am doing wrong?
Code Below:
unsigned char byte_now = 0xCD;
printf("Bits for byte_now: 0x%02x: ", byte_now);
/*
* We want to get the first bit in a byte.
* To do this we will shift the bits over 7 places for the last bit
* we will compare it to 0xFF since it's (1111 1111) if bit&1 then the bit is one
*/
unsigned int bit_flag = 0;
int bit_pos = 7;
bit_flag = (byte_now << bit_pos) & 0xFF;
printf("%d", bit_flag);

Is there something incorrect about the way I am trying to process bit shifting in my head?
There seems to be.
If so what is it that I am doing wrong?
That's unclear, so I offer a reasonably full explanation.
In the first place, it is important to understand that C does not not perform any arithmetic directly on integers smaller than int. Consider, then, your expression byte_now << bit_pos. "The usual arithmetic promotions" are performed on the operands, resulting in the left operand being converted to the int value 0xCD. The result has the same pattern of least-significant value bits as bit_flag, but also a bunch of leading zero bits.
Left shifting the result by 7 bits produces the bit pattern 110 0110 1000 0000, equivalent to 0x6680. You then perform a bitwise and operation on the result, masking off all but the least-significant 8 bits, thus yielding 0x80. What happens when you assign that to bit_flag depends on the type of that variable, but if it is an integer type that is either unsigned or has more than 7 value bits then the assignment is well-defined and value-preserving. Note that it is bit 7 that is nonzero, not bit 0.
The type of bit_flag is more important when you pass it to printf(). You've paired it with a %d field descriptor, which is correct if bit_flag has type int and incorrect otherwise. If bit_flag does have type int, then I would expect the program to print 128.

Related

c++ combining 2 uint8_t into one uint16_t not working?

So I have a little piece of code that takes 2 uint8_t's and places then next to each other, and then returns a uint16_t. The point is not adding the 2 variables, but putting them next to each other and creating a uint16_t from them.
The way I expect this to work is that when the first uint8_t is 0, and the second uint8_t is 1, I expect the uint16_t to also be one.
However, this is in my code not the case.
This is my code:
uint8_t *bytes = new uint8_t[2];
bytes[0] = 0;
bytes[1] = 1;
uint16_t out = *((uint16_t*)bytes);
It is supposed to make the bytes uint8_t pointer into a uint16_t pointer, and then take the value. I expect that value to be 1 since x86 is little endian. However it returns 256.
Setting the first byte to 1 and the second byte to 0 makes it work as expected. But I am wondering why I need to switch the bytes around in order for it to work.
Can anyone explain that to me?
Thanks!

There is no uint16_t or compatible object at that address, and so the behaviour of *((uint16_t*)bytes) is undefined.
I expect that value to be 1 since x86 is little endian. However it returns 256.
Even if the program was fixed to have well defined behaviour, your expectation is backwards. In little endian, the least significant byte is stored in the lowest address. Thus 2 byte value 1 is stored as 1, 0 and not 0, 1.
Does endianess also affect the order of the bit's in the byte or not?
There is no way to access a bit by "address"1, so there is no concept of endianness. When converting to text, bits are conventionally shown most significant on left and least on right; just like digits of decimal numbers. I don't know if this is true in right to left writing systems.
1 You can sort of create "virtual addresses" for bits using bitfields. The order of bitfields i.e. whether the first bitfield is most or least significant is implementation defined and not necessarily related to byte endianness at all.
Here is a correct way to set two octets as uint16_t. The result will depend on endianness of the system:
// no need to complicate a simple example with dynamic allocation
uint16_t out;
// note that there is an exception in language rules that
// allows accessing any object through narrow (unsigned) char
// or std::byte pointers; thus following is well defined
std::byte* data = reinterpret_cast<std::byte*>(&out);
data[0] = 1;
data[1] = 0;
Note that assuming that input is in native endianness is usually not a good choice, especially when compatibility across multiple systems is required, such as when communicating through network, or accessing files that may be shared to other systems.
In these cases, the communication protocol, or the file format typically specify that the data is in specific endianness which may or may not be the same as the native endianness of your target system. De facto standard in network communication is to use big endian. Data in particular endianness can be converted to native endianness using bit shifts, as shown in Frodyne's answer for example.

In a little endian system the small bytes are placed first. In other words: The low byte is placed on offset 0, and the high byte on offset 1 (and so on). So this:
uint8_t* bytes = new uint8_t[2];
bytes[0] = 1;
bytes[1] = 0;
uint16_t out = *((uint16_t*)bytes);
Produces the out = 1 result you want.
However, as you can see this is easy to get wrong, so in general I would recommend that instead of trying to place stuff correctly in memory and then cast it around, you do something like this:
uint16_t out = lowByte + (highByte << 8);
That will work on any machine, regardless of endianness.
Edit: Bit shifting explanation added.
x << y means to shift the bits in x y places to the left (>> moves them to the right instead).
If X contains the bit-pattern xxxxxxxx, and Y contains the bit-pattern yyyyyyyy, then (X << 8) produces the pattern: xxxxxxxx00000000, and Y + (X << 8) produces: xxxxxxxxyyyyyyyy.
(And Y + (X<<8) + (Z<<16) produces zzzzzzzzxxxxxxxxyyyyyyyy, etc.)
A single shift to the left is the same as multiplying by 2, so X << 8 is the same as X * 2^8 = X * 256. That means that you can also do: Y + (X*256) + (Z*65536), but I think the shifts are clearer and show the intent better.
Note that again: Endianness does not matter. Shifting 8 bits to the left will always clear the low 8 bits.
You can read more here: https://en.wikipedia.org/wiki/Bitwise_operation. Note the difference between Arithmetic and Logical shifts - in C/C++ unsigned values use logical shifts, and signed use arithmetic shifts.

If p is a pointer to some multi-byte value, then:
"Little-endian" means that the byte at p is the least-significant byte, in other words, it contains bits 0-7 of the value.
"Big-endian" means that the byte at p is the most-significant byte, which for a 16-bit value would be bits 8-15.
Since the Intel is little-endian, bytes[0] contains bits 0-7 of the uint16_t value and bytes[1] contains bits 8-15. Since you are trying to set bit 0, you need:
bytes[0] = 1; // Bits 0-7
bytes[1] = 0; // Bits 8-15

Your code works but your misinterpreted how to read "bytes"
#include <cstdint>
#include <cstddef>
#include <iostream>
int main()
{
uint8_t *in = new uint8_t[2];
in[0] = 3;
in[1] = 1;
uint16_t out = *((uint16_t*)in);
std::cout << "out: " << out << "\n in: " << in[1]*256 + in[0]<< std::endl;
return 0;
}
By the way, you should take care of alignment when casting this way.

One way to think in numbers is to use MSB and LSB order
which is MSB is the highest Bit and LSB ist lowest Bit for
Little Endian machines.
For ex.
(u)int32: MSB:Bit 31 ... LSB: Bit 0
(u)int16: MSB:Bit 15 ... LSB: Bit 0
(u)int8 : MSB:Bit 7 ... LSB: Bit 0
with your cast to a 16Bit value the Bytes will arrange like this
16Bit <= 8Bit 8Bit
MSB ... LSB BYTE[1] BYTE[0]
Bit15 Bit0 Bit7 .. 0 Bit7 .. 0
0000 0001 0000 0000 0000 0001 0000 0000
which is 256 -> correct value.

Why does shifting 0xff left by 24 bits result in an incorrect value?

I would like to shift 0xff left by 3 bytes and store it in a uint64_t, which should work as such:
uint64_t temp = 0xff << 24;
This yields a value of 0xffffffffff000000 which is most definitely not the expected 0xff000000.
However, if I shift it by fewer than 3 bytes, it results in the correct answer.
Furthermore, trying to shift 0x01 left by 3 bytes does work.
Here's my output:
0xff shifted by 0 bytes: 0xff
0x01 shifted by 0 bytes: 0x1
0xff shifted by 1 bytes: 0xff00
0x01 shifted by 1 bytes: 0x100
0xff shifted by 2 bytes: 0xff0000
0x01 shifted by 2 bytes: 0x10000
0xff shifted by 3 bytes: 0xffffffffff000000
0x01 shifted by 3 bytes: 0x1000000
With some experimentation, left shifting works up to 3 bits for each uint64_t up to 0x7f, which yields 0x7f000000. 0x80 yields 0xffffffff80000000.
Does anyone have an explanation for this bizarre behavior? 0xff000000 certainly falls within the 264 - 1 limits of uint64_t.

Does anyone have an explanation for this bizarre behavior?
Yes, type of operation always depend on operand types and never on result type:
double r = 1.0 / 2.0;
// double divided by double and result double assigned to r
// r == 0.5
double r = 1.0 / 2;
// 2 converted to double, double divided by double and result double assigned to r
// r == 0.5
double r = 1 / 2;
// int divided by int, result int converted to double and assigned to r
// r == 0.0
When you understand and remenber this you would not fall on this mistake again.

I suspect the behavior is compiler dependent, but I am seeing the same thing.
The fix is simple. Be sure to cast the 0xff to a uint64_t type BEFORE performing the shift. That way the compiler will handle it as the correct type.
uint64_t temp = uint64_t(0xff) << 24

Shifting left creates a negative (32 bit) number which then gets filled to 64 bits.
Try
0xff << 24LL;

Let's break your problem up into two pieces. The first is the shift operation, and the other is the conversion to uint64_t.
As far as the left shift is concerned, you are invoking undefined behavior on 32-bit (or smaller) architectures. As others have mentioned, the operands are int. A 32-bit int with the given value would be 0x000000ff. Note that this is a signed number, so the left-most bit is the sign. According to the standard, if you the shift affects the sign-bit, the result is undefined. It is up to the whims of the implementation, it is subject to change at any point, and it can even be completely optimized away if the compiler recognizes it at compile-time. The latter is not realistic, but it is actually permitted. While you should never rely on code of this form, this is actually not the root of the behavior that puzzled you.
Now, for the second part. The undefined outcome of the left shift operation has to be converted to a uint64_t. The standard states for signed to unsigned integral conversions:
If the destination type is unsigned, the resulting value is the smallest unsigned value equal to the source value modulo 2n where n is the number of bits used to represent the destination type.
That is, depending on whether the destination type is wider or narrower, signed integers are sign-extended[footnote 1] or truncated and unsigned integers are zero-extended or truncated respectively.
The footnote clarifies that sign-extension is true only for two's-complement representation which is used on every platform with a C++ compiler currently.
Sign-extension means just that everything left of the sign bit on the destination variable will be filled with the sign-bit, which produces all the fs in your result. As you noted, you could left shift 0x7f by 3-bytes without this occurring, That's because 0x7f=0b01111111. After the shift, you get 0x7f000000 which is the largest signed int, ie the largest number that doesn't affect the sign bit. Therefore, in the conversion, a 0 was extended.
Converting the left operand to a large enough type solves this.
uint64_t temp = uint64_t(0xff) << 24

Why does masking a negative number produce a positive number?

in c++, I have the following code:
int x = -3;
x &= 0xffff;
cout << x;
This produces
65533
But if I remove the negative, so I have this:
int x = 3;
x &= 0xffff;
cout << x;
I simply get 3 as a result
Why does the first result not produce a negative number? I would expect that -3 would be sign extended to 16 bits, which should still give a twos complement negative number, considering all those extended bits would be 1. Consequently the most significant bit would be 1 too.

It looks like your system uses 32-bit ints with two's complement representation of negatives.
Constant 0xFFFF covers the least significant two bytes, with the upper two bytes are zero.
The value of -3 is 0xFFFFFFFD, so masking it with 0x0000FFFF you get 0x0000FFFD, or 65533 in decimal.
Positive 3 is 0x00000003, so masking with 0x0000FFFF gives you 3 back.
You would get the result that you expect if you specify 16-bit data type, e.g.
int16_t x = -3;
x &= 0xffff;
cout << x;

In your case int is more than 2 bytes. You probably run on modern CPU where usually these days integer is 4 bytes (or 32 bits)
If you take a look the way system stores negative numbers you will see that its a complementary number. And if you take only last 2 bytes as your mask is 0xFFFF then you will get only a part of it.
your 2 options:
use short intstead of int. Usually its a half of integer and will be only 2 bites
use bigger mask like 0xFFFFFFFF that it covers all the bits of your integer
NOTE: I use "usually" because the amount of bits in your int and short depends on your CPU and compiler.

Why does std::bitset expose bits in little-endian fashion?

When I use std::bitset<N>::bitset( unsigned long long ) this constructs a bitset and when I access it via the operator[], the bits seems to be ordered in the little-endian fashion. Example:
std::bitset<4> b(3ULL);
std::cout << b[0] << b[1] << b[2] << b[3];
prints 1100 instead of 0011 i.e. the ending (or LSB) is at the little (lower) address, index 0.
Looking up the standard, it says
initializing the first M bit positions to the corresponding bit values in val
Programmers naturally think of binary digits from LSB to MSB (right to left). So the first M bit positions is understandably LSB → MSB, so bit 0 would be at b[0].
However, under shifting, the definition goes
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled.
Here one has to interpret the bits in E1 as going from MSB → LSB and then left-shift E2 times. Had it been written from LSB → MSB, then only right-shifting E2 times would give the same result.
I'm surprised that everywhere else in C++, the language seems to project the natural (English; left-to-right) writing order (when doing bitwise operations like shifting, etc.). Why be different here?

There is no notion of endian-ness as far as the standard is concerned. When it comes to std::bitset, [template.bitset]/3 defines bit position:
When converting between an object of class bitset<N> and a value of
some integral type, bit position pos corresponds to the bit value 1<<pos.
The integral value corresponding to two or more bits is the sum
of their bit values.
Using this definition of bit position in your standard quote
initializing the first M bit positions to the corresponding bit values in val
a val with binary representation 11 leads to a bitset<N> b with b[0] = 1, b[1] = 1 and remaining bits set to 0.

This is consistent with the way bits are usually numbered - bit 0 represents 20, bit 1 represents 21, etc. It has nothing to do with the endianness of the architecture, which concerns byte ordering not bit ordering.

Shifting more than 8 bits - leading to wrong output

I am trying to perform this operation, and im getting the wrong output.
signed char temp3[3] = {0x0D, 0xFF, 0xC0};
double temp = ((temp3[0] & 0x03) << 10) | (temp3[1]) | ((temp3[2] & 0xC0) >> 6)
I am trying to form a 12 bit number. get the last 2 bits of 0x0D, all 8 of 0xFF and first 2 of 0xC0 to form the binary number (011111111111) = 2047, however I am getting -1. When I break the first mask and shift of 10, I get 0. I dont know if this is my problem, trying to shift an 8 bit character 10 bits.

When bit twiddling, always use unsigned numbers.
Change the array to unsigned char.
Add the 'U' suffix to each constant, because each constant is a signed integer by default.
BTW, right shifting is undefined implementation defined for signed integers.
Per comments, changed "undefined" to "implementation defined".

There are a few things you need to address.
First up, c++ doesn't have 12 bit numbers. The best you can have are 16 bit. The top bit represents sign in twos complement form.
You also need to be very careful shift of the type of the number you are shifting. In your example, you are left shifting a char by over 8 bits. As a char is only 8 bits, you are zeroing it.
The following example gives a correct implmentation (for signed 12 bit numbers). There are no doubt more efficient ones.
// shift in top 2 bits
signed short test = static_cast<signed short>(temp3[0] & 0x03) << 10 ;
// shift in middle 8 bits
test |= (static_cast<signed short>(temp3[1]) << 2) & 0x03FC;
// rightshift, mask and append lower 2 bits
test |= (static_cast<signed short>(temp3[2]) >> 6) & 0x0003;
// sign extend top bits from 12 bits to 16 bits
test |= (temp3[0] & 0x02) == 0 ? 0x0000 : 0xF0000;

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js