Can anyone explain how this for loop works?
for (bitMask = 0x01; bitMask; bitMask <<= 1)
This is the first time I have encountered such syntax in a for loop, and I would love to know how the loop ends.
Suppose bitMask is an unsigned 32-bit variable. At the 32nd iteration, its bit representation is
10000000 00000000 00000000 00000000
Shift one more bit left and it overflows; only the lower 32 bits are kept, so the value becomes 0 and the loop condition becomes false.
1 00000000 00000000 00000000 00000000
↑
this bit is discarded
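You can watch the termination happen (a minimal sketch with a fixed-width type so the wrap-around is guaranteed):

#include <cstdint>
#include <iostream>

int main()
{
    uint32_t bitMask;
    int iterations = 0;

    // The mask visits each of the 32 bits once; the 32nd shift wraps it to 0.
    for (bitMask = 0x01; bitMask; bitMask <<= 1)
        ++iterations;

    std::cout << iterations << '\n'; // prints 32
}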
What if bitMask is a signed int? Then it's undefined behavior.
C standard (N2716, 6.5.7 Bitwise shift operators) says:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined
C++ standard (N4713, 8.5.7 Shift operators) says:
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in
the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
My advice is never to use this kind of loop, because it is easy to forget that it only works for an unsigned integer. Instead, use something like the following to generate a mask for each bit.
for (int i = 0; i < 32; i++) {
    unsigned bitMask = 1u << i; /* 1u keeps the shift unsigned; 1 << 31 would overflow a 32-bit signed int */
}
I think the data type of bitMask is important here. Considering it as an int, the following happens:
bitMask is initialized to the value 1.
Every iteration shifts the bits of bitMask left by one place.
E.g. 1 = 00001 (the number of bits is platform dependent; assume 32 bits) left-shifted by 1 gives 00010, which equals the value 2.
This generates the sequence of values 1, 2, 4, 8, 16, ... for bitMask, until the bit with value 1 is shifted out of the 32-bit width.
Once it overflows, the result is undefined for int, and 0 if bitMask is an unsigned int, since all the bits are now zero; this makes the condition in the for loop false and ends the loop.
Simpler readable version:
for (bitMask = 1; bitMask != 0; bitMask *= 2)
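A quick way to see the sequence it produces (a minimal sketch; an unsigned mask keeps the final overflow well-defined):

#include <iostream>

int main()
{
    // Prints 1, 2, 4, 8, ... up to the top bit, then stops when the mask wraps to 0.
    for (unsigned bitMask = 1; bitMask != 0; bitMask *= 2)
        std::cout << bitMask << '\n';
}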
My actual concern is about this:
The left-shift bit operation is used to multiply values of integer variables quickly.
But an integer variable has a defined range of integers it can store, which follows from the number of bytes reserved for it.
Depending on whether the system is 16-bit or 32-bit, it reserves either 2 or 4 bytes, which gives the following ranges:
-32,768 to 32,767 [for signed int] (2 bytes), or
0 to 65,535 [for unsigned int] (2 bytes) on 16-bit
OR
-2,147,483,648 to 2,147,483,647 [for signed int] (4 bytes), or
0 to 4,294,967,295 [for unsigned int] (4 bytes) on 32-bit
My thought is, it shouldn't be possible to multiply values beyond exactly half of the maximum integer of the corresponding range.
But what happens to the value if you continue the bitwise operation after the value has passed half of the maximum int value?
Is there an arithmetic pattern which will be applied to it?
One example (in case of 32-bit system):
unsigned int redfox_1 = 2147483647;
unsigned int redfox_2;
redfox_2 = redfox_1 << 1;
/* Which value has redfox_2 now? */
redfox_2 = redfox_1 << 2;
/* Which value has redfox_2 now? */
redfox_2 = redfox_1 << 3;
/* Which value has redfox_2 now? */
/* And so on and on */
/* Is there a arithmetic pattern what will be applied to the value of redfox_2 now? */
The value stored inside redfox_2 shouldn't be able to go over 4,294,967,295, because its datatype is unsigned int, which can handle only integers up to 4,294,967,295.
What will happen now with the value of redfox_2?
And Is there a arithmetic pattern in what will happen to the value of redfox_2?
I hope you can understand what I mean.
Thank you very much for any answers.
Per the C 2018 standard, 6.5.7 4:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
So, for unsigned integer types, the bits are merely shifted left, and vacated bit positions are filled with zeroes. For signed integer types, the consequences of overflow are not defined by the C standard.
Many C implementations will, in signed shifts, slavishly shift the bits, including shifting value bits into the sign bit, resulting in various positive or negative values that a naïve programmer might not expect. However, since the behavior is not defined by the C standard, a C implementation could also:
Clamp the result at INT_MAX or INT_MIN (for int, or the corresponding maxima for the particular type).
Shift the value bits without affecting the sign bit.
Generate a trap.
Transform the program, when the undefined shift is recognized during compilation and optimization, in arbitrary ways, such as removing the entire code path that performs the shift.
If you really want to see the pattern, then just write a program that prints it:
#include <iostream>
#include <ios>
#include <bitset>
int main()
{
    unsigned int redfox = 2147483647;
    std::bitset<32> b;
    for (int i = 0; i < 32; ++i)
    {
        redfox = redfox << 1;
        b = redfox;
        std::cout << std::dec << redfox << ", " << std::hex << redfox << ", " << b << std::endl;
    }
}
This produces:
4294967294, fffffffe, 11111111111111111111111111111110
4294967292, fffffffc, 11111111111111111111111111111100
4294967288, fffffff8, 11111111111111111111111111111000
4294967280, fffffff0, 11111111111111111111111111110000
4294967264, ffffffe0, 11111111111111111111111111100000
4294967232, ffffffc0, 11111111111111111111111111000000
4294967168, ffffff80, 11111111111111111111111110000000
4294967040, ffffff00, 11111111111111111111111100000000
4294966784, fffffe00, 11111111111111111111111000000000
4294966272, fffffc00, 11111111111111111111110000000000
4294965248, fffff800, 11111111111111111111100000000000
4294963200, fffff000, 11111111111111111111000000000000
4294959104, ffffe000, 11111111111111111110000000000000
4294950912, ffffc000, 11111111111111111100000000000000
4294934528, ffff8000, 11111111111111111000000000000000
4294901760, ffff0000, 11111111111111110000000000000000
4294836224, fffe0000, 11111111111111100000000000000000
4294705152, fffc0000, 11111111111111000000000000000000
4294443008, fff80000, 11111111111110000000000000000000
4293918720, fff00000, 11111111111100000000000000000000
4292870144, ffe00000, 11111111111000000000000000000000
4290772992, ffc00000, 11111111110000000000000000000000
4286578688, ff800000, 11111111100000000000000000000000
4278190080, ff000000, 11111111000000000000000000000000
4261412864, fe000000, 11111110000000000000000000000000
4227858432, fc000000, 11111100000000000000000000000000
4160749568, f8000000, 11111000000000000000000000000000
4026531840, f0000000, 11110000000000000000000000000000
3758096384, e0000000, 11100000000000000000000000000000
3221225472, c0000000, 11000000000000000000000000000000
2147483648, 80000000, 10000000000000000000000000000000
0, 0, 00000000000000000000000000000000
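So there is indeed an arithmetic pattern, and it is plain modular arithmetic: after k shifts the value is (2^31 − 1) × 2^k mod 2^32, which for 1 ≤ k ≤ 32 works out to 2^32 − 2^k. That matches every row above: 4294967294 = 2^32 − 2, 4294967292 = 2^32 − 4, ..., 2147483648 = 2^32 − 2^31, and finally 0 = 2^32 − 2^32.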
In the current C++ Standard Draft, the left shift operator is defined as follows [expr.shift]:
The value of E1 << E2 is the unique value congruent to E1×2^E2 modulo 2^N, where N is the width of the type of the result.
Consider int E1 = 2^31-1 = 2'147'483'647, E2 = 1, and int having 32 bits. Then there is an infinite number of numbers congruent to E1×2^E2 = 4'294'967'294 modulo 2^N = 2^32, namely, all the numbers 4'294'967'294 + k×2^32 where k is an arbitrary integer. Examples are 4'294'967'294 (k=0) or -2 (k=-1).
I don't understand what the Standard means by the unique value out of these numbers. Does it mean the unique value that can be represented by the resulting data type? Then, I suppose the result is defined as -2. Is this interpretation correct?
Until C++20, the definition was different and this case would cause undefined behavior. I suppose the change is related to the mandatory 2's-complement representation of negative signed integers.
In fact, there is now no more requirement for E1 to be non-negative. It therefore seems that -1 << 1 is defined as -2. Is that right as well?
Does it mean the unique value that can be represented by the resulting data type?
Yes. The set of numbers congruent to E1×2^E2 modulo 2^N is infinite, but there is only one value in any interval of size 2^N, therefore there is only one value representable in an integer type of width N. For example, with N = 32 the range of int is [−2^31, 2^31 − 1], and of all the numbers 4'294'967'294 + k×2^32 only −2 (k = −1) falls inside it.
If we look in the "p0907R1 Signed Integers are Two’s Complement" proposal we find a similar phrase with "unique representation" which makes this more clear:
Conversion from signed to unsigned is always well-defined: the result is the unique value of the destination type that is congruent to the source integer modulo 2^N.
Then, I suppose the result is defined as -2. Is this interpretation correct?
Yes
On x64 the equivalent asm instruction is shlx (logical shift left).
I suppose the change is related to the mandatory 2's-complement representation of negative signed integers.
Correct. As was already the case for unsigned types, signed types now also mathematically represent equivalence classes modulo 2^N (well, it's not clear to me how far this really goes, as it looks like they still want to keep some overflow cases undefined).
So we know that:
E1 = 2147483647
E2 = 1
N = sizeof(int) * CHAR_BIT = 4 * 8 = 32
Let's compute E1×2^E2 modulo 2^N (modulo is the remainder of the division):
x = E1×2^E2 mod 2^N = 2147483647 * 2 ^ 1 mod 4294967296 = 4294967294 mod 4294967296 = 4294967294
Then we go to here:
For each value x of a signed integer type, the value of the corresponding unsigned integer type congruent to x modulo 2^N has the same value of corresponding bits in its value representation.
and I think we also need:
The base-2 representation of a value of signed integer type is the base-2 representation of the congruent value of the corresponding unsigned integer type.
That means, that x = 4294967294 is equal to x = -2 for signed int. So the result will be -2.
It therefore seems that -1 << 1 is defined as -2. Is it right as well?
(signed)-1 << 1 =
4294967295 << 1 =
4294967295 * 2 ^ 1 mod 4294967296 =
8589934590 mod 4294967296 =
4294967294 =
(signed)-2
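A quick check (a minimal sketch; it must be compiled as C++20, since before C++20 both shifts below were undefined behavior):

#include <iostream>

int main()
{
    int max31 = 2147483647;            // 2^31 - 1
    std::cout << (max31 << 1) << '\n'; // -2: 4294967294 mod 2^32, wrapped into int
    std::cout << (-1 << 1) << '\n';    // -2: 4294967295 * 2 mod 2^32 = 4294967294
}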
I would like to shift 0xff left by 3 bytes and store it in a uint64_t, which should work as such:
uint64_t temp = 0xff << 24;
This yields a value of 0xffffffffff000000 which is most definitely not the expected 0xff000000.
However, if I shift it by fewer than 3 bytes, it results in the correct answer.
Furthermore, trying to shift 0x01 left by 3 bytes does work.
Here's my output:
0xff shifted by 0 bytes: 0xff
0x01 shifted by 0 bytes: 0x1
0xff shifted by 1 bytes: 0xff00
0x01 shifted by 1 bytes: 0x100
0xff shifted by 2 bytes: 0xff0000
0x01 shifted by 2 bytes: 0x10000
0xff shifted by 3 bytes: 0xffffffffff000000
0x01 shifted by 3 bytes: 0x1000000
With some experimentation, left shifting by 3 bytes works for any value up to 0x7f, which yields 0x7f000000; 0x80 yields 0xffffffff80000000.
Does anyone have an explanation for this bizarre behavior? 0xff000000 certainly falls within the 2^64 − 1 limit of uint64_t.
Does anyone have an explanation for this bizarre behavior?
Yes. The type of an operation always depends on the operand types, never on the result type:
double r = 1.0 / 2.0;
// double divided by double and result double assigned to r
// r == 0.5
double r = 1.0 / 2;
// 2 converted to double, double divided by double and result double assigned to r
// r == 0.5
double r = 1 / 2;
// int divided by int, result int converted to double and assigned to r
// r == 0.0
Once you understand and remember this, you won't fall into this mistake again.
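The same principle applied to the shift in the question (a minimal sketch; with a 32-bit int the first shift overflows into the sign bit, which is undefined behavior before C++20, so ffffffffff000000 is only what common implementations produce):

#include <cstdint>
#include <iostream>

int main()
{
    uint64_t a = 0xff << 24;           // int << int: a 32-bit shift, then sign-extended to 64 bits
    uint64_t b = uint64_t(0xff) << 24; // uint64_t << int: a 64-bit shift, as intended

    std::cout << std::hex << a << '\n'; // ffffffffff000000 on typical implementations
    std::cout << std::hex << b << '\n'; // ff000000
}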
I suspect the behavior is compiler dependent, but I am seeing the same thing.
The fix is simple. Be sure to cast the 0xff to a uint64_t type BEFORE performing the shift. That way the compiler will handle it as the correct type.
uint64_t temp = uint64_t(0xff) << 24;
Shifting left creates a negative (32 bit) number which then gets filled to 64 bits.
Try

0xffULL << 24;

The suffix has to go on the left operand: the result type of a shift is the promoted type of its left operand, so 0xff << 24LL would still perform a 32-bit int shift.
Let's break your problem up into two pieces. The first is the shift operation, and the other is the conversion to uint64_t.
As far as the left shift is concerned, you are invoking undefined behavior on platforms where int is 32 bits (or smaller). As others have mentioned, the operands are int. A 32-bit int with the given value would be 0x000000ff. Note that this is a signed number, so the left-most bit is the sign. According to the standard, if the shift affects the sign bit, the result is undefined. It is up to the whims of the implementation, it is subject to change at any point, and it can even be completely optimized away if the compiler recognizes it at compile time. The latter is not realistic, but it is actually permitted. While you should never rely on code of this form, this is actually not the root of the behavior that puzzled you.
Now, for the second part. The undefined outcome of the left shift operation has to be converted to a uint64_t. The standard states for signed to unsigned integral conversions:
If the destination type is unsigned, the resulting value is the smallest unsigned value equal to the source value modulo 2^n where n is the number of bits used to represent the destination type.
That is, depending on whether the destination type is wider or narrower, signed integers are sign-extended[footnote 1] or truncated and unsigned integers are zero-extended or truncated respectively.
The footnote clarifies that sign-extension is true only for two's-complement representation which is used on every platform with a C++ compiler currently.
Sign-extension simply means that everything to the left of the sign bit in the destination variable is filled with the sign bit, which produces all the f's in your result. As you noted, you could left-shift 0x7f by 3 bytes without this occurring. That's because 0x7f = 0b01111111; after the shift you get 0x7f000000, which leaves the sign bit clear, so the value is still a positive signed int. Therefore, in the conversion, a 0 was extended.
Converting the left operand to a large enough type solves this.
uint64_t temp = uint64_t(0xff) << 24;
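To see the boundary described above (a minimal sketch; the 0x80 shift touches the sign bit, which is undefined behavior before C++20, so the sign-extended result is merely what common implementations do):

#include <cstdint>
#include <iostream>

int main()
{
    uint64_t ok  = 0x7f << 24; // sign bit stays clear -> zero-extended to 64 bits
    uint64_t bad = 0x80 << 24; // sign bit set -> sign-extended on typical implementations

    std::cout << std::hex << ok << '\n';  // 7f000000
    std::cout << std::hex << bad << '\n'; // ffffffff80000000
}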
I'm trying to merge a char* array into a uint16_t. This is my code so far:
char* arr = new char[2]{0};
arr[0] = 0x0; // Binary: 00000000
arr[1] = 0xBE; // Binary: 10111110
uint16_t merged = (arr[0] << 8) + arr[1];
cout << merged << " As Bitset: " << bitset<16>(merged) << endl;
I was expecting merged to be 0xBE, or in binary 00000000 10111110.
But the output of the application is:
65470 As Bitset: 1111111110111110
In the following descriptions I'm reading bits from left to right.
So arr[1] is at the right position, which is the last 8 bits.
The first 8 bits however are set to 1, which they should not be.
Also, if I change the values to
arr[0] = 0x1; // Binary: 00000001
arr[1] = 0xBE; // Binary: 10111110
The output is:
0000000010111110
Again, arr[1] is at the right position. But now the first 8 bits are 0, whereas the last one of the first 8 should be 1.
Basically, what I want to do is append arr[1] to arr[0] and interpret the new number as a whole.
Perhaps char is a signed type in your case, so 0xBE is interpreted as a negative value (-66 in the likely case of two's complement), and it gets sign-extended when promoted to a wider type, hence the leading ones. Left-shifting such a negative value is undefined behavior according to the standard; in practice it also often results in extending the sign bit.
3.9.1 Fundamental types....

It is implementation-defined whether a char object can hold negative values.
5.8 Shift operators....

The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
You need to assign to the wider type before shifting, otherwise you're shifting away† your high bits before they ever even hit the only variable here that's big enough to hold them.
uint16_t merged = arr[0];
merged <<= 8;
merged += arr[1];
Or, arguably:
const uint16_t merged = ((uint16_t)arr[0] << 8) + arr[1];
You also may want to consider converting through unsigned char first to avoid oddities with the high bit set. Try out a few different values in your unit test suite and see what happens.
† Well, your program has undefined behaviour from this out-of-range shift, so who knows what might happen!
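Putting both suggestions together (a minimal sketch; going through unsigned char keeps 0xBE as 190 rather than -66 on platforms where char is signed):

#include <bitset>
#include <cstdint>
#include <iostream>

int main()
{
    char arr[2] = {0x01, static_cast<char>(0xBE)};

    // Widen first, and convert through unsigned char to avoid sign extension.
    const uint16_t merged =
        (static_cast<uint16_t>(static_cast<unsigned char>(arr[0])) << 8) |
        static_cast<unsigned char>(arr[1]);

    std::cout << merged << " As Bitset: " << std::bitset<16>(merged) << '\n';
    // prints 446 As Bitset: 0000000110111110
}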
When I use std::bitset<N>::bitset(unsigned long long), this constructs a bitset, and when I access it via operator[], the bits seem to be ordered in little-endian fashion. Example:
std::bitset<4> b(3ULL);
std::cout << b[0] << b[1] << b[2] << b[3];
prints 1100 instead of 0011, i.e. the least significant bit (LSB) is at the little (lower) address, index 0.
Looking up the standard, it says
initializing the first M bit positions to the corresponding bit values in val
Programmers naturally think of binary digits from LSB to MSB (right to left), so the first M bit positions understandably run LSB → MSB, and bit 0 ends up at b[0].
However, under shifting, the definition goes
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled.
Here one has to interpret the bits in E1 as going from MSB → LSB and then left-shift E2 times. Had it been written from LSB → MSB, then only right-shifting E2 times would give the same result.
I'm surprised that everywhere else in C++, the language seems to project the natural (English; left-to-right) writing order (when doing bitwise operations like shifting, etc.). Why be different here?
There is no notion of endian-ness as far as the standard is concerned. When it comes to std::bitset, [template.bitset]/3 defines bit position:
When converting between an object of class bitset<N> and a value of some integral type, bit position pos corresponds to the bit value 1 << pos. The integral value corresponding to two or more bits is the sum of their bit values.
Using this definition of bit position in your standard quote
initializing the first M bit positions to the corresponding bit values in val
a val with binary representation 11 leads to a bitset<N> b with b[0] = 1, b[1] = 1 and remaining bits set to 0.
This is consistent with the way bits are usually numbered: bit 0 represents 2^0, bit 1 represents 2^1, etc. It has nothing to do with the endianness of the architecture, which concerns byte ordering, not bit ordering.
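A small demonstration of this numbering (a minimal sketch):

#include <bitset>
#include <cstddef>
#include <iostream>

int main()
{
    std::bitset<4> b(3ULL); // binary 0011: bits 0 and 1 are set

    // b[pos] corresponds to the bit with numeric value 1 << pos.
    for (std::size_t pos = 0; pos < b.size(); ++pos)
        std::cout << "b[" << pos << "] = " << b[pos]
                  << " (bit value " << (1u << pos) << ")\n";
}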