What does ~ mean in C++?

Specifically, could you tell me what this line of code does:
int var1 = (var2 + 7) & ~7;
Thanks

It's the bitwise NOT (complement) operator: it inverts every bit of its operand, turning each 0 into 1 and each 1 into 0. For example:
int x = 15; // Binary (32-bit int): 00000000 00000000 00000000 00001111
int y = ~x; // Binary: 11111111 11111111 11111111 11110000, i.e. -16 in two's complement
When combined with the & operator, it is used to clear bits. So in your example, the lowest 3 bits of the result of var2 + 7 are set to zero.
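A minimal sketch of the idiom from the question, assuming non-negative inputs small enough that v + 7 does not overflow (the helper name round_up_to_8 is hypothetical, not from the question):

```cpp
// Hypothetical helper: rounds v up to the nearest multiple of 8 with (v + 7) & ~7.
// Assumes v is non-negative and v + 7 does not overflow int.
constexpr int round_up_to_8(int v) {
    return (v + 7) & ~7;  // ~7 has every bit set except the lowest three
}

// Compile-time checks of the idiom's behaviour.
static_assert(round_up_to_8(0) == 0, "0 is already a multiple of 8");
static_assert(round_up_to_8(1) == 8, "1 + 7 = 8, which has no low bits to clear");
static_assert(round_up_to_8(8) == 8, "exact multiples are left unchanged");
static_assert(round_up_to_8(13) == 16, "13 + 7 = 20 = 10100b; clearing the low 3 bits gives 16");
```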
As noted in the comments, it's also used to denote destructors, but that's not the case in your example.

This code rounds var2 up to the nearest multiple of 8 and stores the result in var1. & ~7 sets the lowest 3 bits to 0, which rounds down to a multiple of 8; adding 7 first turns that into rounding up.

It's a bitwise NOT, not to be confused with the logical NOT (!), which flips a truth value (true to false and vice versa). The ~ operator flips every bit of its operand.
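To illustrate the difference, here is a small sketch using unsigned int, which sidesteps any question of how negative numbers are represented (the helper names are hypothetical):

```cpp
#include <climits>

// Logical NOT: the result is a bool, true only when x == 0.
constexpr bool logical_not(unsigned int x) { return !x; }

// Bitwise NOT: every bit of x is flipped.
constexpr unsigned int bitwise_not(unsigned int x) { return ~x; }

static_assert(logical_not(0u) == true, "!0 is true");
static_assert(logical_not(5u) == false, "any non-zero value counts as true, so !5 is false");
static_assert(bitwise_not(0u) == UINT_MAX, "~0u has every bit set");
static_assert(bitwise_not(5u) == UINT_MAX - 5u, "~x == UINT_MAX - x for unsigned x");
```

For an unsigned value, x + ~x has every bit set with no carries, which is why ~x equals UINT_MAX - x regardless of how wide unsigned int is.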

7 in binary is 00000111, so ~7 is 11111000 (showing only the lowest eight bits; in a 32-bit int, ~7 is 11111111 11111111 11111111 11111000). The code author is using it as a bit mask.

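The effect of the code, as noted, is to round a value up to the nearest multiple of eight. My preferred formulation would be "var1 = ((var2 - 1) | 7) + 1;" (the shorter "(var2 | 7) + 1" is not equivalent: it maps an exact multiple such as 8 to 16 instead of leaving it unchanged), but to understand the expression as written, it's most helpful to understand it from the outside in.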
Although "&" and "~" are separate operators, with different precedence, the concept of "a = b & ~c;" is a useful one which in a very real sense deserves its own operator (it would allow more sensible integer promotion rules, among other things). Basically, "a = b & ~c;" clears any bits in 'b' that are also set in 'c' (if 'b' is long and 'c' is, say, unsigned int, then because of the integer promotion rules it may clear higher bits of 'b' as well). If 'c' is 2^N-1, the expression clears the bottom N bits, which is equivalent to rounding down to the next lower multiple of 2^N.
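A sketch of the "b & ~c" idea and of the promotion pitfall, assuming a 32-bit unsigned int (so that std::uint32_t is not promoted to int); clear_bits and round_down_pow2 are hypothetical names:

```cpp
#include <cstdint>

// Clears every bit of b that is set in c.
constexpr std::uint32_t clear_bits(std::uint32_t b, std::uint32_t c) {
    return b & ~c;
}

// With c = 2^n - 1 the low n bits are cleared, rounding b down to a multiple of 2^n.
// Assumes n < 32.
constexpr std::uint32_t round_down_pow2(std::uint32_t b, unsigned n) {
    return clear_bits(b, (std::uint32_t{1} << n) - 1);
}

static_assert(clear_bits(0xFFu, 0x0Fu) == 0xF0u, "low nibble cleared");
static_assert(round_down_pow2(29u, 3) == 24u, "29 rounded down to a multiple of 8");

// The pitfall: ~c32 is computed as a 32-bit unsigned int and then zero-extended,
// so the mask has no bits set in the upper half and those bits of b64 are lost.
constexpr std::uint64_t b64 = 0xFFFFFFFFFFFFFFFFull;
constexpr std::uint32_t c32 = 7;
static_assert((b64 & ~c32) == 0x00000000FFFFFFF8ull, "upper 32 bits cleared too");
// Widening c before complementing keeps the upper bits.
static_assert((b64 & ~std::uint64_t{c32}) == 0xFFFFFFFFFFFFFFF8ull, "only the low 3 bits cleared");
```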
The expression as written adds 7 to var2 before rounding the result down to the next lower multiple of 8. If var2 was already a multiple of 8, adding 7 won't quite reach the next higher multiple of 8, but otherwise it will. Thus, the expression as a whole rounds up to the nearest multiple of 8, leaving exact multiples unchanged.
Incidentally, my preferred formulation sets the low three bits, rounding up to the next value that's just short of a multiple of 8, and then adds 1 to reach that multiple. It avoids repeating the magic number "7", and on some instruction sets the approach saves code.
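A compile-time comparison of the rounding expressions, assuming non-negative int inputs (the function names are hypothetical). It shows that (v | 7) + 1 agrees with the original except at exact multiples of 8, while ((v - 1) | 7) + 1 matches it for v >= 1:

```cpp
// Three ways of rounding v up to a multiple of 8 (hypothetical names, v >= 1 assumed).
constexpr int add_then_mask(int v)   { return (v + 7) & ~7; }      // the question's expression
constexpr int or_then_inc(int v)     { return (v | 7) + 1; }       // always moves to the NEXT multiple
constexpr int dec_or_then_inc(int v) { return ((v - 1) | 7) + 1; } // same results as add_then_mask

// All three agree for values that are not already multiples of 8...
static_assert(add_then_mask(13) == 16 && or_then_inc(13) == 16 && dec_or_then_inc(13) == 16, "");
// ...but for an exact multiple, (v | 7) + 1 jumps to the next one.
static_assert(add_then_mask(8) == 8, "");
static_assert(or_then_inc(8) == 16, "");
static_assert(dec_or_then_inc(8) == 8, "");
```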

Related

Why doesn't the bit-shift of a variable and a number have the same result?

I'm shifting some bits and just realized that doing the operation using a variable doesn't have the same result as using a number. See the example below.
int a = 97;
int b = 0;
b = 1 << a;
printf("%d\n", b);
// 2
b = 1 << 97;
printf("%d\n", b);
// 0 - warning: shift count >= width of type [-Wshift-count-overflow]
Since the result of a left shift whose right operand is greater than or equal to the width in bits of the (promoted) left operand is undefined, any result is possible from the expression.
In the variable case (1 << a), since a is 97 (larger than the number of bits in an int), the most likely results are 1 << (97 % 32) == 1 << 1 == 2 or 0, typically depending on how the hardware (CPU) handles these shifts.
With a constant (1 << 97), the compiler knows you're shifting too far, issues the warning (which is not required), and chooses 0 as the result (also not required).
The warning you are seeing is a compile-time warning. The shift operates on a 32-bit int, so shifting it left by 97 bits is out of range, and the warning is a valid concern. But the compiler can only detect this for a constant shift count, because it evaluates that expression during compilation and knows immediately that the count is too large.
With a variable shift count, the compiler generally doesn't know what value a will hold when the shift executes, so it leaves the check up to you.
The undefined behavior is outlined in the C++ standard here.
http://eel.is/c++draft/expr.shift
The behavior is undefined if the right operand is negative, or greater than or equal to the width of the promoted left operand.
You'll get different results depending on compiler and optimization level. With optimization turned on, the compiler can propagate the known value of a into the first shift and fold it to 0 as well.
Why exactly does it act like that though? The x86 instruction for shifting by a variable is SAL (shift-arithmetic-left). You can see the instruction list for bit shifting operations here:
https://c9x.me/x86/html/file_module_x86_id_285.html
In an unoptimized build, the form used would be SAL r/m32, CL. The CL register is 8 bits, but the processor masks the count to 5 bits internally:
The destination operand can be a register or a memory location. The count operand can be an immediate value or register CL. The count is masked to 5 bits, which limits the count range to 0 to 31. A special opcode encoding is provided for a count of 1.
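The masking described in that quote can be modelled in portable C++ by applying count & 31 explicitly. This is only a sketch of the hardware behaviour, not something the C++ standard guarantees for an out-of-range shift; x86_style_shl32 is a hypothetical name:

```cpp
#include <cstdint>

// Hypothetical model of a 32-bit x86 SHL/SAL with the count in CL:
// only the low 5 bits of the count are used. In C++ itself, shifting by
// 32 or more is undefined; here the count is masked first, so it is always 0..31.
constexpr std::uint32_t x86_style_shl32(std::uint32_t value, unsigned count) {
    return value << (count & 31u);
}

static_assert(x86_style_shl32(1u, 97) == 2u, "97 & 31 == 1");
static_assert(x86_style_shl32(0xFFFFFFFFu, 32) == 0xFFFFFFFFu, "32 & 31 == 0: no shift at all");
```

With the count masked, the 2 observed for 1 << a with a == 97 follows directly: 97 & 31 == 1.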

Typecasting char to long

Say I have a variable, a
char a = 0x01;
and I want to cast this to a long, as in
long b;
b = (long)a;
Will the upper 3 bytes in b be guaranteed to be 0? With my setup they are 0, but I'm not sure if this is compiler-dependent.
Yes, b is guaranteed to have the value 0x1 after this assignment, even without the cast. The assignment operator in C++ is generally value driven: it copies the value or state rather than performing a bitwise copy (even if the two are sometimes equivalent, such as for trivial types).
In some cases, especially because of operator overloading, this may not be the case. Developers are very strongly encouraged to keep to this concept when they design new types, but a careless programmer could overload the assignment operator for non-fundamental types to do anything they want.
As a long can represent all values of a char (be it signed or unsigned) on any mainstream platform, the conversion does not change the value.
If you initially have a non-negative value, either because char is unsigned on your architecture or because the char value is between 0 and 127 (assuming 8-bit characters), the resulting long is guaranteed to be non-negative and less than 256. So on an architecture where long is 4 bytes, the 3 high-order bytes are guaranteed to be 0.
If char is signed and the initial value is negative, things are different! The value is unchanged and therefore still negative. On a common two's complement architecture, the 3 high-order bytes will each be 0xFF.
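A small sketch of both cases, using signed char and unsigned char explicitly so that it does not depend on whether plain char is signed on a given platform (the variable names are hypothetical):

```cpp
// Both conversions preserve the value; only the resulting bit patterns differ.
constexpr signed char negative = -1;
constexpr long from_negative = negative;       // still -1

constexpr unsigned char all_ones_byte = 0xFF;
constexpr long from_unsigned = all_ones_byte;  // 255, never negative

static_assert(from_negative == -1, "the negative value survives the conversion");
static_assert(from_unsigned == 255, "an unsigned char value stays non-negative");

// Above the low byte, -1 has every bit set (the 0xFF bytes on a two's complement
// machine) while 255 has every bit clear. Converting -1 to unsigned long keeps
// this check in terms of values.
static_assert((static_cast<unsigned long>(from_negative) >> 8) == (~0ul >> 8), "");
static_assert((from_unsigned >> 8) == 0, "");
```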
The answer already given is right, but I thought I'd add that for C++, it is recommended to use one of the C++-specific casting notations, to make it abundantly clear what you are doing. Here, you would use:
long b;
b = static_cast<long>(a);
This makes it very clear what you are doing (a conversion to long whose validity is checked at compile time), and you know that the "right" sort of cast will be performed.
char a = 0x01;
long b;
b = (long)a;
C and C++ are two different (but closely related) languages. Their rules happen to be the same in this case.
The cast (not "typecast") is not necessary. The assignment could, and probably should, be written as:
b = a;
which causes an implicit conversion from char to long. Since the value being converted is within the representable range of type long, the result of the conversion is 1. The result of the conversion is specified in terms of values, not representations.
The representation of the value 1 in type long probably has a 1 in the low-order bit, and 0s in all the other bits. (And the position of the low-order bit can vary; some systems are big-endian, some are little-endian, and there are other possibilities.)
There is no guarantee that type long even has three high-order bytes. Type long is at least 32 bits wide, but a byte can be wider than 8 bits. It's even possible that there are values of type char that exceed LONG_MAX (if plain char is unsigned and long is 1 byte, which implies CHAR_BIT >= 32).
It's also possible that the representation of type long includes padding bits, bits that do not contribute to the value. It's guaranteed that the sign bit is 0, the low-order value bit is 1, and all other value bits are 0, but if there are padding bits their values are not guaranteed. (Some combinations of padding bits can result in a trap representation that does not represent any value, but that can't happen in this particular case.)
Most of these exotic possibilities are very unlikely to occur in real life. C implementations for some DSPs do have bytes wider than 8 bits, but any system you're using almost certainly has 8-bit bytes.
The point is that the result of the conversion is defined in terms of values, not representations, and 99% of the time that's all you need to care about. If you write:
char a = 1; /* same as 0x01 */
long b = a;
printf("b = %ld\n", b);
it will print b = 1, even if you're using some exotic system where the value 1 is represented strangely.
b will be 1; this is always true, independent of compiler and endianness. Additionally, the following expressions will be true:
b == 1
b == 01
b == 0x1
b == 0x00000001
b == 0x00000000000000000000000000000000000000000000000000001
The right hand side in all cases is an int constant with the value 1; not more, not less. Note that the zeroes do not represent bytes in memory (an int most likely does not have the number of bytes the last expression appears to suggest). The hexadecimal notation is just another way to write down a 1, exactly like 1.
In particular, we don't know where in memory the byte with the value 1 is located, because that is architecture dependent. It may be the one at the address of the int, or it may be the other end, or even in between.
Now comes the sweet thing: C does not care how the memory in an int is laid out. None of the ways to write an integer constant is architecture dependent. That seems self-evident for decimal constants: did we expect the meaning of int i = 1 to be architecture dependent? Certainly not. Nor is the meaning of int i = 0x00000001;. The same is true for the bit shift operators: << shifts towards more significant bits, >> towards less significant bits. The digits in (decimal or hexadecimal) integer constants are ordered so that the most significant digits are on the left, aligning with the "direction" indicated by the arrow-like bit shift operators. That may or may not reflect the byte order of your machine's int representation; on a (little-endian) PC it does not.
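Because shifts and masks operate on values, a byte can be extracted the same way on little- and big-endian machines. A sketch, with byte_of as a hypothetical helper and a 32-bit value assumed:

```cpp
#include <cstdint>

// Extract byte i (0 = least significant) of a 32-bit value with a shift and a mask.
// Hypothetical helper; assumes i < 4. The result does not depend on endianness.
constexpr std::uint8_t byte_of(std::uint32_t x, unsigned i) {
    return static_cast<std::uint8_t>((x >> (8 * i)) & 0xFFu);
}

static_assert(byte_of(0x12345678u, 0) == 0x78, "least significant byte");
static_assert(byte_of(0x12345678u, 3) == 0x12, "most significant byte");
// 1 and 0x00000001 are the same value: low byte 1, every other byte 0.
static_assert(byte_of(0x00000001u, 0) == 1 && byte_of(1u, 1) == 0, "");
```

Only inspecting the object's memory directly (for example through an unsigned char pointer) reveals the byte order, and that is the part that varies between architectures.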
Bottom line: If you use the standard C (or C++) means to test the "upper 3 bytes", you are home free, and the following is always true, independent of the implementation or architecture:
char a = 0x01;
long b = a;
(b & 0xFF) == 1 // least significant byte is 1
(b & 0x000000FF) == 1 // exactly the same as above
(b & 0xFFFFFF00) == 0 // more significant three bytes are all 0
It's possible that your long has more bits, but that is implementation dependent. However many more there are, they are all zero, except the least significant one.

Bit shifting and assignment [duplicate]

This is sort of driving me crazy.
int a = 0xffffffff;
int b = 32;
cout << (a << b) << "\n";
cout << (0xffffffff << 32) << "\n";
My output is
-1
0
Why am I not getting
0
0
Undefined behavior occurs when you shift a value by a number of bits that is not less than its width (e.g., 32 or more bits for a 32-bit integer). You've just encountered an example of that undefined behavior.
The short answer is that, since you're using an implementation with 32-bit int, the language standard says that a 32-bit shift is undefined behavior. Both the C standard (section 6.5.7) and the C++ standard (section 5.8) say
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
But if you want to know why:
Many computers have an instruction that can shift the value in a register, but the hardware only looks at as many bits of the shift count as it actually needs. For instance, when shifting a 32-bit word, only 5 bits are needed to represent a shift count of 0 ... 31, so the hardware may ignore the higher-order bits, and it does on x86 machines (except for the 8086). So that compiler implementations could just use the instruction without generating extra code to check whether the shift count is too big, the authors of the C Standard (many of whom represented compiler vendors) ruled that the result of shifting by larger amounts is undefined.
Your first shift is performed at run time and it encounters this situation: only the low-order 5 bits of b are considered by your machine, and for 32 (binary 100000) they are 0, so no shift happens. Your second shift is done at compile time, and the compiler calculates the value differently and actually does the 32-bit shift.
If you want to shift by an amount that may be larger than the number of bits in the thing you're shifting, you need to check the range of the value yourself. One possible way to do that is
#define LEFT_SHIFT(a, b) ((b) >= CHAR_BIT * sizeof(a)? 0 : (a) << (b))
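In C++, the same range check can be written as a function template instead of a macro, which evaluates its arguments only once and has a real type. A sketch under the assumption that only unsigned types are shifted (safe_shl is a hypothetical name; the example values assume a 32-bit unsigned int):

```cpp
#include <climits>
#include <type_traits>

// Hypothetical helper: left shift that yields 0 instead of undefined behaviour
// when the count reaches the width of T.
template <typename T>
constexpr T safe_shl(T value, unsigned count) {
    static_assert(std::is_unsigned<T>::value, "safe_shl expects an unsigned type");
    // Small types such as unsigned char are promoted to int before the shift;
    // the static_cast truncates the result back to T.
    return count >= CHAR_BIT * sizeof(T) ? T{0} : static_cast<T>(value << count);
}

static_assert(safe_shl(0xFFFFFFFFu, 32) == 0u, "a shift by the full width yields 0");
static_assert(safe_shl(1u, 31) == 0x80000000u, "in-range shifts behave normally");
```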
The C++ standard says:
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
GCC has no options to handle shifts by negative amounts, or by amounts outside the width of the type, predictably or to trap on them; they are always treated as undefined (runtime checkers such as GCC's -fsanitize=shift can, however, report them).
So behavior is not defined.

What does this function do?

I am reading a program which contains the following function:
int f(int n) {
    int c;
    for (c=0;n!=0;++c)
        n=n&(n-1);
    return c;
}
I don't quite understand what this function is intended to do.
It counts the number of 1 bits in the binary representation of n.
The function is INTENDED to return the number of set bits in the representation of n. What is missed in the other answers is that the function invokes undefined behaviour for arguments n < 0. This is because the function peels the number away one bit at a time, from the lowest bit to the highest. For a negative number this means that the last value of n before the loop terminates (for 32-bit integers in two's complement) is 0x80000000. This number is INT_MIN, and it is now used in the loop for the last time:
n = n&(n-1)
Unfortunately, INT_MIN - 1 is an overflow, and signed overflow invokes undefined behavior. A conforming implementation is not required to "wrap around" integers; it may, for example, issue an overflow trap instead or produce all kinds of weird results.
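One way to keep the loop but avoid the overflow is to take an unsigned int: unsigned arithmetic wraps modulo 2^N, so n - 1 is always defined, and a negative int passed in is converted to unsigned first. A sketch (it needs C++14 for the loop in a constexpr function; count_set_bits is a hypothetical name):

```cpp
#include <limits>

// Same clear-the-lowest-set-bit loop as f, but on unsigned int.
constexpr int count_set_bits(unsigned int n) {
    int c = 0;
    while (n != 0) {
        n &= n - 1;  // clear the lowest set bit; never overflows for unsigned
        ++c;
    }
    return c;
}

static_assert(count_set_bits(0u) == 0, "");
static_assert(count_set_bits(5u) == 2, "0101 has two set bits");
// A negative int such as -1 converts to UINT_MAX, which has every value bit set.
static_assert(count_set_bits(static_cast<unsigned int>(-1)) ==
              std::numeric_limits<unsigned int>::digits, "");
```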
It is a (now obsolete) workaround for the lack of a POPCNT instruction in non-military CPUs.
This counts the number of iterations it takes to reduce n to 0 by repeatedly applying a bitwise AND.
The expression n = n & (n - 1) is a bitwise operation that clears the rightmost 1 bit in n.
For example, take the integer 5 (0101). Then n & (n - 1) → (0101) & (0100) → 0100 (the first 1 bit from the right is removed).
So the above code returns the number of 1's in binary form of given integer.
It shows how not to program (for the x86 instruction set): using an intrinsic or inline assembler instruction is faster and easier to read for something this simple. (As far as I know this only holds for x86; I don't know how it is on ARM, SPARC or other architectures.)
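A sketch of the intrinsic approach. __builtin_popcount is a GCC/Clang built-in that can compile to the POPCNT instruction when the target allows it (for example with -mpopcnt on x86); for other compilers this sketch falls back to the same clear-the-lowest-bit idea, written recursively. The function names are hypothetical:

```cpp
// Portable fallback: count how many times the lowest set bit can be cleared.
constexpr int popcount_portable(unsigned int n) {
    return n == 0 ? 0 : 1 + popcount_portable(n & (n - 1));
}

// Use the compiler built-in where available, otherwise the fallback.
constexpr int popcount_fast(unsigned int n) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcount(n);
#else
    return popcount_portable(n);
#endif
}

static_assert(popcount_fast(0u) == 0, "");
static_assert(popcount_fast(0xF0u) == 4, "");
```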
Could it be that it tries to return the number of significant bits in n? (I haven't thought it through completely.)

What does ~0 mean in this code?

What's the meaning of ~0 in this code?
Can somebody analyze this code for me?
unsigned int Order(unsigned int maxPeriod = ~0) const
{
    Point r = *this;
    unsigned int n = 0;
    while( r.x_ != 0 && r.y_ != 0 )
    {
        ++n;
        r += *this;
        if ( n > maxPeriod ) break;
    }
    return n;
}
~0 is the bitwise complement of 0, which is a number with all bits set. For an unsigned 32-bit int, that's 0xffffffff. The exact number of fs will depend on the size of the type that you assign ~0 to.
It's the ones' complement, which inverts all bits.
~ 0101 => 1010
~ 0000 => 1111
~ 1111 => 0000
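Because ~ flips every bit of its (at least 16-bit) operand, reproducing the 4-bit patterns above needs a mask that keeps only the low four bits. A sketch with a hypothetical helper flip4:

```cpp
// Hypothetical helper: complement, then keep only the low 4 bits.
constexpr unsigned int flip4(unsigned int x) {
    return ~x & 0xFu;  // ~ binds tighter than &, so this is (~x) & 0xF
}

static_assert(flip4(0x5u) == 0xAu, "~0101 -> 1010");
static_assert(flip4(0x0u) == 0xFu, "~0000 -> 1111");
static_assert(flip4(0xFu) == 0x0u, "~1111 -> 0000");
// Without the mask, ~0u has every bit of unsigned int set, not just four.
static_assert(~0u > 0xFu, "");
```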
As others have mentioned, the ~ operator performs bitwise complement. However, for a signed operand, the numeric value of the result depends on how signed integers are represented, which the standard did not pin down before C++20 (C++20 requires two's complement).
In particular, on machines that do not use two's complement, ~0 need not equal -1, which is probably the value intended. Setting the default argument to
unsigned int maxPeriod = -1
would make maxPeriod contain the highest possible value (signed-to-unsigned conversion is defined modulo 2^n, where n is the number of value bits of the unsigned type).
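A quick check of both claims (the variable name is hypothetical): converting -1 to unsigned int is defined modulo 2^n and always yields UINT_MAX, and ~0u, which complements an unsigned zero, gives the same value without any signed step:

```cpp
#include <climits>

// -1 converted to unsigned int is reduced modulo 2^n, giving the largest value.
constexpr unsigned int max_period_default = -1;
static_assert(max_period_default == UINT_MAX, "-1 converts to UINT_MAX");

// ~0u complements an unsigned zero directly, so it is UINT_MAX as well.
static_assert(~0u == UINT_MAX, "every bit of unsigned int set");
```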
Also note that default arguments are not valid in C.
It's the bitwise complement operator.
Basically it means flip each bit.
It is the bitwise complement of 0 which would be, in this example, an int with all the bits set to 1. If sizeof(int) is 4, then the number is 0xffffffff.
Basically, it's saying that maxPeriod has a default value of UINT_MAX. Rather than writing it as UINT_MAX, the author used his knowledge of complements to calculate the value.
If you want to make the code a bit more readable in the future, include
#include <climits>
and change the declaration to read
unsigned int Order(unsigned int maxPeriod = UINT_MAX) const
Now to explain why ~0 is UINT_MAX. We are dealing with an unsigned int, in which 0 is represented with all zero bits (00000000). Adding one gives (00000001), adding one more gives (00000010), and one more gives (00000011). Finally, one more addition gives (00000100) because the 1s carry.
For unsigned ints, if you repeat the process ad infinitum, eventually you have all one bits (11111111), and adding another one wraps around, setting all the bits back to zero. This means that all one bits in an unsigned number is the maximum value that the type (unsigned int in your case) can hold.
The ~ operation flips every bit from 0 to 1 or from 1 to 0, so flipping a zero integer (which has all zero bits) effectively gives you UINT_MAX once the result is stored in an unsigned int. So the author of the code basically opted to compute UINT_MAX instead of using the definition in <limits.h>.
In the example it is probably an attempt to generate the UINT_MAX value. The technique is possibly flawed for reasons already stated.
The expression does, however, have a legitimate use: generating a bit mask with all bits set from a literal constant, independent of the type's width; but that is not how it is being used in your example.
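A sketch of the width-independent mask idea (all_ones is a hypothetical helper; the last two assertions assume a two's complement int and a 32-bit unsigned int). Complementing a zero of the target type gives exactly that type's width, while ~0u is only as wide as unsigned int:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: an all-ones value of exactly the width of the unsigned type T.
template <typename T>
constexpr T all_ones() {
    return static_cast<T>(~T{0});  // small types are promoted to int first; the cast trims back to T
}

static_assert(all_ones<std::uint8_t>() == 0xFF, "");
static_assert(all_ones<std::uint64_t>() == std::numeric_limits<std::uint64_t>::max(), "");

// ~0 is the int -1 on two's complement machines, and -1 converts to all ones in any unsigned type...
static_assert(static_cast<std::uint64_t>(~0) == std::numeric_limits<std::uint64_t>::max(), "");
// ...whereas ~0u is an unsigned int, so widening it leaves the upper 32 bits clear.
static_assert(static_cast<std::uint64_t>(~0u) == 0xFFFFFFFFull, "");
```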
As others have said, ~ is the bitwise complement operator (sometimes also referred to as bitwise not). It's a unary operator which means that it takes a single input.
Bitwise operators treat the input as a bit pattern and perform their respective operations on each individual bit then return the resulting pattern. Applying the ~ operator to the bit pattern will negate each bit (each zero becomes a one, each one becomes a zero).
In the example you gave, the bit representation of the integer 0 is all zeros, so ~0 produces a bit pattern of all ones. Strictly speaking, though, it is a value, not a bit pattern, that is assigned to maxPeriod: ~0 is an int, which is -1 on two's complement machines, and converting -1 to unsigned int yields UINT_MAX. That value is represented by all one bits, and it is in fact the highest value that an unsigned int can store before wrapping around back to 0.