Absolute value abs(x) using bitwise operators and Boolean logic [duplicate] - c++

How does this work?
The idea is to make abs(x) use bitwise operators for integers (assuming 32 bit words):
y = x >> 31
(x + y) ^ y // This gives abs(x) (is ^ XOR)?

Assuming 32-bit words, as stated in the question:
For negative x, x >> 31 is implementation-defined in the C and C++ standards. The author of the code expects two’s complement integers and an arithmetic right-shift, in which x >> 31 produces all zero bits if the sign bit of x is zero and all one bits if the sign bit is one.
Thus, if x is positive or zero, y is zero, and x + y is x, so (x + y) ^ y is x, which is the absolute value of x.
If x is negative, y is all ones, which represents −1 in two’s complement. Then x + y is x - 1. Then XORing with all ones inverts all the bits. Inverting all the bits is equivalent to taking the two’s complement and subtracting one, and two’s complement is the method used to negate integers in two’s complement format. In other words, XORing q with all ones gives -q - 1. So x - 1 XORed with all ones produces -(x - 1) - 1 = -x + 1 - 1 = -x, which is the absolute value of x except when x is the minimum possible value for the format (−2,147,483,648 for 32-bit two’s complement), in which case the absolute value (2,147,483,648) is too large to represent, and the resulting bit pattern is just the original x.
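As a rough sketch of the same idea without the implementation-defined shift, the version below does all of the arithmetic on unsigned values, where wraparound is well defined. The name abs_u32 and the unsigned return type are assumptions made for illustration; they are not part of the question's code.

```cpp
#include <cstdint>

// Hypothetical helper: absolute value of a 32-bit integer, returned as
// unsigned so that the result for INT32_MIN (2,147,483,648) is representable.
std::uint32_t abs_u32(std::int32_t x) {
    // Converting to unsigned is well defined (modulo 2^32) and yields the
    // two's complement bit pattern of x.
    const std::uint32_t ux = static_cast<std::uint32_t>(x);
    // ux >> 31 is 1 if the sign bit is set, else 0. Subtracting it from 0
    // gives all ones or all zeros, like an arithmetic x >> 31 would.
    const std::uint32_t y = static_cast<std::uint32_t>(0u - (ux >> 31));
    // Same formula as in the question, evaluated modulo 2^32.
    return static_cast<std::uint32_t>((ux + y) ^ y);
}
```

For example, abs_u32(-3) gives 3 and abs_u32(INT32_MIN) gives 2147483648, which only fits because the return type is unsigned.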

This approach relies on several implementation-specific behaviors:
It assumes that x is 32 bits wide. You could fix this by using x >> (sizeof(x) * CHAR_BIT - 1) instead.
It assumes that the machine uses two's complement representation.
It assumes that the right-shift operator copies the sign bit from left to right.
Example with 3 bits:
101 -> x = -3
111 -> x >> 2
101 + 111 = 100 -> x + y
100 XOR 111 -> 011 -> 3
This is not portable.
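The 3-bit example can be replayed in code by emulating a 3-bit register with masks. This is only a sketch: abs3 is a hypothetical name, and the arithmetic shift is imitated with a conditional rather than a real >> on a signed value.

```cpp
// Hypothetical helper: applies (x + y) ^ y to a 3-bit two's complement
// pattern stored in the low bits of an unsigned value.
unsigned abs3(unsigned x) {
    x &= 7u;                                // keep only 3 bits
    const unsigned y = (x & 4u) ? 7u : 0u;  // what an arithmetic x >> 2 would give
    return ((x + y) & 7u) ^ y;              // add modulo 8, then XOR with y
}
```

With x = 5 (binary 101, i.e. -3) this returns 3, matching the walk-through. With x = 4 (binary 100, i.e. -4) it returns 4 again, because +4 does not fit in 3 signed bits.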

This isn't portable, but I'll explain why it works anyway.
The first operation exploits a trait of 2's complement negative numbers: the first bit is 1 if the number is negative, and 0 if it is positive or zero. This is because of how the numbers are laid out over their range, as listed below.
The example below is for 8 bits, but can be extrapolated to any number of bits. In your case it's 32 bits (but 8 bits displays the range more easily).
10000000 (smallest negative number)
10000001 (next to smallest)
...
11111111 (negative one)
00000000 (zero)
00000001 (one)
...
01111110 (next to largest)
01111111 (largest)
The reason for using 2's complement encoding comes from the property that adding any negative number to its positive counterpart yields zero.
Now, to create the negative of a 2's complement number, you would need to
Take the inverse (bitwise not) of the input number.
Add one to it.
The reason the 1 is added to it is to force the feature of the addition zeroing the register. You see, if it was just x + ~(x), then you would get a register of all 1's. By adding one to it, you get a cascading carry which yields a register of zeros (with a 1 in the carry out of the register).
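The "invert, then add one" rule can be checked on an 8-bit register with a small sketch like the one below. The helper names add8 and negate8 are made up for this illustration.

```cpp
#include <cstdint>

// 8-bit addition that discards the carry out of the register.
std::uint8_t add8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}

// Two's complement negation: invert the bits, then add one.
std::uint8_t negate8(std::uint8_t x) {
    return add8(static_cast<std::uint8_t>(~x), 1);
}
```

Adding x to its bitwise inverse with add8 gives 0xFF (all ones), while add8(x, negate8(x)) gives 0 because the final carry falls off the end of the register.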
This understanding is important to know "why" the algorithm you provided (mostly) works.
y = x >> 31 // this line acts like an "if" statement.
// Depending on whether the shift is arithmetic (x is signed) or
// logical (x is unsigned), when x is negative it will fill y
// with 0xFFFFFFFF or 1. The rest of the algorithm only works
// with 0xFFFFFFFF, as shown below.
// When x is positive, the result is zero.
We will explore the case where x is positive first:
(x + y) ^ y // for positive x, first we substitute y = 0
(x + 0) ^ 0 // reduce the addition
(x) ^ 0 // remove the parentheses
x ^ 0 // which, by definition of xor, can only yield x
x
Now let's explore the case where x is negative and y is 0xFFFFFFFF (the shift was arithmetic):
(x + y) ^ y // first substitute y
(x + 0xFFFFFFFF) ^ 0xFFFFFFFF // note that 0xFFFFFFFF is the same as 2's complement -1
(x - 1) ^ 0xFFFFFFFF // add in a new variable z to hold the result
(x - 1) ^ 0xFFFFFFFF = z // take the ^ 0xFFFFFFFF of both sides
(x - 1) ^ 0xFFFFFFFF ^ 0xFFFFFFFF = z ^ 0xFFFFFFFF // reduce the left side
(x - 1) = z ^ 0xFFFFFFFF // note that not is equivalent to ^ 0xFFFFFFFF
(x - 1) = ~(z) // add one to both sides
x - 1 + 1 = ~(z) + 1 // reduce
x = ~(z) + 1 // so x is the negation of z, which means z is -x (for 2's complement numbers)
Now let's explore the case where x is negative and y is 0x00000001 (the shift was logical):
(x + y) ^ y // first substitute y
(x + 1) ^ 0x00000001 // adding 1 and then flipping the lowest bit does not negate x
// for example, take x = -3, which is 0xFFFFFFFD
(0xFFFFFFFD + 1) ^ 0x00000001 // reduce the addition
0xFFFFFFFE ^ 0x00000001 // flip the lowest bit
0xFFFFFFFF // which is -1, not 3
So with a logical shift the formula does not produce the absolute value; the trick depends on the shift being arithmetic.
Note that while the above derivations are passable as a general explanation, they don't cover important edge cases such as x = 0x80000000, which represents a negative number greater in absolute value than any positive x that could be stored in the same number of bits.
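One way to sanity-check the formula and the edge case is to apply it to raw 32-bit patterns with y passed in explicitly. The function name apply_formula is an assumption made for this sketch.

```cpp
#include <cstdint>

// The question's formula on raw 32-bit patterns, evaluated modulo 2^32.
// y is passed in explicitly so that different shift results can be tried.
std::uint32_t apply_formula(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::uint32_t>((x + y) ^ y);
}
```

With x = 0xFFFFFFFD (the pattern of -3), y = 0xFFFFFFFF gives 3, while y = 1 gives 0xFFFFFFFF, the pattern of -1. With x = 0x80000000 and y = 0xFFFFFFFF the result is 0x80000000 again.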

I use this code. First comes the calculation of the two's complement (the guard just ensures, with a compile-time check, that the template parameter is an integer type):
/**
* Two's Complement
*/
template<typename T> constexpr auto ZQ(T const& _x) noexcept ->T{
Compile::Guards::IsInteger<T>();
return ((~(_x))+1);
}
In a second step, this is used to calculate the integer abs():
/**
* if the number is negative, get the same number with a positive sign
*/
template<typename T> auto INTABS(T const _x) -> typename std::make_unsigned<T>::type{
Compile::Guards::IsInteger<T>();
return static_cast<typename std::make_unsigned<T>::type>((_x<0)?(ZQ<T>(_x)):(_x));
}
Why I use this kind of code:
* compile-time checks
* works with all integer sizes
* portable from small µC to modern cores
* It's clear that we need to consider the two's complement, so you need an unsigned return value; e.g. for 8 bits, abs(-128) = 128 cannot be expressed in a signed integer
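Since Compile::Guards::IsInteger is not shown in the answer, here is a self-contained sketch of the same idea that swaps the guard for a standard static_assert. The name int_abs is hypothetical. Unlike ZQ, it negates in the unsigned type, which avoids signed overflow for the most negative value.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical variant of INTABS using only the standard library.
template <typename T>
constexpr auto int_abs(T x) noexcept -> typename std::make_unsigned<T>::type {
    static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                  "int_abs expects a signed integer type");
    using U = typename std::make_unsigned<T>::type;
    // Negating in unsigned arithmetic is well defined; the cast back to U
    // reduces the result modulo 2^N, which matches ~x + 1.
    return x < 0 ? static_cast<U>(U{0} - static_cast<U>(x))
                 : static_cast<U>(x);
}

// Compile-time check of the 8-bit case mentioned in the list above.
static_assert(int_abs<std::int8_t>(-128) == 128, "abs(-128) fits in 8 unsigned bits");
```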

Related

Invalid solution for code challenge with operator restrictions

To answer this question, I read this source code on github and found a problem with the second function.
The challenge is to write C code with various restrictions in terms of operators and language constructions to perform given tasks.
/*
* fitsShort - return 1 if x can be represented as a
* 16-bit, two's complement integer.
* Examples: fitsShort(33000) = 0, fitsShort(-32768) = 1
* Legal ops: ! ~ & ^ | + << >>
* Max ops: 8
* Rating: 1
*/
int fitsShort(int x) {
/*
* after left shift 16 and right shift 16, the left 16 of x is 00000..00 or 111...1111
* so after shift, if x remains the same, then it means that x can be represent as 16-bit
*/
return !(((x << 16) >> 16) ^ x);
}
Left-shifting a negative value, or a value whose shifted result is beyond the range of int, has undefined behavior, and right-shifting a negative value is implementation-defined, so the above solution is incorrect (although it is probably the expected solution).
Is there a solution to this problem that only assumes 32-bit two's complement representation?
The following only assumes 2's complement with at least 16 bits:
int mask = ~0x7FFF;
return !(x&mask)|!(~x&mask);
That uses a 15-bit constant; if that is too big, you can construct it from three smaller constants, but that will push it over the 8-operator limit.
An equivalent way of writing that is:
int m = 0x7FFF;
return !(x&~m)|!~(x|m);
But it's still 7 operations, so int m = (0x7F<<8)|0xFF; would still push it to 9. (I only added it because I don't think I've ever before found a use for !~.)
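Put together as a complete function, the first variant from this answer might look like the sketch below. It keeps the answer's assumption of a two's complement int with at least 16 bits.

```cpp
// Returns 1 if x can be represented as a 16-bit two's complement integer.
// Uses no shifts, so nothing here is undefined or implementation-defined
// beyond the two's complement assumption.
int fitsShort(int x) {
    int mask = ~0x7FFF;                 // ones in bit 15 and above
    return !(x & mask) | !(~x & mask);  // upper bits are all 0, or all 1
}
```

The examples from the challenge hold: fitsShort(33000) is 0 and fitsShort(-32768) is 1.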

Using 1's complement to generate a mask that shows the first non-zero bit

I found an interesting property about 1's complement when reading an interview preparation book.
The property says given a number X, we can generate a mask that shows the first set bit (from right to left) using the 1's complement as follows:
X & ~(X - 1) where ~ stands for 1's complement.
For example, if X = 0b0011 then
0b0011 & 0b1101 = 0b0001
I understood that the author is doing the X-1 to flip the first non-zero bit from the right. But I'm curious how he came up with the idea that taking the 1's complement of X-1 and &ing it with X would result in a bit-mask that shows the first non-zero bit in X.
It's my first time posting at StackOverflow, so my apologies if this question doesn't belong here.
First, notice that for any X, X & (~X) = 0 and X & X = X.
Let X = b_n b_(n-1) ... b_k ... b_1, where b_k is the first set bit.
Thus, X is essentially this:
b_n b_(n-1) ... b_(k+1) 1 0 0 ... 0
---- k ----
X-1 is:
b_n b_(n-1) ... b_(k+1) 0 1 1 ... 1
---- k ----
~(X-1) is:
~b_n ~b_(n-1) ... ~b_(k+1) 1 0 0 ... 0
---- k ----
X & ~(X-1) is:
0 0 .................... 0 1 0 0 ... 0
---- k ----
This can actually be proved using some math. Let x be a positive integer. Both x and x - 1 have binary representations, and the bit in the 1s place of x always differs from that of x - 1. Let us define ~ as the ones' complement operator: for any binary number b, ~b turns all of the 0s in b into 1s, and all of the 1s in b into 0s. It follows that ~(x - 1) has the same bit in the 1s place as x. This is simple for odd numbers: every odd number o has a 1 in the 1s place, so ~(o - 1) does too, and that bit is the lowest set bit of o. Every higher bit is the same in o and o - 1, so it is inverted in ~(o - 1) and vanishes under &.
For even numbers this gets a bit trickier. For every even number e, the 1s bit must be 0. Since x (and therefore e) is greater than 0, some bit of e has the value 1. Because e - 1 is odd, its 1s bit is 1. Subtracting 1 borrows through the trailing zeros, so the first bit with a value of 1 in e is 0 in e - 1, every bit below it is 1 in e - 1, and every bit above it is unchanged. Taking the ones' complement of e - 1 therefore turns that bit back into 1, turns the bits below it into 0, and inverts the bits above it. Under the & operator, that bit is the only 1 bit that e and ~(e - 1) have in common.
This trick is probably better known written as
X & -X
which is by definition (of -) equivalent, and using the following interpretation of - it becomes very simple to understand:
In string notation for a number that isn't zero, -(a10^k) = (~a)10^k
If you're unfamiliar with the notation, a10^k just means "some string of bits 'a' followed by a 1 followed by k zeroes".
This interpretation just says that negation keeps all the trailing zeroes and the lowest 1, but inverts all higher bits. You can see that it does that from the definition of negation as well, for example if you look at ~X + 1, you see that the +1 cancels out the inversion for the trailing zeroes (which become ones which the +1 carries through) and the lowest set bit (which becomes 0 and then the carry through the trailing zeroes is captured by it).
Anyway, using that interpretation of negation, obviously the top part is removed, the lowest set bit is kept, and the trailing zeroes are just going to stay.
In general, the string notation is very helpful when coming up with these tricks. For example if you know that negation looks like that in string notation, this trick is really quite obvious, and so are some related tricks which you can then also find:
x & x - 1 resets the lowest set bit
x | -x keeps the trailing zeroes but sets all higher bits
x ^ -x keeps the trailing zeroes, resets the lowest set bit, but sets all higher bits
.. and more variants.
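The tricks listed above can be tried out with a few one-line helpers. This is a sketch on unsigned 32-bit values; the function names are invented here, and -x is written as 0u - x so that the negation stays in unsigned arithmetic.

```cpp
#include <cstdint>

// x & -x (equivalently x & ~(x - 1)): isolates the lowest set bit.
std::uint32_t lowest_set_bit(std::uint32_t x) { return x & (0u - x); }

// x & (x - 1): resets the lowest set bit.
std::uint32_t clear_lowest_set_bit(std::uint32_t x) { return x & (x - 1u); }

// x | -x: keeps the trailing zeroes, sets the lowest set bit and all higher bits.
std::uint32_t ones_from_lowest_set_bit(std::uint32_t x) { return x | (0u - x); }

// x ^ -x: keeps the trailing zeroes, resets the lowest set bit, sets all higher bits.
std::uint32_t ones_above_lowest_set_bit(std::uint32_t x) { return x ^ (0u - x); }
```

For x = 12 (binary 1100) these give 4, 8, 0xFFFFFFFC and 0xFFFFFFF8 respectively.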

Finding the next power of 2 for a negative number

Hey guys, I was doing research on calculating the next power of two and stumbled on some code that looks like this:
int x;//assume that x is already initialized with a value
--x;
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;
return x+1;
It works fine when I run it with positive numbers, but it doesn't work with negative numbers, which I don't understand, because I think it should not matter whether a number is positive or negative when finding the next power of two for it. Say we have the number 5 and we want to find the next power of 2. Intuitively we know that it's 8, because 8 is bigger than 5 and 8 is the same as 2^3. If I try using a negative number, I always get 0, which I don't understand because 0 is not a power of two.
The short answer is because the C++ standard states that the value resulting from the >> operator on negative values is implementation defined, whereas on positive values it has a result of dividing by a power of 2.
The term "implementation defined", over-simplistically, means that the standard permits the result to vary between implementations i.e. between compilers. Among other things, that means no guarantee that it will behave in the same way as for positive values (for which the behaviour is unambiguously specified).
The reason is that the representation of a signed int is also implementation-defined. This allows, for example, the usage of twos-complement representation - which (although other representations are used sometimes) is quite commonly used in practice.
Mathematically, a right shift in twos-complement is equivalent to division by a power of two with rounding down toward -infinity, not toward zero. For a positive value, rounding toward zero and toward -infinity have the same effect (both zero and -infinity are less than any positive integral value). For a negative value they do not (rounding is away from zero, not toward zero).
Peter gave you an interpretation of the code based on towards which value you round. Here's another, more bitwise, one:
The successive shifts in this code are "filling in" with 1s all bit positions lower than the highest one set at the start. Let us look at that more in detail:
Let x=0b0001 ???? ???? ???? be the binary representation of a 16 bit number, where ? may be 0 or 1.
x |= x >> 1; // x = 0b0001 ???? ???? ???? | 0b0000 1??? ???? ???? = 0b0001 1??? ???? ????
x |= x >> 2; // x = 0b0001 1??? ???? ???? | 0b0000 011? ???? ???? = 0b0001 111? ???? ????
x |= x >> 4; // x = 0b0001 111? ???? ???? | 0b0000 0001 111? ???? = 0b0001 1111 111? ????
x |= x >> 8; // x = 0b0001 1111 111? ???? | 0b0000 0000 0001 1111 = 0b0001 1111 1111 1111
Hence the shifts give you a number of the form 0b00...011...1, that is, only 0s followed by only 1s, which means a number of the form 2^n-1. That is why you add 1 at the end, to get a power of 2. To get the correct result, you also need to subtract 1 at the start, so that a value that is already a power of 2 maps to itself rather than to the next power of 2.
Now for negative numbers, the C++ standard does not define what the most significant bits should be when right-shifting. But whether they are 1 or 0 is irrelevant in this specific case, as long as your representation of negative numbers uses a 1 in its most significant bit position, i.e. for almost all of them.*
Because you always OR x with itself, all those left-most bits (where the shifts differ) are going to be ORed with 1s. At the end of the algorithm, you will return 0b11111...1 + 1, which in your case means 0 (because you use 2s complement; the result would be 1 in 1s complement and -2^(number of bits - 1) + 2 in sign-magnitude).
* This holds true for the main negative numbers representations, from most to least popular: that is, 2s complement, sign-magnitude, and 1s complement. An example where this is not true is excess-K representation, which is used for IEEE floating point exponents.
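To keep the shifts fully defined, the same routine can be written for an unsigned 32-bit value. This is a sketch, and next_pow2 is a name chosen here; it also shows where the 0 in the question comes from.

```cpp
#include <cstdint>

// Rounds x up to the next power of two (a power of two maps to itself).
// All shifts are on an unsigned value, so they are fully defined.
std::uint32_t next_pow2(std::uint32_t x) {
    --x;           // so that exact powers of two are not doubled
    x |= x >> 1;   // smear the highest set bit into every lower position
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1u; // 0b0...01...1 + 1 is a power of two (or wraps to 0)
}
```

next_pow2(5) is 8 and next_pow2(8) is 8. Passing the bit pattern of a negative number, such as static_cast<std::uint32_t>(-5), smears the top bit everywhere, so the final addition wraps around to 0.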

Binary coded decimal addition using integer

If I have two numbers in packed BCD format and want to add them, is it a good approach to add them like this: convert both numbers to integers, perform a normal integer addition, then convert the result back to BCD?
The C99 code below adds packed BCD operands with eight BCD digits stored in a uint32_t. This code can easily be extended to wider BCD operands by choosing uint64_t to process 16 BCD digits. Since this approach relies on bit-parallel processing it may not be efficient for narrow packed BCD operands.
In a packed BCD format, each BCD digit occupies one nibble (4-bit group) of an unsigned integer operand. If nibble-wise addition results in a sum > 9, we want a carry into the next higher nibble. If we use regular integer addition to add two packed BCD operands, the desired nibble carries will not occur when the nibble sum is > 9, but < 16. To remedy this, we can add an additional 6 to each nibble sum.
We can find the nibble carries as follows: The bit-wise sum of two integers x, y is x ^ y. At any bit position that has a carry-in from the next lower bit position during regular integer addition, the bits in x ^ y and x + y will differ. So we can find bits with carry-in as (x ^ y) ^ (x + y). We are interested in bits 4, 8, ..., 32 for the carry-in, which are the carry-outs from bits 3, 7, ..., 31.
There is a slight problem if there is a carry-out from bit 31 to bit 32 since the uint32_t operands only hold 32 bits. We can detect this if we find that the sum of two unsigned integers is smaller than either of the addends. The three operations handling the carry-out from bit 31 can be omitted when operating on seven-digit operands instead of eight-digit operands.
/* Add two packed BCD operands, where each uint32_t holds 8 BCD digits */
uint32_t bcd_add (uint32_t x, uint32_t y)
{
uint32_t t0, t1;
t0 = x + 0x66666666; // force nibble carry when BCD digit > 9
t1 = x ^ y; // bit-wise sum
t0 = t0 + y; // addition with nibble carry
t1 = t1 ^ t0; // (x ^ y) ^ (x + y)
t0 = t0 < y; // capture carry-out from bit 31
t1 = (t1 >> 1) | (t0 << 31); // nibble carry-outs in bits 3, 7, ..., 31
t0 = t1 & 0x88888888; // extract nibble carry-outs
t1 = t0 >> 2; // 8 - (8 >> 2) = 6
return x + y + (t0 - t1); // add 6 to any digit with nibble carry-out
}
Knuth, TAOCP Vol.4A Part 1, offers a superior solution (requiring fewer operations) in the answer to exercise 100 from section 7.1.3. This variant is particularly well suited to processor architectures with an instruction that can evaluate any logical function of three arguments, such as the LOP3 instruction of modern NVIDIA GPUs.
uint32_t median (uint32_t x, uint32_t y, uint32_t z)
{
return (x & (y | z)) | (y & z);
}
uint32_t bcd_add_knuth (uint32_t x, uint32_t y)
{
uint32_t z, u, t;
z = y + 0x66666666;
u = x + z;
t = median (~x, ~z, u) & 0x88888888;
return u - t + (t >> 2);
}
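For comparison, the convert, add, convert back approach from the question could be sketched as follows. The function names are invented for this example, and the sketch assumes valid 8-digit packed BCD inputs; a carry out of the eighth digit is dropped.

```cpp
#include <cstdint>

// Packed BCD (8 digits, one per nibble) to ordinary binary integer.
std::uint32_t bcd_to_binary(std::uint32_t bcd) {
    std::uint32_t value = 0, scale = 1;
    for (int i = 0; i < 8; ++i) {
        value += ((bcd >> (4 * i)) & 0xFu) * scale;  // digit i times 10^i
        scale *= 10u;
    }
    return value;
}

// Ordinary binary integer to packed BCD; digits beyond the eighth are dropped.
std::uint32_t binary_to_bcd(std::uint32_t value) {
    std::uint32_t bcd = 0;
    for (int i = 0; i < 8; ++i) {
        bcd |= (value % 10u) << (4 * i);
        value /= 10u;
    }
    return bcd;
}

// Convert both operands, add as integers, convert the sum back.
std::uint32_t bcd_add_via_binary(std::uint32_t x, std::uint32_t y) {
    return binary_to_bcd(bcd_to_binary(x) + bcd_to_binary(y));
}
```

This is easier to read than the bit-parallel versions, but it needs sixteen multiplications and eight divisions per addition, so it can also serve as a reference implementation for testing them.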

Change a bit of an integer [duplicate]

We have an integer number
int x = 50;
in binary, it's
00110010
How can I change the fourth (4th) bit programmatically?
You can set the fourth bit of a number by OR-ing it with a value that is zero everywhere except in the fourth bit. This could be done as
x |= (1u << 3);
Similarly, you can clear the fourth bit by AND-ing it with a value that is one everywhere except in the fourth bit. For example:
x &= ~(1u << 3);
Finally, you can toggle the fourth bit by XOR-ing it with a value that is zero everywhere except in the fourth bit:
x ^= (1u << 3);
To see why this works, we need to look at two things:
What is the behavior of the << operator in this context?
What is the behavior of the AND, OR, and XOR operators here?
In all three of the above code snippets, we used the << operator to generate a value. The << operator is the bitwise shift-left operator, which takes a value and then shifts all of its bits some number of steps to the left. In your case, I used
1u << 3
to take the value 1 (which has binary representation 1) and to then shift all its bits over three spots, filling in the missing values with 0. This creates the binary value 1000, which has a bit set in the fourth bit.
Now, why does
x |= (1u << 3);
set the fourth bit of the number? This has to do with how the OR operator works. The |= operator is like += or *= except for bitwise OR - it's equivalent to
x = x | (1u << 3);
So why does OR-ing x with the binary value 1000 set its fourth bit? This has to do with the way that OR is defined:
0 | 0 == 0
0 | 1 == 1
1 | 0 == 1
1 | 1 == 1
More importantly, though, we can rewrite this more compactly as
x | 0 == x
x | 1 == 1
This is an extremely important fact, because it means that OR-ing any bit with zero doesn't change the bit's value, while OR-ing any bit with 1 always sets that bit to one. This means that when we write
x |= (1u << 3);
since (1u << 3) is a value that is zero everywhere except in the fourth bit, the bitwise OR leaves all the bits of x unchanged except for the fourth bit, which is then set to one. More generally, OR-ing a number with a value that is a series of zeros and ones will preserve all the values where the bits are zero and set all of the values where the bits are one.
Now, let's look at
x &= ~(1u << 3);
This uses the bitwise complement operator ~, which takes a number and flips all of its bits. If we assume that integers are two bytes (just for simplicity), this means that the actual encoding of (1u << 3) is
0000000000001000
When we take the complement of this, we get the number
1111111111110111
Now, let's see what happens when we bitwise AND two values together. The AND operator has this interesting truth table:
0 & 0 == 0
0 & 1 == 0
1 & 0 == 0
1 & 1 == 1
Or, more compactly:
x & 0 == 0
x & 1 == x
Notice that this means that if we AND two numbers together, the resulting value will be such that all of the bits AND-ed with zero are set to zero, while all other bits are preserved. This means that if we AND with
~(1u << 3)
we are AND-ing with
1111111111110111
So by our above table, this means "keep all of the bits, except for the fourth bit, as-is, and then change the fourth bit to be zero."
More generally, if you want to clear a set of bits, create a number that is one everywhere you want to keep the bits unchanged and zero where you want to clear the bits.
Finally, let's see why
x ^= (1u << 3)
flips the fourth bit of the number. This is because the binary XOR operator has this truth table:
0 ^ 0 == 0
0 ^ 1 == 1
1 ^ 0 == 1
1 ^ 1 == 0
Notice that
x ^ 0 == x
x ^ 1 == ~x
Where ~x is the opposite of x; it's 0 for 1 and 1 for 0. This means that if we XOR x with the value (1u << 3), we're XOR-ing it with
0000000000001000
So this means "keep all the bits but the fourth bit as is, but flip the fourth bit." More generally, if you want to flip some number of bits, XOR the value with a number that has zero where you want to keep the bits intact and one where you want to flip the bits.
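The three operations can be wrapped in small helpers to try them on the value from the question. This is a sketch; the function names are made up, and n counts bit positions from 0, so the fourth bit is n = 3.

```cpp
// OR with a single-bit mask sets bit n.
unsigned set_bit(unsigned x, unsigned n) { return x | (1u << n); }

// AND with the complement of the mask clears bit n.
unsigned clear_bit(unsigned x, unsigned n) { return x & ~(1u << n); }

// XOR with the mask flips bit n.
unsigned toggle_bit(unsigned x, unsigned n) { return x ^ (1u << n); }
```

With x = 50 (binary 00110010), set_bit(50, 3) and toggle_bit(50, 3) both give 58 (00111010), while clear_bit(50, 3) leaves 50 unchanged because the fourth bit is already zero.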
Hope this helps!
You can always use std::bitset which makes modifying bits easy.
Or you can use bit manipulations (assuming you mean the 4th bit counting from one; don't subtract 1 if you mean counting from 0). Note that I use 1U just to guarantee that the whole operation happens on unsigned numbers:
To set: x |= (1U << (4 - 1));
To clear: x &= ~(1U << (4 - 1));
To toggle: x ^= (1U << (4 - 1));
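For the std::bitset route mentioned at the start of this answer, a minimal sketch could look like this. The 8-bit width and the wrapper function names are assumptions for the example; set, reset, flip and to_ulong are the standard member functions, and they count positions from 0.

```cpp
#include <bitset>

// Each helper loads the value into an 8-bit bitset, changes bit 3
// (the fourth bit counting from one), and converts back.
unsigned long set_fourth_bit(unsigned long value) {
    std::bitset<8> bits(value);
    bits.set(3);
    return bits.to_ulong();
}

unsigned long clear_fourth_bit(unsigned long value) {
    std::bitset<8> bits(value);
    bits.reset(3);
    return bits.to_ulong();
}

unsigned long toggle_fourth_bit(unsigned long value) {
    std::bitset<8> bits(value);
    bits.flip(3);
    return bits.to_ulong();
}
```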
To set the fourth bit, OR with 00001000 (binary).
To clear the fourth bit, AND with 11110111 (binary).
To toggle the fourth bit, XOR with 00001000 (binary).
Examples:
00110010 OR 00001000 = 00111010
00110010 AND 11110111 = 00110010
00110010 XOR 00001000 = 00111010
Simple. Since you have (or whatever value you have)
int x = 50;
to set the 4th bit (from the right) programmatically, use
int y = x | 0x00000008;
because the 0x prefix before a number means it's in hexadecimal form.
So 0x0 = 0000 in binary, and 0x8 = 1000 in binary.
That explains the answer.
Try one of these functions in C to change the nth bit (the three chang_n_bit definitions are alternatives, so keep only one of them):
char bitfield;
// start at 0th position
void chang_n_bit(int n, int value)
{
bitfield = (bitfield | (1 << n)) & (~( (1 << n) ^ (value << n) ));
}
void chang_n_bit(int n, int value)
{
bitfield = (bitfield | (1 << n)) & ((value << n) | ((~0) ^ (1 << n)));
}
void chang_n_bit(int n, int value)
{
if(value)
bitfield |= 1 << n;
else
bitfield &= ~0 ^ (1 << n);
}
char print_n_bit(int n)
{
return (bitfield & (1 << n)) ? 1 : 0;
}
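A variant without the global variable might look like the sketch below. The names write_bit and read_bit are hypothetical; the field is passed in and returned, and the arithmetic is done on unsigned values.

```cpp
#include <cassert>
#include <cstdint>

// Returns field with bit n (counted from 0) forced to the given value.
std::uint8_t write_bit(std::uint8_t field, unsigned n, bool value) {
    assert(n < 8);  // only bit positions 0..7 exist in an 8-bit field
    const unsigned mask = 1u << n;
    const unsigned cleared = field & ~mask;  // clear the bit first
    return static_cast<std::uint8_t>(cleared | (value ? mask : 0u));
}

// Returns true if bit n (counted from 0) of field is set.
bool read_bit(std::uint8_t field, unsigned n) {
    assert(n < 8);
    return ((field >> n) & 1u) != 0;
}
```

For example, write_bit(0x32, 3, true) gives 0x3A, and write_bit(0x3A, 3, false) gives 0x32 again.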
You can use binary OR and AND to set or clear the fourth bit.
To set the fourth bit of x, you would use x |= 1<<3;, with 1<<3 being a left shift of 0b0001 by three bits, producing 0b1000.
To clear the fourth bit of x, you would use x &= ~(1<<3);, a binary AND between 0b00110010 (x) and (effectively) 0b11110111, which keeps every bit of x that is not in position four and clears the one that is.