Bitwise Xor optimizing and/or/not use - bit-manipulation

I was looking at the code for two different MD5 implementations and I saw F (a bitwise ternary operation) implemented in two different ways:
in C:
#define f1(x, y, z) (((x) & (y)) | (~(x) & (z)))
#define f2(x, y, z) ((z) ^ ((x) & ((z) ^ (y))))
In Pseudo:
f1 = (x And y) Or ((Not x) And z)
f2 = z Xor (x And (z Xor y))
What I can't wrap my brain around is how someone came up with f2 in the first place. I could have come up with f1 on my own, because it's the logical code to write when you hear "if x then y else z" - but I couldn't have come up with f2.
To be clear - I understand what f2 is doing and how Xor works - I just can't understand how someone got from f1 to f2. How did they know that using xor in that way was equivalent?
I can't just use something because it works - I want to understand why it works.
Can someone "explain the math", so to speak?
Are there specific rules where you can use Xor to optimize And/Or/Not?

Related

A bitwise shortcut for calculating the signed result of `(x - y) / z`, given unsigned operands

I'm looking for a neat way (most likely, a "bitwise shortcut") for calculating the signed value of the expression (x - y) / z, given unsigned operands x, y and z.
Here is some "kinda real, kinda pseudo" code illustrating what I am currently doing (please don't mind the syntax not being 100% perfect C or C++):
int64 func(uint64 x, uint64 y, uint64 z)
{
    if (x >= y) {
        uint64 result = (x - y) / z;
        if (int64(result) >= 0)
            return int64(result);
    }
    else {
        uint64 result = (y - x) / z;
        if (int64(result) >= 0)
            return -int64(result);
    }
    throwSomeError();
}
Please assume that I don't have a larger type at hand.
I'd be happy to read any idea of how to make this simpler/shorter/neater.
There is a shortcut, by using a bitwise trick for conditional-negation twice (once for the absolute difference, and then again to restore the sign).
I'll use some similar non-perfect C-ish syntax I guess, to match the question.
First get a mask that has all bits set iff x < y:
uint64 m = -uint64(x < y);
(x - y) and -(y - x) are actually the same value, even in unsigned (wrap-around) arithmetic, and conditional negation can be done using the definition of two's complement: -a = ~(a - 1) = ((a + (-1)) ^ -1). (a + 0) ^ 0 is of course just a again, so when m is -1 (all bits set), (a + m) ^ m = -a, and when m is zero, it is a. So it's a conditional negation.
uint64 absdiff = (x - y + m) ^ m;
Then divide as usual, and restore the sign by doing another conditional negation:
return int64((absdiff / z + m) ^ m);

Making recursive function in OCaml

I want to make a recursive function that sums the integers between two values. I'm doing:
let rec sum_between x y =
  if x > y then sum_between y x
  else if x = y then x
  else x + sum_between x+1 y ;;
But I get the error: This expression has type int -> int
but an expression was expected of type int
What am I doing wrong?
Function application has high precedence in OCaml. You need to parenthesize an expression when it's an argument to a function.
Your code
sum_between x+1 y
is parsed like this:
(sum_between x) + (1 y)
You need parentheses:
sum_between (x + 1) y
(Same answer as Edgar Aroutiounian but more helpful detail I hope.)

Understanding XOR (Exclusive Or) in code [duplicate]

Can someone explain to me how XOR swapping of two variables with no temp variable works?
void xorSwap (int *x, int *y)
{
    if (x != y) {
        *x ^= *y;
        *y ^= *x;
        *x ^= *y;
    }
}
I understand WHAT it does, but can someone walk me through the logic of how it works?
You can see how it works by doing the substitution:
x1 = x0 xor y0
y2 = x1 xor y0
x2 = x1 xor y2
Substituting,
x1 = x0 xor y0
y2 = (x0 xor y0) xor y0
x2 = (x0 xor y0) xor ((x0 xor y0) xor y0)
Because xor is associative and commutative:
y2 = x0 xor (y0 xor y0)
x2 = (x0 xor x0) xor (y0 xor y0) xor y0
Since x xor x == 0 for any x,
y2 = x0 xor 0
x2 = 0 xor 0 xor y0
And since x xor 0 == x for any x,
y2 = x0
x2 = y0
And the swap is done.
Other people have explained it, now I want to explain why it was a good idea, but now isn't.
Back in the day when we had simple single-cycle or multi-cycle CPUs, it was cheaper to use this trick to avoid costly memory dereferences or spilling registers to the stack. However, we now have CPUs with massive pipelines; the P4's pipeline had 20 to 31 (or so) stages, where any dependence between reading and writing to a register could cause the whole thing to stall. The xor swap has some very heavy dependencies between the two variables that don't actually matter at all but stall the pipeline in practice. A stalled pipeline causes a slow code path, and if this swap's in your inner loop, you're going to be moving very slowly.
In general practice, your compiler can figure out what you really want to do when you do a swap with a temp variable and can compile it to a single XCHG instruction. Using the xor swap makes it much harder for the compiler to guess your intent and therefore much less likely to optimize it correctly. Not to mention code maintenance, etc.
I like to think of it graphically rather than numerically.
Let's say you start with x = 11 and y = 5
In binary (and I'm going to use a hypothetical 4 bit machine), here's x and y
x: |1|0|1|1| -> 8 + 2 + 1
y: |0|1|0|1| -> 4 + 1
Now to me, XOR is an invert operation and doing it twice is a mirror:
x^y: |1|1|1|0|
(x^y)^y: |1|0|1|1| <- ooh! Check it out - x came back
(x^y)^x: |0|1|0|1| <- ooh! y came back too!
Here's one that should be slightly easier to grok:
int x = 10, y = 7;
y = x + y; //x = 10, y = 17
x = y - x; //x = 7, y = 17
y = y - x; //x = 7, y = 10
Now, one can understand the XOR trick a little more easily by understanding that ^ can be thought of as + or -. Just as:
x + y - ((x + y) - x) == x
, so:
x ^ y ^ ((x ^ y) ^ x) == x
The reason WHY it works is because XOR doesn't lose information. You could do the same thing with ordinary addition and subtraction if you could ignore overflow. For example, if the variable pair A,B originally contains the values 1,2, you could swap them like this:
// A,B = 1,2
A = A+B // 3,2
B = A-B // 3,1
A = A-B // 2,1
BTW there's an old trick for encoding a 2-way linked list in a single "pointer".
Suppose you have a list of memory blocks at addresses A, B, and C. The first word in each block is, respectively:
// first word of each block is sum of addresses of prior and next block
0 + &B // first word of block A
&A + &C // first word of block B
&B + 0 // first word of block C
If you have access to block A, it gives you the address of B. To get to C, you take the "pointer" in B and subtract A, and so on. It works just as well backwards. To run along the list, you need to keep pointers to two consecutive blocks. Of course you would use XOR in place of addition/subtraction, so you wouldn't have to worry about overflow.
You could extend this to a "linked web" if you wanted to have some fun.
Most people would swap two variables x and y using a temporary variable, like this:
tmp = x
x = y
y = tmp
Here’s a neat programming trick to swap two values without needing a temp:
x = x xor y
y = x xor y
x = x xor y
More details in Swap two variables using XOR
On line 1 we combine x and y (using XOR) to get a "hybrid", and we store it back in x. XOR is a great way to save information, because you can remove it by XORing again.
On line 2, we XOR the hybrid with y, which cancels out all the y information, leaving us only with x. We save this result back into y, so now y holds x's original value.
On the last line, x still has the hybrid value. We XOR it yet again with y (now holding x's original value) to remove all traces of x from the hybrid. This leaves us with y's original value, and the swap is complete!
The computer actually has an implicit “temp” variable that stores intermediate results before writing them back to a register. For example, if you add 3 to a register (in machine-language pseudocode):
ADD 3 A // add 3 to register A
The ALU (Arithmetic Logic Unit) is actually what executes the instruction 3+A. It takes the inputs (3,A) and creates a result (3 + A), which the CPU then stores back into A’s original register. So, we used the ALU as temporary scratch space before we had the final answer.
We take the ALU’s implicit temporary data for granted, but it’s always there. In a similar way, the ALU can return the intermediate result of the XOR in the case of x = x xor y, at which point the CPU stores it into x’s original register.
Because we aren’t used to thinking about the poor, neglected ALU, the XOR swap seems magical because it doesn’t have an explicit temporary variable. Some machines have a 1-step exchange XCHG instruction to swap two registers.
#VonC has it right, it's a neat mathematical trick. Imagine 4 bit words and see if this helps.
word1 ^= word2;
word2 ^= word1;
word1 ^= word2;
word1 word2
0101 1111
after 1st xor
1010 1111
after 2nd xor
1010 0101
after 3rd xor
1111 0101
Basically there are 3 steps in the XOR approach:
a’ = a XOR b (1)
b’ = a’ XOR b (2)
a” = a’ XOR b’ (3)
To understand why this works, first note that:
1. XOR produces a 1 only if exactly one of its operands is 1 and the other is 0;
2. XOR is commutative, so a XOR b = b XOR a;
3. XOR is associative, so (a XOR b) XOR c = a XOR (b XOR c); and
4. a XOR a = 0 (this follows directly from the definition in 1 above).
After Step (1), the binary representation of a’ will have 1-bits only in the positions where a and b have opposing bits, that is, where either (ak=1, bk=0) or (ak=0, bk=1). Now when we do the substitution in Step (2) we get:
b’ = (a XOR b) XOR b
= a XOR (b XOR b) because XOR is associative
= a XOR 0 because of [4] above
= a due to definition of XOR (see 1 above)
Now we can substitute into Step (3):
a” = (a XOR b) XOR a
= (b XOR a) XOR a because XOR is commutative
= b XOR (a XOR a) because XOR is associative
= b XOR 0 because of [4] above
= b due to definition of XOR (see 1 above)
More detailed information here:
Necessary and Sufficient
As a side note I reinvented this wheel independently several years ago in the form of swapping integers by doing:
a = a + b
b = a - b ( = a + b - b once expanded)
a = a - b ( = a + b - a once expanded).
(This is the same add/subtract trick mentioned above, in a difficult-to-read way.)
The exact same reasoning applies to xor swaps: a ^ b ^ b = a and a ^ b ^ a = b. Since xor is commutative and associative, with x ^ x = 0 and x ^ 0 = x, this is quite easy to see:
= a ^ b ^ b
= a ^ 0
= a
and
= a ^ b ^ a
= a ^ a ^ b
= 0 ^ b
= b
Hope this helps. This explanation has already been given... but not very clearly imo.
I just want to add a mathematical explanation to make the answer more complete. In group theory terms, the set of bit-words under XOR forms an abelian group, also called a commutative group. It means it satisfies five requirements: Closure, Associativity, Identity element, Inverse element, Commutativity.
XOR swap formula:
a = a XOR b
b = a XOR b
a = a XOR b
Expand the formula, substituting a and b using the previous assignments:
a = a XOR b
b = a XOR b = (a XOR b) XOR b
a = a XOR b = (a XOR b) XOR (a XOR b) XOR b
Commutativity means that "a XOR b" is equal to "b XOR a":
a = a XOR b
b = a XOR b = (a XOR b) XOR b
a = a XOR b = (a XOR b) XOR (a XOR b) XOR b
= (b XOR a) XOR (a XOR b) XOR b
Associativity means that "(a XOR b) XOR c" is equal to "a XOR (b XOR c)":
a = a XOR b
b = a XOR b = (a XOR b) XOR b
= a XOR (b XOR b)
a = a XOR b = (a XOR b) XOR (a XOR b) XOR b
= (b XOR a) XOR (a XOR b) XOR b
= b XOR (a XOR a) XOR (b XOR b)
Under XOR, each element is its own inverse: any value XORed with itself gives zero:
a = a XOR b
b = a XOR b = (a XOR b) XOR b
= a XOR (b XOR b)
= a XOR 0
a = a XOR b = (a XOR b) XOR (a XOR b) XOR b
= (b XOR a) XOR (a XOR b) XOR b
= b XOR (a XOR a) XOR (b XOR b)
= b XOR 0 XOR 0
The identity element under XOR is zero: any value XORed with zero is left unchanged:
a = a XOR b
b = a XOR b = (a XOR b) XOR b
= a XOR (b XOR b)
= a XOR 0
= a
a = a XOR b = (a XOR b) XOR (a XOR b) XOR b
= (b XOR a) XOR (a XOR b) XOR b
= b XOR (a XOR a) XOR (b XOR b)
= b XOR 0 XOR 0
= b XOR 0
= b
And you can get further information in group theory.
Others have posted explanations, but I think it would be better understood if it's accompanied by a good example.
XOR Truth Table
A B | A XOR B
0 0 |    0
0 1 |    1
1 0 |    1
1 1 |    0
If we consider the above truth table and take the values A = 1100 and B = 0101 we are able to swap the values as such:
A = 1100
B = 0101
A ^= B; => A = 1100 XOR 0101
(A = 1001)
B ^= A; => B = 0101 XOR 1001
(B = 1100)
A ^= B; => A = 1001 XOR 1100
(A = 0101)
A = 0101
B = 1100

Find the smallest integer greater than or equal to x (positive integer) that is a multiple of z (positive integer, probably a power of 2)

Probably very easy question, yet I came out with this implementation that looks far too complicated...
unsigned int x;
unsigned int z;
unsigned int makeXMultipleOfZ(const unsigned x, const unsigned z) {
    return x + (z - x % z) % z;
    //or
    //return x + (z - (x - 1) % z - 1); //This generates shorter assembly,
    //6 instructions against 8
}
I would like to avoid if-statements.
If it helps, we can safely say that z will be a power of 2.
In my case z=4 (I know I could replace the modulo operation with a & bit operator), and I was wondering if I could come up with an implementation that involves fewer steps.
If z is a power of two, the modulo operation can be reduced to this bitwise operation:
return (x + z - 1) & ~(z - 1);
This logic is very common for data structure boundary alignment, for example. More info here: https://en.wikipedia.org/wiki/Data_structure_alignment
If z is a power of two and the integers are unsigned, the following will work:
(x + z - 1) & ~(z - 1)
I cannot think of a solution using bit-twiddling if z is an arbitrary number.

Strange C arithmetical behavior

I have a problem with this piece of C code:
int y = 0, h = 640, ih = 640;
h = y + h - max(0, (y + h) - ih);
It should set h to 640, but instead it is set to 0!
You can see it running here: http://ideone.com/zBZSsr
Any idea about this strange behavior? Am I doing something wrong?
The max macro in the example you linked needs an extra pair of parentheses.
You have:
#define max(x, y) ((x) > (y)) ? (x) : (y)
In your example, this expands to:
h = y + h - ((0) > ((y+h)-ih)) ? (0) : ((y+h)-ih);
Because + and - bind more tightly than ?:, everything on the left is absorbed into the condition of the ternary operator: the condition becomes y + h - (0 > 0), which evaluates to 640 and is therefore nonzero (true), so you get the "true" branch, which is simply 0.
Your macro should be:
#define max(x, y) (((x) > (y)) ? (x) : (y))
your code gets preprocessed to
h = y + h - ((0) > ((y + h) - ih)) ? (0) : ((y + h) - ih);
The problem is that + and - have higher precedence than the ?: operator.
#define max(x, y) (((x) > (y)) ? (x) : (y))
Add parentheses around the whole macro body and your computation will be correct.