Rotating a bitmap 90 degrees - c++

I have one 64-bit integer, which I need to rotate 90 degrees in an 8 x 8 area (preferably with straight bit manipulation). I cannot figure out any handy algorithm for that. For instance, this:
// 0xD000000000000000 = 1101000000000000000000000000000000000000000000000000000000000000
1 1 0 1 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
after rotation becomes this:
// 0x101000100000000 = 0000000100000001000000000000000100000000000000000000000000000000
0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
I wonder if there are any solutions that don't need any pre-calculated lookup table(s)?

// v is an unsigned 64-bit value; ULL rather than UL, since unsigned long is only 32 bits on some platforms
// shift counts are written in octal: 004 = 4, 040 = 32, 002 = 2, 020 = 16, 001 = 1, 010 = 8
v = (v & 0x000000000f0f0f0fULL) << 004 | (v & 0x00000000f0f0f0f0ULL) << 040 |
    (v & 0xf0f0f0f000000000ULL) >> 004 | (v & 0x0f0f0f0f00000000ULL) >> 040;
v = (v & 0x0000333300003333ULL) << 002 | (v & 0x0000cccc0000ccccULL) << 020 |
    (v & 0xcccc0000cccc0000ULL) >> 002 | (v & 0x3333000033330000ULL) >> 020;
v = (v & 0x0055005500550055ULL) << 001 | (v & 0x00aa00aa00aa00aaULL) << 010 |
    (v & 0xaa00aa00aa00aa00ULL) >> 001 | (v & 0x5500550055005500ULL) >> 010;
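For experimenting, here is the same three-step fragment wrapped in a function. This is a sketch: the name rotate90_cw and the uint64_t type are my choices, and the shift counts are written in decimal. As far as I can tell, each step moves the four quadrants of every 8x8, then 4x4, then 2x2 block one position clockwise, which together amount to a clockwise quarter turn in the question's layout.

```cpp
#include <cstdint>

// Rotate an 8x8 bit matrix 90 degrees clockwise, where the most significant
// byte is the top row and the most significant bit of each byte is the
// leftmost column (the layout used in the question).
std::uint64_t rotate90_cw(std::uint64_t v)
{
    // Step 1: cycle the four 4x4 quadrants (shifts of 4 and 32).
    v = (v & 0x000000000f0f0f0fULL) << 4 | (v & 0x00000000f0f0f0f0ULL) << 32 |
        (v & 0xf0f0f0f000000000ULL) >> 4 | (v & 0x0f0f0f0f00000000ULL) >> 32;
    // Step 2: cycle the 2x2 sub-blocks inside every 4x4 block (shifts of 2 and 16).
    v = (v & 0x0000333300003333ULL) << 2 | (v & 0x0000cccc0000ccccULL) << 16 |
        (v & 0xcccc0000cccc0000ULL) >> 2 | (v & 0x3333000033330000ULL) >> 16;
    // Step 3: cycle the single bits inside every 2x2 block (shifts of 1 and 8).
    v = (v & 0x0055005500550055ULL) << 1 | (v & 0x00aa00aa00aa00aaULL) << 8 |
        (v & 0xaa00aa00aa00aa00ULL) >> 1 | (v & 0x5500550055005500ULL) >> 8;
    return v;
}
```

With the question's example, rotate90_cw(0xD000000000000000) gives 0x0101000100000000, and applying it four times returns the original value.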

Without using any look-up tables, I can't see much better than treating each bit individually:
unsigned long long r = 0;   // x is the 64-bit input
for (int i = 0; i < 64; ++i) {
    // byte i / 8, bit i % 8 moves to byte i % 8, bit 7 - i / 8
    r |= ((x >> i) & 1ULL) << (((i % 8) * 8) + (7 - i / 8));
}

There is an efficient way to perform bit reversal, using O(log n) shift operations. If you interpret a 64-bit UINT as an 8x8 array of bits, then bit reversal corresponds to a rotation by 180 degrees.
Half of these shifts effectively perform a horizontal reflection; the other half perform a vertical reflection. To obtain rotations by 90 and 270 degrees, an orthogonal (i.e. vertical or horizontal) reflection could be combined with a diagonal reflection, but the latter remains an awkward bit.
typedef unsigned long long uint64;

uint64 reflect_vert (uint64 value)
{
    value = ((value & 0xFFFFFFFF00000000ull) >> 32) | ((value & 0x00000000FFFFFFFFull) << 32);
    value = ((value & 0xFFFF0000FFFF0000ull) >> 16) | ((value & 0x0000FFFF0000FFFFull) << 16);
    value = ((value & 0xFF00FF00FF00FF00ull) >> 8) | ((value & 0x00FF00FF00FF00FFull) << 8);
    return value;
}

uint64 reflect_horiz (uint64 value)
{
    value = ((value & 0xF0F0F0F0F0F0F0F0ull) >> 4) | ((value & 0x0F0F0F0F0F0F0F0Full) << 4);
    value = ((value & 0xCCCCCCCCCCCCCCCCull) >> 2) | ((value & 0x3333333333333333ull) << 2);
    value = ((value & 0xAAAAAAAAAAAAAAAAull) >> 1) | ((value & 0x5555555555555555ull) << 1);
    return value;
}

uint64 reflect_diag (uint64 value)
{
    uint64 new_value = value & 0x8040201008040201ull; // stationary bits
    new_value |= (value & 0x0100000000000000ull) >> 49;
    new_value |= (value & 0x0201000000000000ull) >> 42;
    new_value |= (value & 0x0402010000000000ull) >> 35;
    new_value |= (value & 0x0804020100000000ull) >> 28;
    new_value |= (value & 0x1008040201000000ull) >> 21;
    new_value |= (value & 0x2010080402010000ull) >> 14;
    new_value |= (value & 0x4020100804020100ull) >> 7;
    new_value |= (value & 0x0080402010080402ull) << 7;
    new_value |= (value & 0x0000804020100804ull) << 14;
    new_value |= (value & 0x0000008040201008ull) << 21;
    new_value |= (value & 0x0000000080402010ull) << 28;
    new_value |= (value & 0x0000000000804020ull) << 35;
    new_value |= (value & 0x0000000000008040ull) << 42;
    new_value |= (value & 0x0000000000000080ull) << 49;
    return new_value;
}

uint64 rotate_90 (uint64 value)
{
    return reflect_diag (reflect_vert (value));
}

uint64 rotate_180 (uint64 value)
{
    return reflect_horiz (reflect_vert (value));
}

uint64 rotate_270 (uint64 value)
{
    return reflect_diag (reflect_horiz (value));
}
In the above code, the reflect_diag() function still requires many shifts. I suspect that it is possible to implement this function with fewer shifts, but I have not yet found a way to do that.
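One way to get the diagonal reflection down to three mask-and-shift steps is the "delta swap" technique; a similar three-step 8x8 transpose appears, for example, in Hacker's Delight's section on transposing a bit matrix. The sketch below (not the answer's own code; reflect_diag_fast, reverse_rows and rotate_90_fast are illustrative names) first swaps the two off-diagonal 4x4 blocks, then the off-diagonal 2x2 blocks inside each 4x4 block, then single bits. A byte-reversal helper is repeated so the block compiles on its own.

```cpp
#include <cstdint>

typedef std::uint64_t uint64;

// Same result as reflect_diag(): transpose about the diagonal that holds the
// stationary bits 0x8040201008040201. Each "delta swap" exchanges the bits
// selected by the mask with the bits `delta` positions below them.
uint64 reflect_diag_fast(uint64 x)
{
    uint64 t;
    t = 0x0f0f0f0f00000000ULL & (x ^ (x << 28));  // swap the off-diagonal 4x4 blocks
    x ^= t ^ (t >> 28);
    t = 0x3333000033330000ULL & (x ^ (x << 14));  // then the off-diagonal 2x2 blocks
    x ^= t ^ (t >> 14);
    t = 0x5500550055005500ULL & (x ^ (x << 7));   // then the off-diagonal single bits
    x ^= t ^ (t >> 7);
    return x;
}

// Byte reversal, i.e. the same vertical reflection as reflect_vert().
uint64 reverse_rows(uint64 v)
{
    v = (v >> 32) | (v << 32);
    v = ((v & 0xFFFF0000FFFF0000ULL) >> 16) | ((v & 0x0000FFFF0000FFFFULL) << 16);
    v = ((v & 0xFF00FF00FF00FF00ULL) >> 8) | ((v & 0x00FF00FF00FF00FFULL) << 8);
    return v;
}

uint64 rotate_90_fast(uint64 value)
{
    return reflect_diag_fast(reverse_rows(value));
}
```

The shift distances are multiples of 7 because bit (row r, column c) at index 8r + c trades places with index 8c + r, and the difference of the two is 7(r - c).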

If you're going to do this fast, you shouldn't object to lookup tables.
I'd break the 64-bit integer into N-bit chunks, and look up the N-bit chunks in a position-selected table of transpose values. If you choose N=1, you need 64 lookups in tables of two slots, which is relatively slow. If you choose N=64, you need one table and one lookup, but the table is huge :-}
N=8 seems like a good compromise. You'd need 8 tables of 256 entries. The code should look something like this:
// value to transpose is in v, an unsigned 64-bit integer
unsigned long long r = 0; // result
r |= byte0transpose[(v>>56)&0xFF];
r |= byte1transpose[(v>>48)&0xFF];
r |= byte2transpose[(v>>40)&0xFF];
r |= byte3transpose[(v>>32)&0xFF];
r |= byte4transpose[(v>>24)&0xFF];
r |= byte5transpose[(v>>16)&0xFF];
r |= byte6transpose[(v>>8)&0xFF];
r |= byte7transpose[(v>>0)&0xFF];
Each table contains precomputed values that "spread" the contiguous bits in the input across the 64-bit transposed result. Ideally you'd compute these values offline and simply initialize the table entries.
If you don't care about speed, then the standard array transpose algorithms will work; just index the 64-bit value as if it were a bit array.
I have a sneaking suspicion that one might be able to compute the transposition using bit-twiddling hacks.
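The eight-table scheme can be sketched as follows, filling the tables at startup from the per-bit rotation formula instead of precomputing them offline. RotateTables and rotate90_lookup are illustrative names, and the rotation direction is assumed to be the clockwise one from the question.

```cpp
#include <array>
#include <cstdint>

// Hypothetical layout for the N = 8 scheme: table[k][b] is the rotated image
// of byte value b when it is byte k of the input, counting k = 0 as the most
// significant byte (like byte0transpose in the answer).
struct RotateTables {
    std::array<std::array<std::uint64_t, 256>, 8> table{};

    RotateTables() {
        for (int k = 0; k < 8; ++k) {
            for (int b = 0; b < 256; ++b) {
                std::uint64_t r = 0;
                for (int j = 0; j < 8; ++j) {
                    // Input bit j of byte k sits at index i = 8 * (7 - k) + j; the
                    // per-bit rotation sends index i to 8 * (i % 8) + 7 - i / 8,
                    // which simplifies to 8 * j + k.
                    if ((b >> j) & 1)
                        r |= std::uint64_t{1} << (8 * j + k);
                }
                table[k][b] = r;
            }
        }
    }
};

std::uint64_t rotate90_lookup(std::uint64_t v)
{
    static const RotateTables t;  // built on the first call
    std::uint64_t r = 0;
    for (int k = 0; k < 8; ++k)
        r |= t.table[k][(v >> (56 - 8 * k)) & 0xFF];
    return r;
}
```

Since table[k][b] is just table[0][b] shifted left by k, a single 256-entry table plus one shift per byte is enough, which is the idea behind the macro-generated rot90 table in the next answer.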

To expand on my comment to Ira's answer, you can use:
#define ROT_BIT_0(X) X, (X)|0x1UL
#define ROT_BIT_1(X) ROT_BIT_0(X), ROT_BIT_0((X) | 0x100UL)
#define ROT_BIT_2(X) ROT_BIT_1(X), ROT_BIT_1((X) | 0x10000UL)
#define ROT_BIT_3(X) ROT_BIT_2(X), ROT_BIT_2((X) | 0x1000000UL)
#define ROT_BIT_4(X) ROT_BIT_3(X), ROT_BIT_3((X) | 0x100000000UL)
#define ROT_BIT_5(X) ROT_BIT_4(X), ROT_BIT_4((X) | 0x10000000000UL)
#define ROT_BIT_6(X) ROT_BIT_5(X), ROT_BIT_5((X) | 0x1000000000000UL)
#define ROT_BIT_7(X) ROT_BIT_6(X), ROT_BIT_6((X) | 0x100000000000000UL)
static unsigned long rot90[256] = { ROT_BIT_7(0) };
unsigned long rotate90(unsigned long v)
{
    unsigned long r = 0;
    r |= rot90[(v>>56) & 0xff];
    r |= rot90[(v>>48) & 0xff] << 1;
    r |= rot90[(v>>40) & 0xff] << 2;
    r |= rot90[(v>>32) & 0xff] << 3;
    r |= rot90[(v>>24) & 0xff] << 4;
    r |= rot90[(v>>16) & 0xff] << 5;
    r |= rot90[(v>>8) & 0xff] << 6;
    r |= rot90[v & 0xff] << 7;
    return r;
}
This depends on 'unsigned long' being 64 bits, of course, and does the rotate assuming the bits are in row-major order with the MSB at the upper left, which seems to be the case in this question.

This is quite easy using IA-32 SIMD: there's a handy opcode (pmovmskb) to extract every eighth bit from a 64-bit value (this was written using DevStudio 2005):
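unsigned char
  source [8] = {0, 0, 0, 0, 0, 0, 0, 0xd0},
  dest [8];

__asm
{
  mov ch,3                      ; three passes of 90 degrees each
  movq xmm0,qword ptr [source]
Rotate2:
  lea edi,dest
  mov cl,8                      ; eight output bytes per pass
Rotate1:
  pmovmskb eax,xmm0             ; gather the top bit of each byte into AL
  psllq xmm0,1                  ; move the next bit of each byte into the top position
  stosb                         ; store AL at [EDI] and advance EDI
  dec cl
  jnz Rotate1
  movq xmm0,qword ptr [dest]
  dec ch
  jnz Rotate2
}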
It rotates the data three times (-270 degrees), since rotating by +90 directly is a bit trickier (it needs a bit more thought).

If you look at this as a 2-dimensional array, then you have the solution, no?
Just make the rows the new columns:
the first row is the last column, the 2nd is the one before last, and so on.
Visually at least, it looks like your solution.

Probably something like this:
for(int i = 0; i < 8; i++)
    for(int j = 0; j < 8; j++)
        new_image[j*8+7-i] = image[i*8+j];   // row i becomes column 7 - i

If a loop is acceptable, the formula for the bit index is simple enough:
8 * (Column + 1) - Row - 1
Column and Row are 0-indexed.
This gives you this mapping:
7 15 23 31 39 47 55 63
6 14 22 ...
5 ...
4 ...
3 ...
2 ...
1 ...
0 8 16 24 32 40 48 56


Programming the encoding function based on a decode RLE (bitwise) function

Currently I only have the decoding procedure. EncodedRle is a byte array with the encoded image bytes, pixel is the pixel position to draw on the decoded image, col is the decoded pixel color (either 0 or 255), and matSpan is the image pixel span (a byte array).
Decoding function:
int pixel = 0;
foreach (var run in EncodedRle)
{
    // bit 0: colour (0 or 255)
    byte col = (byte) ((run & 0x01) * 255);
    // bits 7..1: run length - 1, stored with the bit order reversed
    var numPixelsInRun =
        (((run & 128) > 0 ? 1 : 0) |
         ((run & 64) > 0 ? 2 : 0) |
         ((run & 32) > 0 ? 4 : 0) |
         ((run & 16) > 0 ? 8 : 0) |
         ((run & 8) > 0 ? 16 : 0) |
         ((run & 4) > 0 ? 32 : 0) |
         ((run & 2) > 0 ? 64 : 0)) + 1;
    for (; numPixelsInRun > 0; numPixelsInRun--)
    {
        matSpan[pixel++] = col;
    }
}
How can I write the encoding function? I understand that I need to limit the runs: sum up equal pixels on the same line until a different value is found, AND limit each byte to 128 pixels at most? But I'm not good at bitwise operations and can't find a way to do it properly. Any tips on that?
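A sketch of a matching encoder, written in C++ here (the logic translates directly to C#). It assumes the byte layout the decoder implies: runs of 1 to 128 pixels, bit 0 holding the colour, and bits 7..1 holding (length - 1) with its bit order reversed. pack_run and encode_rle are hypothetical names, and any non-zero pixel is treated as 255. Because the decoder never resets at a line end, this encoder lets runs cross line boundaries; if your format requires per-line runs, also stop a run at every multiple of the image width.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack one run into a byte in the layout the decoder expects:
// bit 0 = colour (1 -> 255, 0 -> 0), bits 7..1 = (length - 1) with its bit
// order reversed (value bit 0 -> byte bit 7, ..., value bit 6 -> byte bit 1).
std::uint8_t pack_run(int length, bool white)
{
    int n = length - 1;                      // 0..127 for lengths 1..128
    std::uint8_t b = white ? 1 : 0;
    for (int k = 0; k < 7; ++k)
        if (n & (1 << k))
            b |= static_cast<std::uint8_t>(0x80 >> k);
    return b;
}

// Collect runs of equal pixels, capped at 128 pixels each.
std::vector<std::uint8_t> encode_rle(const std::vector<std::uint8_t>& pixels)
{
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    while (i < pixels.size()) {
        bool white = pixels[i] != 0;
        int length = 1;
        while (i + length < pixels.size() && length < 128 &&
               (pixels[i + length] != 0) == white)
            ++length;
        out.push_back(pack_run(length, white));
        i += length;
    }
    return out;
}
```

For example, the pixels {0, 0, 0, 255, 255, 0} encode to the three bytes 0x40, 0x81, 0x00, and a run of 200 black pixels is split into 0xFE (128 pixels) and 0xE2 (72 pixels).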

XOR programming puzzle advice

Given a long int x, count the number of values of a that satisfy the following conditions:
a XOR x > x
0 < a < x
where a and x are long integers and XOR is the bitwise XOR operator
How would you go about completing this problem?
I should also mention that the input x can be as large as 10^10.
I have managed to get a brute-force solution by iterating from 0 to x, checking the conditions and incrementing a count value; however, this is not an optimal solution.
This is the brute force that I tried. It works, but it is extremely slow for large values of x.
long long count = 0;
for (long long i = 0; i < x; i++)   // long long: x can exceed the range of int
    if ((0 < i && i < x) && (i ^ x) > x)
        count++;
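#include <stdio.h>

long long NumberOfA(long long x)
{
    long long t = x << 1;
    while (t ^ (t & -t)) t ^= (t & -t);   // clear the lowest set bit until only the highest remains
    return t - ++x;
}

int main(void)
{
    long long x = 10000000000;
    printf("%lld ==> %lld\n", 10LL, NumberOfA(10LL));
    printf("%lld ==> %lld\n", x, NumberOfA(x));
    return 0;
}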
10 ==> 5
10000000000 ==> 7179869183
Trying to explain the logic (using example 10, or 1010b)
Shift x to the left 1. (Value 20 or 10100b)
Turn off all low bits, leaving just the high bit (Value 16 or 10000b)
Subtract x+1 (16 - 11 == 5)
Attempting to explain (although it's not easy):
Your rule is that a ^ x must be bigger than x, but that you cannot add extra bits to a or x.
(If you start with a 4-bit value, you can only use 4 bits.)
The biggest possible value for an N-bit number is 2^N - 1.
(e.g. for a 4-bit number, 2^4 - 1 == 15)
Let's call this number B.
Above your value x, up to and including B, there are B - x possible values.
(back to my example, 10. Between 15 and 10, there are 5 possible values: 11, 12, 13, 14, 15)
In my code, t is x << 1, then with all the low bits turned off.
(10 << 1 is 20; turn off all the low bits to get 16)
Then 16 - 1 is B, and B - x is your answer:
(t - 1 - x, which is the same as t - ++x, is the answer)
One way to look at this is to consider each bit in x.
If it's 1, then flipping it will yield a smaller number.
If it's 0, then flipping it will yield a larger number, and we should count it - and also all the combinations of bits to the right. That conveniently adds up to the mask value.
long f(long const x)
{
    // only positive x can have non-zero result
    if (x <= 0) return 0;
    long count = 0;
    // Iterate from LSB to MSB
    for (long mask = 1; mask < x; mask <<= 1)
        count += x & mask
            ? 0
            : mask;
    return count;
}
We might suspect a pattern here - it looks like we're just copying x and flipping its bits.
Let's confirm, using a minimal test program:
#include <cstdlib>
#include <iostream>

// f() as defined above
int main(int, char **argv)
{
    while (*++argv)
        std::cout << *argv << " -> " << f(std::atol(*argv)) << std::endl;
}
0 -> 0
1 -> 0
2 -> 1
3 -> 0
4 -> 3
5 -> 2
6 -> 1
7 -> 0
8 -> 7
9 -> 6
10 -> 5
11 -> 4
12 -> 3
13 -> 2
14 -> 1
15 -> 0
So all we have to do is 'smear' the value so that all the zero bits after the most-significant 1 are set, then xor with that:
long f(long const x)
{
    if (x <= 0) return 0;
    long mask = x;
    while (mask & (mask+1))
        mask |= mask+1;   // set the lowest zero bit
    return mask ^ x;
}
This is much faster, and still O(log n).
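The smear loop above needs one iteration per zero bit below the top bit. A common alternative, sketched here under the assumption of a 64-bit type (count_a is an illustrative name), smears the highest set bit down with six fixed shift-and-OR steps, so the amount of work no longer depends on x:

```cpp
#include <cstdint>

// Count a with 0 < a < x and (a ^ x) > x. After the six steps, m has every
// bit set from the highest 1 of x downwards; m ^ x keeps exactly the zero
// bits of x below its most significant 1, as in the answer above.
std::int64_t count_a(std::int64_t x)
{
    if (x <= 0) return 0;
    std::uint64_t m = static_cast<std::uint64_t>(x);
    m |= m >> 1;
    m |= m >> 2;
    m |= m >> 4;
    m |= m >> 8;
    m |= m >> 16;
    m |= m >> 32;
    return static_cast<std::int64_t>(m ^ static_cast<std::uint64_t>(x));
}
```

This reproduces the numbers above: count_a(10) is 5 and count_a(10000000000) is 7179869183.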

Converting 1-bit bmp file to array in C/C++ [closed]

I'm looking to turn a 1-bit bmp file of variable height/width into a simple two-dimensional array with values of either 0 or 1. I don't have any experience with image editing in code and most libraries that I've found involve higher bit-depth than what I need. Any help regarding this would be great.
Here's the code to read a monochrome .bmp file
(See dmb's answer below for a small fix for odd-sized .bmps)
#include <stdio.h>
#include <stdlib.h>

unsigned char *read_bmp(char *fname,int* _w, int* _h)
{
    unsigned char head[54];
    FILE *f = fopen(fname,"rb");
    // BMP header is 54 bytes
    fread(head, 1, 54, f);
    int w = head[18] + ( ((int)head[19]) << 8) + ( ((int)head[20]) << 16) + ( ((int)head[21]) << 24);
    int h = head[22] + ( ((int)head[23]) << 8) + ( ((int)head[24]) << 16) + ( ((int)head[25]) << 24);
    // lines are aligned on 4-byte boundary
    int lineSize = ((w + 31) / 32) * 4;
    int fileSize = lineSize * h;
    unsigned char *img = malloc(w * h), *data = malloc(fileSize);
    // skip the header
    fseek(f, 54, SEEK_SET);
    // skip palette - two rgb quads, 8 bytes
    fseek(f, 8, SEEK_CUR);
    // read data
    fread(data, 1, fileSize, f);
    fclose(f);
    // decode bits
    int i, j, k, rev_j;
    for(j = 0, rev_j = h - 1; j < h ; j++, rev_j--) {
        for(i = 0 ; i < w / 8; i++) {
            int fpos = j * lineSize + i, pos = rev_j * w + i * 8;
            for(k = 0 ; k < 8 ; k++)
                img[pos + (7 - k)] = (data[fpos] >> k ) & 1;
        }
    }
    free(data);
    *_w = w; *_h = h;
    return img;
}

int main()
{
    int w, h, i, j;
    unsigned char* img = read_bmp("test1.bmp", &w, &h);
    for(j = 0 ; j < h ; j++) {
        for(i = 0 ; i < w ; i++)
            printf("%c ", img[j * w + i] ? '0' : '1' );
        printf("\n");
    }
    free(img);
    return 0;
}
It is plain C, so there is no pointer casting of malloc's result; beware when using it in C++.
The biggest problem is that the lines in .bmp files are 4-byte aligned, which matters a lot with single-bit images. So we calculate the line size as the width in bits rounded up to a multiple of 32, converted to bytes: "((width + 31) / 32) * 4". Each byte contains 8 pixels, not one, so we use the k-based loop.
I hope the rest of the code is obvious; much has been written about the .bmp header and palette data (8 bytes, which we skip).
Expected output:
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
0 0 0 0 0 0 1 1 1 1 0 0 1 1 0 0
0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 1 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 0 0 1 0 0 0
0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 0 0 0 0 1 0
0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
0 0 0 1 0 1 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
I tried the solution of Viktor Lapyov on a 20x20 test image:
But with his code, I get this output (slightly reformatted but you can see the problem):
The last 4 pixels are not read. The problem is here. (The last partial byte in a row is ignored.)
// decode bits
int i, j, k, rev_j;
for(j = 0, rev_j = h - 1; j < h ; j++, rev_j--) {
    for(i = 0 ; i < w / 8; i++) {
        int fpos = j * lineSize + i, pos = rev_j * w + i * 8;
        for(k = 0 ; k < 8 ; k++)
            img[pos + (7 - k)] = (data[fpos] >> k ) & 1;
    }
}
I rewrote the inner loop like this:
// decode bits
int i, byte_ctr, j, rev_j;
for(j = 0, rev_j = h - 1; j < h ; j++, rev_j--) {
    for( i = 0; i < w; i++) {
        byte_ctr = i / 8;
        unsigned char data_byte = data[j * lineSize + byte_ctr];
        int pos = rev_j * w + i;
        unsigned char mask = 0x80 >> i % 8;
        img[pos] = (data_byte & mask ) ? 1 : 0;
    }
}
and all is well:
The following C code works with monochrome bitmaps of any size. I'll assume you've got your bitmap in a buffer, with height and width initialized from the file. So:
// allocate mem for global buffer
if (!(img = malloc(h * w)) )
    exit(EXIT_FAILURE);   // out of memory
int i = 0, k, j, scanline;
// calc the scanline. Monochrome images are
// padded with 0 at every line end. This
// makes them divisible by 4.
scanline = (w + 7) >> 3;   // bytes needed for w pixels, rounded up
// account for the paddings
if (scanline % 4)
    scanline += (4 - scanline % 4);
// loop and set the img values (rows are stored bottom-up, so k counts down)
for (i = 0, k = h - 1; i < h; i++, k--)
    for (j = 0; j < w; j++) {
        img[j+i*w] = (buffer[(j>>3)+k*scanline]
                      & (0x80 >> (j % 8))) ? 1 : 0;
    }
Hope this helps. Converting it to 2D is now a trivial matter, but if you get lost, here is the math to convert the 1D array to 2D: if r and c are the row and column and w is the width, then
the element at 1D index c + r * w is the 2D element (r, c).
If you have further remarks, let me know.
Let's think of a 1x7 monochrome bitmap, i.e. a bitmap of a straight line 7 pixels wide. When this image is stored on Windows, its 7 pixels fit in a single byte, and since each row must be a multiple of 4 bytes long, an extra 3 bytes of padding are added.
So the biSizeImage member of the BITMAPINFOHEADER structure will show a total of 4 bytes. Nonetheless, the biWidth and biHeight members will correctly state the true bitmap dimensions.
The above code will fail because 7 / 8 = 0 (integer division truncates in C). Hence loop "i" will not execute, and neither will loop "k".
That means the array "img" now contains garbage values that do not correspond to the pixels contained in "data", i.e. the result is incorrect.
And by inductive reasoning, if it does not satisfy the base case, then chances are it won't do much good for the general case either.
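A small sketch of the padding arithmetic for this case, plus a pixel accessor that also reads the partial last byte of a row. bmp1_row_stride and bmp1_pixel are hypothetical helpers; they assume a bottom-up bitmap (positive biHeight) with the leftmost pixel in the most significant bit of each byte.

```cpp
#include <cstdint>
#include <vector>

// Bytes per stored row of a 1-bit-per-pixel BMP: the width in bits rounded
// up to a multiple of 32, i.e. whole bytes padded to a multiple of 4.
int bmp1_row_stride(int width)
{
    return ((width + 31) / 32) * 4;
}

// Read pixel (x, y), with y = 0 the top image row, from the raw pixel data.
// Stored row 0 is the bottom image row, so the row index is flipped.
int bmp1_pixel(const std::vector<std::uint8_t>& data, int width, int height,
               int x, int y)
{
    int stored_row = height - 1 - y;
    std::uint8_t byte = data[stored_row * bmp1_row_stride(width) + x / 8];
    return (byte >> (7 - x % 8)) & 1;   // palette index 0 or 1
}
```

For the 1x7 example, bmp1_row_stride(7) is 4 (one data byte plus three padding bytes), and x / 8 selects byte 0 for all seven pixels instead of skipping them.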

Direct formula for summing XOR

I have to XOR the numbers from 1 to N. Is there a direct formula for it?
For example, if N = 6 then 1^2^3^4^5^6 = 7. I want to do it without using any loop, so I need an O(1) formula (if there is one).
Your formula is N & (N % 2 ? 0 : ~0) | ( ((N & 2)>>1) ^ (N & 1) ):
#include <iostream>

int main()
{
    int S = 0;
    for (int N = 0; N <= 50; ++N) {
        S = (S^N);   // running XOR 1 ^ 2 ^ ... ^ N
        int check = (N & (N % 2 ? 0 : ~0)) | ( ((N & 2)>>1) ^ (N & 1) );
        std::cout << "N = " << N << ": " << S << ", " << check << std::endl;
        if (check != S) return 1;   // formula and loop disagree
    }
    return 0;
}
N = 0: 0, 0 N = 1: 1, 1 N = 2: 3, 3
N = 3: 0, 0 N = 4: 4, 4 N = 5: 1, 1
N = 6: 7, 7 N = 7: 0, 0 N = 8: 8, 8
N = 9: 1, 1 N = 10: 11, 11 N = 11: 0, 0
N = 12: 12, 12 N = 13: 1, 1 N = 14: 15, 15
N = 15: 0, 0 N = 16: 16, 16 N = 17: 1, 1
N = 18: 19, 19 N = 19: 0, 0 N = 20: 20, 20
N = 21: 1, 1 N = 22: 23, 23 N = 23: 0, 0
N = 24: 24, 24 N = 25: 1, 1 N = 26: 27, 27
N = 27: 0, 0 N = 28: 28, 28 N = 29: 1, 1
N = 30: 31, 31 N = 31: 0, 0 N = 32: 32, 32
N = 33: 1, 1 N = 34: 35, 35 N = 35: 0, 0
N = 36: 36, 36 N = 37: 1, 1 N = 38: 39, 39
N = 39: 0, 0 N = 40: 40, 40 N = 41: 1, 1
N = 42: 43, 43 N = 43: 0, 0 N = 44: 44, 44
N = 45: 1, 1 N = 46: 47, 47 N = 47: 0, 0
N = 48: 48, 48 N = 49: 1, 1 N = 50: 51, 51
The low bit of the result is the XOR of the low bit of N and the next bit.
For each bit except the low bit, the following holds:
if N is odd, then that bit is 0;
if N is even, then that bit is equal to the corresponding bit of N.
Thus, for odd N, the result is always 0 or 1.
GSerg has posted a formula without loops, but deleted it for some reason (it is undeleted now). The formula is perfectly valid (apart from a little mistake). Here's the C++-like version.
if n % 2 == 1 {
    result = (n % 4 == 1) ? 1 : 0;
} else {
    result = (n % 4 == 0) ? n : n + 1;
}
One can prove it by induction, checking all remainders of division by 4. Although I have no idea how you could come up with it without generating output and spotting the regularity.
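The regularity can also be explained directly by pairing consecutive numbers. A short derivation, writing F(N) for 1 XOR 2 XOR ... XOR N with F(0) = 0 (the notation is mine):

```latex
\begin{aligned}
&2m \oplus (2m+1) = 1 \quad \text{(the two numbers differ only in bit 0)},\\
&4k \oplus (4k+1) \oplus (4k+2) \oplus (4k+3) = 1 \oplus 1 = 0,\\
&\Rightarrow F(4k-1) = 0 \ \text{for } k \ge 1, \ \text{and } F(0) = 0,\\
&F(4k)   = 0 \oplus 4k = 4k,\\
&F(4k+1) = 4k \oplus (4k+1) = 1,\\
&F(4k+2) = 1 \oplus (4k+2) = 4k+3,\\
&F(4k+3) = (4k+3) \oplus (4k+3) = 0.
\end{aligned}
```

These four cases are exactly the N, 1, N + 1, 0 pattern for N mod 4 = 0, 1, 2, 3.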
Please explain your approach a bit more.
Since each bit is independent in xor operation, you can calculate them separately.
Also, if you look at the k-th bit of the numbers 0..n, it forms a pattern. E.g., the numbers from 0 to 7 in binary form:
000
001
010
011
100
101
110
111
You see that for the k-th bit (k starts from 0), there are 2^k zeroes, then 2^k ones, then 2^k zeroes again, etc.
Therefore, for each bit you can calculate how many ones there are without actually going through all the numbers from 1 to n.
E.g., for k = 2, there are repeated blocks of 2^2 == 4 zeroes and 4 ones. Then,
int ones = (n / 8) * 4; // full blocks
if (n % 8 >= 4) { // consider incomplete blocks in the end
    ones += n % 8 - 3;
}
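Extending that count to every bit gives a loop over bit positions (63 iterations) rather than over the numbers themselves. A sketch, assuming 64-bit unsigned arithmetic and n < 2^63; xor_upto is an illustrative name. Counting over 0..n instead of 1..n changes nothing, since 0 contributes no set bits.

```cpp
#include <cstdint>

// XOR of 0, 1, ..., n computed bit by bit. Bit k of the result is the parity
// of how many of those n + 1 numbers have bit k set; bit k runs in blocks of
// 2^k zeroes followed by 2^k ones.
std::uint64_t xor_upto(std::uint64_t n)
{
    const std::uint64_t count = n + 1;                      // numbers 0..n
    std::uint64_t result = 0;
    for (int k = 0; k < 63; ++k) {
        const std::uint64_t half = std::uint64_t{1} << k;   // 2^k
        const std::uint64_t period = half << 1;             // 2^(k+1)
        std::uint64_t ones = (count / period) * half;       // full blocks
        const std::uint64_t rem = count % period;           // incomplete block
        if (rem > half)
            ones += rem - half;
        if (ones & 1)
            result |= half;
    }
    return result;
}
```

For n = 6 this gives 7, matching the example in the question.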
For odd N, the result is either 1 or 0 (cyclic, 0 for N=3, 1 for N=5, 0 for N=7 etc.)
For even N, the result is either N or N+1 (cyclic, N+1 for N=2, N for N=4, N+1 for N=6, N for N=8 etc).
if (N mod 2) = 0
    if (N mod 4) = 0 then r = N else r = N+1
else
    if (N mod 4) = 1 then r = 1 else r = 0
Let's say the function that XORs all the values from 1 to N is XOR(N); then
XOR(1) = 000 1 = 0 1 (the 0 is the decimal value of binary 000)
XOR(2) = 001 1 = 1 1
XOR(3) = 000 0 = 0 0
XOR(4) = 010 0 = 2 0
XOR(5) = 000 1 = 0 1
XOR(6) = 011 1 = 3 1
XOR(7) = 000 0 = 0 0
XOR(8) = 100 0 = 4 0
XOR(9) = 000 1 = 0 1
XOR(10)= 101 1 = 5 1
XOR(11)= 000 0 = 0 0
XOR(12)= 110 0 = 6 0
I hope you can see the pattern. It should be similar for other numbers too.
Try this:
the LSB of the result gets toggled each time N is odd, so we can say that
rez & 1 == (N & 1) ^ ((N >> 1) & 1)
The same pattern can be observed for the rest of the bits.
Each time bits B and B+1 (counting from the LSB) of N are different, bit B in the result should be set.
So the final result will be (including N): rez = N ^ (N >> 1)
EDIT: sorry, that was wrong. The correct answer:
for odd N: rez = (N ^ (N >> 1)) & 1
for even N: rez = (N & ~1) | ((N ^ (N >> 1)) & 1)
Great answer by Alexey Malistov! A variation of his formula: n & 1 ? (n & 2) >> 1 ^ 1 : n | (n & 2) >> 1 or equivalently n & 1 ? !(n & 2) : n | (n & 2) >> 1.
This method avoids using conditionals: F(N)=(N&((N&1)-1))|((N&1)^((N&3)>>1))
F(N)= (N&(b0-1)) | (b0^b1)
If you look at the XOR of the first few numbers you get:
N | F(N)
0001 | 0001
0010 | 0011
0011 | 0000
0100 | 0100
0101 | 0001
0110 | 0111
0111 | 0000
1000 | 1000
1001 | 0001
Hopefully you notice the pattern:
if N mod 4 = 1 then F(N)=1
if N mod 4 = 3 then F(N)=0
if N mod 4 = 0 then F(N)=N
if N mod 4 = 2 then F(N)=N but with the first bit set to 1, so N|1
The tricky part is getting this into one statement without conditionals; I'll explain the logic I used to do this.
Take the two least significant bits of N and call them
b0 and b1; they are obtained with:
b0 = N&1
b1 = (N&3)>>1
Notice that if b0 == 1, we have to zero all of the bits, but if it isn't, all of the bits except the first bit stay the same. We can get this behavior with:
N & (b0-1): this works because of two's complement: -1 is a number with all bits set to 1, and 1-1 = 0, so when b0 = 1 this results in F(N) = 0. So that is the first part of the function:
F(N)= (N&(b0-1))...
Now this works for N mod 4 == 3 and 0. For the other 2 cases, let's look solely at b1, b0 and bit 0 of F(N):
b1| b0| F(N)0
1| 1| 0
0| 0| 0
1| 0| 1
0| 1| 1
OK, hopefully this truth table looks familiar! It is b0 XOR b1 (b1^b0). So now that we know how to get the last bit, let's put that into our function:
F(N)= (N&(b0-1)) | (b0^b1)
And there you go, a function without conditionals. This is also useful if you want to compute the XOR of all numbers from a to b (with a >= 1); you can do:
F(a-1) XOR F(b).
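A sketch of the range idea on top of the branch-free F(N) from this answer. The names xor_1_to_n and xor_range are illustrative, and unsigned arithmetic is assumed so that b0 - 1 wraps to all ones. The XOR of a..b is F(b) XOR F(a - 1), because every number below a appears in both terms and cancels.

```cpp
#include <cstdint>

// Branch-free XOR of 1..n: (n & (b0 - 1)) | (b0 ^ b1).
std::uint64_t xor_1_to_n(std::uint64_t n)
{
    std::uint64_t b0 = n & 1;
    std::uint64_t b1 = (n & 3) >> 1;
    return (n & (b0 - 1)) | (b0 ^ b1);   // b0 - 1 is all ones when b0 == 0
}

// XOR of a, a+1, ..., b for 1 <= a <= b.
std::uint64_t xor_range(std::uint64_t a, std::uint64_t b)
{
    return xor_1_to_n(b) ^ xor_1_to_n(a - 1);
}
```

For example, xor_range(3, 6) is F(6) XOR F(2) = 7 XOR 3 = 4, which equals 3^4^5^6.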
With minimal change to the original logic (the variable is called acc because xor is a reserved alternative token in C++):
int acc = 0;
for (int i = 1; i <= N; i++) {
    acc ^= i;
}
We can have:
int acc = 0;
for (int i = N - (N % 4); i <= N; i++) {   // the XOR of 0 .. N - N%4 - 1 is 0
    acc ^= i;
}
It does have a loop, but it takes constant time to execute: the number of iterations varies between 1 and 4.
How about this?
This works fine for any n:
unsigned int xorn(unsigned int n)
{
    if (n % 4 == 0)
        return n;
    else if(n % 4 == 1)
        return 1;
    else if(n % 4 == 2)
        return n+1;
    return 0;
}
Take a look at this. This will solve your problem.
To calculate the XOR sum from 1 to N:
int ans,mod=N%4;
if(mod==0) ans=N;
else if(mod==1) ans=1;
else if(mod==2) ans=N+1;
else if(mod==3) ans=0;
If someone still needs it, here is a simple Python solution:
def XorSum(L):
    # XOR of 0, 1, ..., L-1 (pass N + 1 to get 1 ^ 2 ^ ... ^ N)
    res = 0
    if (L-1)%4 == 0:
        res = L-1
    elif (L-1)%4 == 1:
        res = 1
    elif (L-1)%4 == 2:
        res = (L-1)^1
    else: #3
        res = 0
    return res

Mix of two bit sequences

Is there any clever way to mix two bit sequences in such a way that bits from the first sequence will be on odd places, and bits from the second sequence will be on even places?
Both sequences are no longer than 16 bits, so the output will fit into a 32-bit integer.
First sequence : 1 0 0 1 0 0
Second sequence : 1 1 1 0 1 1
Output : 1 1 0 1 0 1 1 0 0 1 0 1
I thought about making an integer array of size 2^16, and then the output would be:
arr[first] << 1 | arr[second]
Have a look at this page: it lists the obvious (for loop) approach and 3 optimized algorithms. None of them is particularly simple, but without testing I'd guess they are considerably faster than a loop.
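One of the standard loop-free approaches is "binary magic numbers": spread each 16-bit input so that bit i lands on bit 2i, then shift one of them left by one and OR. A sketch (spread_bits and mix are illustrative names), placing the first sequence on the odd positions as the question asks:

```cpp
#include <cstdint>

// Spread the low 16 bits of x so that bit i moves to bit 2*i.
std::uint32_t spread_bits(std::uint32_t x)
{
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

// First sequence on the odd positions (1, 3, 5, ...), second on the even ones.
std::uint32_t mix(std::uint16_t first, std::uint16_t second)
{
    return (spread_bits(first) << 1) | spread_bits(second);
}
```

With the question's example, mix(0x24, 0x3B) (sequences 100100 and 111011) returns 0xD65, i.e. 110101100101.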
In C#:
public Int32 Mix(Int16 b1, Int16 b2)
{
    Int32 res = 0;
    for (int i=0; i<16; i++)
    {
        res |= ((b2 >> i) & 1) << 2*i;        // second sequence: even positions
        res |= ((b1 >> i) & 1) << 2*i + 1;    // first sequence: odd positions
    }
    return res;
}