OpenMP integral image slower than sequential

I have implemented a summed-area table (integral image) in C++ using OpenMP.
The problem is that the sequential code is always faster than the parallel code, regardless of the number of threads or the image size.
For example, I tried images from 100x100 to 10000x10000 and from 1 to 64 threads, but no combination was ever faster.
I also tried this code on different machines:
Mac OSX 1.4 GHz Intel Core i5 dual core
Mac OSX 2.3 GHz Intel Core i7 quad core
Ubuntu 16.04 Intel Xeon E5-2620 2.4 GHz 12 cores
The time was measured with the OpenMP function omp_get_wtime().
To compile I use: g++ -fopenmp -Wall main.cpp.
Here is the parallel code:
void transpose(unsigned long *src, unsigned long *dst, const int N, const int M) {
    #pragma omp parallel for
    for(int n = 0; n<N*M; n++) {
        int i = n/N;
        int j = n%N;
        dst[n] = src[M*j + i];
    }
}
unsigned long * integralImageMP(uint8_t*x, int n, int m){
    unsigned long * out = new unsigned long[n*m];
    unsigned long * rows = new unsigned long[n*m];
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        rows[i*m] = x[i*m];
        for (int j = 1; j < m; ++j)
        {
            rows[i*m + j] = x[i*m + j] + rows[i*m + j - 1];
        }
    }
    transpose(rows, out, n, m);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        rows[i*m] = out[i*m];
        for (int j = 1; j < m; ++j)
        {
            rows[i*m + j] = out[i*m + j] + rows[i*m + j - 1];
        }
    }
    transpose(rows, out, m, n);
    delete [] rows;
    return out;
}
Here is the sequential code:
unsigned long * integralImage(uint8_t*x, int n, int m){
    unsigned long * out = new unsigned long[n*m];
    for (int i = 0; i < n; ++i)
    {
        for (int j = 0; j < m; ++j)
        {
            unsigned long val = x[i*m + j];
            if (i>=1)
            {
                val += out[(i-1)*m + j];
                if (j>=1)
                {
                    val += out[i*m + j - 1] - out[(i-1)*m + j - 1];
                }
            } else {
                if (j>=1)
                {
                    val += out[i*m + j -1];
                }
            }
            out[i*m + j] = val;
        }
    }
    return out;
}
I also tried it without the transpose, but it was even slower, probably because of the cache access pattern.
An example of calling code:
int main(int argc, char **argv){
    uint8_t* image = //read image from file (gray scale)
    int height = //height of the image
    int width = //width of the image
    double start_omp = omp_get_wtime();
    unsigned long* integral_image_parallel = integralImageMP(image, height, width); //parallel
    double end_omp = omp_get_wtime();
    double time_tot = end_omp - start_omp;
    std::cout << time_tot << std::endl;
    start_omp = omp_get_wtime();
    unsigned long* integral_image_serial = integralImage(image, height, width); //sequential
    end_omp = omp_get_wtime();
    time_tot = end_omp - start_omp;
    std::cout << time_tot << std::endl;
    return 0;
}
Each thread works on a block of rows (an illustration of what each thread does may be useful):
[Figure: per-thread work on a block of rows, RowSum followed by ColumnSum]
Here ColumnSum is done by transposing the matrix and repeating RowSum.

Let me first say that the results are a bit surprising to me; my guess is that the problem lies in the non-local memory access required by the transpose algorithm.
You can mitigate it anyway by turning your sequential algorithm into a parallel one with a two-pass approach. In the first pass, each of T threads calculates the 2D integral of its own block of N consecutive rows, starting from zero. The second pass must compensate for the fact that each block started from zero rather than from the accumulated result of the row just above it.
An example with Matlab shows the principle in 2D.
f=fix(rand(12,8)*8) % A random matrix with 12 rows, 8 columns
5 6 1 4 7 5 4 4
4 6 0 7 1 3 2 0
7 0 2 3 0 1 6 3
5 3 1 7 4 3 7 2
6 4 3 2 7 3 5 1
3 3 2 5 5 0 2 1
3 5 7 5 1 4 4 3
6 5 7 4 2 1 0 0
0 2 0 5 3 3 7 4
1 3 5 5 7 4 7 3
1 0 2 1 1 2 6 5
3 7 3 1 6 2 2 5
ff=cumsum(cumsum(f')') % The Summed Area Table
5 11 12 16 23 28 32 36
9 21 22 33 41 49 55 59
16 28 31 45 53 62 74 81
21 36 40 61 73 85 104 113
27 46 53 76 95 110 134 144
30 52 61 89 113 128 154 165
33 60 76 109 134 153 183 197
39 71 94 131 158 178 208 222
39 73 96 138 168 191 228 246
40 77 105 152 189 216 260 281
41 78 108 156 194 223 273 299
44 88 121 170 214 245 297 328
fx=[cumsum(cumsum(f(1:4,:)')'); % The original table summed in
cumsum(cumsum(f(5:8,:)')'); % three parts -- 4 rows per each
cumsum(cumsum(f(9:12,:)')')] % "thread"
5 11 12 16 23 28 32 36
9 21 22 33 41 49 55 59
16 28 31 45 53 62 74 81
21 36 40 61 73 85 104 113 %% Notice this row #4
6 10 13 15 22 25 30 31
9 16 21 28 40 43 50 52
12 24 36 48 61 68 79 84
18 35 54 70 85 93 104 109 %% Notice this row #8
0 2 2 7 10 13 20 24
1 6 11 21 31 38 52 59
2 7 14 25 36 45 65 77
5 17 27 39 56 67 89 106
fx(4,:) + fx(8,:) %% this is the SUM of row #4 and row #8
39 71 94 131 158 178 208 222
%% and finally -- what is the difference of the piecewise
%% calculated result and the real result?
ff-fx
0 0 0 0 0 0 0 0 %% look !! the first block
0 0 0 0 0 0 0 0 %% is already correct
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
21 36 40 61 73 85 104 113 %% All these rows in this
21 36 40 61 73 85 104 113 %% block are short by
21 36 40 61 73 85 104 113 %% the row #4 above
21 36 40 61 73 85 104 113 %%
39 71 94 131 158 178 208 222 %% and all these rows
39 71 94 131 158 178 208 222 %% in this block are short
39 71 94 131 158 178 208 222 %% by the SUM of the rows
39 71 94 131 158 178 208 222 %% #4 and #8 above
Fortunately one can start integrating the block 2, i.e. rows 2N..3N-1 before the block #1 has been compensated -- one just has to calculate the offset, which is a relatively small sequential task.
acc_for_block_2 = row[2*N-1] + row[N-1];
acc_for_block_3 = acc_for_block_2 + row[3*N-1];
..
acc_for_block_T-1 = acc_for_block_(T-2) + row[N*(T-1)-1];
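The two-pass idea above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions, not a tuned implementation: the names integralSeq, integralBlocks, idx and Table are made up for this sketch, the block count is passed in by hand, and whether it actually beats the sequential version depends on the machine and on compiling with optimization and -fopenmp. Without -fopenmp the pragmas are ignored and the code runs sequentially with the same result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Table = std::vector<std::uint64_t>;  // hypothetical alias for this sketch

// Flat index of row i, column j in a row-major array with m columns.
static std::size_t idx(int i, int j, int m) {
    return static_cast<std::size_t>(i) * m + j;
}

// Sequential reference: single pass, running row sum plus the entry above.
Table integralSeq(const std::vector<std::uint8_t>& x, int n, int m) {
    assert(x.size() == static_cast<std::size_t>(n) * m);
    Table out(x.size());
    for (int i = 0; i < n; ++i) {
        std::uint64_t rowSum = 0;
        for (int j = 0; j < m; ++j) {
            rowSum += x[idx(i, j, m)];
            out[idx(i, j, m)] = rowSum + (i > 0 ? out[idx(i - 1, j, m)] : 0);
        }
    }
    return out;
}

// Two-pass version: blocks of rows are integrated independently, then fixed up.
Table integralBlocks(const std::vector<std::uint8_t>& x, int n, int m, int blocks) {
    assert(x.size() == static_cast<std::size_t>(n) * m);
    Table out(x.size());
    if (n == 0 || m == 0) return out;
    blocks = std::max(1, std::min(blocks, n));
    const int rowsPer = (n + blocks - 1) / blocks;  // N in the text above
    blocks = (n + rowsPer - 1) / rowsPer;           // drop blocks that would be empty

    // Pass 1 (parallel): each block integrates its own rows, starting from zero.
    #pragma omp parallel for
    for (int b = 0; b < blocks; ++b) {
        const int first = b * rowsPer;
        const int last = std::min(n, first + rowsPer);
        for (int i = first; i < last; ++i) {
            std::uint64_t rowSum = 0;
            for (int j = 0; j < m; ++j) {
                rowSum += x[idx(i, j, m)];
                out[idx(i, j, m)] = rowSum + (i > first ? out[idx(i - 1, j, m)] : 0);
            }
        }
    }

    // Small sequential step: offset of block b is the sum of the (still
    // uncompensated) last rows of all blocks before it.
    Table offset(static_cast<std::size_t>(blocks) * m, 0);
    for (int b = 1; b < blocks; ++b) {
        const int prevLast = b * rowsPer - 1;  // last row of block b-1
        for (int j = 0; j < m; ++j)
            offset[idx(b, j, m)] = offset[idx(b - 1, j, m)] + out[idx(prevLast, j, m)];
    }

    // Pass 2 (parallel): add each block's offset to all of its rows.
    #pragma omp parallel for
    for (int i = rowsPer; i < n; ++i) {
        const int b = i / rowsPer;
        for (int j = 0; j < m; ++j)
            out[idx(i, j, m)] += offset[idx(b, j, m)];
    }
    return out;
}
```

Calling integralBlocks(image, 12, 8, 3) corresponds to the Matlab example: three blocks of four rows each, with the second and third blocks corrected by the accumulated last rows of the blocks above them. The offsets are taken before any row is compensated, which is why they have to be accumulated from one block to the next.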

Related

Number pattern in C++

I want to print a number pattern:
1
2 4
3 6 9
4 8 12 16
5 10 15 20 25
...
10 . .. . .
This is my code:
#include <iostream>
#include <conio.h>
using namespace std;
int main()
{
    int cols, row, num=1;
    for(row=1; row<=10; row++)
    {
        for(cols = row; cols <= row*10; cols = cols + row)
        {
            cout << cols << " ";
        }
        cout << "\n";
    }
    return 0;
    getch();
}
But it gives me the output:
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100

To make that pattern you only really need the number of rows. The number of columns in each row is equal to the row number, and the values are integer multiples of that row number.
#include <iostream>
int main()
{
    const int rows = 10;
    for (int row = 1; row <= rows; ++row)
    {
        for (int col = 1; col <= row; ++col)
        {
            int value = row * col;
            std::cout << value << ' ';
        }
        std::cout << '\n';
    }
}
Output
1
2 4
3 6 9
4 8 12 16
5 10 15 20 25
6 12 18 24 30 36
7 14 21 28 35 42 49
8 16 24 32 40 48 56 64
9 18 27 36 45 54 63 72 81
10 20 30 40 50 60 70 80 90 100

Number of combinations with C-pair of N elements

I have N buckets. Each bucket can contain 0 or 1. C is the number of 1s that must appear consecutively (e.g. if C=3, I must have 111 somewhere).
E.g. for N=5 and C=2, the total number of combinations is 19 (here C=2, so there must always be at least two ones in a row, 11):
And this is the calculation for the first 20 values of N and C (I marked the case above in yellow):
How do I get to a formula that depends on C and N?

This Python program
import scipy.special
import fractions
def bi(n, m):
    return scipy.special.comb(n, m, exact=True)
def fr(*args):
    return fractions.Fraction(*args)
def f(N, k):
    N = fr(N)
    k = fr(k)
    s = 1
    m = 0
    while m <= k - 1:
        if m % k == N % k:
            x = (N - m)/k
            s -= bi(m, x) * (-1)**x * 2**(-(k + 1)*x)
        m += 1
    while m <= N:
        if m % k == N % k:
            x = (N - m)/k
            s -= (bi(m, x) - fr(1, 2)**k * bi(m - k, x) ) * (-1)**x * 2**(-(k + 1)*x)
        m += 1
    return(s * 2**N)
for N in range(1, 20):
    for C in range(1, N + 1):
        print("%6.d" % f(N, C), end = ' ')
    print()
Outputs:
1
3 1
7 3 1
15 8 3 1
31 19 8 3 1
63 43 20 8 3 1
127 94 47 20 8 3 1
255 201 107 48 20 8 3 1
511 423 238 111 48 20 8 3 1
1023 880 520 251 112 48 20 8 3 1
2047 1815 1121 558 255 112 48 20 8 3 1
4095 3719 2391 1224 571 256 112 48 20 8 3 1
8191 7582 5056 2656 1262 575 256 112 48 20 8 3 1
16383 15397 10616 5713 2760 1275 576 256 112 48 20 8 3 1
32767 31171 22159 12199 5984 2798 1279 576 256 112 48 20 8 3 1
65535 62952 46023 25888 12880 6088 2811 1280 576 256 112 48 20 8 3 1
131071 126891 95182 54648 27553 13152 6126 2815 1280 576 256 112 48 20 8 3 1
262143 255379 196132 114832 58631 28240 13256 6139 2816 1280 576 256 112 48 20 8 3 1
524287 513342 402873 240335 124192 60320 28512 13294 6143 2816 1280 576 256 112 48 20 8 3 1
The formula is from Markus Scheuer.
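As a cross-check of the table above, the same counts can be obtained without the closed formula, by counting the complement: strings that contain no run of C ones. This is a different technique from the formula used in the program above (a plain recurrence, not Markus Scheuer's expression), and the function name countWithRun is made up for this sketch. It assumes C >= 1 and 0 <= N <= 62 so that 2^N fits in 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of 0/1 strings of length N containing at least C consecutive ones.
// Hypothetical helper; assumes C >= 1 and 0 <= N <= 62.
std::uint64_t countWithRun(int N, int C) {
    assert(C >= 1 && N >= 0 && N <= 62);
    // avoid[n] = number of strings of length n with NO run of C ones.
    std::vector<std::uint64_t> avoid(N + 1);
    for (int n = 0; n <= N; ++n) {
        if (n < C) {
            // Too short to contain a run of C ones: every string qualifies.
            avoid[n] = std::uint64_t{1} << n;
            continue;
        }
        // A valid string starts with i-1 ones followed by a zero (1 <= i <= C),
        // and the remaining n-i positions form a shorter valid string.
        std::uint64_t sum = 0;
        for (int i = 1; i <= C; ++i) sum += avoid[n - i];
        avoid[n] = sum;
    }
    return (std::uint64_t{1} << N) - avoid[N];
}
```

For N=5 and C=2 the complement count is 13 (a Fibonacci number), so countWithRun(5, 2) gives 32 - 13 = 19, which agrees with the value stated in the question and with the fifth row of the output.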

Why are my bit shifts giving incorrect numbers

I'm trying to store a number in an array of 4 integers. The array is in the class Num. My problem is that when I call getValue, the function returns numbers that aren't correct. I tried going through the program on paper, doing all the calculations in Microsoft's calculator, and the program should give the correct output. I don't even know which function could be problematic, since there aren't any errors or warnings and both worked on paper.
21 in binary: 10101
What I'm trying to do:
Input to setValue function: 21
setValue puts the first four bits of 21 (0101) into num[3], so num[3] is now 0101 in binary. Then it should put the next four bits of 21 into num[2]. The next four bits are 0001, so 0001 goes into num[2]. The rest of the bits are 0, so we ignore them. Now num is {0,0,1,5}. getValue first goes to num[3]. There is 5 there, which is 0101 in binary, so it puts that into the first four bits of the return value. It then puts 0001 into the next four bits. The rest of the numbers are 0, so it is supposed to ignore them. Then the output of getValue is printed directly. The actual output is at the bottom.
My code:
#include <iostream>
class Num {
    char len = 4;
    int num[4];
public:
    void setValue(int);
    int getValue();
};
void Num::setValue(int toSet)
{
    char len1=len-1;
    for (int counter = len1;counter>=0;counter--)
    {
        if(toSet&(0xF<<(len1-counter))!=0)
        {
            num[counter]=(toSet&(0xF<<(len1-counter)))>>len1-counter;
        } else {
            break;
        }
    }
}
int Num::getValue()
{
    char len1 = len-1;
    int returnValue = 0;
    for(char counter = len1; counter>=0;counter--)
    {
        if (num[counter]!=0) {
            returnValue+=(num[counter]<<(len1-counter));
        } else {
            break;
        }
    }
    return returnValue;
}
int main()
{
    int x=260;
    Num number;
    while (x>0)
    {
        number.setValue(x);
        std::cout<<x<<"Test: "<<number.getValue()<<std::endl;
        x--;
    }
    std::cin>>x;
    return 0;
}
Output:
260Test: -1748023676
259Test: 5
258Test: 5
257Test: 1
256Test: 1
255Test: 225
254Test: 225
253Test: 221
252Test: 221
251Test: 213
250Test: 213
249Test: 209
248Test: 209
247Test: 193
246Test: 193
245Test: 189
244Test: 189
243Test: 181
242Test: 181
241Test: 177
240Test: 177
239Test: 177
238Test: 177
237Test: 173
236Test: 173
235Test: 165
234Test: 165
233Test: 161
232Test: 161
231Test: 145
230Test: 145
229Test: 141
228Test: 141
227Test: 133
226Test: 133
225Test: 1
224Test: 1
223Test: 161
222Test: 161
221Test: 157
220Test: 157
219Test: 149
218Test: 149
217Test: 145
216Test: 145
215Test: 129
214Test: 129
213Test: 125
212Test: 125
211Test: 117
210Test: 117
209Test: 113
208Test: 113
207Test: 113
206Test: 113
205Test: 109
204Test: 109
203Test: 101
202Test: 101
201Test: 97
200Test: 97
199Test: 81
198Test: 81
197Test: 77
196Test: 77
195Test: 5
194Test: 5
193Test: 1
192Test: 1
191Test: 161
190Test: 161
189Test: 157
188Test: 157
187Test: 149
186Test: 149
185Test: 145
184Test: 145
183Test: 129
182Test: 129
181Test: 125
180Test: 125
179Test: 117
178Test: 117
177Test: 113
176Test: 113
175Test: 113
174Test: 113
173Test: 109
172Test: 109
171Test: 101
170Test: 101
169Test: 97
168Test: 97
167Test: 81
166Test: 81
165Test: 77
164Test: 77
163Test: 69
162Test: 69
161Test: 1
160Test: 1
159Test: 97
158Test: 97
157Test: 93
156Test: 93
155Test: 85
154Test: 85
153Test: 81
152Test: 81
151Test: 65
150Test: 65
149Test: 61
148Test: 61
147Test: 53
146Test: 53
145Test: 49
144Test: 49
143Test: 49
142Test: 49
141Test: 45
140Test: 45
139Test: 37
138Test: 37
137Test: 33
136Test: 33
135Test: 17
134Test: 17
133Test: 13
132Test: 13
131Test: 5
130Test: 5
129Test: 1
128Test: 1
127Test: 225
126Test: 225
125Test: 221
124Test: 221
123Test: 213
122Test: 213
121Test: 209
120Test: 209
119Test: 193
118Test: 193
117Test: 189
116Test: 189
115Test: 181
114Test: 181
113Test: 177
112Test: 177
111Test: 177
110Test: 177
109Test: 173
108Test: 173
107Test: 165
106Test: 165
105Test: 161
104Test: 161
103Test: 145
102Test: 145
101Test: 141
100Test: 141
99Test: 133
98Test: 133
97Test: 1
96Test: 1
95Test: 161
94Test: 161
93Test: 157
92Test: 157
91Test: 149
90Test: 149
89Test: 145
88Test: 145
87Test: 129
86Test: 129
85Test: 125
84Test: 125
83Test: 117
82Test: 117
81Test: 113
80Test: 113
79Test: 113
78Test: 113
77Test: 109
76Test: 109
75Test: 101
74Test: 101
73Test: 97
72Test: 97
71Test: 81
70Test: 81
69Test: 77
68Test: 77
67Test: 5
66Test: 5
65Test: 1
64Test: 1
63Test: 161
62Test: 161
61Test: 157
60Test: 157
59Test: 149
58Test: 149
57Test: 145
56Test: 145
55Test: 129
54Test: 129
53Test: 125
52Test: 125
51Test: 117
50Test: 117
49Test: 113
48Test: 113
47Test: 113
46Test: 113
45Test: 109
44Test: 109
43Test: 101
42Test: 101
41Test: 97
40Test: 97
39Test: 81
38Test: 81
37Test: 77
36Test: 77
35Test: 69
34Test: 69
33Test: 1
32Test: 1
31Test: 97
30Test: 97
29Test: 93
28Test: 93
27Test: 85
26Test: 85
25Test: 81
24Test: 81
23Test: 65
22Test: 65
21Test: 61
20Test: 61
19Test: 53
18Test: 53
17Test: 49
16Test: 49
15Test: 49
14Test: 49
13Test: 45
12Test: 45
11Test: 37
10Test: 37
9Test: 33
8Test: 33
7Test: 17
6Test: 17
5Test: 13
4Test: 13
3Test: 5
2Test: 5
1Test: 1
I compiled this with g++ 6.3.0 with the command g++ a.cpp -o a.exe

When compiling with -Wall, there are a number of warnings:
orig.cpp: In member function ‘void Num::setValue(int)’:
orig.cpp:15:39: warning: suggest parentheses around comparison in operand of ‘&’ [-Wparentheses]
if(toSet&(0xF<<(len1-counter))!=0)
~~~~~~~~~~~~~~~~~~~~~^~~
orig.cpp:17:61: warning: suggest parentheses around ‘-’ inside ‘>>’ [-Wparentheses]
num[counter]=(toSet&(0xF<<(len1-counter)))>>len1-counter;
~~~~^~~~~~~~
orig.cpp: In member function ‘int Num::getValue()’:
orig.cpp:30:24: warning: array subscript has type ‘char’ [-Wchar-subscripts]
if (num[counter]!=0) {
^
orig.cpp:31:38: warning: array subscript has type ‘char’ [-Wchar-subscripts]
returnValue+=(num[counter]<<(len1-counter));
^
If you were to print the values of num before changing them, you'd see that some might be non-zero (i.e. they are uninitialized), which causes undefined behavior and probably breaks your for loops in getValue and setValue.
So change:
int num[4];
Into:
int num[4] = { 0 };
Here's a cleaned up version with the warnings fixed:
#include <iostream>
class Num {
    int len = 4;
    int num[4] = { 0 };
public:
    void setValue(int);
    int getValue();
    void showval();
};
void Num::setValue(int toSet)
{
    int len1=len-1;
    for (int counter = len1;counter>=0;counter--)
    {
        if ((toSet & (0xF << (len1-counter))) != 0)
        {
            num[counter] = (toSet & (0xF << (len1-counter))) >> (len1-counter);
        } else {
            break;
        }
    }
}
int Num::getValue()
{
    int len1 = len-1;
    int returnValue = 0;
    for(int counter = len1; counter>=0;counter--)
    {
        if (num[counter]!=0) {
            returnValue+=(num[counter]<<(len1-counter));
        } else {
            break;
        }
    }
    return returnValue;
}
void Num::showval()
{
    for (int i = 0; i < len; ++i)
        std::cout << i << ": show: " << num[i] << "\n";
#if 0
    for (int i = 0; i < len; ++i)
        num[i] = 0;
#endif
}
int main()
{
    int x=260;
    Num number;
    number.showval();
    while (x>0)
    {
        number.setValue(x);
        std::cout << x << " Test: " << number.getValue() << std::endl;
        x--;
    }
    std::cin>>x;
    return 0;
}
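The first -Wparentheses warning quoted above is more than a style hint: in C++ the comparison operator != binds more tightly than bitwise &, so the original condition did not test what it appears to test. A minimal sketch of the difference follows; the function names asParsed, asIntended and checkPrecedence are made up for illustration.

```cpp
#include <cassert>

// How the compiler reads `x & mask != 0`: the comparison is evaluated first,
// giving 0 or 1, and only then is that ANDed with x.
int asParsed(int x, int mask) { return x & (mask != 0); }

// What was meant: apply the mask first, then compare the result with zero.
bool asIntended(int x, int mask) { return (x & mask) != 0; }

// A couple of concrete cases where the two readings disagree.
void checkPrecedence() {
    // 0x10 has a bit inside the mask 0xF0, but its lowest bit is clear.
    assert(asParsed(0x10, 0xF0) == 0);
    assert(asIntended(0x10, 0xF0) == true);
    // 0x01 has no bit inside the mask, but its lowest bit is set.
    assert(asParsed(0x01, 0xF0) == 1);
    assert(asIntended(0x01, 0xF0) == false);
}
```

With a non-zero mask, the unparenthesized form only ever looks at the lowest bit of x, which is why the cleaned-up version adds the extra parentheses.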

To break a number into nibbles, the shift counts should be multiples of 4. Otherwise slices of 4 bits are extracted that don't line up.
00010101 (21)
    ^^^^ first nibble
^^^^     second nibble
The second nibble is displaced by 4 bits so it needs to be shifted right by 4, not by 1.
You could multiply your shift counts by 4, but there is an easier way: only ever shift by 4. For example:
for (int i = len - 1; i >= 0; i--) {
    num[i] = toSet & 0xF;
    toSet >>= 4;
}
Then every iteration extracts the lowest nibble in toSet, and shifts toSet over so that the next nibble becomes the lowest nibble.
I didn't put in a break and there should not be one. It definitely shouldn't be the kind of break that you had, which stops the loop also whenever a number has a zero in the middle of it (for example in 0x101 the middle 0 causes the loop to stop). The loop also should not stop when the entire rest of the number is zero, since that leaves junk in the other entries of num.
It's more common to store the lowest nibble in the 0th element and so on (then you don't have to deal with all the "reverse logic" with down-counting loops and subtracting things from the length) but that's up to you.
Extracting the value can be done symmetrically, building up the result while shifting it, instead of shifting every piece into its final place immediately. Or just multiply (len1-counter) by 4. While extracting the value, you also cannot stop when num[i] is zero, since that does not prove that the rest of the number is zero too.
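Putting the points above together, a complete round trip could look like the sketch below. It keeps the original layout (most significant nibble in num[0]), always shifts by 4, and has no early break in either direction. The nibble accessor is a hypothetical addition for inspection, and the sketch assumes non-negative inputs that fit in 16 bits (four nibbles) and a C++11 compiler.

```cpp
#include <cassert>

class Num {
    static constexpr int len = 4;  // four nibbles, i.e. 16 bits
    int num[len] = {0};
public:
    // Store the lowest 16 bits of toSet, one nibble per element.
    // Assumes toSet is non-negative.
    void setValue(int toSet) {
        assert(toSet >= 0);
        for (int i = len - 1; i >= 0; --i) {
            num[i] = toSet & 0xF;  // lowest nibble
            toSet >>= 4;           // next nibble becomes the lowest one
        }
    }
    // Rebuild the value symmetrically: shift the result, then add a nibble.
    int getValue() const {
        int value = 0;
        for (int i = 0; i < len; ++i)
            value = (value << 4) | num[i];
        return value;
    }
    // Hypothetical accessor, only here so the stored nibbles can be checked.
    int nibble(int i) const { return num[i]; }
};
```

After setValue(21) the array holds {0, 0, 1, 5}, as the question expected, and getValue() returns 21 again. A value such as 0x101, with a zero nibble in the middle, also survives the round trip because neither loop stops early.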

SSE - mismatch between _mm_extract_ps and direct access

The following piece of code:
__m128 var1;
float *a = (float*)malloc(50*sizeof(float));
float *ptr = a;
//Initialise a with some values
for(int i = 0; i < 50; i++)
    *(a+i) = i;
//print those values
for(int i = 0; i < 50; i+=4,ptr+=4)
{
    var1 = _mm_loadu_ps(ptr);
    cout<<(*ptr)<<" "<<var1[0]<<" "<<_mm_extract_ps(var1, 0)<<endl;
    cout<<(*ptr+1)<<" "<<var1[1]<<" "<<_mm_extract_ps(var1, 1)<<endl;
    cout<<(*ptr+2)<<" "<<var1[2]<<" "<<_mm_extract_ps(var1, 2)<<endl;
    cout<<(*ptr+3)<<" "<<var1[3]<<" "<<_mm_extract_ps(var1, 3)<<endl;
}
returns this output:
0 0 0
1 1 1065353216
2 2 1073741824
3 3 1077936128
4 4 1082130432
5 5 1084227584
6 6 1086324736
7 7 1088421888
8 8 1090519040
9 9 1091567616
10 10 1092616192
11 11 1093664768
12 12 1094713344
13 13 1095761920
14 14 1096810496
15 15 1097859072
16 16 1098907648
17 17 1099431936
18 18 1099956224
19 19 1100480512
20 20 1101004800
21 21 1101529088
22 22 1102053376
23 23 1102577664
24 24 1103101952
25 25 1103626240
26 26 1104150528
27 27 1104674816
28 28 1105199104
29 29 1105723392
30 30 1106247680
31 31 1106771968
32 32 1107296256
33 33 1107558400
34 34 1107820544
35 35 1108082688
36 36 1108344832
37 37 1108606976
38 38 1108869120
39 39 1109131264
40 40 1109393408
41 41 1109655552
42 42 1109917696
43 43 1110179840
44 44 1110441984
45 45 1110704128
46 46 1110966272
47 47 1111228416
48 48 1111490560
49 49 1111752704
1.45875e-42 1.45875e-42 1041
0 0 0
My question is: isn't _mm_extract_ps the right way of accessing the contents of an __m128 variable? Why does it print values that don't match the actual values, whereas var1[0] prints the correct values? As far as I know, accessing the fields of an __m128 variable using var1[0] is incorrect and may lead to problems. What exactly is the right approach when I need to debug my code?

The type of a is pointer to float. When the float 1.0f is written to memory, its bit pattern is 0x3F800000 in hex, which is 1 065 353 216 in decimal, so the printed value is valid: _mm_extract_ps returns an int holding the raw bits of the selected element, and cout prints that int. Likewise, the bit pattern of 2.0f is 0x40000000, which is 1 073 741 824 in decimal. In other words, you printed the bit pattern of each float as a decimal integer.
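The numbers in the third column can be reproduced without any SSE code at all, which may make the explanation above easier to verify. The following is a portable sketch; floatBits, bitsToFloat and checkKnownPatterns are made-up helper names, and it assumes float is a 32-bit IEEE-754 type, as it is on the x86 machines the question targets.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Return the raw bit pattern of a float as an unsigned integer.
std::uint32_t floatBits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined way to reinterpret bits
    return bits;
}

// Inverse operation: turn a raw bit pattern back into the float it encodes.
float bitsToFloat(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// The two patterns mentioned in the answer above.
void checkKnownPatterns() {
    assert(floatBits(1.0f) == 0x3F800000u);  // 1 065 353 216 in decimal
    assert(floatBits(2.0f) == 0x40000000u);  // 1 073 741 824 in decimal
}
```

So an int obtained from _mm_extract_ps can be turned back into a readable float by copying its bits, as bitsToFloat does. Another option when debugging is to store the whole register into a float array with _mm_storeu_ps and print the array elements.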

Next higher number with one zero bit

Today I ran into this problem and couldn't solve it, even after spending some time on it. I need some help.
I have a number N. The problem is to find the next higher number ( > N ) with exactly one zero bit in its binary representation.
Example:
Number 1 can be represented in binary as 1.
Next higher number with only one zero bit is 2 - Binary 10
A few other examples:
N = 2 (10), next higher number with one zero bit is 5 (101)
N = 5 (101), next higher number is 6 (110)
N = 7 (111), next higher number is 11 (1011)
List of the first 200 numbers (those with exactly one zero bit are marked with "- 1"):
1 1
2 10 - 1
3 11
4 100
5 101 - 1
6 110 - 1
7 111
8 1000
9 1001
10 1010
11 1011 - 1
12 1100
13 1101 - 1
14 1110 - 1
15 1111
16 10000
17 10001
18 10010
19 10011
20 10100
21 10101
22 10110
23 10111 - 1
24 11000
25 11001
26 11010
27 11011 - 1
28 11100
29 11101 - 1
30 11110 - 1
31 11111
32 100000
33 100001
34 100010
35 100011
36 100100
37 100101
38 100110
39 100111
40 101000
41 101001
42 101010
43 101011
44 101100
45 101101
46 101110
47 101111 - 1
48 110000
49 110001
50 110010
51 110011
52 110100
53 110101
54 110110
55 110111 - 1
56 111000
57 111001
58 111010
59 111011 - 1
60 111100
61 111101 - 1
62 111110 - 1
63 111111
64 1000000
65 1000001
66 1000010
67 1000011
68 1000100
69 1000101
70 1000110
71 1000111
72 1001000
73 1001001
74 1001010
75 1001011
76 1001100
77 1001101
78 1001110
79 1001111
80 1010000
81 1010001
82 1010010
83 1010011
84 1010100
85 1010101
86 1010110
87 1010111
88 1011000
89 1011001
90 1011010
91 1011011
92 1011100
93 1011101
94 1011110
95 1011111 - 1
96 1100000
97 1100001
98 1100010
99 1100011
100 1100100
101 1100101
102 1100110
103 1100111
104 1101000
105 1101001
106 1101010
107 1101011
108 1101100
109 1101101
110 1101110
111 1101111 - 1
112 1110000
113 1110001
114 1110010
115 1110011
116 1110100
117 1110101
118 1110110
119 1110111 - 1
120 1111000
121 1111001
122 1111010
123 1111011 - 1
124 1111100
125 1111101 - 1
126 1111110 - 1
127 1111111
128 10000000
129 10000001
130 10000010
131 10000011
132 10000100
133 10000101
134 10000110
135 10000111
136 10001000
137 10001001
138 10001010
139 10001011
140 10001100
141 10001101
142 10001110
143 10001111
144 10010000
145 10010001
146 10010010
147 10010011
148 10010100
149 10010101
150 10010110
151 10010111
152 10011000
153 10011001
154 10011010
155 10011011
156 10011100
157 10011101
158 10011110
159 10011111
160 10100000
161 10100001
162 10100010
163 10100011
164 10100100
165 10100101
166 10100110
167 10100111
168 10101000
169 10101001
170 10101010
171 10101011
172 10101100
173 10101101
174 10101110
175 10101111
176 10110000
177 10110001
178 10110010
179 10110011
180 10110100
181 10110101
182 10110110
183 10110111
184 10111000
185 10111001
186 10111010
187 10111011
188 10111100
189 10111101
190 10111110
191 10111111 - 1
192 11000000
193 11000001
194 11000010
195 11000011
196 11000100
197 11000101
198 11000110
199 11000111
200 11001000

There are three cases.
1. The number x has more than one zero bit in its binary representation. All but one of these zero bits must be "filled in" with 1 to obtain the required result. Notice that all numbers obtained by taking x and filling in one or more of its low-order zero bits are numerically closer to x than the number obtained by filling just the topmost zero bit. Therefore the answer is the number x with all but one of its zero bits filled: only its topmost zero bit remains unfilled. For example, if x=110101001 then the answer is 110111111. To get the answer, find the index i of the topmost zero bit of x, and then calculate the bitwise OR of x and 2^i - 1.
C code for this case:
// warning: this assumes x is known to have *some* (>1) zeros!
unsigned next(unsigned x)
{
    unsigned topmostzero = 0;
    unsigned bit = 1;
    while (bit && bit <= x) {
        if (!(x & bit)) topmostzero = bit;
        bit <<= 1;
    }
    return x | (topmostzero - 1);
}
2. The number x has no zero bits in binary. It means that x=2^n - 1 for some number n. By the same reasoning as above, the answer is then 2^n + 2^(n-1) - 1. For example, if x=111, then the answer is 1011.
3. The number x has exactly one zero bit in its binary representation. We know that the result must be strictly larger than x, so x itself is not allowed to be the answer. If the only zero of x is in its least-significant bit, then this case reduces to case #2. Otherwise, the zero should be moved one position to the right. Assuming x has its zero in the i-th bit, the answer should have its zero in the (i-1)-th bit. For example, if x=11011, then the result is 11101.
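The three cases above can be combined into a single function, sketched below. The names nextOneZero and hasExactlyOneZero are made up for this illustration, and the sketch assumes x >= 1 and that the result still fits in an unsigned (so x must not use the topmost bit of the type). Zero bits are only counted below the highest set bit, as in the question.

```cpp
#include <cassert>

// True if the binary representation of y (without leading zeros) has exactly
// one zero bit. Used here only to cross-check the case analysis.
bool hasExactlyOneZero(unsigned y) {
    int zeros = 0;
    for (; y != 0; y >>= 1)
        if ((y & 1u) == 0) ++zeros;
    return zeros == 1;
}

// Next number greater than x with exactly one zero bit.
// Assumes x >= 1 and that the answer fits in an unsigned.
unsigned nextOneZero(unsigned x) {
    assert(x >= 1);
    int zeros = 0;
    unsigned topZero = 0;  // value (not index) of the topmost zero bit
    unsigned bit = 1;
    while (bit && bit <= x) {
        if (!(x & bit)) { ++zeros; topZero = bit; }
        bit <<= 1;
    }
    // Here bit is the first power of two above x, i.e. 2^n for an n-bit x.

    // Case 1: several zeros. Fill everything below the topmost zero.
    if (zeros >= 2) return x | (topZero - 1);

    // Case 2 (no zeros), and the part of case 3 that reduces to it
    // (single zero in the least-significant bit): 2^n + 2^(n-1) - 1.
    if (zeros == 0 || topZero == 1) return bit + (bit >> 1) - 1;

    // Case 3: single zero above bit 0. Move it one position to the right.
    return (x | topZero) & ~(topZero >> 1);
}
```

For the examples from the question this gives nextOneZero(1) = 2, nextOneZero(2) = 5, nextOneZero(5) = 6 and nextOneZero(7) = 11.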

You could also use another approach:
Every number with exactly one zero bit can be represented as
2^n - 1 - 2^m, with 0 <= m <= n-2
Now the task is easy:
1. Find an n large enough that at least 2^n-1-2^0 > x holds, which is equivalent to 2^n > x+2.
2. Find the greatest m for which 2^n-1-2^m is still greater than x.
As code:
#include <iostream>
#include <cstdio>
#include <math.h>
using namespace std;
//binary representation
void bin(unsigned n)
{
    for (int i = floor(log2(n));i >= 0;--i)
        (n & (1<<i))? printf("1"): printf("0");
}
//outputs the next greater int to x with exactly one 0 in binary representation
int nextHigherOneZero(int x)
{
    unsigned int n=0;
    while((1<<n)<= x+2 ) ++n;
    unsigned int m=0;
    while((1<<n)-1-(1<<(m+1)) > x && m<n-2)
        ++m;
    return (1<<n)-1-(1<<m);
}
int main()
{
    int r=0;
    for(int i = 1; i<100;++i){
        r=nextHigherOneZero(i);
        printf("\nX: %i=",i);
        bin(i);
        printf(";\tnextHigherOneZero(x):%i=",r);
        bin(r);
        printf("\n");
    }
    return 0;
}
You can try it here (with some additional debug output):
http://ideone.com/6w3fAN
As a note: it's probably possible to get m and n faster with some good binary logic, feel free to contribute...
Pros of this approach:
No assumptions need to be made
Cons:
Ugly while loops

Couldn't miss the opportunity to remember binary logic :), here's my solution.
Here's main:
#include <iostream>
bool oneZero(int i); // defined below
int main(int argc, char** argv)
{
    int i = 139261;
    i++;
    while (!oneZero(i))
    {
        i++;
    }
    std::cout << i;
}
And here's all the logic to find whether a number has exactly one zero bit:
bool oneZero(int i)
{
    int count = 0;
    while (i != 0)
    {
        // check last bit if it is zero
        if ((1 & i) == 0) {
            count++;
            if (count > 1) return false;
        }
        // make the number shorter :)
        i = i >> 1;
    }
    return (count == 1);
}
