I want to compare two small (<=20) sets of integers (1..20) lexicographically.
The sets are represented by single integers, e.g.
1, 2, 4, 6
will be represented as
... 0 1 0 1 0 1 1
(... 7 6 5 4 3 2 1)
So where there's a 1 the number is present in the set.
Could someone verify if this code is correct?
bool less_than(unsigned a, unsigned b) {
unsigned tmp = a ^ b;
tmp = tmp & (~tmp + 1); //first difference isolated
return (tmp & a) && (__builtin_clz(b) < __builtin_clz(tmp));
}
The __builtin_clz part is for the case when b is a prefix of a.
The case of an empty set is handled elsewhere (__builtin_clz is undefined for 0).
EDIT:
bool less_than(unsigned a, unsigned b) {
unsigned tmp = a ^ b;
tmp &= -tmp; //first difference isolated
return ((tmp & a) && (__builtin_clz(b) < __builtin_clz(tmp)))
|| (__builtin_clz(a) > __builtin_clz(tmp));
}
and
bool less_than_better(unsigned a, unsigned b) {
unsigned tmp = a ^ b;
tmp &= -tmp; //first difference isolated
return ((tmp & a) && tmp < b) || tmp > a;
}
appear to be both correct.
(Tested versus a naive implementation using std::lexicographical_compare on tens of millions of randomized tests)
The second one is more portable though since it doesn't use __builtin_clz.
The difference in speed on my machine is negligible (the second one being ~2% faster), however on machines without __builtin_clz as one processor instruction (e.g. BSR on x86) the difference will probably be huge.
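For reference, the naive comparator looked something along these lines (a sketch of that kind of reference implementation, not necessarily the exact one used; bit (i-1) represents element i, as described above):

#include <algorithm>
#include <vector>

// Expand each bitmask into its sorted elements, then defer to
// std::lexicographical_compare.
static std::vector<int> elements(unsigned set) {
    std::vector<int> v;
    for (int i = 1; i <= 20; ++i)
        if (set & (1u << (i - 1)))
            v.push_back(i);
    return v;
}

static bool less_than_naive(unsigned a, unsigned b) {
    std::vector<int> va = elements(a), vb = elements(b);
    return std::lexicographical_compare(va.begin(), va.end(),
                                        vb.begin(), vb.end());
}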
It's not correct in the case that a == 0. This should return true unless b == 0, but since tmp & a will be false regardless of the value of tmp (which will be the lowest-order 1-bit in b), the function will return false.
a should be "less than" b if:
1. `a` is a proper prefix of `b`, or
2. The lowest-order bit of `a^b` is in `a`.
The first condition also handles the case where a is the empty set and b is not. (This is slightly different from your formulation, which is "(the lowest-order bit of a^b is in a) and not (b is a proper prefix of a)".)
A simple test of the case "a is a proper prefix of b", given that we have the lowest-order bit of a^b in tmp, is tmp > a. That avoids the use of __builtin_clz [Note 1].
Also, you could write
tmp = tmp & (~tmp + 1);
as
tmp &= -tmp;
but I think that most C compilers will find that optimization on their own. [Note 2].
Applying those optimizations, the result would be (untested):
bool less_than(unsigned a, unsigned b) {
unsigned tmp = a ^ b;
tmp &= -tmp; //first difference isolated
return tmp > a || tmp & a;
}
Notes
1. This is worth doing because (1) even though __builtin_clz is builtin, it is not necessarily super-fast; and (2) it may not be present if you're compiling with a compiler other than gcc or clang.
2. -tmp is guaranteed to be the 2s-complement negative of tmp if tmp is an unsigned type, even if the underlying implementation is not 2s-complement. See §6.2.6.2/1 (the range of an unsigned type is 0..2^N−1 for some integer N) and §6.3.1.3/2 (a negative value is converted to an unsigned integer type by repeatedly adding 2^N until the value is in range).
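As a tiny standalone illustration of the lowest-set-bit trick from Note 2 (the value is an arbitrary example):

#include <cstdio>

int main() {
    unsigned v = 44;           // binary 101100
    unsigned lowest = v & -v;  // isolates the lowest set bit: binary 000100 == 4
    std::printf("lowest set bit of %u is %u\n", v, lowest);
}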
Here's a listing calculating all combinations for 2-bit inputs:
#include <stdio.h>
#include <stdbool.h> /* for bool when compiled as C */
bool less_than(unsigned a, unsigned b) {
unsigned tmp = a ^ b;
tmp = tmp & (~tmp + 1); //first difference isolated
return (tmp & a) && (__builtin_clz(b) < __builtin_clz(tmp));
}
#define BITPATTERN "%d%d%d"
#define BYTETOBITS(byte) \
(byte & 0x04 ? 1 : 0), \
(byte & 0x02 ? 1 : 0), \
(byte & 0x01 ? 1 : 0)
int main(int argc, char** argv) {
for ( int a = 0; a < 4; a ++ )
for ( int b = 0; b < 4; b ++)
printf("a: "BITPATTERN" b: "BITPATTERN": %d\n",
BYTETOBITS(a), BYTETOBITS(b), less_than(a,b)
);
}
And here's the output:
a: 000 b: 000: 0
a: 000 b: 001: 0
a: 000 b: 010: 0
a: 000 b: 011: 0
a: 001 b: 000: 0
a: 001 b: 001: 0
a: 001 b: 010: 1
a: 001 b: 011: 0
a: 010 b: 000: 0
a: 010 b: 001: 0
a: 010 b: 010: 0
a: 010 b: 011: 0
a: 011 b: 000: 0
a: 011 b: 001: 0
a: 011 b: 010: 1
a: 011 b: 011: 0
That output doesn't look correct.
Related
Given a long int x, count the number of values of a that satisfy the following conditions:
a XOR x > x
0 < a < x
where a and x are long integers and XOR is the bitwise XOR operator
How would you go about completing this problem?
I should also mention that the input x can be as large as 10^10.
I have managed to get a brute-force solution by iterating from 0 to x, checking the conditions, and incrementing a count, but this is not an optimal solution...
This is the brute force that I tried. It works but is extremely slow for large values of x.
for(long long i = 1; i < x; i++) // i must be a 64-bit type, since x can be up to 10^10
{
if((i ^ x) > x)
count++;
}
long long NumberOfA(long long x)
{
long long t = x <<1;
while(t^(t&-t)) t ^= (t&-t);
return t-++x;
}
long long x = 10000000000;
printf("%lld ==> %lld\n", 10LL, NumberOfA(10LL) );
printf("%lld ==> %lld\n", x, NumberOfA(x) );
Output
10 ==> 5
10000000000 ==> 7179869183
Link to IDEOne Code
Trying to explain the logic (using example 10, or 1010b)
Shift x to the left 1. (Value 20 or 10100b)
Turn off all low bits, leaving just the high bit (Value 16 or 10000b)
Subtract x+1 (16 - 11 == 5)
Attempting to explain
(although it's not easy)
Your rule is that a ^ x must be bigger than x, but that you cannot add extra bits to a or x.
(If you start with a 4-bit value, you can only use 4-bits)
The biggest possible value for a number in N-bits is 2^n -1.
(eg. 4-bit number, 2^4-1 == 15)
Lets call this number B.
Between your value x (exclusive) and B (inclusive), there are B-x possible values.
(back to my example, 10. Between 15 and 10, there are 5 possible values: 11, 12, 13, 14, 15)
In my code, t is x << 1, then with all the low bits turned off.
(10 << 1 is 20; turn off all the low bits to get 16)
Then 16 - 1 is B, and B - x is your answer:
(t - 1 - x, is the same as t - ++x, is the answer)
One way to look at this is to consider each bit in x.
If it's 1, then flipping it will yield a smaller number.
If it's 0, then flipping it will yield a larger number, and we should count it - and also all the combinations of bits to the right. That conveniently adds up to the mask value.
long f(long const x)
{
// only positive x can have non-zero result
if (x <= 0) return 0;
long count = 0;
// Iterate from LSB to MSB
for (long mask = 1; mask < x; mask <<= 1)
count += x & mask
? 0
: mask;
return count;
}
We might suspect a pattern here - it looks like we're just copying x and flipping its bits.
Let's confirm, using a minimal test program:
#include <cstdlib>
#include <iostream>
int main(int, char **argv)
{
while (*++argv)
std::cout << *argv << " -> " << f(std::atol(*argv)) << std::endl;
}
0 -> 0
1 -> 0
2 -> 1
3 -> 0
4 -> 3
5 -> 2
6 -> 1
7 -> 0
8 -> 7
9 -> 6
10 -> 5
11 -> 4
12 -> 3
13 -> 2
14 -> 1
15 -> 0
So all we have to do is 'smear' the value so that all the zero bits after the most-significant 1 are set, then xor with that:
long f(long const x)
{
if (x <= 0) return 0;
long mask = x;
while (mask & (mask+1))
mask |= mask+1;
return mask ^ x;
}
This is much faster, and still O(log n).
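If you're on GCC or Clang, the smear loop could also be replaced by a count-leading-zeros builtin (a sketch under that assumption; not part of the version above):

// Sketch: assumes the GCC/Clang __builtin_clzl builtin is available.
long f_clz(long const x)
{
    if (x <= 0) return 0;
    // all-ones mask covering every bit up to and including the highest 1 of x
    unsigned long mask = ~0UL >> __builtin_clzl((unsigned long)x);
    return (long)(mask ^ (unsigned long)x);
}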
Artichoke101 asked this:
"Let's say that I have an array of 4 32-bit integers which I use to store the 128-bit number.
How can I perform left and right shifts on this 128-bit number?"
My question is related to the answer Remus Rusanu gave:
void shiftl128 (
unsigned int& a,
unsigned int& b,
unsigned int& c,
unsigned int& d,
size_t k)
{
assert (k <= 128);
if (k > 32)
{
a=b;
b=c;
c=d;
d=0;
shiftl128(a,b,c,d,k-32);
}
else
{
a = (a << k) | (b >> (32-k));
b = (b << k) | (c >> (32-k));
c = (c << k) | (d >> (32-k));
d = (d << k);
}
}
void shiftr128 (
unsigned int& a,
unsigned int& b,
unsigned int& c,
unsigned int& d,
size_t k)
{
assert (k <= 128);
if (k > 32)
{
d=c;
c=b;
b=a;
a=0;
shiftr128(a,b,c,d,k-32);
}
else
{
d = (c << (32-k)) | (d >> k);
c = (b << (32-k)) | (c >> k);
b = (a << (32-k)) | (b >> k);
a = (a >> k);
}
}
Let's just focus on one shift, say the left shift. Specifically,
a = (a << k) | (b >> (32-k));
b = (b << k) | (c >> (32-k));
c = (c << k) | (d >> (32-k));
d = (d << k);
How is this left-shifting the 128-bit number? I understand what bit shifting is: << shifts bits left, so an 8-bit number like 00011000 left-shifted by 2 is 01100000. Same goes for the right shift, but to the right. And the single "pipe" | is OR, meaning any 1 in either operand will appear in the result.
How is a = (a << k) | (b >> (32-k)) shifting the first part (32) of the 128-bit number correctly?
This technique is somewhat idiomatic. Let's simplify to just a and b. We start with:
+----------+----------+
| a | b |
+----------+----------+
and we want to shift left some amount to obtain:
+----------+----------+
|    a   : |    b   : | c ...
+----------+----------+
|<---x-->|
       ->|y|<-
So x is simply a << k, and y is the k most-significant bits of b, right-aligned in the word. You obtain that result with b >> (32-k).
So overall, you get:
a = x | y
= (a << k) | (b >> (32-k))
[Note: This approach is only valid for 1 <= k <= 31, so your code is actually incorrect.]
When the bits of a get shifted to the left, something has to fill in the space left over on the right end. Since a and b are conceptually adjacent to each other, the void left by shifting the bits of a gets filled by the bits that are shifted off the end of b. The expression b >> (32-k) computes the bits that get shifted off of b.
You need to remember that it is acceptable, in shifting, to "lose" data.
The simplest way to understand shifting is to imagine a window. For example, let us work on bytes. You can view a byte as:
0 0 0 0 0 0 0 0 a b c d e f g h 0 0 0 0 0 0 0 0
                [      B      ]
Now, shifting is just about moving this window:
0 0 0 0 0 0 0 0 a b c d e f g h 0 0 0 0 0 0 0 0
[   B >> 8    ]
  [   B >> 7    ]
    [   B >> 6    ]
      [   B >> 5    ]
0 0 0 0 0 0 0 0 a b c d e f g h 0 0 0 0 0 0 0 0
        [   B >> 4    ]
          [   B >> 3    ]
            [   B >> 2    ]
              [   B >> 1    ]
0 0 0 0 0 0 0 0 a b c d e f g h 0 0 0 0 0 0 0 0
                  [   B << 1    ]
                    [   B << 2    ]
                      [   B << 3    ]
                        [   B << 4    ]
0 0 0 0 0 0 0 0 a b c d e f g h 0 0 0 0 0 0 0 0
                          [   B << 5    ]
                            [   B << 6    ]
                              [   B << 7    ]
                                [   B << 8    ]
0 0 0 0 0 0 0 0 a b c d e f g h 0 0 0 0 0 0 0 0
If you look at which way the window moves, you can equivalently think of it as a fixed window with moving content... just like your fancy mobile phone touch screen!
So, what is happening in the expression a = (a << k) | (b >> (32-k)) ?
a << k selects the 32 - k rightmost bits of a and moves them toward the left, creating a space of k zeros on the right side
b >> (32-k) selects the k leftmost bits of b and moves them toward the right, creating a space of 32 - k zeros on the left side
the two are merged together with |
Getting back to using byte-length bites:
Suppose that a is [a7, a6, a5, a4, a3, a2, a1, a0]
Suppose that b is [b7, b6, b5, b4, b3, b2, b1, b0]
Suppose that k is 3
The number represented is:
// before
a7 a6 a5 a4 a3 a2 a1 a0 b7 b6 b5 b4 b3 b2 b1 b0
[          a          ]
                        [          b          ]
// after (or so we would like)
a7 a6 a5 a4 a3 a2 a1 a0 b7 b6 b5 b4 b3 b2 b1 b0
         [          a          ]
                                 [          b          ]   (the 3 positions past b0 are filled with 0)
So:
a << 3 does actually become a4 a3 a2 a1 a0 0 0 0
b >> (8 - 3) becomes 0 0 0 0 0 b7 b6 b5
combining with | we get a4 a3 a2 a1 a0 b7 b6 b5
rinse and repeat :)
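To make that concrete, here's a tiny standalone check of the byte-pair version (the byte values are arbitrary examples):

#include <cstdint>
#include <cstdio>

int main() {
    uint8_t a = 0xA5, b = 0x3C;   // high and low bytes of the 16-bit value 0xA53C
    unsigned k = 3;
    uint8_t new_a = (uint8_t)((a << k) | (b >> (8 - k)));
    uint8_t new_b = (uint8_t)(b << k);
    std::printf("%02X%02X << %u = %02X%02X\n",
                (unsigned)a, (unsigned)b, k,
                (unsigned)new_a, (unsigned)new_b);   // prints A53C << 3 = 29E0
}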
Note that in the else case k is guaranteed to be 32 or less. So each part of your larger number can actually be shifted by k bits. However, shifting it either left or right makes the k higher/lower bits 0. To shift the whole 128bit number you need to fill these k bits with the bits "shifted out" of the neighboring number.
In the case of a left shift by k, the k lower bits of the higher number need to be filled with the k upper bits of the lower number. To get these upper k bits, we shift that (32-bit) number right by 32-k bits, which puts them in the right position to fill the k zeroed bits of the higher number.
BTW: the code above assumes that an unsigned int is exactly 32 bits. That is not portable.
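For illustration, a more portable sketch could use fixed-width types and also guard the k == 0 and exact-multiple-of-32 cases that would otherwise shift a 32-bit word by 32 (a sketch only, same word layout as above with a as the most significant word):

#include <cassert>
#include <cstdint>

void shiftl128_u32(uint32_t& a, uint32_t& b, uint32_t& c, uint32_t& d, unsigned k)
{
    assert(k <= 128);
    while (k >= 32) { a = b; b = c; c = d; d = 0; k -= 32; }  // whole-word moves
    if (k == 0) return;                                       // avoid shifting by 32 below
    a = (a << k) | (b >> (32 - k));
    b = (b << k) | (c >> (32 - k));
    c = (c << k) | (d >> (32 - k));
    d = (d << k);
}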
To simplify, consider a 16-bit unsigned short, where we store the high and low bytes as unsigned char h, l respectively.
To simplify further, let's just shift it left by one bit, to see how that goes.
I'm writing it out as 16 consecutive bits, since that's what we're modelling:
[h7 h6 h5 h4 h3 h2 h1 h0 l7 l6 l5 l4 l3 l2 l1 l0]
so, [h, l] << 1 will be
[h6 h5 h4 h3 h2 h1 h0 l7 l6 l5 l4 l3 l2 l1 l0 0]
(the top bit, h7, has been shifted off the top, and the low bit is filled with zero).
Now let's break that back up into h and l ...
[h, l] = [h6 h5 h4 h3 h2 h1 h0 l7 l6 l5 l4 l3 l2 l1 l0 0]
=> h = [h6 h5 h4 h3 h2 h1 h0 l7]
= (h << 1) | (l >> 7)
etc.
My variant for a logical left shift of a 128-bit number in a little-endian environment:
typedef struct { unsigned int component[4]; } vector4;
vector4 shift_left_logical_128bit_le(vector4 input,unsigned int numbits) {
vector4 result;
if(numbits>=128) {
result.component[0]=0;
result.component[1]=0;
result.component[2]=0;
result.component[3]=0;
return result;
}
result=input;
while(numbits>=32) { /* whole 32-bit word moves; >=32 so the shifts below are always by fewer than 32 bits */
numbits-=32;
result.component[3]=result.component[2]; /* move words toward the most significant end first */
result.component[2]=result.component[1];
result.component[1]=result.component[0];
result.component[0]=0;
}
unsigned long long temp;
result.component[3]<<=numbits;
temp=(unsigned long long)result.component[2];
temp=(temp<<numbits)>>32;
result.component[3]|=(unsigned int)temp;
result.component[2]<<=numbits;
temp=(unsigned long long)result.component[1];
temp=(temp<<numbits)>>32;
result.component[2]|=(unsigned int)temp;
result.component[1]<<=numbits;
temp=(unsigned long long)result.component[0];
temp=(temp<<numbits)>>32;
result.component[1]|=(unsigned int)temp;
result.component[0]<<=numbits;
return result;
}
#include <iostream>
using namespace std;
int main()
{
unsigned char c,i;
union temp
{
float f;
char c[4];
} k;
cin>>k.f;
c=128;
for(i=0;i<8;i++)
{
if(k.c[3] & c) cout<<'1';
else cout<<'0';
c=c>>1;
}
c=128;
cout<<'\n';
for(i=0;i<8;i++)
{
if(k.c[2] & c) cout<<'1';
else cout<<'0';
c=c>>1;
}
return 0;
}
if(k.c[2] & c)
That is called bitwise AND.
Illustration of bitwise AND
//illustration : mathematics of bitwise AND
a = 10110101 (binary representation)
b = 10011010 (binary representation)
c = a & b
= 10110101 & 10011010
= 10010000 (binary representation)
= 128 + 16 (decimal)
= 144 (decimal)
Bitwise AND uses this truth table:
X | Y | R = X & Y
---------
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1
See these tutorials on bitwise AND:
Bitwise Operators in C and C++: A Tutorial
Bitwise AND operator &
A bitwise operation (AND in this case) performs a bit-by-bit operation between the two operands.
For example the & :
11010010 &
11000110 =
11000010
Bitwise Operation in your code
c = 128 therefore the binary representation is
c = 10000000
a & c ANDs every ith bit of c with every ith bit of a. Because c has a 1 only in the MSB position (bit 7), a & c is non-zero exactly when a also has a 1 in bit 7; if a has a 0 there, a & c is zero. This logic is used in the if block above: the if block is entered depending on whether the MSB (bit 7) of the byte is 1 or not.
Suppose a = ? ? ? ? ? ? ? ? where each ? is either 0 or 1
Then
a = ? ? ? ? ? ? ? ?
AND & & & & & & & &
c = 1 0 0 0 0 0 0 0
---------------
? 0 0 0 0 0 0 0
Since 0 & ? = 0, if bit position 7 of a is 0 the answer is 0; if bit position 7 is 1 the answer is 1.
In each iteration c is shifted right one position, so the single 1 in c moves toward the lower-order bits. By masking the other variable with c in each iteration, you can tell whether that variable has a 1 or a 0 at that bit position, as in the snippet below.
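A standalone illustration of that masking loop (the byte value is an arbitrary example):

#include <cstdio>

int main() {
    unsigned char value = 0x5A;   // 01011010
    for (unsigned char mask = 0x80; mask != 0; mask >>= 1)  // walk the 1 from bit 7 down to bit 0
        std::putchar((value & mask) ? '1' : '0');
    std::putchar('\n');           // prints 01011010
}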
Use in your code
You have
union temp
{
float f;
char c[4];
} k;
Inside the union, the float f and the char c[4] share the same memory location (that's the defining property of a union).
Now, sizeof(f) is 4 bytes. You assign k.f = 5345341 or whatever. When you access k.c[0] it reads the 0th byte of the float f; k.c[1] reads the 1st byte, and so on. The array is not separate storage: the float and the array refer to the same memory, just accessed differently. This is in effect a mechanism to access the 4 bytes of the float byte-wise.
NOTE THAT k.c[0] may address the last byte instead of the 1st byte (as told above); this depends on the byte ordering of storage in memory (see little-endian and big-endian byte ordering).
Union k
+--------+--------+--------+--------+ --+
|  c[0]  |  c[1]  |  c[2]  |  c[3]  |   |
+--------+--------+--------+--------+   |---> shares the same location (in little endian)
|              float f              |   |
+-----------------------------------+ --+
Or the byte ordering could be reversed
Union k
+--------+--------+--------+--------+ --+
|  c[3]  |  c[2]  |  c[1]  |  c[0]  |   |
+--------+--------+--------+--------+   |---> shares the same location (in big endian)
|              float f              |   |
+-----------------------------------+ --+
Your code loops over this: shifting c moves its single 1 from bit 7 down to bit 0 one step at a time, and the bitwise AND checks every bit position of the selected bytes of the float f, printing 1 if the bit is set and 0 otherwise.
If you print all the 4 bytes of the float, then you can see the IEEE 754 representation.
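For instance, a sketch that prints all 32 bits of a float without going through the union (memcpy avoids any aliasing concerns; the value is an arbitrary example):

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    float f = 5.25f;                       // arbitrary example value
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // copy the float's bytes into an integer
    for (int i = 31; i >= 0; --i)          // sign bit first, then exponent, then mantissa
        std::putchar((bits >> i) & 1 ? '1' : '0');
    std::putchar('\n');
}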
c has a single bit set: 128 is 10000000 in binary. if(k.c[2] & c) checks whether that bit is set in k.c[2] as well. The bit in c is then shifted along to check the other bit positions.
As a result, the program appears to display the binary representation of the float.
I have the following bottleneck function.
typedef unsigned char byte;
void CompareArrays(const byte * p1Start, const byte * p1End, const byte * p2, byte * p3)
{
const byte b1 = 128-30;
const byte b2 = 128+30;
for (const byte * p1 = p1Start; p1 != p1End; ++p1, ++p2, ++p3) {
*p3 = (*p1 < *p2 ) ? b1 : b2;
}
}
I want to replace the C++ code with SSE2 intrinsic functions. I have tried _mm_cmpgt_epi8 but it performs a signed compare. I need an unsigned compare.
Is there any trick (SSE, SSE2, SSSE3) to solve my problem?
Note:
I do not want to use multi-threading in this case.
Instead of offsetting your signed values to make them unsigned, a slightly more efficient way would be to do the following:
use _mm_min_epu8 to get the unsigned min of p1, p2
compare this min for equality with p2 using _mm_cmpeq_epi8
the resulting mask will now be 0x00 for elements where p1 < p2 and 0xff for elements where p1 >= p2
you can now use this mask with _mm_and_si128, _mm_andnot_si128 and _mm_or_si128 to select the appropriate b1/b2 values (see the sketch below)
Note that this is 4 instructions in total, compared with 5 using the offset + signed comparison approach.
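A sketch of that approach with SSE2 intrinsics applied to the function in the question (the name and the assumption that the length is a multiple of 16 are mine; a scalar tail loop would be needed otherwise):

#include <emmintrin.h>   // SSE2

typedef unsigned char byte;

void CompareArraysSSE2(const byte* p1, const byte* p1End, const byte* p2, byte* p3)
{
    const __m128i vb1 = _mm_set1_epi8((char)(128 - 30));
    const __m128i vb2 = _mm_set1_epi8((char)(128 + 30));
    for (; p1 != p1End; p1 += 16, p2 += 16, p3 += 16) {
        __m128i v1 = _mm_loadu_si128((const __m128i*)p1);
        __m128i v2 = _mm_loadu_si128((const __m128i*)p2);
        // mask = 0xff where p1 >= p2 (unsigned), 0x00 where p1 < p2
        __m128i mask = _mm_cmpeq_epi8(_mm_min_epu8(v1, v2), v2);
        // pick b2 where the mask is set, b1 where it is clear
        __m128i res = _mm_or_si128(_mm_and_si128(mask, vb2),
                                   _mm_andnot_si128(mask, vb1));
        _mm_storeu_si128((__m128i*)p3, res);
    }
}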
You can subtract 127 from your numbers, and then use _mm_cmpgt_epi8
Yes, this can be done in SIMD, but it will take a few steps to make the mask.
Ruslik got it right, I think. You want to xor each component with 0x80 to flip the sense of the signed and unsigned comparison. _mm_xor_si128 (PXOR) gets you that -- you'll need to create the mask as a static char array somewhere before loading it into a SIMD register. Then _mm_cmpgt_epi8 gets you a mask and you can use a bitwise AND (eg _mm_and_si128) to perform a masked-move.
Yes, SSE will not work here.
You can improve this code performance on multi-core computer by using OpenMP:
void CompareArrays(const byte * p1Start, const byte * p1End, const byte * p2, byte * p3)
{
const byte b1 = 128-30;
const byte b2 = 128+30;
int n = p1End - p1Start;
#pragma omp parallel for
for (int i = 0; i < n; ++i)
{
p3[i] = (p1Start[i] < p2[i]) ? b1 : b2;
}
}
Unfortunately, many of the answers above are incorrect. Let's assume a 3-bit word:
unsigned: 4 5 6 7 0 1 2 3 == signed: -4 -3 -2 -1 0 1 2 3 (bits: 100 101 110 111 000 001 010 011)
The method by Paul R is incorrect. Suppose we want to know if 3 > 2. min(3,2) == 2, which suggests yes, so the method works here. Now suppose we want to know if 7 > 2. The value 7 is -1 in signed representation, so min(-1,2) == -1, which suggests wrongly that 7 is not greater than 2 unsigned.
The method by Andrey is also incorrect. Suppose we want to know if 7 > 2, or a = 7, and b = 2. The value 7 is -1 in signed representation, so the first term (a > b) fails, and the method suggests that 7 is not greater than 2.
However, the method by BJobnh, as corrected by Alexey, is correct. Just subtract 2^(n-1) from the values, where n is the number of bits. In this case, we would subtract 4 to obtain new corresponding values:
old signed: -4 -3 -2 -1 0 1 2 3 => new signed: 0 1 2 3 -4 -3 -2 -1 == new unsigned 0 1 2 3 4 5 6 7.
In other words, unsigned_greater_than(a,b) is equivalent to signed_greater_than(a - 2^(n-1), b - 2^(n-1)).
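For completeness, the offset trick for bytes in SSE2 can be written as follows (a sketch; flipping the sign bit turns the signed pcmpgtb into an unsigned comparison):

#include <emmintrin.h>   // SSE2

// Unsigned per-byte a > b: flip the sign bit of both operands, then the
// signed compare gives the unsigned ordering.
static inline __m128i cmpgt_epu8(__m128i a, __m128i b)
{
    const __m128i sign = _mm_set1_epi8((char)0x80);
    return _mm_cmpgt_epi8(_mm_xor_si128(a, sign), _mm_xor_si128(b, sign));
}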
Use pcmpeqb, and may the Power be with you.
can someone explain to me why the following results in b = 13?
int a, b, c;
a = 1|2|4;
b = 8;
c = 2;
b |= a;
b&= ~c;
It is using bitwise operators. (For the illustration, assume 8-bit values, two's-complement storage, etc.)
a = 1|2|4 means a = 00000001 or 00000010 or 00000100, which is 00000111, or 7.
b = 8 means b = 00001000.
c = 2 means c = 00000010.
b |= a means b = b | a which means b = 00001000 or 00000111, which is 00001111, or 15.
~c means not c, which is 11111101.
b &= ~c means b = b & ~c, which means b = 00001111 and 11111101, which is 00001101, or 13.
http://www.cs.cf.ac.uk/Dave/C/node13.html
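If you want to check it yourself, a minimal program tracing the same steps (a quick sketch):

#include <cstdio>

int main() {
    int a = 1 | 2 | 4;      // 0b0111 == 7
    int b = 8;              // 0b1000
    int c = 2;              // 0b0010
    b |= a;                 // 0b1111 == 15
    b &= ~c;                // clear bit 1: 0b1101 == 13
    std::printf("%d\n", b); // prints 13
}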
a = 1|2|4
= 0b001
| 0b010
| 0b100
= 0b111
= 7
b = 8 = 0b1000
c = 2 = 0b10
b|a = 0b1000
| 0b0111
= 0b1111 = 15
~c = 0b111...1101
(b|a) & ~c = 0b00..001111
& 0b11..111101
= 0b00..001101
= 13
Let's go into binary mode:
a = 0111 (7 in decimal)
b = 1000 (8)
c = 0010 (2)
then we OR b with a to get b = 1111 (15)
c = 0010 and ~c = 1101
finally b is ANDed with negated c, which gives us b = 1101 (13)
Hint: convert the decimal values to binary and give it a shot; you may well figure it all out by yourself.
a = 1 | 2 | 4;
Assigns the value 7 to a. This is because you are performing a bitwise OR operation on the constants 1, 2 and 4. Since the binary representation of each of these is 1, 10 and 100 respectively, you get 111 which is 7.
b |= a;
This ORs b and a and assigns the result to b. Since a's binary representation is 111 (7) and b's binary representation is 1000 (8), you end up with 1111 or 15.
b &= ~c;
The ~c in this expression means the bitwise negation of c. This essentially flips all 0's to 1's and vice versa in the binary representation of c. This means c switches from 10 to 111...11101.
After negating c, there is a bitwise AND operation between b and ~c. This means only bits that are 1 in both b and ~c remain 1; all others become 0. Since b is now 1111 and ~c is all 1's except in the second-lowest-order bit, all of b's bits remain 1 except the 2 bit.
The result of clearing b's 2 bit is the same as subtracting 2 from its value, since that bit was set. Since its current value is 15, and 15 - 2 = 13, the assignment results in b == 13.