Let's say that I have an array of four 32-bit integers which I use to store a 128-bit number.
How can I perform left and right shift on this 128-bit number?
Working with uint128? If you can, use the x86 SSE instructions, which were designed for exactly that. (Then, when you've bitshifted your value, you're ready to do other 128-bit operations...)
SSE2 bit shifts take ~4 instructions on average, with one branch (a case statement). No issues with shifting more than 32 bits, either. The full code for doing this, using gcc intrinsics rather than raw assembler, is in sseutil.c (github: "Unusual uses of SSE2"), and it's a bit bigger than makes sense to paste here.
The hurdle for many people in using SSE2 is that shift ops take immediate (constant) shift counts. You can solve that with a bit of C preprocessor twiddling (wordpress: C preprocessor tricks). After that, you have op sequences like:
LeftShift(uint128 x, int n) = _mm_slli_epi64(_mm_slli_si128(x, n/8), n%8)
for n = 65..71, 73..79, … 121..127
... doing the whole shift in two instructions.
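The immediate-count restriction applies to the `_mm_slli_*` forms. SSE2 also has register-count shifts (`_mm_sll_epi64`, `_mm_srl_epi64`) that take their count from the low 64 bits of another `__m128i`, so a run-time shift count is possible without preprocessor tricks. The following is a minimal sketch built on those, not the sseutil.c code referenced above; the name `sse_shl128` is hypothetical, and it assumes an x86 target with SSE2.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#include <cassert>

// Hypothetical helper: shift a 128-bit value held in one __m128i left by
// n bits (0 <= n <= 127). The per-lane shifts take their count from a
// register, so n may be a run-time value; only the 8-byte lane move
// (_mm_slli_si128) needs an immediate, and it is always 8.
__m128i sse_shl128(__m128i x, int n) {
    assert(n >= 0 && n < 128);
    if (n >= 64) {
        // Move the low 64-bit lane into the high lane (the low lane becomes 0),
        // then shift that lane by the remaining n - 64 bits.
        return _mm_sll_epi64(_mm_slli_si128(x, 8), _mm_cvtsi32_si128(n - 64));
    }
    // Shift both 64-bit lanes left by n bits...
    __m128i shifted = _mm_sll_epi64(x, _mm_cvtsi32_si128(n));
    // ...then carry the top n bits of the low lane into the high lane.
    // When n == 0 the count is 64, and SSE2 shifts by more than 63 give 0.
    __m128i carry = _mm_slli_si128(_mm_srl_epi64(x, _mm_cvtsi32_si128(64 - n)), 8);
    return _mm_or_si128(shifted, carry);
}
```

The n < 64 path costs four shifts and an OR, which is roughly in line with the instruction counts quoted above.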
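#include <cassert>
#include <cstddef>
// a holds the most significant 32 bits, d the least significant (assumes 32-bit unsigned int)
void shiftl128 (
unsigned int& a,
unsigned int& b,
unsigned int& c,
unsigned int& d,
size_t k)
{
assert (k <= 128);
if (k >= 32) // shifting a 32-bit integer by more than 31 bits is "undefined"
{
// move every word up one place, then shift by the remaining k-32 bits
a=b;
b=c;
c=d;
d=0;
shiftl128(a,b,c,d,k-32);
}
else if (k > 0) // k == 0 would make b >> (32-k) a shift by 32, which is also undefined
{
a = (a << k) | (b >> (32-k));
b = (b << k) | (c >> (32-k));
c = (c << k) | (d >> (32-k));
d = (d << k);
}
}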
void shiftr128 (
unsigned int& a,
unsigned int& b,
unsigned int& c,
unsigned int& d,
size_t k)
{
assert (k <= 128);
if (k >= 32) // shifting a 32-bit integer by more than 31 bits is "undefined"
{
d=c;
c=b;
b=a;
a=0;
shiftr128(a,b,c,d,k-32);
}
else if (k > 0) // k == 0 would make c << (32-k) a shift by 32, which is also undefined
{
d = (c << (32-k)) | (d >> k);
c = (b << (32-k)) | (c >> k);
b = (a << (32-k)) | (b >> k);
a = (a >> k);
}
}
Instead of using a 128-bit number, why not use a std::bitset? With a bitset you can choose how big you want it to be, and you can perform quite a few operations on it, including the << and >> shifts.
You can find more information on these here:
http://www.cppreference.com/wiki/utility/bitset/start?do=backlink
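For the four-word layout from the question, a hedged sketch of round-tripping through `std::bitset<128>` might look like this. `from_words` and `to_words` are hypothetical helpers, and `w[0]` is assumed to be the least significant word; the shifts themselves are just the bitset operators.

```cpp
#include <bitset>
#include <cstdint>
#include <cassert>

// Hypothetical helper: pack four 32-bit words (w[0] least significant) into a bitset.
std::bitset<128> from_words(const uint32_t w[4]) {
    std::bitset<128> bits;
    for (int i = 3; i >= 0; --i) {
        bits <<= 32;                      // make room for the next lower word
        bits |= std::bitset<128>(w[i]);   // bitset has a constructor from unsigned long long
    }
    return bits;
}

// Hypothetical helper: unpack a bitset back into four 32-bit words.
void to_words(const std::bitset<128>& bits, uint32_t w[4]) {
    const std::bitset<128> mask(0xFFFFFFFFu);
    for (int i = 0; i < 4; ++i) {
        // After masking, the value fits in 32 bits, so to_ulong() cannot throw.
        w[i] = static_cast<uint32_t>(((bits >> (32 * i)) & mask).to_ulong());
    }
}
```

With these, shifting is simply `from_words(in) << 100` or `bits >> n`, with no special cases for counts of 32 or more.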
First, if you're shifting by n bits and n is greater than or equal to 32, divide by 32 and shift whole integers. This should be trivial. Now you're left with a remaining shift count from 0 to 31. If it's zero, return early, you're done.
For each integer, you'll need to shift it by the remaining n, then shift the adjacent integer in the opposite direction by 32 - n and OR the two together to combine the valid bits from each.
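A minimal sketch of this two-step approach, for both directions, assuming `uint32_t` words with `w[0]` as the least significant word. The function names are hypothetical.

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical helper: shift a 128-bit value (w[0] least significant) left by n.
void shift_left_128(uint32_t w[4], unsigned n) {
    assert(n < 128);
    unsigned words = n / 32, bits = n % 32;
    // Step 1: move whole words up, filling the vacated low words with zeros.
    // Going from the top down means each source word is read before it is overwritten.
    for (int i = 3; i >= 0; --i)
        w[i] = (i >= static_cast<int>(words)) ? w[i - words] : 0;
    if (bits == 0) return;  // done; this also avoids a shift by 32 below
    // Step 2: shift each word and pull in the top bits of the word below it.
    for (int i = 3; i > 0; --i)
        w[i] = (w[i] << bits) | (w[i - 1] >> (32 - bits));
    w[0] <<= bits;
}

// Hypothetical helper: shift a 128-bit value (w[0] least significant) right by n.
void shift_right_128(uint32_t w[4], unsigned n) {
    assert(n < 128);
    unsigned words = n / 32, bits = n % 32;
    // Step 1: move whole words down, filling the vacated high words with zeros.
    for (int i = 0; i < 4; ++i)
        w[i] = (i + words < 4) ? w[i + words] : 0;
    if (bits == 0) return;
    // Step 2: shift each word and pull in the low bits of the word above it.
    for (int i = 0; i < 3; ++i)
        w[i] = (w[i] >> bits) | (w[i + 1] << (32 - bits));
    w[3] >>= bits;
}
```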
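Since you mentioned you're storing your 128-bit value in an array of 4 integers, you could shift it by one bit like this (array[0] holds the least significant word, and unsigned int is assumed to be 32 bits):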
void left_shift(unsigned int* array)
{
for (int i=3; i >= 0; i--)
{
array[i] = array[i] << 1;
if (i > 0)
{
unsigned int top_bit = (array[i-1] >> 31) & 0x1;
array[i] = array[i] | top_bit;
}
}
}
void right_shift(unsigned int* array)
{
for (int i=0; i < 4; i++)
{
array[i] = array[i] >> 1;
if (i < 3)
{
unsigned int bottom_bit = (array[i+1] & 0x1) << 31;
array[i] = array[i] | bottom_bit;
}
}
}
Let n be an integer and b a natural number with 0 <= b <= 63, and let T be the number of test cases. For each test case, find the b-th bit of n in its signed 64-bit representation.
This is my attempt:
#include <iostream>
#define f cin
#define g cout
using namespace std;
int T;
long long n;
int b;
int main()
{
f >> T;
for(int i = 1; i <= T; ++i)
{
f >> n >> b;
int ans = 0;
bool ok = true;
while(n)
{
if(b == ans)
{
g << n % 2;
ok = false;
break;
}
n /= 2;
++ans;
}
if(ok) g << 0;
}
return 0;
}
But it does not work on all test cases. Is there another way to do this, or another way to store the bits? Are there special libraries, or other tools that can do this more efficiently? Can you point me to something to read about bitmasks: where and when you should use them, and how they are useful?
Computers already store integers in their binary (bitwise) representation. All you need are bitwise operators to read a particular bit.
int bthbit(long long n, int b) {
if (n & (1ULL << b)) return 1;
return 0;
}
The solution uses the bitwise & operator after left-shifting 1 by b bits. You may want to read about bitwise operators and bitmasks.
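One likely reason the attempt above fails is negative n: there n % 2 is -1 or 0, n /= 2 rounds toward zero, and the loop stops as soon as n reaches 0, so the high 1-bits of a negative number are reported as 0. The bthbit answer avoids this. Below is an equivalent sketch that shifts the value instead of the mask; `bit_at` is a hypothetical name, and `int64_t` is used because that fixed-width type is guaranteed to be 64-bit two's complement.

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical helper: bit b (0 = least significant) of n's 64-bit
// two's-complement representation. Converting to uint64_t is well defined
// for negative n (it wraps modulo 2^64), and shifting an unsigned value
// right never sign-extends.
int bit_at(int64_t n, int b) {
    assert(0 <= b && b <= 63);
    return static_cast<int>((static_cast<uint64_t>(n) >> b) & 1u);
}
```

For example, bit 63 of -1 is 1, which the original loop would print as 0.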
I came across a question asking to reverse the bits of an unsigned integer, and tried a different approach. However, I'm not very familiar with how bitwise operators work. Can someone please point out what is fundamentally wrong here?
unsigned int reverse(unsigned int A)
{
unsigned int c=0;
int a=0;
while(a < 32)
{
c = c << 1;
c = c | ( A & (1 << a) );
a++;
}
return c;
}
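You shift to the left in both places (c << 1 and 1 << a), so A & (1 << a) leaves each extracted bit at position a rather than at position 0, and OR-ing it into c scrambles the bits instead of reversing them. Either bring the bit down first, c = (c << 1) | ((A >> a) & 1), or, without shifting c at all, set the mirrored bit using a mask that starts at 1000...0 (32 bits) and moves right: if (A & (1u << a)) c |= 0x80000000u >> a; (Note also that 1 << 31 on a signed int is undefined; use 1u.)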
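A corrected version of the loop, as a sketch that keeps the original structure (shift c left, append one bit) and only changes how the bit is extracted. The name `reverse_bits` and the use of `uint32_t` are assumptions made for illustration.

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical helper: reverse the 32 bits of A, so bit a of A ends up in bit 31 - a.
uint32_t reverse_bits(uint32_t A) {
    uint32_t c = 0;
    for (int a = 0; a < 32; ++a) {
        c <<= 1;             // make room at bit 0
        c |= (A >> a) & 1u;  // bring bit a of A down to bit 0 before OR-ing
    }
    return c;
}
```

The bit appended at iteration a is shifted left 31 - a more times, which is exactly where the reversal needs it.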
#include<stdio.h>
int main()
{
int num,m,n,t,res,i;
printf("Enter number\n");
scanf("%d",&num);
for(i=31;i>=0;i--)
{
printf("%d",((num>>i)&1));
}
printf("\n");
printf("Enter position 1 and position 2\n");
scanf("%d%d",&m,&n);
printf("enter number\n");
scanf("%d",&t);
res=((num&(~(((~(unsigned)0)>>(32-((m-t)+1)))<<t)))&(num&(~(((~(unsigned)0)>>(32-((n-t)+1)))<<t))))|(((((num&((((~(unsigned)0)>>(((m-t))))<<(n))))>>(m-t))))|(((num&((((~(unsigned)0)>>(((32-n))))<<(32-t))))<<(m-t))));
for(i=31;i>=0;i--)
{
printf("%d",(res>>i)&1);
}
printf("\n");
}
I need to swap the bits from (m to m-t) with those from (n to n-t) in the number num. I tried the above code but it doesn't work. Can someone please help?
As usual with bit swapping problems, you can save a few instructions by using xor.
unsigned f(unsigned num, unsigned n, unsigned m, unsigned t) {
n -= t; m -= t;
unsigned mask = ((unsigned) 1 << t) - 1;
unsigned nm = ((num >> n) ^ (num >> m)) & mask;
return num ^ (nm << n) ^ (nm << m);
}
It's easier if you break it down into smaller steps.
First, make a bit mask t bits wide. You can do this by subtracting 1 from a power of 2, like this:
int mask = (1 << t) - 1;
For example if t is 3 then mask will be 7 (111 in binary).
Then you can make a copy of num and clear the bits in the range of m to m-t and n to n-t by shifting up the mask, NOTing it and ANDing, so that only bits not covered by the mask remain set:
res = num & ~(mask<<(m-t)) & ~(mask<<(n-t));
Then you can shift the bits in the two ranges into their proper locations and OR with the result. You can do this by shifting down by (n-t), masking, and then shifting up by (m-t), then vice versa:
res |= ((num >> (n-t)) & mask) << (m-t);
res |= ((num >> (m-t)) & mask) << (n-t);
The bits are now in the correct place.
You could do this in one line like this:
res = (num & ~(mask<<(m-t)) & ~(mask<<(n-t))) | (((num >> (n-t)) & mask) << (m-t)) | (((num >> (m-t)) & mask) << (n-t));
And it can be simplified by doing the m-t and n-t subtractions beforehand, assuming you don't want to use the values afterwards:
m -= t; n -= t;
res = (num & ~(mask<<m) & ~(mask<<n)) | (((num >> n) & mask) << m) | (((num >> m) & mask) << n);
This doesn't work if the two ranges overlap. It's not clear what the correct behaviour would be in that case.
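Putting the steps together, here is a sketch as a single function. The name `swap_bit_fields` is hypothetical. Following the masks used above, each field is the t bits starting at bit m-t (respectively n-t). It assumes a 32-bit unsigned, 0 < t < 32, and non-overlapping fields. An unsigned mask is used so that 1u << t is well defined for t up to 31.

```cpp
#include <cassert>

// Hypothetical helper: swap the t-bit field at bits m-t .. m-1 with the
// t-bit field at bits n-t .. n-1. Assumes the fields do not overlap.
unsigned swap_bit_fields(unsigned num, unsigned m, unsigned n, unsigned t) {
    assert(t > 0 && t < 32 && t <= m && t <= n);
    unsigned mask = (1u << t) - 1;                     // t low bits set
    m -= t;                                            // lowest bit of each field
    n -= t;
    unsigned res = num & ~(mask << m) & ~(mask << n);  // clear both fields
    res |= ((num >> n) & mask) << m;                   // field at n moves to m
    res |= ((num >> m) & mask) << n;                   // field at m moves to n
    return res;
}
```

For example, with t = 2, m = 4, n = 8 the fields are bits 2-3 and bits 6-7, so 0xC0 becomes 0x0C.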
I've been looking at some of these books with fun interview problems. One has a question where one is supposed to write code to swap two bits in a 64-bit integer given the indices of the two bits. After playing around with this for a while I came up with the following code, which is faster than the solution given in the textbook, since it doesn't have any branches:
uint64_t swapbits(uint64_t n, size_t i, size_t j)
{
// extract ith and jth bit
uint64_t bi = ((uint64_t) 0x1 << i) & n;
uint64_t bj = ((uint64_t) 0x1 << j) & n;
// clear ith and jth bit in n
n ^= bi | bj;
n ^= (bi >> i) << j;
n ^= (bj >> j) << i;
return n;
}
My question is essentially the following: Is there an even faster way of doing this?
EDIT: Here's the other implementation as reference:
uint64_t swapbits(uint64_t x, size_t i, size_t j)
{
if(((x >> i) & 1) != ((x >> j) & 1)) {
x ^= ((uint64_t)1 << i) | ((uint64_t)1 << j); // 1L may be only 32 bits wide
}
return x;
}
With compiler optimizations the latter is around 35% slower on a Core i7 4770. As I said in the comments, I'm interested in whether there are any interesting tricks for doing this very efficiently. I've seen some extremely clever bit fiddling tricks that can do something that looks fairly complicated in just a few instructions.
Here's a solution which uses only 8 operations. Note that this works even when i == j.
uint64_t swapbits(uint64_t n, size_t i, size_t j)
{
uint64_t x = ((n >> i) ^ (n >> j)) & 1; // x = 1 bit "toggle" flag
return n ^ ((x << i) | (x << j)); // apply toggle to bits i and j
}
Explanation: x is equal to 1 only if the original bits at indices i and j are different (10 or 01), and therefore need to be toggled. Otherwise it's zero and the bits are to remain unchanged (00 or 11). We then apply this toggle bit to the original bits (i.e. XOR it with the original bits) to get the required result.
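Because there are only 64 × 64 index pairs, the branch-free version can be checked exhaustively against the branching reference. The following is a sketch of such a check. `agrees_for_all_indices` is a hypothetical name, and the reference uses a 64-bit constant instead of `1L`.

```cpp
#include <cstdint>
#include <cstddef>
#include <cassert>

// Branch-free swap from the answer above.
uint64_t swapbits(uint64_t n, size_t i, size_t j) {
    uint64_t x = ((n >> i) ^ (n >> j)) & 1;  // 1 only if the two bits differ
    return n ^ ((x << i) | (x << j));        // toggle both bits if they differ
}

// Branching reference (the textbook version, with a 64-bit constant).
uint64_t swapbits_ref(uint64_t x, size_t i, size_t j) {
    if (((x >> i) & 1) != ((x >> j) & 1))
        x ^= ((uint64_t)1 << i) | ((uint64_t)1 << j);
    return x;
}

// Hypothetical check: true if both versions agree for every (i, j), including i == j.
bool agrees_for_all_indices(uint64_t n) {
    for (size_t i = 0; i < 64; ++i)
        for (size_t j = 0; j < 64; ++j)
            if (swapbits(n, i, j) != swapbits_ref(n, i, j)) return false;
    return true;
}
```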
This question already has answers here: Multiply two overflowing integers modulo a third (2 answers). Closed 9 years ago.
Can someone help me calculate (A*B)%C in C++, where 1 <= A, B, C <= 10^18, without a big-number library, using just a mathematical approach?
Off the top of my head (not extensively tested)
typedef unsigned long long BIG;
BIG mod_multiply( BIG A, BIG B, BIG C )
{
BIG mod_product = 0;
A %= C;
while (A) {
B %= C;
if (A & 1) mod_product = (mod_product + B) % C;
A >>= 1;
B <<= 1;
}
return mod_product;
}
This has complexity O(log A) iterations. You can probably replace most of the % with a conditional subtraction, for a bit more performance.
typedef unsigned long long BIG;
BIG mod_multiply( BIG A, BIG B, BIG C )
{
BIG mod_product = 0;
// A %= C; may or may not help performance
B %= C;
while (A) {
if (A & 1) {
mod_product += B;
if (mod_product >= C) mod_product -= C;
}
A >>= 1;
B <<= 1;
if (B >= C) B -= C;
}
return mod_product;
}
This version has only one long integer modulo -- it may even be faster than the large-chunk method, depending on how your processor implements integer modulo.
Live demo: https://ideone.com/1pTldb -- same result as Yakk's.
An implementation of the approach from this earlier Stack Overflow answer:
#include <stdint.h>
#include <tuple>
#include <iostream>
typedef std::tuple< uint32_t, uint32_t > split_t;
split_t split( uint64_t a )
{
static const uint32_t mask = -1;
auto retval = std::make_tuple( mask&a, ( a >> 32 ) );
// std::cout << "(" << std::get<0>(retval) << "," << std::get<1>(retval) << ")\n";
return retval;
}
typedef std::tuple< uint64_t, uint64_t, uint64_t, uint64_t > cross_t;
template<typename Lambda>
cross_t cross( split_t lhs, split_t rhs, Lambda&& op )
{
return std::make_tuple(
op(std::get<0>(lhs), std::get<0>(rhs)),
op(std::get<1>(lhs), std::get<0>(rhs)),
op(std::get<0>(lhs), std::get<1>(rhs)),
op(std::get<1>(lhs), std::get<1>(rhs))
);
}
// c must have high bit unset:
uint64_t a_times_2_k_mod_c( uint64_t a, unsigned k, uint64_t c )
{
a %= c;
for (unsigned i = 0; i < k; ++i)
{
a <<= 1;
a %= c;
}
return a;
}
// c must have about 2 high bits unset:
uint64_t a_times_b_mod_c( uint64_t a, uint64_t b, uint64_t c )
{
// ensure a and b are < c:
a %= c;
b %= c;
auto Z = cross( split(a), split(b), [](uint32_t lhs, uint32_t rhs)->uint64_t {
return (uint64_t)lhs * (uint64_t)rhs;
} );
uint64_t to_the_0;
uint64_t to_the_32_a;
uint64_t to_the_32_b;
uint64_t to_the_64;
std::tie( to_the_0, to_the_32_a, to_the_32_b, to_the_64 ) = Z;
// std::cout << to_the_0 << "+ 2^32 *(" << to_the_32_a << "+" << to_the_32_b << ") + 2^64 * " << to_the_64 << "\n";
// this line is the one that requires 2 high bits in c to be clear
// if you just add 2 of them then do a %c, then add the third and do
// a %c, you can relax the requirement to "one high bit must be unset":
return
(to_the_0 % c // reduce this term too: a_low*b_low alone can be close to 2^64
+ a_times_2_k_mod_c(to_the_32_a+to_the_32_b, 32, c) // + will not overflow!
+ a_times_2_k_mod_c(to_the_64, 64, c) )
%c;
}
int main()
{
uint64_t retval = a_times_b_mod_c( 1901000000000000000ULL, 1011000000000000, 1231231231231211 );
std::cout << retval << "\n";
}
The idea here is to split your 64-bit integer into a pair of 32-bit integers, which are safe to multiply in 64-bit land.
We express a*b as (a_high * 2^32 + a_low) * (b_high * 2^32 + b_low), do the 4-fold multiplication (keeping track of the 2^32 factors without storing them in our bits), then note that a * 2^k % c can be computed by k repeats of the pattern ((a*2 %c) *2%c).... So we can take this 3- to 4-term polynomial of 64-bit integers in 2^32 and reduce it without having to worry about overflow.
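Written out, with a_h = ⌊a / 2^32⌋, a_l = a mod 2^32 and likewise for b (notation introduced here for illustration), the identity the code relies on is shown below. In the code, to_the_0 is a_l b_l, to_the_32_a + to_the_32_b is a_h b_l + a_l b_h, and to_the_64 is a_h b_h.

```latex
\begin{aligned}
ab &= (a_h 2^{32} + a_l)(b_h 2^{32} + b_l) \\
   &= a_h b_h \, 2^{64} + (a_h b_l + a_l b_h) \, 2^{32} + a_l b_l, \\
ab \bmod c &= \bigl( (a_l b_l \bmod c) + ((a_h b_l + a_l b_h) \, 2^{32} \bmod c) + (a_h b_h \, 2^{64} \bmod c) \bigr) \bmod c.
\end{aligned}
```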
The expensive part is the a_times_2_k_mod_c function (the only loop).
You can make it go many times faster if you know that c has more than one high bit clear.
You could instead replace the a %= c with subtraction a -= (a>=c)*c;
Doing both isn't all that practical.
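A sketch of the conditional-subtraction variant of a_times_2_k_mod_c, keeping only the initial modulo. The name `a_times_2_k_mod_c_sub` is hypothetical. Like the original, it requires the high bit of c to be clear so that doubling cannot overflow.

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical variant: a * 2^k mod c, with the per-iteration % replaced by
// a branch-free conditional subtraction. Requires c != 0 and c < 2^63.
uint64_t a_times_2_k_mod_c_sub(uint64_t a, unsigned k, uint64_t c) {
    assert(c != 0 && (c >> 63) == 0);
    a %= c;                    // now a < c
    for (unsigned i = 0; i < k; ++i) {
        a <<= 1;               // a < 2c, which still fits because c < 2^63
        a -= (a >= c) * c;     // at most one subtraction brings it back below c
    }
    return a;
}
```

Since a < c before each doubling, a single subtraction always suffices, which is why `>=` (not `>`) is the right comparison.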