Turning off the leftmost non-zero bit of a number - C++

How can I turn off the leftmost non-zero bit of a number in O(1)?
For example:
n = 366 (base 10) = 101101110 (base 2)
After turning off the leftmost non-zero bit, the number becomes 001101110 (110 in base 10).
n will always be > 0.

Well, if you insist on O(1) under all circumstances, the Intel intrinsic _bit_scan_reverse(), defined in immintrin.h, does a hardware search for the most-significant non-zero bit of an int.
Although the Intel Intrinsics Guide describes the operation with a loop (as a functional equivalent), I believe it is constant time, given its fixed latency of 3 (as listed in the guide).
The function returns the index of the most-significant non-zero bit, so a simple
n = n & ~(1 << _bit_scan_reverse(n));
should do.
The intrinsic's result is undefined for n == 0, so you have to watch out for that case. I'm following the assumption in your original post that n > 0.
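For compilers that don't provide _bit_scan_reverse, the same index can be obtained from the GCC/Clang builtin __builtin_clz: for a non-zero 32-bit unsigned value, 31 - __builtin_clz(n) is the index of the most-significant set bit. A minimal sketch, assuming GCC or Clang and a 32-bit unsigned int (the name clear_msb is illustrative):

```cpp
#include <cassert>

// Clears the most-significant set bit of n.
// Assumes GCC/Clang (__builtin_clz) and a 32-bit unsigned int.
unsigned clear_msb(unsigned n) {
    assert(n != 0);                               // __builtin_clz(0) is undefined, like the intrinsic
    const int msb_index = 31 - __builtin_clz(n);  // same value _bit_scan_reverse(n) would return
    return n & ~(1u << msb_index);                // 1u keeps a shift into bit 31 well defined
}
```

For example, clear_msb(366) yields 110 (binary 001101110).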

Write n = 2^x + y, where 0 <= y < 2^x.
Then x = floor(log2(n)),
so your highest set bit is bit x.
To reset that bit:
number &= ~(1 << x);
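One way to get x without a bit loop is std::ilogb from <cmath>, which returns the binary exponent of a floating-point value as an int. Every 32-bit int converts to double exactly, so for n > 0 the exponent equals floor(log2(n)) without relying on how log2 rounds. A sketch under those assumptions (the function name is illustrative):

```cpp
#include <cassert>
#include <cmath>

// Clears the highest set bit of n, using x = floor(log2(n)) = std::ilogb(n).
// Assumes n > 0 and a 32-bit int (every such value converts to double exactly).
int clear_highest_bit(int n) {
    assert(n > 0);
    const int x = std::ilogb(static_cast<double>(n));  // binary exponent of n = index of its highest set bit
    return n & ~(1 << x);                              // x <= 30 for a positive int, so the shift is safe
}
```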
Another approach:
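#include <iostream>

// Returns the value of the highest set bit of i, i.e. 2^x (assumes i > 0).
int highestOneBit(int i) {
    // Smear the highest set bit into every position below it.
    i |= (i >> 1);
    i |= (i >> 2);
    i |= (i >> 4);
    i |= (i >> 8);
    i |= (i >> 16);
    // i is now 2^(x+1) - 1; subtracting i >> 1 = 2^x - 1 leaves 2^x.
    return i - (i >> 1);
}

int main() {
    int n = 32767;
    int z = highestOneBit(n);      // returns the highest set bit, i.e. 2^x
    std::cout << (n & (~z));       // resets the highest set bit (prints 16383)
    return 0;
}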
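After the OR-and-shift steps in highestOneBit, every bit from the highest set bit downward is 1, so shifting that value right by one gives a mask of exactly the bits strictly below the highest set bit. ANDing n with that mask clears the top bit directly, without the subtraction. A small sketch of that variant, using an unsigned type so the shifts are well defined (the function name is made up for illustration):

```cpp
#include <cstdint>

// Clears the highest set bit of n by "smearing" it rightwards first.
std::uint32_t clear_highest_by_smear(std::uint32_t n) {
    std::uint32_t s = n;
    s |= s >> 1;   // after these five steps, every bit below the highest set bit is 1
    s |= s >> 2;
    s |= s >> 4;
    s |= s >> 8;
    s |= s >> 16;
    return n & (s >> 1);  // s >> 1 keeps only the bits strictly below the highest set bit
}
```

Unlike the intrinsic-based versions, this one also returns 0 for n == 0 instead of being undefined.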

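Check out this question for a possibly faster solution that uses a processor instruction.
However, an O(lg N) solution is:
int cmsb(int x)
{
    unsigned int count = 0;
    int t = x;                  // shift a copy, so that x itself is preserved
    while (t >>= 1) {           // count how many halvings it takes to reach 0
        ++count;
    }
    return x & ~(1 << count);   // count is now the index of the highest set bit
}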

If ANDN is not supported but LZCNT is, the fastest O(1) way to do it is not something along the lines of n = n & ~(1 << _bit_scan_reverse(n)); but rather:
int reset_highest_set_bit(int x)
{
    const int mask = 0x7FFFFFFF; // 011111111[...]
    // For x > 0, __builtin_clz(x) >= 1, and shifting the mask right by it leaves
    // exactly the bits below the highest set bit of x (undefined for x == 0).
    return x & (mask >> __builtin_clz(x));
}

Related

Next largest integer with some middle bits matching a pattern?

My input is:
a bit mask "mask" of width n at some offset k >= 0;
a bit pattern "pattern" with 1s in some (but not necessarily all) of the positions where the bit mask has 1s;
an integer val.
I want to find the next largest integer result such that:
result > val
result & mask == pattern
For example, suppose mask = 0xFF00 and pattern = 0x0100. Then we expect the following result:
NextLargest(mask, pattern, 0x00000) => 0x00100
NextLargest(mask, pattern, 0x000FF) => 0x00100
NextLargest(mask, pattern, 0x010FE) => 0x001FF
NextLargest(mask, pattern, 0x010FF) => 0x10100
Another example: say mask = 0xF and pattern = 0xF. Then we expect:
NextLargest(mask, pattern, 0x20) => 0x2F.
I've tried something like "strip out the bits that mask cares about, increment what's left, OR the pattern back in and return", but I keep hitting edge cases. The problem is something like a generalization of finding the next largest multiple of some integer.
Here's my attempt so far (runnable link: https://ideone.com/AhXG5M):
#include <iostream>
using namespace std;
using uint32 = unsigned long;
uint32 NextLargest(int width, int offset, uint32 mask, uint32 pattern, uint32 val) {
    unsigned long long ret = (val + 1) & ~mask;
    if ((ret & ((1 << (offset + 1)) - 1)) == 0) {
        // "carry" across the mask
        ret += 1 << (offset + width);
    }
    return ret | pattern;
}
int main() {
    int width = 12;
    int offset = 4;
    uint32 significant_bits = (1 << (width + 1) - 1) << offset;
    uint32 wanted_bits = 0xFFF << offset;
    cout << hex;
    // want 0xFFF1 -- correct
    cout << NextLargest(width, offset, significant_bits, wanted_bits, 0) << endl;
    // want 0xFFF2 -- correct
    cout << NextLargest(width, offset, significant_bits, wanted_bits, 1) << endl;
    // want 0x1FFFF0 -- incorrect, get 0xFFF0
    cout << NextLargest(width, offset, significant_bits, wanted_bits, 0xF) << endl;
    return 0;
}
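I didn't test this, but the following algorithm should work (pseudocode):
let mask, pattern, and val be inputs
let fls be a function that returns the 0-based index of the last (most significant) bit set in a word
let ffs be a function that returns the 0-based index of the first (least significant) bit set in a word
let low_order_mask be (1 << ffs(mask)) - 1     // the bits below the mask
let applied be (val & ~mask) | pattern
if applied > val then
    return applied & ~low_order_mask           // the pattern alone makes it larger; clear the low bits
if applied == val and (~val & low_order_mask) != 0 then
    return applied + 1                         // low bits are not all ones: no need to carry
// need to carry into the bits above the mask
let set_low_zero be applied & ~low_order_mask
let carry be 1 << (fls(mask) + 1)
return set_low_zero + carry
ffs is provided by POSIX (in <strings.h>) and fls by the BSDs and macOS, but both number bits starting at 1 (and return 0 when no bit is set), so subtract 1 to get the 0-based indices used above; other systems might not provide them at all. There are answers on SO for how to implement these if you need to.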
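A minimal C++ sketch of the three cases (the pattern alone makes the value larger, value + 1 without a carry, or a carry into the bits above the mask). It assumes GCC/Clang builtins in place of ffs/fls (__builtin_ctz and 31 - __builtin_clz give 0-based indices), a non-zero 32-bit mask, pattern bits only inside the mask, and a result that fits in 32 bits. It returns the smallest result greater than val; the name next_with_pattern is made up for illustration:

```cpp
#include <cstdint>

// Smallest result > val with (result & mask) == pattern.
// Assumes mask != 0, (pattern & ~mask) == 0, and that the answer fits in 32 bits.
std::uint32_t next_with_pattern(std::uint32_t mask, std::uint32_t pattern, std::uint32_t val) {
    const int lo = __builtin_ctz(mask);       // 0-based index of the lowest mask bit
    const int hi = 31 - __builtin_clz(mask);  // 0-based index of the highest mask bit
    const std::uint32_t low_order_mask = (std::uint32_t{1} << lo) - 1;  // bits below the mask
    const std::uint32_t applied = (val & ~mask) | pattern;
    if (applied > val)
        return applied & ~low_order_mask;     // pattern already makes it larger: clear the low bits
    if (applied == val && (~val & low_order_mask) != 0)
        return applied + 1;                   // low bits can still grow without touching the mask
    // carry into the first bit above the mask (assumes hi < 31)
    return (applied & ~low_order_mask) + (std::uint32_t{1} << (hi + 1));
}
```

With mask = 0xFF00 and pattern = 0x0100, this gives 0x0100 for val = 0x00FF, 0x10124 for val = 0x10123, and 0x20100 for val = 0x101FF.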
Think of the value as broken into three parts:
the bits above the mask, the bits in the mask, and the bits below the mask,
written H(value), M(value), L(value).
We know M(result) == pattern.
We have three candidates, written as (high, middle, low) parts:
C1 is (H(value), pattern, 0).
C2 is (H(value), pattern, L(value) + 1).
C3 is C1 + X, where X == (mask << 1) & ~mask, the lowest bit above the mask; that is, (H(value) + 1, pattern, 0).
If pattern > M(value), we can use C1:
reducing the high bits would give a number < value, and setting any low bits would only increase the number.
If pattern == M(value), then we can try C2, which is actually value + 1.
That fails if adding one overflows into the pattern bits,
which means all the low bits are set, and the next lowest place to add is the first bit above the mask: C3.
Otherwise (pattern < M(value)), the answer is also C3.
Here it is with some unit tests:
#include <iostream>
unsigned next_masked(unsigned mask,unsigned pattern,unsigned value){
    unsigned reduced_pattern=(mask&pattern);//May not be required...
    unsigned over_add=(mask<<1)&~mask;
    unsigned upper_mask=~(over_add-1);
    unsigned cand=(value&upper_mask)|reduced_pattern;
    if(cand>value){
        return cand;
    }
    if((value&mask)==reduced_pattern){
        unsigned scand=value+1;
        if((scand&mask)==reduced_pattern){
            return scand;
        }
    }
    return cand + over_add;
}
bool invariant_next_masked(unsigned mask,unsigned pattern,unsigned value,unsigned result){
    if((result&mask)!=(pattern&mask)){
        return false;
    }
    if(result<=value){
        return false;
    }
    for(unsigned test=result-1;test>value;--test){
        if((test&mask)==(pattern&mask)){
            return false;
        }
    }
    return true;
}
int check_next_masked(unsigned mask,unsigned pattern,unsigned value,unsigned expect){
    unsigned result=next_masked(mask,pattern,value);
    if(result!=expect){
        std::cout << std::hex << mask << ' ' << std::hex << pattern << ' ' << std::hex <<value << "==" << std::hex <<result << "!=" << std::hex <<expect <<'\n';
        return 1;
    }
    if(!invariant_next_masked(mask,pattern,value,result)){
        return 1;
    }
    return 0;
}
int main() {
    int errors=0;
    errors+=check_next_masked(0xFF00,0x0100,0x0000,0x00100);
    errors+=check_next_masked(0xFF00,0x0100,0x00FF,0x00100);
    errors+=check_next_masked(0xFF00,0x0100,0x10FE,0x10100);
    errors+=check_next_masked(0xFF00,0x0100,0x1067,0x10100);
    errors+=check_next_masked(0xFF00,0x0100,0x10123,0x10124);
    errors+=check_next_masked(0xFF00,0x0100,0x110FF,0x20100);
    errors+=check_next_masked(0xFF00,0x0100,0x102FF,0x20100);
    errors+=check_next_masked(0xFF00,0x0100,0x101FF,0x20100);
    errors+=check_next_masked(0x000F,0x0007,0x10123,0x10127);
    errors+=check_next_masked(0x000F,0x0007,0x10128,0x10137);
    errors+=check_next_masked(0x0FF0,0x0230,0x10128,0x10230);
    errors+=check_next_masked(0x0FFF0,0x01230,0x01231,0x01232);
    errors+=check_next_masked(0x0FFF0,0x01230,0x41237,0x41238);
    errors+=check_next_masked(0x0FFF0,0x01230,0x4123F,0x51230);
    if(errors>0){
        std::cout << "Errors "<< errors << '\n';
        return 1;
    }
    std::cout << "Success\n";
    return 0;
}
Is the problem to calculate the LARGEST or the SMALLEST next value? The largest value seems odd to me. If the requirement is to calculate the smallest value, I think this code should work (tested on gcc 7.1, assuming a 64-bit target where sizeof(void *) == sizeof(size_t) == sizeof(uint64_t)):
#include <cassert>
#include <cstddef>

// Mask with the low k bits set (k < 64).
static size_t ones(int k) { return (size_t{1} << k) - 1; }

size_t next_smallest_value (size_t mask, size_t pattern, size_t x) {
    assert(mask != 0 && (pattern & mask) == pattern);
    // only change bits within mask range to meet the requirement
    auto y = (x & ~mask) | pattern;
    auto lsb_mask = __builtin_ctzll(mask);   // index of the lowest mask bit
    if (y > x) {
        // if the operation increased the value,
        // mask off all the lower bits
        return y & ~ones(lsb_mask);
    }
    if (y == x && (~x & ones(lsb_mask)) != 0) {
        // the pattern already matches and the lower part can still grow by 1
        return x + 1;
    }
    // otherwise, the operation decreased the value, or the lower part is all ones:
    // increase the part higher than the mask
    auto msb_mask = 63 - __builtin_clzll(mask);   // index of the highest mask bit
    // higher part cannot be empty if the masked part must decrease
    assert(msb_mask < 63);
    auto higher = ((y >> (msb_mask + 1)) + 1) << (msb_mask + 1);
    // also higher part cannot overflow
    assert(higher != 0);
    return (y & mask) | higher;
}
The idea is very simple: divide the bits into 3 parts: the higher part, the masked part and the lower part. The masked part is determined directly by the mask and pattern; it cannot take any other value.
After calculating the masked bits, if the value increased, just clear all bits in the lower part. If the value is unchanged and the lower part is not all ones, the answer is simply x + 1. Otherwise, increase the higher part by 1 (and also clear all the lower bits).
The code above doesn't deal with ill-formed input: such input will trigger the assertions, but the checks are not exhaustive.
Here are two functions for this somewhat confusing question.
The first function gives the largest next integer that fulfills the requirements for the result.
The second one gives the SMALLEST next value.
1: Get the LARGEST integer which satisfies result & mask == pattern and result > val:
#include <stdexcept>

unsigned NextLargest (unsigned mask, unsigned pattern, unsigned val) {
    // zero "mask" bits and set "pattern" bits in the largest (unsigned) int
    unsigned const x = ~mask | pattern;
    // if result is not greater than val, we can't satisfy the requirements
    if (x <= val) {
        // report the error (or return an error code instead)
        throw std::out_of_range("no valid result greater than val");
    }
    return x;
}
This simply returns the highest (unsigned) integer value that satisfies result & mask == pattern. The if-clause checks whether that value is not greater than val, in which case no valid result exists and the function fails.
2: Get the SMALLEST next value after val that meets the requirements:
#include <stdexcept>

unsigned NextSmallest (unsigned mask, unsigned pattern, unsigned val) {
    // clear the masked bits of val + mask + 1, then set the pattern bits
    unsigned const x = ((val + mask + 1) & ~mask) | pattern;
    if (x <= val) {
        // the increment wrapped, so we can't give a greater value
        throw std::out_of_range("increment wrapped around");
    }
    return x;
}
Edit: changed (val|mask) to val+mask because the result must still be greater than val.
This function calculates val + 1, carrying overflowing bits over the masked bits.
Here are a few examples of what the function does, if mask = 0x0ff00 and pattern = 0x00500:
val       +mask     +1        &~mask    |pattern == result
0x00000   0x0ff00   0x0ff01   0x00001   0x00501
0x00001   0x0ff01   0x0ff02   0x00002   0x00502
0x000fe   0x0fffe   0x0ffff   0x000ff   0x005ff
0x000ff   0x0ffff   0x10000   0x10000   0x10500
0x00100   0x10000   0x10001   0x10001   0x10501
0x0f000   0x1ef00   0x1ef01   0x10001   0x10501
0x0ff00   0x1fe00   0x1fe01   0x10001   0x10501
0x0ffff   0x1feff   0x1ff00   0x10000   0x10500
0x10000   0x1ff00   0x1ff01   0x10001   0x10501
0x10001   0x1ff01   0x1ff02   0x10002   0x10502
0x100ff   0x1ffff   0x20000   0x20000   0x20500
After a lot of editing and rewriting, I still can't give a good enough answer to the question; its examples have weird results. I'm leaving this here in case someone finds these functions, or parts of them, useful. Also, I did not actually test the functions on a computer.

Swapping the lower byte (bits 0-7) with the higher byte (bits 8-15)

I now know how it's done in one line, although I fail to realise why my first draft doesn't work as well. What I'm trying to do is save the lower part into a different variable, shift the higher byte to the right and combine the two numbers via OR. However, it just cuts off the lower half of the hexadecimal value and returns the rest.
short int method(short int number) {
    short int a = 0;
    for (int x = 8; x < 16; x++){
        if ((number & (1 << x)) == 1){
            a = a | (1<<x);
        }
    }
    number = number >> 8;
    short int solution = number | a;
    return solution;
}
You are doing it one bit at a time; a better approach would do it with a single operation:
uint16_t method(uint16_t number) {
    return (number << 8) | (number >> 8);
}
The code above specifies a 16-bit unsigned type explicitly, thus avoiding issues related to sign extension. You need to include <stdint.h> (or <cstdint> in C++) in order for this to compile.
if ((number & (1 << x)) == 1)
This is only going to be true if x is 0: number & (1 << x) clears every bit except bit x, so the result is either 0 or 1 << x, and 1 << x equals 1 (binary 00000000 00000001) only when x is 0. Since your loop starts at x = 8, the condition is never true.
You don't care whether it's 1; you just care whether it's non-zero. Use
if (number & (1 << x))
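For comparison, here is a sketch of the approach the question describes (save one byte, move the other, then OR them together), written with uint16_t. With short int, a negative input would make number >> 8 bring in copies of the sign bit on typical arithmetic-shift implementations, and the OR would keep those extra 1s; an unsigned type always shifts in zeros. The function name is illustrative:

```cpp
#include <cstdint>

// Save the lower byte, move the higher byte down, then OR the two together.
// Unsigned types mean the right shift always brings in zeros (no sign extension).
std::uint16_t swap_bytes_stepwise(std::uint16_t number) {
    const std::uint16_t low = number & 0x00FFu;  // bits 0-7
    const std::uint16_t high = number >> 8;      // bits 8-15, moved down to bits 0-7
    return static_cast<std::uint16_t>((low << 8) | high);  // lower byte goes up, higher byte comes down
}
```

For example, swap_bytes_stepwise(0x1234) gives 0x3412.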

Swapping two bits in an integer as quickly as possible

I've been looking at some of these books with fun interview problems. One has a question where one is supposed to write code to swap two bits in a 64-bit integer given the indices of the two bits. After playing around with this for a while, I came up with the following code, which is faster than the solution given in the textbook, since it doesn't have any branches:
uint64_t swapbits(uint64_t n, size_t i, size_t j)
{
    // extract ith and jth bit
    uint64_t bi = ((uint64_t) 0x1 << i) & n;
    uint64_t bj = ((uint64_t) 0x1 << j) & n;
    // clear ith and jth bit in n
    n ^= bi | bj;
    n ^= (bi >> i) << j;
    n ^= (bj >> j) << i;
    return n;
}
My question is essentially the following: Is there an even faster way of doing this?
EDIT: Here's the other implementation as reference:
uint64_t swapbits(uint64_t x, size_t i, size_t j)
{
    if(((x >> i) & 1) != ((x >> j) & 1)) {
        x ^= (1L << i) | (1L << j);
    }
    return x;
}
With compiler optimizations the latter is around 35% slower on a Core i7 4770. As I said in the comments, I'm interested in whether there are any interesting tricks for doing this very efficiently. I've seen some extremely clever bit fiddling tricks that can do something that looks fairly complicated in just a few instructions.
Here's a solution which uses only 8 operations. Note that this works even when i == j.
uint64_t swapbits(uint64_t n, size_t i, size_t j)
{
    uint64_t x = ((n >> i) ^ (n >> j)) & 1; // x = 1 bit "toggle" flag
    return n ^ ((x << i) | (x << j)); // apply toggle to bits i and j
}
Explanation: x is equal to 1 only if the original bits at indices i and j are different (10 or 01), and therefore need to be toggled. Otherwise it's zero and the bits are to remain unchanged (00 or 11). We then apply this toggle bit to the original bits (i.e. XOR it with the original bits) to get the required result.
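Because branchless bit code is easy to get subtly wrong, a brute-force comparison against the branching reference from the question is a cheap sanity check. A sketch that checks every index pair (including i == j) for a given value; it uses 1ULL rather than the question's 1L so the reference also works where long is 32 bits. The helper names are illustrative:

```cpp
#include <cstddef>
#include <cstdint>

// Branchless swap: toggle bits i and j exactly when they differ.
std::uint64_t swapbits(std::uint64_t n, std::size_t i, std::size_t j) {
    std::uint64_t x = ((n >> i) ^ (n >> j)) & 1;  // 1 if bits i and j differ
    return n ^ ((x << i) | (x << j));
}

// Branching reference, as in the question (with 1ULL instead of 1L).
std::uint64_t swapbits_ref(std::uint64_t x, std::size_t i, std::size_t j) {
    if (((x >> i) & 1) != ((x >> j) & 1)) {
        x ^= (1ULL << i) | (1ULL << j);
    }
    return x;
}

// True if both versions agree for every pair of indices in [0, 64).
bool agrees_for_all_indices(std::uint64_t n) {
    for (std::size_t i = 0; i < 64; ++i)
        for (std::size_t j = 0; j < 64; ++j)
            if (swapbits(n, i, j) != swapbits_ref(n, i, j))
                return false;
    return true;
}
```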

Compact a hex number

Is there a clever (i.e., branchless) way to "compact" a hex number, that is, to move all the 0 digits to one side?
For example:
0x10302040 -> 0x13240000
or
0x10302040 -> 0x00001324
I looked on Bit Twiddling Hacks but didn't see anything.
It's for an SSE numerical pivoting algorithm. I need to remove any pivots that become 0. I can use _mm_cmpgt_ps to find good pivots, _mm_movemask_ps to convert that into a mask, and then bit hacks to get something like the above. The hex value gets munged into a mask for a _mm_shuffle_ps instruction to perform a permutation on the 128-bit SSE register.
To compute the mask for _pext:
mask = arg;
mask |= ((mask << 1) & 0xAAAAAAAA) | ((mask >> 1) & 0x55555555);
mask |= ((mask << 2) & 0xCCCCCCCC) | ((mask >> 2) & 0x33333333);
First OR together the bits within each pair, then within each quad (nibble). The masks prevent shifted values from spilling over into other digits.
After computing the mask this way or harold's way (which is probably faster), you don't need the full power of _pext, so if the target hardware doesn't support it you can replace it with this:
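for (int i = 0; i < 7; i++) {
    uint32_t stay_mask = mask & (~mask - 1);   // trailing run of set bits: nibbles already in place
    arg = (arg & stay_mask) | ((arg >> 4) & ~stay_mask);
    mask = stay_mask | (mask >> 4);
}
Each iteration moves every nibble above the already-placed run one digit to the right, filling the lowest gap. stay_mask marks the bits that are already in their final positions (~mask - 1 equals ~(mask + 1), so mask & (~mask - 1) keeps just the trailing run of 1s in mask). This uses somewhat fewer operations than the Hacker's Delight solution, but might still benefit from branching.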
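A sketch that puts the mask computation and the loop from this answer together into one portable routine that compacts toward the least-significant end (0x10302040 -> 0x00001324) without _pext. It assumes 32-bit values; the parentheses are only for readability, and the function name compact_right is made up for illustration:

```cpp
#include <cstdint>

// Compacts the non-zero hex digits of arg toward the least-significant end.
std::uint32_t compact_right(std::uint32_t arg) {
    // Step 1: turn every non-zero nibble of arg into 0xF in mask.
    std::uint32_t mask = arg;
    mask |= ((mask << 1) & 0xAAAAAAAAu) | ((mask >> 1) & 0x55555555u);  // OR within bit pairs
    mask |= ((mask << 2) & 0xCCCCCCCCu) | ((mask >> 2) & 0x33333333u);  // OR within nibbles

    // Step 2: at most 7 gaps can sit below the highest non-zero digit,
    // and each pass closes the lowest one.
    for (int i = 0; i < 7; i++) {
        std::uint32_t stay_mask = mask & (~mask - 1);  // trailing 0xF nibbles: already in place
        arg = (arg & stay_mask) | ((arg >> 4) & ~stay_mask);
        mask = stay_mask | (mask >> 4);
    }
    return arg;
}
```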
Supposing we can use _pext_u32, the issue then is computing a mask that has an F for every nibble that isn't zero. I'm not sure what the best approach is, but you can compute the OR of the 4 bits of each nibble and then "spread" it back out to F's like this:
// calculate horizontal OR of every nibble
x |= x >> 1;
x |= x >> 2;
// clean up junk
x &= 0x11111111;
// spread
x *= 0xF;
Then use that as the mask of _pext_u32.
_pext_u32 can be emulated by this (taken from Hacker's Delight, figure 7.6)
unsigned compress(unsigned x, unsigned m) {
    unsigned mk, mp, mv, t;
    int i;
    x = x & m; // Clear irrelevant bits.
    mk = ~m << 1; // We will count 0's to right.
    for (i = 0; i < 5; i++) {
        mp = mk ^ (mk << 1); // Parallel prefix.
        mp = mp ^ (mp << 2);
        mp = mp ^ (mp << 4);
        mp = mp ^ (mp << 8);
        mp = mp ^ (mp << 16);
        mv = mp & m; // Bits to move.
        m = m ^ mv | (mv >> (1 << i)); // Compress m.
        t = x & mv;
        x = x ^ t | (t >> (1 << i)); // Compress x.
        mk = mk & ~mp;
    }
    return x;
}
But that's a bit of a disaster. It's probably better to just resort to branching code then.
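For reference, a sketch combining the nibble-mask trick and the Hacker's Delight compress routine from this answer, so that compress stands in for _pext_u32(x, mask); the result is compacted toward the least-significant end. It assumes 32-bit unsigned values; nonzero_nibble_mask and compact_hex are illustrative names:

```cpp
#include <cstdint>

// Hacker's Delight "compress": software equivalent of _pext_u32(x, m).
std::uint32_t compress(std::uint32_t x, std::uint32_t m) {
    std::uint32_t mk, mp, mv, t;
    x = x & m;                    // clear irrelevant bits
    mk = ~m << 1;                 // we will count 0's to the right
    for (int i = 0; i < 5; i++) {
        mp = mk ^ (mk << 1);      // parallel prefix
        mp = mp ^ (mp << 2);
        mp = mp ^ (mp << 4);
        mp = mp ^ (mp << 8);
        mp = mp ^ (mp << 16);
        mv = mp & m;                          // bits to move
        m = (m ^ mv) | (mv >> (1 << i));      // compress m
        t = x & mv;
        x = (x ^ t) | (t >> (1 << i));        // compress x
        mk = mk & ~mp;
    }
    return x;
}

// 0xF in every nibble of x that is non-zero, 0x0 elsewhere.
std::uint32_t nonzero_nibble_mask(std::uint32_t x) {
    x |= x >> 1;
    x |= x >> 2;          // bit 4k now holds the OR of nibble k
    x &= 0x11111111u;     // keep only those bits
    return x * 0xFu;      // spread each 1 back out to 0xF (no carries between nibbles)
}

std::uint32_t compact_hex(std::uint32_t x) {
    return compress(x, nonzero_nibble_mask(x));
}
```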
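uint32_t fun(uint32_t val) {
    uint32_t retVal(0x00);
    uint32_t sa(28);                          // where the next non-zero digit goes
    for (int sb(28); sb >= 0; sb -= 4) {      // scan digits from most to least significant
        if (val & (0x0Fu << sb)) {
            // copy the non-zero digit from position sb up to position sa
            retVal |= (val & (0x0Fu << sb)) << (sa - sb);
            sa -= 4;
        }
    }
    return retVal;
}
I think this (or something similar) is what you're looking for: it eliminates the 0 nibbles within a number. I've not debugged it, and it only compacts toward one side (the most-significant end) at the moment.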
If your processor supports conditional instruction execution, you may get a benefit from this algorithm:
uint32_t compact(uint32_t orig_value)
{
    uint32_t mask = 0xF0000000u; // Mask for the next destination hex digit.
    uint32_t new_value = 0u;
    for (unsigned int i = 0; i < 8; ++i) // Each pass consumes one of the 8 hex digits
    {
        if ((orig_value & mask) == 0u)
        {
            orig_value = orig_value << 4; // Zero digit: shift the remaining digits up by 1 digit
        }
        else
        {
            new_value |= orig_value & mask; // Non-zero digit: keep it
            mask = mask >> 4;               // and move on to the next destination digit
        }
    }
    return new_value;
}
This looks like a good candidate for loop unrolling.
The algorithm assumes that when the original value is shifted left, zeros are shifted in, filling in the "empty" bits.
Edit 1:
On a processor that supports conditional execution of instructions, the shift of the original value (and the copy-and-advance) would be conditionally executed depending on the result of ANDing the original value with the mask. Thus there is no branching, only ignored instructions.
I came up with the following solution. Please take a look; maybe it will help you.
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
class IsZero
{
public:
    bool operator ()(char c) const
    {
        return '0' == c;
    }
};
int main()
{
    int a = 0x01020334; // INPUT
    ostringstream my_sstream;
    my_sstream << hex << a;
    string str = my_sstream.str();
    cout << "Input hex: " << str << endl;
    // Count the zeros first: the order in which function arguments are evaluated is unspecified.
    const auto zeros = count_if(begin(str), end(str), IsZero());
    str.erase(remove_if(begin(str), end(str), IsZero()), end(str));  // drop the '0' digits
    str.append(static_cast<string::size_type>(zeros), '0');          // pad back to the original length
    cout << "Processed hex: " << str << endl;
    return 0;
}
Output:
Input hex: 1020334
Processed hex: 1233400

What is the fastest way to calculate the number of bits needed to store a number

I'm trying to optimize some bit packing and unpacking routines. In order to do the packing I need to calculate the number of bits needed to store integer values. Here is the current code.
if (n == -1) return 32;
if (n == 0) return 1;
int r = 0;
while (n)
{
    ++r;
    n >>= 1;
}
return r;
Non-portably, use the bit-scan-reverse instruction available on most modern architectures (BSR or LZCNT on x86, CLZ on ARM). Visual C++ exposes it as the intrinsic _BitScanReverse.
Portably, the code in the question doesn't need the edge-case handling. Why do you need one bit to store 0? In any case, I'll ignore the edges of the problem. The guts can be done efficiently like this (for n > 0):
unsigned u = n;   // shift an unsigned copy to avoid sign extension
int r = 1;        // any n > 0 needs at least one bit
if (u >> 16) { r += 16; u >>= 16; }
if (u >> 8)  { r += 8;  u >>= 8; }
if (u >> 4)  { r += 4;  u >>= 4; }
if (u >> 2)  { r += 2;  u >>= 2; }
if (u >> 1)  ++r;     // u is now 1, 2 or 3
return r;
You're looking to determine the integer log base 2 of a number (the highest bit set); the number of bits needed is one more than that. Sean Anderson's "Bit Twiddling Hacks" page has several methods, ranging from the obvious counting of bits in a loop to versions that use table lookup. Note that most of the methods shown will need to be modified a bit to work with 64-bit ints, if that kind of portability is important to you.
http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious
Just make sure that any shifting you use to find the highest set bit is done on an unsigned version of the number, since a compiler implementation might or might not sign-extend the >> operation on a signed value.
What you are trying to do is find the most significant bit. Some architectures have a special instruction just for this purpose. For those that don't, use a table lookup method.
Create a table of 256 entries in which each element identifies the uppermost set bit of that byte value.
Either loop through each byte of the number, or use a few if-statements to find the highest-order non-zero byte.
I'll let you take the rest from here.
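A sketch of the table approach for 32-bit values. As an assumption of this sketch, the table stores the bit length of each byte value (uppermost set bit index + 1, and 0 for a zero byte), so the lookup directly gives the number of bits needed; the if-chain finds the highest non-zero byte. Names are illustrative:

```cpp
#include <array>
#include <cstdint>

// table[b] = number of bits needed for byte value b (0 for b == 0).
static std::array<std::uint8_t, 256> make_bit_length_table() {
    std::array<std::uint8_t, 256> t{};
    for (int b = 1; b < 256; ++b)
        t[b] = static_cast<std::uint8_t>(t[b / 2] + 1);  // bits(b) = bits(b / 2) + 1
    return t;
}

// Number of bits needed to store v (returns 0 for v == 0).
int bit_length(std::uint32_t v) {
    static const std::array<std::uint8_t, 256> table = make_bit_length_table();
    if (v >> 24) return 24 + table[v >> 24];   // highest non-zero byte is byte 3
    if (v >> 16) return 16 + table[v >> 16];   // ... byte 2
    if (v >> 8)  return 8 + table[v >> 8];     // ... byte 1
    return table[v];                           // ... byte 0
}
```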
Do a binary search instead of a linear search.
if ((n >> 16) != 0)
{
    r += 16;
    n >>= 16;
}
if ((n >> 8) != 0)
{
    r += 8;
    n >>= 8;
}
if ((n >> 4) != 0)
{
    r += 4;
    n >>= 4;
}
// etc.
If your hardware has bit-scan-reverse, an even faster approach would be to write your routine in assembly language. To keep your code portable, you could do
#ifdef ARCHITECTURE_WITH_BSR
asm // ...
#else
// Use the approach shown above
#endif
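In practice, compilers expose the instruction through intrinsics, so inline assembly is usually unnecessary. A sketch that uses __builtin_clz on GCC/Clang and _BitScanReverse (from <intrin.h>) on MSVC, falling back to the shift loop elsewhere; returning 0 for n == 0 is an assumption of this sketch, and the function name is illustrative:

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Number of bits needed to store n (0 for n == 0).
int bits_needed(std::uint32_t n) {
    if (n == 0) return 0;                   // both intrinsics are undefined for 0
#if defined(__GNUC__) || defined(__clang__)
    return 32 - __builtin_clz(n);           // count leading zeros; assumes 32-bit unsigned
#elif defined(_MSC_VER)
    unsigned long index;
    _BitScanReverse(&index, n);             // index of the highest set bit
    return static_cast<int>(index) + 1;
#else
    int r = 0;
    while (n) { ++r; n >>= 1; }             // portable fallback
    return r;
#endif
}
```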
You would have to measure the execution time to find the best granularity, but my guess is that doing 4 bits at a time and then reverting to one bit at a time would make it faster. Log operations would probably be slower than logical/bit operations.
if (n < 0) return 32;
int r = 0;
while (n & 0x7FFFFFF0) {   // bitwise &: some bit above the lowest nibble is still set
    r += 4;
    n >>= 4;
}
while (n) {
    r++;
    n >>= 1;
}
return r;
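number_of_bits = floor(log2(integer_number)) + 1, for integer_number > 0.
Rounding log2 up instead is off by one for exact powers of two: log2(8) = 3, but 8 is 1000 in binary and needs 4 bits.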