Set every nth bit in an integer without for loop - c++

Is there a way to set every nth bit in an integer without using a for loop?
For example, if n = 3, then the result should be ...100100100100. This is easy enough with a for loop, but I am curious if this can be done without one.
--
For my particular application, I need to do this with a custom 256-bit integer type that has all the bit operations that a built-in integer has. I'm currently using lazily initialized tables (built with for loops), and that is good enough for what I'm doing. This was mostly an exercise in bit-twiddling for me, but I couldn't figure out how to do it in a few steps/instructions, and couldn't easily find anything online about this.

… I need to do this with a custom 256-bit integer type.
Set r to 256 % n.
Set d to ((uint256_t) 1 << n) - 1. Then the binary representation of d is a string of n 1 bits.
Set t to UINT256_MAX << r >> r. This removes the top r bits from UINT256_MAX. UINT256_MAX is of course 2^256 - 1. This leaves t as a string of 256 - r 1 bits, and 256 - r is a multiple of n, say k*n.
Set t to t/d. As a string of k*n 1 bits divided by a string of n 1 bits, this produces a quotient that is 000…0001 repeated k times, where each 000…0001 is n-1 0 bits followed by one 1 bit.
Now t is the desired bit pattern except the highest desired bit may be missing if r is not zero. To add this bit, if needed, OR t with t << n.
Now t is the desired value.
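Here is a minimal sketch of these steps, using uint64_t as a stand-in for the custom uint256_t (which is assumed to support the same operators); the function name is mine, and 1 <= n < 64 is assumed:
#include <cassert>
#include <cstdint>
uint64_t every_nth_bit(unsigned n) {
    unsigned r = 64 % n;
    uint64_t d = ((uint64_t) 1 << n) - 1;  // a string of n 1 bits
    uint64_t t = UINT64_MAX << r >> r;     // remove the top r bits
    t /= d;                                // 000...0001 repeated k times
    t |= t << n;                           // restore the top bit when r != 0
    return t;
}
int main() {
    assert(every_nth_bit(3) == 0x9249249249249249);  // bits 0, 3, ..., 63
}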
Alternately:
Set t to 1.
OR t with t << n.
OR t with t << 2*n.
OR t with t << 4*n.
OR t with t << 8*n.
OR t with t << 16*n.
OR t with t << 32*n.
OR t with t << 64*n.
OR t with t << 128*n.
Those shifts must be defined (shifting by zero would suffice) or suppressed when the shift amount exceeds the integer width, 256 bits.
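A sketch of the unrolled version, again with uint64_t standing in (so six doubling steps instead of eight); the guards suppress the out-of-range shifts as described:
#include <cstdint>
uint64_t every_nth_bit_doubling(unsigned n) {
    uint64_t t = 1;
    if (n < 64)      t |= t << n;
    if (2 * n < 64)  t |= t << 2 * n;
    if (4 * n < 64)  t |= t << 4 * n;
    if (8 * n < 64)  t |= t << 8 * n;
    if (16 * n < 64) t |= t << 16 * n;
    if (32 * n < 64) t |= t << 32 * n;
    return t;
}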

Related

parity of set bits after xor of two numbers

I found an observation by testing in C++.
The observation is:
1) If both numbers have an odd number of set bits, then their XOR has an even number of set bits.
2) If both numbers have an even number of set bits, then their XOR has an even number of set bits.
3) If one number has an even number of set bits and the other has an odd number of set bits, then their XOR has an odd number of set bits.
I could not prove it. I want to prove it. Please help me.
The code that I executed on my computer is:
#include <bits/stdc++.h>
using namespace std;
int main() {
    vector<int> vec[4];
    for (int i = 1; i <= 100; i++) {
        for (int j = i + 1; j <= 100; j++) {
            int x = __builtin_popcount(i) % 2;
            int y = __builtin_popcount(j) % 2;
            int in = 0;
            in |= (x << 1);
            in |= (y << 0);
            int v = __builtin_popcount(i ^ j) % 2;
            vec[in].push_back(v);
        }
    }
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < vec[i].size(); j++) cout << vec[i][j] << " ";
        cout << endl;
    }
    return 0;
}
It gives me
100 zeros in the first line,
100 ones in the second line,
100 ones in the third line,
100 zeros in the fourth line.
If there is any doubt about understanding the code, please tell me in the comments.
This behavior mirrors an easy-to-prove arithmetical fact:
When you add two odd numbers, you get an even number,
When you add two even numbers, you get an even number,
When you add an odd number to an even number, you get an odd number.
With this fact in hand, consider the truth table of XOR, and note that for each of the four rows in the table ({0, 0 => 0}, {0, 1 => 1}, {1, 0 => 1}, {1, 1 => 0}) the odd/even parity of the count of 1s remains invariant. In other words, if the input has an odd number of 1s, the output will have an odd number of 1s as well, and vice versa.
This observation explains the result: XORing two numbers with set-bit counts N and M yields a number whose set-bit count has the same odd/even parity as N + M.
Thanks to all who tried to answer.
We can give a proof like this:
Suppose N is the number of set bits in the first number and M is the number of set bits in the second number.
Then the number of set bits in the XOR of these two numbers is N + M - 2Δ, where Δ is the total number of bit positions where both numbers have a set bit. (For example, 0b1100 and 0b1010 have N = M = 2 and Δ = 1, and their XOR 0b0110 has 2 + 2 - 2*1 = 2 set bits.) Now this expression explains everything:
even + odd - even = odd
odd + odd - even = even
even + even - even = even
XOR just clears out the common set bits. It doesn't matter how many bits are set, just how many bits are in common.
With all set bits in common, the result is zero. With no set bits in common, the set-bit count of the result is the sum of the two counts.
You can draw no conclusions from the parity of the inputs unless you also account for the parity of the common bits.
A possible proof is based on the observation that XOR is a commutative, associative operator, so (XOR of the bits of x) XOR (XOR of the bits of y) = XOR of the bits of (x XOR y); and the XOR of all the bits of a number is exactly its parity.
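For completeness, here is a quick sketch that checks the identity behind all of this, parity(x ^ y) == parity(x) XOR parity(y), over a small arbitrary range using C++20's std::popcount:
#include <bit>
#include <cassert>
#include <cstdint>
int main() {
    for (uint32_t x = 0; x < 256; ++x) {
        for (uint32_t y = 0; y < 256; ++y) {
            int px = std::popcount(x) % 2;
            int py = std::popcount(y) % 2;
            assert(std::popcount(x ^ y) % 2 == (px + py) % 2);  // parities add
        }
    }
}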

Get the low portion of a number of any of the built-in types [closed]

How would I create a function template which returns the low portion of a number of N bits?
For example, for an 8 bit number, get the least significant 4 bits, for a 16 bit number, get the least significant 8 bits.
To get the lower half of a built-in integer type you can try something like this:
#include <iostream>
#include <climits>
#include <cstdint>
using std::cout;
using std::endl;
template<typename T>
constexpr T lowbits(T v) {
    return v & (T(1) << CHAR_BIT * sizeof v / 2) - 1;
}
int main() {
    cout << std::hex << (int)lowbits<int8_t>(0xde) << endl;           // will print e
    cout << std::hex << lowbits<int16_t>(0xdead) << endl;             // will print ad
    cout << std::hex << lowbits<int32_t>(0xdeadbeef) << endl;         // will print beef
    cout << std::hex << lowbits<int64_t>(0xbeefdeaddeadbeef) << endl; // will print deadbeef
}
Note that
return v & (T(1) << CHAR_BIT * sizeof v / 2) - 1;
is equivalent to:
return v & (
         (static_cast<T>(1)
          <<
          (CHAR_BIT * (sizeof v) / 2)) // number of bits divided by 2
         - 1
       );
In essence you are creating a bit-mask (simply another integer) that has 0-bits for all higher bits and 1-bits for all lower bits.
To build a mask for the lowest M bits, shift a 1-bit into position M (counting from zero) and then subtract 1 from it; here M is N/2 for an N-bit integer type. The subtraction has the result that all bits below the 1 will be set.
And-ing this with the given value yields only the lower half of the value v.
You can easily generalize this approach to retrieve any number of lower bits by replacing CHAR_BIT * sizeof v / 2 with the number of bits you want to retrieve.
To get only the higher bits you can simply negate the resulting mask using the ~ operator.
If you require arbitrary sized integers you can try finding the equivalent operations for this procedure in the GNU gmp library.
Let us define a variable called mask, which is the pattern that masks off (or retains) some bits. The operation to get the least significant bits is:
result = value & mask;
For example, with value == 13 and mask == 7, the result is 13 & 7 == 5, i.e. the low three bits.
This works with all POD types except floating point; the least significant Q bits of a floating-point value don't make sense (unless you really need to do this).
If you have no need for more bits than the largest internal integral type, you could use something like this:
template <typename T>
T low_bits(T value, size_t bit_count)
{
    T mask = (T(1) << bit_count) - 1;  // bit_count low 1-bits
    return value & mask;
}
For a non-template solution, one could use a macro:
#define LOW_BITS(value, bit_count) \
    ((value) & ((1U << (bit_count)) - 1U))
This lets the compiler figure out the code based on the data type of value.
A macro form of the expression: value & mask.
The thorny issue comes into play when the number is wider than the largest built-in type, i.e. when N > CHAR_BIT * sizeof(largest type). In this case, the number can't be represented by the internal data types, so one has to come up with a different solution.
The solution for N bits depends on whether the multi-byte representation of the number is big endian or little endian. On big endian platforms, the least significant byte is at the highest address, while on little endian platforms, the least significant byte is at the lowest address.
The solution I'm proposing treats the N-bit number as an array of bytes. A byte contains 8 bits (on most platforms), and bytes can be masked differently than multi-byte quantities.
Here's the algorithm:
1. Copy the least significant bytes that are completely covered by the mask into the result.
2. Mask the next byte (the partially covered one) and copy the resulting byte into the result.
3. Pad the remaining bytes with 0.
As far as the function parameters go, you'll need:
1) Pointer to the memory location of the original number.
2) Pointer to the result number.
3) Pointer to the mask.
4) Size of the number, in bytes.
The algorithm can handle N-bit numbers, limited by the amount of memory on the platform.
Note: sorry about not providing code, but I need to get back to work. :-(
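Since the answer stops short of code, here is one possible sketch for a little-endian platform (the function name and the choice to pass a bit count instead of a mask pointer are mine; it assumes bit_count <= 8 * size):
#include <cstddef>
#include <cstdint>
#include <cstring>
// Keep the low bit_count bits of the size-byte number at src, writing to dst.
void low_bits_n(const uint8_t *src, uint8_t *dst, size_t size, size_t bit_count)
{
    size_t whole = bit_count / 8;           // step 1: bytes kept completely
    memcpy(dst, src, whole);
    size_t rest = bit_count % 8;            // step 2: mask the partial byte
    if (rest != 0 && whole < size)
        dst[whole] = src[whole] & (uint8_t) ((1u << rest) - 1);
    for (size_t i = whole + (rest != 0); i < size; ++i)
        dst[i] = 0;                         // step 3: pad the rest with 0
}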

what's the purpose of this code? is it counting number of digits?

if ((16 << (int)(4*((num.length()-2)-i))) == 0)
What does it mean? Is it bit manipulation? It could be written much more simply if it's only counting the digits, which is why I thought it might be something different from what I know.
<< is the left shift operator. Shifting left by n is the same as multiplying by 2, n times. If you shift far enough, all set bits will "fall over the edge" and the result will be 0.
16 << n will become 0 if n >= sizeof(int) * BITS_PER_CHAR - 4 (16 is 1 << 4, so its single set bit starts at position 4).
So the expression can be written as:
if ((sizeof(int) * BITS_PER_CHAR - 4) <= (int)(4 * ((num.length() - 2) - i)))
BITS_PER_CHAR is 8 on any POSIX-compliant system. sizeof(int)*BITS_PER_CHAR is usually 32, but can be other values.
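To see the "fall over the edge" behavior concretely (a tiny sketch, using unsigned to keep the shifts well-defined and assuming a 32-bit int):
#include <iostream>
int main() {
    unsigned x = 16;                 // one set bit, at position 4
    std::cout << (x << 10) << "\n";  // 16384: the bit is still in range
    std::cout << (x << 27) << "\n";  // 2147483648: the bit is now at position 31
    std::cout << (x << 28) << "\n";  // 0: the bit fell off the 32-bit edge
}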

Shift left/right adding zeroes/ones and dropping first bits

I've got to program a function that receives
a binary number like 10001, and
a decimal number that indicates how many shifts I should perform.
The problem is that if I use the C++ operator <<, the zeroes are pushed in from behind but the leading bits aren't dropped... For example,
shifLeftAddingZeroes(10001, 1)
returns 100010 instead of 00010, which is what I want.
I hope I've made myself clear =P
I assume you are storing that information in an int. Take into consideration that this number actually has more leading zeroes than what you see; if it were 16 bits, your number would really be 00000000 00010001. Maybe try AND-ing it with a number having as many 1-bits as you want to keep after shifting? (Assuming you want to stick to bitwise operations.)
What you want is to bit shift and then limit the number of output bits which can be active (hold a value of 1). One way to do this is to create a mask for the number of bits you want, then AND the bit-shifted value with that mask. Below is a code sample for doing that; just replace int_type with the type of value you're using, or make it a template type.
int_type shiftLeftLimitingBitSize(int_type value, int numshift, int_type numbits = some_default) {
    int_type mask = 0;
    for (unsigned int bit = 0; bit < numbits; bit++) {
        mask |= int_type(1) << bit;  // set bit 'bit' of the mask
    }
    return (value << numshift) & mask;
}
Your output for 10001,1 would now be shiftLeftLimitingBitSize(0b10001, 1, 5) == 0b00010.
Realize that unless your numbits is exactly the length of your integer type, you will always have excess 0 bits on the 'front' of your number.
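For reference, here is a concrete instantiation with uint32_t; note that the mask can also be built without a loop, assuming numbits is smaller than the type's width:
#include <cassert>
#include <cstdint>
uint32_t shiftLeftLimitingBitSize(uint32_t value, int numshift, uint32_t numbits)
{
    uint32_t mask = (uint32_t(1) << numbits) - 1;  // numbits low 1-bits
    return (value << numshift) & mask;
}
int main() {
    assert(shiftLeftLimitingBitSize(0b10001, 1, 5) == 0b00010);
}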

Analysis of the usage of prime numbers in hash functions

I was studying hash-based sorting, and I found that using prime numbers in a hash function is considered a good idea, because multiplying each character of the key by a prime number and adding the results up would produce a unique value (because primes are unique), and a prime number like 31 would produce a better distribution of keys.
key(s) = s[0]*31^(len-1) + s[1]*31^(len-2) + ... + s[len-1]
Sample code:
public int hashCode()
{
    int h = hash;  // 'hash' caches a previously computed value
    if (h == 0)
    {
        for (int i = 0; i < chars.length; i++)
        {
            h = MULT * h + chars[i];  // MULT is the multiplier, e.g. 31
        }
        hash = h;
    }
    return h;
}
I would like to understand why the use of even numbers for multiplying each character is a bad idea in the context of this explanation below (found on another forum; it sounds like a good explanation, but I'm failing to grasp it). If the reasoning below is not valid, I would appreciate a simpler explanation.
Suppose MULT were 26, and consider hashing a hundred-character string. How much influence does the string's first character have on the final value of 'h'? The first character's value will have been multiplied by MULT 99 times, so if the arithmetic were done in infinite precision the value would consist of some jumble of bits followed by 99 low-order zero bits -- each time you multiply by MULT you introduce another low-order zero, right? The computer's finite arithmetic just chops away all the excess high-order bits, so the first character's actual contribution to 'h' is ... precisely zero! The 'h' value depends only on the rightmost 32 string characters (assuming a 32-bit int), and even then things are not wonderful: the first of those final 32 bytes influences only the leftmost bit of 'h' and has no effect on the remaining 31. Clearly, an even-valued MULT is a poor idea.
I think it's easier to see if you use 2 instead of 26. They both have the same effect on the lowest-order bit of h. Consider a 33 character string of some character c followed by 32 zero bytes (for illustrative purposes). Since the string isn't wholly null you'd hope the hash would be nonzero.
For the first character, your computed hash h is equal to c[0]. For the second character, you take h * 2 + c[1]. So now h is 2*c[0]. For the third character h is now h*2 + c[2] which works out to 4*c[0]. Repeat this 30 more times, and you can see that the multiplier uses more bits than are available in your destination, meaning effectively c[0] had no impact on the final hash at all.
The end math works out exactly the same with a different multiplier like 26, except that the intermediate hashes will modulo 2^32 every so often during the process. Since 26 is even it still adds one 0 bit to the low end each iteration.
This hash can be described like this (here ^ is exponentiation, not xor).
hash(string) = sum_over_i(s[i] * MULT^(strlen(s) - i - 1)) % (2^32).
Look at the contribution of the first character. It's
(s[0] * MULT^(strlen(s) - 1)) % (2^32).
Since MULT is even, each multiplication contributes a factor of 2, so MULT^(strlen(s) - 1) is divisible by 2^(strlen(s) - 1). If the string is long enough (strlen(s) > 32), then this term is zero.
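To make this concrete, here is a small test of my own (not from the answer) showing that with an even multiplier the first character of a 33-character string stops mattering, while an odd multiplier like 31 keeps it:
#include <cstdint>
#include <iostream>
#include <string>
uint32_t hash_mult(const std::string &s, uint32_t mult) {
    uint32_t h = 0;
    for (unsigned char c : s)
        h = mult * h + c;  // same scheme as the hashCode above
    return h;
}
int main() {
    std::string a = "a" + std::string(32, 'x');  // differs from b only in s[0]
    std::string b = "b" + std::string(32, 'x');
    std::cout << (hash_mult(a, 26) == hash_mult(b, 26)) << "\n";  // 1: collision
    std::cout << (hash_mult(a, 31) == hash_mult(b, 31)) << "\n";  // 0: no collision
}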
Other people have posted the answer: if you use an even multiplier, then only the last characters in the string matter for computing the hash, as the earlier characters' influence will have shifted out of the register.
Now let's consider what happens when you use a multiplier like 31. Well, 31 is 32 - 1, or 2^5 - 1. So when you use that, your final hash value will be:
\sum_i c_i 2^{5(len-i)} - \sum_i c_i
Unfortunately Stack Overflow doesn't understand TeX math notation, so the above is hard to read, but it's two summations over the characters in the string, where the first one shifts each character left by 5 bits for each subsequent character in the string. So on a 32-bit machine, that will shift off the top for all except the last seven characters of the string.
The upshot of this is that using a multiplier of 31 means that while characters other than the last seven have an effect on the hash, it's completely independent of their order. If you take two strings that have the same last 7 characters, and whose other characters are the same but in a different order, you'll get the same hash for both. You'll also get the same hash for things like "az" and "by" occurring anywhere other than in the last 7 chars.
So using a prime multiplier, while much better than an even multiplier, is still not very good. Better is to use a rotate instruction, which shifts the bits back into the bottom when they shift out the top. Something like:
public unsigned hashCode(string chars)
{
    unsigned h = 0;
    for (int i = 0; i < chars.length; i++) {
        h = (h << 5) + (h >> 27);  // ROL by 5, assuming 32 bits here
        h += chars[i];
    }
    return h;
}
Of course, this depends on your compiler being smart enough to recognize the idiom for a rotate instruction and turn it into a single instruction for maximum efficiency.
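In C++20 the rotation can be written explicitly with std::rotl instead of relying on the compiler to spot the idiom (a sketch with a tidied-up signature):
#include <bit>
#include <cstdint>
#include <string_view>
uint32_t hashCode(std::string_view chars)
{
    uint32_t h = 0;
    for (unsigned char c : chars) {
        h = std::rotl(h, 5);  // rotate left by 5, 32 bits
        h += c;
    }
    return h;
}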
This also still has the problem that swapping 32-character blocks in the string will give the same hash value, so it's far from strong, but probably adequate for most non-cryptographic purposes.
would produce a unique value
Stop right there. Hashes are not unique. A good hash algorithm will minimize collisions, but the pigeonhole principle assures us that perfectly avoiding collisions is not possible (for any datatype with non-trivial information content).