I'm trying to set or clear the most significant bit of an unsigned integer based on a bool flag. This is my code for a hypothetical K = uint64_t.
This is the Item class:
template<typename K>
class Item {
public:
K first;
Item() = default;
explicit Item(const K &elem, const bool flag = false) {
first = elem & 0x3FFFFFFFFFFFFFFF;
first |= (flag * 0x8000000000000000);
}
};
Is there a way to do this fully generically, so that it works for every kind of numeric K?
I tried using 8 * sizeof(K), but it doesn't work.
An option using only bit operations:
template<typename T>
void Item(T& elem, bool flag = false) {
T mask = (T)1 << (sizeof(T) * 8 - 1);
elem = (elem & ~mask) | (flag ? mask : 0);
}
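You can leverage std::bitset for this. I have not measured how well it optimizes, though I would expect compilers to handle it well; it works generically and requires no knowledge of bitwise operations. Because it goes through to_ullong(), it only works for types no wider than unsigned long long.
#include <bitset>
#include <climits>
template <typename T>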
void toggle_msb(T& val)
{
constexpr auto bit_width = sizeof(T) * CHAR_BIT;
std::bitset<bit_width> temp(val);
val = temp.flip(bit_width - 1).to_ullong();
}
Using bitwise operations, but without explicitly depending on the size of T:
template<typename T>
T set_top_bit(T value, bool state) {
constexpr T mask = T(~T(0)) >> 1;
value &= mask;
if (state) value |= ~mask;
return value;
}
T(~T(0)) gets a T with all bits set (see the note on integer promotion below); >> 1 throws out the bottom bit and shifts a 0 in from the top, so mask holds a T with all bits set except the topmost. Notice that all this dance is purely formal: it is all evaluated at compile time.
The rest is pretty much like your code: mask out the top bit from value and OR it back in depending on state (~mask will be a T with only the top bit set).
Plain ~0 would result in an int set to -1, ~0U in an unsigned int with all bits set; to obtain a T with all bits set, we need to flip the bits of a T(0) (so, a 0 of our T type), and also to cast back to T later, because, if T is smaller than int, ~T(0) is actually equivalent to ~int(T(0)), so ~0, due to integer promotion rules.
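To make the promotion point concrete, here is a small compile-time sketch. The helper name low_bits_mask is made up for illustration, and the sketch assumes an unsigned T:

```cpp
#include <cstdint>
#include <type_traits>

// ~ applied to a type narrower than int yields an int (integer promotion),
// which is why the result has to be cast back to T before shifting.
static_assert(std::is_same<decltype(~std::uint8_t{0}), int>::value,
              "~uint8_t is promoted to int");

// Hypothetical helper: every bit of T set except the topmost one.
template <typename T>
constexpr T low_bits_mask() {
    static_assert(std::is_unsigned<T>::value, "sketch assumes an unsigned T");
    return static_cast<T>(static_cast<T>(~T(0)) >> 1);
}

static_assert(low_bits_mask<std::uint8_t>() == 0x7F, "8-bit mask");
static_assert(low_bits_mask<std::uint64_t>() == 0x7FFFFFFFFFFFFFFFull, "64-bit mask");
```

Without the inner cast, the shift for an 8-bit T would operate on the int value -1 rather than on 0xFF, so the top bit of the resulting T would not be cleared.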
From this great post, How do you set, clear, and toggle a single bit?, you can use this branchless version:
#include <climits>
template<typename T>
constexpr T set_top_bit_v2(T value, bool state) {
constexpr auto msb = (sizeof(T) * CHAR_BIT) - 1;
return (value & ~(T{1} << msb)) | (T{state} << msb);
}
Comparing the output with Matteo Italia's version on Godbolt, Clang seems to generate the same code for both despite the differences, while GCC and MSVC seem to emit fewer instructions for this version.
I have two bit streams A[1..a] and B[1..b], where a is always smaller than b. Now, given an index c in B, I want to know whether A matches the area B[c..c+a-1] (assume c+a-1<=b always holds).
I can't just use memcmp because A and B[c..c+a-1] are not necessarily byte-aligned.
So I have a custom function that compares A and B[c..c+a-1] bitwise, where B is encoded within a class that performs bit operations. This is my C++ code:
#include<cstddef>
#include<cstdint>
struct bitstream{
constexpr static uint8_t word_bits = 64;
constexpr static uint8_t word_shift = 6;
const static size_t masks[65];
size_t *B;
inline bool compare_chunk(const void* A, size_t a, size_t c) {
size_t n_words = a / word_bits;
size_t left = c & (word_bits - 1UL);
size_t right = word_bits - left;
size_t cell_i = c >> word_shift;
auto tmp_in = reinterpret_cast<const size_t *>(A);
size_t tmp_data;
//shift every cell in B[c..c+a-1] to compare it against A
for(size_t k=0; k < n_words - 1; k++){
tmp_data = (B[cell_i] >> left) & masks[right];
tmp_data |= (B[++cell_i] & masks[left]) << right;
if(tmp_data != tmp_in[k]) return false;
}
size_t read_bits = (n_words - 1) << word_shift;
return (tmp_in[n_words - 1] & masks[(a-read_bits)]) == read(c + read_bits, c+a-1);
}
inline size_t read(size_t i, size_t j) const{
size_t cell_i = i >> word_shift;
size_t i_pos = (i & (word_bits - 1UL));
size_t cell_j = j >> word_shift;
if(cell_i == cell_j){
return (B[cell_i] >> i_pos) & masks[(j - i + 1UL)];
}else{
size_t right = word_bits-i_pos;
size_t left = 1+(j & (word_bits - 1UL));
return ((B[cell_j] & masks[left]) << right) | ((B[cell_i] >> i_pos) & masks[right]);
}
}
};
const size_t bitstream::masks[65]={0x0,
0x1,0x3, 0x7,0xF,
0x1F,0x3F, 0x7F,0xFF,
0x1FF,0x3FF, 0x7FF,0xFFF,
0x1FFF,0x3FFF, 0x7FFF,0xFFFF,
0x1FFFF,0x3FFFF, 0x7FFFF,0xFFFFF,
0x1FFFFF,0x3FFFFF, 0x7FFFFF,0xFFFFFF,
0x1FFFFFF,0x3FFFFFF, 0x7FFFFFF,0xFFFFFFF,
0x1FFFFFFF,0x3FFFFFFF, 0x7FFFFFFF,0xFFFFFFFF,
0x1FFFFFFFF,0x3FFFFFFFF, 0x7FFFFFFFF,0xFFFFFFFFF,
0x1FFFFFFFFF,0x3FFFFFFFFF, 0x7FFFFFFFFF,0xFFFFFFFFFF,
0x1FFFFFFFFFF,0x3FFFFFFFFFF, 0x7FFFFFFFFFF,0xFFFFFFFFFFF,
0x1FFFFFFFFFFF,0x3FFFFFFFFFFF, 0x7FFFFFFFFFFF,0xFFFFFFFFFFFF,
0x1FFFFFFFFFFFF,0x3FFFFFFFFFFFF, 0x7FFFFFFFFFFFF,0xFFFFFFFFFFFFF,
0x1FFFFFFFFFFFFF,0x3FFFFFFFFFFFFF, 0x7FFFFFFFFFFFFF,0xFFFFFFFFFFFFFF,
0x1FFFFFFFFFFFFFF,0x3FFFFFFFFFFFFFF, 0x7FFFFFFFFFFFFFF,0xFFFFFFFFFFFFFFF,
0x1FFFFFFFFFFFFFFF,0x3FFFFFFFFFFFFFFF, 0x7FFFFFFFFFFFFFFF,0xFFFFFFFFFFFFFFFF};
The function read belongs to the class that wraps B and reads an area of B of at most 64 bits.
The code above works, but it seems to be the bottleneck of my application (I run it exhaustively over massive inputs).
Now, my question is: do you know if there is a technique to compare A and B[c..c+a-1] faster?
I know I could use SIMD instructions, but I don't think it will produce a significant improvement as B is encoded in 64-bit cells.
Here are some extra details:
A is usually short (maybe 20 or 30 64-bit cells), but there is no guarantee. It could also be arbitrarily large, although always smaller than B.
I can't make any assumption about A's encoding. It could be uint8_t, uint16_t, uint32_t or uint64_t. That is the reason I pass it as void* to the function.
Link to godbolt with the code above compiling example
Thanks!
A few things you can try:
as noted before, you can't just cast A to size_t*. You either need to go byte-by-byte, or check the beginning and end that's not 8-byte aligned separately
move the declaration of tmp_data inside the loop as a single 'size_t const tmp_data' assignment, refer to B[cell_i] and B[cell_i+1], and increment cell_i in the for statement. That way the compiler can do loop unrolling (at least it can detect that it can much more easily).
finally, if memory is not an issue, then you can keep 8 copies of B (each shifted by a bit to the right), and use the one where B[c] is the beginning of a new byte. Then you can use memcmp (which will presumably give you the fastest code).
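The first two suggestions could be sketched roughly as follows. This is only an illustration, not a drop-in replacement: the names window and equal_bits are invented here, A is read with memcpy instead of a pointer cast, the host is assumed to be little-endian, and B is assumed to have one readable padding word after the last word that holds data.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: the 64 bits of B that start at bit position pos.
// Assumes B[cell + 1] is readable (one padding word at the end of B).
inline std::uint64_t window(const std::uint64_t* B, std::size_t pos) {
    const std::size_t cell = pos >> 6;
    const unsigned shift = static_cast<unsigned>(pos & 63u);
    std::uint64_t w = B[cell] >> shift;
    if (shift != 0) w |= B[cell + 1] << (64 - shift);
    return w;
}

// Compares the a bits stored at A with B[c..c+a-1].
inline bool equal_bits(const std::uint64_t* B, const void* A,
                       std::size_t a, std::size_t c) {
    const unsigned char* bytes = static_cast<const unsigned char*>(A);
    std::size_t done = 0;
    // Full 64-bit words; all temporaries are local to one iteration.
    for (; done + 64 <= a; done += 64) {
        std::uint64_t word;
        std::memcpy(&word, bytes + done / 8, sizeof word);  // no aliasing or alignment issue
        if (word != window(B, c + done)) return false;
    }
    const std::size_t rest = a - done;  // 0..63 trailing bits
    if (rest == 0) return true;
    std::uint64_t word = 0;
    std::memcpy(&word, bytes + done / 8, (rest + 7) / 8);  // little-endian assumption
    const std::uint64_t mask = (std::uint64_t{1} << rest) - 1;
    return (word & mask) == (window(B, c + done) & mask);
}
```

Copying only the bytes that A actually owns for the tail also avoids reading past the end of A when its length is not a multiple of 8 bytes.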
Consider the following variables:
std::uint8_t value;
const bool bits[8] = { true, false, false, true,
false, false, true, false };
If I were to print out the array of bools to the console
for (int i = 0; i < 8; i++ )
std::cout << bits[i];
it would give the following output:
10010010
Simple enough and straightforward.
What I would like is a constexpr function, a function template, a lambda, or some combination of them that can run at compile time or at runtime, depending on the context in which it is used, and that packs each of these boolean 0s and 1s into the variable value above. If the bits are known at compile time, I'd like the assignment to be resolved then. If they aren't, value is initialized to 0 and updated later at runtime.
However, there is one caveat that isn't obvious at first: index 0 of the array is the LSB of the value and index 7 is the MSB. So the bits as printed on screen read as the hex value 0x92, but the value to be stored needs to be 01001001, which is 0x49 in hex, or 73 in decimal, and not 146.
The above are members of a class in which one is the data value representation and the array of bools is the bit representation. Some constructors set the data member directly and others set the array of bools, but I need the two to stay in sync for the lifetime of the object: if one is updated, the other must change as well. Also, the array of bools is a member of an unnamed union together with an unnamed struct of 8 individual bools, each a single bit in a bit-field. The class also has an index operator to access the individual bits as single boolean values of 0 or 1.
Here is what my class looks like:
constexpr unsigned BIT_WIDTH(const unsigned bits = 8) { return bits; }
struct Register_8 {
union {
bool bits_[BIT_WIDTH()];
struct {
bool b0 : 1;
bool b1 : 1;
bool b2 : 1;
bool b3 : 1;
bool b4 : 1;
bool b5 : 1;
bool b6 : 1;
bool b7 : 1;
};
};
std::uint8_t data_;
Register_8() : data_{ 0 } {}
Register_8(std::uint8_t data) : data_{ data } {
}
Register_8(const bool bits[BIT_WIDTH()]) {
for (unsigned i = 0; i < 8; i++)
bits_[i] = bits[i];
}
Register_8(const bool a, const bool b, const bool c, const bool d,
const bool e, const bool f, const bool g, const bool h) {
b0 = a; b1 = b; b2 = c; b3 = d;
b4 = e; b5 = f; b6 = g; b7 = h;
}
const std::uint8_t operator[](std::uint8_t idx) {
// I know there is no bounds checking here, I'll add that later!
return bits_[idx];
}
};
So how can I make each of the values in bits[] the individual bits of value, where bits[0] is the LSB of value? I would also like to do this in a way that does not invoke any UB! Or does there already exist an algorithm within the STL under C++17 that will do this for me? I don't have a C++20 compiler yet... I've tried including the std::uint8_t within the union, but it doesn't work as I would like it to, and I wouldn't expect it to work either!
I walked away for a little bit and came back to what I was working on... I think the short break had helped. The suggestion by user Nicol Bolas had also helped by letting me know that I can do it with a constexpr function. Now I don't have to worry about templates or lambdas for this part of the code.
Here is the function that I have come up with, which I believe assigns the bits in the appropriate order.
#include <climits>
#include <cstdint>
constexpr unsigned BIT_WIDTH(const unsigned bits = CHAR_BIT) { return bits; }
constexpr std::uint8_t byte_from_bools(const bool bits[BIT_WIDTH()]) {
std::uint8_t ret = 0x00;
std::uint8_t pos = 0x00;
for (unsigned i = 0; i < BIT_WIDTH(); i++) {
ret |= static_cast<std::uint8_t>(bits[i]) << pos++; // << pos--;
}
return ret;
}
If there are any kind of optimizations that can be done or any bugs or code smells, please let me know...
Now it's just a matter of extracting the individual bits and assigning them to my bit-field members, and then tracking when either one changes to make sure both stay in sync.
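One way to sidestep the synchronization problem, and the union with it, is to store only the byte and derive the individual bits on demand. The following is a minimal sketch of that idea under C++17; the class name Reg8 and its members are invented for illustration and are not part of the code above:

```cpp
#include <climits>
#include <cstdint>

// Hypothetical alternative: the byte is the only state, bits are computed.
class Reg8 {
public:
    constexpr Reg8() = default;
    constexpr explicit Reg8(std::uint8_t data) : data_{data} {}

    // bits[0] becomes the LSB and bits[7] the MSB.
    constexpr explicit Reg8(const bool (&bits)[CHAR_BIT]) : data_{pack(bits)} {}

    constexpr std::uint8_t value() const { return data_; }

    // No bounds checking, as in the original operator[].
    constexpr bool operator[](unsigned idx) const { return (data_ >> idx) & 1u; }

    constexpr void set(unsigned idx, bool on) {
        const auto bit = static_cast<std::uint8_t>(1u << idx);
        data_ = static_cast<std::uint8_t>(on ? (data_ | bit) : (data_ & ~bit));
    }

private:
    static constexpr std::uint8_t pack(const bool (&bits)[CHAR_BIT]) {
        std::uint8_t out = 0;
        for (unsigned i = 0; i < CHAR_BIT; ++i)
            out = static_cast<std::uint8_t>(out | (static_cast<unsigned>(bits[i]) << i));
        return out;
    }

    std::uint8_t data_ = 0;
};

constexpr bool demo_bits[CHAR_BIT] = {true, false, false, true,
                                      false, false, true, false};
static_assert(Reg8{demo_bits}.value() == 0x49, "bits[0] is the LSB");
```

Since there is a single source of truth, nothing can get out of sync, and no inactive union member is ever read.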
Is there a way to set, clear, test and flip a single bit as an atomic operation in C++? For example, bitwise variants of "compare_and_swap".
Manipulating bits atomically can be done with a compare_exchange RMW loop, which avoids touching the other bits in the atomic variable.
Testing a bit is not a modifying operation, therefore a load() suffices.
You will have to add range checking for the bit index yourself.
#include <atomic>
#include <type_traits>
template<typename T, typename OP>
T manipulate_bit(std::atomic<T> &a, unsigned n, OP bit_op)
{
static_assert(std::is_integral<T>::value, "atomic type not integral");
T val = a.load();
while (!a.compare_exchange_weak(val, bit_op(val, n)));
return val;
}
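// decltype(val){1} keeps the shifted 1 at least as wide as val (a plain 1 is always an int)
auto set_bit = [](auto val, unsigned n) { return val | (decltype(val){1} << n); };
auto clr_bit = [](auto val, unsigned n) { return val & ~(decltype(val){1} << n); };
auto tgl_bit = [](auto val, unsigned n) { return val ^ (decltype(val){1} << n); };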
int main()
{
std::atomic<int> a{0x2216};
manipulate_bit(a, 3, set_bit); // set bit 3
manipulate_bit(a, 7, tgl_bit); // toggle bit 7
manipulate_bit(a, 13, clr_bit); // clear bit 13
bool isset = (a.load() >> 5) & 1; // testing bit 5
}
Flipping a bit in an integer is just a compare and exchange operation. That you're using it to test and flip a single bit doesn't change anything. So a simple compare_exchange_weak loop will do this.
to set a bit atomically use fetch_or(bit) (also |=)
to clear a bit atomically you can use fetch_and(~bit) (also &=)
to flip a bit atomically you can use fetch_xor(bit)
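As a rough sketch of the three fetch operations listed above, wrapped in small helpers. The helper names are made up for illustration, and the sketch is fixed to std::atomic<unsigned> to keep the shifts free of signed overflow:

```cpp
#include <atomic>

// Each modifying helper returns whether the bit was set BEFORE the operation.
inline bool atomic_set_bit(std::atomic<unsigned>& a, unsigned n) {
    const unsigned bit = 1u << n;
    return (a.fetch_or(bit) & bit) != 0;
}

inline bool atomic_clear_bit(std::atomic<unsigned>& a, unsigned n) {
    const unsigned bit = 1u << n;
    return (a.fetch_and(~bit) & bit) != 0;
}

inline bool atomic_flip_bit(std::atomic<unsigned>& a, unsigned n) {
    const unsigned bit = 1u << n;
    return (a.fetch_xor(bit) & bit) != 0;
}

// Testing does not modify anything, so a plain load is enough.
inline bool atomic_test_bit(const std::atomic<unsigned>& a, unsigned n) {
    return ((a.load() >> n) & 1u) != 0;
}
```

Because fetch_or, fetch_and and fetch_xor return the previous value, each helper doubles as an atomic test-and-modify without an explicit compare-exchange loop.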
So basically I need to check whether a certain sequence of bits occurs in another sequence of bits (32 bits).
The function should take 3 arguments:
n, the number of rightmost bits of a value to use.
a value
the sequence in which the n bits should be searched for
The function has to return the bit number at which the desired sequence starts. Example: check whether the last 3 bits of 0x5 occur in 0xe1f4.
void bitcheck(unsigned int source, int operand,int n)
{
int i,lastbits,mask;
mask=(1<<n)-1;
lastbits=operand&mask;
for(i=0; i<32; i++)
{
if((source&(lastbits<<i))==(lastbits<<i))
printf("It start at bit number %i\n",i+n);
}
}
Your loop goes too far, I'm afraid. It could, for example 'find' the bit pattern '0001' in a value ~0, which consists of ones only.
This will do better (I hope):
void checkbit(unsigned value, unsigned pattern, unsigned n)
{
unsigned size = 8 * sizeof value;
if( 0 < n && n <= size)
{
unsigned mask = ~0U >> (size - n);
pattern &= mask;
for(unsigned i = 0; i <= size - n; i ++, value >>= 1)
if((value & mask) == pattern)
printf("pattern found at bit position %u\n", i+n);
}
}
I take you to mean that you want to take source as a bit array, and to search it for a bit sequence specified by the n lowest-order bits of operand. It seems you would want to perform a standard mask & compare; the only (minor) complication being that you need to scan. You seem already to have that idea.
I'd write it like this:
void bitcheck(uint32_t source, uint32_t operand, unsigned int n) {
uint32_t mask = (n >= 32) ? UINT32_MAX : ~(UINT32_MAX << n); /* unsigned shift; n == 32 handled separately */
uint32_t needle = operand & mask;
int i;
for(i = 0; i <= (32 - n); i += 1) {
if (((source >> i) & mask) == needle) {
/* found it */
break;
}
}
}
There are some differences in the details between mine and yours, but the main functional difference is the loop bound: you must be careful to ignore cases where some of the bits you compare against the target were introduced by a shift operation, as opposed to originating in source, lest you get false positives. The way I've written the comparison makes it clearer (to me) what the bound should be.
I also use the explicit-width integer data types from stdint.h for all values where the code depends on a specific width. This is an excellent habit to acquire if you want to write code that ports cleanly.
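Putting the mask-and-compare scan and the loop bound together, a version that returns the position instead of printing it might look like this. The name find_bits and the -1 "not found" convention are assumptions of this sketch, not something from the question:

```cpp
#include <cstdint>

// Returns the lowest bit index at which the low n bits of pattern occur in
// source, or -1 if they do not occur (or if n is outside 1..32).
inline int find_bits(std::uint32_t source, std::uint32_t pattern, unsigned n) {
    if (n == 0 || n > 32) return -1;
    const std::uint32_t mask =
        (n == 32) ? ~std::uint32_t{0} : ((std::uint32_t{1} << n) - 1);
    const std::uint32_t needle = pattern & mask;
    // Stop while the n-bit window still lies entirely inside the 32 bits,
    // so zeros shifted in from the top never take part in a comparison.
    for (unsigned i = 0; i + n <= 32; ++i) {
        if (((source >> i) & mask) == needle) return static_cast<int>(i);
    }
    return -1;
}
```

For the example in the question, find_bits(0xe1f4, 0x5, 3) returns 2: bits 2..4 of 0xe1f4 are 101. Searching for 0001 in a value of all ones returns -1, which is the false positive the original loop could report.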
Perhaps:
if((source&(mask<<i))==(lastbits<<i))
Because:
finding 10 in 11 will be true for your old code. In fact, your original condition will always return true when 'source' is made of all ones.
On the Bit Twiddling Hacks website the following algorithm is provided to round up an integer to the next power of two:
unsigned int v; // compute the next highest power of 2 of 32-bit v
v--;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
v++;
I would like to code a metaprogramming function that will compute the same operation:
recursively (for compile-time execution)
for any kind of integer (it should even work for possible awkward non-standard integers of any size like 15 bits, 65 bits...)
and here is the form of the expected function:
template <typename Type,
// Something here (like a recursion index)
class = typename std::enable_if<std::is_integral<Type>::value>::type,
class = typename std::enable_if<std::is_unsigned<Type>::value>::type>
constexpr Type function(const Type value)
{
// Something here
}
How can I do that?
Example: for value = 42 it should return 64
This ought to implement the algorithm you give:
template<typename T>
constexpr T roundup_helper( T value, unsigned maxb, unsigned curb ) {
return maxb<=curb
? value
: roundup_helper( ((value-1) | ((value-1)>>curb))+1, maxb, curb << 1 )
;
}
template<typename T,
typename = typename enable_if<is_integral<T>::value>::type,
typename = typename enable_if<is_unsigned<T>::value>::type>
constexpr T roundup( T value ) {
return roundup_helper( value, sizeof(T)*CHAR_BIT, 1 );
}
At least, it seems to work fine in my test program.
Alternatively, you can move the v-1 and v+1 out of the helper function like so:
template<typename T>
constexpr T roundup_helper( T value, unsigned maxb, unsigned curb ) {
return maxb<=curb
? value
: roundup_helper( value | (value>>curb), maxb, curb << 1 )
;
}
template<typename T,
typename = typename enable_if<is_integral<T>::value>::type,
typename = typename enable_if<is_unsigned<T>::value>::type>
constexpr T roundup( T value ) {
return roundup_helper( value-1, sizeof(T)*CHAR_BIT, 1 )+1;
}
Another possibility is to take advantage of default arguments and put it all in a single function:
template<typename T,
typename = typename enable_if<is_integral<T>::value>::type,
typename = typename enable_if<is_unsigned<T>::value>::type>
constexpr T roundup(
T value,
unsigned maxb = sizeof(T)*CHAR_BIT,
unsigned curb = 1
) {
return maxb<=curb
? value
: roundup( ((value-1) | ((value-1)>>curb))+1, maxb, curb << 1 )
;
}
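If C++14 is available, relaxed constexpr allows the same idea to be written as a loop rather than a recursion. This is a sketch under that assumption; the name round_up_pow2 is invented here, and std::numeric_limits<T>::digits is used for the bit count so that the width does not have to be a multiple of CHAR_BIT:

```cpp
#include <limits>
#include <type_traits>

template <typename T>
constexpr T round_up_pow2(T v) {
    static_assert(std::is_integral<T>::value && std::is_unsigned<T>::value,
                  "sketch assumes an unsigned integer type");
    v = static_cast<T>(v - 1);
    // Smear the highest set bit into every lower position: shifts 1, 2, 4, ...
    for (int shift = 1; shift < std::numeric_limits<T>::digits; shift *= 2)
        v = static_cast<T>(v | (v >> shift));
    return static_cast<T>(v + 1);
}

static_assert(round_up_pow2(42u) == 64u, "42 rounds up to 64");
static_assert(round_up_pow2(64u) == 64u, "powers of two map to themselves");
```

As with the original bit-twiddling version, an input of 0 yields 0, and a value above the largest representable power of two wraps around to 0.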
Unfortunately, this may not be an option for you. But if by any chance you have a constexpr count-leading-zeros compiler intrinsic, the following is very efficient both at compile time and at run time, if you happen to give it run-time arguments:
#include <climits>
template <class Int>
inline
constexpr
Int
clp2(Int v)
{
return v > 1 ? 1 << (sizeof(Int)*CHAR_BIT - __builtin_clz(v-1)) : v;
}
int
main()
{
static_assert(clp2(0) == 0, "");
static_assert(clp2(1) == 1, "");
static_assert(clp2(2) == 2, "");
static_assert(clp2(3) == 4, "");
static_assert(clp2(4) == 4, "");
static_assert(clp2(5) == 8, "");
static_assert(clp2(6) == 8, "");
static_assert(clp2(7) == 8, "");
static_assert(clp2(8) == 8, "");
static_assert(clp2(42) == 64, "");
}
I compiled the above with tip-of-trunk clang. It is not without its issues. You need to decide what you want to do with negative arguments. But many architectures and compilers have an intrinsic like this (shame it isn't standard C/C++ by now). And some of those may make the intrinsic constexpr.
Without such an intrinsic, I would fall back to something along the lines of Adam H. Peterson's algorithm. But the nice thing about this one is its simplicity and efficiency.
Although less efficient in general, this algorithm will do the job rather concisely:
template <typename T,
typename = typename std::enable_if<std::is_integral<T>::value>::type,
typename = typename std::enable_if<std::is_unsigned<T>::value>::type>
constexpr T upperPowerOfTwo(T value, size_t pow = 0)
{
return (value >> pow) ? upperPowerOfTwo(value, pow + 1)
: T(1) << pow;
}
This also allows you to specify a minimum power of 2, e.g. upperPowerOfTwo(1u, 3) returns 8. Note that, as written, it returns the smallest power of two strictly greater than value, so an exact power of two such as 4 maps to 8 rather than to itself.
The reason this is less efficient in most cases is that it makes O(sizeof(Type)*CHAR_BIT) calls in the worst case, whereas the algorithm you linked performs O(log(sizeof(Type)*CHAR_BIT)) operations. The caveat is that this algorithm terminates after about log2(v) calls, so if v is small enough (roughly, v smaller than the bit width of its type) it will be faster.