Single bit manipulations with guaranteed atomicity - c++

Is there a way to set, clear, test and flip a single bit as an atomic operation in c++? For example bitwise variants to "compare_and_swap".

One way to manipulate bits atomically is a compare_exchange read-modify-write (RMW) loop, which leaves the other bits in the atomic variable untouched.
Testing a bit is not a modifying operation, therefore a load() suffices.
You will still have to add range checking for the bit index n.
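#include <atomic>
#include <type_traits>
// Applies bit_op to bit n of a in a CAS loop and returns the previous value.
template<typename T, typename OP>
T manipulate_bit(std::atomic<T> &a, unsigned n, OP bit_op)
{
    static_assert(std::is_integral<T>::value, "atomic type not integral");
    T val = a.load();
    // On failure compare_exchange_weak reloads val, so bit_op is re-applied to the fresh value.
    while (!a.compare_exchange_weak(val, bit_op(val, n)));
    return val;
}
// decltype(val){1} builds the mask in the operand's own type, so types wider than int also work.
auto set_bit = [](auto val, unsigned n) { return val | (decltype(val){1} << n); };
auto clr_bit = [](auto val, unsigned n) { return val & ~(decltype(val){1} << n); };
auto tgl_bit = [](auto val, unsigned n) { return val ^ (decltype(val){1} << n); };
int main()
{
    std::atomic<int> a{0x2216};
    manipulate_bit(a, 3, set_bit); // set bit 3
    manipulate_bit(a, 7, tgl_bit); // toggle bit 7
    manipulate_bit(a, 13, clr_bit); // clear bit 13
    bool isset = (a.load() >> 5) & 1; // testing bit 5
}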

Flipping a bit in an integer is just a compare and exchange operation. That you're using it to test and flip a single bit doesn't change anything. So a simple compare_exchange_weak loop will do this.

To set a bit atomically, use fetch_or(bit) (or |=).
To clear a bit atomically, use fetch_and(~bit) (or &=).
To flip a bit atomically, use fetch_xor(bit) (or ^=).
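The three fetch operations can be wrapped into small helpers. This is only a minimal sketch: the helper names (bit_mask, atomic_set_bit and so on) are hypothetical, and the assert-based range check and the restriction to unsigned types are assumptions, not part of the answer. Since each fetch operation returns the previous value, every helper can also report whether the bit was set before the operation.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <type_traits>

// Hypothetical helper: a mask with only bit n set, in the width of T.
template <typename T>
T bit_mask(unsigned n) {
    static_assert(std::is_unsigned<T>::value, "use an unsigned integer type");
    assert(n < sizeof(T) * CHAR_BIT);  // range check on the bit index
    return static_cast<T>(T{1} << n);
}

// Each modifying helper returns true if bit n was set BEFORE the operation.
template <typename T>
bool atomic_set_bit(std::atomic<T>& a, unsigned n) {
    const T m = bit_mask<T>(n);
    return (a.fetch_or(m) & m) != 0;
}

template <typename T>
bool atomic_clear_bit(std::atomic<T>& a, unsigned n) {
    const T m = bit_mask<T>(n);
    return (a.fetch_and(static_cast<T>(~m)) & m) != 0;
}

template <typename T>
bool atomic_flip_bit(std::atomic<T>& a, unsigned n) {
    const T m = bit_mask<T>(n);
    return (a.fetch_xor(m) & m) != 0;
}

// Testing does not modify the variable, so a plain load is enough.
template <typename T>
bool atomic_test_bit(const std::atomic<T>& a, unsigned n) {
    return (a.load() & bit_mask<T>(n)) != 0;
}
```

For example, on a std::atomic<unsigned> holding 0x2216, atomic_set_bit(a, 3) returns false (the bit was clear) and leaves 0x221E in the variable.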

Related

Fast bitwise comparison of unaligned bit streams

I have two bit streams A[1..a] and B[1..b], where a is always smaller than b. Now, given an index c in B, I want to know if A matches the area B[c..c+a-1] (assume c+a-1<=b always holds).
I can't just use memcmp because A and B[c..c+a-1] are not necessarily byte-aligned.
So I have a custom function that compares A and B[c..c+a-1] bitwise, where B is encoded within a class that performs bit operations. This is my C++ code:
#include<cstddef>
#include<cstdint>
struct bitstream{
constexpr static uint8_t word_bits = 64;
constexpr static uint8_t word_shift = 6;
const static size_t masks[65];
size_t *B;
inline bool compare_chunk(const void* A, size_t a, size_t c) {
size_t n_words = a / word_bits;
size_t left = c & (word_bits - 1UL);
size_t right = word_bits - left;
size_t cell_i = c >> word_shift;
auto tmp_in = reinterpret_cast<const size_t *>(A);
size_t tmp_data;
//shift every cell in B[c..c+a-1] to compare it against A
for(size_t k=0; k < n_words - 1; k++){
tmp_data = (B[cell_i] >> left) & masks[right];
tmp_data |= (B[++cell_i] & masks[left]) << right;
if(tmp_data != tmp_in[k]) return false;
}
size_t read_bits = (n_words - 1) << word_shift;
return (tmp_in[n_words - 1] & masks[(a-read_bits)]) == read(c + read_bits, c+a-1);
}
inline size_t read(size_t i, size_t j) const{
size_t cell_i = i >> word_shift;
size_t i_pos = (i & (word_bits - 1UL));
size_t cell_j = j >> word_shift;
if(cell_i == cell_j){
return (B[cell_i] >> i_pos) & masks[(j - i + 1UL)];
}else{
size_t right = word_bits-i_pos;
size_t left = 1+(j & (word_bits - 1UL));
return ((B[cell_j] & masks[left]) << right) | ((B[cell_i] >> i_pos) & masks[right]);
}
}
};
const size_t bitstream::masks[65]={0x0,
0x1,0x3, 0x7,0xF,
0x1F,0x3F, 0x7F,0xFF,
0x1FF,0x3FF, 0x7FF,0xFFF,
0x1FFF,0x3FFF, 0x7FFF,0xFFFF,
0x1FFFF,0x3FFFF, 0x7FFFF,0xFFFFF,
0x1FFFFF,0x3FFFFF, 0x7FFFFF,0xFFFFFF,
0x1FFFFFF,0x3FFFFFF, 0x7FFFFFF,0xFFFFFFF,
0x1FFFFFFF,0x3FFFFFFF, 0x7FFFFFFF,0xFFFFFFFF,
0x1FFFFFFFF,0x3FFFFFFFF, 0x7FFFFFFFF,0xFFFFFFFFF,
0x1FFFFFFFFF,0x3FFFFFFFFF, 0x7FFFFFFFFF,0xFFFFFFFFFF,
0x1FFFFFFFFFF,0x3FFFFFFFFFF, 0x7FFFFFFFFFF,0xFFFFFFFFFFF,
0x1FFFFFFFFFFF,0x3FFFFFFFFFFF, 0x7FFFFFFFFFFF,0xFFFFFFFFFFFF,
0x1FFFFFFFFFFFF,0x3FFFFFFFFFFFF, 0x7FFFFFFFFFFFF,0xFFFFFFFFFFFFF,
0x1FFFFFFFFFFFFF,0x3FFFFFFFFFFFFF, 0x7FFFFFFFFFFFFF,0xFFFFFFFFFFFFFF,
0x1FFFFFFFFFFFFFF,0x3FFFFFFFFFFFFFF, 0x7FFFFFFFFFFFFFF,0xFFFFFFFFFFFFFFF,
0x1FFFFFFFFFFFFFFF,0x3FFFFFFFFFFFFFFF, 0x7FFFFFFFFFFFFFFF,0xFFFFFFFFFFFFFFFF};
The function read belongs to the class that wraps B and reads an area of B of at most 64 bits.
The code above works, but it seems to be the bottleneck of my application (I run it exhaustively over massive inputs).
Now, my question is: do you know if there is a technique to compare A and B[c..c+a-1] faster?
I know I could use SIMD instructions, but I don't think it will produce a significant improvement as B is encoded in 64-bit cells.
Here are some extra details:
A is usually short (maybe 20 or 30 64-bit cells), but there is no guarantee. It could also be arbitrarily large, although always smaller than B.
I can't make any assumption about A's encoding. It could be uint8_t, uint16_t, uint32_t or uint64_t. That is the reason I pass it as void* to the function.
Link to godbolt with the code above compiling example
Thanks!
A few things you can try:
As noted before, you can't just cast A to size_t*, because A may not be 8-byte aligned. You either need to go byte by byte, or handle the beginning and end that are not 8-byte aligned separately.
Move the declaration of tmp_data inside the loop as a single 'size_t const tmp_data' assignment, refer to B[cell_i] and B[cell_i+1], and increment cell_i in the for statement. That way the compiler can unroll the loop (or at least detect much more easily that it can).
Finally, if memory is not an issue, you can keep 8 copies of B (each shifted one bit further to the right than the previous one) and use the copy in which B[c] falls at the beginning of a byte. Then you can use memcmp (which will presumably give you the fastest code).
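The first two suggestions could look roughly like the sketch below. It is not the asker's class: compare_full_words is a hypothetical free function, it only covers the full 64-bit words of A (the tail of fewer than 64 bits would still go through read), and it assumes that B has one more readable cell after the last one compared whenever c is not a multiple of 64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: compare n_words full 64-bit words of A against the
// bits of B that start at bit offset c.
inline bool compare_full_words(const std::uint64_t* B, const void* A,
                               std::size_t n_words, std::size_t c) {
    const unsigned char* a_bytes = static_cast<const unsigned char*>(A);
    const unsigned left = static_cast<unsigned>(c & 63u);
    std::size_t cell_i = c >> 6;

    for (std::size_t k = 0; k < n_words; ++k, ++cell_i) {
        // memcpy sidesteps the alignment problem of casting A to a word pointer.
        std::uint64_t a_word;
        std::memcpy(&a_word, a_bytes + k * sizeof a_word, sizeof a_word);

        // One const assignment per iteration. Shifting by 64 is undefined,
        // so the aligned case (left == 0) is handled separately.
        std::uint64_t const b_word =
            left == 0 ? B[cell_i]
                      : (B[cell_i] >> left) | (B[cell_i + 1] << (64 - left));

        if (a_word != b_word) return false;
    }
    return true;
}
```

Because left is loop-invariant, the compiler is free to hoist the branch out of the loop; whether it actually does so, and whether this is faster than the original, would need to be measured.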

Set the most significant bit

I'm trying to set or clear the most significant bit of an unsigned integer based on a bool flag. This is my Item class, written for a hypothetical K = uint64_t:
template<typename K>
class Item {
public:
K first;
Item() = default;
explicit Item(const K &elem, const bool flag = false) {
first = elem & 0x3FFFFFFFFFFFFFFF;
first |= (flag * 0x8000000000000000);
}
};
Is there a way to do this fully generically, so that it works for all kinds of numeric K?
I tried with the 8 * sizeof(K) but it doesn't work.
An option using only bit operations:
template<typename T>
void Item(T& elem, bool flag = false) {
T mask = (T)1 << (sizeof(T) * 8 - 1);
elem = (elem & ~mask) | (flag ? mask : 0);
}
You can leverage std::bitset for this. I am not sure how well this optimizes, but it works generically and requires no knowledge of bitwise operations.
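#include <bitset>
#include <climits>
template <typename T>
void toggle_msb(T& val)
{
    constexpr auto bit_width = sizeof(T) * CHAR_BIT;
    std::bitset<bit_width> temp(val);
    // to_ullong() returns unsigned long long, so convert back to T explicitly
    val = static_cast<T>(temp.flip(bit_width - 1).to_ullong());
}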
Using bitwise operations, but without explicitly depending on the size of T:
template<typename T>
T set_top_bit(T value, bool state) {
constexpr T mask = T(~T(0)) >> 1;
value &= mask;
if (state) value |= ~mask;
return value;
}
T(~T(0)) gets a T with all bits set (see the note below); >> 1 throws out the bottom bit and shifts a 0 in from the top, so mask is a T with all bits set except the topmost. Notice that all this dance is purely formal: it is all evaluated at compile time.
The rest is pretty much like your code: mask out the top bit from value and OR it back in depending on state (~mask will be a T with only the top bit set).
Plain ~0 would result in an int set to -1, ~0U in an unsigned int with all bits set; to obtain a T with all bits set, we need to flip the bits of a T(0) (so, a 0 of our T type), and also to cast back to T later, because, if T is smaller than int, ~T(0) is actually equivalent to ~int(T(0)), so ~0, due to integer promotion rules.
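The promotion rule described in the note can be checked at compile time. The sketch below is illustrative only: top_bit_mask is a hypothetical helper, and the first static_assert assumes the usual situation where uint8_t is narrower than int.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// ~ promotes a uint8_t operand to int, so the result is an int, not a uint8_t.
static_assert(std::is_same<decltype(~std::uint8_t(0)), int>::value,
              "operand of ~ is promoted to int");

// Hypothetical helper: a T with only the topmost bit set, for any unsigned T.
template <typename T>
constexpr T top_bit_mask() {
    static_assert(std::is_unsigned<T>::value, "T must be unsigned");
    // Cast back to T after every ~ so the promoted high bits are dropped.
    return static_cast<T>(~static_cast<T>(static_cast<T>(~T(0)) >> 1));
}

static_assert(top_bit_mask<std::uint8_t>() == 0x80u, "8-bit mask");
static_assert(top_bit_mask<std::uint16_t>() == 0x8000u, "16-bit mask");
static_assert(top_bit_mask<std::uint64_t>() == 0x8000000000000000ull, "64-bit mask");
```

Without the inner cast back to T, the shift for a uint8_t would operate on the int value -1 rather than on 0xFF.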
From this great post, How do you set, clear, and toggle a single bit?, you can use this branchless version:
#include <climits>
template<typename T>
constexpr T set_top_bit_v2(T value, bool state) {
constexpr auto msb = (sizeof(T) * CHAR_BIT) - 1;
return (value & ~(T{1} << msb)) | (T{state} << msb);
}
Comparing the output with Matteo Italia's version on Godbolt, Clang seems to generate the same code despite the differences, while GCC and MSVC seem to emit fewer instructions for this version.

How do I flip part of number/bitset in C++ efficiently?

Consider I have a number 100101 of length 6. I wish to flip bits starting from position 2 to 5 so my answer will be 111011. I know that I can flip individual bits in a loop. But is there an efficient way of doing this without a for loop?
If I understand you correctly, try
namespace {
unsigned flip(unsigned a)
{
return a ^ 0b011110u;
}
} // namespace
Just adjust the constant to the actual bits you want to flip by having them as 1's in the constant.
On the other hand, if you just need to update an individual variable, you can also use
unsigned value = 0b100101u;
value ^= 0b011110u;
assert(value == 0b111011u);
EDIT
And here is the same using std::bitset<6u> and C++98:
#include <bitset>
#include <cassert>
int main()
{
std::bitset<6u> const kFlipBit2to6WithXor(0x1Eu); // aka 0b011110u
std::bitset<6u> number(0x25u); // aka 0b100101u
number ^= kFlipBit2to6WithXor;
assert(number.to_ulong() == 0x3Bu); // aka 0b111011u
return 0;
}
0b1111 is 0b10000-1.
constexpr unsigned low_mask(unsigned x){
return (1u<<x)-1;
}
0b1100 is 0b1111-0b11.
constexpr unsigned mask(unsigned width, unsigned offset){
return low_mask(width+offset)-low_mask(offset);
}
Then use XOR to flip the bits.
unsigned result = 0b100001 ^ mask(4,2);
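Applied to the number from the question, the same idea can be sketched as follows. The names range_mask and flip_range are hypothetical. The mask is built with a right shift of ~0u instead of (1u<<x)-1, an assumption made here so that a range covering all bits of unsigned does not shift by the full width, which would be undefined. Callers are assumed to pass width + offset no larger than the number of bits in unsigned.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: `width` one-bits starting at bit `offset`
// (bit 0 is the least significant bit).
constexpr unsigned range_mask(unsigned width, unsigned offset) {
    return width == 0
               ? 0u
               : (~0u >> (sizeof(unsigned) * CHAR_BIT - width)) << offset;
}

// Flip `width` bits of `value`, starting at bit `offset`.
constexpr unsigned flip_range(unsigned value, unsigned width, unsigned offset) {
    return value ^ range_mask(width, offset);
}

// The question's example: 100101 with the four bits above bit 0 flipped.
static_assert(flip_range(0x25u, 4, 1) == 0x3Bu, "100101 -> 111011");
```

Here range_mask(4, 1) is 0x1E (011110), the same constant used in the first answer.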

Checking a pattern of bits in a sequence

Basically, I need to check whether a certain sequence of bits occurs in another sequence of bits (32 bits).
The function should take 3 arguments:
n, the number of rightmost bits of a value to use
a value
the sequence in which the n bits should be checked for occurrence
The function has to return the bit number where the desired sequence starts. Example: check whether the last 3 bits of 0x5 occur in 0xe1f4.
void bitcheck(unsigned int source, int operand,int n)
{
int i,lastbits,mask;
mask=(1<<n)-1;
lastbits=operand&mask;
for(i=0; i<32; i++)
{
if((source&(lastbits<<i))==(lastbits<<i))
printf("It start at bit number %i\n",i+n);
}
}
Your loop goes too far, I'm afraid. It could, for example 'find' the bit pattern '0001' in a value ~0, which consists of ones only.
This will do better (I hope):
void checkbit(unsigned value, unsigned pattern, unsigned n)
{
unsigned size = 8 * sizeof value;
if( 0 < n && n <= size)
{
unsigned mask = ~0U >> (size - n);
pattern &= mask;
for(unsigned i = 0; i <= size - n; i ++, value >>= 1)
if((value & mask) == pattern)
printf("pattern found at bit position %u\n", i+n);
}
}
I take you to mean that you want to take source as a bit array, and to search it for a bit sequence specified by the n lowest-order bits of operand. It seems you would want to perform a standard mask & compare; the only (minor) complication being that you need to scan. You seem already to have that idea.
I'd write it like this:
void bitcheck(uint32_t source, uint32_t operand, unsigned int n) {
uint32_t mask = ~(~(uint32_t)0 << n); /* unsigned shift; assumes 0 < n < 32 */
uint32_t needle = operand & mask;
int i;
for(i = 0; i <= (32 - n); i += 1) {
if (((source >> i) & mask) == needle) {
/* found it */
break;
}
}
}
There are some differences in the details between mine and yours, but the main functional difference is the loop bound: you must be careful to ignore cases where some of the bits you compare against the target were introduced by a shift operation, as opposed to originating in source, lest you get false positives. The way I've written the comparison makes it clearer (to me) what the bound should be.
I also use the explicit-width integer data types from stdint.h for all values where the code depends on a specific width. This is an excellent habit to acquire if you want to write code that ports cleanly.
Perhaps:
if((source&(mask<<i))==(lastbits<<i))
Because:
finding 10 in 11 will be true for your old code. In fact, your original condition will always return true when 'source' is made of all ones.
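Putting the points from these answers together (compare under a mask, and stop scanning at 32 - n), a version that returns the position instead of printing it could look like this sketch. The name find_pattern, the 0-based position counted from the least significant bit, and the -1 return value for "not found" are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: lowest bit position at which the n low-order bits of
// `operand` occur in `source`, or -1 if they do not occur.
inline int find_pattern(std::uint32_t source, std::uint32_t operand, unsigned n) {
    if (n == 0 || n > 32) return -1;  // reject unusable widths
    const std::uint32_t mask = ~std::uint32_t{0} >> (32 - n);
    const std::uint32_t needle = operand & mask;
    // Stop once i + n exceeds 32, so every compared bit comes from `source`
    // and none is a zero shifted in from the top.
    for (unsigned i = 0; i + n <= 32; ++i) {
        if (((source >> i) & mask) == needle) return static_cast<int>(i);
    }
    return -1;
}
```

With the example from the question, find_pattern(0xe1f4, 0x5, 3) finds 101 starting at bit 2, while searching an all-ones source for 0001 correctly reports no match.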

Bits aren't being reset?

I am using Bit Scan Forward to detect set bits within a uint64_t, use each set bit's index within my program, clear the set bit and then proceed to find the next set bit. However, when the initial uint64_t value is:
0000000000001000000000000000000000000000000000000000000000000000
The below code isn't resetting the 52nd bit, therefore it gets stuck in the while loop:
uint64_t bits64 = data;
//Detects index being 52
int32_t index = __builtin_ffsll(bits64);
while(0 != index){
//My logic
//Set bit isn't being cleared here
clearNthBitOf64(bits64, index);
//Still picks-up bit 52 as being set
index = __builtin_ffsll(bits64);
}
void clearNthBitOf64(uint64_t& input, const uint32_t n) {
input &= ~(1 << n);
}
From the docs:
— Built-in Function: int __builtin_ffs (int x)
Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero.
You're simply off by one in the call to your clear function; it should be:
clearNthBitOf64(bits64, index-1);
Also, your clear function overflows: the literal 1 is an int, so you need to ensure that what you're left-shifting is wide enough:
void clearNthBitOf64(uint64_t& input, const uint32_t n) {
input &= ~(1ULL << n);
// ^^^
}
__builtin_ffsll "returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero." You need to adjust the left shift to ~(1ULL << (n - 1)) or change the function call to clearNthBitOf64(bits64, index - 1);
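With both fixes applied (the 1-based result of __builtin_ffsll and the 64-bit shift), the loop from the question can be sketched as below. The wrapper set_bit_indices is hypothetical and only stands in for "My logic" by collecting the indices. As in the question, __builtin_ffsll is a GCC/Clang builtin, so this is not portable to other compilers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Clear bit n (0-based). 1ULL keeps the shift in 64-bit arithmetic.
inline void clearNthBitOf64(std::uint64_t& input, std::uint32_t n) {
    assert(n < 64);
    input &= ~(1ULL << n);
}

// Hypothetical wrapper: collect the 0-based indices of all set bits.
inline std::vector<int> set_bit_indices(std::uint64_t bits64) {
    std::vector<int> indices;
    int index = __builtin_ffsll(static_cast<long long>(bits64));
    while (index != 0) {
        indices.push_back(index - 1);  // ffsll is 1-based, 0 means "no bit set"
        clearNthBitOf64(bits64, static_cast<std::uint32_t>(index - 1));
        index = __builtin_ffsll(static_cast<long long>(bits64));
    }
    return indices;
}
```

For the value from the question, the builtin returns 52, the loop records index 51, clears that bit and then terminates.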