C++ increment std::atomic_int if nonzero - c++

I'm implementing a strong pointer / weak pointer mechanism using std::atomic for the reference counters (like this). For converting a weak pointer to a strong one I need to atomically
check if the strong reference counter is nonzero,
if so, increment it, and
know whether the increment actually happened.
Is there a way to do this using std::atomic_int? I think it has to be possible using one of the compare_exchange operations, but I can't figure it out.

Given the definition std::atomic<int> ref_count;
int previous = ref_count.load();
for (;;)
{
    if (previous == 0)
        break;
    if (ref_count.compare_exchange_weak(previous, previous + 1))
        break;
}
After the loop, previous holds the value observed before the increment (zero if the increment was skipped). Note that compare_exchange_weak updates previous with the current value whenever it fails.

This should do it:
bool increment_if_non_zero(std::atomic<int>& i) {
    int expected = i.load();
    int to_be_stored;
    do {
        if (expected == 0) {
            // Counter already reached zero: do not resurrect it.
            return false;
        }
        to_be_stored = expected + 1;
    } while (!i.compare_exchange_weak(expected, to_be_stored));
    return true;
}
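For context, a minimal sketch (not from the original answers) of how such a helper might be used when promoting a weak pointer to a strong one. The control_block and try_promote names are hypothetical, and it assumes <atomic> is included and the increment_if_non_zero above is in scope:
// Hypothetical control block, for illustration only.
struct control_block {
    std::atomic<int> strong_count{1};
};

// Returns true if a strong reference was acquired, i.e. the managed
// object was still alive when the increment took effect.
bool try_promote(control_block& cb) {
    return increment_if_non_zero(cb.strong_count);
}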

Atomically increment and assign to another atomic

Suppose I have some global:
std::atomic_int next_free_block;
and a number of threads each with access to a
std::atomic_int child_offset;
that may be shared between threads. I would like to allocate free blocks to child offsets in a contiguous manner, that is, I want to perform the following operation atomically:
if (child_offset != 0) child_offset = next_free_block++;
Obviously the above implementation does not work as multiple threads may enter the body of the if statement and then try to assign different blocks to child_offset.
I have also considered the following:
int expected = child_offset;
int updated;
do {
    if (expected == 0) break;
    updated = next_free_block++;
} while (!child_offset.compare_exchange_weak(expected, updated));
But this also doesn't work because if the CAS fails, the side effect of incrementing next_free_block remains even if nothing is assigned to child_offset. This leaves gaps in the allocation of free blocks.
I am aware that I could do this with a mutex (or some kind of spin lock) around each child_offset and potentially DCLP, but I would like to know if this is possible to implement efficiently with atomic operations.
The use case for this is as follows: I have a large tree that I'm building in parallel. The tree is an array of the following:
struct tree_page {
    atomic<uint32_t> allocated;
    uint32_t child_offset[8];
    uint32_t nodes[1015];
};
The tree is built level by level: first the nodes at depth 0 are created, then at depth 1, etc. A separate thread is dispatched for each non-leaf node at the previous step. If no more space is left in a page, a new page is allocated from the global next_free_page which points to the first unused page in the array of struct tree_page and is assigned to an element of child_ptr. A bit field is then set in the node word that indicates which element of the child_ptr array should be used to find the node's children.
The code I am trying to write looks like this:
int expected = allocated.load(std::memory_order_relaxed), updated;
do {
    updated = expected + num_children;
    if (updated > NODES_PER_PAGE) {
        expected = -1; break;
    }
} while (!allocated.compare_exchange_weak(expected, updated));
if (expected != -1) {
    // successfully allocated in the same page
} else {
    for (int i = 0; i < 8; ++i) {
        // this is the operation I would like to be atomic
        if (child_offset[i] == 0)
            child_offset[i] = next_free_block++;
        int offset = try_allocating_at_page(pages[child_offset[i]]);
        if (offset != -1) {
            // successfully allocated at child_offset i
            // ...
            break;
        }
    }
}
As far as I understood from your description, your child_offset array is filled with 0 initially and then filled with concrete values concurrently by different threads.
In this case you can atomically "tag" the value first and, if you are successful, assign the valid value. Something like this:
constexpr int INVALID_VALUE = -1;
for (int i = 0; i < 8; ++i) {
    int expected = 0;
    // this is the operation I would like to be atomic
    // (strong: this single attempt must not fail spuriously)
    if (child_offset[i].compare_exchange_strong(expected, INVALID_VALUE)) {
        child_offset[i] = next_free_block++;
    }
    // Not sure if this is needed in your environment, but just in case
    if (child_offset[i] == INVALID_VALUE) continue;
    ...
}
This doesn't guarantee that all values in the child_offset array will be in ascending order. But if you need that, why not fill it without multithreading involved?
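A more self-contained sketch of this tag-then-assign idea, assuming child_offset entries and next_free_block are std::atomic<int>. The claim_slot name and the variation of spinning until the winner publishes the value (rather than skipping the slot) are illustrative choices, not part of the original code:
#include <atomic>

constexpr int INVALID_VALUE = -1;

// Tries to claim a slot: exactly one thread wins the 0 -> INVALID_VALUE tag
// and publishes a freshly allocated block number; the others spin until the
// real value appears, so every caller returns the same block for the slot.
int claim_slot(std::atomic<int>& slot, std::atomic<int>& next_free_block) {
    int expected = 0;
    if (slot.compare_exchange_strong(expected, INVALID_VALUE)) {
        slot.store(next_free_block.fetch_add(1));
    }
    int value = slot.load();
    while (value == INVALID_VALUE) {
        value = slot.load();
    }
    return value;
}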

Is there a "not equal compare and exchange" or "fetch add on not equal" for C++?

Or is there any way to implement one?
Let's have an atomic:
std::atomic<int> val;
val = 0;
Now I want to update val only if val is not zero.
if (val != 0) {
// <- Caveat if val becomes 0 here by another thread.
val.fetch_sub(1);
}
So maybe:
int not_expected = 0;
val.hypothetical_not_compare_exchange_strong(not_expected, val - 1);
Actually the above also will not work, because val may get updated between computing val - 1 and the hypothetical call.
Maybe this:
int old_val = val;
if (old_val == 0) {
    // val is zero, don't update val. some other logic.
} else {
    int new_val = old_val - 1;
    bool could_update = val.compare_exchange_strong(old_val, new_val);
    if (!could_update) {
        // repeat the above steps again.
    }
}
Edit:
val is a counter variable, not related to destruction of an object though. It's supposed to be an unsigned (since count can never be negative).
From thread A: if type 2 is sent out, type 1 cannot be sent out unless type 2 counter is 0.
while(true) {
if counter_1 < max_type_1_limit && counter_2 == 0 && somelogic:
send_request_type1();
counter_1++;
if some logic && counter_2 == 0:
send_request_type2();
counter_2++;
}
thread B & C: handle response:
if counter_1 > 0:
counter_1--
// (provided that after this counter_1 doesn't reduce to negative)
else
counter_2--
The general way to implement atomic operations that aren't directly available is a CAS loop; in your case it would look like this:
/// atomically decrements %val if it's not zero; returns true if it
/// decremented, false otherwise
bool decrement_if_nonzero(std::atomic_int &val) {
    int old_value = val.load();
    do {
        if (old_value == 0) return false;
    } while (!val.compare_exchange_weak(old_value, old_value - 1));
    return true;
}
So, Thread B & C would be:
if (!decrement_if_nonzero(counter_1)) {
    counter_2--;
}
and thread A could use plain atomic loads/increments - thread A is the only one who increments the counters, so its check about counter_1 being under a certain threshold will always hold, regardless of what thread B and C do.
The only "strange" thing I see is the counter_2 fixup logic - in thread B & C it's decremented without checking for zero, while in thread A it's incremented only if it's zero - it looks like a bug. Did you mean to clamp it to zero in thread B/C as well?
That being said, atomics are great and all, but are trickier to get right, so if I were implementing this kind of logic I'd start out with a mutex, and then move to atomics if profiling pointed out that the mutex was a bottleneck.
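As a point of comparison, a minimal sketch of the mutex-based starting point the answer suggests. The guarded_counters struct and its member names are made up for illustration, and counter_2 is clamped at zero here, which the answer suspects is what was intended:
#include <mutex>

// Hypothetical mutex-protected counters, for illustration only.
struct guarded_counters {
    std::mutex m;
    unsigned counter_1 = 0;
    unsigned counter_2 = 0;

    // Handles a response the way threads B and C do: prefer counter_1,
    // fall back to counter_2, never letting either go below zero.
    void handle_response() {
        std::lock_guard<std::mutex> lock(m);
        if (counter_1 > 0) {
            --counter_1;
        } else if (counter_2 > 0) {
            --counter_2;
        }
    }
};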

How to safely compare two unsigned integer counters?

We have two unsigned counters, and we need to compare them to check for some error conditions:
uint32_t a, b;
// a increased in some conditions
// b increased in some conditions
if (a/2 > b) {
perror("Error happened!");
return -1;
}
The problem is that a and b will overflow some day. If a overflowed, it's still OK. But if b overflowed, it would be a false alarm. How to make this check bulletproof?
I know making a and b uint64_t would delay this false alarm, but it still would not completely fix the issue.
===============
Let me clarify a little bit: the counters are used to tracking memory allocations, and this problem is found in dmalloc/chunk.c:
#if LOG_PNT_SEEN_COUNT
/*
* We divide by 2 here because realloc which returns the same
* pointer will seen_c += 2. However, it will never be more than
* twice the iteration value. We divide by two to not overflow
* iter_c * 2.
*/
if (slot_p->sa_seen_c / 2 > _dmalloc_iter_c) {
dmalloc_errno = ERROR_SLOT_CORRUPT;
return 0;
}
#endif
I think you misinterpreted the comment in the code:
We divide by two to not overflow iter_c * 2.
No matter where the values are coming from, it is safe to write a/2 but it is not safe to write a*2. Whatever unsigned type you are using, you can always divide a number by two, while multiplying it by two may overflow.
If the condition would be written like this:
if (slot_p->sa_seen_c > _dmalloc_iter_c * 2) {
then roughly half of the input would cause a wrong condition. That being said, if you worry about counters overflowing, you could wrap them in a class:
#include <algorithm> // for std::min

class counters {
    unsigned a = 0;
    unsigned b = 0;
    bool odd = true;
    void normalize() {
        auto m = std::min(a, b);
        a -= m;
        b -= m;
    }
public:
    void incr_a() {
        // only every second increment is counted, mirroring the a/2 in the check
        if (odd) ++a;
        odd = !odd;
        normalize();
    }
    void incr_b() {
        ++b;
        normalize();
    }
    bool check() const { return a > b; }
};
Note that to avoid the overflow completely you have to take additional measures, but if a and b are increased more or less the same amount this might be fine already.
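A possible usage sketch of the class above, mirroring the original check. The condition_to_increase_a/b predicates and run_loop are placeholders, not real functions:
int run_loop() {
    counters c;
    for (;;) {
        if (condition_to_increase_a()) c.incr_a();   // hypothetical predicates
        if (condition_to_increase_b()) c.incr_b();
        if (c.check()) {
            perror("Error happened!");
            return -1;
        }
    }
}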
The posted code actually doesn't seem to use counters that wrap around.
What the comment in the code is saying is that it is safer to compare a/2 > b than a > 2*b, because the latter could potentially overflow while the former cannot. This is particularly true if the type of a is larger than the type of b.
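For instance, a minimal illustration of the difference with uint32_t values; the concrete numbers are chosen only to show the wraparound:
#include <cstdint>
#include <cstdio>

int main() {
    uint32_t a = 4000000000u;
    uint32_t b = 2200000000u;            // logically a <= 2*b, so no error
    std::printf("%d\n", a / 2 > b);      // 0: the division cannot overflow
    std::printf("%d\n", a > b * 2);      // 1: b * 2 wraps to 105032704 (false alarm)
}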
Note overflows as they occur.
uint32_t a, b;
bool aof = false;
bool bof = false;

if (condition_to_increase_a()) {
    a++;
    aof = a == 0;   // a just wrapped around
}
if (condition_to_increase_b()) {
    b++;
    bof = b == 0;   // b just wrapped around
}

// aof adds back the 2^31 that a/2 lost when a wrapped; if b wrapped,
// skip the check rather than raise a false alarm.
if (!bof && a/2 + aof*0x80000000 > b) {
    perror("Error happened!");
    return -1;
}
Each of a, b independently has 2^32 + 1 different states reflecting value and conditional increment. Somehow, more than a uint32_t of information is needed. Could use uint64_t, variant code paths or an auxiliary variable like the bool here.
Normalize the values as soon as they wrap by forcing them both to wrap at the same time. Maintain the difference between the two when they wrap.
Try something like this:
uint32_t a, b;
// a increased in some conditions
// b increased in some conditions

if (a == UINT32_MAX || b == UINT32_MAX) { // one of them is about to wrap
    if (a > b)
    {
        a = a - b; b = 0;
    }
    else
    {
        b = b - a; a = 0;
    }
}
if (a/2 > b) {
    perror("Error happened!");
    return -1;
}
If even using 64 bits is not enough, then you need to code your own "var increase" method instead of overloading the ++ operator (which may mess up your code if you are not careful).
The method would just reset var to 0 or some other meaningful value.
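A minimal sketch of what such a custom increase method could look like; the wrap_limit value and the function name are arbitrary illustrations of the idea, not taken from the question:
#include <cstdint>

// Increments the counter, resetting it to 0 once it reaches a chosen bound,
// so the caller controls exactly where the wrap happens.
void increase(uint64_t& var, uint64_t wrap_limit = 1000000000ull) {
    ++var;
    if (var >= wrap_limit) {
        var = 0;
    }
}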
If your intention is to ensure that action x happens no more than twice as often as action y, I would suggest doing something like:
uint32_t x_count = 0;
uint32_t scaled_y_count = 0;

void action_x(void)
{
    if ((uint32_t)(scaled_y_count - x_count) > 0xFFFF0000u)
        fault();
    x_count++;
}
void action_y(void)
{
    if ((uint32_t)(scaled_y_count - x_count) < 0xFFFF0000u)
        scaled_y_count += 2;
}
In many cases, it may be desirable to reduce the constants in the comparison used when incrementing scaled_y_count so as to limit how many action_y operations can be "stored up". The above, however, should work precisely in cases where the operations remain anywhere close to balanced in a 2:1 ratio, even if the number of operations exceeds the range of uint32_t.

What is the most efficient way to return results from recursion?

There are two possible ways that I am familiar with for returning a boolean/integer value from a recursive function, indicating whether the operation carried out was a success or not:
1. Using static variables inside the recursive function: changing their values in the recursive calls and then returning the final value once everything is done.
2. Passing the result variable by reference to the recursive function, manipulating its value in the function, and then checking whether the value corresponds to the result or not.
void Graph::findPath(string from, string to)
{
    int result = 0;
    if (from == to) cout << "There is a path!" << endl;
    else
    {
        findPathHelper(from, to, result);
        if (result) cout << "There is a path!" << endl;
        else cout << "There is not a path!" << endl;
    }
}

void Graph::findPathHelper(string from, string toFind, int &found)
{
    for (vector<string>::iterator i = adjList[from].begin(); i != adjList[from].end(); ++i)
    {
        if (!(toFind).compare(*i))
        {
            found = 1;
            break;
        }
        else
            findPathHelper(*i, toFind, found);
    }
}
Is there a better way to achieve this?
Thank You
I have changed your implementation to use a return value
bool Graph::findPathHelper(const string& from, const string& toFind)
{
    for (vector<string>::iterator i = adjList[from].begin(); i != adjList[from].end(); ++i)
    {
        // I have assumed your comparison was meant as toFind == *i, so:
        //   toFind == *i          - the two strings are equal, thus found
        // or
        //   recurse on *i         - have we found it through the recursion?
        if (toFind == *i || findPathHelper(*i, toFind)) {
            return true;
        }
    }
    // We have searched everywhere in the recursion, exhausted the list,
    // and still have not found it - so return false
    return false;
}
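The calling function then shrinks accordingly; a minimal sketch, assuming the same Graph class and the refactored helper above:
void Graph::findPath(string from, string to)
{
    if (from == to || findPathHelper(from, to))
        cout << "There is a path!" << endl;
    else
        cout << "There is not a path!" << endl;
}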
You can return a value from the recursive function and use that returned value to check whether it was a success or not in subsequent calls.
Using a static variable for this purpose may work, but it's generally not a good idea and many consider it bad practice.
Look into the below link which explains why we must avoid static or global variables and what kind of problems it could lead to during recursion.
http://www.cs.umd.edu/class/fall2002/cmsc214/Tutorial/recursion2.html
Note: I do not have enough reputation yet to make a comment, and therefore I have posted this as an answer.

Decrement atomic counter - but <only> under a condition

I want to realize something along these lines:
inline void DecrementPendingWorkItems()
{
    if (this->pendingWorkItems != 0) // make sure we don't underflow and get a very high number
    {
        ::InterlockedDecrement(&this->pendingWorkItems);
    }
}
How can I do this so that both operations are atomic as a block, without using locks?
You can just check the result of InterlockedDecrement() and if it happens to be negative (or <= 0 if that's more desirable) undo the decrement by calling InterlockedIncrement(). In otherwise proper code that should be just fine.
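A rough sketch of that decrement-then-undo idea, assuming pendingWorkItems is a LONG member as in the question; like the surrounding answers, this is untested illustration only:
inline bool DecrementPendingWorkItems()
{
    // Decrement unconditionally; if we drove the counter negative we were
    // not entitled to a work item, so put it back and report failure.
    // Note that other threads may briefly observe a negative value.
    if (::InterlockedDecrement(&this->pendingWorkItems) < 0)
    {
        ::InterlockedIncrement(&this->pendingWorkItems);
        return false;
    }
    return true;
}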
The simplest solution is just to use a mutex around the entire section (and for all other accesses to this->pendingWorkItems). If for some reason this isn't acceptable, then you'll probably need compare and exchange:
void decrementPendingWorkItems()
{
    int count = std::atomic_load( &pendingWorkItems );
    while ( count != 0
            && ! std::atomic_compare_exchange_weak(
                   &pendingWorkItems, &count, count - 1 ) ) {
    }
}
(This supposes that pendingWorkItems has type std::atomic_int.)
There is such a thing as a "SpinLock". This is a very lightweight synchronisation primitive.
This is the idea:
//
// This lock should be used only when operation with protected resource
// is very short like several comparisons or assignments.
//
class SpinLock
{
public:
    __forceinline SpinLock() { body = 0; }
    __forceinline void Lock()
    {
        int spin = 15;
        for (;;) {
            if (!InterlockedExchange(&body, 1)) break;
            if (--spin == 0) { Sleep(10); spin = 29; }
        }
    }
    __forceinline void Unlock() { InterlockedExchange(&body, 0); }
protected:
    long body;
};
The actual numbers in the sample are not important. This lock is extremely efficient.
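For example, the questioner's check-then-decrement could be guarded with it roughly like this; the spinLock member is a name assumed for the sketch, not part of the question's class:
// Assumes a SpinLock member, e.g.  SpinLock spinLock;  next to pendingWorkItems.
inline void DecrementPendingWorkItems()
{
    spinLock.Lock();
    // Both the check and the decrement now happen under the same lock.
    if (this->pendingWorkItems != 0)
    {
        --this->pendingWorkItems; // plain decrement is fine if every access takes the lock
    }
    spinLock.Unlock();
}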
You can use InterlockedCompareExchange in a loop:
inline void DecrementPendingWorkItems() {
    LONG old_items = this->pendingWorkItems;
    LONG items;
    while ((items = old_items) > 0) {
        old_items = ::InterlockedCompareExchange(&this->pendingWorkItems,
                                                 items - 1, items);
        if (old_items == items) break;
    }
}
What the InterlockedCompareExchange function is doing is:
if pendingWorkItems matches items, then
set the value to items-1 and return items
else return pendingWorkItems
This is done atomically, and is also called a compare and swap.
Use an atomic CAS.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683560(v=vs.85).aspx
You can make it lock free, but not wait free.
As Kirill suggests this is similar to a spin lock in your case.
I think this does what you need, but I'd recommend thinking through all the possibilities before going ahead and using it as I have not tested it at all:
inline bool
InterlockedSetIfEqual(volatile LONG* dest, LONG exchange, LONG comperand)
{
    return comperand == ::InterlockedCompareExchange(dest, exchange, comperand);
}

inline bool InterlockedDecrementNotZero(volatile LONG* ptr)
{
    LONG comperand;
    LONG exchange;
    do {
        comperand = *ptr;
        exchange = comperand - 1;
        if (comperand <= 0) {
            return false;
        }
    } while (!InterlockedSetIfEqual(ptr, exchange, comperand));
    return true;
}
There remains the question as to why your pending work items should ever go below zero. You should really ensure that the number of increments matches the number of decrements and all will be fine. I'd perhaps add an assert or exception if this constraint is violated.
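If increments and decrements really are balanced, the whole guard can reduce to a plain decrement plus a debug check; a small sketch of that last suggestion, with the assert usage assumed rather than taken from the answer:
#include <cassert>

inline void DecrementPendingWorkItems()
{
    LONG result = ::InterlockedDecrement(&this->pendingWorkItems);
    // If every decrement is matched by a prior increment, this can never fire.
    assert(result >= 0);
}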