Can the compare-and-swap function be used to swap variables atomically?
I'm using C/C++ via gcc on x86_64 RedHat Linux, specifically the __sync builtins.
Example:
int x = 0, y = 1;
y = __sync_val_compare_and_swap(&x, x, y);
I think this boils down to whether x can change between the evaluation of &x and of x in the argument list; if evaluating &x constitutes a separate operation, another thread might change x between the two reads. I want to assume that the comparison implicit above will always succeed; my question is whether I can. Obviously there's the bool version of CAS, but then I can't get the old x to write into y.
A more useful example might be inserting or removing from the head of a linked list (gcc claims to support pointer types, so assume that's what elem and head are):
elem->next = __sync_val_compare_and_swap(&head, head, elem); //always inserts?
elem = __sync_val_compare_and_swap(&head, head, elem->next); //always removes?
Reference:
http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html
The operation might not actually store the new value into the destination because of a race with another thread that changes the value at the same moment you're trying to. The CAS primitive doesn't guarantee that the write occurs - only that the write occurs if the value is already what's expected. The primitive can't know what the correct behavior is if the value isn't what is expected, so nothing happens in that case - you need to fix up the problem by checking the return value to see if the operation worked.
So, your example:
elem->next = __sync_val_compare_and_swap(&head, head, elem); //always inserts?
won't necessarily insert the new element. If another thread inserts an element at the same moment, there's a race condition that might cause this thread's call to __sync_val_compare_and_swap() to not update head (but neither this thread's nor the other thread's element is lost yet if you handle it correctly).
But, there's another problem with that line of code - even if head did get updated, there's a brief moment of time where head points to the inserted element, but that element's next pointer hasn't been updated to point to the previous head of the list. If another thread swoops in during that moment and tries to walk the list, bad things happen.
To correctly update the list, change that line of code to something like:
whatever_t* prev_head = NULL;
do {
    elem->next = head; // set up `elem->next` so the list will still be linked
                       // correctly the instant the element is inserted
    prev_head = __sync_val_compare_and_swap(&head, elem->next, elem);
} while (prev_head != elem->next);
Or use the bool variant, which I think is a bit more convenient:
do {
    elem->next = head; // set up `elem->next` so the list will still be linked
                       // correctly the instant the element is inserted
} while (!__sync_bool_compare_and_swap(&head, elem->next, elem));
It's kind of ugly, and I hope I got it right (it's easy to get tripped up in the details of thread-safe code). It should be wrapped in an insert_element() function (or even better, use an appropriate library).
Addressing the ABA problem:
I don't think the ABA problem is relevant to this "add an element to the head of a list" code. Let's say that a thread wants to add object X to the list and when it executes elem->next = head, head has value A1.
Then before the __sync_val_compare_and_swap() is executed, another set of threads comes along and:
removes A1 from the list, making head point to B
does whatever with object A1 and frees it
allocates another object, A2, that happens to be at the same address as A1 was
adds A2 to the list so that head now points to A2
Since A1 and A2 have the same identifier/address, this is an instance of the ABA problem.
However, it doesn't matter in this case since the thread adding object X doesn't care that the head points to a different object than it started out with - all it cares about is that when X is queued:
the list is consistent,
no objects on the list have been lost, and
no objects other than X have been added to the list (by this thread)
Nope. The CAS instruction on x86 takes a value from a register, and compares/writes it against a value in memory.
In order to atomically swap two variables, it'd have to work with two memory operands.
As for whether x can change between &x and x? Yes, of course it can.
Even without the &, it could change.
Even in a function such as Foo(x, x), you could get two different values of x, since in order to call the function, the compiler has to:
take the value of x, and store it in the first parameter's position, according to the calling convention
take the value of x, and store it in the second parameter's position, according to the calling convention
between those two operations, another thread could easily modify the value of x.
It seems like you're looking for the interlocked-exchange primitive, not the interlocked-compare-exchange. That will unconditionally atomically swap the holding register with the target memory location.
However, you still have a problem with race conditions between assignments to y. If y is a local, this will be safe; but if both x and y are shared, you have a major problem and will need a lock to resolve it.
Related
I am currently running into a disgusting problem. Suppose there is a list aList of objects (whose type we call Object), and I want to iterate through it. Basically, the code would be like this:
for (int i = 0; i < aList.Size(); ++i)
{
    aList[i].DoSth();
}
The difficult part here is that the DoSth() method could change the caller's position in the list! So two consequences could occur: first, the iteration might never come to an end; second, some elements might be skipped (the iteration is not necessarily an indexed loop like the one above, since the container might be a linked list). Of course, the first one is the major concern.
The problem must be solved with these constraints:
1) The possibility of doing position-exchanging operations cannot be excluded;
2) The position-exchanging operations can be delayed until the iteration finishes, if necessary and doable;
3) Since it happens quite often, the iteration can be modified only minimally (so actions like creating a copy of the list is not recommended).
The language I'm using is C++, but I think similar problems arise in Java and C#, etc.
The following are what I've tried:
a) Try forbidding the position-exchanging operations during the iteration. However, that involves too many client code files and it's just not practical to find and modify all of them.
b) Modify every single method (e.g., Method()) of Object that can change its own position and that DoSth() might call, directly or indirectly. First let Method() detect that aList is iterating, and treat it accordingly: if the iteration is in progress, delay what Method() wants to do; otherwise, do it right away. The question here is: what is the best (easy-to-use, yet efficient enough) way of delaying a function call, given that the parameters of Method() could be rather complex? Moreover, this approach would involve quite a few functions, too!
c) Try modifying the iteration process. The real situation I encounter here is quite complex because it involves two layers of iterations: the first of them is a plain array iteration, while the second is a typical linked list iteration lying in a recursive function. The best I can do about the second layer of iteration for now, is to limit its iteration times and prevent the same element from being iterated more than once.
So I guess there could be some better way to tackle this problem? Maybe some awesome data structure will help?
Your question is a little light on detail, but from what you have written it seems that you are making the mistake of mixing concerns.
It is likely that your object can perform some action that causes it to either continue to exist or not. The decision that it should no longer exist is a separate concern to that of actually storing it in a container.
So let's split those concerns out:
#include <vector>

enum class ActionResult {
    Dies,
    Lives,
};

struct Object
{
    ActionResult performAction();
};

using Container = std::vector<Object>;

void actions(Container& cont)
{
    // note: re-evaluate end(cont) on every pass, because erase()
    // invalidates any previously saved end iterator
    for (auto first = begin(cont); first != end(cont); )
    {
        auto result = first->performAction();
        switch (result)
        {
        case ActionResult::Dies:
            first = cont.erase(first); // object wants to die so remove it
            break;
        case ActionResult::Lives:      // object wants to live so continue
            ++first;
            break;
        }
    }
}
If there are indeed only two results of the operation, lives and dies, then we could express this iteration idiomatically:
#include <algorithm>

// ...

void actions(Container& cont)
{
    auto actionResultsInDeath = [](Object& o)
    {
        auto result = o.performAction();
        return result == ActionResult::Dies;
    };
    cont.erase(std::remove_if(begin(cont), end(cont),
                              actionResultsInDeath),
               end(cont));
}
Well, problem solved, at least in regard to the situation I'm interested in right now. In my situation, aList is really a linked list and the Object elements are accessed through pointers. If the size of aList is relatively small, then we have an elegant solution just like this:
void Object::DoSthBig()
{
    Object* pNext = GetNext();
    if (pNext)
        pNext->DoSthBig();
    DoSth();
}
This relies on the hypothesis that each pNext stays valid during the process. But as long as element deletion has been handled carefully elsewhere, everything is fine.
Of course, this is a very special example and is unable to be applied to other situations.
I am programming a method called popBottom() in C++ using stacks.
The method must do the following: remove the element at the base and leave the stack in the same order, but without the removed element. I cannot use pop or push.
For example:
initial stack:
A
B
C
D
end stack:
A
B
C
I have programmed the following, but I do not know what I might have done wrong:
void popFull()
{
    struct node *A, *B;
    top1 = top;
    while (top1 != NULL)
    {
        B = top1->ptr;
        A = top1;
        B->ptr = A;
        top1 = B;
    }
}
Regards
Mariam
So, I'll see what I can do to answer this, though it would be very helpful if you could include a more complete version of your code: I'm not entirely sure what type of data structure some of your variables are, because no declarations are included. Also, could you clarify exactly what behaviour you are seeing go wrong? Those changes would make your question easier to answer.
In any case, I'll try to answer your question by interpreting it as "How do I eliminate the element at the base and leave the stack in the same order, not using pop or push." (I assume this is some sort of assignment?)
To that end I'll propose several options. C++11 has another function that isn't push() or pop(): stack.emplace(), which also adds an item to the top of the stack. It is functionally the same as stack.push(), so using it is obviously a bit of a technicality, and there actually is a (very nuanced) difference; here's a link if you're interested: C++: Stack's push() vs emplace(). But you might be able to get away with it.
Next, if you cannot use stack.pop() or stack.push(), another option is pointer arithmetic, but only if the stack was initialized with std::vector as its container class; otherwise the items are not contiguous in memory and there is no guarantee it will work. Here: Copy std::stack into an std::vector is another answer that deals with this, but I'll give a brief overview of what they did. If you initialize your stack using a std::vector, as in the example in the documentation, you can copy your stack's contents to a vector, operate freely on that vector, and then copy back into a stack.
Here's what I mean (keep in mind this only works if the container class is vector; it seems like you're just designing a function that takes an argument rather than initializing the stack yourself).
// this is how the stack must have been initialized
// for this to be guaranteed to work
std::stack<int, std::vector<int>> myStack;

int* end = &myStack.top() + 1;     // one past the top (last) element
int* begin = end - myStack.size(); // the bottom (first) element
std::vector<int> stackContents(begin, end);
And hurray, smooth sailing from here, now you can remove the item freely using your method of choice on the vector. Then, when you've modified the vector, you can create another stack to return by doing the opposite:
std::stack<int, std::vector<int>> newStack (stackContents);
return newStack;
Obviously this is a major workaround, and in the real world pop() and push() are useful functions and are included for a reason. This might actually be a good time to touch on the idea that a stack is designed to be accessed from one end only. That's why it's categorized as Last In, First Out: order matters, and needing to circumvent that order is a sign that a stack wasn't the proper data structure to use in the first place. Either way, that's my two cents, and I hope this helps.
I am experiencing very strange behaviour, which I cannot explain. I hope someone might shed some light on it.
Code snippet first:
class TContour {
public:
    typedef std::pair<int,int> TEdge; // an edge is defined by indices of vertices
    typedef std::vector<TEdge> TEdges;
    TEdges m_oEdges;

    void splitEdge(int iEdgeIndex, int iMiddleVertexIndex) {
        TEdge & oEdge = m_oEdges[iEdgeIndex];
        m_oEdges.push_back(TEdge(oEdge.first, iMiddleVertexIndex));
        oEdge = TEdge(oEdge.second, iMiddleVertexIndex); // !!! THE PROBLEM
    }

    void splitAllEdges(void) {
        size_t iEdgesCnt = m_oEdges.size();
        for (int i = 0; i < iEdgesCnt; ++i) {
            int iSomeVertexIndex = 10000; // some new value, not actually important
            splitEdge(i, iSomeVertexIndex);
        }
    }
};
When I call splitAllEdges(), the original edges are changed and new edges are added (doubling the container size). Everything is as expected, with the exception of one original edge, which does not change. Should that be of any interest, its index is 3 and its value is [1,242]. All the other original edges change, but this one remains unchanged. Adding debug prints confirms that the edge is written with a different value, but the m_oEdges contents do not change.
I have a simple workaround: replacing the problematic line with m_oEdges[iEdgeIndex] = TEdge(oEdge.second, iMiddleVertexIndex); does fix the issue. Still, my concern is the cause of the unexpected behaviour. Might it be a compiler bug (and if so, what other issues should I expect?), or am I overlooking some stupid bug in my code?
/usr/bin/c++ --version
c++ (Debian 4.9.2-10) 4.9.2
Switching from c++98 to c++11 did not change anything.
You're using an invalid reference after your push_back operation.
This:
TEdge & oEdge = m_oEdges[iEdgeIndex];
acquires the reference. Then this:
m_oEdges.push_back(TEdge(oEdge.first, iMiddleVertexIndex));
potentially resizes the vector and, in so doing, invalidates the oEdge reference. At which point this:
oEdge = TEdge(oEdge.second, iMiddleVertexIndex);
is no longer defined behavior, as you're using a dangling reference. Reuse the index, not the reference, such as:
m_oEdges[iEdgeIndex] = TEdge(m_oEdges[iEdgeIndex].second, iMiddleVertexIndex);
Others have mentioned the invalidation of the reference, so I won't go into more details on that.
If performance is critical, you could explicitly reserve enough space in the original vector for the new edges before you start looping. Since push_back() then never reallocates, references to existing elements remain valid and the problem goes away, though it silently comes back if the reserve ever falls out of sync with the amount the loop appends.
A safer, but slightly slower method would be to iterate through the vector, changing existing edges and generating new edges in a new vector (with sufficient space reserved beforehand for performance), and then at the end, append the new vector to the existing one.
The safest way (including being completely exception safe), would be to create a new vector (reserving double the size of the initial vector), iterate through the initial vector (without modifying any of its edges), pushing two new edges into the new vector for each old edge, and then right at the end vector.swap() the old vector with the new vector.
A big positive side-effect of this last approach is that your code either succeeds completely, or leaves the original edges unchanged. It maintains the integrity of the data even in the face of disaster.
P.S. I notice that you are doing:
TEdge(oEdge.first, iMiddleVertexIndex)
TEdge(oEdge.second, iMiddleVertexIndex)
If the rest of your code is sensitive to ring-orientation you probably want to reverse the parameters for the second edge. i.e.:
TEdge(oEdge.first, iMiddleVertexIndex)
TEdge(iMiddleVertexIndex, oEdge.second )
using : VC++ 2013
concurrency::concurrent_vector<datanode*> dtnodelst
Occasionally when I do dtnodelst->at(i) ... I am getting an invalid address (0xCDCD..., of course), which shouldn't be the case: after I push back, I never delete or remove any of the items. (Even if I deleted one, at() should return the old, now-freed address, but I am not deleting at all, so that is not even the case.)
datanode* itm = new datanode();
....
dtnodelst->push_back(itm);
any ideas on what might be happening ?
p.s. I am using windows thread pool. some times .. I can do 8million inserts and find and everything goes fine .... but sometimes even 200 inserts and finds will fail. I am kind of lost. any help would be awesomely appreciated!!
thanks and best regards
actual code as an fyi
p.s. am I missing something, or is it a pain to paste code with proper formatting? I remember it being auto-aligned before... -_-
struct datanode {
    volatile int nodeval;
    T val;
};

concurrency::concurrent_vector<datanode*> lst;

inline T find(UINT32 key)
{
    for (int i = 0; i < lst->size(); i++)
    {
        datanode* nd = lst->at(i);
        // nd is invalid sometimes
        if (nd)
            if (nd->nodeval == key)
            {
                return (nd->val);
            }
    }
    return NULL;
}

inline T insert_nonunique(UINT32 key, T val)
{
    datanode* itm = new datanode();
    itm->val = val;
    itm->nodeval = key;
    lst->push_back(itm);
    _updated(lst);
    return val;
}
The problem is the use of concurrent_vector::size(), which is not fully thread-safe: it can count elements that are not yet constructed (whose memory still contains garbage). The Microsoft PPL library (which provides concurrent_vector in the concurrency:: namespace) uses the Intel TBB implementation, and the TBB Reference says:
size_type size() const
Returns: Number of elements in the vector. The result may include elements that are allocated but still under construction by concurrent calls to any of the growth methods.
Please see my blog for more explanation and possible solutions.
In TBB, the most reasonable solution is to use tbb::zero_allocator as the underlying allocator of concurrent_vector, so that newly allocated memory is filled with zeroes before size() counts it.
concurrent_vector<datanode*, tbb::zero_allocator<datanode*> > lst;
Then, the condition if (nd) will filter out not-yet-ready elements.
volatile is no substitute for atomic<T>. Do not use volatile in some attempt to provide synchronization.
The whole idea of your find call doesn't make sense in a concurrent context. As soon as the function iterates past one value, it could be mutated by another thread into the value you're looking for. Or it could be the value you want, but mutated into some other value. Or the moment it returns "not found", the value you're seeking is added. The return value of such a function would be totally meaningless. size() has all the same problems, which is a good part of why your implementation would never work.
Inspecting the state of concurrent data structures is a very bad idea, because the information becomes invalid the moment you have it. You should design operations that do not require knowing the state of the structure to execute correctly, or, block all mutations whilst you operate.
If you can definitely prove that a method has no linearization points, does it necessarily mean that that method is not linearizable? Also, as a sub-question, how can you prove that a method has no linearization points?
To build upon the answers described above, a method can be described as linearizable. As referenced in the book that djoker mentioned: http://www.amazon.com/dp/0123705916/?tag=stackoverfl08-20
on page 69, exercise 32, we see
It should be noted that enq() is indeed a method that is described as possibly being linearizable or not.
Proving that a candidate linearization point is invalid comes down to finding an example that breaks linearizability. If you assume that some read/write memory operation in a method is its linearization point, and then derive by contradiction an execution that cannot be linearized under that assumption, you can declare that the operation is not a valid linearization point.
Take, for example, the following enq()/deq() methods, assuming they are part of a standard queue implementation with head/tail pointers and a backing array "arr":
public terribleQueue(){
    arr = new T[10];
    tail = 0;
    head = 0;
}

void enq(T x){
    int slot = tail;
    arr[slot] = x;
    tail = tail + 1;
}

T deq(){
    if( head == tail ) throw new EmptyQueueException();
    T temp = arr[head];
    head = head + 1;
    return temp;
}
In this terrible implementation, we can easily prove, for example, that the first line of enq is not a valid linearization point, by assuming that it is a linearization point, and then finding an example displaying otherwise, as seen here:
Take the example two threads, A and B, and the example history:
A: enq( 1 )
A: slot = 0
B: enq( 2 )
B: slot = 0
(A and B are now past their linearization points, therefore we are not allowed to re-order them to fit our history)
A: arr[0] = 1
B: arr[0] = 2
A: tail = 1
B: tail = 2
C: deq()
C: temp = arr[0] = 2
C: head = 1
C: return 2
Now we see that because of our choice of linearization point (which fixes the order of A and B), this execution is impossible to make linearizable: we cannot make C's deq return 1, no matter where we place its linearization point.
Kind of a long-winded answer, but I hope this helps.
If you can definitely prove that a method has no linearization points, does it necessarily
mean that that method is not linearizable?
Firstly, linearizability is not a property of a method; it is a property of an execution sequence.
How can you prove that a method has no linearization points?
Whether we can find a linearization point for the method depends on the execution sequence.
For example, consider the sequence below for thread A on a FIFO queue, where t1, t2, t3 are time intervals:
A.enq(1) A.enq(2) A.deq(1)
t1 t2 t3
We can choose the linearization points (LPs) of the first two enq calls as any points in intervals t1 and t2 respectively, and for deq any point in t3. The points we choose are the LPs for these methods.
Now, consider a faulty implementation
A.enq(1) A.enq(2) A.deq(2)
t1 t2 t3
Linearizability requires the LPs to respect the real-time ordering, so the LPs of the methods should follow the time ordering, i.e. t1 < t2 < t3. However, since our implementation is incorrect, we clearly cannot do this: there is no point in t3 at which deq could legally return 2 while 1 is still at the head of a FIFO queue. Hence, we cannot find a linearization point for the method A.deq(2), and in turn our sequence is not linearizable.
Hope this helps, if you need to know more you can read this book:
http://www.amazon.com/Art-Multiprocessor-Programming-Maurice-Herlihy/dp/0123705916
This answer is based on my first reading about linearizability on Wikipedia and trying to map it to my existing understanding of memory consistency through happens-before relationships, so I may be misunderstanding the concept.
If you can definitely prove that a method has no linearization points, does
it necessarily mean that that method is not linearizable?
It is possible to have a scenario where shared, mutable state is concurrently operated on by multiple threads without any synchronization or visibility aids, and still maintain all invariants without risk of corruption.
However, those cases are very rare.
How can you prove that a method has no linearization points?
As I understand linearization points (and I may be wrong here), they are where happens-before relationships are established between threads. If a method (recursively, through every method it calls in turn) establishes no such relationships, then I would assert that it has no linearization points.