I'm wondering if I need to use std::atomic in the following case:
a (pointer to a) member variable is initialized in an object's constructor
at some point in the future, there is exactly one write by some thread
several other threads are reading it concurrently (reads happen both before and after the write)
if I'm only looking for the following type of consistency:
a thread sees either the initial value of the member variable or the value after the write
each thread eventually sees the value after write (provided it runs long enough)
If yes, which memory order should I use in load/store (out of memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel, memory_order_seq_cst) to get as little overhead as possible?
As an example, suppose I want to implement a "static" singly-linked list which can only insert at tail and never delete or change any of the next pointers, i.e.:
struct Entry {
    ...
    const Entry* next; // or std::atomic<const Entry*> next;
    Entry() : next(NULL) { ... }
    ...
};
void Insert(Entry* tail, const Entry* e) {
    tail->next = e; // assuming tail != NULL (i.e. we always have a dummy node)
}
Memory order only dictates when reads and writes of variables other than the atomic one become visible to other threads. If you don't care how the other reads and writes in your thread are ordered relative to your member variable, you can even use std::memory_order_relaxed.
As for how quickly other threads see writes to your atomic variable, the standard says the following (§ 29.3.13):
Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.
To decide what memory ordering you need, you need to provide more information about what you will use the member variable for. If you are using an atomic pointer field to reference a newly created object, and that object has fields you want the readers to access, then you need to make sure synchronization is established. That means the store needs to be a release, and the loads should probably be acquires. Depending on the details, consume might work too, but at this point there isn't really any performance advantage to using consume, so I'd stick to acquire.
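For the list in the question, a minimal sketch of what that looks like, assuming the atomic<const Entry*> variant of next (the payload field and the ReadNext helper are made up for illustration):

#include <atomic>

struct Entry {
    int payload = 0;                         // fields the readers want to see
    std::atomic<const Entry*> next{nullptr};
};

void Insert(Entry* tail, const Entry* e) {
    // Release: everything written to *e before this store becomes visible
    // to any reader whose acquire load observes the new pointer.
    tail->next.store(e, std::memory_order_release);
}

const Entry* ReadNext(const Entry* tail) {
    return tail->next.load(std::memory_order_acquire);
}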
Try playing around with examples using CDSChecker to get an idea of what you need to do.
I read the following article by Anthony Williams, and as I understand it, in std::experimental::atomic_shared_ptr, in addition to the reference count (which is already atomic in std::shared_ptr), the actual pointer to the shared object is also atomic?
But when I read about the reference-counted version of lock_free_stack described in Anthony's book on C++ concurrency, it seems to me that the same applies to std::shared_ptr, because functions like std::atomic_load and std::atomic_compare_exchange_weak are applied to instances of std::shared_ptr.
template <class T>
class lock_free_stack
{
public:
    void push(const T& data)
    {
        const std::shared_ptr<node> new_node = std::make_shared<node>(data);
        new_node->next = std::atomic_load(&head_);
        while (!std::atomic_compare_exchange_weak(&head_, &new_node->next, new_node));
    }
    std::shared_ptr<T> pop()
    {
        std::shared_ptr<node> old_head = std::atomic_load(&head_);
        while (old_head &&
               !std::atomic_compare_exchange_weak(&head_, &old_head, old_head->next));
        return old_head ? old_head->data : std::shared_ptr<T>();
    }
private:
    struct node
    {
        std::shared_ptr<T> data;
        std::shared_ptr<node> next;
        node(const T& data_) : data(std::make_shared<T>(data_)) {}
    };

    std::shared_ptr<node> head_;
};
What is the exact difference between these two types of smart pointers, and if the pointer in a std::shared_ptr instance is not atomic, how is the above lock-free stack implementation possible?
The atomic "thing" in shared_ptr is not the shared pointer itself, but the control block it points to. meaning that as long as you don't mutate the shared_ptr across multiple threads, you are ok. do note that copying a shared_ptr only mutates the control block, and not the shared_ptr itself.
std::shared_ptr<int> ptr = std::make_shared<int>(4);
for (auto i = 0; i < 10; i++) {
    std::thread([ptr] { auto copy = ptr; }).detach(); // OK, only mutates the control block
}
Mutating the shared pointer itself, such as assigning it different values from multiple threads, is a data race, for example:
std::shared_ptr<int> ptr = std::make_shared<int>(4);
std::thread threadA([&ptr] {
    ptr = std::make_shared<int>(10);
});
std::thread threadB([&ptr] {
    ptr = std::make_shared<int>(20);
});
Here, we are mutating the control block (which is OK) but also the shared pointer itself, by making it point to different values from multiple threads. This is not OK.
A solution to that problem is to wrap the shared_ptr with a lock, but this solution does not scale under contention, and in a sense loses the automatic feel of the standard shared pointer.
Another solution is to use the standard functions you quoted, such as std::atomic_compare_exchange_weak. This makes the work of synchronizing shared pointers a manual one, which we don't like.
This is where atomic shared pointer comes into play. You can mutate the shared pointer from multiple threads without fearing a data race and without using any locks. The standalone functions become member ones, and their use is much more natural for the user. This kind of pointer is extremely useful for lock-free data structures.
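As a rough sketch of what that buys you, here is the push from the stack above rewritten against an atomic shared pointer; this assumes C++20, where the type is spelled std::atomic<std::shared_ptr<T>> (std::experimental::atomic_shared_ptr offers the same member functions):

#include <atomic>
#include <memory>

template <class T>
class atomic_stack {
public:
    void push(const T& value) {
        auto new_node = std::make_shared<node>(value);
        new_node->next = head_.load();
        // The CAS is a member function, so there is no free function to forget;
        // on failure, new_node->next is reloaded with the current head.
        while (!head_.compare_exchange_weak(new_node->next, new_node));
    }
private:
    struct node {
        T value;
        std::shared_ptr<node> next;
        explicit node(const T& v) : value(v) {}
    };
    std::atomic<std::shared_ptr<node>> head_;
};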
N4162 (pdf), the proposal for atomic smart pointers, has a good explanation. Here's a quote of the relevant part:
Consistency. As far as I know, the [util.smartptr.shared.atomic] functions are the only atomic operations in the standard that are not available via an atomic type. And for all types besides shared_ptr, we teach programmers to use atomic types in C++, not atomic_* C-style functions. And that’s in part because of...

Correctness. Using the free functions makes code error-prone and racy by default. It is far superior to write atomic once on the variable declaration itself and know all accesses will be atomic, instead of having to remember to use the atomic_* operation on every use of the object, even apparently-plain reads. The latter style is error-prone; for example, “doing it wrong” means simply writing whitespace (e.g., head instead of atomic_load(&head)), so that in this style every use of the variable is “wrong by default.” If you forget to write the atomic_* call in even one place, your code will still successfully compile without any errors or warnings, it will “appear to work” including likely pass most testing, but will still contain a silent race with undefined behavior that usually surfaces as intermittent hard-to-reproduce failures, often/usually in the field, and I expect also in some cases exploitable vulnerabilities. These classes of errors are eliminated by simply declaring the variable atomic, because then it’s safe by default and to write the same set of bugs requires explicit non-whitespace code (sometimes explicit memory_order_* arguments, and usually reinterpret_casting).

Performance. atomic_shared_ptr<> as a distinct type has an important efficiency advantage over the functions in [util.smartptr.shared.atomic] — it can simply store an additional atomic_flag (or similar) for the internal spinlock as usual for atomic<bigstruct>. In contrast, the existing standalone functions are required to be usable on any arbitrary shared_ptr object, even though the vast majority of shared_ptrs will never be used atomically. This makes the free functions inherently less efficient; for example, the implementation could require every shared_ptr to carry the overhead of an internal spinlock variable (better concurrency, but significant overhead per shared_ptr), or else the library must maintain a lookaside data structure to store the extra information for shared_ptrs that are actually used atomically, or (worst and apparently common in practice) the library must use a global spinlock.
Calling std::atomic_load() or std::atomic_compare_exchange_weak() on a shared_ptr is functionally equivalent to calling atomic_shared_ptr::load() or atomic_shared_ptr::compare_exchange_weak(). There shouldn't be any performance difference between the two. Calling std::atomic_load() or std::atomic_compare_exchange_weak() on an atomic_shared_ptr would be syntactically redundant and might or might not incur a performance penalty.
atomic_shared_ptr is an API refinement. shared_ptr already supports atomic operations, but only when using the appropriate atomic non-member functions. This is error-prone, because the non-atomic operations remain available and are too easy for an unwary programmer to invoke by accident. atomic_shared_ptr is less error-prone because it doesn't expose any non-atomic operations.
shared_ptr and atomic_shared_ptr expose different APIs, but they don't necessarily need to be implemented differently; shared_ptr already supports all the operations exposed by atomic_shared_ptr. Having said that, the atomic operations of shared_ptr are not as efficient as they could be, because it must also support non-atomic operations. Therefore there are performance reasons why atomic_shared_ptr could be implemented differently. This is related to the single responsibility principle. "An entity with several disparate purposes... often offers crippled interfaces for any of its specific purposes because the partial overlap among various areas of functionality blurs the vision needed for crisply implementing each." (Sutter & Alexandrescu 2005, C++ Coding Standards)
Consider a program with three threads A, B, C.
They have a shared global object G.
I want to use an atomic variable (i) inside G which is written by thread B and read by thread A.
My approach was:
declare i in G as:
std::atomic<int> i;
write it from thread B using a pointer to G as:
G* pG; //this is available inside A and B
pG->i = 23;
And read it from thread A the same way:
int k = pG->i;
Is my approach correct if these threads try to access this variable simultaneously?
Like JV says, it depends on what your definition of "correct" is. See http://preshing.com/20120612/an-introduction-to-lock-free-programming/. If it doesn't need to synchronize with anything, you should use std::memory_order_relaxed stores instead of the default sequentially consistent stores, so it compiles to more efficient asm (no memory-barrier instructions).
But yes, accessing an atomic struct member through a pointer is fine, as long as the pointer itself is initialized before the threads start.
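A minimal sketch of that pattern, assuming k doesn't need to synchronize with anything else (the writer/reader function names are made up):

#include <atomic>

struct G {
    std::atomic<int> i{0};
};

G g;
G* pG = &g; // initialized before the threads start

void writerB() { // runs in thread B
    pG->i.store(23, std::memory_order_relaxed);
}

void readerA() { // runs in thread A
    int k = pG->i.load(std::memory_order_relaxed);
    (void)k; // ... use k
}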
If the struct is a global, then don't use a pointer to it, just access the global directly. Having a separate variable that always points to the same global is an extra level of indirection for no benefit.
If you want to change the pointer itself, it also needs to be atomic (std::atomic<struct foo *> pG), and it gets complicated as far as deciding when it's safe to free the old data after a change.
I've been working on a parser for commands (which are fancy wrappers around large arrays of data), and have a queue that unhandled commands reside on. If I need a command, I query it with code like this:
boost::optional<command> get_command() {
    if (!has_command()) return boost::none;
    else {
        boost::optional<command> comm(command_feed.front()); // command_feed is declared as a std::queue<command>
        command_feed.pop();
        return comm;
    }
}
The problem is, these commands could be megabytes in size under the right circumstances, and need to be parsed pretty quickly. My thought was that I could optimize the transfer into a move, like so:
boost::optional<command> get_command() {
    if (!has_command()) return boost::none;
    else {
        boost::optional<command> comm(std::move(command_feed.front())); // command_feed is declared as a std::queue<command>
        command_feed.pop();
        return comm;
    }
}
And it seems to work for this specific case, but can this be used as a general purpose solution to any properly maintained RAII object, or should I be doing something else?
Yes, this is perfectly safe:
std::queue<T> q;
// add stuff...
T top = std::move(q.front());
q.pop();
pop() doesn't have any preconditions on the first element in q having a specified state, and since you're not subsequently using q.front(), you don't have to deal with that moved-from object any more.
Sounds like a good idea to do!
It depends on what the move constructor for your type does. If it leaves the original object in a state that can safely be destroyed, then all is well. If not, then you may be in trouble. Note that the comments about preconditions and valid states are about constraints on types defined in the standard library. Types that you define do not have those constraints, except to the extent that they use types from the standard library. So look at your move constructor to sort out what you can and can't do with a moved-from object.
Yes, as long as your std::queue's container template argument guarantees that pop_front() has no preconditions on the state of its contained values; the default container for std::queue is std::deque, and it offers that guarantee.
As long as you ensure that, you are completely safe. You're about to remove that item from the queue anyway, so there is no reason not to move it out, since you are taking ownership of the object.
Moving an object may leave it in an invalid state. Its invariants are no longer guaranteed. You would be safe popping it from a non-intrusive queue.
The std::move itself does nothing other than tell the compiler that it can select an overload that takes an rvalue.
A well-written move constructor would then steal the representation from the old object for the new object. For instance, it can just copy the pointers into the new object and zero the pointers in the old object (that way the old object's destructor won't destroy the arrays).
If your command type is not overloaded to do this, there will not be any benefit to std::move.
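A minimal sketch of what such a stealing move constructor looks like for a command-like type owning a large buffer (the members here are made up for illustration):

#include <cstddef>

struct command {
    int* data = nullptr;
    std::size_t size = 0;

    command() = default;
    command(command&& other) noexcept
        : data(other.data), size(other.size) {
        other.data = nullptr; // the old object no longer owns the buffer,
        other.size = 0;       // so its destructor is a harmless no-op
    }
    ~command() { delete[] data; }
};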
I have many overloaded functions in a class. In this case, should I declare the int32_t data as a member variable of the class, so I am not declaring it over and over in each function? The Fill function always sets the value through the reference, so I don't think I should need to declare it every time in every function.
There are about 20 more of these functions not listed here:
void TransmitInfo(TypeA &dp, Id &tc)
{
    // do lots of other work here
    int32_t data;
    while (dp.Fill(data)) // Fill accepts a reference variable, "data" gets populated
    {
        Transmit(tc, data);
    }
}
void TransmitInfo(TypeB &dp, Id &tc)
{
    // do lots of other work here
    int32_t data;
    while (dp.Fill(data))
    {
        Transmit(tc, data);
    }
}
void TransmitInfo(TypeC &dp, Id &tc)
{
    // do lots of other work here
    int32_t data;
    while (dp.Fill(data))
    {
        Transmit(tc, data);
    }
}
Scope is not the only thing to consider when choosing where to declare a variable. Just as important are the lifetime of the variable and when it is created.
When you declare a variable inside a function, it is created whenever that function is called, several times if need be (recursion!), and it's destroyed when that function exits. These creations/destructions are effectively no-ops for the CPU in the case of simple types such as int32_t.
When you declare it inside the class, you get only one copy of the variable per object you create. If one of your functions calls another (or itself), they will both use the same variable. You also increase the size of your objects; your variable will consume memory even when it's not used.
So, the bottom line is: Use the different kinds of variables for the purposes they were designed for.
If a function needs to remember something while it runs, it's a function variable.
If a function needs to remember something across its invocations, it's a static function variable.
If an object needs to remember something across its member invocations, it's a member variable.
If a class needs to remember something across all objects, it's a static class variable.
Anything else leads to chaos.
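A small illustration of those four kinds (the Counter class is made up):

struct Counter {
    static int total_objects; // static class variable: shared by all Counter objects
    int value = 0;            // member variable: remembered across member invocations

    int next() {
        static int calls = 0; // static function variable: survives across calls
        int delta = 1;        // function variable: recreated on every call
        ++calls;
        return value += delta;
    }
};
int Counter::total_objects = 0;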
You should refrain from using a member variable for any kind of temporary data. The reason is that it guarantees that your code is not thread safe, and in this day and age of parallel computing, that is a major disadvantage. The cost of allocating an int32_t is so small as to be negligible, so it is better to allocate it inside the function and keep thread safety. Before a single int allocation becomes noticeable you would have to perform it well over a million times, and even then the total loss would be in microseconds.
If you're experiencing such difficulty with optimization that you have to resort to this degree of micro-optimization, you should probably rework your algorithm for better scaling instead of spending massive amounts of time optimizing something that is not a choke point. (You would also be better off using a good concurrent algorithm than shaving picoseconds off a serial one.)
Absolutely do not do this. If it's only a temporary for the life of a function, then keep it local.
Otherwise you'll cause more problems than you solve, e.g. with multithreading and serialisation.
Leave such micro-optimisations to the compiler.
The new keyword hands you back a pointer to the object created, which means you keep having to dereference it - I'm just afraid performance may suffer.
E.g. a common situation I'm facing:
class cls {
    obj *x; ...
};

// Later, in some member function:
x = new obj(...);
for (i ...) x->bar[i] = x->foo(i + x->baz); // much dereferencing
I'm not overly keen on reference variables either, as I have many such pointers (e.g. *x, *y, *z, ...) and having to write obj &x_ref = *x, obj &y_ref = *y, ... at the start of every function quickly becomes tiresome and verbose.
Indeed, is it better to do:
class cls {
    obj x; ... // not a pointer
};

x_ptr = new obj(...);
x = *x_ptr; // then work with x, not the pointer
So what's the standard way to work with variables created by new?
There's no other way to work with objects created by new. The location of the unnamed object created by new is always a run-time value. This immediately means that each and every access to such an object will always unconditionally require dereferencing. There's no way around it. That is what "dereferencing" actually is, by definition - accessing through a run-time address.
Your attempts to "replace" pointers with references by doing &x_ref = *x at the beginning of the function are meaningless. They achieve absolutely nothing. References in this context are just syntactic sugar. They might reduce the number of * operators in your source code (and might increase the number of & operators), but they will not affect the number of physical dereferences in the machine code. They will lead to absolutely the same machine code containing absolutely the same amount of physical dereferencing and absolutely the same performance.
Note that in contexts where dereferencing occurs repeatedly many times, a smart compiler might (and will) actually read and store the target address in a CPU register, instead of re-reading it each time from memory. Accessing data through an address stored in a CPU register is always the fastest; it is even faster than accessing data through a compile-time address embedded in the CPU instruction. For this reason, repetitive dereferencing of manageable complexity might not have any negative impact on performance. This, of course, depends significantly on the quality of the compiler.
In situations where you observe a significant negative impact on performance from repetitive dereferencing, you might try to cache the target value in a local buffer, use the local buffer for all calculations and then, when the result is ready, store it through the original pointer. For example, if you have a function that repeatedly accesses (reads and/or writes) data through a pointer int *px, you might want to cache the data in an ordinary local variable x:
int x = *px;
work with x throughout the entire function and at the end do
*px = x;
Needless to say, this only makes sense when the performance impact from copying the object is low. And of course, you have to be careful with such techniques in aliased situations, since in this case the value of *px is not maintained continuously. (Note again that in this case we use an ordinary variable x, not a reference. Your attempts to replace single-level pointers with references achieve nothing at all.)
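A minimal sketch of the technique (sum_into and its parameters are made up for illustration):

void sum_into(int* px, const int* data, int n) {
    int x = *px;              // one read through the pointer
    for (int i = 0; i < n; ++i)
        x += data[i];         // all work happens on the local copy
    *px = x;                  // one write back at the end
}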
Again, this sort of "data caching" optimization can also be performed implicitly by the compiler, assuming the compiler has a good understanding of the data aliasing relationships present in the code. And this is where the C99-style restrict keyword can help it a lot. But that's a different topic.
In any case, there's no "standard" way to do that. The best approach depends critically on your knowledge of data flow relationships that exist in each specific piece of your code.
Instantiate the object without the new keyword, like this:
obj x;
Or if your constructor for obj takes parameters:
obj x(...);
This will give you an object instead of a pointer thereto.
You have to decide whether you want to allocate your things on the heap or on the stack. That's completely your decision, based on your requirements, and there is no performance degradation with dereferencing. You may allocate your cls on the heap, so that it outlives the current scope, and keep the instances of obj inside it by value:
class cls {
    obj x; // default constructor of obj will be called
};

And if obj doesn't have a default constructor, you need to call the appropriate constructor in the cls constructor's initializer list.
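A minimal sketch of that last point (the int argument is made up):

class cls {
    obj x;
public:
    // obj has no default constructor, so x is constructed
    // in the member initializer list
    explicit cls(int arg) : x(arg) {}
};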