How can std::atomic_compare_exchange_* etc. be used with arbitrary pointers? - c++

InterlockedCompareExchange in Windows, as well as __sync_val_compare_and_swap in gcc take pointers, and so I can pass in any address e.g. pointing into a shared memory block into those functions.
For non-x86 architectures, I might have to ensure memory alignment for correctness, and for x86 (and maybe others), I might want to ensure cache-line alignment for performance, although correctness should not be an issue (-> x86 LOCK prefix).
Trying to get rid of some platform-dependent stuff in my code (Windows VC++ vs. GCC), I took a look at C++11's atomic_compare_exchange_weak and friends. But they all work on a variable of type std::atomic<T>*.
Is there a way to use arbitrary pointers with C++11's atomic functions? It doesn't look like a simple cast to std::atomic is gonna solve this.

Short answer: they can't. This is necessary for portability of the language since C++ does not want to require that every platform to have lock-free support for a specific set of data sizes. Using std::atomic<T> makes it easy for the library to transparently provide lock-free atomicity for some Ts and use a lock for others.
On the bright side, replacing T with atomic<T> in your codebase provides documentation of exactly what objects are used for synchronization, and provides protection against accidental non-atomic access to those objects.
Long answer: reinterpret_cast<std::atomic<decltype(t)>&>(t).store(value) may actually work on some implementations during the right phase of the moon, but it's the purest evil.

Related

Why there is no no-synchronous smart pointer in C++? [duplicate]

This is a bit of a two part question, all about the atomicity of std::shared_ptr:
1.
As far as I can tell, std::shared_ptr is the only smart pointer in <memory> that's atomic. I'm wondering if there is a non-atomic version of std::shared_ptr available (I can't see anything in <memory>, so I'm also open to suggestions outside of the standard, like those in Boost). I know boost::shared_ptr is also atomic (if BOOST_SP_DISABLE_THREADS isn't defined), but maybe there's another alternative? I'm looking for something that has the same semantics as std::shared_ptr, but without the atomicity.
2. I understand why std::shared_ptr is atomic; it's kinda nice. However, it's not nice for every situation, and C++ has historically had the mantra of "only pay for what you use." If I'm not using multiple threads, or if I am using multiple threads but am not sharing pointer ownership across threads, an atomic smart pointer is overkill. My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11? (assuming there is a why) (if the answer is simply "a non-atomic version was simply never considered" or "no one ever asked for a non-atomic version" that's fine!).
With question #2, I'm wondering if someone ever proposed a non-atomic version of shared_ptr (either to Boost or the standards committee) (not to replace the atomic version of shared_ptr, but to coexist with it) and it was shot down for a specific reason.
1. I'm wondering if there is a non-atomic version of std::shared_ptr available
Not provided by the standard. There may well be one provided by a "3rd party" library. Indeed, prior to C++11, and prior to Boost, it seemed like everyone wrote their own reference counted smart pointer (including myself).
2. My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11?
This question was discussed at the Rapperswil meeting in 2010. The subject was introduced by a National Body Comment #20 by Switzerland. There were strong arguments on both sides of the debate, including those you provide in your question. However, at the end of the discussion, the vote was overwhelmingly (but not unanimous) against adding an unsynchronized (non-atomic) version of shared_ptr.
Arguments against included:
Code written with the unsynchronized shared_ptr may end up being used in threaded code down the road, ending up causing difficult to debug problems with no warning.
Having one "universal" shared_ptr that is the "one way" to traffic in reference counting has benefits: From the original proposal:
Has the same object type regardless of features used, greatly facilitating interoperability between libraries, including third-party libraries.
The cost of the atomics, while not zero, is not overwhelming. The cost is mitigated by the use of move construction and move assignment which do not need to use atomic operations. Such operations are commonly used in vector<shared_ptr<T>> erase and insert.
Nothing prohibits people from writing their own non-atomic reference-counted smart pointer if that's really what they want to do.
The final word from the LWG in Rapperswil that day was:
Reject CH 20. No consensus to make a change at this time.
Howard's answered the question well already, and Nicol made some good points about the benefits of having a single standard shared pointer type, rather than lots of incompatible ones.
While I completely agree with the committee's decision, I do think there is some benefit to using an unsynchronized shared_ptr-like type in special cases, so I've investigated the topic a few times.
If I'm not using multiple threads, or if I am using multiple threads but am not sharing pointer ownership across threads, an atomic smart pointer is overkill.
With GCC when your program doesn't use multiple threads shared_ptr doesn't use atomic ops for the refcount. This is done by updating the reference counts via wrapper functions that detect whether the program is multithreaded (on GNU/Linux this is done by checking a special variable in Glibc that says if the program is single-threaded[1]) and dispatch to atomic or non-atomic operations accordingly.
I realised many years ago that because GCC's shared_ptr<T> is implemented in terms of a __shared_ptr<T, _LockPolicy> base class, it's possible to use the base class with the single-threaded locking policy even in multithreaded code, by explicitly using __shared_ptr<T, __gnu_cxx::_S_single>. You can use an alias template like this to define a shared pointer type that is not thread-safe, but is slightly faster[2]:
template<typename T>
using shared_ptr_unsynchronized = std::__shared_ptr<T, __gnu_cxx::_S_single>;
This type would not be interoperable with std::shared_ptr<T> and would only be safe to use when it is guaranteed that the shared_ptr_unsynchronized objects would never be shared between threads without additional user-provided synchronization.
This is of course completely non-portable, but sometimes that's OK. With the right preprocessor hacks your code would still work fine with other implementations if shared_ptr_unsynchronized<T> is an alias for shared_ptr<T>, it would just be a little faster with GCC.
[1] Before Glibc 2.33 added that variable, the wrapper functions would detect whether the program links to libpthread.so as an imperfect method of checking for single-threaded vs multi-threaded.
[2] Unfortunately because that wasn't an intended use case it didn't quite work optimally before GCC 4.9, and some operations still used the wrapper functions and so dispatched to atomic operations even though you've explicitly requested the `_S_single` policy. See point (2) at http://gcc.gnu.org/ml/libstdc++/2007-10/msg00180.html for more details and a patch to GCC to allow the non-atomic implementation to be used even in multithreaded apps. I sat on that patch for years but I finally committed it for GCC 4.9.
My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11? (assuming there is a why).
One could just as easily ask why there isn't an intrusive pointer, or any number of other possible variations of shared pointers one could have.
The design of shared_ptr, handed down from Boost, has been to create a minimum standard lingua-franca of smart pointers. That, generally speaking, you can just pull this down off the wall and use it. It's something that would be used generally, across a wide variety of applications. You can put it in an interface, and odds are good people will be willing to use it.
Threading is only going to get more prevalent in the future. Indeed, as time passes, threading will generally be one of the primary means to achieve performance. Requiring the basic smart pointer to do the bare minimum needed to support threading facilitates this reality.
Dumping a half-dozen smart pointers with minor variations between them into the standard, or even worse a policy-based smart pointer, would have been terrible. Everyone would pick the pointer they like best and forswear all others. Nobody would be able to communicate with anyone else. It'd be like the current situations with C++ strings, where everyone has their own type. Only far worse, because interoperation with strings is a lot easier than interoperation between smart pointer classes.
Boost, and by extension the committee, picked a specific smart pointer to use. It provided a good balance of features and was widely and commonly used in practice.
std::vector has some inefficiencies compared to naked arrays in some corner cases too. It has some limitations; some uses really want to have a hard limit on the size of a vector, without using a throwing allocator. However, the committee didn't design vector to be everything for everyone. It was designed to be a good default for most applications. Those for whom it can't work can just write an alternative that suites their needs.
Just as you can for a smart pointer if shared_ptr's atomicity is a burden. Then again, one might also consider not copying them around so much.
Boost provides a shared_ptr that's non-atomic. It's called local_shared_ptr, and can be found in the smart pointers library of boost.
I am preparing a talk on shared_ptr at work. I have been using a modified boost shared_ptr with avoid separate malloc (like what make_shared can do) and a template param for lock policy like shared_ptr_unsynchronized mentioned above. I am using the program from
http://flyingfrogblog.blogspot.hk/2011/01/boosts-sharedptr-up-to-10-slower-than.html
as a test, after cleaning up the unnecessary shared_ptr copies. The program uses the main thread only and the test argument is shown. The test env is a notebook running linuxmint 14. Here is the time taken in seconds:
test run setup boost(1.49) std with make_shared modified boost
mt-unsafe(11) 11.9 9/11.5(-pthread on) 8.4
atomic(11) 13.6 12.4 13.0
mt-unsafe(12) 113.5 85.8/108.9(-pthread on) 81.5
atomic(12) 126.0 109.1 123.6
Only the 'std' version uses -std=cxx11, and the -pthread likely switches lock_policy in g++ __shared_ptr class.
From these numbers, I see the impact of atomic instructions on code optimization. The test case does not use any C++ containers, but vector<shared_ptr<some_small_POD>> is likely to suffer if the object doesn't need the thread protection. Boost suffers less probably because the additional malloc is limiting the amount of inlining and code optimizaton.
I have yet to find a machine with enough cores to stress test the scalability of atomic instructions, but using std::shared_ptr only when necessary is probably better.

A base class to automatically provide getter and setter functions C++

My use case is like.
I have some plain fields, which need a basic thread safe getter and setter.
For some fields a custom getter and setter is required.
What will be the best approach for this problem?
The standard library provides atomic for simple thread-safe getting and setting of values. http://en.cppreference.com/w/cpp/atomic/atomic
Be aware that proper thread safety almost always involves a lot more than wrapping a few members in std::atomic and calling it a day.
To directly answer the question, I don't believe there is a single good answer to this.
Unlike C#, C++ doesn't appear to have automatic get, set generation built into the language, and although there might be a complicated way of doing this "organically" (templates, perhaps?); any solution you might come across might end up being some sort of tool built into an IDE or utility program, such as Eclipse or Code::Blocks.
Additionally, C++ in its C++03 form is not usually inherently thread-safe; support for multi-threading, thread management, async, and memory safety were added on in later versions (C++11 and on), but due to the nature of C++, I don't believe C++ will ever be inherently thread-safe (at least for now).
While std::atomic<type whatever> can be a viable option, you might also want to keep in mind that it is indeed an "add-on" to the language and incurs a time cost, like most other constructs with multi-threading (even atomic operations at the processor level are somewhat slower). You could go the route of mutexes, but again, we're going into multi-threading territory, which may be beyond the scope of your question.

Is there a non-atomic equivalent of std::shared_ptr? And why isn't there one in <memory>?

This is a bit of a two part question, all about the atomicity of std::shared_ptr:
1.
As far as I can tell, std::shared_ptr is the only smart pointer in <memory> that's atomic. I'm wondering if there is a non-atomic version of std::shared_ptr available (I can't see anything in <memory>, so I'm also open to suggestions outside of the standard, like those in Boost). I know boost::shared_ptr is also atomic (if BOOST_SP_DISABLE_THREADS isn't defined), but maybe there's another alternative? I'm looking for something that has the same semantics as std::shared_ptr, but without the atomicity.
2. I understand why std::shared_ptr is atomic; it's kinda nice. However, it's not nice for every situation, and C++ has historically had the mantra of "only pay for what you use." If I'm not using multiple threads, or if I am using multiple threads but am not sharing pointer ownership across threads, an atomic smart pointer is overkill. My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11? (assuming there is a why) (if the answer is simply "a non-atomic version was simply never considered" or "no one ever asked for a non-atomic version" that's fine!).
With question #2, I'm wondering if someone ever proposed a non-atomic version of shared_ptr (either to Boost or the standards committee) (not to replace the atomic version of shared_ptr, but to coexist with it) and it was shot down for a specific reason.
1. I'm wondering if there is a non-atomic version of std::shared_ptr available
Not provided by the standard. There may well be one provided by a "3rd party" library. Indeed, prior to C++11, and prior to Boost, it seemed like everyone wrote their own reference counted smart pointer (including myself).
2. My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11?
This question was discussed at the Rapperswil meeting in 2010. The subject was introduced by a National Body Comment #20 by Switzerland. There were strong arguments on both sides of the debate, including those you provide in your question. However, at the end of the discussion, the vote was overwhelmingly (but not unanimous) against adding an unsynchronized (non-atomic) version of shared_ptr.
Arguments against included:
Code written with the unsynchronized shared_ptr may end up being used in threaded code down the road, ending up causing difficult to debug problems with no warning.
Having one "universal" shared_ptr that is the "one way" to traffic in reference counting has benefits: From the original proposal:
Has the same object type regardless of features used, greatly facilitating interoperability between libraries, including third-party libraries.
The cost of the atomics, while not zero, is not overwhelming. The cost is mitigated by the use of move construction and move assignment which do not need to use atomic operations. Such operations are commonly used in vector<shared_ptr<T>> erase and insert.
Nothing prohibits people from writing their own non-atomic reference-counted smart pointer if that's really what they want to do.
The final word from the LWG in Rapperswil that day was:
Reject CH 20. No consensus to make a change at this time.
Howard's answered the question well already, and Nicol made some good points about the benefits of having a single standard shared pointer type, rather than lots of incompatible ones.
While I completely agree with the committee's decision, I do think there is some benefit to using an unsynchronized shared_ptr-like type in special cases, so I've investigated the topic a few times.
If I'm not using multiple threads, or if I am using multiple threads but am not sharing pointer ownership across threads, an atomic smart pointer is overkill.
With GCC when your program doesn't use multiple threads shared_ptr doesn't use atomic ops for the refcount. This is done by updating the reference counts via wrapper functions that detect whether the program is multithreaded (on GNU/Linux this is done by checking a special variable in Glibc that says if the program is single-threaded[1]) and dispatch to atomic or non-atomic operations accordingly.
I realised many years ago that because GCC's shared_ptr<T> is implemented in terms of a __shared_ptr<T, _LockPolicy> base class, it's possible to use the base class with the single-threaded locking policy even in multithreaded code, by explicitly using __shared_ptr<T, __gnu_cxx::_S_single>. You can use an alias template like this to define a shared pointer type that is not thread-safe, but is slightly faster[2]:
template<typename T>
using shared_ptr_unsynchronized = std::__shared_ptr<T, __gnu_cxx::_S_single>;
This type would not be interoperable with std::shared_ptr<T> and would only be safe to use when it is guaranteed that the shared_ptr_unsynchronized objects would never be shared between threads without additional user-provided synchronization.
This is of course completely non-portable, but sometimes that's OK. With the right preprocessor hacks your code would still work fine with other implementations if shared_ptr_unsynchronized<T> is an alias for shared_ptr<T>, it would just be a little faster with GCC.
[1] Before Glibc 2.33 added that variable, the wrapper functions would detect whether the program links to libpthread.so as an imperfect method of checking for single-threaded vs multi-threaded.
[2] Unfortunately because that wasn't an intended use case it didn't quite work optimally before GCC 4.9, and some operations still used the wrapper functions and so dispatched to atomic operations even though you've explicitly requested the `_S_single` policy. See point (2) at http://gcc.gnu.org/ml/libstdc++/2007-10/msg00180.html for more details and a patch to GCC to allow the non-atomic implementation to be used even in multithreaded apps. I sat on that patch for years but I finally committed it for GCC 4.9.
My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11? (assuming there is a why).
One could just as easily ask why there isn't an intrusive pointer, or any number of other possible variations of shared pointers one could have.
The design of shared_ptr, handed down from Boost, has been to create a minimum standard lingua-franca of smart pointers. That, generally speaking, you can just pull this down off the wall and use it. It's something that would be used generally, across a wide variety of applications. You can put it in an interface, and odds are good people will be willing to use it.
Threading is only going to get more prevalent in the future. Indeed, as time passes, threading will generally be one of the primary means to achieve performance. Requiring the basic smart pointer to do the bare minimum needed to support threading facilitates this reality.
Dumping a half-dozen smart pointers with minor variations between them into the standard, or even worse a policy-based smart pointer, would have been terrible. Everyone would pick the pointer they like best and forswear all others. Nobody would be able to communicate with anyone else. It'd be like the current situations with C++ strings, where everyone has their own type. Only far worse, because interoperation with strings is a lot easier than interoperation between smart pointer classes.
Boost, and by extension the committee, picked a specific smart pointer to use. It provided a good balance of features and was widely and commonly used in practice.
std::vector has some inefficiencies compared to naked arrays in some corner cases too. It has some limitations; some uses really want to have a hard limit on the size of a vector, without using a throwing allocator. However, the committee didn't design vector to be everything for everyone. It was designed to be a good default for most applications. Those for whom it can't work can just write an alternative that suites their needs.
Just as you can for a smart pointer if shared_ptr's atomicity is a burden. Then again, one might also consider not copying them around so much.
Boost provides a shared_ptr that's non-atomic. It's called local_shared_ptr, and can be found in the smart pointers library of boost.
I am preparing a talk on shared_ptr at work. I have been using a modified boost shared_ptr with avoid separate malloc (like what make_shared can do) and a template param for lock policy like shared_ptr_unsynchronized mentioned above. I am using the program from
http://flyingfrogblog.blogspot.hk/2011/01/boosts-sharedptr-up-to-10-slower-than.html
as a test, after cleaning up the unnecessary shared_ptr copies. The program uses the main thread only and the test argument is shown. The test env is a notebook running linuxmint 14. Here is the time taken in seconds:
test run setup boost(1.49) std with make_shared modified boost
mt-unsafe(11) 11.9 9/11.5(-pthread on) 8.4
atomic(11) 13.6 12.4 13.0
mt-unsafe(12) 113.5 85.8/108.9(-pthread on) 81.5
atomic(12) 126.0 109.1 123.6
Only the 'std' version uses -std=cxx11, and the -pthread likely switches lock_policy in g++ __shared_ptr class.
From these numbers, I see the impact of atomic instructions on code optimization. The test case does not use any C++ containers, but vector<shared_ptr<some_small_POD>> is likely to suffer if the object doesn't need the thread protection. Boost suffers less probably because the additional malloc is limiting the amount of inlining and code optimizaton.
I have yet to find a machine with enough cores to stress test the scalability of atomic instructions, but using std::shared_ptr only when necessary is probably better.

Single threaded shared pointer for simple inclusion in large project

For a piece of multiplatform c++ code I am writing, I need a shared pointer. Currently the project does not use boost, and pulling it in would be extremely difficult or impossible from an administrative view. However, I can use select C++11 features, including shared pointers.
There is a problem with the standard shared pointers, they guarantee thread safety. That means that on some platforms/compilers, like GCC ( http://tinyurl.com/GCCSharedPtrLockPolicy ) atomics and mutexes will be needlessly used, but at least I can check and work around issues incurred by this. Then for other platforms ( http://tinyurl.com/msvscSharedPtr ) there does not even appear to be a way to check, what thread safety mechanisms are used. The original boost pointer provides only the most basic of thread safety guarantees ( http://tinyurl.com/SharedPtrThreadSafety ).
My core issue here is that on some platforms Atomics can cause costly synchronizations between CPU caches and unneeded Mutexes can cause calls to the OS that that may delay for not entirely related reasons. This code will be multi-threaded, but we have other synchronization methods for moving data between threads. A thread-safe shared pointer is simply not needed or wanted.
Normally, I would prefer to benchmark and make my decision, but because of the platforms this will run on, and be ported too, I cannot practically do so. I would need to test on some of the less popular platforms, where less optimized compilers tend to exist, but I do not have that ability currently.
I will try to make a push to get Boost pointers, but that is unlikely, what are my other options for when that fails? In the mean time I will research trying to get just the Shared_ptr out of boost, but I do not think that will be easy.
I could roll my own. This seems like a terrible idea, why would I have to re-invent something this basic.
If there is a library with that is simple and has liberal enough licensing, then I could simply copy their shared_ptr code and simplify rolling my own.
Edit: Pulling in anything from boost other than header only libraries has been struck out. I will be researching Loki as one of the answerers suggested. If that fails and no answers materialize here, I will roll my own :( .
I'd take a look at the one in Loki. Loki is considerably smaller than boost, and the smart pointer implementation in Loki is highly configurable.
boost shared_ptr supports single threading usage
you can #define the macro BOOST_SP_DISABLE_THREADS on a project-wide basis to switch to ordinary non-atomic reference count updates
citation from boost shared_ptr
gcc has a class __shared_ptr which takes a lock-policy. shared_ptr derives off this.
One of the policies is _S_single, which is for single threaded code (ie: non locked/non atomic reference count)
In c++11 you can use template aliases which will allow you to use the non-standard __shared_ptr class which shared_ptr derives off
template<typename T>
using st_ptr = __shared_ptr<T, __gnu_cxx::_S_single>;
If you don't have a conformant compiler yet you can roll your own by just inheriting off __shared_ptr and exposing the interface (in fact this is how gcc currently does it because of the fact that template aliases were not available prior to 4.7)
Look in bits/shared_ptr.h to see how shared_ptr derives off __shared_ptr - it will be trivial to roll your own.
It is quite possible to write/adapt your own smart pointer class for use in legacy projects that the newer libraries do not support. I have written my own for just such a reason and it works well with MSVC++ 4.2 and onwards.
See ootips.org/yonat/4dev/smart-pointers.html which is what I based mine on. Definitely a possibility if you want a small solution. Just the header and .cpp file required.
The key point to watch out for is the lack of the explicit keyword in older compilers. Another is that you may want to allow implicit conversion to the raw pointer to allow your APIs to remain less affected (we did this) and in that case you should also take care to prevent conversion to other types of pointers as well.

Interlocked ops vs XXX::atomic on Win32

What are the advantages and disadvantages of using Interlocked winapi functions instead of any library provides atomic operations on Win32 platform?
Portability is not an issue.
If portability is not a concern then you're basically down to deciding whom you trust more to get this right. A library is generally designed to provide portability. It otherwise has a tough time competing with an OS provided implementation that's been battle-hardened for over 15 years.
Check this thread to see an example of how the obvious implementation is not in fact the best.
The Interlocked winapi functions work on old processors even when there is no CPU support for locked operations. 386 and maybe 486, not really a issue today unless you still support Win9x and older NT.
It would likely depend up on the specific atomic library in question.
A good library with a specific back-end would likely end up with the same implementation of a couple of ASM instructions to issue an x86 lock instruction and do their work. And assuming the library itself is portable, subsequently make your code portable.
A naive atomic implementation might do something heavier like use a mutex to protect a normal variable. I don't know of any that do - just making the point for argument.
As such, given your stated non-portability requirements, using the Win32 functions should be fine. Alternately, go ahead with an Atomic version, but perhaps look at the actual implementation.