Why is there no non-synchronized smart pointer in C++? [duplicate] - c++

This is a bit of a two part question, all about the atomicity of std::shared_ptr:
1.
As far as I can tell, std::shared_ptr is the only smart pointer in <memory> that's atomic. I'm wondering if there is a non-atomic version of std::shared_ptr available (I can't see anything in <memory>, so I'm also open to suggestions outside of the standard, like those in Boost). I know boost::shared_ptr is also atomic (if BOOST_SP_DISABLE_THREADS isn't defined), but maybe there's another alternative? I'm looking for something that has the same semantics as std::shared_ptr, but without the atomicity.
2. I understand why std::shared_ptr is atomic; it's kinda nice. However, it's not nice for every situation, and C++ has historically had the mantra of "only pay for what you use." If I'm not using multiple threads, or if I am using multiple threads but am not sharing pointer ownership across threads, an atomic smart pointer is overkill. My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11? (assuming there is a why) (if the answer is simply "a non-atomic version was simply never considered" or "no one ever asked for a non-atomic version" that's fine!).
With question #2, I'm wondering if someone ever proposed a non-atomic version of shared_ptr (either to Boost or the standards committee) (not to replace the atomic version of shared_ptr, but to coexist with it) and it was shot down for a specific reason.

1. I'm wondering if there is a non-atomic version of std::shared_ptr available
Not provided by the standard. There may well be one provided by a "3rd party" library. Indeed, prior to C++11, and prior to Boost, it seemed like everyone wrote their own reference counted smart pointer (including myself).
2. My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11?
This question was discussed at the Rapperswil meeting in 2010. The subject was introduced by a National Body Comment #20 by Switzerland. There were strong arguments on both sides of the debate, including those you provide in your question. However, at the end of the discussion, the vote was overwhelmingly (but not unanimous) against adding an unsynchronized (non-atomic) version of shared_ptr.
Arguments against included:
Code written with the unsynchronized shared_ptr may end up being used in threaded code down the road, causing difficult-to-debug problems with no warning.
Having one "universal" shared_ptr that is the "one way" to traffic in reference counting has benefits: From the original proposal:
Has the same object type regardless of features used, greatly facilitating interoperability between libraries, including third-party libraries.
The cost of the atomics, while not zero, is not overwhelming. The cost is mitigated by the use of move construction and move assignment, which do not need to use atomic operations (see the sketch just after this list). Such operations are commonly used in vector<shared_ptr<T>> erase and insert.
Nothing prohibits people from writing their own non-atomic reference-counted smart pointer if that's really what they want to do.
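To illustrate the move-semantics point above, here is a minimal sketch (mine, not from the committee discussion): copying a shared_ptr must adjust the reference count atomically, but moving one only transfers the control-block pointer, which is why move-heavy container operations largely sidestep the atomic cost.
#include <memory>
#include <utility>
#include <vector>

int main() {
    auto p = std::make_shared<int>(42);
    std::shared_ptr<int> c = p;                 // copy: atomic increment of the use count
    std::shared_ptr<int> m = std::move(p);      // move: no ref-count change, p becomes empty

    std::vector<std::shared_ptr<int>> v;
    v.push_back(std::move(m));                  // vector insert/erase shuffle elements with
    v.push_back(std::move(c));                  // moves, so no atomic ref-count traffic
    v.erase(v.begin());
}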
The final word from the LWG in Rapperswil that day was:
Reject CH 20. No consensus to make a change at this time.

Howard's answered the question well already, and Nicol made some good points about the benefits of having a single standard shared pointer type, rather than lots of incompatible ones.
While I completely agree with the committee's decision, I do think there is some benefit to using an unsynchronized shared_ptr-like type in special cases, so I've investigated the topic a few times.
If I'm not using multiple threads, or if I am using multiple threads but am not sharing pointer ownership across threads, an atomic smart pointer is overkill.
With GCC, when your program doesn't use multiple threads, shared_ptr doesn't use atomic ops for the refcount. This is done by updating the reference counts via wrapper functions that detect whether the program is multithreaded (on GNU/Linux this is done by checking a special variable in Glibc that says if the program is single-threaded[1]) and dispatch to atomic or non-atomic operations accordingly.
I realised many years ago that because GCC's shared_ptr<T> is implemented in terms of a __shared_ptr<T, _LockPolicy> base class, it's possible to use the base class with the single-threaded locking policy even in multithreaded code, by explicitly using __shared_ptr<T, __gnu_cxx::_S_single>. You can use an alias template like this to define a shared pointer type that is not thread-safe, but is slightly faster[2]:
#include <memory>  // in libstdc++, <memory> also declares the internal __shared_ptr and its lock policies

template<typename T>
using shared_ptr_unsynchronized = std::__shared_ptr<T, __gnu_cxx::_S_single>;
This type would not be interoperable with std::shared_ptr<T> and would only be safe to use when it is guaranteed that the shared_ptr_unsynchronized objects would never be shared between threads without additional user-provided synchronization.
This is of course completely non-portable, but sometimes that's OK. With the right preprocessor hacks your code would still work fine with other implementations if shared_ptr_unsynchronized<T> is simply an alias for shared_ptr<T> there; it would just be a little faster with GCC.
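For illustration, a hedged sketch of what such a preprocessor hack might look like (the __GLIBCXX__ check is just one way to detect libstdc++; adapt as needed):
#include <memory>

#if defined(__GLIBCXX__)
// libstdc++: use the internal single-threaded lock policy (non-portable).
template<typename T>
using shared_ptr_unsynchronized = std::__shared_ptr<T, __gnu_cxx::_S_single>;
#else
// Any other implementation: fall back to the ordinary shared_ptr.
template<typename T>
using shared_ptr_unsynchronized = std::shared_ptr<T>;
#endif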
[1] Before Glibc 2.33 added that variable, the wrapper functions would detect whether the program links to libpthread.so, which was an imperfect way to distinguish single-threaded from multi-threaded programs.
[2] Unfortunately, because that wasn't an intended use case, it didn't quite work optimally before GCC 4.9: some operations still used the wrapper functions and so dispatched to atomic operations even though the `_S_single` policy was explicitly requested. See point (2) at http://gcc.gnu.org/ml/libstdc++/2007-10/msg00180.html for more details and a patch to GCC that allows the non-atomic implementation to be used even in multithreaded apps. I sat on that patch for years but finally committed it for GCC 4.9.

My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11? (assuming there is a why).
One could just as easily ask why there isn't an intrusive pointer, or any number of other possible variations of shared pointers one could have.
The design of shared_ptr, handed down from Boost, was to create a minimum standard lingua franca of smart pointers: something that, generally speaking, you can just pull off the shelf and use. It's something that would be used generally, across a wide variety of applications. You can put it in an interface, and odds are good people will be willing to use it.
Threading is only going to get more prevalent in the future. Indeed, as time passes, threading will generally be one of the primary means to achieve performance. Requiring the basic smart pointer to do the bare minimum needed to support threading facilitates this reality.
Dumping a half-dozen smart pointers with minor variations between them into the standard, or even worse a policy-based smart pointer, would have been terrible. Everyone would pick the pointer they like best and forswear all others. Nobody would be able to communicate with anyone else. It'd be like the current situation with C++ strings, where everyone has their own type. Only far worse, because interoperation with strings is a lot easier than interoperation between smart pointer classes.
Boost, and by extension the committee, picked a specific smart pointer to use. It provided a good balance of features and was widely and commonly used in practice.
std::vector has some inefficiencies compared to naked arrays in some corner cases too. It has some limitations; some uses really want to have a hard limit on the size of a vector, without using a throwing allocator. However, the committee didn't design vector to be everything for everyone. It was designed to be a good default for most applications. Those for whom it can't work can just write an alternative that suits their needs.
Just as you can write your own smart pointer if shared_ptr's atomicity is a burden. Then again, one might also consider not copying them around so much.

Boost provides a shared_ptr variant that's non-atomic. It's called local_shared_ptr, and it can be found in Boost's smart pointer library (Boost.SmartPtr).
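A minimal usage sketch, assuming a Boost recent enough to ship local_shared_ptr (the headers named here are where I believe it lives in Boost.SmartPtr):
#include <boost/smart_ptr/local_shared_ptr.hpp>
#include <boost/smart_ptr/make_local_shared.hpp>

int main() {
    // Reference-count updates are plain (non-atomic), so these objects must not
    // be shared between threads without external synchronization.
    boost::local_shared_ptr<int> p = boost::make_local_shared<int>(42);
    boost::local_shared_ptr<int> q = p;   // non-atomic increment
    return *q;
}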

I am preparing a talk on shared_ptr at work. I have been using a modified Boost shared_ptr that avoids the separate malloc (like what make_shared can do) and takes a template parameter for the lock policy, like the shared_ptr_unsynchronized mentioned above. I am using the program from
http://flyingfrogblog.blogspot.hk/2011/01/boosts-sharedptr-up-to-10-slower-than.html
as a test, after cleaning up the unnecessary shared_ptr copies. The program uses the main thread only, and the test argument is shown in parentheses in the table. The test environment is a notebook running Linux Mint 14. Here is the time taken in seconds:
test run setup    boost (1.49)    std with make_shared           modified boost
mt-unsafe(11)     11.9            9 / 11.5 (-pthread on)         8.4
atomic(11)        13.6            12.4                           13.0
mt-unsafe(12)     113.5           85.8 / 108.9 (-pthread on)     81.5
atomic(12)        126.0           109.1                          123.6
Only the 'std' version uses -std=c++11, and -pthread likely switches the lock_policy used by g++'s __shared_ptr class.
From these numbers, I see the impact of atomic instructions on code optimization. The test case does not use any C++ containers, but vector<shared_ptr<some_small_POD>> is likely to suffer if the object doesn't need the thread protection. Boost suffers less, probably because the additional malloc limits the amount of inlining and code optimization.
I have yet to find a machine with enough cores to stress test the scalability of atomic instructions, but using std::shared_ptr only when necessary is probably better.

Related

A base class to automatically provide getter and setter functions C++

My use case is like this:
I have some plain fields, which need a basic thread-safe getter and setter.
For some fields a custom getter and setter is required.
What will be the best approach for this problem?
The standard library provides std::atomic for simple thread-safe getting and setting of values: http://en.cppreference.com/w/cpp/atomic/atomic
Be aware that proper thread safety almost always involves a lot more than wrapping a few members in std::atomic and calling it a day.
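As a rough sketch of that split (class and member names are invented for illustration): plain fields can simply be std::atomic, while a field that needs a custom getter/setter, or a type std::atomic can't wrap cheaply, falls back to a mutex.
#include <atomic>
#include <mutex>
#include <string>
#include <utility>

class Settings {
    std::atomic<int> timeout_{30};   // plain field: atomic load/store is enough
    mutable std::mutex mtx_;
    std::string name_;               // custom accessor: protect with a mutex
public:
    int timeout() const { return timeout_.load(); }
    void set_timeout(int t) { timeout_.store(t); }

    std::string name() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return name_;                // copy out under the lock
    }
    void set_name(std::string n) {
        std::lock_guard<std::mutex> lock(mtx_);
        name_ = std::move(n);
    }
};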
To directly answer the question, I don't believe there is a single good answer to this.
Unlike C#, C++ doesn't appear to have automatic get/set generation built into the language, and although there might be a complicated way of doing this "organically" (templates, perhaps?), any solution you might come across might end up being some sort of tool built into an IDE or utility program, such as Eclipse or Code::Blocks.
Additionally, C++ in its C++03 form is not inherently thread-safe; support for multi-threading, thread management, async, and a memory model was added in later versions (C++11 and on), but due to the nature of C++, I don't believe it will ever be inherently thread-safe (at least for now).
While std::atomic<type whatever> can be a viable option, you might also want to keep in mind that it is indeed an "add-on" to the language and incurs a time cost, like most other constructs with multi-threading (even atomic operations at the processor level are somewhat slower). You could go the route of mutexes, but again, we're going into multi-threading territory, which may be beyond the scope of your question.

How can std::atomic_compare_exchange_* etc. be used with arbitrary pointers?

InterlockedCompareExchange in Windows, as well as __sync_val_compare_and_swap in gcc take pointers, and so I can pass in any address e.g. pointing into a shared memory block into those functions.
For non-x86 architectures, I might have to ensure memory alignment for correctness, and for x86 (and maybe others), I might want to ensure cache-line alignment for performance, although correctness should not be an issue (-> x86 LOCK prefix).
Trying to get rid of some platform-dependent stuff in my code (Windows VC++ vs. GCC), I took a look at C++11's atomic_compare_exchange_weak and friends. But they all work on a variable of type std::atomic<T>*.
Is there a way to use arbitrary pointers with C++11's atomic functions? It doesn't look like a simple cast to std::atomic is gonna solve this.
Short answer: they can't. This is necessary for portability of the language, since C++ does not want to require every platform to have lock-free support for a specific set of data sizes. Using std::atomic<T> makes it easy for the library to transparently provide lock-free atomicity for some Ts and use a lock for others.
On the bright side, replacing T with atomic<T> in your codebase provides documentation of exactly what objects are used for synchronization, and provides protection against accidental non-atomic access to those objects.
Long answer: reinterpret_cast<std::atomic<decltype(t)>&>(t).store(value) may actually work on some implementations during the right phase of the moon, but it's the purest evil.
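To make the short answer concrete, a sketch of the portable route — declare the slot as std::atomic<T> to begin with — next to the cast the long answer warns about. (For what it's worth, C++20 later added std::atomic_ref for exactly this "atomic ops on an ordinary object" case, but that is beyond C++11.)
#include <atomic>

struct SharedBlock {
    std::atomic<int> counter;   // the slot itself is atomic, so this is portable
};

bool bump_if_zero(SharedBlock& blk) {
    int expected = 0;
    // The library falls back to a lock if the hardware has no suitable CAS.
    return std::atomic_compare_exchange_strong(&blk.counter, &expected, 1);
}

// By contrast, reinterpret_cast-ing a plain int* to std::atomic<int>* is
// undefined behaviour, even when sizes and alignment happen to match.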

Single threaded shared pointer for simple inclusion in large project

For a piece of multiplatform c++ code I am writing, I need a shared pointer. Currently the project does not use boost, and pulling it in would be extremely difficult or impossible from an administrative view. However, I can use select C++11 features, including shared pointers.
There is a problem with the standard shared pointers: they guarantee thread safety. That means that on some platforms/compilers, like GCC ( http://tinyurl.com/GCCSharedPtrLockPolicy ), atomics and mutexes will be needlessly used, but at least I can check and work around issues incurred by this. For other platforms ( http://tinyurl.com/msvscSharedPtr ) there does not even appear to be a way to check what thread-safety mechanisms are used. The original boost pointer provides only the most basic of thread-safety guarantees ( http://tinyurl.com/SharedPtrThreadSafety ).
My core issue here is that on some platforms atomics can cause costly synchronization between CPU caches, and unneeded mutexes can cause calls to the OS that may stall for not entirely related reasons. This code will be multi-threaded, but we have other synchronization methods for moving data between threads. A thread-safe shared pointer is simply not needed or wanted.
Normally, I would prefer to benchmark and make my decision, but because of the platforms this will run on, and be ported to, I cannot practically do so. I would need to test on some of the less popular platforms, where less optimized compilers tend to exist, but I do not have that ability currently.
I will try to make a push to get the Boost pointers, but that is unlikely to succeed; what are my other options for when that fails? In the meantime I will research trying to pull just shared_ptr out of boost, but I do not think that will be easy.
I could roll my own. This seems like a terrible idea; why would I have to re-invent something this basic?
If there is a library that is simple and has liberal enough licensing, then I could simply copy its shared_ptr code and simplify rolling my own.
Edit: Pulling in anything from boost other than header-only libraries has been struck out. I will be researching Loki, as one of the answerers suggested. If that fails and no answers materialize here, I will roll my own :( .
I'd take a look at the one in Loki. Loki is considerably smaller than boost, and the smart pointer implementation in Loki is highly configurable.
Boost's shared_ptr supports single-threaded usage:
you can #define the macro BOOST_SP_DISABLE_THREADS on a project-wide basis to switch to ordinary non-atomic reference count updates
(citation from the boost shared_ptr documentation)
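In practice that means making sure the macro is defined before any Boost smart-pointer header is seen, in every translation unit (a sketch; typically you would pass -DBOOST_SP_DISABLE_THREADS to the compiler rather than #define it per file):
// Mixing translation units with and without this macro is an ODR trap, so it
// really does need to be project-wide, e.g. -DBOOST_SP_DISABLE_THREADS.
#define BOOST_SP_DISABLE_THREADS
#include <boost/shared_ptr.hpp>

boost::shared_ptr<int> p(new int(1));   // reference count is now updated non-atomically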
gcc has a class __shared_ptr which takes a lock policy; shared_ptr derives from it.
One of the policies is _S_single, which is for single-threaded code (i.e. a non-locked/non-atomic reference count).
In C++11 you can use template aliases, which allow you to name the non-standard __shared_ptr class that shared_ptr derives from:
#include <memory>   // declares std::__shared_ptr in libstdc++

template<typename T>
using st_ptr = std::__shared_ptr<T, __gnu_cxx::_S_single>;
If you don't have a conformant compiler yet, you can roll your own by inheriting from __shared_ptr and exposing the interface (in fact this is how gcc itself does it, because template aliases were not available prior to 4.7).
Look in bits/shared_ptr.h to see how shared_ptr derives from __shared_ptr; it will be trivial to roll your own.
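A rough sketch of that pre-4.7 fallback (names invented here; the real bits/shared_ptr.h forwards far more of the interface than this):
#include <memory>

template<typename T>
class st_ptr : public std::__shared_ptr<T, __gnu_cxx::_S_single> {
    typedef std::__shared_ptr<T, __gnu_cxx::_S_single> base;
public:
    st_ptr() : base() { }
    explicit st_ptr(T* p) : base(p) { }
    // Forward whatever else you need (reset, custom deleters, ...) to `base`.
};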
It is quite possible to write/adapt your own smart pointer class for use in legacy projects that the newer libraries do not support. I have written my own for just such a reason and it works well with MSVC++ 4.2 and onwards.
See ootips.org/yonat/4dev/smart-pointers.html which is what I based mine on. Definitely a possibility if you want a small solution. Just the header and .cpp file required.
The key point to watch out for is the lack of the explicit keyword in older compilers. Another is that you may want to allow implicit conversion to the raw pointer so that your APIs remain less affected (we did this); in that case you should also take care to prevent conversion to other types of pointers.
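A tiny sketch of why the missing explicit keyword matters (the types and names here are made up; the pointer in the linked article handles this more carefully):
struct Widget { };

template<typename T>
struct old_ptr {
    old_ptr(T* p = 0) : ptr_(p) { }   // no `explicit` on pre-standard compilers
    ~old_ptr() { delete ptr_; }
    T* ptr_;
};

void use(const old_ptr<Widget>&) { }   // borrows, but the temporary still owns

void caller(Widget* raw) {
    use(raw);   // compiles: raw silently becomes a temporary old_ptr, which
                // deletes *raw at the end of the statement; raw now dangles
}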

C++0x lambda vs blocks

I was exploring C++0x today, and I encountered the new lambda feature. My question is how are these different (in terms of use) from blocks and why might one prefer one over the other?
Thanks.
there is a short syntax with C++0x lambdas to take every variable in
scope by reference. ([&]) The type of a lambda is also unspecified,
allowing potentially more optimal code.
Now, when you look at Apple blocks, it will require __block specifiers
added to variables you want to modify (the very fact that this is
required suggests the whole system is defective). Variables are taken
by reference but then by value when the block exits the scope (and the
copied context necessarily lives on the heap, it seems). A weird
semantic that will only lead to broken designs, but probably makes
people that love GC happy. Without saying this probably has quite the
efficiency cost, of course, since this requires special indirections.
It is claimed the C++0x lambdas syntax would break compatibility with
C programs, but I don't think that is true. There are probably other
problems to integrate it with C, though, mainly the fact that C can't
really deal with unspecified types and build type erasure.
Apple blocks is really just an ObjC feature they try to generalize to
other languages. For C++, the system designed for that language is
just so much better.
EDIT:
To properly give credit, I took this information from http://www.rhinocerus.net/forum/language-c-moderated/558214-blocks-vs-c-lambdas.html a long time ago. That link is dead now; however, the original discussion appears to be archived here, thanks to #stefan for finding it.
I think it basically comes down to a question of your starting point. If you're starting from Objective-C, and writing C++ (Objective-C++) primarily (or exclusively) as an adjunct to Objective-C, then using blocks throughout all the code may make sense, simply to retain as much commonality as possible across the code base. Even if (for example) a project used some pieces written in Objective-C and others in C++, it could make sense to use blocks in both to retain as much similarity throughout the code base as possible.
Unless you're using them outside of C++, however, I see little reason to prefer blocks over C++ lambdas. In what I'd guess to be the most common use (a predicate or action in an algorithm) the only noticeable difference between the two would be that one starts with ^ and the other with [].
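A side-by-side sketch of that common case (the block half assumes clang with -fblocks and a blocks runtime; in plain C++ only the lambda half applies):
void demo() {
    // C++11 lambda: [] introduces it; [&n] captures n by reference.
    auto square_l = [](int x) { return x * x; };
    int n = 0;
    auto bump_l = [&n] { ++n; };
    bump_l();                                   // n == 1

    // Apple block: ^ introduces it; m must be marked __block before the block
    // body may modify it.
    int (^square_b)(int) = ^(int x) { return x * x; };
    __block int m = 0;
    void (^bump_b)(void) = ^{ ++m; };
    bump_b();                                   // m == 1

    (void)square_l(2);
    (void)square_b(2);
}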
Older versions of Objective C++
Before ARC, there were internal differences in the implementation of blocks and lambdas that were likely to affect some more advanced uses. For example, blocks worked vaguely like C strings, so you used Block_copy to copy one, Block_release to free the copy, and so on. On the other hand, in C++ this is all automated, so the copy ctor automatically uses Block_copy and the dtor Block_release as needed. At the same time, it did involve a bit more "magic", so (for example) when you copy a block, the copy is always allocated dynamically, regardless of how the source was allocated.
If, for one reason or another, you're stuck with using an older (I'm tempted to say "ancient") compiler or maintaining older code (and don't want to update the codebase as a whole) the memory management difference may be worth taking into account.
Mike Ash provides a detailed comparison. Blocks and lambdas differ in their syntax, their data type, the way they capture variables, the way they behave when copied, and their performance.
How they relate to C/C++/Objective-C:
I will refer to Apple's blocks extension as "Objective-C blocks" even
though this is not entirely correct. They are actually an addition to
C (and can even be used in C++), with some extra behaviors to make
them more useful in Objective-C. However, they are deeply intertwined
with Objective-C in their implementation, and "C blocks" is vague, so
I think that "Objective-C blocks" is the best way to refer to them
here.
C++0x lambdas are part of C++ only and can't be used from C.
Presumably they can be used in Objective-C++ if the compiler supports
C++0x.
A very high-level summary of the differences:
Objective-C blocks are somewhat simpler to write and to use,
especially in the case of using them for asynchronous or background
tasks where the block has to be copied and kept alive beyond the
lifetime of the scope where it was created. C++0x lambdas ultimately
provide more flexibility and potential speed, but at the cost of
considerable added complexity.
As of recent clang versions (3.2, 3.3rc and 3.4svn) they are interchangeable in Objective-C(++) code. In C++ you have to use a lambda, but in Objective-C(++) you can interchange the two if you have:
1. C++ support in your libobjc. Apple's libobjc.B.dylib has it for sure. If you are using GNUstep, you need to either compile libobjc2 (and only libobjc2) with cmake and link against libsupc++ (or whatever C++ ABI library you use), or link your project against libobjcxx as well.
2. A Blocks runtime. It is part of libSystem.dylib on OS X, which libc is linked against, so it is not much of an issue there. You can use LLVM compiler-rt for this or use libobjc2. I personally recommend libobjc2, as it provides a Blocks runtime that is compatible with the rest of GNUstep, which is also called for.
3. Foundation Kit. This is due to how clang handles the ABI of interchanging C++ lambdas and Objective-C blocks; clang does so with NSAutoreleasePool, which is part of Foundation.
With those in place, you can safely interchange the two.